<!--===- docs/BijectiveInternalNameUniquing.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Bijective Internal Name Uniquing

```eval_rst
.. contents::
   :local:
```

FIR has a flat namespace: no two module-level objects (functions, globals,
etc.) may have the same name. Front-end symbols therefore need an encoding
scheme that gives each of them a unique name in FIR.

The encoding must also be reversible, so that the associated symbol in the
symbol table can be recovered from a uniqued name.

Fortran is case-insensitive, which allows the compiler to convert the user's
identifiers to all lower case. Because of this universal conversion, all
upper case letters are available for use in uniquing.

## Prefix `_Q`

All uniqued names have the prefix sequence `_Q` to indicate the name has been
uniqued. (Q is chosen because it is a [low frequency letter](http://pi.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html)
in English.)

## Scope Building

Symbols are scoped by every module, submodule, procedure, and block that
contains them. After the `_Q` sigil, names are constructed from
outermost to innermost scope as

   * Module name prefixed with `M`
   * Submodule name/s prefixed with `S`
   * Procedure name/s prefixed with `F`
   * Innermost block index prefixed with `B`

Given:
```
    submodule (mod:s1mod) s2mod
      ...
      subroutine sub
        ...
      contains
        function fun
```

The uniqued name of `fun` becomes:
```
    _QMmodSs1modSs2modFsubPfun
```
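
The construction above can be sketched in Python. This is an illustration
only, not flang's implementation: the function `unique_procedure_name` and
its parameters are hypothetical names introduced here, and block scopes are
left out for brevity.

```python
def unique_procedure_name(name, module=None, submodules=(), host_procedures=()):
    """Build a uniqued procedure name from outermost to innermost scope.

    Hypothetical helper for illustration; not part of flang.
    """
    parts = ["_Q"]
    if module is not None:
        parts.append("M" + module.lower())
    # Submodules are listed from the outermost ancestor to the innermost.
    parts.extend("S" + s.lower() for s in submodules)
    # Host procedures act as scope prefixes, so they take the F tag.
    parts.extend("F" + p.lower() for p in host_procedures)
    # The procedure being named appears "as itself" and takes the P tag.
    parts.append("P" + name.lower())
    return "".join(parts)


print(unique_procedure_name("fun", module="mod",
                            submodules=["s1mod", "s2mod"],
                            host_procedures=["sub"]))
# _QMmodSs1modSs2modFsubPfun
```

Lowering every identifier before concatenation is what keeps the upper case
tags unambiguous.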

## Prefix tag summary

| Tag | Description                                               |
| --- | --------------------------------------------------------- |
| B   | Block ("name" is a compiler generated integer index)      |
| C   | Common block                                              |
| D   | Dispatch table (compiler internal)                        |
| E   | variable Entity                                           |
| EC  | Constant Entity                                           |
| F   | procedure/Function (as a prefix)                          |
| K   | Kind                                                      |
| KN  | Negative Kind                                             |
| M   | Module                                                    |
| N   | Namelist group                                            |
| P   | Procedure/function (as itself)                            |
| Q   | uniQue mangled name tag                                   |
| S   | Submodule                                                 |
| T   | derived Type                                              |
| Y   | tYpe descriptor (compiler internal)                       |
| YI  | tYpe descriptor for an Intrinsic type (compiler internal) |
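
Because tags are upper case and user identifiers are lower case, a uniqued
name can be split back into its components. The following Python sketch
shows one way to do that. The function `split_uniqued_name` is hypothetical,
it assumes identifiers contain only lower case letters, digits, and
underscores, and it does not handle compiler internal `_QQ` names.

```python
import re

# Two-letter tags come first so that "EC", "KN", and "YI" are not split.
_COMPONENT = re.compile(r"(EC|KN|YI|[BCDEFKMNPSTY])([a-z0-9_]*)")


def split_uniqued_name(uniqued):
    """Split a uniqued name into (tag, name) pairs. Illustrative only."""
    if not uniqued.startswith("_Q"):
        raise ValueError("not a uniqued name: " + uniqued)
    body = uniqued[2:]
    pairs = []
    pos = 0
    while pos < len(body):
        match = _COMPONENT.match(body, pos)
        if match is None:
            raise ValueError("unexpected text in " + uniqued)
        # group(1) is the tag; group(2) is the (possibly empty) name.
        pairs.append((match.group(1), match.group(2)))
        pos = match.end()
    return pairs


print(split_uniqued_name("_QFsubB2Ex"))
# [('F', 'sub'), ('B', '2'), ('E', 'x')]
```

An empty name is allowed so that the blank common block, `_QC`, decodes to a
single `C` component.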

## Common blocks

   * A common block name will be prefixed with `C`

Given:
```
   common /work/ i, j
```

The uniqued name of `work` becomes:
```
    _QCwork
```

Given:
```
   common i, j
```

The uniqued name of the blank common block becomes:
```
    _QC
```

## Module scope global data

   * A global data entity is prefixed with `E`
   * A global entity that is constant (parameter) will be prefixed with `EC`

Given:
```
    module mod
      integer :: intvar
      real, parameter :: pi = 3.14
    end module
```

The uniqued name of `intvar` becomes:
```
    _QMmodEintvar
```

The uniqued name of `pi` becomes:
```
    _QMmodECpi
```
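
As a minimal sketch, the choice between `E` and `EC` could be expressed as
follows. The helper `unique_module_entity` and its `is_parameter` flag are
hypothetical names used only for this illustration.

```python
def unique_module_entity(module, name, is_parameter=False):
    """Uniqued name of a module-scope data entity. Hypothetical helper."""
    # EC marks a named constant (PARAMETER); E marks any other data entity.
    tag = "EC" if is_parameter else "E"
    return "_QM" + module.lower() + tag + name.lower()


print(unique_module_entity("mod", "intvar"))                 # _QMmodEintvar
print(unique_module_entity("mod", "pi", is_parameter=True))  # _QMmodECpi
```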

## Procedures

   * A procedure/subprogram as itself is prefixed with `P`
   * A procedure/subprogram as an ancestor name is prefixed with `F`

Procedures are the only names that are themselves uniqued, as well as
appearing as a prefix component of other uniqued names.

Given:
```
    subroutine sub
      real, save :: x(1000)
      ...
```
The uniqued name of `sub` becomes:
```
    _QPsub
```
The uniqued name of `x` becomes:
```
    _QFsubEx
```
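
The two roles of a procedure name can be sketched in Python. Both helpers
below are hypothetical and cover only an external procedure and a data
entity directly inside it.

```python
def unique_procedure(name):
    """Uniqued name of an external procedure as itself (tag P)."""
    return "_QP" + name.lower()


def unique_local_entity(procedure, name):
    """Uniqued name of a data entity local to a procedure (tags F and E)."""
    # The enclosing procedure is a scope prefix here, so it takes F, not P.
    return "_QF" + procedure.lower() + "E" + name.lower()


print(unique_procedure("sub"))          # _QPsub
print(unique_local_entity("sub", "x"))  # _QFsubEx
```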

## Blocks

   * A block is prefixed with `B`; the block "name" is a compiler generated
     index

Each block construct is numbered with a per-procedure preorder index, so the
index of the immediately containing block is unique within the procedure.
Only that innermost index appears in the uniqued name.

Given:
```
    subroutine sub
    block
      block
        real, save :: x(1000)
        ...
      end block
      ...
    end block
```
The uniqued name of `x` becomes:
```
    _QFsubB2Ex
```
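
A preorder numbering of nested blocks can be sketched as below. The tree
representation (dicts with `label` and `children`) and both function names
are hypothetical. Starting the count at 1 is an assumption inferred from the
example above, where the inner of two nested blocks receives `B2`.

```python
def number_blocks(blocks, start=1):
    """Assign preorder indices to a forest of nested BLOCK constructs.

    `blocks` is a list of dicts with a "label" and a "children" list.
    Returns a dict mapping each label to its index.
    """
    indices = {}
    counter = start

    def visit(block):
        nonlocal counter
        # Preorder: a block is numbered before any block nested inside it.
        indices[block["label"]] = counter
        counter += 1
        for child in block["children"]:
            visit(child)

    for block in blocks:
        visit(block)
    return indices


def unique_block_entity(procedure, block_index, name):
    """Only the innermost block index appears in the uniqued name."""
    return "_QF%sB%dE%s" % (procedure.lower(), block_index, name.lower())


sub_blocks = [{"label": "outer",
               "children": [{"label": "inner", "children": []}]}]
indices = number_blocks(sub_blocks)
print(unique_block_entity("sub", indices["inner"], "x"))  # _QFsubB2Ex
```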

## Namelist groups

   * A namelist group is prefixed with `N`

Given:
```
    subroutine sub
      real, save :: x(1000)
      namelist /temps/ x
      ...
```
The uniqued name of `temps` becomes:
```
    _QFsubNtemps
```

## Derived types

   * A derived type is prefixed with `T`
   * If a derived type has KIND parameters, their values are appended in a
     consistent canonical order. Each takes the form `Ki`, where _i_ is the
     compile-time constant value. (All type parameters are integer.) If _i_
     is negative, the prefix `KN` is used instead and _i_ is the magnitude
     of the value.

Given:
```
    module mymodule
      type mytype
        integer :: member
      end type
      ...
```
The uniqued name of `mytype` becomes:
```
    _QMmymoduleTmytype
```

Given:
```
    type yourtype(k1,k2)
      integer, kind :: k1, k2
      real :: mem1
      complex :: mem2
    end type
```

The uniqued name of `yourtype`, where `k1=4` and `k2=-6` at compile time,
becomes:
```
    _QTyourtypeK4KN6
```
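
The KIND parameter encoding can be sketched in Python. The helpers
`encode_kind` and `unique_derived_type` are hypothetical, and the caller is
assumed to pass the KIND values already in canonical order.

```python
def encode_kind(value):
    """Encode one KIND parameter value as K<i> or KN<magnitude>."""
    return "KN%d" % -value if value < 0 else "K%d" % value


def unique_derived_type(name, kinds=(), module=None):
    """Uniqued name of a derived type. Hypothetical helper.

    `kinds` must already be in the canonical parameter order.
    """
    prefix = "_Q" + ("M" + module.lower() if module else "")
    return prefix + "T" + name.lower() + "".join(encode_kind(k) for k in kinds)


print(unique_derived_type("mytype", module="mymodule"))  # _QMmymoduleTmytype
print(unique_derived_type("yourtype", kinds=[4, -6]))    # _QTyourtypeK4KN6
```

Writing a negative value as `KN` plus its magnitude keeps the `-` character
out of the symbol name.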

   * A derived type dispatch table is prefixed with `D`. The dispatch table
     for `type t` would be `_QDTt`
   * A type descriptor instance is prefixed with `Y`, or with `YI` for an
     intrinsic type. Intrinsic types can be encoded with their names and
     kinds. The type descriptor for the type `yourtype` above would be
     `_QYTyourtypeK4KN6`. The type descriptor for `REAL(4)` would be
     `_QYIrealK4`.

## Compiler internal names

Compiler generated names do not have to be mapped back to Fortran. This
includes names prefixed with `_QQ`, tag `D` for a type bound procedure
dispatch table, and tags `Y` and `YI` for runtime type descriptors.