
===============
Opaque Pointers
===============

The Opaque Pointer Type
=======================

Traditionally, LLVM IR pointer types have contained a pointee type. For example,
``i32*`` is a pointer that points to an ``i32`` somewhere in memory. However,
due to a lack of pointee type semantics and various issues with having pointee
types, there is a desire to remove pointee types from pointers.

The opaque pointer type project aims to replace all pointer types containing
pointee types in LLVM with an opaque pointer type. The new pointer type is
represented textually as ``ptr``.

Some instructions still need to know how to interpret the memory that a pointer
points to. For example, a load needs to know how many bytes to load from memory
and what type to treat the resulting value as. In these cases, the instructions
themselves contain a type argument. For example, the load instruction from older
versions of LLVM

.. code-block:: llvm

  load i64* %p

becomes

.. code-block:: llvm

  load i64, ptr %p
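
As an illustration only, here is a toy C++ model of this split. It is not LLVM
code, and every name in it (``ToyPtr``, ``ToyIntTy``, ``ToyLoad``, ``runLoad``)
is hypothetical. Memory is a byte array and a pointer is a bare byte offset with
no pointee type. The width of each access comes from the load itself, much as
``load i64, ptr %p`` spells out ``i64``. The model assumes little-endian byte
order.

.. code-block:: cpp

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Toy model only: hypothetical names, not LLVM APIs.
    // An "opaque pointer" here is just a byte offset; it carries no pointee type.
    using ToyPtr = std::size_t;

    // The accessed type lives on the instruction. Only integers are modeled,
    // and each enumerator's value is its size in bytes.
    enum class ToyIntTy : std::size_t { I8 = 1, I16 = 2, I32 = 4, I64 = 8 };

    struct ToyLoad {
      ToyIntTy Ty;  // plays the role of `i64` in `load i64, ptr %p`
      ToyPtr Ptr;   // plays the role of `ptr %p`
    };

    // Reads as many bytes as the load's type requires, starting at Ptr, and
    // assembles them in little-endian order (an assumption of this toy model).
    std::uint64_t runLoad(const ToyLoad &L, const std::vector<std::uint8_t> &Mem) {
      const std::size_t Bytes = static_cast<std::size_t>(L.Ty);
      assert(L.Ptr + Bytes <= Mem.size() && "toy load out of bounds");
      std::uint64_t V = 0;
      for (std::size_t I = 0; I < Bytes; ++I)
        V |= static_cast<std::uint64_t>(Mem[L.Ptr + I]) << (8 * I);
      return V;
    }

Two toy loads that share the same ``ToyPtr`` but use different ``ToyIntTy``
values read different numbers of bytes; the instruction, not the pointer,
decides.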

Address spaces are still used to distinguish between different kinds of pointers
where the distinction is relevant for lowering (e.g. data vs function pointers
have different sizes on some architectures). Opaque pointers do not change
anything related to address spaces and lowering. For more information, see
`DataLayout <LangRef.html#langref-datalayout>`_. Opaque pointers in a
non-default address space are spelled ``ptr addrspace(N)``.

This was proposed all the way back in
`2015 <https://lists.llvm.org/pipermail/llvm-dev/2015-February/081822.html>`_.

Issues with explicit pointee types
==================================

LLVM IR pointers can be cast back and forth between pointers with different
pointee types. The pointee type does not necessarily represent the actual
underlying type in memory. In other words, the pointee type carries no real
semantics.

Historically, LLVM was a kind of type-safe subset of C. Having pointee types
provided an extra layer of checks to make sure that the Clang frontend matched
its frontend values/operations with the corresponding LLVM IR. However, as other
languages like C++ adopted LLVM, the community realized that pointee types were
more of a hindrance for LLVM development and that the extra type checking with
some frontends wasn't worth it.

LLVM's type system was `originally designed
<https://llvm.org/pubs/2003-05-01-GCCSummit2003.html>`_ to support high-level
optimization. However, years of LLVM implementation experience have demonstrated
that the pointee type system design does not effectively support
optimization. Memory optimization algorithms, such as SROA, GVN, and AA,
generally need to look through LLVM's struct types and reason about the
underlying memory offsets. The community realized that pointee types hinder LLVM
development, rather than helping it. Some of the initially proposed high-level
optimizations have evolved into `TBAA
<https://llvm.org/docs/LangRef.html#tbaa-metadata>`_ due to limitations with
representing higher-level language information directly via SSA values.

Pointee types provide some value to frontends because the IR verifier uses types
to detect straightforward type confusion bugs. However, frontends also have to
deal with the complexity of inserting bitcasts everywhere that they might be
required. The community consensus is that the costs of pointee types
outweigh the benefits, and that they should be removed.

Many operations do not actually care about the underlying type. These
operations, typically intrinsics, usually end up taking an arbitrary pointer
type ``i8*`` and sometimes a size. This causes lots of redundant no-op bitcasts
in the IR to and from a pointer with a different pointee type.

No-op bitcasts take up memory/disk space and also take up compile time to look
through. However, perhaps the biggest issue is the code complexity required to
deal with bitcasts. When looking up through def-use chains for pointers, it's
easy to forget to call ``Value::stripPointerCasts()`` to find the true underlying
pointer obfuscated by bitcasts. And when looking down through def-use chains,
passes need to iterate through bitcasts to handle uses. Removing no-op pointer
bitcasts prevents a category of missed optimizations and makes writing LLVM
passes a little bit easier.
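
To illustrate the look-up direction, here is a toy C++ model of typed-pointer IR
(hypothetical names such as ``ToyValue`` and ``stripCasts``; this is not LLVM's
``Value`` class) in which one underlying pointer hides behind a chain of no-op
bitcasts. Comparing the values directly misses that they refer to the same
pointer; stripping the casts first does not.

.. code-block:: cpp

    #include <cassert>
    #include <string>

    // Toy typed-pointer IR (hypothetical names; not LLVM's Value class).
    // Each value records its pointee type, and a BitCast merely re-labels it.
    struct ToyValue {
      enum Kind { Argument, BitCast } K;
      std::string PointeeTy;   // e.g. "i32" for an i32* value
      const ToyValue *Operand; // source value of a BitCast; nullptr otherwise
    };

    // Walks up through chains of no-op bitcasts to the underlying pointer.
    // This only mimics the idea of Value::stripPointerCasts(); the real
    // function looks through more than plain bitcasts.
    const ToyValue *stripCasts(const ToyValue *V) {
      while (V->K == ToyValue::BitCast) {
        assert(V->Operand && "a bitcast must have a source operand");
        V = V->Operand;
      }
      return V;
    }

    // Comparing raw values would treat %p and a bitcast of %p as different
    // pointers; comparing the stripped values does not.
    bool sameUnderlyingPointer(const ToyValue *A, const ToyValue *B) {
      return stripCasts(A) == stripCasts(B);
    }

With opaque pointers, no-op pointer bitcasts such as the ones in these chains
are no longer needed, so this whole class of chains disappears from the IR.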

Fewer no-op pointer bitcasts also reduce the chance of incorrect bitcasts with
regard to address spaces. People maintaining backends that care a lot about
address spaces have complained that frontends like Clang often incorrectly
bitcast pointers, losing address space information.

An analogous transition that happened earlier in LLVM is integer signedness.
Currently there is no distinction between signed and unsigned integer types, but
rather each integer operation (e.g. add) contains flags to signal how to treat
the integer. Previously LLVM IR distinguished between unsigned and signed
integer types and ran into similar issues of no-op casts. The transition from
manifesting signedness in types to instructions happened early on in LLVM's
timeline to make LLVM easier to work with.

Opaque Pointers Mode
====================

During the transition phase, LLVM can be used in two modes. In typed pointer
mode, all pointer types have a pointee type and opaque pointers cannot be used.
In opaque pointers mode (the default), all pointers are opaque. The opaque
pointer mode can be disabled using ``-opaque-pointers=0`` in
LLVM tools like ``opt``, or ``-Xclang -no-opaque-pointers`` in clang.
Additionally, opaque pointer mode is automatically disabled for IR and bitcode
files that explicitly mention ``i8*`` style typed pointers.

In opaque pointer mode, all typed pointers used in IR, bitcode, or created
using ``PointerType::get()`` and similar APIs are automatically converted into
opaque pointers. This simplifies migration and allows testing existing IR with
opaque pointers.

.. code-block:: llvm

   define i8* @test(i8* %p) {
     %p2 = getelementptr i8, i8* %p, i64 1
     ret i8* %p2
   }

   ; Is automatically converted into the following if -opaque-pointers
   ; is enabled:

   define ptr @test(ptr %p) {
     %p2 = getelementptr i8, ptr %p, i64 1
     ret ptr %p2
   }

Migration Instructions
======================

In order to support opaque pointers, two types of changes tend to be necessary.
The first is the removal of all calls to ``PointerType::getElementType()`` and
``Type::getPointerElementType()``.

In the LLVM middle-end and backend, this is usually accomplished by inspecting
the type of relevant operations instead. For example, memory access related
analyses and optimizations should use the types encoded in the load and store
instructions instead of querying the pointer type.

Here are some common ways to avoid pointer element type accesses:

* For loads, use ``getType()``.
* For stores, use ``getValueOperand()->getType()``.
* Use ``getLoadStoreType()`` to handle both of the above in one call.
* For getelementptr instructions, use ``getSourceElementType()``.
* For calls, use ``getFunctionType()``.
* For allocas, use ``getAllocatedType()``.
* For globals, use ``getValueType()``.
* For consistency assertions, use
  ``PointerType::isOpaqueOrPointeeTypeEquals()``.
* To create a pointer type in a different address space, use
  ``PointerType::getWithSamePointeeType()``.
* To check that two pointers have the same element type, use
  ``PointerType::hasSameElementTypeAs()``.
* While it is preferred to write code in a way that accepts both typed and
  opaque pointers, ``Type::isOpaquePointerTy()`` and
  ``PointerType::isOpaque()`` can be used to handle opaque pointers specially.
  ``PointerType::getNonOpaquePointerElementType()`` can be used as a marker in
  code-paths where opaque pointers have been explicitly excluded.
* To get the type of a byval argument, use ``getParamByValType()``. Similar
  methods exist for other ABI-affecting attributes that need to know the
  element type, such as byref, sret, inalloca and preallocated.
* Some intrinsics require an ``elementtype`` attribute, which can be retrieved
  using ``getParamElementType()``. This attribute is required in cases where
  the intrinsic does not naturally encode a needed element type. This is also
  used for inline assembly.

Note that some of the methods mentioned above only exist to support both typed
and opaque pointers at the same time, and will be dropped once the migration
has completed. For example, ``isOpaqueOrPointeeTypeEquals()`` becomes
meaningless once all pointers are opaque.

While direct usage of pointer element types is immediately apparent in code,
there is a more subtle issue that opaque pointers need to contend with: A lot
of code assumes that pointer equality also implies that the used load/store
type or GEP source element type is the same. Consider the following examples
with typed and opaque pointers:

.. code-block:: llvm

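    ; Typed pointers: accessing %p as i64 requires a bitcast.
    define i64 @test(i32* %p) {
      store i32 0, i32* %p
      %bc = bitcast i32* %p to i64*
      %v = load i64, i64* %bc
      ret i64 %v
    }

    ; Opaque pointers: the store and the load share the same pointer operand.
    define i64 @test(ptr %p) {
      store i32 0, ptr %p
      %v = load i64, ptr %p
      ret i64 %v
    }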

Without opaque pointers, a check that the pointer operand of the load and
store are the same also ensures that the accessed type is the same. Using a
different type requires a bitcast, which will result in distinct pointer
operands.

With opaque pointers, the bitcast is not present, and this check is no longer
sufficient. In the above example, it could result in store-to-load forwarding
of an incorrect type. Code making such assumptions needs to be adjusted to
check the accessed type explicitly:
``LI->getType() == SI->getValueOperand()->getType()``.
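
The pitfall can be reproduced in a small C++ toy model of store-to-load
forwarding. All names (``ToyStore``, ``ToyLoad``, ``canForwardNaive``,
``canForward``) are hypothetical, and the pointer operand is reduced to an
integer id because every opaque pointer has the same type. This is not the
implementation of any particular LLVM pass.

.. code-block:: cpp

    #include <cassert>
    #include <string>

    // Toy model with hypothetical names (not LLVM classes).
    struct ToyStore {
      std::string ValueTy;  // type of the stored value, e.g. "i32"
      int PtrId;            // pointer operand, identified by an id
    };
    struct ToyLoad {
      std::string ResultTy; // type of the loaded value, e.g. "i64"
      int PtrId;            // pointer operand, identified by an id
    };

    // The typed-pointer-era shortcut: identical pointer operands were taken to
    // imply identical access types. Not sufficient with opaque pointers.
    bool canForwardNaive(const ToyStore &S, const ToyLoad &L) {
      return S.PtrId == L.PtrId;
    }

    // Also compares the accessed types explicitly, mirroring
    // LI->getType() == SI->getValueOperand()->getType() in real LLVM code.
    bool canForward(const ToyStore &S, const ToyLoad &L) {
      assert(!S.ValueTy.empty() && !L.ResultTy.empty() &&
             "every toy memory access carries its own type");
      return S.PtrId == L.PtrId && S.ValueTy == L.ResultTy;
    }

For the opaque-pointer function above, ``canForwardNaive`` would forward the
``i32`` store into the ``i64`` load, while ``canForward`` refuses.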

Frontends
---------

Frontends need to be adjusted to track pointee types independently of LLVM,
insofar as they are necessary for lowering. For example, clang now tracks the
pointee type in the ``Address`` structure.
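
The following C++ toy sketches this bookkeeping under simplifying assumptions:
IR is produced as text, and ``ToyAddress``, ``retypeAddress`` and ``emitLoad``
are hypothetical names, not clang's actual ``Address`` API. The frontend keeps
the element type next to each pointer and uses it whenever an instruction needs
a type.

.. code-block:: cpp

    #include <cassert>
    #include <string>

    // Hypothetical frontend-side bookkeeping (not clang's actual Address class).
    struct ToyAddress {
      std::string PtrName;  // IR value such as "%p"; its IR type is just `ptr`
      std::string ElemTy;   // pointee type tracked by the frontend, e.g. "i32"
    };

    // Reinterpreting memory only updates the frontend's record; with opaque
    // pointers no IR bitcast needs to be emitted.
    ToyAddress retypeAddress(const ToyAddress &A, const std::string &NewTy) {
      return ToyAddress{A.PtrName, NewTy};
    }

    // Emits a textual load whose type comes from the frontend's record, since
    // the opaque `ptr` type can no longer supply it.
    std::string emitLoad(const ToyAddress &A, const std::string &Result) {
      assert(!A.ElemTy.empty() && "the frontend must know the element type");
      return Result + " = load " + A.ElemTy + ", ptr " + A.PtrName;
    }

Because the element type lives in the frontend, switching from an ``i32`` view
to an ``i8`` view of the same memory changes only ``ElemTy``; the emitted IR
keeps using the same ``ptr %p`` operand.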

Frontends using the C API through an FFI should be aware that a
number of C API functions are deprecated and will be removed as part of the
opaque pointer transition::

    LLVMBuildLoad -> LLVMBuildLoad2
    LLVMBuildCall -> LLVMBuildCall2
    LLVMBuildInvoke -> LLVMBuildInvoke2
    LLVMBuildGEP -> LLVMBuildGEP2
    LLVMBuildInBoundsGEP -> LLVMBuildInBoundsGEP2
    LLVMBuildStructGEP -> LLVMBuildStructGEP2
    LLVMBuildPtrDiff -> LLVMBuildPtrDiff2
    LLVMConstGEP -> LLVMConstGEP2
    LLVMConstInBoundsGEP -> LLVMConstInBoundsGEP2
    LLVMAddAlias -> LLVMAddAlias2

Additionally, it will no longer be possible to call ``LLVMGetElementType()``
on a pointer type.

It is possible to control whether opaque pointers are used (if you want to
override the default) using ``LLVMContext::setOpaquePointers``.

Temporarily disabling opaque pointers
=====================================

In LLVM 15, opaque pointers are enabled by default, but it is still possible to
use typed pointers via a number of opt-in flags.

For users of the clang driver interface, it is possible to temporarily restore
the old default using the ``-DCLANG_ENABLE_OPAQUE_POINTERS=OFF`` cmake option,
or by passing ``-Xclang -no-opaque-pointers`` to a single clang invocation.

For users of the clang cc1 interface, ``-no-opaque-pointers`` can be passed.
Note that the ``CLANG_ENABLE_OPAQUE_POINTERS`` cmake option has no effect on
the cc1 interface.

Opaque pointers can be disabled for LTO by passing
``-Wl,-plugin-opt=no-opaque-pointers`` to the clang driver.

For users of LLVM as a library, opaque pointers can be disabled by calling
``setOpaquePointers(false)`` on the ``LLVMContext``.

For users of LLVM tools like opt, opaque pointers can be disabled by passing
``-opaque-pointers=0``.

Version Support
===============

**LLVM 14:** Supports all necessary APIs for migrating to opaque pointers and
deprecates/removes incompatible APIs. However, using opaque pointers in the
optimization pipeline is **not** fully supported. This release can be used to
make out-of-tree code compatible with opaque pointers, but opaque pointers
should **not** be enabled in production.

**LLVM 15:** Opaque pointers are enabled by default. Typed pointers are still
supported.

**LLVM 16:** Opaque pointers are enabled by default. Typed pointers are
supported on a best-effort basis only and not tested.

**LLVM 17:** Only opaque pointers are supported. Typed pointers are not
supported.

Transition State
================

As of January 2023:

Typed pointers are **not** supported on the ``main`` branch as a matter of
policy. Fixes for typed pointer support are not accepted. Typed pointer
support code may be removed without notice at any time.

However, tests are still in the process of being converted to opaque pointers.
As such, care must be taken when actively removing typed pointer support, to
avoid breaking remaining tests.

The following typed pointer functionality has already been removed:

* The ``CLANG_ENABLE_OPAQUE_POINTERS`` cmake flag is no longer supported.
* C APIs that do not support opaque pointers (like ``LLVMBuildLoad``) are no
  longer supported.
* Typed pointer IR and bitcode are implicitly upgraded to use opaque pointers,
  unless ``-opaque-pointers=0`` is passed.

The following typed pointer functionality is still to be removed:

* The ``-no-opaque-pointers`` cc1 flag, ``-opaque-pointers=0`` opt flag and
  ``-plugin-opt=no-opaque-pointers`` lto flag.
* Support for typed pointers in LLVM libraries.