# Conservative Garbage Collector Porting Directions

The collector is designed to be relatively easy to port, but is not portable
code per se. The collector inherently has to perform operations, such
as scanning the stack(s), that are not possible in portable C code.

All of the following assumes that the collector is being ported to
a byte-addressable 32- or 64-bit machine. Currently all successful ports
to 64-bit machines involve LP64 and LLP64 targets (notably Win64). You
are hereby discouraged from attempting a port to non-byte-addressable,
or 8-bit, or 16-bit machines.

The difficulty of porting the collector varies greatly depending on the needed
functionality. In the simplest case, only some small additions are needed for
the `include/private/gcconfig.h` file. This is described in the following
section. Later sections discuss some of the optional features, which typically
involve more porting effort.

Note that the collector makes heavy use of `ifdef`s. Unlike some other
software projects, we have concluded repeatedly that this is preferable
to system dependent files, with code duplicated between the files. However,
to keep this manageable, we do strongly believe in indenting `ifdef`s
correctly (for historical reasons usually without the leading sharp sign).
(Separate source files are of course fine if they do not result in code
duplication.)
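
As an illustration of this indentation convention, here is a minimal sketch. The `DEMO_` macro names are hypothetical and do not appear in the collector. The sharp sign stays in the first column, and the nesting depth is shown by the spaces that follow it:

```c
#include <assert.h>

/* Hypothetical macros, for illustration only. */
#if defined(__linux__)
# define DEMO_OS_NAME "LINUX"
# if defined(__LP64__)
#   define DEMO_WORD_BITS 64
# else
#   define DEMO_WORD_BITS 32
# endif
#else
# define DEMO_OS_NAME "UNKNOWN"
# define DEMO_WORD_BITS 32
#endif

/* Returns the value selected by the nested conditionals above. */
int demo_word_bits(void)
{
    assert(DEMO_WORD_BITS == 32 || DEMO_WORD_BITS == 64);
    return DEMO_WORD_BITS;
}
```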

## Adding Platforms to gcconfig.h

If neither thread support, nor tracing of dynamic library data is required,
these are often the only changes you will need to make.

The `gcconfig.h` file consists of three sections:

  1. A section that defines GC-internal macros that identify the architecture
  (e.g. `IA64` or `I386`) and operating system (e.g. `LINUX` or `MSWIN32`).
  This is usually done by testing predefined macros. By defining our own
  macros instead of using the predefined ones directly, we can impose a bit
  more consistency, and somewhat isolate ourselves from compiler differences.
  It is relatively straightforward to add a new entry here. But please try
  to be consistent with the existing code. In particular, 64-bit variants
  of 32-bit architectures generally are _not_ treated as a new architecture.
  Instead we explicitly test for 64-bit-ness in the few places in which
  it matters. (The notable exception here is `I386` and `X86_64`. This
  is partially historical, and partially justified by the fact that there are
  arguably more substantial architecture and ABI differences here than for
  RISC variants.) On GNU-based systems, `cpp -dM empty_source_file.c` seems
  to generate a set of predefined macros. On some other systems, the "verbose"
  compiler option may do so, or the manual page may list them.

  2. A section that defines a small number of platform-specific macros, which
  are then used directly by the collector. For simple ports, this is where
  most of the effort is required. We describe the macros below. This section
  contains a subsection for each architecture (enclosed in a suitable `ifdef`).
  Each subsection usually contains some architecture-dependent defines,
  followed by several sets of OS-dependent defines, again enclosed in
  `ifdef`s.

  3. A section that fills in defaults for some macros left undefined in the
  preceding section, and defines some other macros that rarely need adjustment
  for new platforms. You will typically not have to touch these. If you are
  porting to an OS that was previously completely unsupported, it is likely
  that you will need to add another clause to the definition of `GET_MEM`.
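
The three-section layout can be sketched roughly as follows. This is not an excerpt from `gcconfig.h`: every `DEMO_` name is a hypothetical stand-in for the real macros (`I386`, `X86_64`, `LINUX`, `MACH_TYPE` and so on), and only two architectures and one OS are covered.

```c
#include <assert.h>

/* Section 1: translate predefined compiler macros into our own names. */
#if defined(__x86_64__) || defined(_M_X64)
# define DEMO_X86_64
#elif defined(__i386__) || defined(_M_IX86)
# define DEMO_I386
#endif
#if defined(__linux__)
# define DEMO_LINUX
#endif

/* Section 2: one subsection per architecture, with OS clauses inside. */
#ifdef DEMO_X86_64
# define DEMO_MACH_TYPE "X86_64"
# ifdef __ILP32__
#   define DEMO_CPP_WORDSZ 32 /* 32-bit ABI on a 64-bit architecture */
# else
#   define DEMO_CPP_WORDSZ 64
# endif
# ifdef DEMO_LINUX
#   define DEMO_OS_TYPE "LINUX"
# endif
#endif
#ifdef DEMO_I386
# define DEMO_MACH_TYPE "I386"
# define DEMO_CPP_WORDSZ 32
# ifdef DEMO_LINUX
#   define DEMO_OS_TYPE "LINUX"
# endif
#endif

/* Section 3: defaults for anything left undefined above. */
#ifndef DEMO_MACH_TYPE
# define DEMO_MACH_TYPE "UNKNOWN"
#endif
#ifndef DEMO_OS_TYPE
# define DEMO_OS_TYPE "UNKNOWN"
#endif
#ifndef DEMO_CPP_WORDSZ
# define DEMO_CPP_WORDSZ 32
#endif

/* Returns 1 if the configured word size matches the pointer size. */
int demo_wordsz_matches(void)
{
    assert(DEMO_CPP_WORDSZ == 32 || DEMO_CPP_WORDSZ == 64);
    return DEMO_CPP_WORDSZ == 8 * (int)sizeof(void *);
}
```

On an architecture that the sketch does not list, `demo_wordsz_matches` may well return 0. That is the kind of mismatch a new subsection in the second part of the file would be expected to fix.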

The following macros must be defined correctly for each architecture and
operating system:

  * `MACH_TYPE` - Defined to a string that represents the machine
  architecture. Usually just the macro name used to identify the architecture,
  but enclosed in quotes.
  * `OS_TYPE` - Defined to a string that represents the operating system name.
  Usually just the macro name used to identify the operating system, but
  enclosed in quotes.
  * `CPP_WORDSZ` - The word size in bits as a constant suitable for
  preprocessor tests, i.e. without casts or `sizeof` expressions. Currently
  always defined as either 64 or 32. For platforms supporting both 32- and
  64-bit ABIs, this should be conditionally defined depending on the current
  ABI. There is a default of 32.
  * `ALIGNMENT` - Defined to be the largest _N_ such that all pointers
  are guaranteed to be aligned on _N_-byte boundaries. Defining it to be _1_
  will always work, but perform poorly. For all modern 32-bit platforms, this
  is 4. For all modern 64-bit platforms, this is 8. Whether or not x86
  qualifies as a modern architecture here is compiler- and OS-dependent.
  * `DATASTART` - The beginning of the main data segment. The collector will
  trace all memory between `DATASTART` and `DATAEND` for root pointers.
  On some platforms, this can be defined to a constant address, though
  experience has shown that to be risky. Ideally the linker will define
  a symbol (e.g. `_data`) whose address is the beginning of the data segment.
  Sometimes the value can be computed using the `GC_SysVGetDataStart`
  function. Not used if either the next macro is defined, or if dynamic
  loading is supported, and the dynamic loading support defines a function
  `GC_register_main_static_data` which returns false.
  * `SEARCH_FOR_DATA_START` - If this is defined, `DATASTART` will be defined
  to a dynamically computed value which is obtained by starting with the
  address of `_end` and walking backwards until non-addressable memory
  is found. This often works on Posix-like platforms. It makes it harder
  to debug client programs, since startup involves generating and catching
  a segmentation fault, which tends to confuse users.
  * `DATAEND` - Set to the end of the main data segment. Defaults to `_end`,
  where that is declared as an array. This works in some cases, since the
  linker introduces a suitable symbol.
  * `DATASTART2`, `DATAEND2` - Some platforms have two discontiguous main data
  segments, e.g. for initialized and uninitialized data. If so, these two
  macros should be defined to the limits of the second main data segment.
  * `STACK_GROWS_UP` - Should be defined if the stack (or thread stacks) grow
  towards higher addresses. (This appears to be true only on PA-RISC. If your
  architecture has more than one stack per thread, and is not supported yet,
  you will need to do more work. Grep for "IA64" in the source for an
  example.)
  * `STACKBOTTOM` - Defined to be the cold end of the stack, which is usually
  (i.e. when the stacks grow down) the highest address in the stack. It must
  bound the region of the stack that contains pointers into the GC heap. With
  thread support, this must be the cold end of the main stack, which typically
  cannot be found in the same way as the other thread stacks. If this is not
  defined and none of the following three macros is defined, client code must
  explicitly set `GC_stackbottom` to an appropriate value before calling
  `GC_INIT` or any other `GC_` routine.
  * `LINUX_STACKBOTTOM` - May be defined instead of `STACKBOTTOM`. If defined,
  then the cold end of the stack will be determined at run time, usually
  by reading it from `/proc`.
  * `HEURISTIC1` - May be defined instead of `STACKBOTTOM`. `STACK_GRAN`
  should generally also be redefined. The cold end of the stack is determined
  by taking an address inside `GC_init`'s frame, and rounding it up to the next
  multiple of `STACK_GRAN`. This works well if the stack bottom is always
  aligned to a large power of two. (`STACK_GRAN` is predefined to 0x1000000,
  which is rarely optimal.)
  * `HEURISTIC2` - May be defined instead of `STACKBOTTOM`. The cold end
  of the stack is determined by taking an address inside `GC_init`'s frame,
  incrementing it repeatedly in small steps (decrement if `STACK_GROWS_UP`),
  and reading the value at each location. We remember the value when the first
  Segmentation violation or Bus error is signaled, round that to the nearest
  plausible page boundary, and use that as the stack bottom.
  * `DYNAMIC_LOADING` - Should be defined if `dyn_load.c` has been updated for
  this platform and tracing of dynamic library roots is supported.
  * `GWW_VDB`, `MPROTECT_VDB`, `PROC_VDB`, `SOFT_VDB` - May be defined if the
  corresponding _virtual dirty bit_ implementation in `os_dep.c` is usable on
  this platform. This allows incremental/generational garbage collection.
  (`GWW_VDB` uses the Win32 `GetWriteWatch` function to read dirty bits,
  `MPROTECT_VDB` identifies modified pages by write protecting the heap and
  catching faults. `PROC_VDB` and `SOFT_VDB` use the /proc pseudo-files to
  read dirty bits.)
  * `PREFETCH`, `GC_PREFETCH_FOR_WRITE` - The collector uses `PREFETCH(x)`
  to preload the cache with the data at address _x_. This defaults to a no-op.
  * `CLEAR_DOUBLE` - If `CLEAR_DOUBLE` is defined, then `CLEAR_DOUBLE(x)`
  is used as a fast way to clear the two words at `GC_malloc`-aligned address
  _x_. By default, word stores of 0 are used instead.
  * `HEAP_START` - May be defined as the initial address hint for mmap-based
  allocation.
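
Two of the macros above are easier to picture with a little code. The following is a minimal sketch, not the collector's implementation. `demo_round_up` shows the rounding that `HEURISTIC1` is described as performing with `STACK_GRAN`. `demo_count_candidates` shows how `ALIGNMENT` sets the stride at which a root region, such as the one between `DATASTART` and `DATAEND`, is examined for possible pointers. All `demo_` and `DEMO_` names are hypothetical, and the heap is simplified to a single address range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same value as the predefined STACK_GRAN mentioned above. */
#define DEMO_STACK_GRAN ((uintptr_t)0x1000000)

/* Round addr up to the next multiple of gran (a power of two). */
uintptr_t demo_round_up(uintptr_t addr, uintptr_t gran)
{
    assert(gran != 0 && (gran & (gran - 1)) == 0);
    return (addr + gran - 1) & ~(gran - 1);
}

/* Examine the len bytes at start, stepping by alignment bytes, and */
/* count the words whose value lies in [heap_lo, heap_hi).          */
size_t demo_count_candidates(const void *start, size_t len, size_t alignment,
                             uintptr_t heap_lo, uintptr_t heap_hi)
{
    const unsigned char *base = (const unsigned char *)start;
    size_t off, found = 0;

    assert(alignment > 0);
    for (off = 0; off + sizeof(uintptr_t) <= len; off += alignment) {
        uintptr_t w;

        memcpy(&w, base + off, sizeof w); /* valid for any alignment */
        if (w >= heap_lo && w < heap_hi)
            found++;
    }
    return found;
}
```

With `alignment` set to 1 the loop looks at every byte offset, which is why that setting always works but performs poorly.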

## Additional requirements for a basic port

In some cases, you may have to add additional platform-specific code to other
files. A likely candidate is the implementation
of `GC_with_callee_saves_pushed` in `mach_dep.c`. This ensures that register
contents that the collector must trace from are copied to the stack. Typically
this can be done portably, but on some platforms it may require assembly code,
or just tweaking of conditional compilation tests.

If your platform supports `getcontext` then defining the macro
`UNIX_LIKE` for your OS in `gcconfig.h` (if it is not defined there yet)
is likely to solve the problem. Otherwise, if you are using gcc,
`__builtin_unwind_init` will be used, and should work fine. If that is not
applicable either, the implementation will try to use `setjmp`. This will work
if your `setjmp` implementation saves all possibly pointer-valued registers
into the buffer, as opposed to trying to unwind the stack at `longjmp` time.
The `setjmp_test` test tries to determine this, but often does not get it
right. Handling register tracing in assembly code is generally best
avoided.
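
The general shape of the gcc and `setjmp` fallbacks can be sketched as below. This is a simplified illustration, not the code in `mach_dep.c`. The names `demo_with_callee_saves_pushed`, `demo_callee_fn` and `demo_record_hint` are hypothetical, and the `getcontext` path is omitted.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

typedef void (*demo_callee_fn)(void *stack_hint, void *arg);

/* Spill register contents into this frame, then call fn while the  */
/* frame is still live, so that a stack scan started inside fn also */
/* sees the spilled values.                                          */
void demo_with_callee_saves_pushed(demo_callee_fn fn, void *arg)
{
    volatile int frame_marker = 0;
#if defined(__GNUC__)
    __builtin_unwind_init(); /* forces callee-saved registers to be saved */
#else
    jmp_buf regs;            /* lives in this frame, i.e. on the stack */

    (void)setjmp(regs);      /* only useful if setjmp stores the registers */
#endif
    assert(fn != NULL);
    fn((void *)&frame_marker, arg);
    frame_marker = 1;        /* a store after the call prevents a tail call */
}

/* Example callback: records the stack address it was handed. */
void demo_record_hint(void *stack_hint, void *arg)
{
    *(void **)arg = stack_hint;
}
```

The callback receives an address inside the frame that holds the saved registers, which a scanner could use as the hot end of the region to examine.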

Most commonly `os_dep.c` will not require attention, but see below.

## Thread support

Supporting threads requires that the collector be able to find and suspend all
threads potentially accessing the garbage-collected heap, and locate any state
associated with each thread that must be traced.

The functionality needed for thread support is generally implemented in one or
more files specific to the particular thread interface. For example, somewhat
portable pthread support is implemented in `pthread_support.c` and
`pthread_stop_world.c`. The essential functionality consists of:

  * `GC_stop_world` - Stops all threads which may access the garbage collected
  heap, other than the caller;
  * `GC_start_world` - Restart other threads;
  * `GC_push_all_stacks` - Push the contents of all thread stacks (or,
  at least, of pointer-containing regions in the thread stacks) onto the mark
  stack.

These very often require that the garbage collector maintain its own data
structures to track active threads.

In addition, `LOCK` and `UNLOCK` must be implemented in `gc_locks.h`.
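
For a pthreads platform, the simplest form these two macros could take is a plain mutex. The sketch below assumes this and uses hypothetical `DEMO_LOCK` and `DEMO_UNLOCK` names. The definitions in `gc_locks.h` are more elaborate and may use spinning or `atomic_ops` primitives instead.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical allocation lock. */
static pthread_mutex_t demo_allocate_ml = PTHREAD_MUTEX_INITIALIZER;

#define DEMO_LOCK() \
    do { \
        int demo_rc_ = pthread_mutex_lock(&demo_allocate_ml); \
        assert(demo_rc_ == 0); \
        (void)demo_rc_; \
    } while (0)

#define DEMO_UNLOCK() \
    do { \
        int demo_rc_ = pthread_mutex_unlock(&demo_allocate_ml); \
        assert(demo_rc_ == 0); \
        (void)demo_rc_; \
    } while (0)

static unsigned long demo_alloc_count = 0;

/* Example of a critical section guarded by the lock. */
unsigned long demo_bump(void)
{
    unsigned long result;

    DEMO_LOCK();
    result = ++demo_alloc_count;
    DEMO_UNLOCK();
    return result;
}
```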

The easiest case is probably a new pthreads platform on which threads can be
stopped with signals. In this case, the changes involve:

  1. Introducing a suitable `GC_xxx_THREADS` macro, which should
  be automatically defined by `gc_config_macros.h` in the right cases.
  It should also result in a definition of `GC_PTHREADS`, as for the existing
  cases.
  2. Ensuring that the `atomic_ops` package at least minimally
  supports the platform. If incremental GC is needed, or if pthread locks
  do not perform adequately as the allocation lock, you will probably need
  to ensure that a sufficient `atomic_ops` port exists for the platform
  to provide an atomic test and set operation. The latest GC code can use
  GCC atomic intrinsics instead of `atomic_ops` package (see
  `include/private/gc_atomic_ops.h`).
  3. Making any needed adjustments to `pthread_stop_world.c` and
  `pthread_support.c`. Ideally none should be needed. In fact, not all of this
  is as well standardized as one would like, and outright bugs requiring
  workarounds are common. Non-preemptive threads packages will probably
  require further work. Similarly, thread-local allocation and parallel marking
  require further work in `pthread_support.c`, and may require better
  `atomic_ops` support for the target platform.
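
To make the "atomic test and set" requirement from step 2 concrete, here is a minimal spin lock built on that single operation. It uses the C11 `<stdatomic.h>` interface rather than the `atomic_ops` package, purely so that the sketch is self-contained. The `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag demo_spin = ATOMIC_FLAG_INIT;

/* Test-and-set returns the previous value of the flag, so a result */
/* of false means the flag was clear and we now hold the lock.      */
bool demo_try_lock(void)
{
    return !atomic_flag_test_and_set_explicit(&demo_spin,
                                              memory_order_acquire);
}

void demo_lock(void)
{
    while (!demo_try_lock()) {
        /* spin until the holder clears the flag */
    }
}

void demo_unlock(void)
{
    atomic_flag_clear_explicit(&demo_spin, memory_order_release);
}
```

A production lock would normally back off or yield instead of spinning indefinitely. The sketch only shows the primitive the platform has to supply.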

## Dynamic library support

So long as `DATASTART` and `DATAEND` are defined correctly, the collector will
trace memory reachable from file scope or `static` variables defined as part
of the main executable. This is sufficient if either the program is statically
linked, or if pointers to the garbage-collected heap are never stored
in non-stack variables defined in dynamic libraries.

If dynamic library data sections must also be traced, then:

  * `DYNAMIC_LOADING` must be defined in the appropriate section of
  `gcconfig.h`.
  * An appropriate version of the function `GC_register_dynamic_libraries`
  should be defined in `dyn_load.c`. This function should invoke
  `GC_cond_add_roots(region_start, region_end, TRUE)` on each dynamic
  library data section.
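
The shape of such a function can be sketched as a loop over data sections. Everything here is hypothetical: `demo_cond_add_roots` is a stand-in that only records regions in a table and does not reproduce the behavior of `GC_cond_add_roots`, and the list of sections is passed in as an array, where a real port would have to obtain it from the dynamic linker.

```c
#include <assert.h>
#include <stddef.h>

/* One writable data section of a loaded library (hypothetical type). */
struct demo_section {
    char *start;
    char *end;
};

#define DEMO_MAX_ROOTS 16
static struct demo_section demo_roots[DEMO_MAX_ROOTS];
static size_t demo_n_roots = 0;

/* Stand-in for the root registration call: records the region     */
/* unless it is empty. Returns 1 if recorded, 0 otherwise.         */
int demo_cond_add_roots(char *start, char *end)
{
    if (start >= end)
        return 0;
    assert(demo_n_roots < DEMO_MAX_ROOTS);
    demo_roots[demo_n_roots].start = start;
    demo_roots[demo_n_roots].end = end;
    demo_n_roots++;
    return 1;
}

/* Register every section in the list. Returns the number of       */
/* sections that were recorded as roots.                           */
size_t demo_register_dynamic_libraries(const struct demo_section *secs,
                                       size_t n)
{
    size_t i, added = 0;

    for (i = 0; i < n; i++)
        added += (size_t)demo_cond_add_roots(secs[i].start, secs[i].end);
    return added;
}
```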

Implementations that scan for writable data segments are error prone,
particularly in the presence of threads. They frequently result in race
conditions when threads exit and stacks disappear. They may also accidentally
trace large regions of graphics memory, or mapped files. On at least one
occasion they have been known to try to trace device memory that could not
safely be read in the manner the GC wanted to read it.

It is usually safer to walk the dynamic linker data structure, especially
if the linker exports an interface to do so. But beware of poorly documented
locking behavior in this case.

## Incremental GC support

For incremental and generational collection to work, `os_dep.c` must contain
a suitable _virtual dirty bit_ implementation, which allows the collector
to track which heap pages (assumed to be a multiple of the collector's block
size) have been written during a certain time interval. The collector provides
several implementations, which might be adapted. The default (`DEFAULT_VDB`)
is a placeholder which treats all pages as having been written. This ensures
correctness, but renders incremental and generational collection essentially
useless.
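
The interface such an implementation has to provide can be sketched with a plain array of flags. This is a toy model under stated assumptions: the heap is a fixed run of `DEMO_N_PAGES` pages, writes are reported by an explicit call to `demo_note_write` rather than by page protection or OS support, and all names are hypothetical. The last function mimics the placeholder behavior described for `DEFAULT_VDB`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_N_PAGES 64u

static unsigned char demo_dirty[DEMO_N_PAGES];    /* set as writes occur */
static unsigned char demo_snapshot[DEMO_N_PAGES]; /* read by the collector */

/* Record that the byte at heap_offset was written. */
void demo_note_write(size_t heap_offset)
{
    assert(heap_offset < (size_t)DEMO_PAGE_SIZE * DEMO_N_PAGES);
    demo_dirty[heap_offset / DEMO_PAGE_SIZE] = 1;
}

/* Take a snapshot of the dirty flags and start a new interval. */
void demo_read_dirty(void)
{
    memcpy(demo_snapshot, demo_dirty, sizeof demo_snapshot);
    memset(demo_dirty, 0, sizeof demo_dirty);
}

/* Was the page written during the interval that just ended? */
bool demo_page_was_dirty(size_t page)
{
    assert(page < DEMO_N_PAGES);
    return demo_snapshot[page] != 0;
}

/* Placeholder variant: report every page as written. */
void demo_read_dirty_default(void)
{
    memset(demo_snapshot, 1, sizeof demo_snapshot);
}
```

With the placeholder variant the collector has to rescan every page on each cycle, which is why it is correct but gives no benefit.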

## Stack traces for debug support

If stack traces in objects are needed for debug support, `GC_save_callers` and
`GC_print_callers` must be implemented.
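
On platforms whose C library provides `backtrace` in `<execinfo.h>` (glibc does), a pair of functions with this purpose could look roughly like the sketch below. The `demo_` names and the `struct demo_callers` layout are hypothetical, and the collector's own functions have different signatures.

```c
#include <assert.h>
#include <execinfo.h> /* backtrace, backtrace_symbols_fd */
#include <stddef.h>
#include <unistd.h>

#define DEMO_NFRAMES 8

/* Return addresses captured at allocation time (hypothetical layout). */
struct demo_callers {
    void *pc[DEMO_NFRAMES];
    int n; /* number of valid entries in pc */
};

/* Capture up to DEMO_NFRAMES return addresses of the current call chain. */
void demo_save_callers(struct demo_callers *out)
{
    assert(out != NULL);
    out->n = backtrace(out->pc, DEMO_NFRAMES);
}

/* Write one line per saved frame to standard error. */
void demo_print_callers(const struct demo_callers *c)
{
    assert(c != NULL);
    if (c->n > 0)
        backtrace_symbols_fd(c->pc, c->n, STDERR_FILENO);
}
```

How many frames are captured, and whether symbol names appear in the output, depends on the platform and on how the program was compiled and linked.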

## Disclaimer

This is an initial pass at porting guidelines. Some things have no doubt been
overlooked.