From 966a86268d441ec00e098057bc8cecbd7d4da30b Mon Sep 17 00:00:00 2001 From: Ivan Maidanski Date: Tue, 16 May 2017 15:15:55 +0300 Subject: Rename doc/README.txt to doc/README_details.txt This is to differentiate from README.md when the documents are installed. * doc/Makefile.am (dist_doc_DATA): Rename README.txt item to README_details.txt. * doc/README.txt: Rename to README_details.txt. --- doc/Makefile.am | 2 +- doc/README.txt | 249 ------------------------------------------------- doc/README_details.txt | 249 +++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 250 insertions(+), 250 deletions(-) delete mode 100644 doc/README.txt create mode 100644 doc/README_details.txt diff --git a/doc/Makefile.am b/doc/Makefile.am index b4a1bb6..66af2bd 100644 --- a/doc/Makefile.am +++ b/doc/Makefile.am @@ -1,4 +1,4 @@ # installed documentation # -dist_doc_DATA = LICENSING.txt README.txt README_stack.txt \ +dist_doc_DATA = LICENSING.txt README_details.txt README_stack.txt \ README_malloc.txt README_win32.txt diff --git a/doc/README.txt b/doc/README.txt deleted file mode 100644 index 2b96cd2..0000000 --- a/doc/README.txt +++ /dev/null @@ -1,249 +0,0 @@ -Usage: - -0) If possible, do this on a multiprocessor, especially if you are planning -on modifying or enhancing the package. It will work on a uniprocessor, -but the tests are much more likely to pass in the presence of serious problems. - -1) Type ./configure --prefix=; make; make check -in the directory containing unpacked source. The usual GNU build machinery -is used, except that only static, but position-independent, libraries -are normally built. On Windows, read README_win32.txt instead. - -2) Applications should include atomic_ops.h. Nearly all operations -are implemented by header files included from it. It is sometimes -necessary, and always recommended to also link against libatomic_ops.a. -To use the almost non-blocking stack or malloc implementations, -see the corresponding README files, and also link against libatomic_gpl.a -before linking against libatomic_ops.a. - -OVERVIEW: -Atomic_ops.h defines a large collection of operations, each one of which is -a combination of an (optional) atomic memory operation, and a memory barrier. -Also defines associated feature-test macros to determine whether a particular -operation is available on the current target hardware (either directly or -by synthesis). This is an attempt to replace various existing files with -similar goals, since they usually do not handle differences in memory -barrier styles with sufficient generality. - -If this is included after defining AO_REQUIRE_CAS, then the package makes -an attempt to emulate AO_compare_and_swap* (single-width) in a way that (at -least on Linux) should still be async-signal-safe. As a result, most other -atomic operations will then be defined using the compare-and-swap -emulation. This emulation is slow, since it needs to disable signals. -And it needs to block in case of contention. If you care about performance -on a platform that can't directly provide compare-and-swap, there are -probably better alternatives. But this allows easy ports to some such -platforms (e.g. PA_RISC). The option is ignored if compare-and-swap -can be implemented directly. - -If atomic_ops.h is included after defining AO_USE_PTHREAD_DEFS, then all -atomic operations will be emulated with pthread locking. This is NOT -async-signal-safe. And it is slow. It is intended primarily for debugging -of the atomic_ops package itself. - -Note that the implementation reflects our understanding of real processor -behavior. This occasionally diverges from the documented behavior. (E.g. -the documented X86 behavior seems to be weak enough that it is impractical -to use. Current real implementations appear to be much better behaved.) -We of course are in no position to guarantee that future processors -(even HPs) will continue to behave this way, though we hope they will. - -This is a work in progress. Corrections/additions for other platforms are -greatly appreciated. It passes rudimentary tests on X86, Itanium, and -Alpha. - -OPERATIONS: - -Most operations operate on values of type AO_t, which are unsigned integers -whose size matches that of pointers on the given architecture. Exceptions -are: - -- AO_test_and_set operates on AO_TS_t, which is whatever size the hardware -supports with good performance. In some cases this is the length of a cache -line. In some cases it is a byte. In many cases it is equivalent to AO_t. - -- A few operations are implemented on smaller or larger size integers. -Such operations are indicated by the appropriate prefix: - -AO_char_... Operates on unsigned char values. -AO_short_... Operates on unsigned short values. -AO_int_... Operates on unsigned int values. - -(Currently a very limited selection of these is implemented. We're -working on it.) - -The defined operations are all of the form AO_[_](). - -The component specifies an atomic memory operation. It may be -one of the following, where the corresponding argument and result types -are also specified: - -void nop() - No atomic operation. The barrier may still be useful. -AO_t load(const volatile AO_t * addr) - Atomic load of *addr. -void store(volatile AO_t * addr, AO_t new_val) - Atomically store new_val to *addr. -AO_t fetch_and_add(volatile AO_t *addr, AO_t incr) - Atomically add incr to *addr, and return the original value of *addr. -AO_t fetch_and_add1(volatile AO_t *addr) - Equivalent to AO_fetch_and_add(addr, 1). -AO_t fetch_and_sub1(volatile AO_t *addr) - Equivalent to AO_fetch_and_add(addr, (AO_t)(-1)). -void and(volatile AO_t *addr, AO_t value) - Atomically 'and' value into *addr. -void or(volatile AO_t *addr, AO_t value) - Atomically 'or' value into *addr. -void xor(volatile AO_t *addr, AO_t value) - Atomically 'xor' value into *addr. -int compare_and_swap(volatile AO_t * addr, AO_t old_val, AO_t new_val) - Atomically compare *addr to old_val, and replace *addr by new_val - if the first comparison succeeds. Returns nonzero if the comparison - succeeded and *addr was updated. -AO_t fetch_compare_and_swap(volatile AO_t * addr, AO_t old_val, AO_t new_val) - Atomically compare *addr to old_val, and replace *addr by new_val - if the first comparison succeeds; returns the original value of *addr. -AO_TS_VAL_t test_and_set(volatile AO_TS_t * addr) - Atomically read the binary value at *addr, and set it. AO_TS_VAL_t - is an enumeration type which includes two values AO_TS_SET and - AO_TS_CLEAR. An AO_TS_t location is capable of holding an - AO_TS_VAL_t, but may be much larger, as dictated by hardware - constraints. Test_and_set logically sets the value to AO_TS_SET. - It may be reset to AO_TS_CLEAR with the AO_CLEAR(AO_TS_t *) macro. - AO_TS_t locations should be initialized to AO_TS_INITIALIZER. - The values of AO_TS_SET and AO_TS_CLEAR are hardware dependent. - (On PA-RISC, AO_TS_SET is zero!) - -Test_and_set is a more limited version of compare_and_swap. Its only -advantage is that it is more easily implementable on some hardware. It -should thus be used if only binary test-and-set functionality is needed. - -If available, we also provide compare_and_swap operations that operate -on wider values. Since standard data types for double width values -may not be available, these explicitly take pairs of arguments for the -new and/or old value. Unfortunately, there are two common variants, -neither of which can easily and efficiently emulate the other. -The first performs a comparison against the entire value being replaced, -where the second replaces a double-width replacement, but performs -a single-width comparison: - -int compare_double_and_swap_double(volatile AO_double_t * addr, - AO_t old_val1, AO_t old_val2, - AO_t new_val1, AO_t new_val2); - -int compare_and_swap_double(volatile AO_double_t * addr, - AO_t old_val1, - AO_t new_val1, AO_t new_val2); - -where AO_double_t is a structure containing AO_val1 and AO_val2 fields, -both of type AO_t. For compare_and_swap_double, we compare against -the val1 field. AO_double_t exists only if AO_HAVE_double_t -is defined. - -Please note that AO_double_t (and AO_stack_t) variables should be properly -aligned (8-byte alignment on 32-bit targets, 16-byte alignment on 64-bit ones) -otherwise the behavior of a double-wide atomic primitive might be undefined -(or an assertion violation might occur) if such a misaligned variable is -passed (as a reference) to the primitive. Global and static variables should -already have proper alignment automatically but automatic variables (i.e. -located on the stack) might be misaligned because the stack might be -word-aligned (e.g. 4-byte stack alignment is the default one for x86). -Luckily, stack-allocated AO variables is a rare case in practice. - -ORDERING CONSTRAINTS: - -Each operation name also includes a suffix that specifies the associated -ordering semantics. The ordering constraint limits reordering of this -operation with respect to other atomic operations and ordinary memory -references. The current implementation assumes that all memory references -are to ordinary cacheable memory; the ordering guarantee is with respect -to other threads or processes, not I/O devices. (Whether or not this -distinction is important is platform-dependent.) - -Ordering suffixes are one of the following: - -: No memory barrier. A plain AO_nop() really does nothing. -_release: Earlier operations must become visible to other threads - before the atomic operation. -_acquire: Later operations must become visible after this operation. -_read: Subsequent reads must become visible after reads included in - the atomic operation or preceding it. Rarely useful for clients? -_write: Earlier writes become visible before writes during or after - the atomic operation. Rarely useful for clients? -_full: Ordered with respect to both earlier and later memory ops. - AO_store_full or AO_nop_full are the normal ways to force a store - to be ordered with respect to a later load. -_release_write: Ordered with respect to earlier writes. This is - normally implemented as either a _write or _release - barrier. -_acquire_read: Ordered with respect to later reads. This is - normally implemented as either a _read or _acquire barrier. -_dd_acquire_read: Ordered with respect to later reads that are data - dependent on this one. This is needed on - a pointer read, which is later dereferenced to read a - second value, with the expectation that the second - read is ordered after the first one. On most architectures, - this is equivalent to no barrier. (This is very - hard to define precisely. It should probably be avoided. - A major problem is that optimizers tend to try to - eliminate dependencies from the generated code, since - dependencies force the hardware to execute the code - serially.) - -We assume that if a store is data-dependent on a previous load, then -the two are always implicitly ordered. - -It is possible to test whether AO_ is available on the -current platform by checking whether AO_HAVE__ is defined -as a macro. - -Note that we generally don't implement operations that are either -meaningless (e.g. AO_nop_acquire, AO_nop_release) or which appear to -have no clear use (e.g. AO_load_release, AO_store_acquire, AO_load_write, -AO_store_read). On some platforms (e.g. PA-RISC) many operations -will remain undefined unless AO_REQUIRE_CAS is defined before including -the package. - -When typed in the package build directory, the following command -will print operations that are unimplemented on the platform: - -make test_atomic; ./test_atomic - -The following command generates a file "list_atomic.i" containing the -macro expansions of all implemented operations on the platform: - -make list_atomic.i - -Known issues include: - -We should be more precise in defining the semantics of the ordering -constraints, and if and how we can guarantee sequential consistency. - -Dd_acquire_read is very hard or impossible to define in a way that cannot -be invalidated by reasonably standard compiler transformations. - -There is probably no good reason to provide operations on standard -integer types, since those may have the wrong alignment constraints. - - -Example: - -If you want to initialize an object, and then "publish" a pointer to it -in a global location p, such that other threads reading the new value of -p are guaranteed to see an initialized object, it suffices to use -AO_release_write(p, ...) to write the pointer to the object, and to -retrieve it in other threads with AO_acquire_read(p). - -Platform notes: - -All X86: We quietly assume 486 or better. - -Microsoft compilers: -Define AO_ASSUME_WINDOWS98 to get access to hardware compare-and-swap -functionality. This relies on the InterlockedCompareExchange() function -which was apparently not supported in Windows95. (There may be a better -way to get access to this.) - -Gcc on x86: -Define AO_USE_PENTIUM4_INSTRS to use the Pentium 4 mfence instruction. -Currently this is appears to be of marginal benefit. diff --git a/doc/README_details.txt b/doc/README_details.txt new file mode 100644 index 0000000..2b96cd2 --- /dev/null +++ b/doc/README_details.txt @@ -0,0 +1,249 @@ +Usage: + +0) If possible, do this on a multiprocessor, especially if you are planning +on modifying or enhancing the package. It will work on a uniprocessor, +but the tests are much more likely to pass in the presence of serious problems. + +1) Type ./configure --prefix=; make; make check +in the directory containing unpacked source. The usual GNU build machinery +is used, except that only static, but position-independent, libraries +are normally built. On Windows, read README_win32.txt instead. + +2) Applications should include atomic_ops.h. Nearly all operations +are implemented by header files included from it. It is sometimes +necessary, and always recommended to also link against libatomic_ops.a. +To use the almost non-blocking stack or malloc implementations, +see the corresponding README files, and also link against libatomic_gpl.a +before linking against libatomic_ops.a. + +OVERVIEW: +Atomic_ops.h defines a large collection of operations, each one of which is +a combination of an (optional) atomic memory operation, and a memory barrier. +Also defines associated feature-test macros to determine whether a particular +operation is available on the current target hardware (either directly or +by synthesis). This is an attempt to replace various existing files with +similar goals, since they usually do not handle differences in memory +barrier styles with sufficient generality. + +If this is included after defining AO_REQUIRE_CAS, then the package makes +an attempt to emulate AO_compare_and_swap* (single-width) in a way that (at +least on Linux) should still be async-signal-safe. As a result, most other +atomic operations will then be defined using the compare-and-swap +emulation. This emulation is slow, since it needs to disable signals. +And it needs to block in case of contention. If you care about performance +on a platform that can't directly provide compare-and-swap, there are +probably better alternatives. But this allows easy ports to some such +platforms (e.g. PA_RISC). The option is ignored if compare-and-swap +can be implemented directly. + +If atomic_ops.h is included after defining AO_USE_PTHREAD_DEFS, then all +atomic operations will be emulated with pthread locking. This is NOT +async-signal-safe. And it is slow. It is intended primarily for debugging +of the atomic_ops package itself. + +Note that the implementation reflects our understanding of real processor +behavior. This occasionally diverges from the documented behavior. (E.g. +the documented X86 behavior seems to be weak enough that it is impractical +to use. Current real implementations appear to be much better behaved.) +We of course are in no position to guarantee that future processors +(even HPs) will continue to behave this way, though we hope they will. + +This is a work in progress. Corrections/additions for other platforms are +greatly appreciated. It passes rudimentary tests on X86, Itanium, and +Alpha. + +OPERATIONS: + +Most operations operate on values of type AO_t, which are unsigned integers +whose size matches that of pointers on the given architecture. Exceptions +are: + +- AO_test_and_set operates on AO_TS_t, which is whatever size the hardware +supports with good performance. In some cases this is the length of a cache +line. In some cases it is a byte. In many cases it is equivalent to AO_t. + +- A few operations are implemented on smaller or larger size integers. +Such operations are indicated by the appropriate prefix: + +AO_char_... Operates on unsigned char values. +AO_short_... Operates on unsigned short values. +AO_int_... Operates on unsigned int values. + +(Currently a very limited selection of these is implemented. We're +working on it.) + +The defined operations are all of the form AO_[_](). + +The component specifies an atomic memory operation. It may be +one of the following, where the corresponding argument and result types +are also specified: + +void nop() + No atomic operation. The barrier may still be useful. +AO_t load(const volatile AO_t * addr) + Atomic load of *addr. +void store(volatile AO_t * addr, AO_t new_val) + Atomically store new_val to *addr. +AO_t fetch_and_add(volatile AO_t *addr, AO_t incr) + Atomically add incr to *addr, and return the original value of *addr. +AO_t fetch_and_add1(volatile AO_t *addr) + Equivalent to AO_fetch_and_add(addr, 1). +AO_t fetch_and_sub1(volatile AO_t *addr) + Equivalent to AO_fetch_and_add(addr, (AO_t)(-1)). +void and(volatile AO_t *addr, AO_t value) + Atomically 'and' value into *addr. +void or(volatile AO_t *addr, AO_t value) + Atomically 'or' value into *addr. +void xor(volatile AO_t *addr, AO_t value) + Atomically 'xor' value into *addr. +int compare_and_swap(volatile AO_t * addr, AO_t old_val, AO_t new_val) + Atomically compare *addr to old_val, and replace *addr by new_val + if the first comparison succeeds. Returns nonzero if the comparison + succeeded and *addr was updated. +AO_t fetch_compare_and_swap(volatile AO_t * addr, AO_t old_val, AO_t new_val) + Atomically compare *addr to old_val, and replace *addr by new_val + if the first comparison succeeds; returns the original value of *addr. +AO_TS_VAL_t test_and_set(volatile AO_TS_t * addr) + Atomically read the binary value at *addr, and set it. AO_TS_VAL_t + is an enumeration type which includes two values AO_TS_SET and + AO_TS_CLEAR. An AO_TS_t location is capable of holding an + AO_TS_VAL_t, but may be much larger, as dictated by hardware + constraints. Test_and_set logically sets the value to AO_TS_SET. + It may be reset to AO_TS_CLEAR with the AO_CLEAR(AO_TS_t *) macro. + AO_TS_t locations should be initialized to AO_TS_INITIALIZER. + The values of AO_TS_SET and AO_TS_CLEAR are hardware dependent. + (On PA-RISC, AO_TS_SET is zero!) + +Test_and_set is a more limited version of compare_and_swap. Its only +advantage is that it is more easily implementable on some hardware. It +should thus be used if only binary test-and-set functionality is needed. + +If available, we also provide compare_and_swap operations that operate +on wider values. Since standard data types for double width values +may not be available, these explicitly take pairs of arguments for the +new and/or old value. Unfortunately, there are two common variants, +neither of which can easily and efficiently emulate the other. +The first performs a comparison against the entire value being replaced, +where the second replaces a double-width replacement, but performs +a single-width comparison: + +int compare_double_and_swap_double(volatile AO_double_t * addr, + AO_t old_val1, AO_t old_val2, + AO_t new_val1, AO_t new_val2); + +int compare_and_swap_double(volatile AO_double_t * addr, + AO_t old_val1, + AO_t new_val1, AO_t new_val2); + +where AO_double_t is a structure containing AO_val1 and AO_val2 fields, +both of type AO_t. For compare_and_swap_double, we compare against +the val1 field. AO_double_t exists only if AO_HAVE_double_t +is defined. + +Please note that AO_double_t (and AO_stack_t) variables should be properly +aligned (8-byte alignment on 32-bit targets, 16-byte alignment on 64-bit ones) +otherwise the behavior of a double-wide atomic primitive might be undefined +(or an assertion violation might occur) if such a misaligned variable is +passed (as a reference) to the primitive. Global and static variables should +already have proper alignment automatically but automatic variables (i.e. +located on the stack) might be misaligned because the stack might be +word-aligned (e.g. 4-byte stack alignment is the default one for x86). +Luckily, stack-allocated AO variables is a rare case in practice. + +ORDERING CONSTRAINTS: + +Each operation name also includes a suffix that specifies the associated +ordering semantics. The ordering constraint limits reordering of this +operation with respect to other atomic operations and ordinary memory +references. The current implementation assumes that all memory references +are to ordinary cacheable memory; the ordering guarantee is with respect +to other threads or processes, not I/O devices. (Whether or not this +distinction is important is platform-dependent.) + +Ordering suffixes are one of the following: + +: No memory barrier. A plain AO_nop() really does nothing. +_release: Earlier operations must become visible to other threads + before the atomic operation. +_acquire: Later operations must become visible after this operation. +_read: Subsequent reads must become visible after reads included in + the atomic operation or preceding it. Rarely useful for clients? +_write: Earlier writes become visible before writes during or after + the atomic operation. Rarely useful for clients? +_full: Ordered with respect to both earlier and later memory ops. + AO_store_full or AO_nop_full are the normal ways to force a store + to be ordered with respect to a later load. +_release_write: Ordered with respect to earlier writes. This is + normally implemented as either a _write or _release + barrier. +_acquire_read: Ordered with respect to later reads. This is + normally implemented as either a _read or _acquire barrier. +_dd_acquire_read: Ordered with respect to later reads that are data + dependent on this one. This is needed on + a pointer read, which is later dereferenced to read a + second value, with the expectation that the second + read is ordered after the first one. On most architectures, + this is equivalent to no barrier. (This is very + hard to define precisely. It should probably be avoided. + A major problem is that optimizers tend to try to + eliminate dependencies from the generated code, since + dependencies force the hardware to execute the code + serially.) + +We assume that if a store is data-dependent on a previous load, then +the two are always implicitly ordered. + +It is possible to test whether AO_ is available on the +current platform by checking whether AO_HAVE__ is defined +as a macro. + +Note that we generally don't implement operations that are either +meaningless (e.g. AO_nop_acquire, AO_nop_release) or which appear to +have no clear use (e.g. AO_load_release, AO_store_acquire, AO_load_write, +AO_store_read). On some platforms (e.g. PA-RISC) many operations +will remain undefined unless AO_REQUIRE_CAS is defined before including +the package. + +When typed in the package build directory, the following command +will print operations that are unimplemented on the platform: + +make test_atomic; ./test_atomic + +The following command generates a file "list_atomic.i" containing the +macro expansions of all implemented operations on the platform: + +make list_atomic.i + +Known issues include: + +We should be more precise in defining the semantics of the ordering +constraints, and if and how we can guarantee sequential consistency. + +Dd_acquire_read is very hard or impossible to define in a way that cannot +be invalidated by reasonably standard compiler transformations. + +There is probably no good reason to provide operations on standard +integer types, since those may have the wrong alignment constraints. + + +Example: + +If you want to initialize an object, and then "publish" a pointer to it +in a global location p, such that other threads reading the new value of +p are guaranteed to see an initialized object, it suffices to use +AO_release_write(p, ...) to write the pointer to the object, and to +retrieve it in other threads with AO_acquire_read(p). + +Platform notes: + +All X86: We quietly assume 486 or better. + +Microsoft compilers: +Define AO_ASSUME_WINDOWS98 to get access to hardware compare-and-swap +functionality. This relies on the InterlockedCompareExchange() function +which was apparently not supported in Windows95. (There may be a better +way to get access to this.) + +Gcc on x86: +Define AO_USE_PENTIUM4_INSTRS to use the Pentium 4 mfence instruction. +Currently this is appears to be of marginal benefit. -- cgit v1.2.1