author | gregrodgers <Gregory.Rodgers@amd.com> | 2023-04-19 16:14:40 -0500 |
---|---|---|
committer | JP Lehr <JanPatrick.Lehr@amd.com> | 2023-05-04 06:01:14 -0400 |
commit | f238a98e844752b955dcf3d7b95b9c76c75a0017 (patch) | |
tree | 1be2db77947855a21aa4bd05c387175cadf6335d /openmp/docs | |
parent | f3dcd3ad992c82be4f652fd2aac6b0ef414566a2 (diff) | |
[OpenMP][libomptarget][AMDGPU] Enable active HSA wait state
Adds an HSA timeout hint of 2 seconds to the AMDGPU nextgen plugin to improve
the performance of small kernels.
The HSA runtime may stay in HSA_WAIT_STATE_ACTIVE for up to the timeout
value before switching to HSA_WAIT_STATE_BLOCKED. This can reduce
latency, which benefits small kernels in particular.
The value was determined experimentally using different benchmarks.
The timeout value can be overridden by setting the environment variable
LIBOMPTARGET_AMDGPU_STREAM_BUSYWAIT to a value in microseconds.
Original author: Greg Rodgers <Gregory.Rodgers@amd.com>
Contributions from: JP Lehr <JanPatrick.Lehr@amd.com>
Differential Revision: https://reviews.llvm.org/D148808
Diffstat (limited to 'openmp/docs')
-rw-r--r-- | openmp/docs/design/Runtimes.rst | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst
index 98f47bc1c632..1402192581d3 100644
--- a/openmp/docs/design/Runtimes.rst
+++ b/openmp/docs/design/Runtimes.rst
@@ -1160,6 +1160,7 @@ There are several environment variables to change the behavior of the plugins:
 * ``LIBOMPTARGET_AMDGPU_TEAMS_PER_CU``
 * ``LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES``
 * ``LIBOMPTARGET_AMDGPU_NUM_INITIAL_HSA_SIGNALS``
+* ``LIBOMPTARGET_AMDGPU_STREAM_BUSYWAIT``
 
 The environment variables ``LIBOMPTARGET_SHARED_MEMORY_SIZE``,
 ``LIBOMPTARGET_STACK_SIZE`` and ``LIBOMPTARGET_HEAP_SIZE`` are described in
@@ -1238,6 +1239,14 @@ managing several pre-created signals. These signals are mainly used by AMDGPU
 streams. More HSA signals will be created dynamically throughout the execution
 if needed. The default value is ``64``.
 
+LIBOMPTARGET_AMDGPU_STREAM_BUSYWAIT
+"""""""""""""""""""""""""""""""""""
+
+This environment variable controls the timeout hint in microseconds for the
+HSA wait state within the AMDGPU plugin. For the duration of this value
+the HSA runtime may busy wait. This can reduce overall latency.
+The default value is ``2000000``.
+
 .. _remote_offloading_plugin:
 
 Remote Offloading Plugin:
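
Because the variable is given in microseconds while the commit message speaks of a 2 second hint, a unit slip is easy to make. The following is a minimal sketch of a launcher-side helper that prepares an environment with the hint set. Only the variable name and the default of ``2000000`` come from the patch; the helper names, the rejection of negative values, and the ``./offload_app`` binary in the usage note are assumptions made for illustration.

```python
import os

# Variable name and default taken from the documentation added by this patch.
BUSYWAIT_VAR = "LIBOMPTARGET_AMDGPU_STREAM_BUSYWAIT"
DEFAULT_BUSYWAIT_US = 2_000_000  # 2 seconds, expressed in microseconds


def seconds_to_us(seconds):
    """Convert seconds to whole microseconds (hypothetical helper)."""
    return int(round(seconds * 1_000_000))


def busywait_env(timeout_us, base_env=None):
    """Return a copy of the environment with the busy-wait hint set.

    Rejecting negative values is an assumption of this sketch, not a
    documented rule of the plugin.
    """
    timeout_us = int(timeout_us)
    if timeout_us < 0:
        raise ValueError("timeout must be non-negative")
    env = dict(os.environ if base_env is None else base_env)
    env[BUSYWAIT_VAR] = str(timeout_us)
    return env
```

A launcher could then pass the result to ``subprocess.run(["./offload_app"], env=busywait_env(seconds_to_us(0.5)))``, where ``./offload_app`` stands in for any OpenMP offloading binary. Whether a shorter or longer hint helps is workload dependent, so the value is best chosen by measurement.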