author Tim Renouf <tim.renouf@amd.com> 2021-03-30 08:33:07 +0100
committer Tim Renouf <tim.renouf@amd.com> 2021-03-30 08:33:18 +0100
commit 083b0f1b40fbb2f73eded4783654ff003b9c6603
tree a5a81aa81e1fdb442524cced0727afd7b2f875d8
parent c352a2b8290b0a088ac3442aca89380248f02381
[AMDGPU] Update AMDGPU PAL usage documentation
Change-Id: I65f3edcfe5063551cad5aab0da1374c3a6ccd3a2
-rw-r--r-- llvm/docs/AMDGPUUsage.rst 480
1 files changed, 362 insertions, 118 deletions
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 51fd90e058ab..cbce156510ad 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -10959,140 +10959,384 @@ AMDPAL
------
This section provides code conventions used when the target triple OS is
-``amdpal`` (see :ref:`amdgpu-target-triples`) for passing runtime parameters
-from the application/runtime to each invocation of a hardware shader. These
-parameters include both generic, application-controlled parameters called
-*user data* as well as system-generated parameters that are a product of the
-draw or dispatch execution.
+``amdpal`` (see :ref:`amdgpu-target-triples`).
-User Data
-~~~~~~~~~
+.. _amdgpu-amdpal-code-object-metadata-section:
-Each hardware stage has a set of 32-bit *user data registers* which can be
-written from a command buffer and then loaded into SGPRs when waves are launched
-via a subsequent dispatch or draw operation. This is the way most arguments are
-passed from the application/runtime to a hardware shader.
+Code Object Metadata
+~~~~~~~~~~~~~~~~~~~~
-Compute User Data
-~~~~~~~~~~~~~~~~~
+.. note::
-Compute shader user data mappings are simpler than graphics shaders and have a
-fixed mapping.
+ The metadata is currently in development and is subject to major
+ changes. Only the current version is supported. *When this document
+ was generated the version was 2.6.*
-Note that there are always 10 available *user data entries* in registers -
-entries beyond that limit must be fetched from memory (via the spill table
-pointer) by the shader.
+Code object metadata is specified by the ``NT_AMDGPU_METADATA`` note
+record (see :ref:`amdgpu-note-records-v3-v4`).
- .. table:: PAL Compute Shader User Data Registers
- :name: pal-compute-user-data-registers
+The metadata is represented as Message Pack formatted binary data (see
+[MsgPack]_). The top level is a Message Pack map that includes the keys
+defined in table :ref:`amdgpu-amdpal-code-object-metadata-map-table`
+and referenced tables.
- ============= ================================
- User Register Description
- ============= ================================
- 0 Global Internal Table (32-bit pointer)
- 1 Per-Shader Internal Table (32-bit pointer)
- 2 - 11 Application-Controlled User Data (10 32-bit values)
- 12 Spill Table (32-bit pointer)
- 13 - 14 Thread Group Count (64-bit pointer)
- 15 GDS Range
- ============= ================================
+Additional information can be added to the maps. To avoid conflicts, any
+key names should be prefixed by "*vendor-name*." where ``vendor-name``
+can be the name of the vendor and specific vendor tool that generates the
+information. The prefix is abbreviated to simply "." when it appears
+within a map that has been added by the same *vendor-name*.
-Graphics User Data
-~~~~~~~~~~~~~~~~~~
+ .. table:: AMDPAL Code Object Metadata Map
+ :name: amdgpu-amdpal-code-object-metadata-map-table
-Graphics pipelines support a much more flexible user data mapping:
-
- .. table:: PAL Graphics Shader User Data Registers
- :name: pal-graphics-user-data-registers
-
- ============= ================================
- User Register Description
- ============= ================================
- 0 Global Internal Table (32-bit pointer)
- + Per-Shader Internal Table (32-bit pointer)
- + 1-15 Application Controlled User Data
- (1-15 Contiguous 32-bit Values in Registers)
- + Spill Table (32-bit pointer)
- + Draw Index (First Stage Only)
- + Vertex Offset (First Stage Only)
- + Instance Offset (First Stage Only)
- ============= ================================
-
- The placement of the global internal table remains fixed in the first *user
- data SGPR register*. Otherwise all parameters are optional, and can be mapped
- to any desired *user data SGPR register*, with the following restrictions:
-
- * Draw Index, Vertex Offset, and Instance Offset can only be used by the first
- active hardware stage in a graphics pipeline (i.e. where the API vertex
- shader runs).
-
- * Application-controlled user data must be mapped into a contiguous range of
- user data registers.
-
- * The application-controlled user data range supports compaction remapping, so
- only *entries* that are actually consumed by the shader must be assigned to
- corresponding *registers*. Note that in order to support an efficient runtime
- implementation, the remapping must pack *registers* in the same order as
- *entries*, with unused *entries* removed.
-
-.. _pal_global_internal_table:
-
-Global Internal Table
-~~~~~~~~~~~~~~~~~~~~~
-
-The global internal table is a table of *shader resource descriptors* (SRDs)
-that define how certain engine-wide, runtime-managed resources should be
-accessed from a shader. The majority of these resources have HW-defined formats,
-and it is up to the compiler to write/read data as required by the target
-hardware.
-
-The following table illustrates the required format:
-
- .. table:: PAL Global Internal Table
- :name: pal-git-table
-
- ============= ================================
- Offset Description
- ============= ================================
- 0-3 Graphics Scratch SRD
- 4-7 Compute Scratch SRD
- 8-11 ES/GS Ring Output SRD
- 12-15 ES/GS Ring Input SRD
- 16-19 GS/VS Ring Output #0
- 20-23 GS/VS Ring Output #1
- 24-27 GS/VS Ring Output #2
- 28-31 GS/VS Ring Output #3
- 32-35 GS/VS Ring Input SRD
- 36-39 Tessellation Factor Buffer SRD
- 40-43 Off-Chip LDS Buffer SRD
- 44-47 Off-Chip Param Cache Buffer SRD
- 48-51 Sample Position Buffer SRD
- 52 vaRange::ShadowDescriptorTable High Bits
- ============= ================================
-
- The pointer to the global internal table passed to the shader as user data
- is a 32-bit pointer. The top 32 bits should be assumed to be the same as
- the top 32 bits of the pipeline, so the shader may use the program
- counter's top 32 bits.
-
-.. _pal_call-convention:
+ =================== ============== ========= ======================================================================
+ String Key Value Type Required? Description
+ =================== ============== ========= ======================================================================
+ "amdpal.version" sequence of Required PAL code object metadata (major, minor) version. The current values
+ 2 integers are defined by *Util::Abi::PipelineMetadata(Major|Minor)Version*.
+ "amdpal.pipelines" sequence of Required Per-pipeline metadata. See
+ map :ref:`amdgpu-amdpal-code-object-pipeline-metadata-map-table` for the
+ definition of the keys included in that map.
+ =================== ============== ========= ======================================================================
-Call Convention
-~~~~~~~~~~~~~~~
+..
+
+ .. table:: AMDPAL Code Object Pipeline Metadata Map
+ :name: amdgpu-amdpal-code-object-pipeline-metadata-map-table
+
+ ====================================== ============== ========= ===================================================
+ String Key Value Type Required? Description
+ ====================================== ============== ========= ===================================================
+ ".name" string Source name of the pipeline.
+ ".type" string Pipeline type, e.g. VsPs. Values include:
+
+ - "VsPs"
+ - "Gs"
+ - "Cs"
+ - "Ngg"
+ - "Tess"
+ - "GsTess"
+ - "NggTess"
+
+ ".internal_pipeline_hash" sequence of Required Internal compiler hash for this pipeline. Lower
+ 2 integers 64 bits is the "stable" portion of the hash, used
+ for e.g. shader replacement lookup. Upper 64 bits
+ is the "unique" portion of the hash, used for
+ e.g. pipeline cache lookup. The value is
+ implementation defined, and cannot be relied on
+ between different builds of the compiler.
+ ".shaders" map Per-API shader metadata. See
+ :ref:`amdgpu-amdpal-code-object-shader-map-table`
+ for the definition of the keys included in that
+ map.
+ ".hardware_stages" map Per-hardware stage metadata. See
+ :ref:`amdgpu-amdpal-code-object-hardware-stage-map-table`
+ for the definition of the keys included in that
+ map.
+ ".shader_functions" map Per-shader function metadata. See
+ :ref:`amdgpu-amdpal-code-object-shader-function-map-table`
+ for the definition of the keys included in that
+ map.
+ ".registers" map Required Hardware register configuration. See
+ :ref:`amdgpu-amdpal-code-object-register-map-table`
+ for the definition of the keys included in that
+ map.
+ ".user_data_limit" integer Number of user data entries accessed by this
+ pipeline.
+ ".spill_threshold" integer The user data spill threshold. 0xFFFF for
+ NoUserDataSpilling.
+ ".uses_viewport_array_index" boolean Indicates whether or not the pipeline uses the
+ viewport array index feature. Pipelines which use
+ this feature can render into all 16 viewports,
+ whereas pipelines which do not use it are
+ restricted to viewport #0.
+ ".es_gs_lds_size" integer Size in bytes of LDS space used internally for
+ handling data-passing between the ES and GS
+ shader stages. This can be zero if the data is
+ passed using off-chip buffers. This value should
+ be used to program all user-SGPRs which have been
+ marked with "UserDataMapping::EsGsLdsSize"
+ (typically only the GS and VS HW stages will ever
+ have a user-SGPR so marked).
+ ".nggSubgroupSize" integer Explicit maximum subgroup size for NGG shaders
+ (maximum number of threads in a subgroup).
+ ".num_interpolants" integer Graphics only. Number of PS interpolants.
+ ".mesh_scratch_memory_size" integer Max mesh shader scratch memory used.
+ ".api" string Name of the client graphics API.
+ ".api_create_info" binary Graphics API shader create info binary blob. Can
+ be defined by the driver using the compiler if
+ they want to be able to correlate API-specific
+ information used during creation at a later time.
+ ====================================== ============== ========= ===================================================
+
+..
+
+  .. table:: AMDPAL Code Object Shader Map
+     :name: amdgpu-amdpal-code-object-shader-map-table
+
+
+     +-------------+--------------+-------------------------------------------------------------------+
+     |String Key   |Value Type    |Description                                                        |
+     +=============+==============+===================================================================+
+     |- ".compute" |map           |See :ref:`amdgpu-amdpal-code-object-api-shader-metadata-map-table` |
+     |- ".vertex"  |              |for the definition of the keys included in that map.               |
+     |- ".hull"    |              |                                                                   |
+     |- ".domain"  |              |                                                                   |
+     |- ".geometry"|              |                                                                   |
+     |- ".pixel"   |              |                                                                   |
+     +-------------+--------------+-------------------------------------------------------------------+
+
+..
+
+ .. table:: AMDPAL Code Object API Shader Metadata Map
+ :name: amdgpu-amdpal-code-object-api-shader-metadata-map-table
+
+ ==================== ============== ========= =====================================================================
+ String Key Value Type Required? Description
+ ==================== ============== ========= =====================================================================
+ ".api_shader_hash" sequence of Required Input shader hash, typically passed in from the client. The value
+ 2 integers is implementation defined, and cannot be relied on between
+ different builds of the compiler.
+ ".hardware_mapping" sequence of Required Flags indicating the HW stages this API shader maps to. Values
+ string include:
+
+ - ".ls"
+ - ".hs"
+ - ".es"
+ - ".gs"
+ - ".vs"
+ - ".ps"
+ - ".cs"
-For graphics use cases, the calling convention is `amdgpu_gfx`.
+ ==================== ============== ========= =====================================================================
+
+..
+
+  .. table:: AMDPAL Code Object Hardware Stage Map
+     :name: amdgpu-amdpal-code-object-hardware-stage-map-table
+
+     +-------------+--------------+-----------------------------------------------------------------------+
+     |String Key   |Value Type    |Description                                                            |
+     +=============+==============+=======================================================================+
+     |- ".ls"      |map           |See :ref:`amdgpu-amdpal-code-object-hardware-stage-metadata-map-table` |
+     |- ".hs"      |              |for the definition of the keys included in that map.                   |
+     |- ".es"      |              |                                                                       |
+     |- ".gs"      |              |                                                                       |
+     |- ".vs"      |              |                                                                       |
+     |- ".ps"      |              |                                                                       |
+     |- ".cs"      |              |                                                                       |
+     +-------------+--------------+-----------------------------------------------------------------------+
+
+..
+
+ .. table:: AMDPAL Code Object Hardware Stage Metadata Map
+ :name: amdgpu-amdpal-code-object-hardware-stage-metadata-map-table
+
+ ========================== ============== ========= ===============================================================
+ String Key Value Type Required? Description
+ ========================== ============== ========= ===============================================================
+ ".entry_point" string The ELF symbol pointing to this pipeline's stage entry point.
+ ".scratch_memory_size" integer Scratch memory size in bytes.
+ ".lds_size" integer Local Data Share size in bytes.
+ ".perf_data_buffer_size" integer Performance data buffer size in bytes.
+ ".vgpr_count" integer Number of VGPRs used.
+ ".sgpr_count" integer Number of SGPRs used.
+ ".vgpr_limit" integer If non-zero, indicates the shader was compiled with a
+ directive to instruct the compiler to limit the VGPR usage to
+ be less than or equal to the specified value (only set if
+ different from HW default).
+ ".sgpr_limit" integer SGPR count upper limit (only set if different from HW
+ default).
+ ".threadgroup_dimensions" sequence of Thread-group X/Y/Z dimensions (Compute only).
+ 3 integers
+ ".wavefront_size" integer Wavefront size (only set if different from HW default).
+ ".uses_uavs" boolean The shader reads or writes UAVs.
+ ".uses_rovs" boolean The shader reads or writes ROVs.
+ ".writes_uavs" boolean The shader writes to one or more UAVs.
+ ".writes_depth" boolean The shader writes out a depth value.
+ ".uses_append_consume" boolean The shader uses append and/or consume operations, either
+ memory or GDS.
+ ".uses_prim_id" boolean The shader uses PrimID.
+ ========================== ============== ========= ===============================================================
+
+..
+
+ .. table:: AMDPAL Code Object Shader Function Map
+ :name: amdgpu-amdpal-code-object-shader-function-map-table
+
+ =============== ============== ====================================================================
+ String Key Value Type Description
+ =============== ============== ====================================================================
+ *symbol name* map *symbol name* is the ELF symbol name of the shader function code
+ entry address. The value is the function's metadata. See
+ :ref:`amdgpu-amdpal-code-object-shader-function-metadata-map-table`.
+ =============== ============== ====================================================================
+
+..
+
+ .. table:: AMDPAL Code Object Shader Function Metadata Map
+ :name: amdgpu-amdpal-code-object-shader-function-metadata-map-table
+
+ ============================= ============== =================================================================
+ String Key Value Type Description
+ ============================= ============== =================================================================
+ ".api_shader_hash" sequence of Input shader hash, typically passed in from the client. The value
+ 2 integers is implementation defined, and cannot be relied on between
+ different builds of the compiler.
+ ".scratch_memory_size" sequence of Size in bytes of scratch memory used by the shader.
+ 2 integers
+ ".lds_size" sequence of Size in bytes of LDS memory.
+ 2 integers
+ ".vgpr_count" integer Number of VGPRs used by the shader.
+ ".sgpr_count" integer Number of SGPRs used by the shader.
+ ".stack_frame_size_in_bytes" integer Amount of stack size used by the shader.
+ ".shader_subtype" string Shader subtype/kind. Values include:
+
+ - "Unknown"
+
+ ============================= ============== =================================================================
+
+..
+
+ .. table:: AMDPAL Code Object Register Map
+ :name: amdgpu-amdpal-code-object-register-map-table
+
+ ========================== ============== ====================================================================
+ 32-bit Integer Key Value Type Description
+ ========================== ============== ====================================================================
+ ``reg offset`` 32-bit integer ``reg offset`` is the dword offset into the GFXIP register space of
+ a GRBM register (i.e., driver accessible GPU register number, not
+ shader GPR register number). The driver is required to program each
+ specified register to the corresponding specified value when
+ executing this pipeline. Typically, the ``reg offsets`` are the
+ ``uint16_t`` offsets to each register as defined by the hardware
+ chip headers. The register is set to the provided value. However, a
+ ``reg offset`` that specifies a user data register (e.g.,
+ COMPUTE_USER_DATA_0) needs special treatment. See
+ :ref:`amdgpu-amdpal-code-object-user-data-section` section for more
+ information.
+ ========================== ============== ====================================================================
+
+.. _amdgpu-amdpal-code-object-user-data-section:
+
+User Data
++++++++++
+
+Each hardware stage has a set of 32-bit physical SPI *user data registers*
+(either 16 or 32 based on graphics IP and the stage) which can be
+written from a command buffer and then loaded into SGPRs when waves are
+launched via a subsequent dispatch or draw operation. This is the way
+most arguments are passed from the application/runtime to a hardware
+shader.
+
+PAL abstracts this functionality by exposing a set of 128 *user data
+entries* per pipeline a client can use to pass arguments from a command
+buffer to one or more shaders in that pipeline. The ELF code object must
+specify a mapping from virtualized *user data entries* to physical *user
+data registers*, and PAL is responsible for implementing that mapping,
+including spilling overflow *user data entries* to memory if needed.
+
+Since the *user data registers* are GRBM-accessible SPI registers, this
+mapping is actually embedded in the ``.registers`` metadata entry. For
+most registers, the value in that map is a literal 32-bit value that
+should be written to the register by the driver. However, when the
+register is a *user data register* (any USER_DATA register e.g.,
+SPI_SHADER_USER_DATA_PS_5), the value is instead an encoding that tells
+the driver to write either a *user data entry* value or one of several
+driver-internal values to the register. This encoding is described in
+the following table:
.. note::
- `amdgpu_gfx` Function calls are currently in development and are
- subject to major changes.
+ Currently, *user data registers* 0 and 1 (e.g., SPI_SHADER_USER_DATA_PS_0,
+ and SPI_SHADER_USER_DATA_PS_1) are reserved. *User data register* 0 must
+ always be programmed to the address of the GlobalTable, and *user data
+ register* 1 must always be programmed to the address of the PerShaderTable.
-This calling convention shares most properties with calling non-kernel
-functions (see
-:ref:`amdgpu-amdhsa-function-call-convention-non-kernel-functions`).
-Differences are:
+..
- - Currently there are none, differences will be listed here
+  .. table:: AMDPAL User Data Mapping
+     :name: amdgpu-amdpal-code-object-metadata-user-data-mapping-table
+
+     ========== ================= ===============================================================================
+     Value      Name              Description
+     ========== ================= ===============================================================================
+     0..127     *User Data Entry* 32-bit value of user_data_entry[N] as specified via *CmdSetUserData()*
+     0x10000000 GlobalTable       32-bit pointer to GPU memory containing the global internal table (should
+                                  always point to *user data register* 0).
+     0x10000001 PerShaderTable    32-bit pointer to GPU memory containing the per-shader internal table. See
+                                  :ref:`amdgpu-amdpal-code-object-metadata-user-data-per-shader-table-section`
+                                  for more detail (should always point to *user data register* 1).
+     0x10000002 SpillTable        32-bit pointer to GPU memory containing the user data spill table. See
+                                  :ref:`amdgpu-amdpal-code-object-metadata-user-data-spill-table-section` for
+                                  more detail.
+     0x10000003 BaseVertex        Vertex offset (32-bit unsigned integer). Not needed if the pipeline doesn't
+                                  reference the draw index in the vertex shader. Only supported by the first
+                                  stage in a graphics pipeline.
+     0x10000004 BaseInstance      Instance offset (32-bit unsigned integer). Only supported by the first stage in
+                                  a graphics pipeline.
+     0x10000005 DrawIndex         Draw index (32-bit unsigned integer). Only supported by the first stage in a
+                                  graphics pipeline.
+     0x10000006 Workgroup         Thread group count (32-bit unsigned integer). Low half of a 64-bit address of
+                                  a buffer containing the grid dimensions for a Compute dispatch operation. The
+                                  high half of the address is stored in the next sequential user-SGPR. Only
+                                  supported by compute pipelines.
+     0x1000000A EsGsLdsSize       Indicates that PAL will program this user-SGPR to contain the amount of LDS
+                                  space used for the ES/GS pseudo-ring-buffer for passing data between shader
+                                  stages.
+     0x1000000B ViewId            View id (32-bit unsigned integer) identifies a view of graphic
+                                  pipeline instancing.
+     0x1000000C StreamOutTable    32-bit pointer to GPU memory containing the stream out target SRD table. This
+                                  can only appear for one shader stage per pipeline.
+     0x1000000D PerShaderPerfData 32-bit pointer to GPU memory containing the per-shader performance data buffer.
+     0x1000000F VertexBufferTable 32-bit pointer to GPU memory containing the vertex buffer SRD table. This can
+                                  only appear for one shader stage per pipeline.
+     0x10000010 UavExportTable    32-bit pointer to GPU memory containing the UAV export SRD table. This can
+                                  only appear for one shader stage per pipeline (PS). These replace color targets
+                                  and are completely separate from any UAVs used by the shader. This is optional,
+                                  and only used by the PS when UAV exports are used to replace color-target
+                                  exports to optimize specific shaders.
+     0x10000011 NggCullingData    64-bit pointer to GPU memory containing the hardware register data needed by
+                                  some NGG pipelines to perform culling. This value contains the address of the
+                                  first of two consecutive registers which provide the full GPU address.
+     0x10000015 FetchShaderPtr    64-bit pointer to GPU memory containing the fetch shader subroutine.
+     ========== ================= ===============================================================================
+
+.. _amdgpu-amdpal-code-object-metadata-user-data-per-shader-table-section:
+
+Per-Shader Table
+################
+
+Low 32 bits of the GPU address for an optional buffer in the ``.data``
+section of the ELF. The high 32 bits of the address match the high 32 bits
+of the shader's program counter.
+
+The buffer can be anything the shader compiler needs it for, and
+allows each shader to have its own region of the ``.data`` section.
+Typically, this could be a table of buffer SRDs and the data pointed to
+by the buffer SRDs, but it could be a flat-address region of memory as
+well. Its layout and usage are defined by the shader compiler.
+
+Each shader's table in the ``.data`` section is referenced by the symbol
+``_amdgpu_``\ *xs*\ ``_shdr_intrl_data`` where *xs* corresponds with the
+hardware shader stage the data is for. E.g.,
+``_amdgpu_cs_shdr_intrl_data`` for the compute shader hardware stage.
+
+.. _amdgpu-amdpal-code-object-metadata-user-data-spill-table-section:
+
+Spill Table
+###########
+
+It is possible for a hardware shader to need access to more *user data
+entries* than there are slots available in user data registers for one
+or more hardware shader stages. In that case, the PAL runtime expects
+the necessary *user data entries* to be spilled to GPU memory, with
+one user data register pointing to the spilled user data memory. The
+value of the *user data entry* must then represent the location where
+a shader expects to read the low 32 bits of the table's GPU virtual
+address. The *spill table* itself represents a set of 32-bit values
+managed by the PAL runtime in GPU-accessible memory that can be made
+indirectly accessible to a hardware shader.
Unspecified OS
--------------
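
As an illustration of the user data mapping encoding that this patch documents, the following is a minimal Python sketch of how a tool reading the ``.registers`` map might interpret the value stored for a *user data register*. The numeric values and names are copied from the "AMDPAL User Data Mapping" table above. The function name ``describe_user_data_mapping`` and the ``UserDataEntry[N]`` label are hypothetical, chosen only for this example, and are not part of PAL or LLVM.

```python
# Special values copied from the "AMDPAL User Data Mapping" table in the patch.
SPECIAL_MAPPINGS = {
    0x10000000: "GlobalTable",
    0x10000001: "PerShaderTable",
    0x10000002: "SpillTable",
    0x10000003: "BaseVertex",
    0x10000004: "BaseInstance",
    0x10000005: "DrawIndex",
    0x10000006: "Workgroup",
    0x1000000A: "EsGsLdsSize",
    0x1000000B: "ViewId",
    0x1000000C: "StreamOutTable",
    0x1000000D: "PerShaderPerfData",
    0x1000000F: "VertexBufferTable",
    0x10000010: "UavExportTable",
    0x10000011: "NggCullingData",
    0x10000015: "FetchShaderPtr",
}

# The patch text says PAL exposes 128 user data entries per pipeline (0..127).
MAX_USER_DATA_ENTRIES = 128


def describe_user_data_mapping(value: int) -> str:
    """Return a readable label for a user data register mapping value.

    Hypothetical helper: values 0..127 select a user data entry, and the
    listed 0x100000xx values select driver-internal data. Anything else is
    rejected, because the table does not define it.
    """
    if 0 <= value < MAX_USER_DATA_ENTRIES:
        return f"UserDataEntry[{value}]"
    try:
        return SPECIAL_MAPPINGS[value]
    except KeyError:
        raise ValueError(f"unrecognized user data mapping: {value:#x}") from None
```

A caller would apply this only to register offsets that name a user data register (for example SPI_SHADER_USER_DATA_PS_5). For all other registers, the value in ``.registers`` is a literal to be written as-is.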