Merge tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie: "Highlights: - i915 has seen a lot of refactoring and uAPI cleanups due to a change in the upstream direction going forward This has all been audited with known userspace, but there may be some pitfalls that were missed. - i915 now uses common TTM to enable discrete memory on DG1/2 GPUs - i915 enables Jasper and Elkhart Lake by default and has preliminary XeHP/DG2 support - amdgpu adds support for Cyan Skillfish - lots of implicit fencing rules documented and fixed up in drivers - msm now uses the core scheduler - the irq midlayer has been removed for non-legacy drivers - the sysfb code now works on more than x86. Otherwise the usual smattering of stuff everywhere, panels, bridges, refactorings. Detailed summary: core: - extract i915 eDP backlight into core - DP aux bus support - drm_device.irq_enabled removed - port drivers to native irq interfaces - export gem shadow plane handling for vgem - print proper driver name in framebuffer registration - driver fixes for implicit fencing rules - ARM fixed rate compression modifier added - updated fb damage handling - rmfb ioctl logging/docs - drop drm_gem_object_put_locked - define DRM_FORMAT_MAX_PLANES - add gem fb vmap/vunmap helpers - add lockdep_assert(once) helpers - mark drm irq midlayer as legacy - use offset adjusted bo mapping conversion vgaarb: - cleanups fbdev: - extend efifb handling to all arches - div by 0 fixes for multiple drivers udmabuf: - add hugepage mapping support dma-buf: - non-dynamic exporter fixups - document implicit fencing rules amdgpu: - Initial Cyan Skillfish support - switch virtual DCE over to vkms based atomic - VCN/JPEG power down fixes - NAVI PCIE link handling fixes - AMD HDMI freesync fixes - Yellow Carp + Beige Goby fixes - Clockgating/S0ix/SMU/EEPROM fixes - embed hw fence in job - rework dma-resv handling - ensure eviction to system ram amdkfd: - uapi: SVM address range query added - sysfs leak fix - GPUVM TLB optimizations - vmfault/migration counters i915: - Enable JSL and EHL by default - preliminary XeHP/DG2 support - remove all CNL support (never shipped) - move to TTM for discrete memory support - allow mixed object mmap handling - GEM uAPI spring cleaning - add I915_MMAP_OBJECT_FIXED - reinstate ADL-P mmap ioctls - drop a bunch of unused by userspace features - disable and remove GPU relocations - revert some i915 misfeatures - major refactoring of GuC for Gen11+ - execbuffer object locking separate step - reject caching/set-domain on discrete - Enable pipe DMC loading on XE-LPD and ADL-P - add PSF GV point support - Refactor and fix DDI buffer translations - Clean up FBC CFB allocation code - Finish INTEL_GEN() and friends macro conversions nouveau: - add eDP backlight support - implicit fence fix msm: - a680/7c3 support - drm/scheduler conversion panfrost: - rework GPU reset virtio: - fix fencing for planes ast: - add detect support bochs: - move to tiny GPU driver vc4: - use hotplug irqs - HDMI codec support vmwgfx: - use internal vmware device headers ingenic: - demidlayering irq rcar-du: - shutdown fixes - convert to bridge connector helpers zynqmp-dsub: - misc fixes mgag200: - convert PLL handling to atomic mediatek: - MT8133 AAL support - gem mmap object support - MT8167 support etnaviv: - NXP Layerscape LS1028A SoC support - GEM mmap cleanups tegra: - new user API exynos: - missing unlock fix - build warning fix - use refcount_t" * tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm: (1318 commits) drm/amd/display: Move AllowDRAMSelfRefreshOrDRAMClockChangeInVblank to bounding box drm/amd/display: Remove duplicate dml init drm/amd/display: Update bounding box states (v2) drm/amd/display: Update number of DCN3 clock states drm/amdgpu: disable GFX CGCG in aldebaran drm/amdgpu: Clear RAS interrupt status on aldebaran drm/amdgpu: Add support for RAS XGMI err query drm/amdkfd: Account for SH/SE count when setting up cu masks. drm/amdgpu: rename amdgpu_bo_get_preferred_pin_domain drm/amdgpu: drop redundant cancel_delayed_work_sync call drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend drm/amdkfd: map SVM range with correct access permission drm/amdkfd: check access permisson to restore retry fault drm/amdgpu: Update RAS XGMI Error Query drm/amdgpu: Add driver infrastructure for MCA RAS drm/amd/display: Add Logging for HDMI color depth information drm/amd/amdgpu: consolidate PSP TA init shared buf functions drm/amd/amdgpu: add name field back to ras_common_if drm/amdgpu: Fix build with missing pm_suspend_target_state module export ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2021-09-01 11:26:46 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> 2021-09-01 11:26:46 -0700
commit: 477f70cd2a67904e04c2c2b9bd0fa2e95222f2f6 (patch)
tree: 1897dd1de49e1ea24897163533e2d8ead5dad0ad /Documentation/gpu/rfc
parent: 835d31d319d9c8c4eb6cac074643360ba0ecab10 (diff)
parent: 8f0284f190e6a0aa09015090568c03f18288231a (diff)
download: linux-next-477f70cd2a67904e04c2c2b9bd0fa2e95222f2f6.tar.gz
3 files changed, 274 insertions, 0 deletions
diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h b/Documentation/gpu/rfc/i915_parallel_execbuf.h
new file mode 100644
index 000000000000..8cbe2c4e0172
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
@@ -0,0 +1,122 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
+
+/**
+ * struct drm_i915_context_engines_parallel_submit - Configure engine for
+ * parallel submission.
+ *
+ * Setup a slot in the context engine map to allow multiple BBs to be submitted
+ * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU
+ * in parallel. Multiple hardware contexts are created internally in the i915
+ * run these BBs. Once a slot is configured for N BBs only N BBs can be
+ * submitted in each execbuf IOCTL and this is implicit behavior e.g. The user
+ * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how
+ * many BBs there are based on the slot's configuration. The N BBs are the last
+ * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set.
+ *
+ * The default placement behavior is to create implicit bonds between each
+ * context if each context maps to more than 1 physical engine (e.g. context is
+ * a virtual engine). Also we only allow contexts of same engine class and these
+ * contexts must be in logically contiguous order. Examples of the placement
+ * behavior described below. Lastly, the default is to not allow BBs to
+ * preempted mid BB rather insert coordinated preemption on all hardware
+ * contexts between each set of BBs. Flags may be added in the future to change
+ * both of these default behaviors.
+ *
+ * Returns -EINVAL if hardware context placement configuration is invalid or if
+ * the placement configuration isn't supported on the platform / submission
+ * interface.
+ * Returns -ENODEV if extension isn't supported on the platform / submission
+ * interface.
+ *
+ * .. code-block:: none
+ *
+ *	Example 1 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=1,
+ *		     engines=CS[0],CS[1])
+ *
+ *	Results in the following valid placement:
+ *	CS[0], CS[1]
+ *
+ *	Example 2 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=2,
+ *		     engines=CS[0],CS[2],CS[1],CS[3])
+ *
+ *	Results in the following valid placements:
+ *	CS[0], CS[1]
+ *	CS[2], CS[3]
+ *
+ *	This can also be thought of as 2 virtual engines described by 2-D array
+ *	in the engines the field with bonds placed between each index of the
+ *	virtual engines. e.g. CS[0] is bonded to CS[1], CS[2] is bonded to
+ *	CS[3].
+ *	VE[0] = CS[0], CS[2]
+ *	VE[1] = CS[1], CS[3]
+ *
+ *	Example 3 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=2,
+ *		     engines=CS[0],CS[1],CS[1],CS[3])
+ *
+ *	Results in the following valid and invalid placements:
+ *	CS[0], CS[1]
+ *	CS[1], CS[3] - Not logical contiguous, return -EINVAL
+ */
+struct drm_i915_context_engines_parallel_submit {
+	/**
+	 * @base: base user extension.
+	 */
+	struct i915_user_extension base;
+
+	/**
+	 * @engine_index: slot for parallel engine
+	 */
+	__u16 engine_index;
+
+	/**
+	 * @width: number of contexts per parallel engine
+	 */
+	__u16 width;
+
+	/**
+	 * @num_siblings: number of siblings per context
+	 */
+	__u16 num_siblings;
+
+	/**
+	 * @mbz16: reserved for future use; must be zero
+	 */
+	__u16 mbz16;
+
+	/**
+	 * @flags: all undefined flags must be zero, currently not defined flags
+	 */
+	__u64 flags;
+
+	/**
+	 * @mbz64: reserved for future use; must be zero
+	 */
+	__u64 mbz64[3];
+
+	/**
+	 * @engines: 2-d array of engine instances to configure parallel engine
+	 *
+	 * length = width (i) * num_siblings (j)
+	 * index = j + i * num_siblings
+	 */
+	struct i915_engine_class_instance engines[0];
+
+} __packed;
+
diff --git a/Documentation/gpu/rfc/i915_scheduler.rst b/Documentation/gpu/rfc/i915_scheduler.rst
new file mode 100644
index 000000000000..cbda75065dad
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_scheduler.rst
@@ -0,0 +1,148 @@
+=========================================
+I915 GuC Submission/DRM Scheduler Section
+=========================================
+
+Upstream plan
+=============
+For upstream the overall plan for landing GuC submission and integrating the
+i915 with the DRM scheduler is:
+
+* Merge basic GuC submission
+	* Basic submission support for all gen11+ platforms
+	* Not enabled by default on any current platforms but can be enabled via
+	  modparam enable_guc
+	* Lots of rework will need to be done to integrate with DRM scheduler so
+	  no need to nit pick everything in the code, it just should be
+	  functional, no major coding style / layering errors, and not regress
+	  execlists
+	* Update IGTs / selftests as needed to work with GuC submission
+	* Enable CI on supported platforms for a baseline
+	* Rework / get CI heathly for GuC submission in place as needed
+* Merge new parallel submission uAPI
+	* Bonding uAPI completely incompatible with GuC submission, plus it has
+	  severe design issues in general, which is why we want to retire it no
+	  matter what
+	* New uAPI adds I915_CONTEXT_ENGINES_EXT_PARALLEL context setup step
+	  which configures a slot with N contexts
+	* After I915_CONTEXT_ENGINES_EXT_PARALLEL a user can submit N batches to
+	  a slot in a single execbuf IOCTL and the batches run on the GPU in
+	  paralllel
+	* Initially only for GuC submission but execlists can be supported if
+	  needed
+* Convert the i915 to use the DRM scheduler
+	* GuC submission backend fully integrated with DRM scheduler
+		* All request queues removed from backend (e.g. all backpressure
+		  handled in DRM scheduler)
+		* Resets / cancels hook in DRM scheduler
+		* Watchdog hooks into DRM scheduler
+		* Lots of complexity of the GuC backend can be pulled out once
+		  integrated with DRM scheduler (e.g. state machine gets
+		  simplier, locking gets simplier, etc...)
+	* Execlists backend will minimum required to hook in the DRM scheduler
+		* Legacy interface
+		* Features like timeslicing / preemption / virtual engines would
+		  be difficult to integrate with the DRM scheduler and these
+		  features are not required for GuC submission as the GuC does
+		  these things for us
+		* ROI low on fully integrating into DRM scheduler
+		* Fully integrating would add lots of complexity to DRM
+		  scheduler
+	* Port i915 priority inheritance / boosting feature in DRM scheduler
+		* Used for i915 page flip, may be useful to other DRM drivers as
+		  well
+		* Will be an optional feature in the DRM scheduler
+	* Remove in-order completion assumptions from DRM scheduler
+		* Even when using the DRM scheduler the backends will handle
+		  preemption, timeslicing, etc... so it is possible for jobs to
+		  finish out of order
+	* Pull out i915 priority levels and use DRM priority levels
+	* Optimize DRM scheduler as needed
+
+TODOs for GuC submission upstream
+=================================
+
+* Need an update to GuC firmware / i915 to enable error state capture
+* Open source tool to decode GuC logs
+* Public GuC spec
+
+New uAPI for basic GuC submission
+=================================
+No major changes are required to the uAPI for basic GuC submission. The only
+change is a new scheduler attribute: I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP.
+This attribute indicates the 2k i915 user priority levels are statically mapped
+into 3 levels as follows:
+
+* -1k to -1 Low priority
+* 0 Medium priority
+* 1 to 1k High priority
+
+This is needed because the GuC only has 4 priority bands. The highest priority
+band is reserved with the kernel. This aligns with the DRM scheduler priority
+levels too.
+
+Spec references:
+----------------
+* https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt
+* https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/chap5.html#devsandqueues-priority
+* https://spec.oneapi.com/level-zero/latest/core/api.html#ze-command-queue-priority-t
+
+New parallel submission uAPI
+============================
+The existing bonding uAPI is completely broken with GuC submission because
+whether a submission is a single context submit or parallel submit isn't known
+until execbuf time activated via the I915_SUBMIT_FENCE. To submit multiple
+contexts in parallel with the GuC the context must be explicitly registered with
+N contexts and all N contexts must be submitted in a single command to the GuC.
+The GuC interfaces do not support dynamically changing between N contexts as the
+bonding uAPI does. Hence the need for a new parallel submission interface. Also
+the legacy bonding uAPI is quite confusing and not intuitive at all. Furthermore
+I915_SUBMIT_FENCE is by design a future fence, so not really something we should
+continue to support.
+
+The new parallel submission uAPI consists of 3 parts:
+
+* Export engines logical mapping
+* A 'set_parallel' extension to configure contexts for parallel
+  submission
+* Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
+
+Export engines logical mapping
+------------------------------
+Certain use cases require BBs to be placed on engine instances in logical order
+(e.g. split-frame on gen11+). The logical mapping of engine instances can change
+based on fusing. Rather than making UMDs be aware of fusing, simply expose the
+logical mapping with the existing query engine info IOCTL. Also the GuC
+submission interface currently only supports submitting multiple contexts to
+engines in logical order which is a new requirement compared to execlists.
+Lastly, all current platforms have at most 2 engine instances and the logical
+order is the same as uAPI order. This will change on platforms with more than 2
+engine instances.
+
+A single bit will be added to drm_i915_engine_info.flags indicating that the
+logical instance has been returned and a new field,
+drm_i915_engine_info.logical_instance, returns the logical instance.
+
+A 'set_parallel' extension to configure contexts for parallel submission
+------------------------------------------------------------------------
+The 'set_parallel' extension configures a slot for parallel submission of N BBs.
+It is a setup step that must be called before using any of the contexts. See
+I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for
+similar existing examples. Once a slot is configured for parallel submission the
+execbuf2 IOCTL can be called submitting N BBs in a single IOCTL. Initially only
+supports GuC submission. Execlists supports can be added later if needed.
+
+Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and
+drm_i915_context_engines_parallel_submit to the uAPI to implement this
+extension.
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_parallel_execbuf.h
+        :functions: drm_i915_context_engines_parallel_submit
+
+Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
+-------------------------------------------------------------------
+Contexts that have been configured with the 'set_parallel' extension can only
+submit N BBs in a single execbuf2 IOCTL. The BBs are either the last N objects
+in the drm_i915_gem_exec_object2 list or the first N if I915_EXEC_BATCH_FIRST is
+set. The number of BBs is implicit based on the slot submitted and how it has
+been configured by 'set_parallel' or other extensions. No uAPI changes are
+required to the execbuf2 IOCTL.
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 05670442ca1b..91e93a705230 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -19,3 +19,7 @@ host such documentation:
 .. toctree::
 
     i915_gem_lmem.rst
+
+.. toctree::
+
+    i915_scheduler.rst
author	Linus Torvalds <torvalds@linux-foundation.org>	2021-09-01 11:26:46 -0700
committer	Linus Torvalds <torvalds@linux-foundation.org>	2021-09-01 11:26:46 -0700
commit	477f70cd2a67904e04c2c2b9bd0fa2e95222f2f6 (patch)
tree	1897dd1de49e1ea24897163533e2d8ead5dad0ad /Documentation/gpu/rfc
parent	835d31d319d9c8c4eb6cac074643360ba0ecab10 (diff)
parent	8f0284f190e6a0aa09015090568c03f18288231a (diff)
download	linux-next-477f70cd2a67904e04c2c2b9bd0fa2e95222f2f6.tar.gz