| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
It is similar with 2D image for avoiding extended image width truncated.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"imagedim_non_pow_2" cases of basic modudle of confrmance shows
regression after use TILE_Y mode for large image by previous patch.
This bug comes from the non-align16 kernel of clEnqueueCopyBufferToImage
and clEnqueueCopyImageToBuffer.
It will force CL_RGBA/CL_UNORM_INT8/8191x8192 image of conformance test
to CL_R/CL_UNSIGNED_INT8/32764x8192 image for copying.
So it makes width as 8191 x 4 = 32764 and its width will exceed the maximum
width (16 x 1024 = 16384) of GEN surface state structure which only has 14 bits.
So use align4 copy kernel to avoid this bug.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
|
|
|
|
|
|
|
|
| |
There is a race condition between building .bc and header files and
generating code from .cl targets. Fix the race by adding the
dependency to generated files.
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
|
|
|
|
|
|
|
|
|
| |
Move the generated builtin str and bin files into the Cmake build
directory to avoid chaos when changing LLVM.
V2: Fix a bug that the builtin.cl was not written into build dir.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are some step to handle device enqueue:
1. allocate the device enqueue bo to store the device enqueue
information for parent kernel. Add must convert all global buffers to
SVM buffers to make sure the child kernels have the same GPU address.
2. When flush the command, check whether have device enqueue or not. If
has device enqueue, must wait finish and parse the device enqueue info.
3. Start the child ndrange according the device enqueue info, and the
parent's global buffers as the exec info.
Because of non uniform workgroup size, one enqueue api will flush
serveral times, but device enqueue only need handle once, so add a flag
to function cl_command_queue_flush to indicate the last flush.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
| |
V2:
Delete some useless macro.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable CL_DEVICE_SVM_COARSE_GRAIN_BUFFER svm support, use userptr and softpin
to implement it. Use userptr to share the page between cpu and gpu, and softpin
to unify the cpu and gpu's address. Now it works on i386 system. x86_64 depends on
backend support.
This patch base on DRM library and DRM kernel driver's softpin patch:
http://lists.freedesktop.org/archives/intel-gfx/2015-September/075446.html.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
CMRT_LIBRARY_DIRS doesn't include any library paths
when the library is installed system-wide, such as
/usr.
Also dlopen versioned library, as distros tend to
split non-versioned sonames into -devel packages.
Signed-off-by: Armin K <krejzi@email.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rewrite the cl_event, and modify all the event functions
using this new event manner. Event will co-operate with
command queue's thread together.
v2:
Fix a logic problem in event create failed.
V3:
Set enqueue default to do nothing, handle some enqueue has nothing
to do.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to query low level layout of GL buffer object/texture/render
buffer, previous implementation introduced an egl extension and
implemented in Beignet side. This way is broken once mesa change its
related internal code. In this patch, we use an new egl extension
(EGL_MESA_image_dma_buf_export) to query related layout infomations
of gl texture. Since this egl extension is already accepted by Khronos,
so it's a stable method. This patch just implement GL texture 2d buffer
sharing, and we will implement other target type if necessary.
v2:
Add CMake build option to enable cl_khr_gl_sharing(default off).
Clean up related CMake code.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The runtime code is a little verbose in CL object handle.
Every CL objects should have a reference, a lock to protect itself
and an ICD dispatcher. We can organize them to a struct and place
it at the beginning of each CL object.
This base object is also used to protect the CL objects MT safe.
CL_OBJECT_LOCK/CL_OBJECT_UNLOCK macro will lock/unlock objects,
but we should use them within one function call, and the critical
region should be short.
We add CL_OBJECT_TAKE_OWNERSHIP/CL_OBJECT_RELEASE_OWNERSHIP macro
to own the object for a long time. CL_OBJECT_TAKE_OWNERSHIP will
not hold the lock and so will not cause deadlock problems.
For example, when we call NDRange on some memobj, we should take
the ownship of the memobj. If another thread call NDRange on the
same memobj, we should return some error like CL_OUT_OF_RESOURCE
to users and protect the memobj from accessing simultaneously.
V2:
Add atomic_read to avoid atomic_add misuse.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If BXT pooled EU enable, the 3*6 EUs is split into 2 pooled, so change
the sub slice to 2.
For min no. of eu in pool, only affact fused down 2*6 BXT devices,
because beignet don't support these devices now, add assert only.
assert.
This patch is based on kernel patch: https://patchwork.freedesktop.org/series/8200/
Thanks Arun.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CMRT is C for Media Runtime on Intel GPU, see https://github.com/01org/cmrt.
There is a request to make Beignet as intermedia layer of CMRT, in
other words, application programer write OpenCL APIs to execute the
CM kernel on GPU, the following shows the key code, and please refer
to the next patch of unit test for detail.
prog = clCreateProgramWithBinary("cm kernel");
clBuildProgram(prog);
kernel = clCreateKernel(prog, "kernel name");
image = clCreateImage();
clSetKernelArg(kernel, image);
clEnqueueNDRangeKernel(kernel);
Inside Beignet, once cm kernel is invoked, the following relative APIs
will be directly passed to CMRT library (libcmrt.so) which is loaded
via dlopen only when necessary. Since we use this simple method to
keep the code clean, OpenCL spec is not strictly followed, and cl_event
is not supported for this case.
v2: add comments about the cm queue in fuction cmrt_enqueue
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
The clock_gettime will cause the linkage error on some
version of GCC, we need to add -lrt at the end of the
link command line.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2:
1. Just upload the first vme_state.
2. Remove duplicated code in check_opt1_extension.
3. Check image format before cl_gpgpu_bind_image_for_vme.
4. Fix error of getting mv. Because we suppose this kernel run in SIMD16
mode, so dword 0 of grf 1 should be
__gen_ocl_region(8,vme_result.s0), not
__gen_ocl_region(0,vme_result.s1).
v3:
Return CL_IMAGE_FORMAT_NOT_SUPPORTED if image format is not the required
one.
v4:
Fix two conflicts after code rebase and wordaround a curbe related bug.
v6:
Treat simd8 and simd16 differently when getting mv.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
|
| |
Basically, it boils down to a difference in getopt(3). The
getopt(3) on (Free)BSD will exit parsing arguments at the
first unknown argument.
Signed-off-by: Koop Mast <kwm@rainbow-runner.nl>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can change the image_channel_order to CL_RGBA and
image_channel_data_type to CL_UNSIGNED_INT32 for some special
case, thus 16 bytes can be read by one work item. Bandwidth is
fully used.
v2: merge patch 3 of initializing region0;
remove k dimension in kernel for 2d image.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Values of device max compute units and max subslice obtained
directly from the driver should be more accurate than our own
ID-based lookup values. This is particularly important when a
single device ID may encompass more than one configuration. If
the driver cannot provide a valid value for the given device,
we fallback on the ID-based lookup value.
This query requires libdrm 2.4.60. For now we will consider
the use of this query to be optional and exclude it from
compilation when building against older libdrm. Later we may
want to consider requiring the query or at least warning
more strongly when it is not supported.
v2: Make feature use conditional on libdrm version (Zhigang).
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
| |
The llvm include should be specified when llvm is
not installed in standard dir.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can change the image_channel_order to CL_RGBA and
image_channel_data_type to CL_UNSIGNED_INT32 for some special
case, thus 16 bytes can be read by one work item. Bandwidth is
fully used.
v2:
Now we just optimize for IMAGE2D, so add judgement to not affect
other image type's code path.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instead of cpu way.
Before this patch, cl_mem_kernel_copy_image do cpu memory copy in order
to copy image array objects. This is very slow for large image size.
This patch implement image array copy in cl way, which dramatically
accelerate image array related clEnqueueCopyImage.
clCopyImage case in OpenCL conformance test will not be blocked anymore.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At some platforms with old c/c++ environment, C++11 features are not
supported, it results in the failure to build the gbe compiler part
which depends on LLVM/clang using C++11 features.
The way to resolve is to build a standalone gbe compiler within another
feasible system, and build beignet with the already built standalone
gbe compiler by setting USE_STANDALONE_GBE_COMPILER=true. The path of
the standalone compiler is /usr/local/lib/beignet as default or could
be specified by STANDALONE_GBE_COMPILER_DIR.
Once USE_STANDALONE_GBE_COMPILER is given, all the gbe compiler relative
code will not be built any longer, only libcl.so and libgebinterp.so are
built. And libcl.so is special for GEN_PCI_ID, which is queried from the
building machie or could be specified as CMake option.
v2: separate the CMake option name.
update the commit comments.
add back the script for gen pci id, and build driver with it.
v3: add file FindStandaloneGbeCompiler.cmake to make the main cmakefile clean.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
userptr is used to wrap a memory pointer (page aligned) supplied
by user space into a buffer object accessed by GPU, and so no extra
copy is needed. It is supported starting from linux kernel 3.16
and libdrm 2.4.58.
This patch is originally finished by Zhenyu Wang <zhenyuw@linux.intel.com>,
I did a little change and some code clean.
No regression issue found on IVB+Ubuntu14.10 with libdrm upgraded with tests:
beignet/utests, piglit, OpenCV/test&perf, conformance/basic&mem_host_flags&buffers
V2: add page align limit for data size, add comments for kernel without MMU_NOTIFIER
V3: add runtime check with host_unified_memory, return CL_MEM_OBJECT_ALLOCATION_FAILURE if failed
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Meng Mengmeng <mengmeng.meng@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
| |
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
| |
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
| |
Add a build flag HAS_X11 for it.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
| |
Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
driver can get chipset id by ioctl instead of calling lspci in cmake;
user could generator gen binary by configuring cmake option
-DGEN_PCI_ID=xxxx or calling the gbe_bin_generater with option -t
GEN_PCI_ID.
v2: add "\0GENC" magic code for gen binary, fix typo.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
detail cases: 1Darray, 2Darray, 2Darrayto2D, 2Darrayto3D, 2Dto2Darray, 3Dto2Darray.
1d used gpu copy.
v2:
fixed 1d array to 1d array copy, don't need to switch depth and height.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
| |
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
enqueues a command to fill an image object with a specified color.
fix typo cl_context_get_static_kernel_from_bin.
v2:
fix image 1d array bug.
Signed-off-by: Luo <xionghu.luo@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
| |
After FindXLib.cmake was removed, XLIB_LIBARY should have been
replaced with X11_LIBRARIES.
Signed-off-by: Abrahm Scully <abrahm.scully@gmail.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
currently, the Gen GPU pciid of the underlying system is queried
and then passed to gbe_bin_generater as the target option.
This does not work when building the driver on another system with
non-intel GPUs, this patch relaxes the dependency by exporting the
pciid setting at CMake level, therefore, the pciid could be given
as a CMake option besides the current real time query method.
this patch also remove the redundancy code in utest/CMake by setting
PARENT_SCOPE in src/CMake.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
| |
Then beignet can link to user preferred drm library rather than default
Signed-off-by: Li Peng <peng.li@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
currently, there are same symbol names in libinterp.a (inside
libcl.so) and libgbe.so (compiler), and so have to dlopen libgbe.so
with RTLD_DEEPBIND, this flag makes std::cerr inside libgbe crash.
extract the interp part from libcl.so as libgbeinterp.so, therefore,
first dlopen libgbe.so without RTLD_DEEPBIND, then dlopen libgbeinterp.so
with RTLD_DEEPBIND, to fix the std:cerr crash issue.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On embedded/handheld devices, storage and memory are scarce, it is
necessary to provide only the OpenCL runtime library with small size,
and only the executable binary kernel will be supported on such device.
At the beginning of process (before function main), OpenCL runtime
(libcl.so) will try to load the compiler (libgbe.so), the system's
behavior is the same as before if successfully loaded, otherwise,
the runtime assumes no OpenCL compiler in the system, and the device
info will be changed as CL_DEVICE_COMPILER_AVAILABLE=false and
CL_DEVICE_PROFILE="EMBEDDED_PROFILE", the clBuildProgram returns
CL_COMPILER_NOT_AVAILABLE if the program is created with
clCreateProgramWithSource, following the OpenCL spec.
To simulate the case without OpenCL compiler, just delete the file
libgbe.so, or export OCL_NON_COMPILER=1.
Some explanation of the binary kernel interpreter (libinterp.a):
libinterp.a is used to interpret the binary kernel inside runtime,
and the runtime library libcl.so is built against libinterp.a.
Since the code to interpret binary kernel is tightly integrated inside
the compiler, to avoid code duplicate, a new file gbe_bin_interpreter.cpp
is created to include some other .cpp files; to make libinterp.a small
(the purpose to make libcl.so small), the macro GBE_COMPILER_AVAILABLE
is used to make only the needed code active when build for libinterp.a.
V2: code base is changed to call function gbe_set_image_base_index in
gbe_bin_generater, while this function is modified in this patch as
gbe_set_image_base_index_compiler, fix it accordingly.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
seperate the kernel code from host code to make it clean; build the
kernels offline by gbe_bin_generator to improve the performance.
v2:
fix the image base issue with the standalone compiler.
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of use pre-defined path for dependent modules, e.g libdrm,
libdrm_intel, etc. Use pkg-config helper for cmake instead. This makes
it easy to work with developer own built version of those dependences.
Also remove libGL dependence for 'gbe_bin_generator' which is not required.
libutest.so still requires libGL now but might be fixed by checking real
GL dependence.
v2: Fix build with mesa source (92e6260) and link required EGL lib with utests too.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by:Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
| |
Add these three cl files,
one for src and dst are not aligned but have same offset to 4.
second for src's %4 offset is bigger than the dst's
third for src's %4 offset is small than the dst's
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|