delta/beignet.git - gitlab.freedesktop.org: beignet/beignet.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	Use aligned16 and aligne4 kernel to copy for large 3D image with TILE_Y.	Yan Wang	2017-06-14	1	-0/+2
\| \| \| \| \| \| \|	It is similar with 2D image for avoiding extended image width truncated. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
*	Fix bug of clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer.	Yan Wang	2017-05-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	"imagedim_non_pow_2" cases of basic modudle of confrmance shows regression after use TILE_Y mode for large image by previous patch. This bug comes from the non-align16 kernel of clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer. It will force CL_RGBA/CL_UNORM_INT8/8191x8192 image of conformance test to CL_R/CL_UNSIGNED_INT8/32764x8192 image for copying. So it makes width as 8191 x 4 = 32764 and its width will exceed the maximum width (16 x 1024 = 16384) of GEN surface state structure which only has 14 bits. So use align4 copy kernel to avoid this bug. Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
*	build: fix cmake code generation dependencies.	Ismo Puustinen	2017-05-25	1	-2/+2
\| \| \| \| \| \| \| \|	There is a race condition between building .bc and header files and generating code from .cl targets. Fix the race by adding the dependency to generated files. Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
*	CMAKE: Refine builtin kernel bin generator	Pan Xiuli	2017-03-07	1	-7/+7
\| \| \| \| \| \| \| \| \|	Move the generated builtin str and bin files into the Cmake build directory to avoid chaos when changing LLVM. V2: Fix a bug that the builtin.cl was not written into build dir. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
*	OCL20: handle device enqueue in runtime.	Yang, Rong R	2016-12-30	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are some step to handle device enqueue: 1. allocate the device enqueue bo to store the device enqueue information for parent kernel. Add must convert all global buffers to SVM buffers to make sure the child kernels have the same GPU address. 2. When flush the command, check whether have device enqueue or not. If has device enqueue, must wait finish and parse the device enqueue info. 3. Start the child ndrange according the device enqueue info, and the parent's global buffers as the exec info. Because of non uniform workgroup size, one enqueue api will flush serveral times, but device enqueue only need handle once, so add a flag to function cl_command_queue_flush to indicate the last flush. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
*	Modify program Info using cl_get_info_helper.	Junyan He	2016-12-15	1	-0/+1
\| \| \| \| \|	Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
*	Modify clGetSamplerInfo using cl_get_info_helper.	Junyan He	2016-12-15	1	-1/+2
\| \| \| \| \|	Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
*	Modify clGetDeviceInfo using cl_get_info_helper.	Junyan He	2016-12-15	1	-0/+1
\| \| \| \| \| \| \| \|	V2: Delete some useless macro. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
*	Modify clGetContextInfo using cl_get_info_helper.	Junyan He	2016-12-15	1	-0/+1
\| \| \| \| \|	Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
*	Modify clGetPlatformInfo using cl_get_info_helper.	Junyan He	2016-12-15	1	-0/+1
\| \| \| \| \|	Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
*	Add a helper function for all information get.	Junyan He	2016-12-15	1	-0/+1
\| \| \| \| \|	Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
*	OCL20: Add svm support.	Yang Rong	2016-11-08	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Enable CL_DEVICE_SVM_COARSE_GRAIN_BUFFER svm support, use userptr and softpin to implement it. Use userptr to share the page between cpu and gpu, and softpin to unify the cpu and gpu's address. Now it works on i386 system. x86_64 depends on backend support. This patch base on DRM library and DRM kernel driver's softpin patch: http://lists.freedesktop.org/archives/intel-gfx/2015-September/075446.html. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
*	Delete useless cl_thread files.	Junyan He	2016-10-10	1	-1/+0
\| \| \| \| \|	Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
*	buildsys: Use CMRT_LIBDIR instead of CMRT_LIBRARY_DIRS	Armin K	2016-10-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	CMRT_LIBRARY_DIRS doesn't include any library paths when the library is installed system-wide, such as /usr. Also dlopen versioned library, as distros tend to split non-versioned sonames into -devel packages. Signed-off-by: Armin K <krejzi@email.com> Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
*	Modify all event related functions using new event handle.	Junyan He	2016-09-28	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rewrite the cl_event, and modify all the event functions using this new event manner. Event will co-operate with command queue's thread together. v2: Fix a logic problem in event create failed. V3: Set enqueue default to do nothing, handle some enqueue has nothing to do. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
*	Runtime: re-enable cl_khr_gl_sharing with existing egl extension.	Chuanbo Weng	2016-09-12	1	-12/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to query low level layout of GL buffer object/texture/render buffer, previous implementation introduced an egl extension and implemented in Beignet side. This way is broken once mesa change its related internal code. In this patch, we use an new egl extension (EGL_MESA_image_dma_buf_export) to query related layout infomations of gl texture. Since this egl extension is already accepted by Khronos, so it's a stable method. This patch just implement GL texture 2d buffer sharing, and we will implement other target type if necessary. v2: Add CMake build option to enable cl_khr_gl_sharing(default off). Clean up related CMake code. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
*	Runtime: Add CL base object for all cl objects.	Junyan He	2016-09-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The runtime code is a little verbose in CL object handle. Every CL objects should have a reference, a lock to protect itself and an ICD dispatcher. We can organize them to a struct and place it at the beginning of each CL object. This base object is also used to protect the CL objects MT safe. CL_OBJECT_LOCK/CL_OBJECT_UNLOCK macro will lock/unlock objects, but we should use them within one function call, and the critical region should be short. We add CL_OBJECT_TAKE_OWNERSHIP/CL_OBJECT_RELEASE_OWNERSHIP macro to own the object for a long time. CL_OBJECT_TAKE_OWNERSHIP will not hold the lock and so will not cause deadlock problems. For example, when we call NDRange on some memobj, we should take the ownship of the memobj. If another thread call NDRange on the same memobj, we should return some error like CL_OUT_OF_RESOURCE to users and protect the memobj from accessing simultaneously. V2: Add atomic_read to avoid atomic_add misuse. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
*	Runtime: set the sub slice according to kernel pooled EU configure.	Yang Rong	2016-07-06	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	If BXT pooled EU enable, the 36 EUs is split into 2 pooled, so change the sub slice to 2. For min no. of eu in pool, only affact fused down 26 BXT devices, because beignet don't support these devices now, add assert only. assert. This patch is based on kernel patch: https://patchwork.freedesktop.org/series/8200/ Thanks Arun. Signed-off-by: Yang Rong <rong.r.yang@intel.com>
*	CMake: use CHECK_LIBRARY_EXISTS to check the function instead of version.	Yang Rong	2016-06-23	1	-20/+20
\| \| \| \| \|	Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
*	make Beignet as intermedia layer of CMRT	Guo Yejun	2015-12-09	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CMRT is C for Media Runtime on Intel GPU, see https://github.com/01org/cmrt. There is a request to make Beignet as intermedia layer of CMRT, in other words, application programer write OpenCL APIs to execute the CM kernel on GPU, the following shows the key code, and please refer to the next patch of unit test for detail. prog = clCreateProgramWithBinary("cm kernel"); clBuildProgram(prog); kernel = clCreateKernel(prog, "kernel name"); image = clCreateImage(); clSetKernelArg(kernel, image); clEnqueueNDRangeKernel(kernel); Inside Beignet, once cm kernel is invoked, the following relative APIs will be directly passed to CMRT library (libcmrt.so) which is loaded via dlopen only when necessary. Since we use this simple method to keep the code clean, OpenCL spec is not strictly followed, and cl_event is not supported for this case. v2: add comments about the cm queue in fuction cmrt_enqueue Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
*	CMake: Add -lrt to the link command of libcl.so	Junyan He	2015-11-17	1	-0/+1
\| \| \| \| \| \| \| \| \|	The clock_gettime will cause the linkage error on some version of GCC, we need to add -lrt at the end of the link command line. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	Add extensions intel_accelerator and basic intel_motion_estimation.	Chuanbo Weng	2015-11-10	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	v2: 1. Just upload the first vme_state. 2. Remove duplicated code in check_opt1_extension. 3. Check image format before cl_gpgpu_bind_image_for_vme. 4. Fix error of getting mv. Because we suppose this kernel run in SIMD16 mode, so dword 0 of grf 1 should be __gen_ocl_region(8,vme_result.s0), not __gen_ocl_region(0,vme_result.s1). v3: Return CL_IMAGE_FORMAT_NOT_SUPPORTED if image format is not the required one. v4: Fix two conflicts after code rebase and wordaround a curbe related bug. v6: Treat simd8 and simd16 differently when getting mv. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
*	Reorder GBE_BIN_GENERATER arguments.	Koop Mast	2015-07-17	1	-2/+2
\| \| \| \| \| \| \| \| \|	Basically, it boils down to a difference in getopt(3). The getopt(3) on (Free)BSD will exit parsing arguments at the first unknown argument. Signed-off-by: Koop Mast <kwm@rainbow-runner.nl> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	Optimization of clEnqueueCopyBufferToImage for 16 aligned case.	Luo Xionghu	2015-05-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	We can change the image_channel_order to CL_RGBA and image_channel_data_type to CL_UNSIGNED_INT32 for some special case, thus 16 bytes can be read by one work item. Bandwidth is fully used. v2: merge patch 3 of initializing region0; remove k dimension in kernel for 2d image. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
*	Query the driver directly for compute units and subslice	Jeff McGee	2015-03-19	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Values of device max compute units and max subslice obtained directly from the driver should be more accurate than our own ID-based lookup values. This is particularly important when a single device ID may encompass more than one configuration. If the driver cannot provide a valid value for the given device, we fallback on the ID-based lookup value. This query requires libdrm 2.4.60. For now we will consider the use of this query to be optional and exclude it from compilation when building against older libdrm. Later we may want to consider requiring the query or at least warning more strongly when it is not supported. v2: Make feature use conditional on libdrm version (Zhigang). Signed-off-by: Jeff McGee <jeff.mcgee@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	Add LLVM_INCLUDE_DIR to CMakeList of src.	Junyan He	2015-03-17	1	-1/+2
\| \| \| \| \| \| \| \|	The llvm include should be specified when llvm is not installed in standard dir. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	Optimization of clEnqueueCopyImageToBuffer for 16 aligned case.	Chuanbo Weng	2015-02-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can change the image_channel_order to CL_RGBA and image_channel_data_type to CL_UNSIGNED_INT32 for some special case, thus 16 bytes can be read by one work item. Bandwidth is fully used. v2: Now we just optimize for IMAGE2D, so add judgement to not affect other image type's code path. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	Implement 1D/2D image array related cl_mem_kernel_copy_image in cl way ↵	Chuanbo Weng	2015-02-06	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	instead of cpu way. Before this patch, cl_mem_kernel_copy_image do cpu memory copy in order to copy image array objects. This is very slow for large image size. This patch implement image array copy in cl way, which dramatically accelerate image array related clEnqueueCopyImage. clCopyImage case in OpenCL conformance test will not be blocked anymore. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	add CMake option USE_STANDALONE_GBE_COMPILER and STANDALONE_GBE_COMPILER_DIR	Guo Yejun	2015-01-12	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	At some platforms with old c/c++ environment, C++11 features are not supported, it results in the failure to build the gbe compiler part which depends on LLVM/clang using C++11 features. The way to resolve is to build a standalone gbe compiler within another feasible system, and build beignet with the already built standalone gbe compiler by setting USE_STANDALONE_GBE_COMPILER=true. The path of the standalone compiler is /usr/local/lib/beignet as default or could be specified by STANDALONE_GBE_COMPILER_DIR. Once USE_STANDALONE_GBE_COMPILER is given, all the gbe compiler relative code will not be built any longer, only libcl.so and libgebinterp.so are built. And libcl.so is special for GEN_PCI_ID, which is queried from the building machie or could be specified as CMake option. v2: separate the CMake option name. update the commit comments. add back the script for gen pci id, and build driver with it. v3: add file FindStandaloneGbeCompiler.cmake to make the main cmakefile clean. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	support CL_MEM_USE_HOST_PTR with userptr for cl buffer	Guo Yejun	2014-11-07	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	userptr is used to wrap a memory pointer (page aligned) supplied by user space into a buffer object accessed by GPU, and so no extra copy is needed. It is supported starting from linux kernel 3.16 and libdrm 2.4.58. This patch is originally finished by Zhenyu Wang <zhenyuw@linux.intel.com>, I did a little change and some code clean. No regression issue found on IVB+Ubuntu14.10 with libdrm upgraded with tests: beignet/utests, piglit, OpenCV/test&perf, conformance/basic&mem_host_flags&buffers V2: add page align limit for data size, add comments for kernel without MMU_NOTIFIER V3: add runtime check with host_unified_memory, return CL_MEM_OBJECT_ALLOCATION_FAILURE if failed Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com> Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
*	add building dependency GIT_SHA1	Meng Mengmeng	2014-11-05	1	-0/+1
\| \| \| \| \|	Signed-off-by: Meng Mengmeng <mengmeng.meng@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	create GIT_SHA1 without any dependency	Meng Mengmeng	2014-10-24	1	-3/+2
\| \| \| \|	Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	add beignet GIT_HAL1 if there is .git directory	Meng Mengmeng	2014-10-23	1	-0/+7
\| \| \| \|	Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	Only compiler X11 files and do X11 operations when found X11.	Yang Rong	2014-08-28	1	-2/+9
\| \| \| \| \| \| \|	Add a build flag HAS_X11 for it. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	improve the clEnqueueCopyBufferRect performance in some cases	Lv Meng	2014-07-16	1	-1/+2
\| \| \| \| \| \| \|	Signed-off-by: Lv Meng <meng.lv@intel.com> Reviewed-by: He Junyan <junyan.he@inbox.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	remove lspci, gbe_bin_genenrater would generator llvm binary by default.	Luo	2014-07-08	1	-19/+13
\| \| \| \| \| \| \| \| \| \| \| \|	driver can get chipset id by ioctl instead of calling lspci in cmake; user could generator gen binary by configuring cmake option -DGEN_PCI_ID=xxxx or calling the gbe_bin_generater with option -t GEN_PCI_ID. v2: add "\0GENC" magic code for gen binary, fix typo. Signed-off-by: Luo <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	Build: check whether lspci exists.	Zhigang Gong	2014-07-04	1	-0/+7
\| \| \| \|	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
*	add cpu copy for 1Darray and 2darray related copy APIs.	Luo	2014-06-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	detail cases: 1Darray, 2Darray, 2Darrayto2D, 2Darrayto3D, 2Dto2Darray, 3Dto2Darray. 1d used gpu copy. v2: fixed 1d array to 1d array copy, don't need to switch depth and height. Signed-off-by: Luo <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
*	add BEIGNET_INSTALL_DIR to clean code	Guo Yejun	2014-06-24	1	-1/+1
\| \| \| \| \|	Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	implement API clEnqueueFillImage.	Luo	2014-06-24	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \|	enqueues a command to fill an image object with a specified color. fix typo cl_context_get_static_kernel_from_bin. v2: fix image 1d array bug. Signed-off-by: Luo <xionghu.luo@intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	Fix linking to X11 libraries.	Abrahm Scully	2014-06-19	1	-1/+1
\| \| \| \| \| \| \| \|	After FindXLib.cmake was removed, XLIB_LIBARY should have been replaced with X11_LIBRARIES. Signed-off-by: Abrahm Scully <abrahm.scully@gmail.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	Add the kernels used by clEnqueueBufferFill API	Junyan He	2014-06-13	1	-1/+4
\| \| \| \| \|	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	relax the build dependency on Gen GPUopencl-1.1	Guo Yejun	2014-06-12	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	currently, the Gen GPU pciid of the underlying system is queried and then passed to gbe_bin_generater as the target option. This does not work when building the driver on another system with non-intel GPUs, this patch relaxes the dependency by exporting the pciid setting at CMake level, therefore, the pciid could be given as a CMake option besides the current real time query method. this patch also remove the redundancy code in utest/CMake by setting PARENT_SCOPE in src/CMake. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	add DRM_LIBDIR path into link directory list	Li Peng	2014-06-04	1	-1/+1
\| \| \| \| \| \| \|	Then beignet can link to user preferred drm library rather than default Signed-off-by: Li Peng <peng.li@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	extract libgbeinterp.so from runtime (libcl.so)	Guo Yejun	2014-05-30	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	currently, there are same symbol names in libinterp.a (inside libcl.so) and libgbe.so (compiler), and so have to dlopen libgbe.so with RTLD_DEEPBIND, this flag makes std::cerr inside libgbe crash. extract the interp part from libcl.so as libgbeinterp.so, therefore, first dlopen libgbe.so without RTLD_DEEPBIND, then dlopen libgbeinterp.so with RTLD_DEEPBIND, to fix the std:cerr crash issue. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	separate runtime(libcl.so) and compiler(libgbe.so)	Guo Yejun	2014-05-28	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On embedded/handheld devices, storage and memory are scarce, it is necessary to provide only the OpenCL runtime library with small size, and only the executable binary kernel will be supported on such device. At the beginning of process (before function main), OpenCL runtime (libcl.so) will try to load the compiler (libgbe.so), the system's behavior is the same as before if successfully loaded, otherwise, the runtime assumes no OpenCL compiler in the system, and the device info will be changed as CL_DEVICE_COMPILER_AVAILABLE=false and CL_DEVICE_PROFILE="EMBEDDED_PROFILE", the clBuildProgram returns CL_COMPILER_NOT_AVAILABLE if the program is created with clCreateProgramWithSource, following the OpenCL spec. To simulate the case without OpenCL compiler, just delete the file libgbe.so, or export OCL_NON_COMPILER=1. Some explanation of the binary kernel interpreter (libinterp.a): libinterp.a is used to interpret the binary kernel inside runtime, and the runtime library libcl.so is built against libinterp.a. Since the code to interpret binary kernel is tightly integrated inside the compiler, to avoid code duplicate, a new file gbe_bin_interpreter.cpp is created to include some other .cpp files; to make libinterp.a small (the purpose to make libcl.so small), the macro GBE_COMPILER_AVAILABLE is used to make only the needed code active when build for libinterp.a. V2: code base is changed to call function gbe_set_image_base_index in gbe_bin_generater, while this function is modified in this patch as gbe_set_image_base_index_compiler, fix it accordingly. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com> Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	move enqueue_copy_image kernels outside of runtime code.	Luo	2014-05-22	1	-2/+21
\| \| \| \| \| \| \| \| \| \| \|	seperate the kernel code from host code to make it clean; build the kernels offline by gbe_bin_generator to improve the performance. v2: fix the image base issue with the standalone compiler. Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
*	Add the pci id support for gbe_generate	Junyan He	2014-05-22	1	-1/+7
\| \| \| \| \|	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	Use pkg-config to check modules	Zhenyu Wang	2014-04-17	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of use pre-defined path for dependent modules, e.g libdrm, libdrm_intel, etc. Use pkg-config helper for cmake instead. This makes it easy to work with developer own built version of those dependences. Also remove libGL dependence for 'gbe_bin_generator' which is not required. libutest.so still requires libGL now but might be fixed by checking real GL dependence. v2: Fix build with mesa source (92e6260) and link required EGL lib with utests too. Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by：Zhigang Gong <zhigang.gong@linux.intel.com>
*	Add three copy cl files for Enqueue Copy usage.	Junyan He	2014-04-02	1	-1/+3
\| \| \| \| \| \| \| \| \| \|	Add these three cl files, one for src and dst are not aligned but have same offset to 4. second for src's %4 offset is bigger than the dst's third for src's %4 offset is small than the dst's Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>