Commit message | Author | Age | Files | Lines
* Allow creating out-of-order queues with clCreateCommandQueue (HEAD, master) | Rebecca N. Palmer | 2018-08-20 | 1 | -29/+5
    clCreateCommandQueueWithProperties can already create them, but that's a 2.0 function.
    Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Make in-order command queues actually be in-order | Rebecca N. Palmer | 2018-08-20 | 6 | -34/+71
    When beignet added out-of-order execution support (7fd45f15), it made *all* command queues out-of-order, even if they were created as (and are reported by clGetCommandQueueInfo as) in-order.
    Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Add preliminary LLVM 7 support | Rebecca N. Palmer | 2018-08-20 | 6 | -2/+22
    This is preliminary because LLVM 7 has not been released yet: it was tested with the snapshot from Debian experimental (svn336894).
    1. Change linking order, as clangCodeGen now links to clangFrontend
    2. Pass references not pointers to WriteBitcodeToFile and CloneModule
    3. Add the headers that LoopSimplifyID, LCSSAID and some create*Pass have moved to
    4. Define our DEBUG whether or not we just undefined LLVM's (theirs is now LLVM_DEBUG, but we never actually use it)
    Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Add LLVM 6.0 support | Rebecca N. Palmer | 2018-08-20 | 2 | -1/+7
    LLVMContext::setDiagnosticHandler and LoopInfo::markAsRemoved have been renamed.
    Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* More user-friendly "type not supported" errors | Rebecca N. Palmer | 2018-08-20 | 1 | -12/+12
    Output a meaningful error message instead of just sel.has*Type. In the case of double inputs (i.e. possibly literals), specify how to make a literal single precision.
    Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Don't leak memory on long chains of events | Rebecca N. Palmer | 2018-08-20 | 2 | -8/+25
    Delete event->depend_events when it is no longer needed, to allow the event objects it refers to to be freed. This avoids out-of-memory hangs in large dependency trees (e.g. long iterative calculations): https://launchpad.net/bugs/1354086
    Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
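The effect of dropping `depend_events` can be illustrated with a small Python model. This is only a sketch: the `Event` class, `mark_complete`, `build_chain` and `first_event_survives` are hypothetical stand-ins rather than beignet's C structures, and CPython reference counting is used here to mimic manual freeing.

```python
import weakref


class Event:
    """Hypothetical stand-in for an event that references its dependencies."""

    def __init__(self, depend_events=()):
        self.depend_events = list(depend_events)
        self.complete = False

    def mark_complete(self):
        self.complete = True
        # Once the event has completed, its dependency list is no longer
        # needed. Dropping it lets the referenced events be freed.
        self.depend_events = None


def build_chain(first, length):
    """Build a chain in which each new event depends on the previous one."""
    event = first
    for _ in range(length):
        event = Event([event])
    return event


def first_event_survives(release_dependencies):
    """Report whether the oldest event is still alive after building a chain."""
    first = Event()
    probe = weakref.ref(first)
    last = build_chain(first, 100)
    del first  # from here on only the chain keeps the first event alive
    if release_dependencies:
        last.mark_complete()
    return probe() is not None
```

Without the release step the whole chain stays reachable from the newest event, which is the growth pattern the commit message describes for long iterative calculations.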
* Enable Coffee Lake support | Mark Thompson | 2018-02-05 | 3 | -3/+153
    Little change is needed here because the graphics core is the same as Kaby Lake. Includes all PCI IDs currently supported by the kernel driver in the drm-intel tree (Coffee Lake S, H and U devices in GT 1, 2 and 3 configurations).
    Signed-off-by: Mark Thompson <sw@jkqxz.net>
* Fix enabling of fp64 extension | Mark Thompson | 2018-02-05 | 1 | -8/+8
    This should only be enabled after setting the default extensions, because the default setup overwrites the current extension string rather than adding to it.
    Signed-off-by: Mark Thompson <sw@jkqxz.net>
* Ensure that DRM device uses the i915 driver | Mark Thompson | 2018-02-05 | 1 | -0/+30
    This avoids calling random ioctl()s and returning nonsensical errors for unsupported devices. In particular, loading is much cleaner on setups where the driver needs to iterate over multiple devices to find the correct one because the Intel graphics device is not the first DRM device.
    Signed-off-by: Mark Thompson <sw@jkqxz.net>
* Runtime: Remove X11 dri2 connection failed warning message. | Yang Rong | 2018-01-10 | 1 | -2/+0
    This message is only relevant to X11; when Wayland is used it is not an error, so delete it. If opening the X11 device fails, there is another warning message below.
    Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Docs: OCL_STRICT_CONFORMANCE is default-on since 1.1 | Rebecca N. Palmer | 2017-11-01 | 1 | -3/+1
    Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Docs: Fix grammar | Rebecca N. Palmer | 2017-11-01 | 1 | -12/+12
    Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Docs: Add Release 1.3.2 to NEWS. | Yang Rong | 2017-10-26 | 1 | -0/+3
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
* GBE: Fix a TBAA issue against llvm5.0. | Yang Rong | 2017-10-26 | 1 | -1/+2
    Casting from a pointer to char to a pointer to int breaks LLVM's TypeBasedAliasAnalysis, so we use the may_alias attribute to tell the TBAA explicitly that the access may alias memory accesses of other data types.
    Signed-off-by: Ruiling Song <ruiling.song@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* metainfo: escape ampersand | Igor Gnatenko | 2017-10-09 | 1 | -1/+1
    com.intel.beignet.metainfo.xml: failed to parse com.intel.beignet.metainfo.xml: Error on line 147: Entity did not end with a semicolon; most likely you used an ampersand character without intending to start an entity - escape ampersand as &amp;
    Signed-off-by: Igor Gnatenko <ignatenkobrain@fedoraproject.org>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
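The parse failure and its fix can be reproduced with Python's standard XML tools. This is a minimal sketch: the `<summary>` element and its text are made up for illustration and are not the actual contents of line 147 of the metainfo file.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(xml_text):
    """Return True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw_text = "OpenCL & Intel"                       # hypothetical element text
broken = f"<summary>{raw_text}</summary>"         # bare '&' starts an entity
fixed = f"<summary>{escape(raw_text)}</summary>"  # '&' becomes '&amp;'
```

Here `parses(broken)` is false and `parses(fixed)` is true, and the parsed text of `fixed` is the original string with a literal ampersand.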
* backend: use simd-1 for scalar dst in indirectMov. | Song, Ruiling | 2017-09-21 | 1 | -14/+24
    This fixes a failure introduced by the load-store optimization on IVB. The test case is: builtin_kernel_block_motion_estimate_intel
    Signed-off-by: Ruiling Song <ruiling.song@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* GBE: remove static context to fix Segmentation fault. | Yang Rong | 2017-09-21 | 4 | -33/+39
    If an application has a static clProgram, then on exit the static context is deleted before the static clProgram, which causes a segmentation fault. As the global static context is only used for linking, use the individual context of each LLVM module instead, and when linking an LLVM module, generate the new LLVM module from src.
    V2: fix llvm 3.8 build error and CleanLlvmResource delete bug.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* GBE: enable llvm5.0 support. | Yang Rong | 2017-09-21 | 6 | -33/+87
    1. getOrInsertFunction without nullptr.
    2. handle f16 rounding.
    3. remove llvm value dump.
    4. handle AddrSpaceCastInst when parsing block info.
    V2: use stripPointerCasts instead of BitCast and AddrSpaceCast.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* libocl: enable llvm5.0 support. | Yang Rong | 2017-09-21 | 3 | -32/+59
    There are 2 changes:
    1. enable cl_khr_3d_image_writes, llvm5.0 required.
    2. change enqueue_ndrange functions and ndrange_t type for llvm5.0.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* libocl: Consider only bottom ilogb(2m-1)+1 bits | Jan Vesely | 2017-09-21 | 1 | -30/+30
    Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
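The bit count in the subject line can be worked through numerically. The sketch below assumes that `m` is the number of components in each source vector of a two-source shuffle (so `2m` elements are selectable), which matches the OpenCL `shuffle2` wording; the commit itself does not say which builtins it touches, and the helper names are hypothetical.

```python
def ilogb(x):
    """floor(log2(x)) for a positive integer x."""
    return x.bit_length() - 1


def significant_mask_bits(m):
    """Number of low bits of each mask element that are considered."""
    return ilogb(2 * m - 1) + 1


def effective_index(mask_element, m):
    """Keep only the significant low bits of a mask element."""
    return mask_element & ((1 << significant_mask_bits(m)) - 1)
```

For `m = 4` this gives 3 significant bits, so a mask element of 9 selects element 1 of the 8 combined source elements.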
* libocl: Add shuffle and shuffle2 builtins for half type | Jan Vesely | 2017-09-21 | 2 | -0/+4
    Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* Runtime: implement clEnqueueAcquireGLObjects and clEnqueueReleaseGLObjects. | Yang Rong | 2017-09-21 | 1 | -0/+150
    As the application is responsible for synchronizing access to shared objects, GL's use has already finished before clEnqueueAcquireGLObjects is called, so just set the event status. clEnqueueReleaseGLObjects is the same.
    V2: V1 was the wrong version; correct it.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* Runtime: fix a build warning. | Yang, Rong R | 2017-07-31 | 1 | -5/+6
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* GBE: fix an errMsg uninitialized build warning. | Yang, Rong R | 2017-07-27 | 1 | -3/+3
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* Runtime: fix the context ref is not 0 assert when delete. | Yang, Rong R | 2017-07-27 | 1 | -22/+8
    The CL_ENQUEUE_FILL_BUFFER_ALIGN8_* internal programs are the same program, so the program's ref is only added once, but when the context is deleted, the internal program count adds them individually. This mismatch causes the context to be freed by mistake. Create a different program for each CL_ENQUEUE_FILL_BUFFER_ALIGN8_* for clarity.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* backend: Fix a bug in load-store optimization. | Song, Ruiling | 2017-07-27 | 1 | -25/+46
    When we are merging STOREs, we should use the very last instruction as the insertion point.
    Signed-off-by: Ruiling Song <ruiling.song@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Runtime: fix a cl_gpgpu_bind_image_for_vme NULL SIGSEGV. | Yang, Rong R | 2017-07-27 | 1 | -1/+2
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* cmake: add option OCL_ICD_INSTALL_PREFIX to set icd file install path. | Yang, Rong R | 2017-07-27 | 1 | -12/+15
    It is for users who don't have root permission.
    V2: change the option name to OCL_ICD_INSTALL_PREFIX.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* backend: refine global immediate optimization | rander | 2017-07-20 | 1 | -4/+0
    ABS(UD) = UD on Gen, so delete it; otherwise it makes compilation fail on some platforms.
    Signed-off-by: rander.wang <rander.wang@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Fix GCC6 build bug | Pan Xiuli | 2017-07-20 | 1 | -0/+1
    GCC6 refined the C headers, so the header for each needed function must now be included explicitly, like math.h for abs.
    Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* backend: refine load store optimization | Song, Ruiling | 2017-07-18 | 1 | -37/+88
    This fixes the basic test in the conformance tests that failed for vec8 of char because of overflow. It also fixes many test items that failed in OpenCV because of an offset error.
    (1) Change the size of searchInsnArray to 32, the maximum size for char, and add an overflow check for the case of too many instructions.
    (2) Make sure the start instruction is the first instruction of the searched array, because if it is not the first, the offset may be invalid, and it is complex to modify the offset without error.
    V2: refine search index, using J not I
    V3: remove (2); now add the offset to the pointer of start. Passes OpenCV, conformance basic and compiler tests, utests.
    V4: check pointer type, if 64bit, modify it by 64, or 32
    V5: refine findSafeInstruction() and variable naming in findConsecutiveAccess().
    Signed-off-by: rander.wang <rander.wang@intel.com>
    Signed-off-by: Ruiling Song <ruiling.song@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* add utest compiler_block_motion_estimate_intel for extension cl_intel_device_side_avc_motion_estimation. | Luo Xionghu | 2017-07-12 | 3 | -0/+224
    fix build warnings.
    Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
    Signed-off-by: Xionghu Luo <xionghu.luo@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* add utest compiler_skip_check for extension cl_intel_device_side_avc_motion_estimation. | Luo Xionghu | 2017-07-12 | 3 | -0/+244
    fix build warnings.
    Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
    Signed-off-by: Xionghu Luo <xionghu.luo@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* add utest compiler_intra_prediction for extension cl_intel_device_side_avc_motion_estimation. | Luo Xionghu | 2017-07-12 | 3 | -1/+211
    fix build warnings.
    Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
    Signed-off-by: Xionghu Luo <xionghu.luo@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Implement extension cl_intel_device_side_avc_motion_estimation. | Chuanbo Weng | 2017-07-12 | 32 | -36/+2282
    This patch mainly contains:
    1. built-in function __gen_ocl_ime implementation.
    2. Lots of built-in functions of cl_intel_device_side_avc_motion_estimation are implemented.
    3. This extension is required to run in simd16 mode.
    v2: move the utests to separate patches one by one; as all the utests have an extension function check, there is no need to put them in a standalone utest; uncomment the self test; fix the extension check logic issue, which should be && instead of ||.
    Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
    Signed-off-by: Xionghu Luo <xionghu.luo@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Runtime: remove ctx's useless fields. | Yang, Rong R | 2017-07-10 | 3 | -43/+5
    built_in_prgs and built_in_kernels seem useless, so remove them.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* Utest: fix a built-in program leak. | Yang, Rong R | 2017-07-10 | 1 | -0/+1
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* Runtime: fix a recurrent release context error. | Yang, Rong R | 2017-07-10 | 1 | -10/+8
    Before releasing internal resources, they must be set to NULL; otherwise, deleting these resources will call release context again. ctx->built_in_prgs should be released by the application.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* backend: improve add zero pattern | rander | 2017-07-06 | 1 | -5/+5
    Remove the negation check for adding zero; this optimization can also be applied in that case.
    V2: refine the function name for zeroAdd
    Signed-off-by: rander.wang <rander.wang@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* utests: add utest for fdiv to rcp | rander | 2017-07-06 | 3 | -1/+71
    In this case 1.0f/src and 2.0f/src can be converted, but 3.0f/src and i/src can't.
    Signed-off-by: rander.wang <rander.wang@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* backend: refine fdiv to rcp at some cases | rander | 2017-07-06 | 1 | -0/+28
    When src0 of fdiv is an immediate value and it is exactly a power of 2, like 2.0f, 4.0f or 1.0/8.0f, then
        fdiv %0, imm, %1
    can be converted to
        rcp %0, %1
        mul %0, %0, imm
    fdiv costs 8 cycles and rcp 4 cycles, so this saves at least 3 cycles.
    Passes the conformance test and utests.
    V2: refine negation flag
    V3: modify negation by negate
    Signed-off-by: rander.wang <rander.wang@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
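The power-of-two condition and the rewrite can be sketched in Python. This is an illustration under assumptions: `is_power_of_two` and `divide_immediate` are hypothetical names, Python floats are doubles rather than single precision, and an exact `1.0 / src` stands in for the hardware rcp instruction, whose precision the log does not describe.

```python
import math


def is_power_of_two(value):
    """True if |value| is exactly 2**k for an integer k (e.g. 2.0, 4.0, 0.125)."""
    if value == 0.0 or math.isnan(value) or math.isinf(value):
        return False
    mantissa, _exponent = math.frexp(abs(value))
    # frexp normalises the mantissa into [0.5, 1), so exact powers of two
    # always have a mantissa of 0.5.
    return mantissa == 0.5


def divide_immediate(imm, src):
    """Compute imm / src, using reciprocal-then-multiply when imm is 2**k."""
    if is_power_of_two(imm):
        reciprocal = 1.0 / src   # stands in for the rcp instruction
        return reciprocal * imm  # stands in for the mul instruction
    return imm / src             # stands in for the unchanged fdiv
```

Scaling by a power of two only changes the exponent, so in this model the rewritten form gives the same result as the direct division for ordinary (non-subnormal, non-overflowing) values.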
* backend: refine math log function | rander | 2017-07-04 | 1 | -40/+10
    Remove some unnecessary code, giving a 20% improvement in the worst case.
    If X is a NaN, there was some if-return code to return NaN. Now change it to add(x - x), which gives the same NaN.
    Passes the conformance tests and utests.
    Signed-off-by: rander.wang <rander.wang@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* backend: refine pow function | rander | 2017-07-04 | 1 | -146/+148
    Now takes 40% less time than before.
    (1) Group the many branches that deal with corner cases into one branch.
    (2) Use HW exp2 and log2 to replace some instructions.
    Passes conformance tests and utests.
    Signed-off-by: rander.wang <rander.wang@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Runtime: refine max group size for SKL & KBL | rander | 2017-07-04 | 1 | -9/+9
    Change the max group size to 256, which is a reasonable size for Gen9. According to performance tests, 256 gives good gains in OpenCV with no regression.
    Signed-off-by: rander.wang <rander.wang@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* backend: refine load/store merging algorithm | rander | 2017-06-23 | 1 | -9/+78
    Currently it works for the sequence load(0), load(1), load(2), but it can't work for load(2), load(0), load(1), because it compares only the last merged load with the new one, not all the loads.
    For the sequence load(0), load(1), load(2): load(0) is the start, and load(1) is found to be its successor with no gap, so it is put into a merge FIFO. The start then moves to the top of the FIFO, load(1), and is compared with load(2), so load(2) can also be merged.
    For load(2), load(0), load(1): load(2) can't be merged with load(0) because there is a gap between them, so load(0) is skipped and the search moves to the next load, load(1), which can be merged. But it never goes back to merge load(0).
    Now change the algorithm:
    (1) Find all loads that may be merged around the start, by their distance to the start. The distance depends on the data type; for 32-bit data the distance is 4. Put them in a list.
    (2) Sort the list by the distance from the start.
    (3) Search for the continuous sequence including the start to merge.
    V2: (1) refine the sort and compare algorithm: first find all the IO at a small offset from the start, then call std::sort. (2) check the number of candidate IO, for the benefit of performance, since in most cases there is no chance to merge IO.
    Signed-off-by: rander.wang <rander.wang@intel.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
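The three steps above can be sketched as follows. This is a simplified model, not the backend's implementation: accesses are represented only by byte offsets, and `find_mergeable`, `elem_size` and `max_elems` are hypothetical names, with the window of 4 elements of 4 bytes chosen as one reading of "for 32bit data, the distance is 4".

```python
def find_mergeable(offsets, start, elem_size=4, max_elems=4):
    """Return the gap-free run of offsets that contains `start`.

    offsets   -- byte offsets of candidate loads, in program order
    start     -- byte offset of the load the search starts from
                 (must be one of `offsets`)
    elem_size -- size of one access in bytes (4 for 32-bit data)
    max_elems -- assumed search window, in elements, around the start
    """
    # (1) Collect every access close enough to the start.
    window = elem_size * max_elems
    candidates = {o for o in offsets if abs(o - start) < window}

    # (2) Sort the candidates by their position relative to the start.
    ordered = sorted(candidates)

    # (3) Grow a continuous run in both directions from the start.
    lo = hi = ordered.index(start)
    while lo > 0 and ordered[lo] - ordered[lo - 1] == elem_size:
        lo -= 1
    while hi + 1 < len(ordered) and ordered[hi + 1] - ordered[hi] == elem_size:
        hi += 1
    return ordered[lo:hi + 1]
```

With the problematic order from the commit message, load(2), load(0), load(1), i.e. offsets `[8, 0, 4]` with the start at 8, the run `[0, 4, 8]` is found, whereas a pairwise comparison against only the last merged load would miss load(0).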
* backend: add global immediate optimization | rander | 2017-06-23 | 1 | -25/+342
    There are some global immediates in the global var list of LLVM. These immediates can be integrated into instructions. For the compiler_global_immediate_optimized test in utest, there are two global immediates:
        L0:
        MOV(1) %42<0>:UD : 0x0:UD
        MOV(1) %43<0>:UD : 0x30:UD
    used by:
        ADD(16) %49<1>:D : %42<0,1,0>:D %48<8,8,1>:D
        ADD(16) %54<1>:D : %43<0,1,0>:D %53<8,8,1>:D
    which can become:
        ADD(16) %49<1>:D : %48<8,8,1>:D 0x0:UD
        ADD(16) %54<1>:D : %53<8,8,1>:D 0x30:UD
    Then the MOVs can be removed. After this optimization, ADD 0 can be changed to MOV, and then local copy propagation can be done.
    V2: (1) add an environment variable to enable/disable the optimization (2) refine the architecture of the imm optimization, inheriting from the global optimizer, not the local block optimizer
    V3: merge with latest master driver
    V4: (1) refine some type errors (2) remove the unneeded UD/D check (3) refine imm calculation for UD/D
    Signed-off-by: rander.wang <rander.wang@intel.com>
    Reviewed-by: Ruiling Song <ruiling.song@intel.com>
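The folding described above can be modelled on a toy instruction list. This sketch is not the Gen backend: instructions are plain `(opcode, dst, sources)` tuples, each register is assumed to be written once, only ADD uses are folded, and `fold_immediates` is a hypothetical name.

```python
def fold_immediates(insns):
    """Fold `MOV reg, imm` into ADD users and drop MOVs that become unused."""
    # Registers that are loaded with a plain integer immediate.
    imms = {dst: srcs[0] for op, dst, srcs in insns
            if op == "MOV" and isinstance(srcs[0], int)}

    rewritten = []
    still_used = set()  # immediate registers with at least one unfolded use
    for op, dst, srcs in insns:
        if op == "MOV" and dst in imms:
            rewritten.append((op, dst, srcs))
            continue
        regs = [s for s in srcs if s not in imms]
        consts = [imms[s] for s in srcs if s in imms]
        if op == "ADD" and len(regs) == 1 and len(consts) == 1:
            if consts[0] == 0:
                # ADD with zero degenerates to a MOV, which later copy
                # propagation could remove.
                rewritten.append(("MOV", dst, (regs[0],)))
            else:
                # The immediate becomes the last source operand.
                rewritten.append(("ADD", dst, (regs[0], consts[0])))
        else:
            still_used.update(s for s in srcs if s in imms)
            rewritten.append((op, dst, srcs))

    # Remove immediate-loading MOVs whose every use was folded.
    return [insn for insn in rewritten
            if not (insn[0] == "MOV" and insn[1] in imms
                    and insn[1] not in still_used)]


example = [
    ("MOV", "%42", (0x0,)),
    ("MOV", "%43", (0x30,)),
    ("ADD", "%49", ("%42", "%48")),
    ("ADD", "%54", ("%43", "%53")),
]
```

Applied to `example`, which mirrors the listing in the commit message, both immediate MOVs disappear, the first ADD turns into a register MOV, and the second ADD carries `0x30` as its last operand.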
* GBE: clean llvm module's clone and release. | Yang, Rong R | 2017-06-23 | 10 | -58/+76
    There are some changes:
    1. Clone the module before calling LLVMLinkModules2, and remove the other clones for it.
    2. Don't delete the module in function llvmToGen.
    3. Add a function programNewFromLLVMFile so that genProgramNewFromLLVM and buildFromLLVMModule only handle LLVM modules. Actually, programNewFromLLVMFile is only used by clCreateProgramWithLLVMIntel, and I think it is useless; maybe we could delete it altogether.
    V2: define errDiag beside #if/#endif.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* Add missed kernel names into built-in kernel list. | Yan Wang | 2017-06-22 | 1 | -1/+16
    Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Runtime: Add missing SKL device ID | Pan Xiuli | 2017-06-22 | 2 | -1/+9
    It seems we missed some newly added device IDs for SKL.
    Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Fix context leak with internal kernels | Patrick Beaulieu | 2017-06-16 | 1 | -1/+21
    Account for internal program ctx references in cl_context_delete
    Signed-off-by: Patrick Beaulieu <patrick.beaulieu@avigilon.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>