path: root/backend/src/llvm
Commit message | Author | Age | Files | Lines
* Add preliminary LLVM 7 support | Rebecca N. Palmer | 2018-08-20 | 3 | -1/+9
  This is preliminary because LLVM 7 has not been released yet: it was tested with the snapshot from Debian experimental (svn336894).
  1. Change linking order, as clangCodeGen now links to clangFrontend.
  2. Pass references, not pointers, to WriteBitcodeToFile and CloneModule.
  3. Add the headers that LoopSimplifyID, LCSSAID and some create*Pass have moved to.
  4. Define our DEBUG whether or not we just undefined LLVM's (theirs is now LLVM_DEBUG, but we never actually use it).
  Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Add LLVM 6.0 support | Rebecca N. Palmer | 2018-08-20 | 2 | -1/+7
  LLVMContext::setDiagnosticHandler and LoopInfo::markAsRemoved have been renamed.
  Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* GBE: remove static context to fix Segmentation fault. | Yang Rong | 2017-09-21 | 2 | -11/+0
  If an application has a static clProgram, then at application exit the static context is deleted before the static clProgram, which causes a segmentation fault.
  As the global static context is only used for linking, use the individual context of each LLVM module instead; when linking LLVM modules, generate the new LLVM module from src.
  V2: fix LLVM 3.8 build error and CleanLlvmResource delete bug.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* GBE: enable llvm5.0 support. | Yang Rong | 2017-09-21 | 6 | -33/+87
  1. getOrInsertFunction without nullptr.
  2. Handle f16 rounding.
  3. Remove llvm value dump.
  4. Handle AddrSpaceCastInst when parsing block info.
  V2: use stripPointerCasts instead of BitCast and AddrSpaceCast.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* backend: Fix a bug in load-store optimization. | Song, Ruiling | 2017-07-27 | 1 | -25/+46
  When merging STOREs, we should use the very last instruction as the insertion point.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* backend: refine load store optimization | Song, Ruiling | 2017-07-18 | 1 | -37/+88
  This fixes the basic test in the conformance tests, which failed for vec8 of char because of an overflow. It also fixes many test items that failed in OpenCV because of an offset error.
  (1) Change the size of searchInsnArray to 32, the maximum size for char, and add an overflow check for the case of too many instructions.
  (2) Make sure the start instruction is the first instruction of the searched array, because if it is not the first, the offset may be invalid, and it is complex to modify the offset without error.
  V2: refine search index, using J not I.
  V3: remove (2); now add the offset to the pointer of start. Passes OpenCV, conformance basic and compiler tests, and utests.
  V4: check pointer type: if it is 64-bit, modify it by 64, otherwise by 32.
  V5: refine findSafeInstruction() and variable naming in findConsecutiveAccess().
  Signed-off-by: rander.wang <rander.wang@intel.com>
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Implement extension cl_intel_device_side_avc_motion_estimation. | Chuanbo Weng | 2017-07-12 | 3 | -0/+38
  This patch mainly contains:
  1. Implementation of the built-in function __gen_ocl_ime.
  2. Implementation of many built-in functions of cl_intel_device_side_avc_motion_estimation.
  3. This extension is required to run in simd16 mode.
  v2: move the utests to separate patches one by one; as all the utests have an extension function check, there is no need to put them in a standalone utest; uncomment the self test; fix an extension check logic issue (should be && instead of ||).
  Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
  Signed-off-by: Xionghu Luo <xionghu.luo@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* backend: refine load/store merging algorithm | rander | 2017-06-23 | 1 | -9/+78
  Currently merging works for the sequence load(0), load(1), load(2), but not for load(2), load(0), load(1), because the new load is compared only with the last merged load, not with all the loads.
  For the sequence load(0), load(1), load(2): load(0) is the start, and load(1) is found to be its successor with no gap, so it is put into a merge FIFO. The start then moves to the top of the FIFO, load(1), which is compared with load(2). load(2) can also be merged.
  For the sequence load(2), load(0), load(1): load(2) cannot be merged with load(0) because there is a gap between them, so load(0) is skipped and we move on to the next load, load(1). This load(1) can be merged, but the algorithm never goes back to merge load(0).
  Now change the algorithm:
  (1) Find all loads around the start that may be merged, by their distance to the start. The distance depends on the data type; for 32-bit data the distance is 4. Put them in a list.
  (2) Sort the list by distance from the start.
  (3) Search for the continuous sequence that includes the start, and merge it.
  V2:
  (1) Refine the sort and compare algorithm: first find all the IO at a small offset from the start, then call std::sort.
  (2) Check the number of candidate IOs; this favors performance because in most cases there is no chance to merge IO.
  Signed-off-by: rander.wang <rander.wang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
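  The three steps of the revised algorithm can be illustrated with a minimal, self-contained sketch. This is not the Beignet implementation: `Access`, `findMergeableRun` and the `id`/`offset` fields are hypothetical names chosen for illustration, and a real pass would work on LLVM load/store instructions and also check aliasing and type compatibility.

  ```cpp
  #include <algorithm>
  #include <cassert>
  #include <cstdint>
  #include <vector>

  // Hypothetical stand-in for a load/store: only its byte offset relative to
  // the start access matters for this sketch.
  struct Access {
    int id;          // position in program order (illustrative only)
    int64_t offset;  // byte offset from the start access; the start has offset 0
  };

  // Returns the ids of the accesses that form the contiguous run containing the
  // start (offset 0), ordered by address. elemSize is the element width in bytes.
  std::vector<int> findMergeableRun(std::vector<Access> candidates, int64_t elemSize) {
    assert(elemSize > 0);

    // Step (2): sort the candidate list by offset from the start.
    std::sort(candidates.begin(), candidates.end(),
              [](const Access &a, const Access &b) { return a.offset < b.offset; });

    // Locate the start access in the sorted list.
    auto start = std::find_if(candidates.begin(), candidates.end(),
                              [](const Access &a) { return a.offset == 0; });
    if (start == candidates.end()) return {};

    // Step (3): grow the run in both directions while neighbours are exactly
    // one element apart, so the run has no gaps.
    auto first = start;
    while (first != candidates.begin() && first->offset - (first - 1)->offset == elemSize)
      --first;
    auto last = start;
    while (last + 1 != candidates.end() && (last + 1)->offset - last->offset == elemSize)
      ++last;

    std::vector<int> run;
    for (auto it = first; it <= last; ++it) run.push_back(it->id);
    return run;
  }
  ```

  For the problematic order load(2), load(0), load(1) with 4-byte elements, the start load(2) has offset 0 and the other two have offsets -8 and -4, so all three end up in one run regardless of program order.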
* GBE: clean llvm module's clone and release. | Yang, Rong R | 2017-06-23 | 3 | -19/+5
  There are some changes:
  1. Clone the module before calling LLVMLinkModules2, and remove the other clones made for it.
  2. Don't delete the module in function llvmToGen.
  3. Add a function programNewFromLLVMFile so that genProgramNewFromLLVM and buildFromLLVMModule only handle an LLVM module. Actually, programNewFromLLVMFile is only used by clCreateProgramWithLLVMIntel, and I think it is useless; maybe we could delete it altogether.
  V2: define errDiag beside #if/#endif.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* Backend: Add intel_reqd_sub_group_size support | Pan Xiuli | 2017-06-16 | 1 | -0/+24
  If we get the intel_reqd_sub_group_size attribute from the frontend, pass it on to the backend.
  V2: Refine codeGenNum by calculating it at runtime, and fail the build if the size from the frontend is illegal.
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* keep GEN IR as SSA style | Guo Yejun | 2017-06-09 | 1 | -3/+5
  Signed-off-by: Guo Yejun <yejun.guo@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Backend: Fix performance regression with sampler refine for LLVM40 | Pan Xiuli | 2017-05-18 | 1 | -4/+37
  After the refine we cannot know whether a sampler is constant-initialized or not. The compiler optimization for constant samplers then breaks, and we have to decide at runtime which SAMPLE instruction to use. Now fix the sampler refine for LLVM40 to enable the constant check.
  V2: Fix a typo in the type of function __gen_ocl_sampler_to_int.
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Backend: Fix llvm40 assert about literal structs | Pan Xiuli | 2017-05-18 | 1 | -1/+2
  In LLVM, literal structs have no name, so check for that first.
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
* GBE: set memcpy and memset functions' linkage to LinkOnceAnyLinkage at last call. | Yang, Rong R | 2017-05-15 | 3 | -7/+14
  LLVM IR passes can produce memcpy and memset calls; if LinkOnceAnyLinkage is set earlier, memcpy and memset are deleted before those calls appear, which causes a failure.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* Backend: add double support to convert_uchar|char|short|ushort|int|uint|long|ulong_sat(double x) | rander | 2017-04-17 | 2 | -0/+16
  HW supports double to int16 and int32 from IVB; the others are done in software. Double to int64 is supported by BWD+; skip it for now and refine it later.
  Signed-off-by: rander <rander.wang@intel.com>
  Tested-by: Yang Rong <rong.r.yang@intel.com>
* Backend: Add LLVM40 support | Pan Xiuli | 2017-04-13 | 13 | -18/+145
  1. Refine APFloat fltSemantics.
  2. Refine bitcode read/write header.
  3. Refine clang invocation.
  4. Refine return llvm::error handler.
  5. Refine ilist_iterator usage.
  6. Refine CFG Printer pass manager.
  7. Refine GEP with pointer type changing.
  8. Refine libocl 20 support.
  V2: Add missing ocl_sampler.ll and ocl_sampler_20.ll files.
  V3: Fix some build problems for llvm36.
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Backend: Refine FCmp one and une | Pan Xiuli | 2017-04-13 | 1 | -4/+6
  LLVM will merge:
    %1 = fcmp olt %a, %b
    %2 = fcmp ogt %a, %b
    %dst = or %1, %2
  into:
    %dst = fcmp one %a, %b
  Our own CMP.NE is actually une, so lower fcmp one into CMP.LT, CMP.GT and OR.
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
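  The difference between the two predicates only shows up when an operand is NaN. The following sketch models the LLVM predicates with plain C++ float comparisons under IEEE 754 semantics (so not with fast-math style flags); `fcmpOne`, `fcmpUne` and `kNaN` are illustrative names, not Beignet or LLVM functions.

  ```cpp
  #include <cassert>
  #include <limits>

  // A quiet NaN for exercising the unordered case.
  const float kNaN = std::numeric_limits<float>::quiet_NaN();

  // fcmp one: "ordered and not equal". False whenever either operand is NaN,
  // which is exactly what (a < b) || (a > b) computes.
  bool fcmpOne(float a, float b) { return (a < b) || (a > b); }

  // fcmp une: "unordered or not equal". True whenever either operand is NaN.
  // The built-in != on floats behaves this way.
  bool fcmpUne(float a, float b) { return a != b; }
  ```

  With one NaN operand, `fcmpOne` yields false and `fcmpUne` yields true, which is why a hardware "not equal" with une semantics cannot be used directly for `fcmp one`.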
* Backend: Refine LLVM version check macro | Pan Xiuli | 2017-04-13 | 14 | -89/+89
  LLVM 4.0 is coming; we should refine our version check to fit the LLVM_MAJOR_VERSION bump to 4.
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Backend: Refine GEP lowering code | Pan Xiuli | 2017-04-13 | 3 | -16/+30
  A pointer is not like an array or a vector; we should handle it in a standalone path to fit the future change to PointerType inheritance.
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Backend: Fix an include file error problem | Pan Xiuli | 2017-04-13 | 2 | -2/+2
  We should not include any LLVM header in the ir unit, and we need to add the headers that are missing for profiling after deleting the LLVM headers.
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Backend: Remove old llvm support code. | Pan Xiuli | 2017-04-13 | 4 | -54/+0
  LLVM 3.3 and older are no longer supported by Beignet, so this code needs to be deleted.
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Properly check return value from __cxa_demangle | Jan Beich | 2017-03-23 | 1 | -2/+2
  FreeBSD uses libcxxrt (via libc++) instead of GNU libiberty (via libstdc++) for __cxa_demangle(). When *output_buffer* and *length* both are NULL it doesn't modify *status* on success. Rather than rely on a possibly uninitialized variable, check that the function doesn't return NULL.
  Fixes: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213732
  Signed-off-by: Jan Beich <jbeich@freebsd.org>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
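  A minimal sketch of the pattern this commit describes: decide success from the returned pointer, not from `status`. The wrapper name `demangle` and its fall-back behaviour are assumptions made for illustration; only `abi::__cxa_demangle` from `<cxxabi.h>` is a real API here.

  ```cpp
  #include <cstdlib>
  #include <cxxabi.h>
  #include <string>

  // Demangle an Itanium C++ ABI symbol name. On failure, return the input
  // unchanged (a choice made for this sketch).
  std::string demangle(const char *mangled) {
    int status = 0;  // initialized, but deliberately not used to detect success
    char *out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (out == nullptr) return mangled;  // the return value is the reliable signal
    std::string result(out);
    std::free(out);  // the buffer was allocated with malloc by __cxa_demangle
    return result;
  }
  ```

  Checking the pointer works the same way with libstdc++ and libcxxrt, whereas code that reads `status` depends on the runtime always writing to it.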
* implement extension cl_intel_media_block_io WRITE related function | Luo Xionghu | 2017-03-13 | 3 | -1/+46
  v2: use static fixBlockSize; no need to set default width/height at IR level.
  Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* add extension cl_intel_media_block_io READ related function | Luo Xionghu | 2017-03-13 | 3 | -1/+99
  v2: add #define intel_media_block_io in libocl; move extension check code to this patch.
  Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* GBE: use shr instead of division as possible. | Yang Rong | 2017-02-10 | 1 | -1/+12
  GEN's div instruction needs several cycles; use the shr instruction when the divisor is a power-of-2 constant.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* GBE: use shl instead of multiply as possible. | Yang Rong | 2017-02-10 | 1 | -0/+19
  i32 and i64 multiplies need several instructions; use the shl instruction when one source is a power-of-2 constant.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
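  The strength reduction described in the two commits above can be sketched as follows. This covers unsigned operands only; signed division needs extra care because an arithmetic right shift rounds toward negative infinity, not toward zero. All function names are hypothetical and do not come from the Beignet sources.

  ```cpp
  #include <cassert>
  #include <cstdint>

  // True if x has exactly one bit set.
  bool isPowerOf2(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

  // Exact base-2 logarithm of a power of two.
  unsigned log2Exact(uint32_t x) {
    assert(isPowerOf2(x));
    unsigned n = 0;
    while ((x >>= 1) != 0) ++n;
    return n;
  }

  // Multiply by a constant: use a left shift when the constant is a power of two.
  uint32_t mulByConst(uint32_t v, uint32_t c) {
    return isPowerOf2(c) ? v << log2Exact(c) : v * c;
  }

  // Unsigned divide by a constant: use a right shift when it is a power of two.
  uint32_t udivByConst(uint32_t v, uint32_t c) {
    assert(c != 0);
    return isPowerOf2(c) ? v >> log2Exact(c) : v / c;
  }
  ```

  A compiler backend would apply the same test at code-generation time and emit the shift instruction in place of the multiply or divide.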
* Fix typo | Rebecca N. Palmer | 2017-02-06 | 1 | -1/+1
  Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* GBE: use shift for PowerOf2 size when lowering GEP. | Ruiling Song | 2017-02-06 | 1 | -6/+13
  For a 64-bit address, the multiply would expand to several instructions. Since the size is a power of 2 most of the time, we can use a left shift instead.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* GBE: fix llvm3.5 version build error. | Yang Rong | 2017-01-19 | 2 | -5/+8
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* GBE: Fix getTypesize bug with LLVM3.9 | Pan Xiuli | 2017-01-09 | 1 | -5/+6
  We check the size of some types, but some of the type names have changed in LLVM 3.9; change the check to fit the current type names.
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* GBE: fix a mix analyze bug. | Yang Rong | 2017-01-09 | 1 | -2/+4
  When updating pointerOrigMap, only non-select and non-phi instructions need to update second[0]; updating second[0] for a select or phi would overwrite the info.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
* OCL20: handle device enqueue helper functions in the backend. | Yang, Rong R | 2016-12-30 | 2 | -4/+44
  Add useDeviceEnqueue to the kernel to indicate whether the kernel uses device enqueue or not.
  V2: Remove and correct debug info.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* OCL20: add device enqueue helper functions in backend. | Yang, Rong R | 2016-12-30 | 3 | -0/+428
  These functions collect all block infos and convert unnamed calls to named function calls. They also collect device enqueue's invoke functions, store them in the unit, and set these functions as OpenCL kernel functions.
  Because this changes the module's kernel functions, it must be called before linking; otherwise, the built-in functions called in invoke functions may not be materialized.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* GBE: correct the llvm.loop.unroll.enable meta. | Yang Rong | 2016-12-30 | 1 | -9/+5
  LLVM checks whether the loop has llvm.loop.unroll.enable and llvm.loop.unroll.disable metadata. If llvm.loop.unroll.disable and llvm.loop.unroll.enable are both set, llvm.loop.unroll.disable will override llvm.loop.unroll.enable.
  V2: don't add the metadata when unrolling is not enabled; let the LLVM unroll pass decide.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* Add the NULL pointer check. | Yang Rong | 2016-12-29 | 1 | -0/+1
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* GBE/Runtime: eliminate release build warnings. | Yang Rong | 2016-12-29 | 2 | -1/+3
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* Avoid possible invalid pointer by vector iterator. | Yan Wang | 2016-12-28 | 1 | -2/+2
  More elements are pushed onto the vector container "revisit" in findPointerEscape(), which can invalidate a previously obtained iterator and leave it pointing at invalid memory. When compiling a huge kernel such as Blender's, this causes random segmentation faults. Using the [] operator is safer.
  Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
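  The hazard is the usual one with `std::vector`: `push_back` may reallocate and invalidate every outstanding iterator. A minimal sketch of the index-based alternative follows; `drainWorklist` and its halving rule are made up for illustration and have nothing to do with the actual pointer-escape analysis.

  ```cpp
  #include <cstddef>
  #include <vector>

  // Hypothetical worklist: visiting a value may append more values to revisit.
  std::vector<int> drainWorklist(std::vector<int> revisit) {
    std::vector<int> visited;
    // Index-based loop: size() is re-read on every iteration and revisit[i] is
    // looked up afresh, so a reallocation in push_back() leaves nothing dangling.
    for (std::size_t i = 0; i < revisit.size(); ++i) {
      int v = revisit[i];  // copy the element before the vector can grow
      visited.push_back(v);
      if (v > 1) revisit.push_back(v / 2);  // may reallocate the storage
    }
    return visited;
  }
  ```

  Writing the same loop with `for (auto it = revisit.begin(); it != revisit.end(); ++it)` would be undefined behaviour as soon as a `push_back` triggered a reallocation.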
* OCL20: Add generic address space memcpy and memset. | Yang, Rong R | 2016-12-28 | 2 | -0/+24
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* OCL20: enable -cl-std=CL2.0. | Yang, Rong R | 2016-12-28 | 5 | -10/+46
  When building from source, get the OpenCL version from the option. Use the spir64 triple if it is OpenCL 2.0.
  Get the OpenCL version from the LLVM module's metadata. If the OpenCL version is 2.0, set the unit's pointer size to 64 bits before using unit.getPointerSize().
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* GBE: remove image type's access qual from image type name. | Yang, Rong R | 2016-12-28 | 1 | -0/+8
  The OpenCL spec requires that the type name not include the access qualifier, so remove it.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* GBE: don't use call->getCalledFunction() to decide the materialize function. | Yang, Rong R | 2016-12-28 | 1 | -4/+4
  If the call instruction's callee is a bitcast value, call->getCalledFunction() will return NULL. Use call->getCalledValue()->stripPointerCasts()->getName() to check instead.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* GBE: fix ctz fail. | Yang, Rong R | 2016-12-28 | 1 | -1/+1
  LZD requires the ud type.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* GBE: reorder the LLVM pass to reduce the compilation time. | Yang Rong | 2016-12-26 | 2 | -5/+7
  Set all functions' linkage to LinkOnceAnyLinkage, so that the inlining pass can delete the inlined functions. Also, moving createFunctionInliningPass before createStripAttributesPass reduces the compilation time significantly, though the root cause has not been found.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Yan Wang <yan.wang@linux.intel.com>
* Restore jump threading pass for reducing compiling time when run the large and complex kernel like Luxmark. | Yan Wang | 2016-12-14 | 1 | -1/+1
  The jump threading pass can optimize the connections between the LLVM basic blocks of a function and give the chance to merge and remove unnecessary basic blocks, reducing the compilation time and the ASM code size.
  Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Fix getting bitwidth of PointerType of LLVM. | Yan Wang | 2016-11-30 | 1 | -1/+1
  A PointerType cannot be forced to an IntegerType to get its bitwidth. Following Rong's comments, use getTypeBitSize() instead of Type::getIntegerBitWidth().
  Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* GBE: Refine program scope variable logic. | Ruiling Song | 2016-11-08 | 1 | -21/+33
  Program scope variables may appear in the Module's global list in any order, so I chose to split the program scope logic into two passes. The first pass just creates these constants. The second pass initializes the data.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* GBE: handle ConstantExpr in program-scope variable handling. | Ruiling Song | 2016-11-08 | 1 | -1/+44
  Although we have eliminated ConstantExpr in LLVM instructions, we still meet ConstantExpr in program scope variables, so we handle it here. Also enhance the test case to hit it.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* GBE: don't try to erase a llvm::Constant. | Ruiling Song | 2016-11-08 | 1 | -1/+3
  If it is an llvm::Constant, don't add it to the erase candidates. The reason for this change is that it cannot be cast to llvm::Instruction. I think we still need to make clear whether or not we should delete a constant, and how.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* GBE: add ocl 2.0 work_group_barrier support. | Ruiling Song | 2016-11-08 | 2 | -3/+27
  To do an image barrier, we need to:
  1. Flush the L3 RW cache.
  2. Do a barrier gateway.
  3. Flush the sampler cache.
  Note that the fence arguments may be ORed together. We need to support a non-immediate barrier() argument in future.
  v2: change syncField to 6, and modify syncStr.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* GBE: Fix type mismatch bug. | Ruiling Song | 2016-11-08 | 1 | -1/+2
  The move instruction should have the same type for src and dst.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>