| Commit message (Collapse) | Author | Age | Files | Lines |
| |
This is preliminary because LLVM 7 has not been released yet:
it was tested with the snapshot from Debian experimental (svn336894).
1. Change linking order, as clangCodeGen now links to clangFrontend
2. Pass references, not pointers, to WriteBitcodeToFile and CloneModule
3. Add the headers that LoopSimplifyID, LCSSAID and
   some create*Pass have moved to
4. Define our DEBUG whether or not we just undefined LLVM's
   (theirs is now LLVM_DEBUG, but we never actually use it)
Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
LLVMContext::setDiagnosticHandler and LoopInfo::markAsRemoved
have been renamed.
Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
If an application has a static clProgram, then at application exit the
static context is deleted before the static clProgram, and deleting the
clProgram afterwards causes a segmentation fault.
The global static context is only needed for linking, so use an
individual context for each LLVM module and, when linking an LLVM
module, generate a new LLVM module from src.
V2: fix LLVM 3.8 build error and CleanLlvmResource delete bug.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
| |
1. getOrInsertFunction without nullptr.
2. handle f16 rounding.
3. remove llvm value dump.
4. handle AddrSpaceCastInst when parsing block info.
V2: use stripPointerCasts instead of BitCast and AddrSpaceCast.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
| |
When merging STOREs, we should use the very last instruction
as the insertion point.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
This fixes the basic conformance test failing for vec8 of char because
of an overflow, and fixes many OpenCV test failures caused by an offset
error.
(1) Change the size of searchInsnArray to 32, the maximum size for char,
and add an overflow check in case there are too many insns.
(2) Make sure the start insn is the first insn of the searched array,
because if it is not the first, the offset may be invalid, and
it is complex to modify the offset without error.
V2: refine search index, using J not I
V3: remove (2); now add the offset to the pointer of start.
Passes OpenCV, conformance basic and compiler tests, utests.
V4: check pointer type: if 64bit, modify it by 64, otherwise by 32
V5: refine findSafeInstruction() and variable naming in
findConsecutiveAccess().
Signed-off-by: rander.wang <rander.wang@intel.com>
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
This patch mainly contains:
1. Implementation of the built-in function __gen_ocl_ime.
2. Implementations of many built-in functions of
cl_intel_device_side_avc_motion_estimation.
3. This extension is required to run in simd16 mode.
v2: move the utests to separate patches one by one;
as all the utests have an extension function check, no need to put them
in a stand-alone utest;
uncomment the self test;
fix extension check logic issue, should be && instead of ||.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Signed-off-by: Xionghu Luo <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
Currently merging works for the sequence load(0), load(1), load(2),
but not for load(2), load(0), load(1), because each new load is
compared only with the last merged load, not with all the loads.
For the sequence load(0), load(1), load(2): load(0) is the start, and
load(1) is found to be its successor with no gap, so it is put into
the merge fifo. The start then moves to the top of the fifo, load(1),
which is compared with load(2), so load(2) can be merged too.
For load(2), load(0), load(1): load(2) can't be merged with load(0)
because there is a gap between them, so load(0) is skipped and we
move on to load(1), which can be merged. But we never go back to
merge load(0).
Now change the algorithm:
(1) Find all loads that may be merged around the start, by their
distance to the start. The distance depends on the data type; for
32-bit data, the distance is 4. Put them in a list.
(2) Sort the list by distance from the start.
(3) Search for the continuous sequence including the start to merge.
V2: (1) Refine the sort and compare algorithm. First find all the IO
at a small offset from the start, then call std::sort.
(2) Check the number of candidate IOs for the sake of performance;
in most cases there is no chance to merge IO.
Signed-off-by: rander.wang <rander.wang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
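
The three steps above can be sketched roughly as follows. This is only an illustration, not the code from the patch: it assumes each load is described by a byte offset alone and that all loads have the same element size, and the names findMergeRun and maxDistance are hypothetical. It sorts by offset rather than by distance, which is enough to find the gap-free run around the start.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper: given candidate load offsets in arbitrary order,
// return the offsets of the gap-free run that contains `start`.
std::vector<int64_t> findMergeRun(const std::vector<int64_t> &offsets,
                                  int64_t start, int64_t elemSize,
                                  int64_t maxDistance) {
  // (1) Keep only the loads that are close enough to the start.
  std::vector<int64_t> near;
  for (int64_t off : offsets)
    if (std::abs(off - start) <= maxDistance)
      near.push_back(off);
  near.push_back(start);

  // (2) Sort the candidates and drop duplicates.
  std::sort(near.begin(), near.end());
  near.erase(std::unique(near.begin(), near.end()), near.end());

  // (3) Grow a run outwards from the start while neighbours are
  //     exactly one element apart.
  auto it = std::find(near.begin(), near.end(), start);
  auto lo = it, hi = it;
  while (lo != near.begin() && *lo - *(lo - 1) == elemSize) --lo;
  while (hi + 1 != near.end() && *(hi + 1) - *hi == elemSize) ++hi;
  return std::vector<int64_t>(lo, hi + 1);
}
```

With 32-bit loads issued as load(2), load(0), load(1), the offsets are 8, 0, 4 and the start is 8; the sketch returns the whole run 0, 4, 8 regardless of the order in which the loads appear.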
| |
There are some changes:
1. Clone the module before calling LLVMLinkModules2, and remove the
other clones made for it.
2. Don't delete the module in function llvmToGen.
3. Add a function programNewFromLLVMFile so that genProgramNewFromLLVM
and buildFromLLVMModule only handle LLVM modules. Actually,
programNewFromLLVMFile is only used by clCreateProgramWithLLVMIntel,
and I think it is useless; maybe we could delete it altogether.
V2: define errDiag beside #if/#endif.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
| |
If we get the intel_reqd_sub_group_size attribute from the frontend,
pass it to the backend.
V2: Refine codeGenNum to be calculated at runtime, and fail the build
if the size from the frontend is illegal.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
After the refine we cannot know whether a sampler is constant
initialized or not. The compiler optimization for constant samplers
then breaks, and we decide at runtime which SAMPLE instruction to use.
Now fix the sampler refine for LLVM 4.0 to enable the constant check.
V2: Fix a typo in the type of function __gen_ocl_sampler_to_int.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
In LLVM, literal structs have no name, so check for that first.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
| |
call.
LLVM IR passes will produce memcpy and memset; if they are set to
LinkOnceAnyLinkage, memcpy and memset will be deleted beforehand and
cause a failure.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
| |
convert_uchar|char|short|ushort|int|uint|long|ulong_sat(double x)
HW supports double to int16 and int32 from IVB; the others are done in
software. Double to int64 is supported by BDW+; skip it for now and
refine it later.
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
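
For reference, a software fallback for a saturating conversion could look roughly like the sketch below. It is not the libocl implementation: the name convertSat is hypothetical, and the NaN-to-zero and round-toward-zero behaviour reflect my reading of the OpenCL C conversion rules rather than anything stated in this commit.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical software version of convert_<T>_sat(double x):
// clamp to the range of T, then truncate toward zero.
template <typename T>
T convertSat(double x) {
  if (std::isnan(x)) return 0;  // assumption: NaN saturates to 0
  const double lo = static_cast<double>(std::numeric_limits<T>::min());
  const double hi = static_cast<double>(std::numeric_limits<T>::max());
  if (x <= lo) return std::numeric_limits<T>::min();
  if (x >= hi) return std::numeric_limits<T>::max();
  return static_cast<T>(x);  // in range, so the cast is well defined
}
```

The range checks come before the cast because converting an out-of-range double to an integer type is undefined behaviour in C++.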
| |
1. Refine APFloat fltSemantics.
2. Refine bitcode read/write header.
3. Refine clang invocation.
4. Refine return llvm::error handler.
5. Refine ilist_iterator usage.
6. Refine CFG Printer pass manager.
7. Refine GEP with pointer type changing.
8. Refine libocl 20 support.
V2: Add missing ocl_sampler.ll and ocl_sampler_20.ll files
V3: Fix some build problems for llvm36
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
llvm will merge:
%1 = fcmp olt %a, %b
%2 = fcmp ogt %a, %b
%dst = or %1, %2
into
%dst = fcmp one %a, %b
Our own CMP.NE is actually une, so lower fcmp one into CMP.LT, CMP.GT
and OR.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
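
The difference between the two predicates only shows up with NaN operands. The sketch below illustrates it in plain C++; the function names fcmpOne and fcmpUne are made up for the example and are not Beignet or LLVM APIs.

```cpp
#include <cmath>

// Ordered not-equal (LLVM "one"): true only when neither operand is NaN
// and the values differ. Built from the two ordered compares plus OR.
bool fcmpOne(float a, float b) { return (a < b) || (a > b); }

// Unordered not-equal (LLVM "une"): also true when either operand is
// NaN. This matches the behaviour of C++ operator!=.
bool fcmpUne(float a, float b) { return a != b; }
```

For ordinary values both functions agree; with a NaN operand fcmpOne returns false while fcmpUne returns true, which is why a plain not-equal compare cannot stand in for fcmp one.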
| |
LLVM 4.0 is coming; we should refine our version check to fit the
LLVM_MAJOR_VERSION bump to 4.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
A pointer is not like an array or vector; we should handle it in a
standalone path to fit future changes to PointerType inheritance.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
We should not include any LLVM header in the ir unit, and we need to
add the headers that profiling is missing after deleting the LLVM
headers.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
LLVM 3.3 and older are no longer supported by Beignet, so delete the
code for them.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
FreeBSD uses libcxxrt (via libc++) instead of GNU libiberty (via
libstdc++) for __cxa_demangle(). When *output_buffer* and *length*
both are NULL it doesn't modify *status* on success. Rather than rely
on a possibly uninitialized variable, check that the function doesn't
return NULL.
Fixes: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213732
Signed-off-by: Jan Beich <jbeich@freebsd.org>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
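
A minimal sketch of the pattern described above, assuming a toolchain that provides <cxxabi.h>; the wrapper name demangle is hypothetical and this is not the code from the patch.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Hypothetical wrapper: demangle a symbol name, falling back to the
// mangled name when demangling fails.
std::string demangle(const char *mangled) {
  int status = 0;  // initialized, but success is not judged by it
  char *out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (out == nullptr)  // the return value is the portable success check
    return mangled;
  std::string result(out);
  std::free(out);  // the buffer was allocated with malloc
  return result;
}
```

Testing the returned pointer works with both C++ runtimes mentioned in the message, since a NULL return signals failure regardless of whether *status* was written.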
| |
v2: use static fixBlockSize; no need set default width/height in IR
level.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
| |
v2: add #define intel_media_block_io in libocl; move extension check
code to this patch;
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
| |
GEN's div instruction needs several cycles; use a right shift (shr)
instead when the divisor is a power-of-2 constant.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
| |
i32 and i64 multiplies need several instructions; use the shl
instruction when one source is a power-of-2 constant.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
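
The strength reduction described above can be illustrated as follows. This is a standalone sketch on plain integers, not the backend code; isPowerOf2, log2Exact and mulByConstant are hypothetical names.

```cpp
#include <cstdint>

// True if v has exactly one bit set.
bool isPowerOf2(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// log2 of a power of two, i.e. the shift amount to use.
unsigned log2Exact(uint64_t v) {
  unsigned n = 0;
  while (v > 1) {
    v >>= 1;
    ++n;
  }
  return n;
}

// x * c becomes x << log2(c) when c is a power of two; otherwise fall
// back to the ordinary multiply.
uint64_t mulByConstant(uint64_t x, uint64_t c) {
  if (isPowerOf2(c)) return x << log2Exact(c);
  return x * c;
}
```

The zero check in isPowerOf2 matters: without it a multiply by 0 would be treated as a shift.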
| |
Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
| |
For 64-bit addresses, the multiply would expand to several instructions.
Most of the time the size is a power of 2, so we can use a left shift
instead.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
| |
We check some type sizes, but some of the types changed name in
LLVM 3.9; change the check to fit the current type names.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
When updating pointerOrigMap, only non-select and non-phi insns need to
update second[0]; updating second[0] for a select or phi would
overwrite the info.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
| |
Add useDeviceEnqueue to the kernel to indicate whether or not the
kernel uses device enqueue.
V2: Remove and correct debug info.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
| |
This function collects all block infos and converts unnamed calls to
named function calls. It collects device enqueue's invoke functions,
stores them in the unit, and sets these functions as OpenCL kernel
functions.
Because it changes the module's kernel functions, it must be called
before linking; otherwise, the built-in functions called in invoke
functions may not be materialized.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
| |
LLVM checks whether a loop has the llvm.loop.unroll.enable and
llvm.loop.unroll.disable metadata.
If llvm.loop.unroll.disable and llvm.loop.unroll.enable are both set,
llvm.loop.unroll.disable overrides llvm.loop.unroll.enable.
V2: don't add metadata when unrolling is not enabled; let the LLVM
unroll pass decide.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
| |
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
| |
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
| |
"revisit" as vector containber will be pushed more elements in
findPointerEscape() and cause previously obtained iterators to become
possibly invalid pointers.
When compiling a huge kernel like Blender's, this causes random
segmentation fault crashes.
The [] operator is safer.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
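
The hazard and the fix can be shown with a small standalone sketch; expandWorklist is a hypothetical function and the halving rule is just a stand-in for whatever work appends new elements.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical worklist walk: processing an item may append more items.
// Indexing with [] stays valid across push_back, whereas an iterator
// taken before a push_back may dangle once the vector reallocates.
std::vector<int> expandWorklist(std::vector<int> revisit) {
  for (std::size_t i = 0; i < revisit.size(); ++i) {
    int v = revisit[i];  // copy the element before any push_back
    if (v > 1)
      revisit.push_back(v / 2);  // may reallocate the storage
  }
  return revisit;
}
```

Re-reading revisit.size() on every iteration also ensures the newly appended elements are visited.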
| |
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
| |
When building from source, get the OpenCL version from the option. Use
the spir64 triple if it is OpenCL 2.0.
Get the OpenCL version from the LLVM module's metadata. If the OpenCL
version is 2.0, set the unit's pointer size to 64 bits before using
unit.getPointerSize().
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
| |
The OpenCL spec requires that the type name not include the access
qualifier, so remove it.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
| |
If the callee of the call inst is a bitcast value,
call->getCalledFunction() will return NULL. Use
call->getCalledValue()->stripPointerCasts()->getName() to check.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
| |
LZD requires the ud type.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
| |
Set all functions' linkage to LinkOnceAnyLinkage so that the inlining
pass can delete the inlined functions.
Also, moving createFunctionInliningPass before createStripAttributesPass
reduces the compilation time significantly, but the root cause hasn't
been found.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Yan Wang <yan.wang@linux.intel.com>
| |
and complex kernel like Luxmark.
The jump threading pass can optimize the connections between the LLVM
basic blocks of a function, giving a chance to merge and remove
unnecessary basic blocks, which reduces compilation time and ASM code
size.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
PointerType cannot be cast to IntegerType to get the bit width.
Following Rong's comments, use getTypeBitSize() instead of
Type::getIntegerBitWidth().
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
| |
Program scope variables may appear in the Module's global list in any
order, so split the program scope logic into two passes. The first
pass just creates these constants; the second pass initializes the
data.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
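
A rough sketch of why two passes help when one variable refers to another that appears later in the list. Everything here is hypothetical and simplified: the Global struct, the fixed 4-byte size and the layoutGlobals function are stand-ins, not Beignet data structures.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a program scope variable: its name plus the
// name of another global whose address it stores ("" if none).
struct Global {
  std::string name;
  std::string pointsTo;
};

// Returns name -> (offset, initial value). The initial value is the
// offset of the referenced global, or -1 if nothing is referenced.
std::map<std::string, std::pair<int, int>>
layoutGlobals(const std::vector<Global> &globals) {
  std::map<std::string, std::pair<int, int>> table;
  // Pass 1: create every constant and assign its offset
  // (assumption: every global occupies 4 bytes).
  int offset = 0;
  for (const Global &g : globals) {
    table[g.name] = {offset, -1};
    offset += 4;
  }
  // Pass 2: initialize the data; every referenced global exists by now.
  for (const Global &g : globals)
    if (!g.pointsTo.empty())
      table[g.name].second = table.at(g.pointsTo).first;
  return table;
}
```

A single pass would fail on a list where "a" refers to "b" but "b" comes second, because the offset of "b" is not known yet when "a" is initialized.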
| |
Although we have eliminated ConstantExpr in LLVM instructions,
we still meet ConstantExpr in program scope variables,
so handle it here. Also enhance the test case to hit it.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
| |
If it is an llvm::Constant, don't add it to the erase candidates.
The reason for this change is that it cannot be cast to
llvm::Instruction. I think we still need to make clear whether or not
we should delete a constant, and how.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
| |
To do an image barrier, we need to:
1. flush L3 RW cache.
2. do a barrier gateway.
3. flush sampler cache.
Note the fence arguments may be ORed together.
We need to support non-immediate barrier() arguments in the future.
v2:
change syncField to 6, and modify syncStr.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
| |
The move instruction should have the same type for src and dst.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>