| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
| |
The backend SEL instruction can support the bool type now that
bool is represented as a normal S16 data type, so remove this
assertion check.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
| |
"__ocl_math_fastpath_flag" was directly optimized out when compiling libocl under llvm 3.6
Also set its initial value after loading libocl.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is currently not possible to have 32- and 64-bit builds of beignet
installed on the same system, as the path in intel-beignet.icd
can only point to one of the two installations. This patch fixes that by
giving this file a different name when beignet is installed in a multiarch
directory:
intel-beignet-i386-linux-gnu.icd -> /usr/lib/i386-linux-gnu/beignet/libCL.so
intel-beignet-x86_64-linux-gnu.icd -> /usr/lib/x86_64-linux-gnu/beignet/libCL.so
Discussion and possible alternative approaches:
http://lists.alioth.debian.org/pipermail/pkg-opencl-devel/Week-of-Mon-20150223/date.html
While preparing this patch I noticed that intel-beignet.icd.in
uses @LIB_INSTALL_DIR@/beignet/ rather than @BEIGNET_INSTALL_DIR@,
which will obviously break if the latter is set directly. Is
that a bug or is this intended to be an internal-only variable?
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although OpenCL does not allow unaligned loads/stores of dwords/qwords,
LLVM may still generate such instructions, especially when a
large integer load/store is legalized into qword loads/stores at
possibly unaligned addresses. The implementation is simple:
for a store, bitcast the d/q word to a vector of bytes before writing it out;
for a load, load a vector of bytes and then bitcast it to a d/q word.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
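The byte-vector approach described in this commit can be illustrated in plain C. This is only a sketch of the idea, not the pass itself: the function names are hypothetical, and little-endian byte order is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value one byte at a time (little-endian order),
 * so the destination address needs no particular alignment.
 * Hypothetical helper, not a beignet function. */
void store_qword_bytes(uint8_t *dst, uint64_t value)
{
    for (size_t i = 0; i < sizeof value; i++)
        dst[i] = (uint8_t)(value >> (8 * i));
}

/* Load eight bytes and reassemble them into a 64-bit value. */
uint64_t load_qword_bytes(const uint8_t *src)
{
    uint64_t value = 0;
    for (size_t i = 0; i < sizeof value; i++)
        value |= (uint64_t)src[i] << (8 * i);
    return value;
}
```

Because every access is a single byte, the pair works at any address, for example at `buf + 3` inside a byte array.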
|
|
|
|
|
|
|
|
| |
All constant expressions should be expanded prior to the
gen writer pass.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous expand constant pass does not expand a constant
expression inside a constant vector. So even after adding the expand
constant pass, some constant expressions still reach the gen
writer pass. In the worst case, large integers hidden in those
constant expressions are not supported by the gen writer pass
and trigger assertions.
This patch identifies those constant vectors and expands
all of their constant expression elements.
v2:
minor fixes, including wording fixes in the commit log.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
|
|
|
|
|
| |
We should use this macro rather than @LIB_INSTALL_DIR@/beignet/.
Reported by Rebecca N. Palmer.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=779213
Summary: On hardware where the Intel GPU is disabled, beignet was found
to assert-fail on load, taking the application down with it before it
can do anything (including checking for hardware via clGetDeviceIDs).
This fixes this crash, allowing existing error handling to return
CL_DEVICE_NOT_FOUND, and the application to then try other ICDs until
it finds the right one for the hardware.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A large integer type like i96 may be expanded into a low 64-bit part and a high 32-bit part.
When it is cast to <3 x i32>, we should first convert the expanded parts to
the same type, here i32, because insertelement cannot insert an element of a different type.
Then we can apply insertelement one element at a time to build the <3 x i32> vector.
This fixes the bug:
https://bugs.freedesktop.org/show_bug.cgi?id=89167
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
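As a rough illustration of the splitting step, the C sketch below models a legalized i96 as a 64-bit low part plus a 32-bit high part and produces three 32-bit lanes. The struct and function names are hypothetical; the real change operates on LLVM IR values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a legalized i96: low 64 bits plus high 32 bits. */
struct i96_parts {
    uint64_t lo;
    uint32_t hi;
};

/* Bring every piece to the common 32-bit element type first,
 * then fill the three vector lanes one by one. */
void i96_to_v3i32(struct i96_parts v, uint32_t out[3])
{
    out[0] = (uint32_t)v.lo;         /* bits 0..31  */
    out[1] = (uint32_t)(v.lo >> 32); /* bits 32..63 */
    out[2] = v.hi;                   /* bits 64..95 */
}
```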
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts the following commit:
"Re-apply "improve the build performance of vector type built-in function.""
commitId: 06cce8178649759e12a3a353f0550189d371871b.
I finally decided to do this because, although the kind of program below has fewer
instructions and a shorter compile time, it also introduces extra memory accesses,
which cause bad run-time performance if the loop is not unrolled. If the loop
is unrolled, the result is similar to the scalarized version.
    OVERLOADABLE float16 func (float16 param0)
    {
      union {
        float va[16];
        float16 vv16;
      } uret;
      union {
        float pa[16];
        float16 pv16;
      } usrc0;
      usrc0.pv16 = param0;
      for (int i = 0; i < 16; i++)
        uret.va[i] = func(usrc0.pa[i]);
      return uret.vv16;
    }
I ran some experiments on the affected built-ins. I fixed the GPU frequency at 1050
and increased the input data to 862000. The results are below (the scalarized
version clearly performs better):
builtin_asinh_float16:
  loop version: 200ms
  scalarized version: 150ms
builtin_sinh_float16:
  loop version: 250ms
  scalarized version: 160ms
This patch also reduces the generation of large integers. Although we support
large integer legalization, I find it is sometimes hard to legalize efficiently,
for example large integer LE/GT.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using (void*)0 does not compile with clang 3.6.
It is treated as a private address space pointer, so comparing
a global pointer with NULL becomes a comparison between a private and a global
pointer, which the OpenCL spec does not allow. Zero is allowed,
because that is a pointer and integer comparison.
For the detailed discussion, please read:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2015-February/041429.html
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
| |
Use strncpy to avoid overflow; errSize also needs to be returned.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
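A minimal sketch of the pattern this commit describes: a bounded copy with strncpy plus a reported size. The function name, parameters, and the choice to count the terminating NUL are assumptions for illustration; they are not taken from the beignet source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy msg into err (capacity cap bytes),
 * always NUL-terminate, and report the number of bytes written,
 * including the terminator, through err_size. */
void copy_error(char *err, size_t cap, size_t *err_size, const char *msg)
{
    size_t written = 0;
    if (err != NULL && cap > 0) {
        strncpy(err, msg, cap - 1); /* never writes past cap - 1 bytes */
        err[cap - 1] = '\0';        /* strncpy does not terminate on truncation */
        written = strlen(err) + 1;
    }
    if (err_size != NULL)
        *err_size = written;
}
```

With an 8-byte buffer and a 10-character message, the copy is truncated to 7 characters plus the terminator, and the reported size is 8.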
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can change the image_channel_order to CL_RGBA and
image_channel_data_type to CL_UNSIGNED_INT32 in some special
cases, so that 16 bytes can be read by one work item and bandwidth is
fully used.
v2:
For now we only optimize IMAGE2D, so add a check to avoid affecting
the code paths of other image types.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
| |
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
There are some changes from llvm3.5:
1. Some functions return std::unique_ptr instead of pointer.
2. MetaNode to Value and Value to MetaNode.
V2: Fix llvm3.5 build error.
V3: Print link and function materialize message.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
| |
The definition of llvm::Linker::LinkModules will change in LLVM 3.6, and the definition
of LLVMLinkModules is more stable, so use LLVMLinkModules to link.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
According to the API explanation, we should use the exiting block instead of
the latch block. LLVM 3.6 places an assert on this.
v2:
Use the latch block if it is the exiting block; otherwise use the exiting block.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
| |
Use the enum to avoid a name conflict.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
| |
This is used to handle integer bit widths that are not a power of two.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
| |
The type may be float.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
| |
The DEBUG macro tries to link llvm::DebugFlag and llvm::isCurrentDebugType(),
which are absent from the release version of the LLVM library, so define it as empty.
The problem occurs when building a debug version of beignet against a
release version of LLVM.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass also comes from PNaCl. It lowers large integers into smaller ones.
Unlike the previous legalize pass, it easily handles various
instructions, such as load or phi, with large integer operands.
I find that instruction combine makes it hard to totally eliminate
empty blocks: the CFG simplify pass may generate a switch-case when preventing
empty blocks, and the switch lower pass that runs after CFG simplify may leave
empty blocks in Gen IR. So I temporarily disable the empty block check in the backend.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea is to lower the ConstantExpr into an Instruction.
Fix the PtrToInt and IntToPtr implementation: it simply maps to
a convert if the type sizes are not the same.
Fix a bitcast from integer to float issue. As we expand llvm::ConstantExpr
into llvm::Instruction, we hit the situation below.
%10 = bitcast i32 1073741824 to float
%11 = fcmp float %10 0.000000e+00
This is translated into GenIR:
%100 = loadi S32 1073741824
%101 = fcmp %100, 0.0f
In later instruction selection, we may directly call getFloat() on %100.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
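The difference between a bitcast and a numeric conversion is the crux of this fix. The C sketch below shows it for the constant used in the commit message, assuming `float` is an IEEE 754 binary32 value; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of an integer as a float, which is what
 * "bitcast i32 ... to float" means. memcpy avoids aliasing problems. */
float bitcast_u32_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

1073741824 is 0x40000000, whose bit pattern is 2.0f, whereas the numeric conversion `(float)1073741824u` yields a value of about 1.07e9. Reading the immediate with the wrong interpretation therefore changes the result of the comparison.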
|
|
|
|
|
|
|
|
|
|
| |
funny things may happen with usernames like
'asm', 'attribute', 'x86_64', 'i386', and so on
this breaks on usernames with non-alnum chars ('-', '.')
Signed-off-by: Andreas Beckmann <anbe@debian.org>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
| |
v2:
LLVM 3.3 is better than 3.4, switch the order of 3.4 and 3.3.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the buffer has CL_MEM_ALLOC_HOST_PTR, the runtime
needs to free the host_ptr in the destructor. But if the buffer
is a sub-buffer, its host_ptr was not allocated by the sub-buffer itself,
so we should not free it here. Otherwise, it may cause
weird errors such as:
"corrupted double-linked list..".
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
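The ownership rule can be sketched in C as follows. The struct layout, the `owns_host_ptr` flag, and the function name are hypothetical and only model the idea that a sub-buffer borrows its parent's host memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical buffer model: a sub-buffer points into its parent's
 * host allocation and therefore does not own it. */
struct buffer {
    void *host_ptr;
    bool owns_host_ptr;
};

/* Release the host memory only when this buffer allocated it.
 * Returns true if free() was called. */
bool buffer_release_host_ptr(struct buffer *b)
{
    bool freed = false;
    if (b->host_ptr != NULL && b->owns_host_ptr) {
        free(b->host_ptr);
        freed = true;
    }
    b->host_ptr = NULL;
    return freed;
}
```

Freeing the borrowed pointer in the sub-buffer would hand free() an address that malloc never returned, or free the parent's memory twice, which is the kind of heap corruption the quoted glibc message reports.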
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instead of cpu way.
Before this patch, cl_mem_kernel_copy_image did a CPU memory copy in order
to copy image array objects. This is very slow for large images.
This patch implements image array copy the CL way, which dramatically
accelerates image-array-related clEnqueueCopyImage calls.
The clCopyImage case in the OpenCL conformance test is no longer blocked.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
As previously noted, when cl_mem_allocate fails, its error handling
then calls cl_mem_delete on the incompletely-set-up buffer, which
aborts at assert(mem->ctx).
This patch appears to fix the problem, but be warned I don't know this
code well enough to know what else it might break.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Xionghu Luo <xionghu.luo@intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
| |
This document describes the steps for using libva buffer sharing and
how to build and run the corresponding example.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
clCreateImageFromLibvaIntel.
This example reads a source nv12 file into a VASurface and creates a
target VASurface. It then creates the corresponding cl image objects from them.
After using ocl to apply mirror-effect post-processing to the source VASurface,
the target VASurface is shown on screen by default.
The code for loading an nv12 file into a VASurface is based on
libva/test/encode/avcenc.c.
v2:
Delete 1920x1080.nv12 and 640x480.nv12 because of large size, add
256_128.nv12 as default test image.
v3:
1. Re-org files: add libva as a submodule then use display related
files.
2. Show result on screen by default instead of saving as a file.
3. Fix warnings.
v4: Fix whitespace format warnings.
v5:
1. Modify upload_nv12_to_surface to read a nv12 file and then upload
it to an NV12 VASurface. Also modify store_surface_to_nv12.
2. Change the cl post-processing kernel from gray effect to mirror
effect, which makes the demo cooler.
3. Minor fix of other problems.
v6: Remove unnecessary OUTPUT_NV12_DEFAULT related code.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LLVM may generate IR that places the if.else block before the
if.then block. We parse the printf statements before llvm_to_gen, and
the later if-else analysis reorders the if-else blocks.
As a result, when we print the output, we get the message
of another printf statement.
Add the printf index to the index buffer to record which statement each
result belongs to, which fixes this bug.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
| |
Because of a HW limitation, the vertical stride is aligned to at
least 2. For a 1D array image, the data therefore has gaps, and
the image is twice as large as the buffer size we assumed.
Using clEnqueueWriteImage is safe and fixes this bug.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2:
Add comment from ruiling:
or dead loop, it has an unconditional branch at its end.
Simply not treating it as a loop is also acceptable.
I ran into this problem when executing
./opencv_test_imgproc --gtest_filter=OCL_Imgproc/HoughLines.RealImage/0
and this patch fixes it.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
|
|
|
|
|
| |
Gen8 uses GEN_TYPE_HF, so getTypeSize needs to support it.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Zhu Bingbing <bingbingx.zhu@intel.com>
|
|
|
|
|
|
|
|
| |
ulong and uint64_t have different sizes on i386 and
x86_64, which causes the test case to fail.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
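A quick way to see the mismatch on the host side is to compare the width of `unsigned long` with that of `uint64_t`. This is a sketch with hypothetical function names; it assumes the usual Linux data models, where `unsigned long` is 32 bits on i386 (ILP32) and 64 bits on x86_64 (LP64).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Width of the host's unsigned long, which depends on the data model. */
unsigned host_ulong_bits(void)
{
    return (unsigned)(sizeof(unsigned long) * CHAR_BIT);
}

/* Width of uint64_t, which is 64 bits wherever the type exists. */
unsigned fixed_u64_bits(void)
{
    return (unsigned)(sizeof(uint64_t) * CHAR_BIT);
}
```

Host test code that must match a 64-bit device value is therefore safer with a fixed-width type than with `unsigned long`.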
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On IVB and HSW, when dst spans two registers, the source MUST
span two registers. So the following instructions:
mov (16) r104.0<2>:uw r126.0<8;8,1>:uw { Align1, H1 }
mov (16) r104.1<2>:uw r111.0<8;8,1>:uw { Align1, H1 }
mov (16) r106.0<2>:uw r110.0<8;8,1>:uw { Align1, H1 }
mov (16) r106.1<2>:uw r109.0<8;8,1>:uw { Align1, H1 }
are illegal.
Add a check here to split such an instruction into 2 SIMD8 instructions.
TODO:
These instructions are allowed on BDW; this needs to be improved.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
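The register arithmetic behind this rule can be worked through in C. The sketch assumes 32-byte GRF registers and models each operand as a one-dimensional strided region, which is enough for the destination `<2>:uw` and for the source `<8;8,1>:uw` (16 contiguous words). The function name is hypothetical.

```c
#include <assert.h>

#define GRF_BYTES 32u /* assumed size of one general register in bytes */

/* Number of registers touched by exec_size elements of elem_bytes each,
 * starting at sub-register element index subreg, with a horizontal
 * stride of hstride elements. */
unsigned regs_spanned(unsigned subreg, unsigned hstride,
                      unsigned elem_bytes, unsigned exec_size)
{
    unsigned first = subreg * elem_bytes;
    unsigned last = first + ((exec_size - 1) * hstride + 1) * elem_bytes - 1;
    return last / GRF_BYTES - first / GRF_BYTES + 1;
}
```

For `mov (16) r104.0<2>:uw r126.0<8;8,1>:uw`, the destination covers bytes 0 to 61 and so spans two registers, while the source covers bytes 0 to 31 and spans one, which is the illegal combination. Splitting into two SIMD8 halves gives a destination of bytes 0 to 29, which fits in one register.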
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
| |
The limitation is relaxed from page-size to cache-line-size
alignment inside the driver; update the utest accordingly.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
| |
The current limitation is that both host_ptr and the buffer size must be
page aligned. Relax the limitation on host_ptr to cache line
size (64 bytes) alignment, and remove the limitation on the size.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
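The relaxed requirement amounts to a simple address check. The sketch below is illustrative only: the function name and macro are hypothetical, and the 64-byte value is the cache line size stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u /* cache line size named in the commit message */

/* A host pointer qualifies when it is non-NULL and starts on a
 * cache line boundary; the buffer size is not constrained. */
bool host_ptr_is_usable(const void *p)
{
    return p != NULL && ((uintptr_t)p % CACHE_LINE_BYTES) == 0;
}
```

An application could obtain such a pointer with `aligned_alloc(64, size)` in C11, where `size` is a multiple of 64, or with `posix_memalign`.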
|
|
|
|
|
|
|
| |
the correct value of cache line size is 64 bytes, not 128.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
| |
We need to pass correct popcount source type to backend.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some instructions modify the ACC register in the gen_context
stage, which is not recognized by the current instruction scheduling
algorithm. This patch fixes the bug by checking all the possible
SEL_OPs which may change the ACC implicitly.
The corresponding bugzilla link is below:
https://bugs.freedesktop.org/show_bug.cgi?id=88587
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|