| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
| |
We will defer the fix of this known issue to 1.1.0. Let's document it
before that.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
| |
For HSW platform, due to the atomic in L3 related registers' usage,
we always need to disable the cmd parser.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
| |
Although the recommended LLVM version is 3.5, Beignet still support to
build with LLVM 3.3/3.4.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Mengmeng Meng <mengmeng.meng@intel.com>
|
|
|
|
|
|
|
| |
In uniform mode, we should set simd width to 1 and set noMask bit.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
Using the inserPos is good enough. If using --insertPos, there
is one potential issue when the insertPos is the head of a list
then it will trigger an assertion.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we want to delete an old instruction and replace it with the new one,
we only call the LLVM IR's replace function which is not sufficient for
the scalarize pass, as we also keep some local reference int eh vecVals
map. We need to replace all of those local reference also.
Otherwise, the deleted values may be used in the subsequent instructions
which causes fatal error latter.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to the private constant buffer support, it introduces private address
space mixed with constant address space some time. And more generic, one
constant address space may be located in private address space in LLVM IR
layer. Such as the following code:
__kernel ...
{
const int2 foo[] = {{0, 1}, {2, 3}};
int2 data = foo[get_global_id(0) % 2];
}
The foo is in private address space but we finally will use __constant bti
to access it in Gen backend. The the above code will cause a assertion fail
in gen insturcion selection stage, because it generate a vector loading
instruction on a __constant buffer.
So we should use the actual BTI data to determine one pointer's address space
rather than get it from the LLVM IR layer.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
If the front end label ip exceed 0xffff, then the backend will
use real DW to represent each block's IP address. This is
a dynamic behaviour according to the actual front end's label
number.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
The front end label is still 16 bit. But the auxiliary
label could be larger than that. This is the preparation
to support 32 bit label for both front end and backend.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
| |
For some extremly large kernel, these values may be larger than
0xFFFF, we have to extend them to 32 bit.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
| |
Should not use hard coded uint16_t for safe type definition.
Prepare to extend some types to uint32_t.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
Beignet uses drm_intel_gem_bo_context_exec() to flush command queue to
linux drm driver layer. We need to check the return value of that function,
as it may fail when the application uses very large array.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
All the gen registers should get the uniform information from the
corresponding virtual registers. The use of GenRegister::xxxgrf
on a virtual register is very dangerous which may cause inconsistency.
This patch eliminate all the use of it in gen_insn_selection stage.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
| |
Otherwise, it may cause segfault in instruction encoding stage.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CL_COMPLETE + thread safety for callbacks
When trying to register a callback on the clEnqueueReadBuffer command, since it is processed
synchroniously all the time, the command was marked CL_COMPLETE every time. If the event returned
by clEnqueueReadBuffer was then used to register a callback function, the callback function did
no check to execute it if nessary.
Modified the handling of the callback registration in cl_set_event_callback to only call the callback being created if it's status is already reached.
Added thread safety measures for pfn_notify calls since the status value can be changed while executing the callback.
Grouped the pfn_notify calls to a unified function cl_event_call_callback that handles thread safety: it queues callbacks in a node list while under the protection of pthread_mutex and then calls the callbacks outside of the pthread_mutex (this is required because the callback can deadlock if it calls a cl_api function that uses the mutex)
Signed-off-by: David Couturier <david.couturier@polymtl.ca>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
| |
If /dev/dri/cardX is inaccessible, return CL_DEVICE_NOT_FOUND,
don't assert-fail.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
| |
As beignet now works with LLVM/Clang 3.6, accept this version
when searching for llvm-config.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
| |
compile the OpenCL standard library with the same version of clang
as will compile OpenCL user code, not plain "clang" (i.e. the
system default version, which may be different).
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
As presently written, a 'make package' will attempt to INSTALL the
Beignet ICD loader to /etc/OpenCL/vendors whereas it should just
do a local install and then package the file. The proposed change instructs
CPack to include the `DESTDIR` variable when it calls `make install`, thus
directing the desination for the ICD loader to a local directory instead
of a system path.
Signed-off-by: Brian Kloppenborg <brian@arrayfire.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
llvm 3.6 will give a UNDEF value for NAN. The will cause
the store instruction for UNDEF to be ignored. We need
to modify it to NAN here.
Comments from Zhigang:
"
The related commit of why LLVM won't just simply return NaN for such
case is at:
Make the sqrt intrinsic return undef for a negative input.
As discussed here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140609/220598.html
And again here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077168.html
The sqrt of a negative number when using the llvm intrinsic is undefined.
We should return undef rather than 0.0 to match the definition in the LLVM IR lang ref.
This change should not affect any code that isn't using "no-nans-fp-math";
ie, no-nans is a requirement for generating the llvm intrinsic in place of a sqrt function call.
Unfortunately, the behavior introduced by this patch will not match current gcc, xlc, icc, and
possibly other compilers. The current clang/llvm behavior of returning 0.0 doesn't either.
We knowingly approve of this difference with the other compilers in an attempt to flag code
that is invoking undefined behavior.
A front-end warning should also try to convince the user that the program will fail:
http://llvm.org/bugs/show_bug.cgi?id=21093
Differential Revision: http://reviews.llvm.org/D5527
This patch is a workaround for the following scenario:
printf("%f \n", sqrt(-1.0f));
"
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Accordying to OpenCL 1.2 Rev 17:
"CL_KERNEL_ARG_TYPE_CONST is returned if the argument is a pointer and the referenced type is declared with the restrict or const qualifier. For
example, a kernel argument declared as global int const *x returns CL_KERNEL_ARG_TYPE_CONST but a kernel argument declared as global int *
const x does not."
So only need to return CL_KERNEL_ARG_TYPE_CONST for pointer arguments.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
On some distributions, the CMAKE_INSTALL_FULL_LIBDIR or CMAKE_LIBRARY_ARCHITECTURE
may be undefined. To avoid generate intel-beignet-.icd file name, we need to get
rid of the extra "-" for such case.
Reported by Igor Gnatenko.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
| |
And update document accordingly.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As constant propagation will introduce constantExpr and gep instruction,
I choose not to run constant propagation pass after RemoveGep pass.
So, here only generate Multiply as needed.
We may do such kind of optimization in Gen IR level in the future.
This could fix the performance regression introduced by:
"GBE: Import constantexpr lower pass from pNaCl"
to the opencv case:
opencv_perf_imgproc/OCL_BilateralFixture_Bilateral
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Meng Mengmeng <mengmeng.meng@intel.com>
|
|
|
|
|
|
| |
LLVM3.6 revert the c api LLVMLinkModules to LLVM3.5 last-minute. Consist with it.
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
As there may be some other LLVM users such as mesa, and they
may link to different LLVM library. To avoid such type of
conflicts, we use -Bsymbolic to disable the symbol preemption.
This patch should fix the build bug at:
https://bugs.freedesktop.org/show_bug.cgi?id=89325
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
| |
I found some optimization pass may add fastcall attribute to some
builtin functions. We need to add the corresponding support.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LLVM 3.6 may generate the following instructions:
%Pivot = icmp slt i1 %trunc49, false
when do siwth lowering pass.
To support it we must use GEN_TYPE_W to represent B rather
than GEN_TYPE_UW and we also need to remove the corresponding
assertions.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
The backend SEL instruction could support bool type
since we change the bool representation to normal
S16 data type. Now let us remove this assertion
check.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
| |
"__ocl_math_fastpath_flag" was directly optimized out when compiling libocl under llvm 3.6
And set its initialization value after loading libocl.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is currently not possible to have 32- and 64-bit builds of beignet
installed on the same system, as the path in intel-beignet.icd
can only be one of the two installations. This fixes this by giving
this file a different name when beignet is installed in a multiarch
directory:
intel-beignet-i386-linux-gnu.icd -> /usr/lib/i386-linux-gnu/beignet/libCL.so
intel-beignet-x86_64-linux-gnu.icd -> /usr/lib/x86_64-linux-gnu/beignet/libCL.so
Discussion and possible alternative approaches:
http://lists.alioth.debian.org/pipermail/pkg-opencl-devel/Week-of-Mon-20150223/date.html
While preparing this patch I noticed that intel-beignet.icd.in
uses @LIB_INSTALL_DIR@/beignet/ rather than @BEIGNET_INSTALL_DIR@,
which will obviously break if the latter is set directly. Is
that a bug or is this intended to be an internal-only variable?
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although opencl does not allow unaligned load/store of dword/qword,
LLVM still may generate such kind of instructions, especially
large integer load/store is legalized into load/store of qword with
possible unaligned address. The implementation is simple:
for store, bitcast d/q word to vector of bytes before writing out,
for load, load vector of bytes and then bitcast them to d/q word.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
| |
All the constant expressions should be expanded in prior to
gen writer pass.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous expand constant pass will not expand a constant
expression within a constant vector. So after adding the expand
constant pass, we still get some constant expressions at gen
writer pass and the worse case is there are some large integer
hid in those constant expressions which are not supported in
gen writer pass and will cause assertions.
This patch will identify those constant vectors and expand
all the possible constant expression elements.
v2:
minor fix including wording fix in commit log.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
|
|
|
|
|
| |
We should use this macro rather than @LIB_INSTALL_DIR@/beignet/.
Reported by Rebecca N. Palmer.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=779213
Summary: On hardware where the Intel GPU is disabled, beignet was found
to assert-fail on load, taking the application down with it before it
can do anything (including checking for hardware via clGetDeviceIDs).
This fixes this crash, allowing existing error handling to return
CL_DEVICE_NOT_FOUND, and the application to then try other ICDs until
it finds the right one for the hardware.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
large integer type like i96 may be expanded to be low 64bit and high 32bit.
When it is cast to <i32 x 3>, we should first make the expanded type to be
of same type, here i32. insertelement could not insert element of different type.
Then we can do insertelement one by one to generate the <i32 x 3> vector.
This could fix the bug:
https://bugs.freedesktop.org/show_bug.cgi?id=89167
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This revert the following commit:
"Re-apply "improve the build performance of vector type built-in function.""
commitId: 06cce8178649759e12a3a353f0550189d371871b.
I finally decide to do this because although below kind of program has less
instructions and less compile-time, but it will also introduce extra memory access,
which would cause bad run-time performance if the loop is not unrolled. If the loop
is unrolled, it would be similar like scalarized version.
OVERLOADABLE float16 func (float16 param0)
{
union{
float va[16];
float16 vv16;
}uret;
union{
float pa[16];
float16 pv16;
}usrc0;
usrc0.pv16 = param0;
for(int i =0; i < 16; i++)
uret.va[i] = func(usrc0.pa[i]);
return uret.vv16;
}
I did some experiment on the affected built-in. I fixed the GPU frequency at 1050,
and increase input data to 862000. The result is like below (obviously the scalarized
version has better performance):
bultin_asinh_float16:
loop version: 200ms
scalarized version: 150ms
builtin_sinh_float16:
loop version: 250ms
scalarized version: 160ms
And also this patch would reduce the generation of large integer. Although we support
large integer legalization, I find sometime it is hard to legalize in very efficient way
like large integer LE/GT.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|