summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* build: use EXECUTE_PROCESS to replace the deprecated EXEC_PROGRAM.Release_v1.0Zhigang Gong2015-05-262-8/+8
| | | | | | | | | | | | | | | I found EXEC_PROMGRAM may truncate the output variable for some case thus we may get incorrect generated unit test cases thus break the configuration some times. This patch use EXECUTE_PROCESS to replace all the deprecated EXEC_PROGRAM and it will not truncate the output variable of the command. v2: fix the error in examples/CMakeLists.txt. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
* utests: fix test case builtin_tgamma.Rebecca N. Palmer2015-05-191-3/+16
| | | | | | | | | | Compare with tgamma instead of tgammaf for better accuracy. Include negative inputs, and handle the resulting denormals. Print maximum error found. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Allow building with Python 3Rebecca N. Palmer2015-05-192-12/+14
| | | | | | | | | Make the build scripts work in both Python 2 and Python 3. (CMake prefers Python 2 if both are available, but will use Python 3 if only it is installed.) Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Make tgamma meet the accuracy standard.Rebecca N. Palmer2015-05-191-7/+89
| | | | | | | | | | | | The old tgamma=exp(lgamma) implementation had high rounding error on large outputs, exceeding the 16ulp specification for approx. x>8 (hence the test failure in strict conformance mode). Replace this with an implementation based on glibc's http://sources.debian.net/src/glibc/2.19-17/sysdeps/ieee754/flt-32/e_gammaf_r.c/ Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
* utest_pow: don't fail on declared lack of denormals.Rebecca N. Palmer2015-05-191-2/+8
| | | | | | | | | 0.01**20.5 is denormal; at least Ivy Bridge does not support denormals and hence returns 0. As this is allowed by the OpenCL standard, it shouldn't fail the test. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
* Docs: update/clarify Haswell issuesRebecca N. Palmer2015-05-192-19/+35
| | | | | | | Reflect recent beignet and Linux changes. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Add a sanity test in clGetDeviceIDsRebecca N. Palmer2015-05-192-0/+84
| | | | | | | | | | | | | | Run a small __local-using kernel in clGetDeviceIDs; if this returns the wrong result, return CL_DEVICE_NOT_FOUND. As far as I can see, there's no way to tell in advance (except unreliably with a global version check) whether __local-using batches will be accepted...so the easiest solution is probably to just try running one and see what result we get. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* correct the src output of alu3 when OCL_OUTPUT_ASM=1Guo Yejun2015-05-191-3/+12
| | | | | Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
* GBE: Fix the immediate data typeRuiling Song2015-05-192-7/+7
| | | | | Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
* Remove some LGPL incompatible code.Zhigang Gong2015-05-194-1447/+0
| | | | Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* Bump to 1.0.3.Release_v1.0.3Zhigang Gong2015-04-172-1/+4
| | | | Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* Doc: update known issue for the store/load pointer issue.Zhigang Gong2015-04-151-0/+11
| | | | | | | We will defer the fix of this known issue to 1.1.0. Let's document it before that. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* Doc: update cmd parser issue for HSW platforms.Zhigang Gong2015-04-141-5/+8
| | | | | | | For HSW platform, due to the atomic in L3 related registers' usage, we always need to disable the cmd parser. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* GBE: should initialize useDWLabel to false by default.Zhigang Gong2015-04-141-1/+1
| | | | | | Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* Doc: add a command to install dependencies.Zhigang Gong2015-04-141-0/+6
| | | | Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* utests: don't continue to run any case when fail to initialize device.Zhigang Gong2015-04-141-1/+5
| | | | | Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* runtime: don't try to open nonexistent render nodes or device files.Zhigang Gong2015-04-141-1/+4
| | | | | Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* CMake: allow to build with LLVM 3.3/3.4.Zhigang Gong2015-04-141-2/+1
| | | | | | | | Although the recommended LLVM version is 3.5, Beignet still support to build with LLVM 3.3/3.4. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: Mengmeng Meng <mengmeng.meng@intel.com>
* GBE: fix a bug in byte scatter write.Zhigang Gong2015-04-131-5/+11
| | | | | | | In uniform mode, we should set simd width to 1 and set noMask bit. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* GBE: fix an potential assertion in constant expanding pass.Zhigang Gong2015-04-131-1/+1
| | | | | | | | | Using the inserPos is good enough. If using --insertPos, there is one potential issue when the insertPos is the head of a list then it will trigger an assertion. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
* GBE: correct the instruction replacement logic in scalarize pass.Zhigang Gong2015-04-131-9/+34
| | | | | | | | | | | | | When we want to delete an old instruction and replace it with the new one, we only call the LLVM IR's replace function which is not sufficient for the scalarize pass, as we also keep some local reference int eh vecVals map. We need to replace all of those local reference also. Otherwise, the deleted values may be used in the subsequent instructions which causes fatal error latter. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* GBE: refine error handling for private libva buffer sharing extension.Zhigang Gong2015-04-132-0/+11
| | | | | Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* GBE: Use actual bti information to determine a pointer's addressspace.Zhigang Gong2015-04-135-8/+23
| | | | | | | | | | | | | | | | | | | | | | | Due to the private constant buffer support, it introduces private address space mixed with constant address space some time. And more generic, one constant address space may be located in private address space in LLVM IR layer. Such as the following code: __kernel ... { const int2 foo[] = {{0, 1}, {2, 3}}; int2 data = foo[get_global_id(0) % 2]; } The foo is in private address space but we finally will use __constant bti to access it in Gen backend. The the above code will cause a assertion fail in gen insturcion selection stage, because it generate a vector loading instruction on a __constant buffer. So we should use the actual BTI data to determine one pointer's address space rather than get it from the LLVM IR layer. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* GBE: Extend front label ip to 32 bit on demand.Zhigang Gong2015-04-1310-29/+128
| | | | | | | | | | If the front end label ip exceed 0xffff, then the backend will use real DW to represent each block's IP address. This is a dynamic behaviour according to the actual front end's label number. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* GBE: don't type cast register/labelindex to integer.Zhigang Gong2015-04-133-33/+32
| | | | | Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* GBE: extend backend label to 32 bit.Zhigang Gong2015-04-133-15/+15
| | | | | | | | | The front end label is still 16 bit. But the auxiliary label could be larger than that. This is the preparation to support 32 bit label for both front end and backend. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* GBE: extend registers/tuples/immediates to 32bit wide.Zhigang Gong2015-04-137-29/+30
| | | | | | | | For some extremly large kernel, these values may be larger than 0xFFFF, we have to extend them to 32 bit. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* GBE: fix safe type definition.Zhigang Gong2015-04-131-1/+1
| | | | | | | | Should not use hard coded uint16_t for safe type definition. Prepare to extend some types to uint32_t. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* strip unsupported attributes and calling conventions.Zhigang Gong2015-04-136-4/+134
| | | | | Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* runtime: Enhance the error handling when flush gpgpu command queue.Zhigang Gong2015-04-109-30/+30
| | | | | | | | | Beignet uses drm_intel_gem_bo_context_exec() to flush command queue to linux drm driver layer. We need to check the return value of that function, as it may fail when the application uses very large array. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* GBE: avoid to use the GenRegister::xxxgrf(simdWidth,xxx).Zhigang Gong2015-04-101-47/+46
| | | | | | | | | | All the gen registers should get the uniform information from the corresponding virtual registers. The use of GenRegister::xxxgrf on a virtual register is very dangerous which may cause inconsistency. This patch eliminate all the use of it in gen_insn_selection stage. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* GBE: correct some temporary virtual register's simdWidth.Zhigang Gong2015-04-102-12/+19
| | | | | | | Otherwise, it may cause segfault in instruction encoding stage. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* update document.Zhigang Gong2015-03-301-2/+33
| | | | Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* Fix: (v3) Event callback that were not executed when command was already ↵David Couturier2015-03-272-22/+61
| | | | | | | | | | | | | | | | | | CL_COMPLETE + thread safety for callbacks When trying to register a callback on the clEnqueueReadBuffer command, since it is processed synchroniously all the time, the command was marked CL_COMPLETE every time. If the event returned by clEnqueueReadBuffer was then used to register a callback function, the callback function did no check to execute it if nessary. Modified the handling of the callback registration in cl_set_event_callback to only call the callback being created if it's status is already reached. Added thread safety measures for pfn_notify calls since the status value can be changed while executing the callback. Grouped the pfn_notify calls to a unified function cl_event_call_callback that handles thread safety: it queues callbacks in a node list while under the protection of pthread_mutex and then calls the callbacks outside of the pthread_mutex (this is required because the callback can deadlock if it calls a cl_api function that uses the mutex) Signed-off-by: David Couturier <david.couturier@polymtl.ca> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* Don't crash if device inaccessibleRebecca N. Palmer2015-03-241-1/+0
| | | | | | | | If /dev/dri/cardX is inaccessible, return CL_DEVICE_NOT_FOUND, don't assert-fail. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* FindLLVM: allow LLVM/Clang 3.6Rebecca N. Palmer2015-03-241-2/+2
| | | | | | | | As beignet now works with LLVM/Clang 3.6, accept this version when searching for llvm-config. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Use matching versions of clang/llvm and libclang/libllvmRebecca Palmer2015-03-242-10/+31
| | | | | | | | | compile the OpenCL standard library with the same version of clang as will compile OpenCL user code, not plain "clang" (i.e. the system default version, which may be different). Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* BUGFIX: Prohibit 'make package' from doing system install of ICD vendor fileBrian Kloppenborg2015-03-241-0/+1
| | | | | | | | | | | | As presently written, a 'make package' will attempt to INSTALL the Beignet ICD loader to /etc/OpenCL/vendors whereas it should just do a local install and then package the file. The proposed change instructs CPack to include the `DESTDIR` variable when it calls `make install`, thus directing the desination for the ICD loader to a local directory instead of a system path. Signed-off-by: Brian Kloppenborg <brian@arrayfire.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Generate NAN for UNDEF value in printf parser.stableJunyan He2015-03-181-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | llvm 3.6 will give a UNDEF value for NAN. The will cause the store instruction for UNDEF to be ignored. We need to modify it to NAN here. Comments from Zhigang: " The related commit of why LLVM won't just simply return NaN for such case is at: Make the sqrt intrinsic return undef for a negative input. As discussed here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140609/220598.html And again here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077168.html The sqrt of a negative number when using the llvm intrinsic is undefined. We should return undef rather than 0.0 to match the definition in the LLVM IR lang ref. This change should not affect any code that isn't using "no-nans-fp-math"; ie, no-nans is a requirement for generating the llvm intrinsic in place of a sqrt function call. Unfortunately, the behavior introduced by this patch will not match current gcc, xlc, icc, and possibly other compilers. The current clang/llvm behavior of returning 0.0 doesn't either. We knowingly approve of this difference with the other compilers in an attempt to flag code that is invoking undefined behavior. A front-end warning should also try to convince the user that the program will fail: http://llvm.org/bugs/show_bug.cgi?id=21093 Differential Revision: http://reviews.llvm.org/D5527 This patch is a workaround for the following scenario: printf("%f \n", sqrt(-1.0f)); " Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* runtime: fix a conformance bug in cl_get_kernel_arg_info.Zhigang Gong2015-03-181-1/+5
| | | | | | | | | | | | Accordying to OpenCL 1.2 Rev 17: "CL_KERNEL_ARG_TYPE_CONST is returned if the argument is a pointer and the referenced type is declared with the restrict or const qualifier. For example, a kernel argument declared as global int const *x returns CL_KERNEL_ARG_TYPE_CONST but a kernel argument declared as global int * const x does not." So only need to return CL_KERNEL_ARG_TYPE_CONST for pointer arguments. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
* Build: fix the beignet icd name when CMAKE_INSTALL_FULL_LIBDIR is undefined.Zhigang Gong2015-03-171-2/+8
| | | | | | | | | | On some distributions, the CMAKE_INSTALL_FULL_LIBDIR or CMAKE_LIBRARY_ARCHITECTURE may be undefined. To avoid generate intel-beignet-.icd file name, we need to get rid of the extra "-" for such case. Reported by Igor Gnatenko. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* Bump version to 1.0.2Release_v1.0.2Zhigang Gong2015-03-162-1/+4
| | | | Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* Build: set 3.5 as the stable LLVM version for beignet.Zhigang Gong2015-03-163-11/+18
| | | | | | And update document accordingly. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* GBE: Only emit multiply when immediate is not one.Ruiling Song2015-03-121-3/+5
| | | | | | | | | | | | | | | As constant propagation will introduce constantExpr and gep instruction, I choose not to run constant propagation pass after RemoveGep pass. So, here only generate Multiply as needed. We may do such kind of optimization in Gen IR level in the future. This could fix the performance regression introduced by: "GBE: Import constantexpr lower pass from pNaCl" to the opencv case: opencv_perf_imgproc/OCL_BilateralFixture_Bilateral Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* GBE: Only add non-zero offset in gep lowering pass.Ruiling Song2015-03-101-8/+10
| | | | | Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* GBE: add a new incompatible compile option -cl-finite-math-only.Zhigang Gong2015-03-091-1/+1
| | | | | Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: Meng Mengmeng <mengmeng.meng@intel.com>
* Fix llvm3.6 build error.Yang Rong2015-03-052-8/+0
| | | | | | LLVM3.6 revert the c api LLVMLinkModules to LLVM3.5 last-minute. Consist with it. Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
* Build: use -Bsymbolic to fix conflicts with other LLVM users.Zhigang Gong2015-03-021-1/+1
| | | | | | | | | | | | As there may be some other LLVM users such as mesa, and they may link to different LLVM library. To avoid such type of conflicts, we use -Bsymbolic to disable the symbol preemption. This patch should fix the build bug at: https://bugs.freedesktop.org/show_bug.cgi?id=89325 Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* GBE: add fastcall support.Zhigang Gong2015-03-023-2/+7
| | | | | | | | I found some optimization pass may add fastcall attribute to some builtin functions. We need to add the corresponding support. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* GBE: support compare two bool variables.Zhigang Gong2015-03-023-5/+2
| | | | | | | | | | | | | | LLVM 3.6 may generate the following instructions: %Pivot = icmp slt i1 %trunc49, false when do siwth lowering pass. To support it we must use GEN_TYPE_W to represent B rather than GEN_TYPE_UW and we also need to remove the corresponding assertions. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>