delta/beignet.git - gitlab.freedesktop.org: beignet/beignet.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	Bump to 1.0.3.Release_v1.0.3	Zhigang Gong	2015-04-17	2	-1/+4
\| \| \| \|	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
*	Doc: update known issue for the store/load pointer issue.	Zhigang Gong	2015-04-15	1	-0/+11
\| \| \| \| \| \| \|	We will defer the fix of this known issue to 1.1.0. Let's document it before that. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
*	Doc: update cmd parser issue for HSW platforms.	Zhigang Gong	2015-04-14	1	-5/+8
\| \| \| \| \| \| \|	For HSW platform, due to the atomic in L3 related registers' usage, we always need to disable the cmd parser. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
*	GBE: should initialize useDWLabel to false by default.	Zhigang Gong	2015-04-14	1	-1/+1
\| \| \| \| \| \|	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	Doc: add a command to install dependencies.	Zhigang Gong	2015-04-14	1	-0/+6
\| \| \| \|	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
*	utests: don't continue to run any case when fail to initialize device.	Zhigang Gong	2015-04-14	1	-1/+5
\| \| \| \| \|	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	runtime: don't try to open nonexistent render nodes or device files.	Zhigang Gong	2015-04-14	1	-1/+4
\| \| \| \| \|	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	CMake: allow to build with LLVM 3.3/3.4.	Zhigang Gong	2015-04-14	1	-2/+1
\| \| \| \| \| \| \| \|	Although the recommended LLVM version is 3.5, Beignet still support to build with LLVM 3.3/3.4. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: Mengmeng Meng <mengmeng.meng@intel.com>
*	GBE: fix a bug in byte scatter write.	Zhigang Gong	2015-04-13	1	-5/+11
\| \| \| \| \| \| \|	In uniform mode, we should set simd width to 1 and set noMask bit. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	GBE: fix an potential assertion in constant expanding pass.	Zhigang Gong	2015-04-13	1	-1/+1
\| \| \| \| \| \| \| \| \|	Using the inserPos is good enough. If using --insertPos, there is one potential issue when the insertPos is the head of a list then it will trigger an assertion. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
*	GBE: correct the instruction replacement logic in scalarize pass.	Zhigang Gong	2015-04-13	1	-9/+34
\| \| \| \| \| \| \| \| \| \| \| \| \|	When we want to delete an old instruction and replace it with the new one, we only call the LLVM IR's replace function which is not sufficient for the scalarize pass, as we also keep some local reference int eh vecVals map. We need to replace all of those local reference also. Otherwise, the deleted values may be used in the subsequent instructions which causes fatal error latter. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	GBE: refine error handling for private libva buffer sharing extension.	Zhigang Gong	2015-04-13	2	-0/+11
\| \| \| \| \|	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	GBE: Use actual bti information to determine a pointer's addressspace.	Zhigang Gong	2015-04-13	5	-8/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to the private constant buffer support, it introduces private address space mixed with constant address space some time. And more generic, one constant address space may be located in private address space in LLVM IR layer. Such as the following code: __kernel ... { const int2 foo[] = {{0, 1}, {2, 3}}; int2 data = foo[get_global_id(0) % 2]; } The foo is in private address space but we finally will use __constant bti to access it in Gen backend. The the above code will cause a assertion fail in gen insturcion selection stage, because it generate a vector loading instruction on a __constant buffer. So we should use the actual BTI data to determine one pointer's address space rather than get it from the LLVM IR layer. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	GBE: Extend front label ip to 32 bit on demand.	Zhigang Gong	2015-04-13	10	-29/+128
\| \| \| \| \| \| \| \| \| \|	If the front end label ip exceed 0xffff, then the backend will use real DW to represent each block's IP address. This is a dynamic behaviour according to the actual front end's label number. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	GBE: don't type cast register/labelindex to integer.	Zhigang Gong	2015-04-13	3	-33/+32
\| \| \| \| \|	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	GBE: extend backend label to 32 bit.	Zhigang Gong	2015-04-13	3	-15/+15
\| \| \| \| \| \| \| \| \|	The front end label is still 16 bit. But the auxiliary label could be larger than that. This is the preparation to support 32 bit label for both front end and backend. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	GBE: extend registers/tuples/immediates to 32bit wide.	Zhigang Gong	2015-04-13	7	-29/+30
\| \| \| \| \| \| \| \|	For some extremly large kernel, these values may be larger than 0xFFFF, we have to extend them to 32 bit. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	GBE: fix safe type definition.	Zhigang Gong	2015-04-13	1	-1/+1
\| \| \| \| \| \| \| \|	Should not use hard coded uint16_t for safe type definition. Prepare to extend some types to uint32_t. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	strip unsupported attributes and calling conventions.	Zhigang Gong	2015-04-13	6	-4/+134
\| \| \| \| \|	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	runtime: Enhance the error handling when flush gpgpu command queue.	Zhigang Gong	2015-04-10	9	-30/+30
\| \| \| \| \| \| \| \| \|	Beignet uses drm_intel_gem_bo_context_exec() to flush command queue to linux drm driver layer. We need to check the return value of that function, as it may fail when the application uses very large array. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	GBE: avoid to use the GenRegister::xxxgrf(simdWidth,xxx).	Zhigang Gong	2015-04-10	1	-47/+46
\| \| \| \| \| \| \| \| \| \|	All the gen registers should get the uniform information from the corresponding virtual registers. The use of GenRegister::xxxgrf on a virtual register is very dangerous which may cause inconsistency. This patch eliminate all the use of it in gen_insn_selection stage. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	GBE: correct some temporary virtual register's simdWidth.	Zhigang Gong	2015-04-10	2	-12/+19
\| \| \| \| \| \| \|	Otherwise, it may cause segfault in instruction encoding stage. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	update document.	Zhigang Gong	2015-03-30	1	-2/+33
\| \| \| \|	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
*	Fix: (v3) Event callback that were not executed when command was already ↵	David Couturier	2015-03-27	2	-22/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CL_COMPLETE + thread safety for callbacks When trying to register a callback on the clEnqueueReadBuffer command, since it is processed synchroniously all the time, the command was marked CL_COMPLETE every time. If the event returned by clEnqueueReadBuffer was then used to register a callback function, the callback function did no check to execute it if nessary. Modified the handling of the callback registration in cl_set_event_callback to only call the callback being created if it's status is already reached. Added thread safety measures for pfn_notify calls since the status value can be changed while executing the callback. Grouped the pfn_notify calls to a unified function cl_event_call_callback that handles thread safety: it queues callbacks in a node list while under the protection of pthread_mutex and then calls the callbacks outside of the pthread_mutex (this is required because the callback can deadlock if it calls a cl_api function that uses the mutex) Signed-off-by: David Couturier <david.couturier@polymtl.ca> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	Don't crash if device inaccessible	Rebecca N. Palmer	2015-03-24	1	-1/+0
\| \| \| \| \| \| \| \|	If /dev/dri/cardX is inaccessible, return CL_DEVICE_NOT_FOUND, don't assert-fail. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	FindLLVM: allow LLVM/Clang 3.6	Rebecca N. Palmer	2015-03-24	1	-2/+2
\| \| \| \| \| \| \| \|	As beignet now works with LLVM/Clang 3.6, accept this version when searching for llvm-config. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	Use matching versions of clang/llvm and libclang/libllvm	Rebecca Palmer	2015-03-24	2	-10/+31
\| \| \| \| \| \| \| \| \|	compile the OpenCL standard library with the same version of clang as will compile OpenCL user code, not plain "clang" (i.e. the system default version, which may be different). Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	BUGFIX: Prohibit 'make package' from doing system install of ICD vendor file	Brian Kloppenborg	2015-03-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	As presently written, a 'make package' will attempt to INSTALL the Beignet ICD loader to /etc/OpenCL/vendors whereas it should just do a local install and then package the file. The proposed change instructs CPack to include the `DESTDIR` variable when it calls `make install`, thus directing the desination for the ICD loader to a local directory instead of a system path. Signed-off-by: Brian Kloppenborg <brian@arrayfire.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	Generate NAN for UNDEF value in printf parser.stable	Junyan He	2015-03-18	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	llvm 3.6 will give a UNDEF value for NAN. The will cause the store instruction for UNDEF to be ignored. We need to modify it to NAN here. Comments from Zhigang: " The related commit of why LLVM won't just simply return NaN for such case is at: Make the sqrt intrinsic return undef for a negative input. As discussed here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140609/220598.html And again here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077168.html The sqrt of a negative number when using the llvm intrinsic is undefined. We should return undef rather than 0.0 to match the definition in the LLVM IR lang ref. This change should not affect any code that isn't using "no-nans-fp-math"; ie, no-nans is a requirement for generating the llvm intrinsic in place of a sqrt function call. Unfortunately, the behavior introduced by this patch will not match current gcc, xlc, icc, and possibly other compilers. The current clang/llvm behavior of returning 0.0 doesn't either. We knowingly approve of this difference with the other compilers in an attempt to flag code that is invoking undefined behavior. A front-end warning should also try to convince the user that the program will fail: http://llvm.org/bugs/show_bug.cgi?id=21093 Differential Revision: http://reviews.llvm.org/D5527 This patch is a workaround for the following scenario: printf("%f \n", sqrt(-1.0f)); " Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	runtime: fix a conformance bug in cl_get_kernel_arg_info.	Zhigang Gong	2015-03-18	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Accordying to OpenCL 1.2 Rev 17: "CL_KERNEL_ARG_TYPE_CONST is returned if the argument is a pointer and the referenced type is declared with the restrict or const qualifier. For example, a kernel argument declared as global int const x returns CL_KERNEL_ARG_TYPE_CONST but a kernel argument declared as global int const x does not." So only need to return CL_KERNEL_ARG_TYPE_CONST for pointer arguments. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
*	Build: fix the beignet icd name when CMAKE_INSTALL_FULL_LIBDIR is undefined.	Zhigang Gong	2015-03-17	1	-2/+8
\| \| \| \| \| \| \| \| \| \|	On some distributions, the CMAKE_INSTALL_FULL_LIBDIR or CMAKE_LIBRARY_ARCHITECTURE may be undefined. To avoid generate intel-beignet-.icd file name, we need to get rid of the extra "-" for such case. Reported by Igor Gnatenko. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
*	Bump version to 1.0.2Release_v1.0.2	Zhigang Gong	2015-03-16	2	-1/+4
\| \| \| \|	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
*	Build: set 3.5 as the stable LLVM version for beignet.	Zhigang Gong	2015-03-16	3	-11/+18
\| \| \| \| \| \|	And update document accordingly. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
*	GBE: Only emit multiply when immediate is not one.	Ruiling Song	2015-03-12	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As constant propagation will introduce constantExpr and gep instruction, I choose not to run constant propagation pass after RemoveGep pass. So, here only generate Multiply as needed. We may do such kind of optimization in Gen IR level in the future. This could fix the performance regression introduced by: "GBE: Import constantexpr lower pass from pNaCl" to the opencv case: opencv_perf_imgproc/OCL_BilateralFixture_Bilateral Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	GBE: Only add non-zero offset in gep lowering pass.	Ruiling Song	2015-03-10	1	-8/+10
\| \| \| \| \|	Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	GBE: add a new incompatible compile option -cl-finite-math-only.	Zhigang Gong	2015-03-09	1	-1/+1
\| \| \| \| \|	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: Meng Mengmeng <mengmeng.meng@intel.com>
*	Fix llvm3.6 build error.	Yang Rong	2015-03-05	2	-8/+0
\| \| \| \| \| \|	LLVM3.6 revert the c api LLVMLinkModules to LLVM3.5 last-minute. Consist with it. Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
*	Build: use -Bsymbolic to fix conflicts with other LLVM users.	Zhigang Gong	2015-03-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	As there may be some other LLVM users such as mesa, and they may link to different LLVM library. To avoid such type of conflicts, we use -Bsymbolic to disable the symbol preemption. This patch should fix the build bug at: https://bugs.freedesktop.org/show_bug.cgi?id=89325 Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	GBE: add fastcall support.	Zhigang Gong	2015-03-02	3	-2/+7
\| \| \| \| \| \| \| \|	I found some optimization pass may add fastcall attribute to some builtin functions. We need to add the corresponding support. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	GBE: support compare two bool variables.	Zhigang Gong	2015-03-02	3	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	LLVM 3.6 may generate the following instructions: %Pivot = icmp slt i1 %trunc49, false when do siwth lowering pass. To support it we must use GEN_TYPE_W to represent B rather than GEN_TYPE_UW and we also need to remove the corresponding assertions. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
*	GBE: remove the unecessary type check for SEL instructio.	Zhigang Gong	2015-03-02	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	The backend SEL instruction could support bool type since we change the bool representation to normal S16 data type. Now let us remove this assertion check. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
*	GBE: Fix fast-math issue under llvm 3.6.	Ruiling Song	2015-02-28	2	-7/+5
\| \| \| \| \| \| \| \|	"__ocl_math_fastpath_flag" was directly optimized out when compiling libocl under llvm 3.6 And set its initialization value after loading libocl. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	Enable multiarch (32/64-bit co-installation)	Rebecca N. Palmer	2015-02-27	1	-3/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is currently not possible to have 32- and 64-bit builds of beignet installed on the same system, as the path in intel-beignet.icd can only be one of the two installations. This fixes this by giving this file a different name when beignet is installed in a multiarch directory: intel-beignet-i386-linux-gnu.icd -> /usr/lib/i386-linux-gnu/beignet/libCL.so intel-beignet-x86_64-linux-gnu.icd -> /usr/lib/x86_64-linux-gnu/beignet/libCL.so Discussion and possible alternative approaches: http://lists.alioth.debian.org/pipermail/pkg-opencl-devel/Week-of-Mon-20150223/date.html While preparing this patch I noticed that intel-beignet.icd.in uses @LIB_INSTALL_DIR@/beignet/ rather than @BEIGNET_INSTALL_DIR@, which will obviously break if the latter is set directly. Is that a bug or is this intended to be an internal-only variable? Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	GBE: Support unaligned load/store of dword/qword in GenIR.	Ruiling Song	2015-02-27	1	-0/+76
\| \| \| \| \| \| \| \| \| \| \| \|	Although opencl does not allow unaligned load/store of dword/qword, LLVM still may generate such kind of instructions, especially large integer load/store is legalized into load/store of qword with possible unaligned address. The implementation is simple: for store, bitcast d/q word to vector of bytes before writing out, for load, load vector of bytes and then bitcast them to d/q word. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	GBE: remove constant expression handling code in gen writer pass.	Zhigang Gong	2015-02-27	1	-213/+1
\| \| \| \| \| \| \| \|	All the constant expressions should be expanded in prior to gen writer pass. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
*	GBE: expand constant expressions in constant vector	Zhigang Gong	2015-02-27	1	-0/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous expand constant pass will not expand a constant expression within a constant vector. So after adding the expand constant pass, we still get some constant expressions at gen writer pass and the worse case is there are some large integer hid in those constant expressions which are not supported in gen writer pass and will cause assertions. This patch will identify those constant vectors and expand all the possible constant expression elements. v2: minor fix including wording fix in commit log. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
*	build: use @BEIGNET_INSTALL_DIR@ for the icd file.	Zhigang Gong	2015-02-27	1	-1/+1
\| \| \| \| \| \| \|	We should use this macro rather than @LIB_INSTALL_DIR@/beignet/. Reported by Rebecca N. Palmer. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
*	Crash when hardware inaccessible	Rebecca N. Palmer	2015-02-27	1	-13/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=779213 Summary: On hardware where the Intel GPU is disabled, beignet was found to assert-fail on load, taking the application down with it before it can do anything (including checking for hardware via clGetDeviceIDs). This fixes this crash, allowing existing error handling to return CL_DEVICE_NOT_FOUND, and the application to then try other ICDs until it finds the right one for the hardware. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	GBE: unify element type before insertelement in legalize pass.	Ruiling Song	2015-02-26	1	-4/+41
\| \| \| \| \| \| \| \| \| \| \| \| \|	large integer type like i96 may be expanded to be low 64bit and high 32bit. When it is cast to <i32 x 3>, we should first make the expanded type to be of same type, here i32. insertelement could not insert element of different type. Then we can do insertelement one by one to generate the <i32 x 3> vector. This could fix the bug: https://bugs.freedesktop.org/show_bug.cgi?id=89167 Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
*	libocl: Directly scalarize built-in with vector input.	Ruiling Song	2015-02-25	1	-39/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This revert the following commit: "Re-apply "improve the build performance of vector type built-in function."" commitId: 06cce8178649759e12a3a353f0550189d371871b. I finally decide to do this because although below kind of program has less instructions and less compile-time, but it will also introduce extra memory access, which would cause bad run-time performance if the loop is not unrolled. If the loop is unrolled, it would be similar like scalarized version. OVERLOADABLE float16 func (float16 param0) { union{ float va[16]; float16 vv16; }uret; union{ float pa[16]; float16 pv16; }usrc0; usrc0.pv16 = param0; for(int i =0; i < 16; i++) uret.va[i] = func(usrc0.pa[i]); return uret.vv16; } I did some experiment on the affected built-in. I fixed the GPU frequency at 1050, and increase input data to 862000. The result is like below (obviously the scalarized version has better performance): bultin_asinh_float16: loop version: 200ms scalarized version: 150ms builtin_sinh_float16: loop version: 250ms scalarized version: 160ms And also this patch would reduce the generation of large integer. Although we support large integer legalization, I find sometime it is hard to legalize in very efficient way like large integer LE/GT. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>