Commit message | Author | Age | Files | Lines
...
* GBE: remove the unnecessary type check for SEL instruction. | Zhigang Gong | 2015-03-02 | 1 | -1/+0
  The backend SEL instruction can support the bool type since we changed the bool representation to the normal S16 data type, so remove this assertion check.
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
  Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
* GBE: Fix fast-math issue under llvm 3.6. | Ruiling Song | 2015-02-28 | 2 | -7/+5
  "__ocl_math_fastpath_flag" was optimized out entirely when compiling libocl under llvm 3.6, so set its initialization value after loading libocl instead.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Enable multiarch (32/64-bit co-installation) | Rebecca N. Palmer | 2015-02-27 | 1 | -3/+18
  It is currently not possible to have 32- and 64-bit builds of beignet installed on the same system, as the path in intel-beignet.icd can only point to one of the two installations. This patch fixes that by giving this file a different name when beignet is installed in a multiarch directory:
    intel-beignet-i386-linux-gnu.icd -> /usr/lib/i386-linux-gnu/beignet/libCL.so
    intel-beignet-x86_64-linux-gnu.icd -> /usr/lib/x86_64-linux-gnu/beignet/libCL.so
  Discussion and possible alternative approaches:
  http://lists.alioth.debian.org/pipermail/pkg-opencl-devel/Week-of-Mon-20150223/date.html
  While preparing this patch I noticed that intel-beignet.icd.in uses @LIB_INSTALL_DIR@/beignet/ rather than @BEIGNET_INSTALL_DIR@, which will obviously break if the latter is set directly. Is that a bug, or is this intended to be an internal-only variable?
  Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
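A minimal C sketch of the naming rule above, assuming the multiarch triplet is already known (in the real build the name is chosen by the CMake scripts; the helper `icd_file_name` is hypothetical and only illustrates the mapping):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not beignet's CMake logic): build the ICD file
 * name for a multiarch triplet. A NULL or empty triplet, i.e. a
 * non-multiarch install, keeps the legacy name "intel-beignet.icd".
 * Returns what snprintf returns: the length of the full name, or a
 * negative value on an encoding error. */
int icd_file_name(const char *triplet, char *out, size_t out_size)
{
    if (triplet == NULL || strlen(triplet) == 0)
        return snprintf(out, out_size, "intel-beignet.icd");
    return snprintf(out, out_size, "intel-beignet-%s.icd", triplet);
}
```

Because each architecture gets its own file name, the i386 and x86_64 ICD files no longer overwrite each other when both are installed.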
* GBE: Support unaligned load/store of dword/qword in GenIR. | Ruiling Song | 2015-02-27 | 1 | -0/+76
  Although OpenCL does not allow unaligned load/store of dword/qword, LLVM may still generate such instructions; in particular, large integer load/store is legalized into qword load/store with a possibly unaligned address.
  The implementation is simple: for a store, bitcast the d/q word to a vector of bytes before writing it out; for a load, load a vector of bytes and then bitcast it to a d/q word.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
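The commit performs this in GenIR with bitcasts to a byte vector; the same idea can be sketched in plain C on the host. The function names are hypothetical, and the little-endian byte order is an assumption of this sketch:

```c
#include <stdint.h>
#include <stddef.h>

/* Store a 64-bit value one byte at a time, so the destination needs no
 * alignment. Bytes are written least significant first (little-endian,
 * an assumption of this sketch). */
void store_qword_bytes(unsigned char *dst, uint64_t value)
{
    for (size_t i = 0; i < 8; i++)
        dst[i] = (unsigned char)(value >> (8 * i));
}

/* Load eight bytes from a possibly unaligned address and reassemble
 * them into a qword: the inverse of store_qword_bytes. */
uint64_t load_qword_bytes(const unsigned char *src)
{
    uint64_t value = 0;
    for (size_t i = 0; i < 8; i++)
        value |= (uint64_t)src[i] << (8 * i);
    return value;
}
```

Byte accesses are always naturally aligned, which is why splitting the access into bytes sidesteps the alignment requirement at the cost of more instructions.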
* GBE: remove constant expression handling code in gen writer pass. | Zhigang Gong | 2015-02-27 | 1 | -213/+1
  All constant expressions should be expanded before the gen writer pass.
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
  Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
* GBE: expand constant expressions in constant vector | Zhigang Gong | 2015-02-27 | 1 | -0/+46
  The previous expand constant pass does not expand a constant expression within a constant vector. So even after adding the expand constant pass, we still get some constant expressions in the gen writer pass. Worse, some of them hide large integers that the gen writer pass does not support, which causes assertions.
  This patch identifies those constant vectors and expands all possible constant expression elements.
  v2: minor fix including wording fix in commit log.
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
  Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
* build: use @BEIGNET_INSTALL_DIR@ for the icd file. | Zhigang Gong | 2015-02-27 | 1 | -1/+1
  We should use this macro rather than @LIB_INSTALL_DIR@/beignet/. Reported by Rebecca N. Palmer.
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* Crash when hardware inaccessible | Rebecca N. Palmer | 2015-02-27 | 1 | -13/+16
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=779213
  Summary: On hardware where the Intel GPU is disabled, beignet was found to assert-fail on load, taking the application down with it before it could do anything (including checking for hardware via clGetDeviceIDs).
  This patch fixes the crash, allowing the existing error handling to return CL_DEVICE_NOT_FOUND and the application to then try other ICDs until it finds the right one for the hardware.
  Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* GBE: unify element type before insertelement in legalize pass. | Ruiling Song | 2015-02-26 | 1 | -4/+41
  A large integer type like i96 may be expanded into a low 64-bit part and a high 32-bit part. When it is cast to <3 x i32>, we must first convert the expanded parts to the same type, here i32, because insertelement cannot insert an element of a different type. Then we can insert the elements one by one to build the <3 x i32> vector.
  This fixes the bug: https://bugs.freedesktop.org/show_bug.cgi?id=89167
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
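The unify-then-insert step can be sketched in C: the 64-bit part is first split into two 32-bit words so that all three lanes share one element type, and the lanes are then filled one at a time. The struct and function names are hypothetical, and placing the least significant word in lane 0 is an assumption (little-endian order):

```c
#include <stdint.h>

/* An i96 value after expansion: a low 64-bit part and a high 32-bit
 * part, as described in the commit message. */
typedef struct { uint64_t lo; uint32_t hi; } i96_expanded;

/* Unify the parts to a common 32-bit element type, then fill the lanes
 * one by one (the C analogue of successive insertelement calls on a
 * <3 x i32> vector). Lane 0 holds the least significant word. */
void i96_to_i32x3(i96_expanded v, uint32_t out[3])
{
    out[0] = (uint32_t)v.lo;          /* low half of the 64-bit part  */
    out[1] = (uint32_t)(v.lo >> 32);  /* high half of the 64-bit part */
    out[2] = v.hi;                    /* already 32 bits wide          */
}
```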
* libocl: Directly scalarize built-in with vector input. | Ruiling Song | 2015-02-25 | 1 | -39/+6
  This reverts the following commit:
  "Re-apply "improve the build performance of vector type built-in function.""
  commitId: 06cce8178649759e12a3a353f0550189d371871b.
  I finally decided to do this because, although the kind of program below has fewer instructions and a shorter compile time, it also introduces extra memory accesses, which hurt run-time performance if the loop is not unrolled. If the loop is unrolled, the result is similar to the scalarized version.
      OVERLOADABLE float16 func (float16 param0)
      {
        union{ float va[16]; float16 vv16; }uret;
        union{ float pa[16]; float16 pv16; }usrc0;
        usrc0.pv16 = param0;
        for (int i = 0; i < 16; i++)
          uret.va[i] = func(usrc0.pa[i]);
        return uret.vv16;
      }
  I ran some experiments on the affected built-ins, with the GPU frequency fixed at 1050 and the input data increased to 862000. The results are below (the scalarized version clearly performs better):
    builtin_asinh_float16: loop version: 200ms, scalarized version: 150ms
    builtin_sinh_float16:  loop version: 250ms, scalarized version: 160ms
  This patch also reduces the generation of large integers. Although we support large integer legalization, some operations, such as large integer LE/GT, are sometimes hard to legalize efficiently.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* libocl: define NULL to zero | Ruiling Song | 2015-02-25 | 1 | -1/+1
  Using (void*)0 fails to compile with clang 3.6. It is treated as a private address space pointer, so comparing a global pointer with NULL becomes a comparison between a private and a global pointer, which the OpenCL spec does not allow. Zero is allowed, because that is a comparison between a pointer and an integer.
  For the detailed discussion, please read: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2015-February/041429.html
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Correct the copying of the llvm link error message in genProgramLinkFromLLVM. | Yang Rong | 2015-02-13 | 1 | -4/+3
  Use strncpy to avoid overflow, and return errSize.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
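A minimal sketch of the bounded copy described above. The signature is hypothetical and not the actual genProgramLinkFromLLVM interface; it only shows the strncpy-plus-terminator pattern and reporting the copied size back to the caller:

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical helper: copy a linker error message into a caller-owned
 * buffer without overflowing it, always NUL-terminate, and report the
 * number of bytes written (excluding the terminator) via err_size. */
void copy_link_error(const char *msg, char *err, size_t err_buf_size,
                     size_t *err_size)
{
    if (err == NULL || err_buf_size == 0) {
        if (err_size)
            *err_size = 0;
        return;
    }
    strncpy(err, msg, err_buf_size - 1);
    err[err_buf_size - 1] = '\0';   /* strncpy may leave it unterminated */
    if (err_size)
        *err_size = strlen(err);
}
```

strncpy alone does not guarantee termination when the source is too long, so the explicit terminator on the last byte is the important part of the pattern.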
* Optimization of clEnqueueCopyImageToBuffer for the 16-byte-aligned case. | Chuanbo Weng | 2015-02-13 | 4 | -9/+57
  In some special cases we can change the image_channel_order to CL_RGBA and the image_channel_data_type to CL_UNSIGNED_INT32, so that one work item reads 16 bytes and the bandwidth is fully used.
  v2: For now only IMAGE2D is optimized, so add a check that leaves the code paths of other image types unaffected.
  Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
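The fast path can be sketched as an eligibility check plus a width conversion. The exact conditions beignet uses are not given in the commit message; the checks below (2D only, with the x origin, row span, and destination offset all multiples of 16 bytes) and both function names are assumptions for illustration:

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical eligibility check: only 2D images take the fast path,
 * and the starting x offset, the copied row span, and the destination
 * buffer offset must all be multiples of 16 bytes. */
bool can_copy_as_rgba_uint32(bool is_image2d, size_t bytes_per_pixel,
                             size_t origin_x, size_t region_w,
                             size_t dst_offset)
{
    if (!is_image2d)
        return false;
    return (origin_x * bytes_per_pixel) % 16 == 0 &&
           (region_w * bytes_per_pixel) % 16 == 0 &&
           dst_offset % 16 == 0;
}

/* When eligible, each RGBA/UINT32 "pixel" covers 16 bytes, so the
 * region width shrinks and one work item moves 16 bytes at a time. */
size_t rgba_uint32_width(size_t bytes_per_pixel, size_t region_w)
{
    return region_w * bytes_per_pixel / 16;
}
```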
* GBE: fix build error for LLVM 3.4/3.3. | Zhigang Gong | 2015-02-13 | 1 | -0/+16
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
  Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
* GBE: fix build error for llvm 3.6. | Zhigang Gong | 2015-02-13 | 1 | -1/+1
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* Remove useless llvm header file FindUsedTypes.h. | Yang Rong | 2015-02-12 | 2 | -2/+0
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Add llvm3.6 build support. | Yang Rong | 2015-02-12 | 7 | -5/+90
  There are some changes from llvm3.5:
  1. Some functions return std::unique_ptr instead of a raw pointer.
  2. Conversions from MetaNode to Value and from Value to MetaNode.
  V2: Fix llvm3.5 build error.
  V3: Print link and function materialize messages.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Use llvm-c's LLVMLinkModules instead of llvm::Linker::LinkModules. | Yang Rong | 2015-02-12 | 2 | -26/+14
  The definition of llvm::Linker::LinkModules changes in llvm3.6, while the definition of LLVMLinkModules is more stable, so use LLVMLinkModules to link.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* GBE: We need to use the exiting block here. | Ruiling Song | 2015-02-12 | 1 | -6/+12
  According to the API documentation, we should use the exiting block instead of the latch block; llvm 3.6 adds an assert for this.
  v2: Use the latch block if it is the exiting block; otherwise use the exiting block.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Change the KB and MB defines to an enum. | Yang Rong | 2015-02-11 | 1 | -3/+7
  Use an enum to avoid name conflicts.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
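A short C sketch of why an enum avoids the conflict, assuming the old code used plain `#define KB`/`#define MB` macros (the commit title suggests this; the declaration below is illustrative, not the actual beignet code). A macro rewrites every later `KB` token, while an enumerator is an ordinary scoped identifier that a local declaration can shadow:

```c
/* Enumerators obey normal C scoping rules, unlike macros. */
enum { KB = 1024, MB = KB * 1024 };

/* Compiles with the enum: the parameter shadows the enumerator.
 * With "#define KB 1024" this line would expand to "int 1024", a
 * syntax error. */
int scaled(int KB) { return KB * 2; }
```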
* GBE: Import PromoteIntegers pass from pNaCl | Ruiling Song | 2015-02-11 | 4 | -0/+658
  This is used to handle integer bit widths that are not a power of two.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
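The core idea of integer promotion can be sketched in C for an i24 value: keep it in a 32-bit container and, after each operation, re-derive the upper bits from bit 23 so the result still behaves like a wrapping signed 24-bit integer. This sketch is illustrative and not the pNaCl pass itself; the helper names are hypothetical:

```c
#include <stdint.h>

/* Sign-extend the low 24 bits of a 32-bit container. Flipping the i24
 * sign bit and re-biasing maps 0x800000..0xFFFFFF to negative values
 * without relying on implementation-defined conversions. */
int32_t sext24(uint32_t bits)
{
    bits &= 0xFFFFFFu;                        /* keep the 24 meaningful bits */
    return (int32_t)(bits ^ 0x800000u) - 0x800000;
}

/* i24 addition performed on the promoted i32 type: add with unsigned
 * wrap-around, then restore i24 semantics. */
int32_t add_i24(int32_t a, int32_t b)
{
    return sext24((uint32_t)a + (uint32_t)b);
}
```

Without the re-extension step, `0x7FFFFF + 1` would stay `0x800000` in 32 bits instead of wrapping to the most negative i24 value.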
* GBE: Load/store should use the same address space as before. | Ruiling Song | 2015-02-11 | 1 | -4/+6
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* GBE: Fix a bug in legalize pass. | Ruiling Song | 2015-02-11 | 1 | -3/+3
  The type may be float.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* GBE: Fix a build error against llvm release version | Ruiling Song | 2015-02-11 | 1 | -1/+4
  The DEBUG macro tries to link llvm::DebugFlag and llvm::isCurrentDebugType(), which are absent from the release version of the LLVM library, so define it to be empty. The problem occurs when building a debug version of beignet against a release version of llvm.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
  Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
* GBE: expand large integer instructions | Ruiling Song | 2015-02-11 | 5 | -11/+779
  This pass, also from PNaCl, lowers large integers into smaller ones. Unlike the previous legalize pass, it easily handles various instructions, such as load or phi, with large integer operands.
  I find that instruction combining makes it hard to eliminate empty blocks completely: the CFG simplify pass may generate a switch-case when preventing an empty block, and the switch lower pass that runs after CFG simplify may leave empty blocks in Gen IR. So I temporarily disable the empty block check in the backend.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
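As an illustration of lowering a large integer into smaller ones, here is a C sketch of a 128-bit addition expressed with 64-bit operations only. The struct and function names are hypothetical; the sketch shows the general technique, not the pass's actual output:

```c
#include <stdint.h>

/* A 128-bit value split into two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_parts;

/* Add the low halves, detect the carry by unsigned wrap-around, and
 * propagate it into the sum of the high halves. */
u128_parts add_u128(u128_parts a, u128_parts b)
{
    u128_parts r;
    r.lo = a.lo + b.lo;
    uint64_t carry = r.lo < a.lo;   /* wrapped => carry out of low half */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```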
* GBE: Import constantexpr lower pass from pNaCl | Ruiling Song | 2015-02-11 | 7 | -15/+299
  The idea is to lower each ConstantExpr into an Instruction.
  Fix the PtrToInt and IntToPtr implementation: it simply maps to a convert if the type sizes differ.
  Fix a bitcast from integer to float issue. Because we expand llvm::ConstantExpr into llvm::Instruction, we can meet the following situation:
      %10 = bitcast i32 1073741824 to float
      %11 = fcmp float %10 0.000000e+00
  This is translated into GenIR:
      %100 = loadi S32 1073741824
      %101 = fcmp %100, 0.0f
  In later instruction selection, we may directly getFloat() from %100.
  Signed-off-by: Ruiling Song <ruiling.song@intel.com>
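The C sketch below shows why the distinction in this example matters: 1073741824 is 0x40000000, the IEEE-754 single-precision bit pattern of 2.0, so reading the loaded immediate as a float must reinterpret its bits rather than convert its value. The helper name is hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a 32-bit integer as a float, as
 * "bitcast i32 ... to float" requires. memcpy is the portable way to
 * type-pun in C. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

A value conversion, `(float)1073741824u`, would instead yield 1073741824.0f, so the fcmp above would compare against the wrong number.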
* remove unsafe define -D__$(USER)__ | Andreas Beckmann | 2015-02-11 | 1 | -2/+0
  Funny things may happen with usernames like 'asm', 'attribute', 'x86_64', 'i386', and so on.
  This breaks on usernames with non-alphanumeric characters ('-', '.').
  Signed-off-by: Andreas Beckmann <anbe@debian.org>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* prefer newer llvm versions over 3.3 | Andreas Beckmann | 2015-02-11 | 1 | -2/+2
  v2: LLVM 3.3 is better than 3.4, so switch the order of 3.4 and 3.3.
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Correct the bit field error for indirect addressing on Gen8 | Junyan He | 2015-02-11 | 1 | -2/+2
  Signed-off-by: Junyan He <junyan.he@linux.intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* runtime: don't free the host_ptr for a subbuffer. | Zhigang Gong | 2015-02-09 | 1 | -1/+3
  When a buffer is created with CL_MEM_ALLOC_HOST_PTR, the runtime needs to free the host_ptr in the destructor. But if the buffer is a subbuffer, its host_ptr was not allocated by the subbuffer itself, so we must not free it here. Otherwise, it may cause weird errors such as "corrupted double-linked list..".
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
  Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
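A simplified C sketch of the ownership rule: a sub-buffer's host pointer points into its parent's allocation, so only the parent may free it. The struct and functions are hypothetical stand-ins, not beignet's cl_mem implementation:

```c
#include <stdlib.h>
#include <stdbool.h>

/* Hypothetical, simplified memory object. */
typedef struct mem_obj {
    void *host_ptr;
    bool alloc_host_ptr;        /* created with CL_MEM_ALLOC_HOST_PTR */
    struct mem_obj *parent;     /* non-NULL for a sub-buffer          */
} mem_obj;

/* True if this object allocated, and therefore must free, host_ptr. */
bool owns_host_ptr(const mem_obj *m)
{
    return m->alloc_host_ptr && m->parent == NULL;
}

void mem_obj_destroy(mem_obj *m)
{
    if (owns_host_ptr(m))
        free(m->host_ptr);      /* a sub-buffer freeing this too would
                                   free memory it does not own */
    m->host_ptr = NULL;
}
```

Sub-buffers inherit the parent's flags, so checking the CL_MEM_ALLOC_HOST_PTR flag alone is not enough; the parent check is what prevents the invalid free.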
* Implement 1D/2D image array related cl_mem_kernel_copy_image in cl way instead of cpu way. | Chuanbo Weng | 2015-02-06 | 9 | -12/+172
  Before this patch, cl_mem_kernel_copy_image did a CPU memory copy to copy image array objects, which is very slow for large images. This patch implements the image array copy the CL way, which dramatically accelerates image-array-related clEnqueueCopyImage. The clCopyImage case in the OpenCL conformance test is no longer blocked.
  Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Return error, don't crash, on allocation failure | Rebecca N. Palmer | 2015-02-06 | 1 | -10/+13
  As previously noted, when cl_mem_allocate fails, its error handling then calls cl_mem_delete on the incompletely set-up buffer, which aborts at assert(mem->ctx). This patch appears to fix the problem, but be warned: I don't know this code well enough to know what else it might break.
  Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* runtime: fix a potential null pointer dereference. | Zhigang Gong | 2015-02-06 | 1 | -6/+7
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
  Reviewed-by: Xionghu Luo <xionghu.luo@intel.com>
* update document. | Zhigang Gong | 2015-02-06 | 1 | -3/+6
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* Add document to describe the details of libva buffer sharing. | Chuanbo Weng | 2015-02-06 | 2 | -0/+68
  This document describes the steps for using libva buffer sharing and how to build and run the corresponding example.
  Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
  Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
* Add example to show libva buffer sharing with extension clCreateImageFromLibvaIntel. | Chuanbo Weng | 2015-02-06 | 5 | -0/+553
  This example reads a source nv12 file into a VASurface and creates a target VASurface, then creates corresponding cl image objects from them. After ocl applies a mirror-effect post-processing step to the source VASurface, the target VASurface is shown on screen by default.
  The code that loads an nv12 file into a VASurface is adapted from libva/test/encode/avcenc.c.
  v2: Delete 1920x1080.nv12 and 640x480.nv12 because of their large size; add 256_128.nv12 as the default test image.
  v3: 1. Reorganize files: add libva as a submodule, then use its display-related files. 2. Show the result on screen by default instead of saving it to a file. 3. Fix warnings.
  v4: Fix whitespace format warnings.
  v5: 1. Modify upload_nv12_to_surface to read an nv12 file and then upload it to an NV12 VASurface. Also modify store_surface_to_nv12. 2. Change the cl post-processing kernel from a gray effect to a mirror effect, which makes the demo cooler. 3. Minor fixes of other problems.
  v6: Remove unnecessary OUTPUT_NV12_DEFAULT related code.
  Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
  Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
  Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
* Add submodule libva for examples. | Zhigang Gong | 2015-02-06 | 2 | -0/+3
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* Backend: Fix a printf bug caused by IR reordering. | Junyan He | 2015-02-06 | 4 | -14/+34
  LLVM may generate IR in which the if.else block comes before the if.then block. We parse the printf statements before llvm_to_gen, and the later if-else analysis reorders the if-else blocks. As a result, when we print the output, we may get the message of a different printf statement.
  Add the printf index to the index buffer to record which statement each result belongs to, which fixes this bug.
  Signed-off-by: Junyan He <junyan.he@linux.intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
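A C sketch of the indexing idea: each output record carries the index of the printf statement that produced it, and the host selects the format string by that index instead of relying on statement order in the IR. The record layout and function are hypothetical and much simpler than beignet's printf buffer format:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical output record tagged with its printf statement index. */
typedef struct { int printf_index; int value; } printf_record;

/* Format one record with the format string chosen by its index.
 * Returns snprintf's result, or -1 for an unknown index. */
int format_record(const printf_record *rec, const char *const formats[],
                  size_t n_formats, char *out, size_t out_size)
{
    if (rec->printf_index < 0 || (size_t)rec->printf_index >= n_formats)
        return -1;
    return snprintf(out, out_size, formats[rec->printf_index], rec->value);
}
```

Because the lookup uses the recorded index, reordering the if.then and if.else blocks no longer changes which format string a value is printed with.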
* Fix a bug in the 1D image array test case. | Junyan He | 2015-02-04 | 1 | -6/+8
  Because of a HW limitation, the vertical stride is aligned to at least 2. For a 1D array image, the data therefore has gaps, and the image is twice as large as the buffer size we expected. Using clEnqueueWriteImage is safe and fixes this bug.
  Signed-off-by: Junyan He <junyan.he@linux.intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
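The size doubling can be sketched in C, assuming that "vertical stride aligned to 2" means each slice is padded to a multiple of 2 rows. A 1D array slice is one row high, so it occupies two rows of storage. The helper names are hypothetical:

```c
#include <stddef.h>

/* Round a row count up to a multiple of 2 (the assumed alignment). */
size_t aligned_rows(size_t rows)
{
    return (rows + 1) & ~(size_t)1;
}

/* Storage for a 1D image array: each one-row slice is padded to two
 * rows, twice what a tightly packed buffer of the same data needs. */
size_t image_1d_array_bytes(size_t row_pitch, size_t array_size)
{
    return row_pitch * aligned_rows(1) * array_size;
}
```

Copying a packed host buffer into such storage with a flat memcpy would misplace every slice after the first, which is why going through clEnqueueWriteImage, which honours the image layout, is the safe choice.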
* check the predication in case of endless loop. | Luo | 2015-01-28 | 1 | -0/+5
  v2: Add comment from ruiling: or dead loop, it has an unconditional branch at its end. Simply not treating it as a loop is also acceptable.
  I ran into this problem when I executed
    ./opencv_test_imgproc --gtest_filter=OCL_Imgproc/HoughLines.RealImage/0
  and this patch fixes it.
  Signed-off-by: Luo <xionghu.luo@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* GBE: add GEN_TYPE_HF to getTypeSize. | Zhigang Gong | 2015-01-26 | 1 | -0/+1
  Gen8 uses GEN_TYPE_HF, so getTypeSize needs to support it.
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
  Tested-by: Zhu Bingbing <bingbingx.zhu@intel.com>
* Fix bitcast test case failure caused by the long type. | Junyan He | 2015-01-26 | 1 | -5/+5
  Unlike uint64_t, ulong has a different size on i386 and x86_64, which causes the test case to fail.
  Signed-off-by: Junyan He <junyan.he@linux.intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
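A small C sketch of the portable choice: OpenCL C's ulong is always 64 bits, while the host's unsigned long is 32 bits on i386 Linux and 64 bits on x86_64 Linux, so host buffers that mirror kernel ulong data should use a fixed-width type. The alias and helper names are hypothetical (the OpenCL headers provide cl_ulong for this purpose):

```c
#include <stdint.h>
#include <stddef.h>

/* Host mirror of a kernel "ulong" element: 8 bytes on every host. */
typedef uint64_t host_ulong;
_Static_assert(sizeof(host_ulong) == 8, "kernel ulong is 64 bits");

/* Bytes needed for `count` kernel ulong values. With "unsigned long"
 * this would be half as large on i386. */
size_t ulong_buffer_bytes(size_t count)
{
    return count * sizeof(host_ulong);
}
```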
* Add a check for src and dst spanning different numbers of registers. | Junyan He | 2015-01-23 | 1 | -2/+41
  On IVB and HSW, when dst spans two registers, the source MUST span two registers. So the following instructions are illegal:
    mov (16) r104.0<2>:uw r126.0<8;8,1>:uw { Align1, H1 }
    mov (16) r104.1<2>:uw r111.0<8;8,1>:uw { Align1, H1 }
    mov (16) r106.0<2>:uw r110.0<8;8,1>:uw { Align1, H1 }
    mov (16) r106.1<2>:uw r109.0<8;8,1>:uw { Align1, H1 }
  Add the check and split such an instruction into 2 SIMD8 instructions.
  TODO: These instructions are allowed on BDW; this needs improvement.
  Signed-off-by: Junyan He <junyan.he@linux.intel.com>
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
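The rule can be sketched in C by computing how many 32-byte GRFs a region touches. This is a simplification: only 1D regions are modelled, and `<8;8,1>` is treated as 16 contiguous elements. For the first listed instruction, dst `r104.0<2>:uw` with 16 elements spans 64 bytes (2 registers) while src `<8;8,1>:uw` spans 32 bytes (1 register), so the instruction must be split. The function names are hypothetical:

```c
#include <stdbool.h>

enum { GRF_BYTES = 32 };   /* general register size on IVB/HSW */

/* Number of GRFs touched by `width` elements with horizontal stride
 * `hstride` (in elements) and element size `type_bytes`, starting
 * `sub_offset_bytes` into the first register. */
int grf_span(int width, int hstride, int type_bytes, int sub_offset_bytes)
{
    int first = sub_offset_bytes;
    int last = sub_offset_bytes + (width - 1) * hstride * type_bytes
               + type_bytes - 1;
    return last / GRF_BYTES - first / GRF_BYTES + 1;
}

/* The IVB/HSW rule: if dst spans two registers, src must as well;
 * otherwise the SIMD16 instruction is split into two SIMD8 halves. */
bool needs_simd8_split(int dst_span, int src_span)
{
    return dst_span == 2 && src_span != 2;
}
```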
* Add test case for long bitcast. | Junyan He | 2015-01-23 | 3 | -0/+275
  Signed-off-by: Junyan He <junyan.he@linux.intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* update utest to loosen userptr limitation | Guo Yejun | 2015-01-23 | 2 | -2/+2
  The driver's limitation has been loosened from page-size alignment to cache-line-size alignment, so update the utest accordingly.
  Signed-off-by: Guo Yejun <yejun.guo@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* loosen the alignment limitation for host_ptr of CL_MEM_USE_HOST_PTR | Guo Yejun | 2015-01-23 | 3 | -4/+22
  Currently both host_ptr and the buffer size must be page aligned. Loosen this so that host_ptr only needs cache line size (64 byte) alignment, with no limitation on the size.
  Signed-off-by: Guo Yejun <yejun.guo@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
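The loosened check can be sketched in C as a simple mask test on the pointer value, using the 64-byte cache line size stated in the commit message. The function name is hypothetical, and no size check is made because the size is no longer restricted:

```c
#include <stdint.h>
#include <stdbool.h>

enum { CACHE_LINE_SIZE = 64 };   /* bytes, as stated in the commit */

/* A host_ptr is usable for CL_MEM_USE_HOST_PTR when it is aligned to
 * a cache line; the buffer size is unrestricted. */
bool host_ptr_usable_for_userptr(const void *host_ptr)
{
    return ((uintptr_t)host_ptr & (CACHE_LINE_SIZE - 1)) == 0;
}
```

Under the previous rule the mask would have been the page size minus one (typically 4095), so far fewer application buffers qualified.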
* correct the cache line size to be 64 | Guo Yejun | 2015-01-23 | 2 | -2/+2
  The correct cache line size is 64 bytes, not 128.
  Signed-off-by: Guo Yejun <yejun.guo@intel.com>
  Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* GBE: fix popcount bugs. | Zhigang Gong | 2015-01-22 | 4 | -10/+20
  We need to pass the correct popcount source type to the backend.
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
  Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
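A C sketch of why the source type matters for popcount: the same sign-extended bit pattern holding `(char)-1` has 8 set bits as an 8-bit value but 32 as a 32-bit one. The helper is illustrative and not beignet's implementation:

```c
#include <stdint.h>

/* Count set bits in the low `width` bits of v (1 <= width <= 64). */
int popcount_bits(uint64_t v, int width)
{
    if (width < 64)
        v &= ((uint64_t)1 << width) - 1;   /* drop sign-extended bits */
    int n = 0;
    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

If the backend is told the wrong source width, it counts the sign-extension bits too, which is the kind of wrong result this fix addresses.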
* GBE: fix an ACC register related instruction scheduling bug | Zhigang Gong | 2015-01-21 | 3 | -2/+18
  Some instructions modify the ACC register in the gen_context stage, which the current instruction scheduling algorithm does not recognize. This patch fixes the bug by checking all the SEL_OPs that may change the ACC implicitly.
  The corresponding bugzilla link is: https://bugs.freedesktop.org/show_bug.cgi?id=88587
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
  Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* Bump version to 1.0.1. (tag: Release_v1.0.1) | Zhigang Gong | 2015-01-19 | 2 | -2/+5
  Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>