summaryrefslogtreecommitdiff
path: root/src/cl_event.h
Commit message (Collapse)AuthorAgeFilesLines
* Don't leak memory on long chains of eventsRebecca N. Palmer2018-08-201-1/+1
| | | | | | | | | | | | Delete event->depend_events when it is no longer needed, to allow the event objects it refers to to be freed. This avoids out-of-memory hangs in large dependency trees (e.g. long iterative calculations): https://launchpad.net/bugs/1354086 Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Improve event execute function.Junyan He2016-12-281-2/+4
| | | | | | | | | | | | | | | | | | | | | | | Modify the event exec function, make it as the uniformal entry for all event command execution. This will help the timestamp record and profiling feature a lot. V2: 1. Set event init state to bigger than CL_QUEUED. Event state should be set to CL_QUEUED exactly when it is to be queued. Profiling feature make this requirement clearer. We need to record the timestamp exactly when it it to be queued. So we need to add a additional state beyond CL_QUEUED. 2. Fix cl_event_update_timestamp_gen bugi, the CL_SUMITTED time may be less. GPU may record the timestamp of CL_RUNNING before CPU record timestamp of CL_SUMITTED. It is a async process and it is hard for us to control. According to SPEC, we need to record timestamp after some state is done. We can just now set CL_SUMITTED to CL_RUNNING timestamp if the CL_SUBMITTED timestamp is the bigger one. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Add profiling feature based on new event implementation.Junyan He2016-12-281-6/+4
| | | | | | | | | | | | | | | | | | | | | | | TODO: In opencl 2.0, a new profiling item called CL_PROFILING_COMMAND_COMPLETE is imported. It means that we need to record the time stamp of all the child events created by the "Kernel enqueing kernels" feature finish. This should be done after the "Kernel enqueing kernels" feature enabled. V2: Update event time stamp before inserting to queue thread, avoid MT issue. V3: Fixup overflow problem. V4: Fixup overflow to 0xfffffffffffffffff problem. Just take ownership and release event lock when call the update timestamp function. The update timestamp function may have block system call can should not hold the lock to call it. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Refine list related functions.Junyan He2016-12-281-2/+2
| | | | | | | Make the list related functions more clear and readable. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Modify all event related functions using new event handle.Junyan He2016-09-281-91/+55
| | | | | | | | | | | | | | | | Rewrite the cl_event, and modify all the event functions using this new event manner. Event will co-operate with command queue's thread together. v2: Fix a logic problem in event create failed. V3: Set enqueue default to do nothing, handle some enqueue has nothing to do. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Delete all the verbose locks and use list to store CL objects.Junyan He2016-09-231-1/+0
| | | | | | | | | | We use context's lock when we add and delete cl objects. Every cl object should use it's own lock to protect itself. We also add some helper functions to ease the adding and removing operations. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Runtime: Apply base object to cl_eventJunyan He2016-09-021-4/+5
| | | | | Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* runtime: refine the last_event in queue to a listPan Xiuli2015-10-131-0/+5
| | | | | | | | | | | | | | Refine the event struct to make last_event become a list to store all uncompeleted events and update them every queue flush. This can make sure all events created in the runtime have a chance to update status and run callback functions and then be deleted. We will also fix the memory leak problem casued by uncompeted events. This is a bugfix for https://bugs.freedesktop.org/show_bug.cgi?id=91710 The leaked events with gpu buffers will be unreferenced and cause other drm buffer leak and result in terrible memory leak. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* Calculate appropriate timestamps for cl profileMidhun Kodiyath2015-09-231-0/+11
| | | | | | | | | Fix to calculate the current cpu monotonic raw timestamp in nanoseconds for enqueued,submitted,start and finshed and send this to application based on the parameter queries. Signed-off-by: Midhun Kodiyath <midhunchandra.kodiyath@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* runtime: Enhance the error handling when flush gpgpu command queue.Zhigang Gong2015-04-101-1/+1
| | | | | | | | | Beignet uses drm_intel_gem_bo_context_exec() to flush command queue to linux drm driver layer. We need to check the return value of that function, as it may fail when the application uses very large array. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* Fix: (v3) Event callback that were not executed when command was already ↵David Couturier2015-03-271-1/+3
| | | | | | | | | | | | | | | | | | CL_COMPLETE + thread safety for callbacks When trying to register a callback on the clEnqueueReadBuffer command, since it is processed synchroniously all the time, the command was marked CL_COMPLETE every time. If the event returned by clEnqueueReadBuffer was then used to register a callback function, the callback function did no check to execute it if nessary. Modified the handling of the callback registration in cl_set_event_callback to only call the callback being created if it's status is already reached. Added thread safety measures for pfn_notify calls since the status value can be changed while executing the callback. Grouped the pfn_notify calls to a unified function cl_event_call_callback that handles thread safety: it queues callbacks in a node list while under the protection of pthread_mutex and then calls the callbacks outside of the pthread_mutex (this is required because the callback can deadlock if it calls a cl_api function that uses the mutex) Signed-off-by: David Couturier <david.couturier@polymtl.ca> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* License: adjust all license version to LGPL v2.1+.Zhigang Gong2014-11-111-1/+1
| | | | | | | | To make the license statement consistent to each other, adjust all license versions to v2.1+. Thus beignet should have a pure LGPL v2.1+ license. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* runtime: fix some subtle event bugs.Zhigang Gong2014-07-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | This patch fix the following two bugs in event handling. 1. When it's time to call a event's user call back function, we need to set the executed to true before the call. As that call back function may call into clReleaseEvent(), and if we don't set the executed status to true, it will enter infinite recursive loop. 2. After the user call clEnqueueNDRangeKernel to get a valid event, the user set a call back function to that event, and in that call back function, it will release that event. This scenario is totally correct. But our current event handling doesn't have a deadicated timer thread to update those on-the-fly events' status. Thus those events will not have a chance to get updated, and those call back function will not executed forever. To introduce a complete timer style thread to maintain this type of events is too heavy for this fix release. This patch choose an easy way to work around it. It will make sure the last gpgpu event to be finished before current task to be enqueued. After this patch, most of the OpenCV 3.0 cases could run smoothly without any serious issue. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* runtime: fix a gpgpu event and thread local gpgpu handling bug.Zhigang Gong2014-07-031-2/+5
| | | | | | | | | | | | | | | When pending a command queue, we need to record the whole gpgpu structure not just the batch buffer. For the following reason: 1. We need to keep those private buffer, for example those printf buffers. 2. We need to make sure this gpgpu will not be reused by other enqueuement. v2: Don't try to flush all user event attached to the queue. Just need to flush the current event when doing command queue flush. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* add [opencl-1.2] API clEnqueueBarrierWithWaitList.Luo2014-06-131-0/+2
| | | | | | | | | | | | | This command blocks command execution, that is, any following commands enqueued after it do not execute until it completes; API clEnqueueMarkerWithWaitList patch didn't push the latest, update in this patch. Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com> Signed-off-by: Luo <xionghu.luo@intel.com> Conflicts: src/cl_event.c
* add [opencl 1.2] API clEnqueueMarkerWithWaitList.Luo2014-06-131-1/+1
| | | | Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* fix event related bugs.Luo2014-05-221-0/+4
| | | | | | | | 1. remove repeated user events in list. 2. missed braces in loops. 3. fix barrier event reference not incresed. Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* Complete the feature of clGetEventProfilingInfo APIJunyan He2013-11-291-1/+2
| | | | | | | | | | | | | | | | | The profiling feature is now all supported. We use drm_intel_reg_read to get the current time of GPU when the event is queued and submitted, and use PIPI_CONTROL cmd to get the executing time of the GPU for kernel start and end. One trivial problem is that: The GPU timer counter is 36 bits with resolution of 80ns, so 2^36*80 = 5500s, about half an hour. Some test may last about 2~5 min and if it starts at about half an hour, this may cause a wrap back problem and cause the case fail. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Using the PIPE_CONTROL to implement get time stamp in gen backendJunyan He2013-10-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | We use PIPE_CONTROL to get the time stamps from GPU just after batch start and before batch flush. Using the first one the caculate the CL_PROFILING_COMMAND_START time and uing the second one to caculate the CL_PROFILING_COMMAND_END time. There are 2 limitations here: 1. Then end time stamp is just before the FLUSH, so the Flush time is not included, which will cause to lose the accuracy. Because the we do not know which event will be used to do the profling when it is created, adding another flush for end time stamp may add some overload. 2. The time of CPU and GPU can not be sync correctly now. So the time of CL_PROFILING_COMMAND_QUEUED and CL_PROFILING_COMMAND_SUBMIT which happens on CPU side can not be caculated correctly with the same base time of GPU. So we just simplely set them to CL_PROFILING_COMMAND_START now. For the Event not involving GPU operations such as ReadBuffer, all the times are 0 now. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Implement clEnqueueMarker and clEnqueueBarrier.Yang Rong2013-09-181-2/+4
| | | | | | | | | | | | | Add some event info to cl_command_queue. One is non-complete user events, used to block marker event and barrier. After these events become CL_COMPLETE, the events blocked by these events also become CL_COMPLETE, so marker event will also set to CL_COMPLETE. If there is no user events, need wait last event complete and set marker event to complete. Add barrier_index, for clEnqueueBarrier, point to user events, indicate the enqueue apis follow clEnqueueBarrier should wait on how many user events. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Refine and fix some event bugs.Yang Rong2013-09-181-1/+1
| | | | | Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Add openCL event support.Yang Rong2013-08-121-1/+65
| | | | | | | | | | | | | | | | | Now use the defer execute to wait events. If there is no user event waited, then using wait rendering to wait GPU event complete and call the enqueue api immediately. If there is the user events waited, then should prepare the the enqueue data, and resume the enqueue when all user events that waited complete. The achieve these, add the enqueue callback to user event, and add the all user event and other wait event list to enqueue callback. When set user event to complete, check all enqueue callbacks wait this event. Now, clEnqueueMark/clEnqueueBarrier still not impletement, and clEnqueueMapBuffer /clEnqueueMapImage is not consistency with spec. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Implement KHR ICD extensionSimon Richter2013-04-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | This adds a pointer to the dispatch table at the beginning of every object of type - cl_command_queue - cl_context - cl_device_id - cl_event - cl_kernel - cl_mem - cl_platform_id - cl_program - cl_sampler as required by the ICD specification. The layout of the dispatch table comes from the OpenCL ICD loader by Brice Videau <brice.videau@imag.fr> and Vincent Danjean <Vincent.Danjean@ens-lyon.org>. To avoid dispatch table entries being overwritten with the ICD loader's implementations of the CL functions (as would be the proper behaviour for the ELF loader), the -Bsymbolic option is given to the linker. Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Added all miniCL filesbsegovia2012-08-101-0/+27