Description
Another contrived example:
#include <CL/sycl.hpp>
#include <cstdio>
using namespace cl::sycl;

int main() {
  // Profiling must be enabled on the queue to query event timestamps.
  queue q{property::queue::enable_profiling()};
  auto e = q.submit([&](handler &cgh) {
    int w = 512;
    cgh.single_task<class event_wait>([=]() mutable {
      for (int i = 0; i < 512000; ++i)
        ++w;
    });
    printf("%d \n", w);
  });
  // No wait on the queue or event before this query.
  auto nsTimeEnd = e.get_profiling_info<info::event_profiling::command_end>();
  return 0;
}
I tried this with ComputeCPP (1.0) and the Intel SYCL (ISYCL) compiler, in both cases using an Intel OpenCL selector. With ISYCL, the get_profiling_info call terminates in the SYCL runtime with the following diagnostic:
include/CL/sycl/detail/event_info.hpp:26: OpenCL API returns: -7 (CL_PROFILING_INFO_NOT_AVAILABLE)
If you add an explicit queue wait, everything works and the call returns a reasonable value. So it seems that e.get_profiling_info is not a blocking call and does not wait implicitly. This leads to nondeterministic results in this case: sometimes the call terminates, and sometimes the event has already completed, so you get your result back without a problem.
However, the same snippet compiled with ComputeCPP behaves differently: without an explicit queue wait, it still yields a reasonable return value, which suggests that its get_profiling_info waits for the event to complete before retrieving the information.
I'm not sure which behavior is correct as far as the specification is concerned (I skimmed it and I don't think there is a rule specifying that get_profiling_info forces a wait until event completion). So this may be more of a specification issue than an implementation issue; if that's the case, I can move the issue.