Add a hostcall interface #1140
Codecov Report
 
 @@            Coverage Diff             @@
##           master    #1140      +/-   ##
==========================================
+ Coverage   66.97%   75.94%   +8.97%     
==========================================
  Files         118      119       +1     
  Lines        7955     7737     -218     
==========================================
+ Hits         5328     5876     +548     
+ Misses       2627     1861     -766     
Hmm, one problem is that the following deadlocks:

# hostcall watcher task/thread
Threads.@spawn begin
    while true
        println(1)
        sleep(1)
    end
end
# the application, possibly getting stuck in a CUDA API call that needs the kernel to finish
while true
    ccall(:sleep, Cuint, (Cuint,), 1)
end

I had expected this when running with a single thread, because the main task isn't preemptible, but even with multiple threads the main task getting stuck apparently blocks the scheduler, keeping the hostcall watcher thread from making progress. That would cause a deadlock. @vchuravy any thoughts? How does AMDGPU.jl solve this?
And for some preliminary time measurements: So 2.25us 'per' hostcall (uncontended, and nonblocking since the call doesn't return anything). That's not great, but it's a start. I also don't want to build on this before I'm sure this won't deadlock applications. And for reference,
Are you sure you are blocking the scheduler, or are you blocking GC? You need at least a safepoint in the loop.

In which loop? The first does a sleep, so that's a yield point. The second loop doesn't need to be a loop; it could as well be an API call that blocks 'indefinitely'.
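
For readers unfamiliar with the safepoint remark: Julia's collector is stop-the-world, so every thread has to reach a safepoint before a collection can proceed. A minimal sketch of adding one to a tight, non-allocating loop follows; `busy_sum` and the polling interval are made up for illustration and are not part of this PR.

```julia
# Hypothetical compute loop that never allocates and never yields.
# GC.safepoint() gives the runtime a point at which it may pause this
# thread when another thread has requested a garbage collection.
function busy_sum(n::Int)
    acc = 0
    for i in 1:n
        acc += i
        # Polling interval of 1024 iterations is arbitrary.
        i % 1024 == 0 && GC.safepoint()
    end
    return acc
end

busy_sum(10_000)
```

This only helps for loops written in Julia. A thread that is stuck inside a blocking foreign call, as in the second loop of the reproducer, cannot reach such a safepoint until the call returns.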
Seems to deadlock regularly on CI, so I guess this will have to wait unless we have either application threads, or a way to make CUDA's blocking API calls yield.
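
As a point of comparison, Base already offers one way to make a blocking C call yield: `@threadcall` runs the call on a libuv worker thread and suspends only the calling task. The sketch below uses POSIX `usleep` purely as a stand-in for a blocking API call, and the function name is hypothetical. Whether this carries over to CUDA driver calls is an open assumption, since those depend on thread-local context state.

```julia
# Hypothetical demo: a watcher task keeps running while the "application"
# sits in a blocking foreign call issued through @threadcall.
function watcher_ticks_during_blocking_call()
    ticks = Threads.Atomic{Int}(0)

    # Stand-in for the hostcall watcher: tick three times, yielding in between.
    watcher = Threads.@spawn for _ in 1:3
        Threads.atomic_add!(ticks, 1)
        sleep(0.05)
    end

    # Blocking call executed off the Julia scheduler threads (POSIX only).
    @threadcall(:usleep, Cint, (Cuint,), 300_000)

    wait(watcher)
    return ticks[]
end

watcher_ticks_during_blocking_call()
```

Replacing the `@threadcall` line with a plain `ccall` gives the pattern from the reproducer, where the calling thread is occupied for the full duration of the call.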
Fixes #440
Initial, simple implementation. I still need to steal ideas from AMDGPU.jl and optimizations from #567, but the initial goal is a simple but correct implementation that we can use for unlikely code paths such as error reporting.
Demo:
Depends on #1110.
Probably requires Base support like JuliaLang/julia#42302
cc @jpsamaroo
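
For context, the general hostcall pattern (a device-side caller posts a request into shared memory and a host-side watcher services it) can be sketched on the CPU alone. Everything below is hypothetical and is not the interface added by this PR: the `Mailbox` type, the state constants, and both functions are illustrative names, and a second Julia task plays the role of the GPU kernel.

```julia
# Hypothetical mailbox states; a real implementation would keep the mailbox
# in memory visible to both the GPU and the CPU.
const IDLE, REQUESTED, DONE = 0, 1, 2

struct Mailbox
    state::Threads.Atomic{Int}
    arg::Threads.Atomic{Int}
    result::Threads.Atomic{Int}
end
Mailbox() = Mailbox(Threads.Atomic{Int}(IDLE),
                    Threads.Atomic{Int}(0),
                    Threads.Atomic{Int}(0))

# Host side: poll until a request arrives, run `f` on it, publish the result.
function watch(f, box::Mailbox)
    while box.state[] != REQUESTED
        yield()  # let other tasks run while polling
    end
    box.result[] = f(box.arg[])
    box.state[] = DONE
    return nothing
end

# "Device" side: post a request, then wait until the host has answered.
function hostcall(box::Mailbox, x::Int)
    box.arg[] = x
    box.state[] = REQUESTED
    while box.state[] != DONE
        yield()  # a real kernel would spin or sleep on the device instead
    end
    return box.result[]
end

box = Mailbox()
watcher = Threads.@spawn watch(x -> x + 1, box)
answer = hostcall(box, 41)
wait(watcher)
```

The sketch also shows why the deadlock discussed in the conversation matters: if the watcher never gets scheduled, the caller waits on `DONE` forever.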