Fix cuda mul mat for pascal cc==610 #6636
I'm not very familiar with this section of code, but I wonder if a special exception should be made for the GTX 1080Ti, similar to the way that other devices are specially accounted for, such as the |
Are you compiling with HIPBLAS? I wonder if line 1907 needs to be modified to see if Maybe we change: to: ? |
Unfortunately, I do not have a HIP device, so I have not tested with HIPBLAS. I have tested on a GTX 1080 Ti (cc == 610) and a V100 (cc == 700), and the results are correct on both.
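For readers unfamiliar with the `cc` values quoted in this thread: they appear to be the CUDA compute capability written as a single integer, so that 6.1 (GTX 1080 Ti, Pascal) becomes 610 and 7.0 (V100, Volta) becomes 700. The following is a minimal sketch under that assumption (`cc = 100*major + 10*minor`). The helper names `encode_cc` and `is_pascal` are hypothetical and are not taken from the code under review.

```cpp
// Hypothetical helper: encode a compute capability major.minor as the
// three-digit integer form used in this thread (6.1 -> 610, 7.0 -> 700).
// Assumption: cc = 100*major + 10*minor.
constexpr int encode_cc(int major, int minor) {
    return 100 * major + 10 * minor;
}

// Hypothetical helper: true for any Pascal-generation device (6.x),
// i.e. an encoded value in the half-open range [600, 700).
constexpr bool is_pascal(int cc) {
    return cc >= 600 && cc < 700;
}

// Compile-time checks for the two devices mentioned in the thread.
static_assert(encode_cc(6, 1) == 610, "GTX 1080 Ti is compute capability 6.1");
static_assert(encode_cc(7, 0) == 700, "V100 is compute capability 7.0");
static_assert(is_pascal(610) && !is_pascal(700), "610 is Pascal, 700 is not");
```

With this encoding, a branch that tests `cc >= 700` is skipped on a GTX 1080 Ti, while a branch that tests `cc >= 610` is taken, which is why the exact threshold in a condition matters for this card.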
Okay. I'm not familiar with these cards, so apologies if my questions seem ignorant.
If cc == 610, then
Should evaluate to false, and removing All that to say, if cc == 610, then why is this change doing anything -- if I'm reading the code correctly, it shouldn't be making a difference? |
Removing |
As-is, this is going to reduce the benefit of PR #4682. We need to either extend the relevant kernels, or better identify when one of them does not support the input.
This is an issue with the tests, not the CUDA code. The test case in question doesn't actually show up when evaluating a model, so it's only partially implemented. In terms of performance, it wouldn't make sense to run a matrix multiplication like that on a GTX 1080 Ti either. Under no circumstances should this PR be merged as-is.
PR with a fix that does not reduce performance: #6667 |
This should now be fixed on master. |
The following error occurs when executing the test-backend-ops script on the current master branch using a GTX 1080Ti:
This PR fixes it.