Skip to content

Fix ROCM build by relaxing constness #3895

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

KerfuffleV2
Copy link
Collaborator

This change seems necessary to build with ROCM, at least on my system.

ggml-cuda.cu:7370:9: error: no matching function for call to 'hipblasGemmBatchedEx'
        cublasGemmBatchedEx(g_cublas_handles[id], CUBLAS_OP_T, CUBLAS_OP_N,
        ^~~~~~~~~~~~~~~~~~~
ggml-cuda.cu:32:29: note: expanded from macro 'cublasGemmBatchedEx'
#define cublasGemmBatchedEx hipblasGemmBatchedEx
                            ^~~~~~~~~~~~~~~~~~~~
ggml-cuda.cu:209:32: note: expanded from macro 'CUBLAS_CHECK'
        cublasStatus_t err_ = (err);                                                    \
                               ^~~
/opt/rocm/include/hipblas/hipblas.h:18459:32: note: candidate function not viable: 11th argument ('const void *const *') would lose const qualifier
HIPBLAS_EXPORT hipblasStatus_t hipblasGemmBatchedEx(hipblasHandle_t    handle,

Similar to a previous change that got "accepted"...

@KerfuffleV2 KerfuffleV2 requested a review from ggerganov November 2, 2023 01:29
@slaren
Copy link
Member

slaren commented Nov 2, 2023

Can we do this in a way that doesn't generate warnings?

@KerfuffleV2
Copy link
Collaborator Author

Can we do this in a way that doesn't generate warnings?

The code up above it with the (const char *) produces similar warnings. Compiling that file actually produces like 10 pages of warnings for me.

Do you have a suggestion for a way to do it that doesn't cause the warnings?

@KerfuffleV2
Copy link
Collaborator Author

This is still necessary to compile with ROCM. I checked with the latest changes.

@KerfuffleV2 KerfuffleV2 added bug Something isn't working build Compilation issues labels Nov 2, 2023
@slaren
Copy link
Member

slaren commented Nov 2, 2023

Currently, building ggml-cuda.cu with CUDA 12.3 and GCC 12.3.0 doesn't generate any warnings. This adds two warnings back that I had removed in #3891. I imagine that the warnings that you are seeing only happen when building with ROCM. If we can't figure a way to fix this for both CUDA and ROCM, then I guess we can live with the warnings, but it would be nice to not have them.

@KerfuffleV2
Copy link
Collaborator Author

I imagine that the warnings that you are seeing only happen when building with ROCM.

From what I understand, ROCM's "nvcc" is clang. If you build with clang, you'll probably see the massive warning spam I mentioned.

If we can't figure a way to fix this for both CUDA and ROCM

Unfortunately, it's kind of far above my pay grade. I messed around a bit and couldn't find a solution. The HIPBLAS function needs a const void ** for the first two pointers and void ** for the third.

I feel like (and am probably wrong) this has something to do with the fact that it's one chunk of memory we're saying is both const and not? But if that's the case then you'd have to malloc more than one chunk of memory, but from what I know allocation has a non-trivial performance cost.

@ggerganov
Copy link
Member

I think we can fix the warnings if we go back to 2 separate arrays with pointers instead of 1 array. The first array will hold the const src pointers and the second will hold the dst pointers. I will give it a try later if @KerfuffleV2 is not able to figure it out.

@KerfuffleV2
Copy link
Collaborator Author

I honestly don't know anything about this except my change got it to compile. I don't really know enough to refactor the actual code.

We can hold off on merging this in hopes of a better solution from someone who actually knows what they're doing. Worst case, this pull is available. Would be nice to not leave it broken for too long though. All 7 ROCM users will be sad!

@cebtenzzre
Copy link
Collaborator

This is the warning:

/home/cebtenzzre/src/forks/llama.cpp/ggml-cuda.cu: In function ‘void ggml_cuda_mul_mat_mat_batched_cublas(const ggml_tensor*, const ggml_tensor*, ggml_tensor*)’:
/home/cebtenzzre/src/forks/llama.cpp/ggml-cuda.cu:7369:126: warning: cast from type ‘void**’ to type ‘const void**’ casts away qualifiers [-Wcast-qual]
 7369 |         CUBLAS_CHECK(
      |                                                                                                                              ^                                    
/home/cebtenzzre/src/forks/llama.cpp/ggml-cuda.cu:7369:198: warning: cast from type ‘void**’ to type ‘const void**’ casts away qualifiers [-Wcast-qual]
 7369 |         CUBLAS_CHECK(
      |                                                                                                                                                                                                      ^

This is because casting to const void ** allows you to un-const a pointer later by storing a const void * via the const void ** and then retrieving it without the const via the void **.

The hipBLAS API does seem to require that the const pointers and non-const pointers are stored in separate arrays, as there is no safe way to cast between those types.

@slaren
Copy link
Member

slaren commented Nov 2, 2023

We could just remove -Wcast-qual from the compiler flags.

@cebtenzzre
Copy link
Collaborator

We could just remove -Wcast-qual from the compiler flags.

If we do, I'd prefer to do it for only that part of the code with something like:

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wcast-qual"
// offending code here
#pragma GCC diagnostic pop

@slaren
Copy link
Member

slaren commented Nov 2, 2023

Can you elaborate on why you think that this warning is important? What is the value of complaining about converting a non-const pointer to const, beyond acknowledging a flaw in C's type system? I very much would prefer to not litter the code with pragmas that are irrelevant to the code. This is not the first case either, I pretty much gave up on trying to make the ggml hash tables at least somewhat const-correct because of this warning.

@KerfuffleV2
Copy link
Collaborator Author

I guess what he's getting at is that it's probably not always a useless warning. Presumably someone thought it was worth adding to the compiler to warn the user about. So if it got disabled globally, maybe some mistakes would slip by that otherwise may have been caught - even if it's benign in this particular case.

Playing devil's advocate a bit, I care more about, you know... actually being able to compile the thing on my system. :)

@ggerganov
Copy link
Member

Superseded by #3913

@ggerganov ggerganov closed this Nov 2, 2023
@KerfuffleV2 KerfuffleV2 deleted the fix-rocm-gemmbatchedex branch November 17, 2023 03:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working build Compilation issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants