cuBLAS CUDA error 801 on Maxwell Titan X #1447

Description

@jmalfara

It's an old card, I know, but hopefully there is something that can be done.

https://github.com/ggerganov/whisper.cpp/blob/master/ggml-cuda.cu#L7069-L7071

There seems to be an issue with Maxwell cards not supporting some operation in CUDA. I'm not sure exactly which instruction is unsupported, but maybe someone can provide some insight?
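For reference, CUDA runtime error 801 is cudaErrorNotSupported, meaning the runtime itself rejected an operation (as opposed to a kernel failing to run). A minimal standalone probe along these lines (illustrative only, not code from whisper.cpp) prints the device's compute capability and confirms the error-code mapping:

```cpp
// probe.cu - standalone diagnostic, not part of whisper.cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "failed to query device 0\n");
        return 1;
    }
    // A Maxwell Titan X should report compute capability 5.2
    printf("device 0: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);

    // cudaErrorNotSupported has the numeric value 801 (the code in the log below)
    printf("error 801 maps to: %s\n", cudaGetErrorString(cudaErrorNotSupported));
    return 0;
}
```

Built with e.g. `nvcc -arch=sm_52 probe.cu -o probe`.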

whisper_init_from_file_with_params_no_state: loading model from '/usr/src/app/dist/lib/whisper/ggml-small.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 3
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     =  464.68 MB
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   yes
ggml_init_cublas: CUDA_USE_TENSOR_CORES: no
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX TITAN X, compute capability 5.2
whisper_model_load: model size    =  464.44 MB
whisper_init_state: kv self size  =   15.75 MB
whisper_init_state: kv cross size =   52.73 MB
whisper_init_state: compute buffer (conv)   =   25.82 MB
whisper_init_state: compute buffer (encode) =  122.14 MB
whisper_init_state: compute buffer (cross)  =    5.96 MB
whisper_init_state: compute buffer (decode) =   36.27 MB

system_info: n_threads = 4 / 6 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 | 

run: processing './tmp/7b21d44b-278c-48a1-a68c-5e27a49b2c7e.wav' (158800 samples, 9.9 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


CUDA error 801 at /usr/src/app/whisper.cpp/ggml-cuda.cu:7071: operation not supported
current device: 0

In this sample I manually disabled the tensor cores by forcing GGML_CUDA_FORCE_MMQ, but the issue still exists.

An important thing to note is that I compiled the library on a machine with an RTX 3070. That could well be the root cause.
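nvcc only embeds machine code for the architectures it is asked to target, so a build done on an RTX 3070 (SM 8.6) may not contain anything a compute-5.2 Maxwell card can run. A quick way to distinguish a missing kernel image from a genuinely unsupported operation (a hypothetical check, not whisper.cpp code) is to launch a trivial kernel and inspect the error; an architecture mismatch usually surfaces as error 209 (cudaErrorNoKernelImageForDevice) rather than 801:

```cpp
// archcheck.cu - hypothetical diagnostic for an architecture mismatch
#include <cstdio>
#include <cuda_runtime.h>

// A do-nothing kernel: if the binary carries neither SM 5.2 machine code
// nor PTX the driver can JIT, launching it on the Titan X fails.
__global__ void noop() {}

int main() {
    noop<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();   // catches launch-time failures
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();      // catches execution-time failures
    }
    printf("noop kernel: %d (%s)\n", (int)err, cudaGetErrorString(err));
    return 0;
}
```

If that reports 209, rebuilding with explicit flags for both cards, e.g. `nvcc -gencode arch=compute_52,code=sm_52 -gencode arch=compute_86,code=sm_86 ...`, produces a fat binary covering both the Titan X and the 3070. How those flags get passed through whisper.cpp's Makefile depends on the build setup, so treat the flags above as an nvcc-level sketch.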
