CUDA: add conv_2d_transpose #14287


Merged

3 commits merged into ggml-org:master from add_conv2d_transpose on Jun 20, 2025
Conversation

am17an
Collaborator

@am17an am17an commented Jun 19, 2025

This adds a conv2d_transpose CUDA kernel with feature parity with the CPU implementation, plus support for batches. Padding should be trivial to add, but I didn't add it since the CPU version doesn't have it. I also added correctness and performance test cases.

| Backend | Device | µs/run | Bandwidth | Speedup |
| --- | --- | --- | --- | --- |
| CPU | Ryzen 3800XT 8-core | 144 491.81 | 0.46 GB/s | 1.00 |
| GPU | RTX 3090 | 11 759.66 | 5.67 GB/s | 12.28 |

@github-actions github-actions bot added testing Everything test related Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Jun 19, 2025
@am17an am17an force-pushed the add_conv2d_transpose branch from f9d7ccd to da2f437 on June 20, 2025 01:52
@am17an am17an force-pushed the add_conv2d_transpose branch from da2f437 to b80dd1d on June 20, 2025 01:58
@am17an am17an requested a review from JohannesGaessler June 20, 2025 05:10
@am17an am17an requested a review from JohannesGaessler June 20, 2025 12:07
@am17an am17an merged commit c959f46 into ggml-org:master Jun 20, 2025
47 checks passed
@am17an am17an deleted the add_conv2d_transpose branch June 20, 2025 14:49
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jun 20, 2025
* mamba2-sync: (24 commits)
sync : ggml
Add `ggml_roll` (ggml/1274)
docs : fix the link to llama.h (ggml-org#14293)
CUDA: add conv_2d_transpose (ggml-org#14287)
lint : remove trailing whitepace (ggml-org#14304)
vocab : prevent tokenizer overflow (ggml-org#14301)
sycl: add usage of enqueue_functions extension (ggml-org#14244)
Implement GGML_CPU_ALL_VARIANTS for PowerPC (ggml-org#14286)
llama : improve sep token handling (ggml-org#14272)
cuda : synchronize graph capture and cublas handle destruction (ggml-org#14288)
ggml : fix repack work size for mul_mat_id (ggml-org#14292)
ggml: Update KleidiAI to v1.9.0 (ggml-org#14277)
model : more uniform output id handling (ggml-org#14275)
ubatch : new splitting logic (ggml-org#14217)
CUDA: add conv_2d_dw (ggml-org#14265)
ggml-cpu : remove unnecesary arm feature detection (ggml-org#14281)
gguf-py : make sentencepiece optional (ggml-org#14200)
server : add server parameters for draft model cache type (ggml-org#13782)
build : suppress gcc15 compile warnings (ggml-org#14261)
sycl: Cleanup codepaths in Get Rows in sycl backend (ggml-org#14215)
...