---
title: "Supporting Thrust API in Clad"
layout: post
excerpt: "This summer, I am working on adding support for the Thrust API in Clad, enabling automatic differentiation of GPU-accelerated code. This work bridges the gap between high-performance CUDA parallelism and source-to-source AD transformation."
sitemap: false
author: Abdelrhman Elrawy
permalink: blogs/gsoc25_/
banner_image: /images/blog/gsoc-banner.png
date: 2025-05-18
tags: gsoc llvm clang automatic-differentiation gpu cuda thrust
---

## About Me

Hi! I’m Abdelrhman Elrawy, a graduate student in Applied Computing specializing in Machine Learning and Parallel Programming. I’ll be working on enabling **Thrust API support in Clad**, bringing GPU-accelerated parallel computing to the world of automatic differentiation.

## Project Description

[Clad](https://github.com/vgvassilev/clad) is a Clang-based tool for source-to-source automatic differentiation (AD). It enables gradient computations by transforming C++ code at compile time.
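
To ground the rest of this post, here is a minimal example of Clad’s existing CPU-side workflow using its documented `clad::gradient` interface (the toy function `f` is mine, chosen for illustration):

```cpp
#include "clad/Differentiator/Differentiator.h"

double f(double x, double y) { return x * x + y; }

int main() {
  // Reverse mode: Clad generates the gradient code at compile time.
  auto f_grad = clad::gradient(f);
  double dx = 0.0, dy = 0.0;
  f_grad.execute(3.0, 4.0, &dx, &dy);  // dx == 6.0, dy == 1.0
  return 0;
}
```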

However, many scientific and machine learning applications leverage **NVIDIA’s Thrust**, a C++ parallel algorithms library for GPUs, and Clad currently cannot differentiate through Thrust constructs. This limits Clad’s usability in high-performance CUDA code.
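
As a concrete illustration, here is the kind of Thrust pipeline that currently falls outside Clad’s reach (the function and functor names are mine, chosen for the example):

```cpp
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>

struct square {
  __host__ __device__ double operator()(double x) const { return x * x; }
};

// Sum of squares computed entirely on the GPU: a minimal example of code
// that Clad cannot yet differentiate.
double sum_of_squares(const thrust::device_vector<double>& x) {
  thrust::device_vector<double> squared(x.size());
  thrust::transform(x.begin(), x.end(), squared.begin(), square{});
  return thrust::reduce(squared.begin(), squared.end(), 0.0,
                        thrust::plus<double>());
}
```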
| 22 | + |
| 23 | +My project addresses this gap by enabling Clad to: |
| 24 | + |
| 25 | +- Recognize and handle Thrust primitives like `thrust::transform` and `thrust::reduce` |
| 26 | +- Implement **custom pullback/pushforward rules** for GPU kernels |
| 27 | +- Ensure gradients maintain **parallel performance and correctness** |
| 28 | +- Benchmark and validate derivatives in real-world ML and HPC use cases |
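
To make the pullback idea concrete, consider reverse mode for a plain sum reduction. Every element contributes linearly to a `thrust::plus` reduction, so the adjoint of the output is simply broadcast to every input adjoint. The sketch below is hand-written and hypothetical: the name `reduce_sum_pullback` and its signature are illustrative, not Clad’s final registration mechanism.

```cpp
#include <thrust/device_vector.h>
#include <thrust/fill.h>

// Hypothetical pullback for:
//   y = thrust::reduce(x.begin(), x.end(), 0.0, thrust::plus<double>());
// Since d(sum_j x_j)/dx_i == 1, each input adjoint receives the output
// adjoint d_y. thrust::fill keeps the broadcast on the GPU.
void reduce_sum_pullback(const thrust::device_vector<double>& x, double d_y,
                         thrust::device_vector<double>& d_x) {
  d_x.resize(x.size());
  thrust::fill(d_x.begin(), d_x.end(), d_y);
}
```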

## Technical Approach

The project begins with a **proof-of-concept**: manually writing derivatives for common Thrust operations like `transform` and `reduce`. These are compared against finite differences to validate correctness.
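
A minimal sketch of that validation step, assuming a callable `f` and a hand-written analytic gradient to check: each gradient component is compared against a central finite difference.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Central-difference check: verifies that analytic_grad[i] matches
// (f(x + h*e_i) - f(x - h*e_i)) / (2h) for every coordinate i, within tol.
template <typename F>
bool check_gradient(F f, std::vector<double> x,
                    const std::vector<double>& analytic_grad,
                    double h = 1e-6, double tol = 1e-4) {
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double orig = x[i];
    x[i] = orig + h;
    const double f_plus = f(x);
    x[i] = orig - h;
    const double f_minus = f(x);
    x[i] = orig;  // restore before the next coordinate
    const double fd = (f_plus - f_minus) / (2.0 * h);
    if (std::fabs(fd - analytic_grad[i]) > tol) {
      std::printf("mismatch at %zu: fd=%g vs analytic=%g\n", i, fd,
                  analytic_grad[i]);
      return false;
    }
  }
  return true;
}
```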
| 33 | + |
| 34 | +Following that, I’ll integrate custom differentiation logic inside Clad, building: |
| 35 | +- A `ThrustBuiltins.h` header for recognizing Thrust calls |
| 36 | +- Visitor pattern extensions in Clad’s AST traversal (e.g., `VisitCallExpr`) |
| 37 | +- GPU-compatible derivative utilities (e.g., CUDA-aware `thrust::fill`, `transform`) |
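
As a rough illustration of the visitor hook, the idea is to intercept call expressions whose callee lives in the `thrust::` namespace and dispatch to a registered derivative instead of trying to differentiate the library’s internals. The surrounding visitor class and the helpers `BuildThrustDerivative` and `VisitOrdinaryCall` are assumptions for this sketch, not Clad’s actual internals:

```cpp
#include "clang/AST/Decl.h"
#include "clang/AST/Expr.h"
#include <string>

clang::Expr* VisitCallExpr(const clang::CallExpr* CE) {
  if (const clang::FunctionDecl* FD = CE->getDirectCallee()) {
    // Detect thrust::transform, thrust::reduce, etc. by qualified name.
    std::string Name = FD->getQualifiedNameAsString();
    if (Name.rfind("thrust::", 0) == 0)
      return BuildThrustDerivative(CE, Name);  // hypothetical custom-rule path
  }
  return VisitOrdinaryCall(CE);  // hypothetical fallback to Clad's default logic
}
```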

I'll also implement **unit tests**, real-world **mini-apps** (e.g., neural networks), and **benchmarks** to validate and demonstrate this feature.

## Expected Outcomes

By the end of GSoC 2025, this project aims to:

- Enable Clad to differentiate through key Thrust primitives while preserving GPU execution
- Provide documentation and tutorials for GPU-based automatic differentiation
- Contribute a robust test suite and benchmarks to the Clad ecosystem

## Related Links

- [Clad GitHub](https://github.com/vgvassilev/clad)
- [Project description](https://hepsoftwarefoundation.org/gsoc/2025/proposal_Clad-ThrustAPI.html)
- [My GitHub](https://github.com/a-elrawy)