
Commit 8504fdf

Add intro blog and contributor info for GSoC 2025 (#300)
* Add intro blog and contributor profile for GSoC 2025
* Updated proposal pdf and blog image
* Fixed line breaks and spellcheck errors
* Remove extra space
1 parent da9b098 commit 8504fdf

File tree

8 files changed: 95 additions & 0 deletions


.github/actions/spelling/allow/names.txt

Lines changed: 2 additions & 0 deletions
@@ -74,6 +74,7 @@ Svrin
 Tadel
 Taras
 Thessaloniki
+Timmaraju
 Universitat
 Unveristy
 Uppili
@@ -196,6 +197,7 @@ tapaswenipathak
 tfransham
 thakkar
 tharun
+timmaraju
 tlattner
 vaibhav
 vassil

.github/actions/spelling/allow/terms.txt

Lines changed: 3 additions & 0 deletions
@@ -6,11 +6,13 @@ Cppyy
 Debian
 EPC
 GPGPU
+GPT
 GSo
 GSoC
 HSF
 JIT'd
 Jacobians
+LLMs
 LLVM
 NVIDIA
 NVMe
@@ -31,6 +33,7 @@ gitlab
 gridlay
 gsoc
 gpu
+llm
 llvm
 pushforward
 linkedin

_data/contributors.yml

Lines changed: 28 additions & 0 deletions
@@ -310,6 +310,34 @@
       proposal: /assets/docs/de_la_torre_gonzalez_salvador_proposal_gsoc_2025.pdf
       mentors: Vassil Vassilev, Lukas Breitwieser

+- name: Rohan Timmaraju
+  photo: Rohan_Timmaraju.jpg
+  info: "Google Summer of Code 2025 Contributor"
+
+  education: "B.S. Computer Science, Columbia University"
+  github: "https://github.com/Rohan-T144"
+  active: 1
+  linkedin: "https://www.linkedin.com/in/rohan-timmaraju-650ba3221/"
+  projects:
+    - title: "Enhancing LLM Training Efficiency with Clad for Automatic Differentiation"
+      status: Ongoing
+      description: |
+        Training Large Language Models is computationally expensive, often
+        limited by the performance of Python-based frameworks. This
+        project addresses this challenge by enhancing LLM training efficiency
+        within a C++ environment through the integration of Clad, a Clang/LLVM
+        compiler plugin for automatic differentiation (AD). We will develop a
+        custom C++ tensor library specifically designed for optimal interaction
+        with Clad. The core objective is to replace traditional runtime or
+        manual gradient computations with Clad's efficient compile-time
+        differentiation for key LLM operations within a GPT-2 training pipeline.
+        This involves investigating effective strategies to bridge Clad's static
+        analysis with dynamic neural network computations, benchmarking the
+        resulting performance gains in speed and memory usage against a non-Clad
+        baseline, and leveraging OpenMP for further parallelization.
+      proposal: /assets/docs/Rohan_Timmaraju_Proposal_2025.pdf
+      mentors: Vassil Vassilev, David Lange, Jonas Rembser, Christina Koutsou
+
 - name: Abdelrhman Elrawy
   photo: Abdelrhman.jpg
   info: "Google Summer of Code 2025 Contributor"

_pages/team/rohan-timmaraju.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
---
title: "Compiler Research - Team - Rohan Timmaraju"
layout: gridlay
excerpt: "Compiler Research: Team members"
sitemap: false
permalink: /team/RohanTimmaraju

---

{% include team-profile.html %}
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
---
title: "Enhancing LLM Training Efficiency with Clad for Automatic Differentiation"
layout: post
excerpt: "This GSoC project leverages Clad to optimize LLM training in C++, aiming to boost efficiency by developing a custom tensor library and integrating Clad for compiler-level gradient calculations."
sitemap: true
author: Rohan Timmaraju
permalink: blogs/gsoc25_rohan_introduction_blog/
banner_image: /images/blog/LLM_project_banner.jpg
date: 2025-05-21
tags: gsoc c++ clang clad llm
---

### Introduction

I am Rohan Timmaraju, a Computer Science student at Columbia University. During Google Summer of Code 2025, I will be working on the "Enhancing LLM Training Efficiency with Clad for Automatic Differentiation" project with the Compiler Research group.

**Mentors**: Vassil Vassilev, David Lange, Jonas Rembser, Christina Koutsou

### About LLM Training

Large Language Models (LLMs) like ChatGPT have revolutionized AI, but their training is incredibly computationally intensive. Currently, Python-based frameworks such as PyTorch and TensorFlow are the go-to tools. While they offer excellent flexibility and a rich ecosystem, their reliance on interpreted execution and dynamic computation graphs can lead to performance bottlenecks and high memory consumption. This is particularly noticeable when we consider deploying or training these models in resource-constrained environments or within C++-centric high-performance computing (HPC) setups, which are common in scientific research.

While C++ provides the tools for fine-grained control over system resources and has proven its capabilities in efficient LLM inference (as seen with projects like [llama.cpp](https://github.com/ggml-org/llama.cpp)), the critical component for *training* – flexible and efficient Automatic Differentiation (AD) – presents an ongoing challenge for C++ solutions.

### Why Use Clad?

This project proposes to tackle this challenge by integrating Clad, an Automatic Differentiation plugin for the Clang compiler. Unlike traditional AD libraries that often operate at runtime, Clad performs source-to-source transformation. It analyzes the C++ Abstract Syntax Tree (AST) at compile time and generates optimized C++ code for computing derivatives. This compiler-level approach has the potential to reduce runtime overhead and improve memory efficiency compared to dynamic methods.

To facilitate this integration, I am developing a custom C++ tensor library to be used in neural network training. Inspired by the powerful approaches of libraries such as [llm.c](https://github.com/karpathy/llm.c) and [PyTorch](https://docs.pytorch.org/cppdocs/), this library is being designed from the ground up with Clad compatibility in mind. The core idea is to replace manual or internally managed gradient computations with Clad's reverse-mode AD (as in `clad::gradient`) for key LLM operations like matrix multiplications, activation functions, normalization layers, and the final loss function.
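
To make the compile-time AD idea concrete, here is a minimal sketch of how `clad::gradient` is typically used on a toy squared-error loss. The function and variable names are purely illustrative; this is not the project's actual tensor library.

```cpp
// Built with Clang plus the Clad plugin (e.g. via -fplugin=clad.so).
#include "clad/Differentiator/Differentiator.h"
#include <cstdio>

// Toy squared-error loss: (w*x + b - y)^2
double loss(double w, double b, double x, double y) {
    double pred = w * x + b;
    double diff = pred - y;
    return diff * diff;
}

int main() {
    // Ask Clad, at compile time, to generate code for dL/dw and dL/db.
    auto dloss = clad::gradient(loss, "w, b");

    double dw = 0.0, db = 0.0;
    // Run the generated gradient: original arguments first, then output slots.
    dloss.execute(/*w=*/1.0, /*b=*/0.5, /*x=*/2.0, /*y=*/3.0, &dw, &db);

    std::printf("dL/dw = %f, dL/db = %f\n", dw, db);
    return 0;
}
```

The generated derivative is ordinary C++ emitted during compilation, so it can be inlined and optimized like any hand-written code. The project's goal is to apply this same pattern to the tensor-level operations of a GPT-2 training loop.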

### Implementation Plan

1. **Foundation & Baseline:** We will start by implementing a complete GPT-2 training loop in C++ *without* Clad. This will serve as our performance baseline. GPT-2 is chosen here as a relatively simple open-source LLM architecture capable of being trained on local devices; the approach could later be extended to other architectures like Llama or Mistral.
2. **Core Clad Integration Strategy:** We will investigate and evaluate different strategies for applying Clad to tensor network gradient calculations, also identifying potential areas where Clad itself could be enhanced for deep learning workloads.
3. **Expanding Integration:** Once a promising strategy is identified and validated on simpler operations, we'll systematically integrate Clad into more complex components of the GPT-2 architecture.
4. **Benchmarking & Optimization:** Benchmarking against our baseline will be crucial to quantify the performance gains (speed, memory). We'll also use profiling tools to identify bottlenecks and optimize the tensor library with Clad. OpenMP may be employed for parallelization to further boost performance; a flavor of the kind of kernel this applies to is sketched after this list.
5. **Documentation & Potential Extensions:** Thorough documentation of the tensor library, the Clad integration process, and our findings will also be a primary focus. Time permitting, we'll explore extending this work to other LLM architectures like Llama.
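
To give a flavor of the baseline and parallelization work in steps 1 and 4, here is a rough sketch of an OpenMP-parallelized matrix-multiplication forward pass in the spirit of llm.c. The function name, signature, and memory layout are illustrative placeholders, not the project's actual tensor library API.

```cpp
// Compile with -fopenmp to enable the parallel pragma.
// Illustrative forward GEMM for a linear layer:
//   out[b, o] = bias[o] + sum_c inp[b, c] * weight[o, c]
// B  = number of (batch * sequence) positions
// C  = input feature width, OC = output feature width
// The independent output rows are split across OpenMP threads.
void matmul_forward(float* out, const float* inp, const float* weight,
                    const float* bias, int B, int C, int OC) {
    #pragma omp parallel for
    for (int b = 0; b < B; ++b) {
        for (int o = 0; o < OC; ++o) {
            float val = (bias != nullptr) ? bias[o] : 0.0f;
            for (int c = 0; c < C; ++c) {
                val += inp[b * C + c] * weight[o * C + c];
            }
            out[b * OC + o] = val;
        }
    }
}
```

In the Clad-integrated version, the corresponding backward pass would ideally be generated by `clad::gradient` from kernels like this one rather than being written by hand, which is exactly the trade-off the benchmarking phase is meant to quantify.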

### Conclusion

By successfully integrating Clad into a C++ LLM training pipeline, we aim to:
* **Demonstrate Performance Gains:** Show tangible improvements in training speed and memory efficiency.
* **Clad for ML:** Provide a significant real-world ML use case for Clad, potentially identifying areas where its support for ML tasks can improve.
* **Offer a C++ Alternative:** Provide a foundation for more efficient, compiler-driven LLM training within the C++ ecosystem.
* **Learn and Share:** Gain insights into the practicalities of applying compiler-based AD to complex ML problems and share these learnings with the community.

I believe this project has the potential to make a valuable contribution to both the compiler research field and the ongoing efforts to make powerful AI models more accessible and efficient to train.

### Related Links

- [Project Description](https://hepsoftwarefoundation.org/gsoc/2025/proposal_Clad-LLM.html)
- [Clad Repository](https://github.com/vgvassilev/clad)
- [My GitHub Profile](https://github.com/Rohan-T144)
assets/docs/Rohan_Timmaraju_Proposal_2025.pdf

187 KB
Binary file not shown.

images/blog/LLM_project_banner.jpg

354 KB

images/team/Rohan_Timmaraju.jpg

294 KB

0 commit comments
