13 changes: 0 additions & 13 deletions NOTICES.md

This file was deleted.

7 changes: 7 additions & 0 deletions README.md
@@ -1,5 +1,12 @@
# vllm-triton-backend

:information_source: This repository was used to develop the now community-maintained [Triton Backend in vLLM V1 (`triton_attn`)](https://github.com/vllm-project/vllm/blob/main/vllm/v1/attention/backends/triton_attn.py). We still consider the testing and microbenchmark scripts as well as the development tools (UBI container, proton viewer) useful, and we use them ourselves, but the latest Triton attention kernels are now maintained and developed in vLLM under [`vllm/vllm/attention/ops/`](https://github.com/vllm-project/vllm/tree/main/vllm/attention/ops). The kernels contained in this repository (`vllm-triton-backend/ibm-triton-lib`) are only updated on an irregular basis.
We may archive this repository in the near future.
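
For context, a minimal sketch of selecting that Triton attention backend when running vLLM V1. The environment-variable values below, in particular the backend identifier `TRITON_ATTN_VLLM_V1`, are assumptions and may differ between vLLM releases; check the vLLM documentation for the current names.

```python
import os

# Assumption: these are the knobs for forcing the V1 engine and the Triton
# attention backend; verify against your installed vLLM version.
os.environ["VLLM_USE_V1"] = "1"
os.environ["VLLM_ATTENTION_BACKEND"] = "TRITON_ATTN_VLLM_V1"

from vllm import LLM, SamplingParams

# Any HF model identifier works here; this one is illustrative.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
outputs = llm.generate(["Triton kernels are"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```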


* * *


This repo contains:

- A Triton-only attention backend for vLLM, implemented as [vLLM platform plugin](https://docs.vllm.ai/en/latest/design/plugin_system.html), see [`ibm-triton-lib/ibm_triton_lib/backend`](./ibm-triton-lib/ibm_triton_lib/backend/).
Binary file added doc/anatomy_of_a_triton_attention_kernel_ibm.pdf
Binary file not shown.
3 changes: 1 addition & 2 deletions scripts/offline_inference.py
@@ -41,8 +41,7 @@
 from vllm.distributed import cleanup_dist_env_and_memory

 llm = LLM(
-    # model="/mnt/nvme5n1p1/zrlngl/fmaas/models/llama3.1-8b-instruct/",
-    model="/net/storage149/autofs/css22/nmg/models/hf/meta-llama/Llama-3.1-8B-Instruct/main/",
+    model="./models/hf/meta-llama/Llama-3.1-8B-Instruct/main/",
     # max_model_len=2048,
     # enforce_eager=True,
     enable_prefix_caching=False,
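A rough sketch of how the script plausibly continues after constructing `llm`; the prompts and sampling settings are illustrative rather than taken from the diff, but the final cleanup matches the `cleanup_dist_env_and_memory` import shown above.

```python
from vllm import SamplingParams

# Illustrative prompt and sampling settings (not part of the actual script).
sampling_params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["The capital of France is"], sampling_params)
for out in outputs:
    print(out.outputs[0].text)

# Release distributed state and GPU memory, using the helper imported
# at the top of the script.
del llm
cleanup_dist_env_and_memory()
```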
2 changes: 1 addition & 1 deletion vllm
Submodule vllm updated 2719 files