3 changes: 2 additions & 1 deletion docs/source/generate_examples.py
@@ -14,13 +14,14 @@
def fix_case(text: str) -> str:
subs = {
"api": "API",
"Cli": "CLI",
"cli": "CLI",
"cpu": "CPU",
"llm": "LLM",
"tpu": "TPU",
"aqlm": "AQLM",
"gguf": "GGUF",
"lora": "LoRA",
"rlhf": "RLHF",
"vllm": "vLLM",
"openai": "OpenAI",
"multilora": "MultiLoRA",
1 change: 1 addition & 0 deletions docs/source/index.md
@@ -105,6 +105,7 @@ features/compatibility_matrix
:maxdepth: 1

training/trl.md
training/rlhf.md

:::

11 changes: 11 additions & 0 deletions docs/source/training/rlhf.md
Member:

I would rather refer to OpenRLHF or verl. These are not complete examples, but just show some basic usage of how to enable RLHF.

Member Author:

Ok, I've added references to TRL, OpenRLHF and verl. I've also qualified the examples saying that they are basic and should be used as inspiration if you don't want to use the previously mentioned libraries.

@@ -0,0 +1,11 @@
# Reinforcement Learning from Human Feedback

Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviours.

vLLM can be used to generate completions for RLHF. The best way to do this is with libraries such as [TRL](https://github.com/huggingface/trl), [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), and [verl](https://github.com/volcengine/verl).

See the following basic examples to get started if you don't want to use an existing library:

- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf.html)
- [Training and inference processes are colocated on the same GPUs using Ray](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_colocate.html)
- [Utilities for performing RLHF with vLLM](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_utils.html)
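
As a rough illustration of the generation step these examples build on, the following minimal sketch uses vLLM's offline `LLM` API to produce rollouts for scoring. The checkpoint path, prompts, and sampling settings are placeholder assumptions, and syncing updated trainer weights into vLLM (which the libraries above handle for you) is omitted.

```python
# Minimal sketch: batched rollout generation with vLLM for an RLHF loop.
# Assumptions: the model path and sampling settings are placeholders;
# weight synchronization with the trainer process is not shown.
from vllm import LLM, SamplingParams

# Load the current policy checkpoint into vLLM for fast batched generation.
llm = LLM(model="path/to/current-policy-checkpoint")

sampling_params = SamplingParams(
    temperature=1.0,  # exploratory sampling during rollouts
    max_tokens=256,
    n=4,              # several completions per prompt for reward/preference scoring
)

prompts = [
    "Explain why the sky is blue.",
    "Write a short poem about the ocean.",
]

# Each RequestOutput holds `n` candidate completions for one prompt.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    for completion in output.outputs:
        # These texts would be scored by a reward model and fed back to the trainer.
        print(completion.text)
```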