
Conversation

@simon-mo (Collaborator) commented Oct 6, 2025

Summary

  • clarify the metrics design doc so the prometheus middleware note no longer references the legacy V0 engine migration
  • update the speculative decoding guide to state that draft-model support requires the V1 engine instead of pointing to the retired v0.10 release

Testing

  • not run (documentation changes only)

https://chatgpt.com/codex/tasks/task_e_68e3f11c47408329bf2324ac7b1ad7bf

@mergify bot added the documentation label Oct 6, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request provides a number of documentation updates to remove references to the legacy v0 engine and clarify concepts for the current v1 engine. The changes are well-executed across multiple files, improving the clarity and relevance of the documentation for users. The updates are consistent with the stated goals of the PR, and I have no further suggestions.


We have started the process of deprecating V0. Please read [RFC #18571](gh-issue:18571) for more details.

V1 is now enabled by default for all supported use cases, and we will gradually enable it for every use case we plan to support. Please share any feedback on [GitHub](https://github.com/vllm-project/vllm) or in the [vLLM Slack](https://inviter.co/vllm-slack).
Member

Also update this paragraph?

| **Mamba Models** | <nobr>🟢 (Mamba-2), 🟢 (Mamba-1)</nobr> |
| **Multimodal Models** | <nobr>🟢 Functional</nobr> |

vLLM V1 currently excludes model architectures with the `SupportsV0Only` protocol.
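For readers unfamiliar with how that exclusion works, the snippet below is an illustrative sketch of marker-protocol filtering, not vLLM's actual code; the real `SupportsV0Only` interface lives in vLLM's model interfaces module and differs in detail, and the model classes here are hypothetical.

```python
from typing import Protocol, runtime_checkable

# Illustrative sketch only -- shows the idea of a marker protocol that lets
# the engine filter out architectures at runtime.
@runtime_checkable
class SupportsV0Only(Protocol):
    supports_v0_only: bool

class LegacyOnlyModel:          # hypothetical architecture still tied to V0
    supports_v0_only = True

class ModernModel:              # hypothetical architecture ported to V1
    pass

def eligible_for_v1(model_cls: type) -> bool:
    # V1 skips any architecture that advertises the marker attribute.
    return not isinstance(model_cls(), SupportsV0Only)

print(eligible_for_v1(LegacyOnlyModel))  # False -> excluded from V1
print(eligible_for_v1(ModernModel))      # True  -> eligible for V1
```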
@DarkLight1337 (Member) commented Oct 6, 2025

We should remove the V1 column from the Supported Models page and delete all models that don't support V1

Chunked prefill allows vLLM to process large prefills in smaller chunks and batch them together with decode requests. This feature helps improve both throughput and latency by better balancing compute-bound (prefill) and memory-bound (decode) operations.

In vLLM V1, **chunked prefill is always enabled by default**. This is different from vLLM V0, where it was conditionally enabled based on model characteristics.
In vLLM V1, **chunked prefill is always enabled by default** so that behavior is consistent across supported models.
Collaborator Author

Suggested change
In vLLM V1, **chunked prefill is always enabled by default** so that behavior is consistent across supported models.
In vLLM V1, **chunked prefill is always enabled by default**.
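As a side note for anyone reading this thread, here is a minimal sketch of exercising the chunked-prefill behavior described above from the offline API. It assumes the `enable_chunked_prefill` and `max_num_batched_tokens` engine arguments from the vLLM docs; the model name is a placeholder and defaults may differ between releases.

```python
from vllm import LLM

# Chunked prefill is on by default in V1; the flag is shown explicitly only
# for clarity. max_num_batched_tokens caps the per-step token budget, which
# bounds how large each prefill chunk mixed in with decode requests can be.
llm = LLM(
    model="facebook/opt-125m",       # placeholder model
    enable_chunked_prefill=True,
    max_num_batched_tokens=2048,     # smaller budget -> smaller chunks, better decode latency
)

outputs = llm.generate(["Chunked prefill lets vLLM"])
print(outputs[0].outputs[0].text)
```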

Collaborator Author

There are probably some mistakes here. @markmc PTAL

Member

Generally lgtm, although I guess my attitude is that design docs like these are naturally a snapshot in time of a design decision, but more discoverable than a random Google doc. It's really hard to be disciplined enough to keep a doc like this up to date

Collaborator Author

@njhill I guess this page can use a full clean up

Comment on lines +19 to +20
Speculative decoding with a draft model requires the V1 engine.
Older releases that predate V1 (such as the 0.10.x series) raise a `NotImplementedError`.
Collaborator Author

Suggested change
Speculative decoding with a draft model requires the V1 engine.
Older releases that predate V1 (such as the 0.10.x series) raise a `NotImplementedError`.
Speculative decoding with a draft model is not supported in vLLM V1.
You can use older versions before the 0.10.x series to continue to leverage it.
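For context on what the quoted guide is describing, below is a rough sketch of a draft-model configuration, based on the `speculative_config` pattern in vLLM's speculative decoding guide; whether a given engine version accepts a separate draft model is exactly what this thread is sorting out, argument names have shifted across releases, and the model names are placeholders.

```python
from vllm import LLM, SamplingParams

# Sketch of draft-model speculative decoding: a small draft model proposes
# tokens that the larger target model then verifies in a single forward pass.
llm = LLM(
    model="facebook/opt-6.7b",            # target model (placeholder)
    speculative_config={
        "model": "facebook/opt-125m",     # draft model (placeholder)
        "num_speculative_tokens": 5,      # tokens proposed per step
    },
)

params = SamplingParams(temperature=0.0, max_tokens=32)
print(llm.generate(["The future of AI is"], params)[0].outputs[0].text)
```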

Member

> We should remove the V1 column from the Supported Models page and delete all models that don't support V1

LGTM after doing this

Collaborator Author

We can probably gradually remove these docs

mergify bot commented Oct 8, 2025

Documentation preview: https://vllm--26311.org.readthedocs.build/en/26311/


### Multi-process Mode

In v0, metrics are collected in the engine core process and we use multiprocess mode to make them available in the API server process. See <gh-pr:7279>.
Member

Metrics are still collected in the API server process, but multiprocess mode was reinstated by #17546 in order to share metrics state between API server processes
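For anyone unfamiliar with Prometheus multiprocess mode, the sketch below shows the generic `prometheus_client` pattern being referred to (per-process mmap files aggregated at scrape time); it is not vLLM's actual wiring, and the directory path is just an example.

```python
import os

# The directory must exist and the env var must be set before prometheus_client
# is imported, because the client picks its storage backend at import time.
os.makedirs("/tmp/prom_multiproc", exist_ok=True)          # example path
os.environ.setdefault("PROMETHEUS_MULTIPROC_DIR", "/tmp/prom_multiproc")

from prometheus_client import CollectorRegistry, Counter, generate_latest, multiprocess

# Each process records samples into its own mmap-backed file in that directory.
requests_total = Counter("app_requests_total", "Requests handled by this process")
requests_total.inc()

# The scrape handler aggregates the per-process files into a single exposition,
# which is how metrics state gets shared across API server processes.
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
print(generate_latest(registry).decode())
```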

This is relevant because if we move away from multiprocess mode in v1,
we get these back. However, it's questionable how relevant these are
if they don't aggregate these stats for all processes that make up a
vLLM instance.
Member

Yeah, so these are gone again


Since metrics is a big enough topic on its own, we are going to tackle
the topic of tracing in v1 separately.
the topic of tracing separately.
Member

Tracing has since been reinstated - #20372

mergify bot commented Oct 11, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @simon-mo.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label Oct 11, 2025

Labels

codex, documentation, needs-rebase


3 participants