

@Jialin
Collaborator

@Jialin Jialin commented Oct 2, 2025

Purpose

We found many os.environ lookups in the stacks of the trace dump. Ideally, environment variable values do NOT change after the process starts, so we should cache the results to avoid recomputing them.

The environment variable cache is refreshed after process initialization, so environment variables can still be overridden during server startup.

Test Plan & Test Result

_get_num_input_tokens took 11 us without this PR and 5 us with caching.

Before
Screenshot 2025-10-02 at 4 35 23 PM

After
Screenshot 2025-10-02 at 4 35 36 PM
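The before/after numbers above come from trace screenshots. As a rough, self-contained way to see why memoizing an env lookup helps, here is a sketch that times a hypothetical flag parser with and without functools.cache. EXAMPLE_FLAG is an illustrative name, not a vLLM variable, and the absolute timings will vary by machine and will not reproduce the figures above.

```python
import functools
import os
import timeit


def read_flag() -> bool:
    # Uncached: hits os.environ and parses the string on every call.
    return os.getenv("EXAMPLE_FLAG", "0").strip().lower() in ("1", "true")


# Same function, memoized: after the first call it is a cache-dict lookup.
cached_read_flag = functools.cache(read_flag)


def bench(fn, number: int = 100_000) -> float:
    """Return the average time per call in microseconds."""
    return timeit.timeit(fn, number=number) / number * 1e6


uncached_us = bench(read_flag)
cached_us = bench(cached_read_flag)
print(f"uncached: {uncached_us:.3f} us/call, cached: {cached_us:.3f} us/call")
```

The saving per call is small, but env-backed flags are often read on hot paths (once per request or per scheduling step), which is where it adds up.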



Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a performance optimization by caching the results of environment variable lookups using @functools.cache. While this is a good optimization, I've identified a critical issue where this caching can interfere with functions that modify environment variables at runtime, such as set_vllm_use_v1. This could lead to stale configuration values being used. Please see my detailed comment.
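A minimal reproduction of the concern, assuming a flag is memoized with functools.cache before a startup helper rewrites the variable. EXAMPLE_USE_V1, use_v1 and set_use_v1 are hypothetical stand-ins for the pattern around set_vllm_use_v1, not vLLM code.

```python
import functools
import os


@functools.cache
def use_v1() -> bool:
    # Memoized from the first call onward.
    return os.getenv("EXAMPLE_USE_V1", "0") == "1"


def set_use_v1(value: bool) -> None:
    # Hypothetical startup-time override that mutates the environment.
    os.environ["EXAMPLE_USE_V1"] = "1" if value else "0"


os.environ.pop("EXAMPLE_USE_V1", None)
first = use_v1()    # reads the env and caches False
set_use_v1(True)
stale = use_v1()    # still the cached False: the override is invisible
use_v1.cache_clear()
fresh = use_v1()    # re-reads the env after invalidation
```

This is why caching has to start only once nothing is expected to mutate the environment any more.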

@mgoin
Member

mgoin commented Oct 3, 2025

Unfortunately we do mutate env vars throughout setup in various situations. Do you think we could change this to cache once we get past startup? Certainly once we are serving we expect nothing to change

@yeqcharlotte
Collaborator

> Unfortunately we do mutate env vars throughout setup in various situations. Do you think we could change this to cache once we get past startup? Certainly once we are serving we expect nothing to change

Honestly, it would be nice to also log those in-place env updates. I've been quite confused by them while doing this.

@Jialin
Collaborator Author

Jialin commented Oct 3, 2025

> Unfortunately we do mutate env vars throughout setup in various situations. Do you think we could change this to cache once we get past startup? Certainly once we are serving we expect nothing to change

@mgoin Sure thing. We could introduce a new API to invalidate and re-warm the cache, and call it right before startup finishes. I can think of a few places to invoke it:

  1. EngineCoreProc.__init__:

    vllm/vllm/v1/engine/core.py

    Lines 531 to 537 in 0879736

    # Mark the startup heap as static so that it's ignored by GC.
    # Reduces pause times of oldest generation collections.
    gc.collect()
    gc.freeze()
    # If enable, attach GC debugger after static variable freeze.
    maybe_attach_gc_debug_callback()
  2. WorkerProc.worker_main, after the worker marks itself as READY:
    # Send READY once we know everything is loaded
    ready_writer.send({
        "status": WorkerProc.READY_STR,
        "handle": worker.worker_response_mq.export_handle(),
    })

Will update the PR soon. Thanks for pointing out the on-the-fly environment changes during startup.
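The invalidate-and-re-warm idea can be sketched as follows, assuming env values are produced by a registry of small parser functions behind one memoized getter. ENV_PARSERS, get_env, refresh_envs_cache and finish_startup are hypothetical names for illustration, not vLLM's API; finish_startup marks where a call would go, e.g. after the gc.freeze() call or after a worker reports READY.

```python
import functools
import os
from typing import Any, Callable

# Hypothetical registry of lazily parsed variables (names are illustrative).
ENV_PARSERS: dict[str, Callable[[], Any]] = {
    "EXAMPLE_MAX_BATCH": lambda: int(os.getenv("EXAMPLE_MAX_BATCH", "8")),
    "EXAMPLE_DEBUG": lambda: os.getenv("EXAMPLE_DEBUG", "0") == "1",
}


@functools.cache
def get_env(name: str) -> Any:
    # Memoized lookup; raises KeyError for names not in the registry.
    return ENV_PARSERS[name]()


def refresh_envs_cache() -> None:
    """Drop every memoized value, then re-read each variable once."""
    get_env.cache_clear()
    for name in ENV_PARSERS:
        get_env(name)


def finish_startup() -> None:
    # Hypothetical hook run once, at the point where startup is complete.
    refresh_envs_cache()
```

Re-warming eagerly means the first request after startup does not pay for the lookups; the trade-off is that any read made during startup may be stale until the refresh runs.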

@Jialin
Collaborator Author

Jialin commented Oct 3, 2025

> Honestly would be nice to also log those in-place env updates. I've been feeling quite confused about those while doing this.

@yeqcharlotte We might need to route all os.environ access through a new API; then we could wire up the logging and cache invalidation properly. However, there might not be a way to force everyone to use the new API :/
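One possible shape for such an API, sketched under the assumption that all in-process mutations go through a single setter that logs the change and invalidates cached reads. get_env and set_env are hypothetical names. As noted above, nothing stops code from writing os.environ directly, so this only covers callers that adopt it.

```python
import functools
import logging
import os

logger = logging.getLogger("envs_sketch")


@functools.cache
def get_env(name: str, default: str = "") -> str:
    """Cached read; only correct if mutations go through set_env."""
    return os.environ.get(name, default)


def set_env(name: str, value: str) -> None:
    """Hypothetical single entry point for in-process env mutations."""
    old = os.environ.get(name)
    if old != value:
        # Make every in-place override visible in the logs.
        logger.info("env override: %s=%r (was %r)", name, value, old)
    os.environ[name] = value
    # Any memoized value may now be stale, so invalidate the whole cache.
    get_env.cache_clear()
```

Clearing the whole cache is coarse but simple; startup overrides are rare, so per-key invalidation is probably not worth the extra bookkeeping.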

@Jialin
Collaborator Author

Jialin commented Oct 13, 2025

CC @mgoin @yeqcharlotte for review, now that the cache is reloaded after each process finishes initializing.

@Jialin
Collaborator Author

Jialin commented Oct 13, 2025

Trying to address the pre-commit errors in #26742; they don't seem to be related to this PR.

Member

@mgoin mgoin left a comment


Nice work. I'm a bit worried about unintended changes to behavior that are hard to predict or catch in CI, but I believe this is better to figure out quickly. Thank you.

Let me know when precommit is resolved and I will enable full CI

@Jialin
Collaborator Author

Jialin commented Oct 14, 2025

> I'm a bit worried about unintended changes to behavior that are hard to predict or catch in CI, but I believe this is better to figure out quickly.

Yeah, totally! But I think it is a legitimate assumption that environment variables SHOULD NOT change after service startup (and we might need to fix forward if some logic doesn't follow this assumption).

On the other hand, I bet there are more cases like my recent GC debug changes, which incorrectly used ENV_VAR instead of vllm.envs.ENV_VAR, which is backed by the __getattr__ cache behind the scenes. I might create a follow-up issue to migrate ENV_VAR -> vllm.envs.ENV_VAR.

> Let me know when precommit is resolved and I will enable full CI

Will nudge you again after #26742 lands and I rebase this PR. Thanks in advance.
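One plausible way to end up with a stale ENV_VAR, sketched below as an assumption about the failure mode: `from <envs module> import ENV_VAR` runs the module-level __getattr__ (PEP 562) once at import time and binds the returned value, so later reads never go back through __getattr__ or any cache wrapped around it. The module is faked with types.ModuleType; envs_sketch and EXAMPLE_GC_DEBUG are illustrative names.

```python
import os
import sys
import types


def _envs_getattr(name: str) -> str:
    # PEP 562 hook: runs only when normal module attribute lookup fails.
    if name == "EXAMPLE_GC_DEBUG":
        return os.getenv("EXAMPLE_GC_DEBUG", "")
    raise AttributeError(f"module 'envs_sketch' has no attribute {name!r}")


# Stand-in for an envs-style module whose attributes are computed lazily.
envs = types.ModuleType("envs_sketch")
envs.__getattr__ = _envs_getattr
sys.modules["envs_sketch"] = envs

os.environ["EXAMPLE_GC_DEBUG"] = "0"
# The from-import calls __getattr__ once and binds the value it returned.
from envs_sketch import EXAMPLE_GC_DEBUG

os.environ["EXAMPLE_GC_DEBUG"] = "1"
stale = EXAMPLE_GC_DEBUG       # still the string captured at import time
fresh = envs.EXAMPLE_GC_DEBUG  # attribute access re-runs __getattr__
```

Reading through the module attribute keeps every access on the single lookup path, which is what makes caching, and any future logging or enforcement, apply uniformly.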

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 14, 2025
@Jialin Jialin changed the title [Perf][Easy] Cache vllm.env.__getattr__ result to avoid recomputation [Perf] Cache vllm.env.__getattr__ result to avoid recomputation Oct 14, 2025
@Jialin
Collaborator Author

Jialin commented Oct 14, 2025

@mgoin I found quite a lot of failing tests and rethought the actual usage:

  1. Before service initialization, we should expect that environment variables may change at any moment.
  2. After service initialization, all the variables should be locked.

So I changed the implementation to wrap envs.__getattr__ with functools.cache only after service initialization. That way we don't need to worry about (1) at all, and only need to check whether any existing usage violates (2).
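A minimal sketch of the two phases, assuming an envs-style module with a PEP 562 __getattr__ that is swapped for a functools.cache-wrapped version once initialization is done. freeze_env_lookups and EXAMPLE_LOG_LEVEL are hypothetical; this illustrates the idea, not the code merged in this PR.

```python
import functools
import os
import types


def _uncached_getattr(name: str) -> str:
    # Phase 1: re-reads and re-parses os.environ on every attribute access.
    if name == "EXAMPLE_LOG_LEVEL":
        return os.getenv("EXAMPLE_LOG_LEVEL", "info").upper()
    raise AttributeError(name)


# Stand-in for an envs-style module using a PEP 562 __getattr__.
envs = types.ModuleType("envs_sketch")
envs.__getattr__ = _uncached_getattr


def freeze_env_lookups(module: types.ModuleType, names: list[str]) -> None:
    """Hypothetical post-init hook: memoize __getattr__, then pre-warm it."""
    # Phase 2: the module now calls the cached wrapper on attribute misses.
    module.__getattr__ = functools.cache(module.__getattr__)
    for name in names:
        getattr(module, name)  # populate the cache before serving starts
```

Because nothing is cached before the hook runs, startup code can mutate the environment freely; afterwards, a late mutation is silently ignored rather than picked up, which is exactly assumption (2).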

@mgoin
Member

mgoin commented Oct 14, 2025

I agree, we should only enforce 2

@Jialin
Collaborator Author

Jialin commented Oct 14, 2025

> Let me know when precommit is resolved and I will enable full CI

@mgoin At least all CI checks pass now. Please let me know if you have other concerns we should address before merging.

@mgoin
Member

mgoin commented Oct 14, 2025

LGTM. Let's create the issue to migrate ENV_VAR -> vllm.envs.ENV_VAR, so we can enforce/log any deviations in the future.

@mgoin mgoin merged commit 380f175 into vllm-project:main Oct 14, 2025
46 checks passed
@Jialin
Collaborator Author

Jialin commented Oct 14, 2025

> LGTM. Let's create the issue to migrate ENV_VAR -> vllm.envs.ENV_VAR, so we can enforce/log any deviations in the future.

Created issue #26854, which should be mostly addressed by #26810.

@Jialin Jialin deleted the env branch October 14, 2025 23:13
Jonahcb pushed a commit to Jonahcb/vllm that referenced this pull request Oct 15, 2025
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Member

@youkaichao youkaichao left a comment


@Jialin thanks for the great work! I think we should also call enable_envs_cache inside workers? Right now it seems only the engine core / executor calls enable_envs_cache, so it won't work when, e.g., we use spawn to create processes, or use Ray to create remote processes.

alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
@Jialin
Collaborator Author

Jialin commented Oct 28, 2025

> @Jialin thanks for the great work! I think we should also call enable_envs_cache inside workers? Right now it seems only the engine core / executor calls enable_envs_cache, so it won't work when, e.g., we use spawn to create processes, or use Ray to create remote processes.

Thanks @youkaichao for the suggestion. We also ran into a similar issue when using an external launcher (e.g. torchrun) to kick off processes (CC @22quinn).

I'm trying to further extend this in #27632.

rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Nov 12, 2025

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet


4 participants