Conversation

Collaborator

@jeejeelee jeejeelee commented Nov 12, 2025

Purpose

In this PR:

  • Clean up all code related to lora_extra_vocab_size and lora_vocab_padding_size (fixes [RFC]: Disallow extra vocab for LoRA #23474)
  • Train a LoRA for meta-llama/Llama-3.2-3B-Instruct to replace the previous Llama-2-7B LoRA model, so that adding LoRA to the lm_head and embedding layers is covered while reducing CI test pressure
  • Switch the related LoRA tests to Qwen/Qwen3-0.6B, which also reduces CI test pressure

The remaining related code will be cleaned up in follow-up work.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Jee Jee Li <[email protected]>
@jeejeelee jeejeelee marked this pull request as draft November 12, 2025 10:39
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is part of an effort to remove the LoRA extra vocabulary feature. The changes to the model implementations in granite.py and teleflm.py are consistent with this goal, and tests related to the extra vocabulary feature have been correctly removed. However, I've found a critical issue in tests/lora/test_lora_manager.py where a change breaks a test, which will need to be addressed.

```diff
-    new_embeddings = load_file(
-        os.path.join(sql_lora_files, "new_embeddings.safetensors")
-    )
+    new_embeddings = load_file(os.path.join(sql_lora_files, ""))
```

critical

load_file(os.path.join(sql_lora_files, "")) will attempt to load from a directory path, which will raise an IsADirectoryError and cause the test to fail.

Since new embeddings are being removed, new_embeddings should likely be an empty dictionary.

Please note that changing this line to new_embeddings = {} will reveal another issue in this test: a KeyError will be raised on line 83. The test logic from line 77 onwards needs to be updated to reflect that lora.embeddings_tensor is now always None. The if/else block can be simplified to assert lora.embeddings_tensor is None.

Suggested change

```diff
-    new_embeddings = load_file(os.path.join(sql_lora_files, ""))
+    new_embeddings = {}
```
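Putting the two suggestions together, the corrected test would look roughly like the sketch below. This is only an illustration: `DEVICES`, `sql_lora_files`, and the loading of `adapter_model.safetensors` come from the quoted test, while the `LoRAModel` construction is elided and the iteration over per-module `lora` objects is an assumption about how the rest of the existing test is structured.

```python
import os

import pytest
from safetensors.torch import load_file


@pytest.mark.parametrize("device", DEVICES)
def test_from_lora_tensors(sql_lora_files, device):
    tensors = load_file(os.path.join(sql_lora_files, "adapter_model.safetensors"))
    # new_embeddings.safetensors was deleted along with the extra-vocab feature,
    # so there is nothing to load here anymore.

    lora_model = ...  # built from `tensors` exactly as in the existing test

    for lora in lora_model.loras.values():
        # With extra vocab removed, no LoRA layer carries an embeddings tensor,
        # so the old if/else collapses into a single assertion.
        assert lora.embeddings_tensor is None
```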


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 48 to 52

```diff
 @pytest.mark.parametrize("device", DEVICES)
 def test_from_lora_tensors(sql_lora_files, device):
     tensors = load_file(os.path.join(sql_lora_files, "adapter_model.safetensors"))
-    new_embeddings = load_file(
-        os.path.join(sql_lora_files, "new_embeddings.safetensors")
-    )
+    new_embeddings = load_file(os.path.join(sql_lora_files, ""))
```


P1: Avoid loading deleted new_embeddings file

The updated test still calls load_file(os.path.join(sql_lora_files, "")), which resolves to the LoRA directory itself. safetensors.torch.load_file only accepts paths to .safetensors files and will raise IsADirectoryError, so test_from_lora_tensors now crashes before exercising any behaviour. If extra vocab embeddings are no longer used, this load should be dropped or replaced with a stub so the test can run.


@mergify mergify bot added llama Related to Llama models tpu Related to Google TPUs labels Nov 13, 2025
@mergify mergify bot removed the tpu Related to Google TPUs label Nov 16, 2025
@jeejeelee jeejeelee marked this pull request as ready for review November 16, 2025 07:37
@jeejeelee jeejeelee added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 16, 2025

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

```python
self.hidden_size = model_config.get_hidden_size()
self.vocab_size = model_config.get_vocab_size()
if self.lora_config is not None:
    self.vocab_size += self.lora_config.lora_extra_vocab_size
```

P1: TPU runner still references removed LoRA extra vocab attr

The LoRAConfig dataclass no longer exposes lora_extra_vocab_size, but the TPU model runner continues to access self.lora_config.lora_extra_vocab_size when computing the model vocabulary. Any LoRA-enabled run on TPU will now raise an AttributeError during initialization, preventing TPU LoRA serving altogether. Consider removing this addition or guarding it behind a compatibility shim so TPU paths remain functional.
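If the first option in that comment is taken, the TPU runner snippet above would reduce to something like the following. This is a sketch only; the attribute and `model_config` accessor names are copied from the quoted code, not from the final diff.

```python
self.hidden_size = model_config.get_hidden_size()
# LoRAConfig no longer exposes lora_extra_vocab_size, so the vocabulary size
# comes straight from the model config with no LoRA-specific adjustment.
self.vocab_size = model_config.get_vocab_size()
```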


```python
def _update_base_metadata(
    self,
    mapping: "LoRAMapping",
    lora_index_to_id: list[int | None],
    max_loras: int,
    vocab_size: int,
    extra_vocab_size: int,
):
```

P1: Punica TPU metadata hook requires extra arg no longer passed

PunicaWrapperBase.update_metadata now calls _update_base_metadata with four positional parameters (mapping, lora_index_to_id, max_loras, vocab_size) and unconditionally sets extra_vocab_size to 0 internally. The TPU implementation still overrides _update_base_metadata with a five‑argument signature, so the call from the base class will raise TypeError: _update_base_metadata() missing 1 required positional argument whenever TPU LoRA metadata is updated. The override should drop the extra_vocab_size parameter or accept a default to keep the method compatible.
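The fix suggested in that comment would look roughly like this in the TPU override: drop `extra_vocab_size` so the signature matches the base-class call site. Sketch only; the method body is elided and the parameter types are taken from the quoted signature.

```python
def _update_base_metadata(
    self,
    mapping: "LoRAMapping",
    lora_index_to_id: list[int | None],
    max_loras: int,
    vocab_size: int,
) -> None:
    # extra_vocab_size was removed from PunicaWrapperBase.update_metadata's
    # call, so the TPU override must accept the same four arguments.
    ...
```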


@jeejeelee jeejeelee removed the ready ONLY add when PR is ready to merge/full CI is needed label Nov 16, 2025
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
@mergify mergify bot added v1 tpu Related to Google TPUs labels Nov 16, 2025
@jeejeelee jeejeelee added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 16, 2025
@jeejeelee jeejeelee removed the ready ONLY add when PR is ready to merge/full CI is needed label Nov 17, 2025
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
@jeejeelee jeejeelee force-pushed the remove-lora-extra-vocab-2nd branch from 7b97113 to 5913a48 Compare November 17, 2025 05:11
@jeejeelee jeejeelee added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 17, 2025
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
@jeejeelee jeejeelee changed the title from [LoRA][2/N]Remove LoRA extra vocab to [LoRA][2/2]Remove LoRA extra vocab Nov 17, 2025
@jeejeelee jeejeelee merged commit 9875be6 into vllm-project:main Nov 21, 2025
60 checks passed
@jeejeelee jeejeelee deleted the remove-lora-extra-vocab-2nd branch November 21, 2025 01:46
LuminolT pushed a commit to LuminolT/vllm that referenced this pull request Nov 21, 2025