[LoRA][2/2]Remove LoRA extra vocab #28545
Conversation
Signed-off-by: Jee Jee Li <[email protected]>
Code Review
This pull request is part of an effort to remove the LoRA extra vocabulary feature. The changes to the model implementations in `granite.py` and `teleflm.py` are consistent with this goal, and tests related to the extra vocabulary feature have been correctly removed. However, I've found a critical issue in `tests/lora/test_lora_manager.py` where a change breaks a test, which will need to be addressed.
tests/lora/test_lora_manager.py
Outdated
```diff
-    new_embeddings = load_file(
-        os.path.join(sql_lora_files, "new_embeddings.safetensors")
-    )
+    new_embeddings = load_file(os.path.join(sql_lora_files, ""))
```
`load_file(os.path.join(sql_lora_files, ""))` will attempt to load from a directory path, which will raise an `IsADirectoryError` and cause the test to fail.
Since new embeddings are being removed, `new_embeddings` should likely be an empty dictionary.
Please note that changing this line to `new_embeddings = {}` will reveal another issue in this test: a `KeyError` will be raised on line 83. The test logic from line 77 onwards needs to be updated to reflect that `lora.embeddings_tensor` is now always `None`. The `if`/`else` block can be simplified to `assert lora.embeddings_tensor is None`.
```diff
-    new_embeddings = load_file(os.path.join(sql_lora_files, ""))
+    new_embeddings = {}
```
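To make the suggestion concrete, here is a minimal sketch of how the tail of the test could look after both fixes. The surrounding code is hypothetical (the exact test body in vLLM may differ); it only illustrates the empty-dict replacement and the simplified assertion:

```python
# Hypothetical sketch of the updated test tail (not the exact vLLM test code):
new_embeddings = {}  # new_embeddings.safetensors no longer exists, so nothing to load

# ... build the LoRA model from `tensors` exactly as the test already does ...

for lora in lora_model.loras.values():
    # Extra vocab support was removed, so the former if/else on
    # embeddings_tensor collapses to a single assertion.
    assert lora.embeddings_tensor is None
```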
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
 @pytest.mark.parametrize("device", DEVICES)
 def test_from_lora_tensors(sql_lora_files, device):
     tensors = load_file(os.path.join(sql_lora_files, "adapter_model.safetensors"))
-    new_embeddings = load_file(
-        os.path.join(sql_lora_files, "new_embeddings.safetensors")
-    )
+    new_embeddings = load_file(os.path.join(sql_lora_files, ""))
```
Avoid loading deleted `new_embeddings` file
The updated test still calls `load_file(os.path.join(sql_lora_files, ""))`, which resolves to the LoRA directory itself. `safetensors.torch.load_file` only accepts paths to `.safetensors` files and will raise `IsADirectoryError`, so `test_from_lora_tensors` now crashes before exercising any behaviour. If extra vocab embeddings are no longer used, this load should be dropped or replaced with a stub so the test can run.
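For reference, a tiny standalone illustration of the failure mode described above; the path is a placeholder, not the actual test fixture:

```python
# Minimal illustration of the crash (hypothetical path, not the fixture value):
import os
from safetensors.torch import load_file

lora_dir = "/path/to/sql_lora_files"   # placeholder for the fixture directory
path = os.path.join(lora_dir, "")      # joining "" leaves a directory path
load_file(path)                        # raises IsADirectoryError instead of returning tensors
```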
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
💡 Codex Review
vllm/vllm/v1/worker/tpu_model_runner.py
Lines 219 to 223 in 808b6e0
```python
self.hidden_size = model_config.get_hidden_size()
self.vocab_size = model_config.get_vocab_size()
if self.lora_config is not None:
    self.vocab_size += self.lora_config.lora_extra_vocab_size
```
The `LoRAConfig` dataclass no longer exposes `lora_extra_vocab_size`, but the TPU model runner continues to access `self.lora_config.lora_extra_vocab_size` when computing the model vocabulary. Any LoRA-enabled run on TPU will now raise an `AttributeError` during initialization, preventing TPU LoRA serving altogether. Consider removing this addition or guarding it behind a compatibility shim so TPU paths remain functional.
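As a sketch of the compatibility-shim option (hypothetical, not a change made in this PR), the runner could tolerate configs with or without the removed field:

```python
# Hypothetical shim for tpu_model_runner.py: fall back to 0 when the
# removed lora_extra_vocab_size attribute no longer exists on LoRAConfig.
self.hidden_size = model_config.get_hidden_size()
self.vocab_size = model_config.get_vocab_size()
if self.lora_config is not None:
    self.vocab_size += getattr(self.lora_config, "lora_extra_vocab_size", 0)
```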
vllm/vllm/lora/punica_wrapper/punica_tpu.py
Lines 289 to 296 in 808b6e0
```python
def _update_base_metadata(
    self,
    mapping: "LoRAMapping",
    lora_index_to_id: list[int | None],
    max_loras: int,
    vocab_size: int,
    extra_vocab_size: int,
):
```
`PunicaWrapperBase.update_metadata` now calls `_update_base_metadata` with four positional parameters (`mapping`, `lora_index_to_id`, `max_loras`, `vocab_size`) and unconditionally sets `extra_vocab_size` to 0 internally. The TPU implementation still overrides `_update_base_metadata` with a five-argument signature, so the call from the base class will raise `TypeError: _update_base_metadata() missing 1 required positional argument` whenever TPU LoRA metadata is updated. The override should drop the `extra_vocab_size` parameter or accept a default to keep the method compatible.
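A minimal sketch of the default-argument option for the TPU override (hypothetical; the method body is elided):

```python
# Hypothetical signature fix in punica_tpu.py: give extra_vocab_size a default
# so the base class's four-argument call keeps working.
def _update_base_metadata(
    self,
    mapping: "LoRAMapping",
    lora_index_to_id: list[int | None],
    max_loras: int,
    vocab_size: int,
    extra_vocab_size: int = 0,  # retained only for compatibility; effectively unused
):
    ...
```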
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
7b97113 to 5913a48
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: LuminolT <[email protected]>
Purpose
In this PR:
- Remove `lora_extra_vocab_size` and `lora_vocab_padding_size` (FIX [RFC]: Disallow extra vocab for LoRA #23474)
- Use a `meta-llama/Llama-3.2-3B-Instruct` LoRA model to replace the previous llama2-7B LoRA model, to test adding LoRA to the `lm_head` and `embedding` layers and reduce the CI test pressure
- Using the `Qwen/Qwen3-0.6B` model for related LoRA testing can also reduce the testing pressure on CI

Will continue to clean up the related code subsequently.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
- `supported_models.md` and `examples` for a new model.