
Conversation

WoosukKwon (Collaborator):

This PR adds support for the GPT-NeoX (Pythia) model, which is the backbone of many popular models including Dolly V2, Stable-LM, and Open Assistant.
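
(For readers arriving at this PR later: a minimal usage sketch with today's `vllm` package, which postdates this PR and is assumed here; `EleutherAI/pythia-160m` is just one example GPT-NeoX checkpoint.)

```python
from vllm import LLM, SamplingParams  # assumes the present-day package name; this PR predates it

# Any GPT-NeoX-architecture checkpoint should work once this PR is in.
llm = LLM(model="EleutherAI/pythia-160m")
outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.8, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```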

WoosukKwon (Collaborator, Author):

NOTE: Dolly V2 is not supported by this PR because it uses bfloat16, which some of our kernels do not support. Support will be added in a follow-up PR.

WoosukKwon requested a review from zhuohan123 on April 26, 2023 at 11:03
WoosukKwon linked an issue on Apr 26, 2023 that may be closed by this pull request
zhuohan123 (Member) left a comment:
LGTM! See comments for more details.


def initialize_dummy_weights(self) -> None:
    for param in self.state_dict().values():
        param.data.uniform_(-0.1, 0.1)
zhuohan123 (Member):

Nit: the U(-0.1, 0.1) initialization will lead to many out-of-range values and NaNs during model execution. Maybe use a smaller range like U(-1e-5, 1e-5)?

WoosukKwon (Collaborator, Author):

The (-0.1, 0.1) initialization actually works. However, to be cautious, I changed the range to (-1e-3, 1e-3).
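
For reference, a standalone sketch of what the adjusted initialization looks like (written here as a free function over an `nn.Module`; the method in the diff does the same thing on `self`):

```python
import torch.nn as nn

def initialize_dummy_weights(model: nn.Module,
                             low: float = -1e-3,
                             high: float = 1e-3) -> None:
    """Fill every weight with small uniform noise; only the memory layout matters for profiling."""
    for param in model.state_dict().values():
        param.data.uniform_(low, high)
```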

        self.max_position = 8192
        self.tie_word_embeddings = config.tie_word_embeddings

    def get_param_size(self) -> int:
zhuohan123 (Member):

Can we get the parameter size by counting the actual parameters after the model is initialized? Use some code like the following:

mem_params = sum([param.nelement()*param.element_size() for param in model.parameters()])
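
(A slightly more self-contained version of that suggestion, for context; `model` is whatever `nn.Module` has been instantiated:)

```python
import torch.nn as nn

def param_memory_bytes(model: nn.Module) -> int:
    # Bytes actually held by parameters: element count times bytes per element.
    return sum(p.nelement() * p.element_size() for p in model.parameters())
```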

WoosukKwon (Collaborator, Author):

Good idea. Let's do that in another PR.

        return dtype_size * total

    def get_max_num_gpu_blocks(
    def get_max_act_size(
zhuohan123 (Member):

Similarly, can we profile the actual max activation size by running the model once without any KV cache?
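
(Not what this PR does, but as a rough illustration of the idea: peak activation memory can be measured with PyTorch's CUDA memory statistics; `run_forward_without_kv_cache` below is a hypothetical callable standing in for one forward pass that allocates no KV cache.)

```python
import torch

def profile_peak_activation_bytes(run_forward_without_kv_cache) -> int:
    """Rough peak activation footprint of one forward pass, in bytes."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    baseline = torch.cuda.memory_allocated()   # weights already resident on the GPU
    with torch.no_grad():
        run_forward_without_kv_cache()         # forward pass with no KV cache allocated
    peak = torch.cuda.max_memory_allocated()
    return peak - baseline                     # what the forward pass needed on top of the weights
```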

WoosukKwon merged commit a96d63c into main Apr 28, 2023
WoosukKwon deleted the gpt-neox branch April 28, 2023 08:00
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024
SUMMARY:
* "remote push" job for multi-gpu runner.
* "remote push" job for single-gpu runner.
* Patches for re-initialization of "ray": other places in `vllm` already pass
  `ignore_reinit_error=True`, and it looks like a couple of call sites were
  simply missed (see the sketch right after this list).
* Patch the "find" command to only find *.py files starting with "test_".


TEST PLAN:
runs on remote push

---------

Co-authored-by: andy-neuma <[email protected]>
dllehr-amd pushed a commit to dllehr-amd/vllm that referenced this pull request Jul 22, 2024
* update quark quantizer command

* typo

* Using scaled_mm for untuned gemm

* remove comment

* fix yapf
JHLEE17 pushed a commit to JHLEE17/vllm that referenced this pull request Aug 1, 2024
@alixiaodi alixiaodi mentioned this pull request Aug 2, 2024
wuhuikx pushed a commit to wuhuikx/vllm that referenced this pull request Mar 27, 2025
### What this PR does / why we need it?

This PR updates vllm-ascend's dependency version on torch-npu so that vllm-ascend can be installed in an environment with a newer torch-npu release (e.g., torch-npu 2.6.0rc1).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI Test

Signed-off-by: ji-huazhong <[email protected]>
heheda12345 pushed a commit to heheda12345/vllm that referenced this pull request Sep 29, 2025
wuhang2014 pushed a commit to wuhang2014/vllm that referenced this pull request Sep 29, 2025

Successfully merging this pull request may close these issues.

Add support for Stable-LM and OpenAssistant
