Chameleon: add model #31534
Co-authored-by: Jacob Kahn <[email protected]> Co-authored-by: Leonid Shamis <[email protected]>
Co-authored-by: Arthur <[email protected]>
…d to the modeling file
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
# we need to expand on num_heads because there was not sharding done in 7B model
# and we need to calculate mean/var over each head_dim
# for sharded model we don't do expansion and simply do norm
You should be able to bake that in by updating the alpha and beta for model parallelism.
# permute key/value to use transformers RoPE implementation (see for more: https://github.com/huggingface/transformers/issues/25199)
# NOTE: permutation is done same way as in llama conversion script
key_states = key_states.view(-1, self.num_key_value_heads, self.head_dim // 2, 2).transpose(3, 2)
query_states = query_states.view(-1, self.num_heads, self.head_dim // 2, 2).transpose(3, 2)
IMO we should permute everything in the weights
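A minimal sketch of the two options discussed here, in plain Python rather than PyTorch: permuting the projected states on every forward pass versus permuting the rows of the projection weight once at conversion time. The helper names (`interleaved_to_half_split`, `matvec`) and the toy weight are assumptions for illustration, not code from this PR. The reordering mirrors what `view(..., head_dim // 2, 2).transpose(3, 2)` does within each head.

```python
def interleaved_to_half_split(items):
    """Reorder [x0, y0, x1, y1, ...] into [x0, x1, ..., y0, y1, ...]."""
    return items[0::2] + items[1::2]


def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


# Toy projection weight for a single head: head_dim = 4 output rows, 3 inputs.
weight = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 0.0],
    [3.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
]
x = [1.0, 2.0, 3.0]

# Option 1: project, then permute the activations at runtime.
runtime_permuted = interleaved_to_half_split(matvec(weight, x))

# Option 2: permute the weight rows once (e.g. in a conversion script), then project.
permuted_weight = interleaved_to_half_split(weight)
baked_in = matvec(permuted_weight, x)

print(runtime_permuted == baked_in)
```

Because the permutation only reorders output rows, both options give the same states, which is why it can in principle be folded into the checkpoint instead of the modeling code.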
def __init__(self, hidden_size, *args, **kwargs):
    super().__init__(hidden_size, *args, **kwargs)
    self.normalized_shape = (hidden_size[-1],)
Does this mean we are computing over, say, "head_dim"? How different is this from a normal nn.LayerNorm((head_dim,))?
Ah, the weights are different (hidden_size) but the applied dim is hidden_size, is that it?
Yes, it's only the weights that are different for the 30B model. The 7B model simply has the same weights repeated across all heads.
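A rough sketch of the distinction in this thread, in plain Python: statistics are always computed over `head_dim`, and only the shape of the affine weights changes. The function names, the toy values, and `eps=1e-5` are assumptions for illustration and are not taken from the PR.

```python
import math


def layer_norm(vec, weight, bias, eps=1e-5):
    """Normalize one head over head_dim, then apply elementwise weight and bias."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * w + b for v, w, b in zip(vec, weight, bias)]


def per_head_layer_norm(heads, weights, biases):
    """Apply layer_norm to each head with that head's own weight and bias."""
    return [layer_norm(h, w, b) for h, w, b in zip(heads, weights, biases)]


heads = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]  # (num_heads=2, head_dim=4)
shared_w, shared_b = [1.0, 1.0, 2.0, 2.0], [0.0] * 4

# Weights repeated over heads: same result as one LayerNorm over head_dim.
repeated = per_head_layer_norm(heads, [shared_w] * 2, [shared_b] * 2)
plain = [layer_norm(h, shared_w, shared_b) for h in heads]

# Distinct weights per head: the second head now differs from the plain version.
distinct = per_head_layer_norm(heads, [shared_w, [0.5] * 4], [shared_b] * 2)

print(repeated == plain, distinct[1] != plain[1])
```

With repeated weights the per-head version collapses to an ordinary layer norm over `head_dim`; it only differs once each head carries its own weights.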
for token in tokenizer_config["added_tokens"]:
    if token["content"] == "<reserved08707>":
        token["content"] = "<image>"
@zucchini-nlp @ArthurZucker We should also set token["special"] = False so that we can decode this token.
What do you guys think?
(It's what I'm currently doing in my PR btw and I haven't encountered any errors yet)
Hi! Special tokens should be decodable with skip_special_tokens=False.
For my understanding, why do we need to decode the image token? AFAIK it shouldn't affect image generation, because it's a token we added manually to keep track of where to add an image in the text.
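To make the two positions concrete, here is a toy sketch in plain Python. `rename_image_token` extends the loop from the diff with an optional `special` override, and `toy_decode` is a hypothetical stand-in for a tokenizer's decode step that only imitates the effect of `skip_special_tokens`. Neither function is the actual conversion script or tokenizer code, and the config contents are made up.

```python
def rename_image_token(tokenizer_config, special=None):
    """Rename the reserved token to <image>; optionally override its special flag."""
    for token in tokenizer_config["added_tokens"]:
        if token["content"] == "<reserved08707>":
            token["content"] = "<image>"
            if special is not None:
                token["special"] = special
    return tokenizer_config


def toy_decode(ids, added_tokens, skip_special_tokens=False):
    """Hypothetical decoder: ids index into added_tokens (not a real tokenizer)."""
    pieces = []
    for i in ids:
        token = added_tokens[i]
        if skip_special_tokens and token["special"]:
            continue  # special tokens are dropped only when skipping is requested
        pieces.append(token["content"])
    return "".join(pieces)


def make_config():
    return {
        "added_tokens": [
            {"content": "<reserved08707>", "special": True},
            {"content": "hello", "special": False},
        ]
    }


# Token stays special: visible unless skip_special_tokens=True.
cfg = rename_image_token(make_config())
kept = toy_decode([0, 1], cfg["added_tokens"], skip_special_tokens=False)
dropped = toy_decode([0, 1], cfg["added_tokens"], skip_special_tokens=True)

# Token marked non-special: visible even when special tokens are skipped.
cfg2 = rename_image_token(make_config(), special=False)
always = toy_decode([0, 1], cfg2["added_tokens"], skip_special_tokens=True)

print(kept, dropped, always)
```

Under these assumptions, keeping the token special still lets it be decoded when special tokens are not skipped, while marking it non-special makes it appear in every decode.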
🔥 great work
What does this PR do?
Fixes #31505.
Adds Chameleon, a vision language model from Meta AI.
Project repo: https://github.com/facebookresearch/chameleon
Paper: https://arxiv.org/abs/2405.09818v1