System Info
transformers==4.47.0
Who can help?
Hello @zucchini-nlp,
I tried using LLaVA-NeXT 7B and 13B for assisted decoding, but I ran into the error below.
Could you please advise on how to resolve this?
While debugging, I noticed that the assistant model completes the first round of drafting successfully, but the error occurs during the second round of drafting.
Thank you in advance for your help!
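For context, here is a rough sketch of the drafting pattern I believe is happening internally. The two explicit generate calls below are only my approximation of what candidate_generator.get_candidates does across rounds, not the actual transformers code:

import torch

# Rough approximation of two drafting rounds (illustrative only):
with torch.no_grad():
    # Round 1: draft from the full multimodal inputs -- this works.
    round1_ids = assistant_model.generate(**inputs, max_new_tokens=5)

    # Round 2: the candidate generator extends the token sequence and
    # calls generate again, without passing pixel_values a second time;
    # this is roughly where the AttributeError seems to surface.
    round2_ids = assistant_model.generate(
        input_ids=round1_ids,
        attention_mask=torch.ones_like(round1_ids),
        max_new_tokens=5,
    )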
Code
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
from PIL import Image
import requests

# 13B main model and 7B assistant (draft) model, both loaded in 4-bit
main_model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-vicuna-13b-hf",
    load_in_4bit=True,
    low_cpu_mem_usage=True,
).eval()
assistant_model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-vicuna-7b-hf",
    load_in_4bit=True,
    low_cpu_mem_usage=True,
).eval()

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-vicuna-13b-hf")

url = "https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_v1_5_radar.jpg"
image = Image.open(requests.get(url, stream=True).raw)

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image"},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda:0")

# Assisted decoding: the 7B model drafts candidate tokens for the 13B model
outputs = main_model.generate(**inputs, assistant_model=assistant_model, max_new_tokens=100, num_assistant_tokens=5)
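As a sanity check, plain generation without the assistant model can be run first to confirm the main model itself works on these inputs (a control only, not a workaround):

# Baseline: standard generation with the main model alone
baseline = main_model.generate(**inputs, max_new_tokens=100)
print(processor.decode(baseline[0], skip_special_tokens=True))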
Error
File "MY_COMPUTER_PATH/codes/multi-spec/test.py", line 55, in <module> [11/1861]
outputs = main_model.generate(**inputs, assistant_model=assistant_model, max_new_tokens=100, num_assistant_tokens=5)
File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/transformers/generation/utils.py", line 2199, in generate
result = self._assisted_decoding(
File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/transformers/generation/utils.py", line 4271, in _assisted_decoding
candidate_input_ids, candidate_logits = candidate_generator.get_candidates(input_ids)
File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/transformers/generation/candidate_generator.py", line 243, in get_can
didates
assistant_output = self.assistant_model.generate(**assistant_generation_kwargs, **self.assistant_kwargs)
File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/transformers/generation/utils.py", line 2256, in generate
result = self._sample(
File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/transformers/generation/utils.py", line 3255, in _sample
outputs = self(**model_inputs, return_dict=True)
File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = module._old_forward(*args, **kwargs)
File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/transformers/models/llava_next/modeling_llava_next.py", line 873, in
forward
inputs_embeds = inputs_embeds.to(image_features.dtype)
AttributeError: 'NoneType' object has no attribute 'dtype'
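Judging from the trace, image_features is None when the cast at modeling_llava_next.py line 873 runs during the second drafting round. A defensive guard like the sketch below (my local patch idea, not an upstream fix; it may only mask the underlying problem of image features not being reused across rounds) would avoid the AttributeError:

# modeling_llava_next.py, around line 873 (transformers 4.47.0).
# The original cast crashes because image_features is None on the
# second drafting round:
#     inputs_embeds = inputs_embeds.to(image_features.dtype)
# Defensive guard (sketch only; likely masks rather than fixes the bug):
if image_features is not None:
    inputs_embeds = inputs_embeds.to(image_features.dtype)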
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
See the code snippet under Code above.
Expected behavior
Assisted decoding completes without errors.