
Assisted decoding with Llava-Next does not work #35450

@ddehun

Description


System Info

transformers==4.47.0

Who can help?

Hello @zucchini-nlp,

I tried using Llava-Next 7B and 13B for assisted decoding, but I encountered the error below.
Could you please advise on how to resolve this issue?

While debugging, I noticed that the assistant model completes the first round of drafting successfully, but the error occurs during the second drafting round.
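
For context, the sketch below shows what I mean by drafting rounds: assisted decoding repeatedly asks the assistant model to draft a few candidate tokens, which the main model then verifies in a single forward pass. This is only an illustration of the greedy case with a hypothetical function name and simplified verification; the real loop is GenerationMixin._assisted_decoding in transformers/generation/utils.py and additionally manages KV caches, stopping criteria, and (for Llava-Next) pixel_values.

import torch

@torch.no_grad()
def assisted_decode_sketch(main_model, assistant_model, input_ids,
                           max_new_tokens=20, num_assistant_tokens=5):
    # Heavily simplified, greedy-only illustration of assisted decoding.
    start = input_ids.shape[1]
    while input_ids.shape[1] - start < max_new_tokens:
        # Drafting round: the assistant proposes up to num_assistant_tokens
        # candidate tokens. In my runs, the crash reported below happens
        # inside this call on the *second* iteration of the loop.
        candidates = assistant_model.generate(
            input_ids, max_new_tokens=num_assistant_tokens, do_sample=False
        )
        draft = candidates[:, input_ids.shape[1]:]
        # Verification: one forward pass of the main model over the drafted
        # sequence; keep the draft prefix the main model agrees with, plus
        # the main model's own token at the first disagreement.
        logits = main_model(candidates).logits
        preds = logits[:, input_ids.shape[1] - 1 : -1].argmax(-1)
        n_match = int((preds == draft).long().cumprod(-1).sum())
        input_ids = torch.cat([input_ids, preds[:, : n_match + 1]], dim=-1)
    return input_ids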

Thank you in advance for your help!

Code

from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
from PIL import Image
import requests

# Main (target) model: 13B, loaded in 4-bit.
main_model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-vicuna-13b-hf",
    load_in_4bit=True,
    low_cpu_mem_usage=True,
).eval()

# Assistant (draft) model: 7B, loaded in 4-bit.
assistant_model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-vicuna-7b-hf",
    load_in_4bit=True,
    low_cpu_mem_usage=True,
).eval()
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-vicuna-13b-hf")

url = "https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_v1_5_radar.jpg"
image = Image.open(requests.get(url, stream=True).raw)
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image"},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda:0")
# Passing assistant_model enables assisted decoding; this call raises the error below.
outputs = main_model.generate(**inputs, assistant_model=assistant_model, max_new_tokens=100, num_assistant_tokens=5)
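
Side note: recent transformers releases warn that passing load_in_4bit directly to from_pretrained is deprecated in favor of a BitsAndBytesConfig. The equivalent form below (assuming bitsandbytes is installed) loads the models the same way and does not change the reproduction:

from transformers import BitsAndBytesConfig, LlavaNextForConditionalGeneration

# Same 4-bit loading as above, expressed via quantization_config.
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
main_model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-vicuna-13b-hf",
    quantization_config=quantization_config,
    low_cpu_mem_usage=True,
).eval()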

Error

  File "MY_COMPUTER_PATH/codes/multi-spec/test.py", line 55, in <module>                                                                 [11/1861]
    outputs = main_model.generate(**inputs, assistant_model=assistant_model, max_new_tokens=100, num_assistant_tokens=5)
  File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/transformers/generation/utils.py", line 2199, in generate
    result = self._assisted_decoding(
  File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/transformers/generation/utils.py", line 4271, in _assisted_decoding
    candidate_input_ids, candidate_logits = candidate_generator.get_candidates(input_ids)
  File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/transformers/generation/candidate_generator.py", line 243, in get_can
didates
    assistant_output = self.assistant_model.generate(**assistant_generation_kwargs, **self.assistant_kwargs)
  File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/transformers/generation/utils.py", line 2256, in generate
    result = self._sample(
  File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/transformers/generation/utils.py", line 3255, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "MY_COMPUTER_PATH/anaconda3/envs/spec_env/lib/python3.9/site-packages/transformers/models/llava_next/modeling_llava_next.py", line 873, in
forward
    inputs_embeds = inputs_embeds.to(image_features.dtype)
AttributeError: 'NoneType' object has no attribute 'dtype'
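
The failing line casts inputs_embeds while it is None, which from my debugging only happens once the assistant starts generating from its KV cache in the second round. As a sketch only (not a tested fix, and it may merely mask the underlying pixel_values/cache handling problem), guarding the cast avoids the AttributeError:

# Sketch of a guard around modeling_llava_next.py line 873 (transformers 4.47.0);
# on the assistant's second drafting round, inputs_embeds arrives here as None.
if inputs_embeds is not None:
    inputs_embeds = inputs_embeds.to(image_features.dtype)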

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The script in the Code section above is the full reproduction; running it as-is triggers the error.

Expected behavior

Assisted decoding with Llava-Next completes without errors and returns the generated output.
