Added support for Multimodal eval #1499
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1499
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 815966c with merge base 4d8bab5.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Nice work!
Haven't sat down and given it a full test run, but left some initial thoughts.
@@ -130,5 +130,5 @@ if [[ -x "$(command -v nvidia-smi)" ]]; then
 fi
 (
   set -x
-  $PIP_EXECUTABLE install evaluate=="0.4.3" lm-eval=="0.4.2" psutil=="6.0.0"
+  $PIP_EXECUTABLE install evaluate=="0.4.3" lm-eval=="0.4.7" psutil=="6.0.0"
Beyond the scope of this PR, but the duplicated requirements in here vs requirements.txt will be collapsed when we introduce packaging
torchchat/cli/cli.py
Outdated
        type=str,
        default="text",
        choices=["text", "text-image"],
        # help=argparse.SUPPRESS,
Since this arg is only used for evaluation, let's bump it into _add_evaluation_args() below.
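A rough sketch of what that move could look like, assuming an argparse-based _add_evaluation_args(parser) helper (the flag and its choices mirror the diff above; the group name is illustrative, not the final code):

import argparse


def _add_evaluation_args(parser: argparse.ArgumentParser) -> None:
    # Evaluation-only flags; --modality mirrors the choices quoted above.
    eval_group = parser.add_argument_group("Evaluation", "Flags used only by torchchat eval")
    eval_group.add_argument(
        "--modality",
        type=str,
        default="text",
        choices=["text", "text-image"],
        help="Modality of the model under evaluation",
    )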
torchchat/cli/builder.py
Outdated
@@ -71,6 +71,7 @@ class BuilderArgs:
     dynamic_shapes: bool = False
     max_seq_length: Optional[int] = None
     attention_backend: str = "math"
+    modality: Optional[str] = "text"
modality isn't super related to the BuilderArgs, so let's leave it out. I commented in the argparser with details.
torchchat/usages/eval.py
Outdated
@@ -223,6 +482,57 @@ def eval(
     return eval_results


def multi_model_eval(
Looks like this and eval() are fairly similar. Mind combining them?
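One possible shape for the merge, assuming the two functions only differ in which eval wrapper they construct (the wrapper names follow this thread; evaluate_model is a placeholder for the shared body, and the signatures are assumptions):

def eval_any(model, tokenizer, tasks, modality="text", limit=None, max_seq_length=None, device="cpu"):
    # Dispatch on modality instead of keeping two near-identical functions.
    if modality == "text":
        wrapper = GPTFastEvalWrapper(model, tokenizer, max_seq_length=max_seq_length, device=device)
    elif modality == "text-image":
        wrapper = VLMEvalWrapper(model, tokenizer, max_seq_length=max_seq_length, device=device)
    else:
        raise ValueError(f"Unsupported modality: {modality}")
    # Task resolution, the lm_eval evaluator call, and result collection stay shared.
    return evaluate_model(wrapper, tasks, limit)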
@Jack-Khuu sorry for the delay. Not sure what's wrong in test-gpu-eval-sanity-check (cuda, stories15M) / linux-job though. I tried running the test in main on CPU, but that fails too for me.
Sorry for the delay on my side as well. I just kicked off the jobs again. Let's see what's going on.
@Jack-Khuu Found the problem. This:

rolling_token_windows = list(
    map(
        utils.make_disjoint_window,
        utils.get_rolling_token_windows(
            token_list=self.tok_encode(string),
            prefix_token=self.eot_token_id,
            max_seq_len=self.max_length,
            context_len=1,
        ),
    )
)

changed to (link):

rolling_token_windows: List[Tuple[List[int], List[int]]] = list(
    map(
        utils.make_disjoint_window,
        utils.get_rolling_token_windows(
            token_list=self.tok_encode(string),
            prefix_token=self.prefix_token_id,
            max_seq_len=self.max_length,
            context_len=1,
        ),
    )
)

HFLM's …
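For what it's worth, if a wrapper didn't already inherit that property from HFLM, a minimal compatibility shim would be to expose prefix_token_id on the eval wrapper class (a sketch only, not necessarily what this PR ended up doing):

    @property
    def prefix_token_id(self) -> int:
        # lm-eval 0.4.7 reads prefix_token_id where 0.4.2 read eot_token_id;
        # aliasing one to the other preserves the old rolling-window behaviour.
        return self.eot_token_id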
@Jack-Khuu Can you re-run the tests, please?
Changes are looking great; we should be good to land soon.
Were you able to validate whether normal `pytorch generate` works with the tokenizer changes, btw?
https://github.com/pytorch/torchchat/blob/main/docs/multimodal.md
torchchat/usages/eval.py
Outdated
from torchtune import utils
from torchtune.data import (
    format_content_with_images,
    left_pad_sequence,
    Message,
    padded_collate_tiled_images_and_mask,
)
from torchtune.generation import generate, sample

from torchtune.modules.common_utils import local_kv_cache
from torchtune.modules.model_fusion import DeepFusionModel
from torchtune.modules.transforms import Transform
Let's move the imports into VLMEvalWrapper.__init__().
We're consciously recognizing that doing so is considered bad style, but it reduces the import overhead/requirements for users not using torchtune.
(You'll need to update the type hints in the model definition with strings, since they don't get defined until __init__ is called.)
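A sketch of that shape, assuming the wrapper subclasses lm-eval 0.4.7's HFMultimodalLM (attribute names here are illustrative, not the final code):

from lm_eval.models.hf_vlms import HFMultimodalLM


class VLMEvalWrapper(HFMultimodalLM):
    # String annotations keep this module importable without torchtune installed.
    def __init__(self, model: "DeepFusionModel", transform: "Transform", device: str = "cuda"):
        # Deferred imports: only multimodal eval pays the torchtune import cost.
        from torchtune.data import padded_collate_tiled_images_and_mask

        self._model = model
        self._transform = transform
        self._device = device
        self._collate = padded_collate_tiled_images_and_mask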
Were you able to validate whether normal `pytorch generate` works with the tokenizer changes, btw?
https://github.com/pytorch/torchchat/blob/main/docs/multimodal.md
Just did, and it threw an exception.
Need to make changes for Llama3VisionTransform.
To avoid scope-creeping this PR, how about we undo the tokenizer changes you made in builder.py and push the tokenizer resolution just within eval.py?
For example, within eval we can try:
elif modality == "text-image":
    ... llama3_2_vision_transform(path=str(self.tokenizer_path))
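Expanded slightly, that could look like the sketch below (load_text_tokenizer stands in for whatever eval.py already does on the text path; llama3_2_vision_transform lives under torchtune.models.llama3_2_vision):

def resolve_tokenizer(tokenizer_path, modality="text"):
    if modality == "text":
        # Placeholder for the existing text tokenizer resolution in eval.py.
        return load_text_tokenizer(tokenizer_path)
    elif modality == "text-image":
        # Imported lazily so text-only eval doesn't require torchtune.
        from torchtune.models.llama3_2_vision import llama3_2_vision_transform
        return llama3_2_vision_transform(path=str(tokenizer_path))
    raise ValueError(f"Unsupported modality: {modality}")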
torchchat/usages/eval.py
Outdated
# Having the imports here allows running other evals without installing torchtune
from torchtune import utils
from torchtune.data import (
    format_content_with_images,
    left_pad_sequence,
    Message,
    padded_collate_tiled_images_and_mask,
)
from torchtune.generation import generate, sample

from torchtune.modules.common_utils import local_kv_cache
from torchtune.modules.model_fusion import DeepFusionModel
from torchtune.modules.transforms import Transform
I think we need this one layer deeper, inside of __init__.
Class definition is always executed even if a class instance isn't made.
I think we need this one layer deeper, inside of __init__.
Class definition is always executed even if a class instance isn't made.
Oh, I did not know this. One learns something every day :)
Let's move the imports into VLMEvalWrapper.__init__().
We're consciously recognizing that doing so is considered bad style, but it reduces the import overhead/requirements for users not using torchtune.
(You'll need to update the type hints in the model definition with strings, since they don't get defined until __init__ is called.)
I did, and then ran a text-only eval after uninstalling torchtune:

python torchchat.py eval stories15M --dtype fp32 --limit 5

It failed because we have torchtune imports in model.py too:

Lines 40 to 47 in 4d8bab5

from torchtune.models.clip import clip_vision_encoder
from torchtune.models.llama3_1._component_builders import llama3_1 as llama3_1_builder
from torchtune.models.llama3_2_vision._component_builders import (
    llama3_2_vision_decoder,
    llama3_2_vision_encoder,
)
from torchtune.modules.model_fusion import DeepFusionModel

I suppose model.py would be used by most/all flows in torchchat. If this is the case, do we still need to move the torchtune imports inside VLMEvalWrapper?
It failed because we have torchtune imports in model.py too:
This is expected atm, we're still in the process of teasing out the torchtune dependencies, but what you have here will help that effort.
Thanks for updating the changes!!! I'll give this another test tomorrow and we should be good to merge in.
Verified on A100
Great work @anirudhs001, merging in.
PR for #1334
Used VLMEvalWrapper and Llama3VisionTransform from torchtune to support evaluation for multimodal models (llama3.2 11b only for now). Bumped up lm_eval to lm_eval==0.4.7 to use HFMultimodalLM, the class that VLMEvalWrapper inherits from.
A sample run for mmmu_val_art:
And with a limit of 1 sample:
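For reference, an invocation along these lines exercises the new path (the model alias is an assumption; use whichever alias resolves to Llama 3.2 11B Vision in your setup):

python torchchat.py eval llama3.2-11B --modality text-image --tasks mmmu_val_art --limit 1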