Hi, thanks for your great work. I am reproducing the evaluation results with the latest codebase and the latest LLaVA codebase. The results on the other benchmarks match or show only minor differences. However, the MM-Vet score is very low. Could you please check the MM-Vet evaluation on your side, or let me know what I should be careful of? Thank you!
| Tasks | Version | Filter | n-shot | Metric         | Value  |   | Stderr |
|-------|---------|--------|--------|----------------|--------|---|--------|
| mmvet | Yaml    | none   | 0      | gpt_eval_score  | 1.3761 | ± | N/A    |
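For reference, this is roughly the command I ran to produce the table above (a sketch of my setup; the checkpoint path and flag values are my own choices, so please adjust to your configuration):

```bash
# Reproduction sketch -- checkpoint and flags below are assumptions about my setup,
# not necessarily the exact configuration used for the reported numbers.
python3 -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mmvet \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```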