[Bug]: DeepSeek-V3 performance on benchmark doesn't match the paper #11971

### Your current environment

<details>
<summary>The output of `python collect_env.py`</summary>

```text
Your output of `python collect_env.py` here
```

</details>


### Model Input Dumps

_No response_

### 🐛 Describe the bug

Hi guys,

I used vLLM to serve DeepSeek-V3, but my benchmark results don't reproduce the ones reported in the paper. Specifically, DeepSeek-V3 scored 82 on CEval in my setup, compared to 90 in the paper.

Are there any details I might have missed? I'm happy to share any details of my settings.

### Before submitting a new issue...

- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
