[Feature][Hardware][TPU]: Add Recompilation Check for vLLM on TPU

### 🚀 The feature, motivation and pitch

Ideally, no compilation should occur after warmup. However, PyTorch/XLA compiles graphs implicitly, so a graph that was not captured during warmup is compiled at serving time, and excessive recompilation of this kind hurts LLM serving performance. We can add an option to detect recompilation after warmup. This requires a PyTorch/XLA method, something like `xm.num_graph_hash()`, that reports the number of captured graphs. If no recompilation occurs, this number stays constant after warmup.
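
To make the proposal concrete, here is a minimal sketch of what such a check could look like. It is not vLLM or PyTorch/XLA code: `RecompilationChecker`, `RecompilationError`, and `FakeGraphCounter` are hypothetical names, and the graph counter is passed in as a plain callable so the sketch runs without a TPU. On real hardware that callable would be whatever PyTorch/XLA ends up exposing (for example the `xm.num_graph_hash()` suggested above, which is an assumption here, not a confirmed API).

```python
from typing import Callable, Optional


class RecompilationError(RuntimeError):
    """Raised when new graphs are captured after warmup (hypothetical name)."""


class RecompilationChecker:
    """Compares the captured-graph count against a post-warmup baseline."""

    def __init__(self, get_num_graphs: Callable[[], int]) -> None:
        # get_num_graphs stands in for a PyTorch/XLA graph counter (assumption).
        self._get_num_graphs = get_num_graphs
        self._baseline: Optional[int] = None

    def mark_warmup_done(self) -> None:
        # Record how many graphs exist once warmup has finished.
        self._baseline = self._get_num_graphs()

    def check(self) -> None:
        # Call after each serving step; fails if the graph count grew.
        if self._baseline is None:
            raise RuntimeError("mark_warmup_done() must be called before check()")
        current = self._get_num_graphs()
        if current != self._baseline:
            raise RecompilationError(
                f"Recompilation detected: {self._baseline} graphs after warmup, "
                f"{current} now"
            )


class FakeGraphCounter:
    """Stand-in for the XLA graph cache, used only to exercise the checker."""

    def __init__(self) -> None:
        self._hashes = set()

    def compile(self, graph_hash: str) -> None:
        # Re-running a known graph does not add a new entry.
        self._hashes.add(graph_hash)

    def num_graphs(self) -> int:
        return len(self._hashes)


# Usage: warm up with two shapes, then serve.
counter = FakeGraphCounter()
counter.compile("prefill_bs1_len128")
counter.compile("decode_bs8")

checker = RecompilationChecker(counter.num_graphs)
checker.mark_warmup_done()

counter.compile("decode_bs8")  # cached graph, count unchanged
checker.check()                # passes silently

counter.compile("decode_bs9")  # unseen shape, new graph captured
try:
    checker.check()
    detected = False
except RecompilationError:
    detected = True
```

Injecting the counter as a callable keeps the check independent of the exact PyTorch/XLA method, so it could be swapped in once that method exists. Whether a detected recompilation should raise or only log a warning is a design choice left open here.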

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Feature][Hardware][TPU]: Add Recompilation Check for vLLM on TPU #14580

🚀 The feature, motivation and pitch

Alternatives

Additional context

Before submitting a new issue...

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Feature][Hardware][TPU]: Add Recompilation Check for vLLM on TPU #14580

Description

🚀 The feature, motivation and pitch

Alternatives

Additional context

Before submitting a new issue...

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions