Improve image processing time #33810

@yonigozlan

Description

Feature request

Optimize Transformers' image processors to decrease image processing time and reduce inference latency for vision models and VLMs.

Motivation

The Transformers library relies on PIL (Pillow) for image preprocessing, which can become a major bottleneck during inference, especially with compiled models where the preprocessing time can dominate the overall inference time.

[Benchmark plots: breakdown of preprocessing vs. model inference time for RT-DETR and DETR, in eager and compiled mode]

In the examples above, RT-DETR preprocessing requires only resizing the image, while DETR's involves resizing and normalization.
In eager mode, image preprocessing accounts for a large share of RT-DETR's total inference time, but is not the main bottleneck. With a compiled RT-DETR, however, preprocessing takes up the majority of the inference time, underlining the need to optimize it. This is even clearer for DETR, where image preprocessing is already the main bottleneck in eager mode.

However, alternative libraries exist that leverage available hardware more efficiently for faster image preprocessing.
OptimVision uses such libraries to get much better results compared to Transformers.
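To make the bottleneck concrete: the resize + normalize pipeline mentioned above can be expressed directly on tensors (e.g. with plain PyTorch ops), which avoids the Python-level, per-image PIL round-trip and can run batched on the accelerator. This is only a minimal sketch of the general idea, not OptimVision's actual implementation; the function name and default sizes/statistics here are illustrative.

```python
import torch
import torch.nn.functional as F

def tensor_preprocess(images: torch.Tensor,
                      size=(640, 640),
                      mean=(0.485, 0.456, 0.406),
                      std=(0.229, 0.224, 0.225)) -> torch.Tensor:
    """Resize + normalize a uint8 image batch of shape (N, C, H, W)
    using batched tensor ops instead of per-image PIL calls."""
    x = images.float() / 255.0
    # Batched bilinear resize on whatever device the tensor lives on.
    x = F.interpolate(x, size=size, mode="bilinear", align_corners=False)
    # Channel-wise normalization via broadcasting.
    mean_t = torch.tensor(mean, device=x.device).view(1, -1, 1, 1)
    std_t = torch.tensor(std, device=x.device).view(1, -1, 1, 1)
    return (x - mean_t) / std_t

# Example: a batch of two fake 480x640 RGB images.
batch = torch.randint(0, 256, (2, 3, 480, 640), dtype=torch.uint8)
out = tensor_preprocess(batch)
print(tuple(out.shape))  # (2, 3, 640, 640)
```

Because every step is a tensor op, the whole pipeline can be moved to GPU or fused by a compiler, whereas PIL-based preprocessing stays serial on CPU.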

Much more details on OptimVision and image processing methods comparison are available on this Notion page.

Your contribution

OptimVision is an experimental playground for optimizing the different steps involved in inference/training with vision models.
The current fast image preprocessing in OptimVision is a proof of concept and is not yet ready to be merged into Transformers, but that is the ultimate goal :).
