How to use streaming output with vLLM?

Although some libraries, such as TGI, wrap vLLM, I want to know how to use vLLM directly with streaming output enabled. It is currently hard to find an out-of-the-box example for this.

Typically, with the original Hugging Face transformers API, one can create a TextStreamer and pass it in generate_kwargs to do this:

```
# `input_ids` is the tokenizer output (a dict-like mapping), `streamer` is a TextStreamer
generate_kwargs = dict(
    **input_ids,
    max_new_tokens=50 if args.bare else 800,
    streamer=streamer,  # <--- streamer
    do_sample=True,
    num_beams=1,
    temperature=float(args.temp),
    top_k=40,
    top_p=float(args.top_p),
)
```

Is there anything out of the box I can use to enable streaming in vLLM? Thanks.
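
For illustration, here is a minimal sketch of what streaming with vLLM's async engine might look like. It assumes that `AsyncLLMEngine.generate(prompt, sampling_params, request_id)` returns an async generator of `RequestOutput` objects, and that `outputs[0].text` holds the cumulative text generated so far; both may differ between vLLM versions. The helper names `text_deltas` and `vllm_cumulative_texts`, the model name in the usage comment, and the sampling values are hypothetical choices, not part of vLLM.

```python
import uuid
from typing import AsyncIterator


async def text_deltas(cumulative_texts: AsyncIterator[str]) -> AsyncIterator[str]:
    """Turn a stream of cumulative strings into only the newly added pieces."""
    seen = 0  # number of characters already emitted
    async for text in cumulative_texts:
        delta = text[seen:]
        seen = len(text)
        if delta:  # skip steps that produced no new text
            yield delta


async def vllm_cumulative_texts(engine, prompt: str, max_tokens: int = 800) -> AsyncIterator[str]:
    """Yield the cumulative generated text after each engine step.

    `engine` is assumed to be a vllm AsyncLLMEngine instance.
    """
    # Imported here so text_deltas can be used without vLLM installed.
    from vllm import SamplingParams

    # Sampling values are placeholders mirroring the transformers example.
    params = SamplingParams(temperature=0.7, top_k=40, top_p=0.9, max_tokens=max_tokens)
    request_id = uuid.uuid4().hex  # any unique string per request
    async for request_output in engine.generate(prompt, params, request_id):
        # Assumption: .text is cumulative, not just the latest token.
        yield request_output.outputs[0].text


# Usage sketch (requires vLLM and a GPU; run inside an async function):
#
#   from vllm.engine.arg_utils import AsyncEngineArgs
#   from vllm.engine.async_llm_engine import AsyncLLMEngine
#
#   engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model="facebook/opt-125m"))
#   async for piece in text_deltas(vllm_cumulative_texts(engine, "Hello, my name is")):
#       print(piece, end="", flush=True)
```

Keeping the delta logic separate from the engine call means the printing loop behaves like a `TextStreamer`: each iteration receives only the text that has not been shown yet.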

How to use streaming output with vLLM? #351