How to use streaming output with vLLM? #351

@lucasjinreal

Although some libraries, such as TGI, wrap vLLM, I want to know how to use vLLM itself with streaming output enabled. It is currently hard to find an out-of-the-box example of this.

Typically, with the original HF transformers API, one can create a TextStreamer and pass it in generate_kwargs like this:

generate_kwargs = dict(
    **input_ids,  # tokenized inputs, unpacked as keyword arguments
    max_new_tokens=50 if args.bare else 800,
    streamer=streamer,  # <--- the TextStreamer instance
    do_sample=True,
    num_beams=1,
    temperature=float(args.temp),
    top_k=40,
    top_p=float(args.top_p),
)

Is there anything out of the box I can use to enable streaming in vLLM? Thanks.
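
For reference, one route that may work is vLLM's OpenAI-compatible HTTP server with `"stream": true` in the request body. The sketch below is not a confirmed answer from the maintainers. It assumes a server is already running locally (for example, started with `python -m vllm.entrypoints.openai.api_server --model <model>`), that it listens at the URL shown, and that it replies with OpenAI-style server-sent-event lines (`data: {...}`, ending with `data: [DONE]`). The function names, URL, port, model name, and sampling values are placeholders chosen for illustration.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the text piece carried by each `data: {...}` line of an SSE stream.

    Assumes OpenAI-style completion chunks, where the new text sits at
    chunk["choices"][0]["text"].
    """
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel that marks the end of the stream
        chunk = json.loads(payload)
        yield chunk["choices"][0]["text"]


def stream_completion(
    prompt: str,
    url: str = "http://localhost:8000/v1/completions",  # assumed address
    model: str = "facebook/opt-125m",  # placeholder model name
) -> Iterator[str]:
    """Send one streaming completion request and yield text as it arrives."""
    body = json.dumps(
        {
            "model": model,
            "prompt": prompt,
            "max_tokens": 800,
            "temperature": 0.7,  # illustrative sampling values
            "top_p": 0.9,
            "stream": True,  # ask the server for incremental chunks
        }
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # The response object is iterable line by line as bytes.
        yield from iter_sse_text(response)
```

With a server running, `for piece in stream_completion("Hello, my name is"): print(piece, end="", flush=True)` would print the pieces as they arrive, which plays roughly the role that TextStreamer plays for `generate`. The parsing is kept in `iter_sse_text` so it can be checked against canned lines without a live server.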
