Closed
Description
Although some libraries such as TGI wrap vLLM, I want to know how to use vLLM directly with streaming output enabled. It is currently hard to find an out-of-the-box example of this.
Typically, with the original HF transformers API, one can create a TextStreamer and pass it to generate() through generate_kwargs:
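```python
from transformers import TextStreamer

# tokenizer, model, prompt and args come from the surrounding script
streamer = TextStreamer(tokenizer, skip_prompt=True)  # prints text to stdout as tokens are generated
input_ids = tokenizer(prompt, return_tensors="pt").to(model.device)
generate_kwargs = dict(
    **input_ids,
    max_new_tokens=50 if args.bare else 800,
    streamer=streamer,  # <--- streamer
    do_sample=True,
    num_beams=1,
    temperature=float(args.temp),
    top_k=40,
    top_p=float(args.top_p),
)
model.generate(**generate_kwargs)
```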
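For context on what `streamer=` does: transformers' `generate()` calls the streamer's `put()` first with the prompt ids and then with each newly generated token, and calls `end()` once generation stops (the `BaseStreamer` interface). `TextStreamer` with `skip_prompt=True` discards that first prompt call. The sketch below is a duck-typed stand-in for that interface, not the library class: it assumes token ids arrive as plain Python lists rather than tensors, decodes each chunk independently (the real `TextStreamer` decodes cumulatively to handle multi-token characters), and uses a hypothetical toy vocabulary in place of a real tokenizer.

```python
class ListStreamer:
    """Duck-typed stand-in for a transformers streamer (put/end interface)."""

    def __init__(self, decode, skip_prompt=True):
        self.decode = decode            # callable: list[int] -> str (toy decoder below)
        self.skip_prompt = skip_prompt
        self.next_is_prompt = True      # generate() sends the prompt ids on the first put()
        self.chunks = []
        self.finished = False

    def put(self, token_ids):
        if self.skip_prompt and self.next_is_prompt:
            self.next_is_prompt = False
            return                      # drop the prompt, like skip_prompt=True
        self.next_is_prompt = False
        text = self.decode(token_ids)
        self.chunks.append(text)
        print(text, end="", flush=True)  # emit text as soon as it arrives

    def end(self):
        self.finished = True
        print()                          # final newline once generation stops


# Hypothetical toy vocabulary standing in for a real tokenizer
VOCAB = {0: "Hello", 1: ",", 2: " world", 3: "!"}


def toy_decode(ids):
    return "".join(VOCAB[i] for i in ids)


# Simulate the call order generate() uses: prompt first, then one token per step
streamer = ListStreamer(toy_decode)
streamer.put([0])           # prompt ids, skipped
for tok in [1, 2, 3]:
    streamer.put([tok])     # newly generated tokens
streamer.end()
```

Any object with these two methods can be handed to `generate(streamer=...)`, which is why the HF approach needs no extra plumbing.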
Is there anything out-of-the-box I can use to enable streaming in vLLM? Thanks.
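One option, sketched here under stated assumptions: vLLM ships an OpenAI-compatible HTTP server (started with `python -m vllm.entrypoints.openai.api_server --model <model>`, listening on port 8000 by default), and its `/v1/completions` endpoint accepts `"stream": true`, returning Server-Sent Events of the form `data: {...}` terminated by `data: [DONE]`. The client below uses only the standard library. The model name, URL, and sampling values are illustrative assumptions; check your vLLM version's docs for the exact server command and supported request fields.

```python
import json
import urllib.request


def iter_sse_text(lines):
    """Yield text deltas from OpenAI-style SSE lines ("data: {...}" ... "data: [DONE]")."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break     # end-of-stream sentinel
        chunk = json.loads(payload)
        yield chunk["choices"][0]["text"]


def stream_completion(prompt, model, url="http://localhost:8000/v1/completions"):
    """POST a streaming completion request and print text as it arrives."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": 800,
        "temperature": 0.8,
        "top_p": 0.95,
        "stream": True,  # ask the server for incremental SSE chunks
    }).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # iterating the response reads it line by line as data arrives
        for text in iter_sse_text(resp):
            print(text, end="", flush=True)
    print()
```

With a server running, a call such as `stream_completion("San Francisco is a", model="facebook/opt-125m")` (model name is only an example) prints tokens incrementally. The SSE parsing is kept in its own generator so it can be reused with any HTTP client that yields response lines.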
Swiffers