[Rate Type] Concurrencies #47

Closed
philschmid opened this issue Sep 5, 2024 · 4 comments

@philschmid
Contributor

Hello,

I am trying to integrate guidellm into a benchmark suite where we run different load tests based on user concurrencies. We define user concurrencies as "users" that send requests one after another, i.e. send request -> wait for response -> send next request.

I first assumed that's what the "constant" rate type does, but it sends far more requests, since the rate is interpreted as requests per second. Is there a way to customize the "user concurrency"? I assume that concurrency == the synchronous type, but it would be great if I could do something like

guidellm --target "http://localhost:8080/v1" --model "meta-llama/Meta-Llama-3.1-8B-Instruct"  --data-type emulated --data "prompt_tokens=550,generated_tokens=250" --max-seconds 60 --rate-type concurrent --rate 1 --rate 2 --rate 10 --rate 50 --output-path r.json
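
To make the desired pattern concrete, here is a minimal asyncio sketch of the behavior I mean. This is illustrative only, not guidellm's implementation, and `send_request` is a hypothetical stand-in for the real HTTP call:

```python
import asyncio

async def send_request(target: str) -> None:
    # Hypothetical stand-in for one prompt/completion round trip.
    await asyncio.sleep(0.5)  # pretend the response takes ~500 ms

async def user_loop(target: str, stop: asyncio.Event) -> None:
    # One simulated user: send a request, wait for the full response,
    # then immediately send the next one.
    while not stop.is_set():
        await send_request(target)

async def run_fixed_concurrency(target: str, users: int, seconds: float) -> None:
    # Keep `users` simulated users running, each with at most one request
    # in flight, for the given duration.
    stop = asyncio.Event()
    tasks = [asyncio.create_task(user_loop(target, stop)) for _ in range(users)]
    await asyncio.sleep(seconds)
    stop.set()
    await asyncio.gather(*tasks)

asyncio.run(run_fixed_concurrency("http://localhost:8080/v1", users=8, seconds=60.0))
```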
@markurtz
Member

Hey @philschmid, I understand what you mean about this request. You'd specifically like to be able to keep a fixed number of concurrent requests over the life of the benchmark, where as soon as one finishes it immediately starts a new one, is that correct? You can't easily achieve that currently through the constant or Poisson rate types, since those are set as the number of requests per second rather than concurrent users, so you'd have to adjust the rate until you hit the average number of concurrent users, right?
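
For context on why tuning the rate is awkward: by Little's law, the average concurrency under a constant arrival rate is roughly rate × mean request latency, so hitting a target number of concurrent users means re-tuning the rate whenever latency shifts. A quick back-of-the-envelope with hypothetical numbers:

```python
# Little's law: average concurrency ≈ arrival_rate * mean_latency.
# To approximate 32 concurrent users when a request takes ~4 s end to end
# (both numbers hypothetical):
target_concurrency = 32
mean_latency_s = 4.0
rate_rps = target_concurrency / mean_latency_s  # -> 8.0 requests/second
print(rate_rps)
```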

@markurtz markurtz self-assigned this Sep 10, 2024
@markurtz markurtz added the enhancement New feature or request label Sep 10, 2024
@philschmid
Contributor Author

Hey,

Yes. I am looking for a way to benchmark the load under, e.g., 1, 2, 4, 8, 16, 32, 64, 128 concurrent users (send request -> wait for response -> send again).

But looking into more benchmarks and dashboards, people seem to be switching to QPS (which the rate types should cover), so I'm not sure how important this is.
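
In the meantime, if the concurrent rate type ends up matching the syntax I proposed above, the doubling sweep could be scripted roughly like this (the flags mirror my proposal and are not confirmed against the final CLI):

```python
import subprocess

# Doubling user-concurrency sweep; the flags follow the syntax proposed
# above and may differ from whatever the final CLI exposes.
for users in [1, 2, 4, 8, 16, 32, 64, 128]:
    subprocess.run(
        [
            "guidellm",
            "--target", "http://localhost:8080/v1",
            "--model", "meta-llama/Meta-Llama-3.1-8B-Instruct",
            "--data-type", "emulated",
            "--data", "prompt_tokens=550,generated_tokens=250",
            "--max-seconds", "60",
            "--rate-type", "concurrent",
            "--rate", str(users),
            "--output-path", f"concurrency_{users}.json",
        ],
        check=True,
    )
```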

@markurtz
Member

@philschmid support for this will be landing with #96

markurtz added a commit that referenced this issue Apr 11, 2025
…ation Refactor (#96)

Full refactor of GuideLLM enabling better overall performance and minimal benchmarking overhead, built on a new multiprocess and threaded scheduler, along with significant updates to the output formats for better analysis, visibility, and clarity.

<img width="668" alt="Screenshot 2025-04-11 at 2 26 13 PM" src="https://github.com/user-attachments/assets/a723854a-7fe0-4eb2-9408-f632e747c3c2" />

Fixes:
- #92 
- #77 
- #47 
- #79

---------

Co-authored-by: Alexandre Marques <[email protected]>
Co-authored-by: Samuel Monson <[email protected]>
Co-authored-by: David Gray <[email protected]>
@markurtz
Member

Closing this out as this has now landed.
