@youkaichao (Member) commented Jun 26, 2024

similar to #5584

the same benchmark command:

python benchmarks/benchmark_throughput.py --output-len 256 --input 256 --model meta-llama/Llama-2-7b-hf -tp 8

the same machine: 8*H100

before (current main): Throughput: 38.07 requests/s, 19493.23 tokens/s

after (this PR): Throughput: 38.94 requests/s, 19939.65 tokens/s
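For reference, the relative gain implied by the two throughput lines above can be worked out directly. This is only a quick arithmetic sketch over the reported numbers; the variable names are illustrative and not part of the benchmark script.

```python
# Throughput figures copied from the "before" and "after" lines above.
before_req_s, before_tok_s = 38.07, 19493.23
after_req_s, after_tok_s = 38.94, 19939.65

# Relative improvement in percent for each metric.
req_gain_pct = (after_req_s / before_req_s - 1) * 100
tok_gain_pct = (after_tok_s / before_tok_s - 1) * 100

# Tokens per request should be close to input + output length (256 + 256).
tokens_per_request = before_tok_s / before_req_s

print(f"requests/s gain: {req_gain_pct:.2f}%")
print(f"tokens/s gain:   {tok_gain_pct:.2f}%")
print(f"tokens per request: {tokens_per_request:.1f}")
```

Both metrics come out at roughly a 2.3% improvement, and the tokens-per-request ratio is close to 512, consistent with the 256 input and 256 output tokens in the command.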

Let's see if it breaks anything. We need to make sure we only use Python lists when receiving or sending a user's request. Elsewhere, we should keep NumPy arrays, where slicing is only a view operation. Never copy the whole sequence.
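A minimal sketch of the list-versus-array slicing difference described above. The `token_ids` data and variable names are hypothetical and are not taken from this PR's code; the sketch only shows that slicing a Python list copies elements, while basic slicing of a NumPy array returns a view onto the same buffer.

```python
import numpy as np

# Hypothetical token IDs as they might arrive in a user's request.
token_ids = list(range(8))

# Slicing a Python list allocates a new list and copies the elements.
list_slice = token_ids[2:6]
list_slice[0] = -1            # does not affect token_ids

# Convert once at the boundary, then keep the NumPy array internally.
arr = np.asarray(token_ids, dtype=np.int64)

# Basic slicing of a NumPy array is a view: no element data is copied.
arr_slice = arr[2:6]
shares = np.shares_memory(arr, arr_slice)
arr_slice[0] = -1             # writes through to arr[2]

# Convert back to a Python list only when returning data to the user.
response = arr.tolist()

print(token_ids[2], shares, arr[2], response[2])
```

Because the view aliases the original buffer, writes through a slice are visible in the parent array, so code that relied on list slices being independent copies would need an explicit `.copy()`.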

@youkaichao (Member, Author)

Closing, as this has been split into #5882 and #5942.

@youkaichao youkaichao closed this Jun 27, 2024
@youkaichao youkaichao deleted the seq_data_pool branch June 27, 2024 23:59