I tested llama.cpp on two systems, one with 4x A100 GPUs and the other with 8x H100 GPUs. The results show that inference on the 8x H100 system with NVLink (21 tokens per second) is slower than on the 4x A100 system over PCIe (31 tokens per second), which is very strange! Can anyone help explain this behavior? How can I improve performance on the H100 system? Thanks