
Ratelimiter: Very high CPU usage when ratelimiter is throttling guest RX. #1439

Closed
sandreim opened this issue Nov 28, 2019 · 6 comments · Fixed by #1444
Assignees: sandreim
Labels: Priority: High, Type: Bug, Feature: IO Virtualization

Comments

@sandreim
Contributor

When rate limiting is enabled and the guest is receiving a lot of traffic that triggers throttling, the emulation thread will use 100% CPU.

There are 2 possible approaches here:

  • quick fix: keep the current code structure and configure edge-triggered epoll for the tap fd (see the sketch below).
  • proper fix: rework the current state machine and use edge-triggered epoll.
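
For illustration only, here is a rough sketch of what edge-triggered registration of the tap fd could look like using raw libc epoll calls; the function name, fd names and dispatch token are placeholders, not the actual Firecracker code:

```rust
use std::io;
use std::os::unix::io::RawFd;

/// Register `tap_fd` with `epoll_fd` as edge-triggered. With EPOLLET,
/// epoll_wait reports the tap fd only when new packets arrive, instead of
/// re-firing on every loop iteration while unread packets are still queued.
fn register_tap_edge_triggered(epoll_fd: RawFd, tap_fd: RawFd) -> io::Result<()> {
    let mut event = libc::epoll_event {
        events: (libc::EPOLLIN | libc::EPOLLET) as u32,
        u64: tap_fd as u64, // token the event loop uses to dispatch RX_TAP_EVENT
    };
    // SAFETY: both fds are valid and `event` outlives the call.
    let ret = unsafe { libc::epoll_ctl(epoll_fd, libc::EPOLL_CTL_ADD, tap_fd, &mut event) };
    if ret < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}
```

Note that with EPOLLET the handler has to track unread data itself (e.g. keep draining the tap until EAGAIN or until it runs out of buffers/budget), which is presumably why the proper fix also involves reworking the state machine.
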
@sandreim added the Type: Bug, Feature: IO Virtualization and Priority: High labels Nov 28, 2019
@pyrito
Contributor

pyrito commented Nov 29, 2019

Hi @sandreim, could you give some more context on this problem?

@sandreim changed the title Ratelimiter: Very high CPU usage when ratelimiter si throttling guest RX. → Ratelimiter: Very high CPU usage when ratelimiter is throttling guest RX. Nov 29, 2019
@sandreim
Contributor Author

sandreim commented Nov 29, 2019

Hi @pyrito,

The issue happens when the net device does not read all the packets from the TAP device while it waits for the limiter to replenish its budget. When that happens, the epoll loop will continuously fire the RX_TAP_EVENT until all packets are read. There is already some code that unregisters the epoll event when there are no more free RX buffers, but that does not cover the case when we stop processing the tap RX due to the limiter budget.
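
To make the missing case concrete, here is a minimal, illustrative model (the `TokenBucket`, `NetRx` and `deferred_rx` names are placeholders, not Firecracker's actual types): assuming the tap fd is registered edge-triggered as suggested above, RX is marked as deferred when the limiter runs out of budget and is resumed from the limiter's refill timer rather than from the tap event.

```rust
/// Simplified token bucket standing in for the rate limiter's byte budget.
struct TokenBucket {
    budget: u64,
}

impl TokenBucket {
    /// Try to consume `bytes`; returns false when the budget is exhausted.
    fn consume(&mut self, bytes: u64) -> bool {
        if self.budget >= bytes {
            self.budget -= bytes;
            true
        } else {
            false
        }
    }
}

/// Minimal model of the net device's RX path.
struct NetRx {
    limiter: TokenBucket,
    deferred_rx: bool,
}

impl NetRx {
    /// Handler for RX_TAP_EVENT.
    fn on_tap_readable(&mut self, frame_len: u64) {
        if !self.limiter.consume(frame_len) {
            // Budget exhausted: remember that RX is pending and stop reading
            // the tap instead of spinning on RX_TAP_EVENT.
            self.deferred_rx = true;
            return;
        }
        // ... read the frame from the tap and hand it to the guest RX queue ...
    }

    /// Handler for the rate limiter's refill timer.
    fn on_limiter_refill(&mut self, refill: u64) {
        self.limiter.budget += refill;
        if self.deferred_rx {
            self.deferred_rx = false;
            // Resume draining the tap now that there is budget again.
        }
    }
}
```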

@sandreim self-assigned this Nov 29, 2019
@pyrito
Contributor

pyrito commented Nov 29, 2019

Hi @sandreim , I think @mi-yu and I can help take a stab at this problem. Just for reference, we're two students at UT Austin who are looking to make a couple of open-source contributions for our final Virtualization project.

@pyrito
Contributor

pyrito commented Nov 30, 2019

Hi @sandreim

I was wondering if you could explain what you meant by the two fixes you suggested?

  • "configure edge triggered epoll for the tap fd", we looked through the code and found the part of the code that does not unregister the epoll event when the tap RX is being blocked due to reaching its budget. Could you clarify what you meant by configuring the edge triggered epoll?

  • "rework the current state machine and use edge triggered epoll" could you give some general context towards how to make this "proper" fix.

Thank you for your time!

@andreeaflorescu
Member

Hey @pyrito. @sandreim already has a fix for this and he will probably post it early next week. Do you need help in finding another issue?

@pyrito
Contributor

pyrito commented Nov 30, 2019

@andreeaflorescu, yes, that would be greatly appreciated! If there are any easy issues that we could tackle within a week or so (with a proof of concept), that would be great. Thanks!
