[API server] handle logs request in coroutine #5366

Open
wants to merge 8 commits into
base: master

aylei
Collaborator

@aylei aylei commented Apr 25, 2025

close #4767

This PR includes the minimal changes that move /logs handling to a coroutine:

  1. introduce a coroutine context, which handles cancellation, log redirection and env var overrides;
  2. run /logs in uvicorn's event loop.
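A minimal sketch of what a per-coroutine context could look like, assuming it is built on Python's `contextvars`. All names here (`RequestContext`, `ctx_print`, `ctx_getenv`, `DEMO_USER`) are hypothetical and chosen for illustration; they are not taken from this PR's code.

```python
import asyncio
import contextvars
import io
import os
from typing import Dict, Optional, TextIO

# Hypothetical holder for the per-request context. Each asyncio task gets
# its own copy of the context, so values set in one task are not visible
# to other tasks.
_ctx = contextvars.ContextVar('request_context', default=None)


class RequestContext:
    """Illustrative per-request state: log sink, env overrides, cancel flag."""

    def __init__(self, log_file: TextIO, env_overrides: Dict[str, str]):
        self.log_file = log_file
        self.env_overrides = env_overrides
        self._cancelled = False

    def cancel(self) -> None:
        self._cancelled = True

    def is_cancelled(self) -> bool:
        return self._cancelled


def ctx_print(msg: str) -> None:
    """Write to the request's log file if a context is set, else stdout."""
    ctx = _ctx.get()
    if ctx is None:
        print(msg)
    else:
        ctx.log_file.write(msg + '\n')


def ctx_getenv(key: str, default: Optional[str] = None) -> Optional[str]:
    """Look up an env var, preferring the request's overrides."""
    ctx = _ctx.get()
    if ctx is not None and key in ctx.env_overrides:
        return ctx.env_overrides[key]
    return os.environ.get(key, default)


async def handle(name: str, log_file: TextIO, env: Dict[str, str]) -> None:
    # Setting the variable inside the task only affects this task's context.
    _ctx.set(RequestContext(log_file, env))
    await asyncio.sleep(0)  # yield so the two handlers interleave
    ctx_print(f'{name}: user={ctx_getenv("DEMO_USER")}')


async def main():
    buf_a, buf_b = io.StringIO(), io.StringIO()
    await asyncio.gather(
        handle('a', buf_a, {'DEMO_USER': 'alice'}),
        handle('b', buf_b, {'DEMO_USER': 'bob'}),
    )
    return buf_a.getvalue(), buf_b.getvalue()


if __name__ == '__main__':
    print(asyncio.run(main()))
```

The point of the sketch is that two requests sharing one process and one event loop can still see different log destinations and environment values, without mutating `os.environ` or `sys.stdout` globally.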

Although the task now executes directly in the uvicorn process, we still maintain a request record for each logs request to keep the behavior consistent: users can still cancel a logs request with sky api cancel and retrieve the logs again with sky api logs.
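To illustrate how a request record can coexist with in-process execution, here is a hedged sketch in which each request is tracked in a dictionary and cancellation is mapped onto `asyncio.Task.cancel()`. The names `RequestRecord`, `RECORDS`, `submit` and `cancel` are hypothetical, and the in-memory dictionary stands in for whatever storage the server really uses.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RequestRecord:
    """Illustrative record kept for every request, even in-process ones."""
    request_id: str
    status: str = 'PENDING'
    task: Optional[asyncio.Task] = None


# Hypothetical in-memory store standing in for the real request table.
RECORDS: Dict[str, RequestRecord] = {}


async def tail_logs(lines: List[str]) -> None:
    """Stand-in for a log stream that runs until cancelled."""
    while True:
        lines.append('log line')
        await asyncio.sleep(0.01)


async def run_tracked(request_id: str, coro) -> None:
    """Run the coroutine and mirror its outcome into the record."""
    record = RECORDS[request_id]
    record.status = 'RUNNING'
    try:
        await coro
        record.status = 'SUCCEEDED'
    except asyncio.CancelledError:
        record.status = 'CANCELLED'
        raise  # keep normal asyncio cancellation semantics


def submit(request_id: str, coro) -> RequestRecord:
    """Register a record and schedule the coroutine on the running loop."""
    record = RequestRecord(request_id)
    RECORDS[request_id] = record
    loop = asyncio.get_running_loop()
    record.task = loop.create_task(run_tracked(request_id, coro))
    return record


def cancel(request_id: str) -> bool:
    """Cancel a running request by id; returns False if nothing to cancel."""
    record = RECORDS.get(request_id)
    if record is None or record.task is None or record.task.done():
        return False
    record.task.cancel()
    return True


async def demo():
    lines: List[str] = []
    record = submit('req-1', tail_logs(lines))
    await asyncio.sleep(0.05)       # let the stream produce some output
    cancelled = cancel('req-1')
    await asyncio.wait([record.task])  # wait for cancellation to finish
    return cancelled, record.status, len(lines) > 0


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

Because the record outlives the task, a later lookup by request id can still report the final status, which is the property that keeps cancel and re-fetch working from the client's point of view.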

Follow ups:

Benchmark

  1. Command: python tests/load_tests/test_load_on_server.py -n 100 --apis tail_logs -c kubernetes under low server concurrency, 1c2g machine (1 long workers + 2 short workers):
# This PR
All requests completed in 16.20 seconds

----------------------------------------------------------------------------------------------------
Kind                 Count    Total(s)   Avg(s)     Min(s)     Max(s)     P95(s)     P99(s)
----------------------------------------------------------------------------------------------------
API /tail_logs       100      1642.00    16.42      16.20      18.26      17.25      18.26

# Master
All requests completed in 229.30 seconds

Latency Statistics:
----------------------------------------------------------------------------------------------------
Kind                 Count    Total(s)   Avg(s)     Min(s)     Max(s)     P95(s)     P99(s)
----------------------------------------------------------------------------------------------------
API /tail_logs       100      11675.38   116.75     3.31       229.30     218.19     227.03

This is about a 7x improvement in average latency (16.42s vs 116.75s). The bottleneck in this PR is that each log task runs in a dedicated thread while there is only 1 uvicorn worker process, so GIL contention prevents the 100 log threads from running fully concurrently.
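The thread-per-task arrangement mentioned above can be sketched as follows, assuming the blocking log function is offloaded from the event loop with `loop.run_in_executor`. `blocking_tail` and the `logs` thread-name prefix are hypothetical stand-ins, not names from this PR.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def blocking_tail(request_id: int) -> str:
    """Stand-in for a blocking log tail; returns the thread it ran on."""
    time.sleep(0.05)  # blocking I/O releases the GIL while waiting
    return threading.current_thread().name


async def serve(n: int) -> List[str]:
    """Run n blocking log tasks, one worker thread each, from one loop."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=n, thread_name_prefix='logs') as pool:
        futures = [
            loop.run_in_executor(pool, blocking_tail, i) for i in range(n)
        ]
        # The event loop stays free to accept other requests while waiting.
        return await asyncio.gather(*futures)


if __name__ == '__main__':
    print(asyncio.run(serve(4)))
```

In this sketch the threads overlap well because `time.sleep` releases the GIL. Any CPU-bound portion of a real log task would hold the GIL instead, so within a single process those portions serialize, which is consistent with the contention described above.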

  2. Command: python tests/load_tests/test_load_on_server.py -n 100 --apis tail_logs -c aws under unlimited concurrency in local mode (burstable worker), 4c16g machine:
# This PR
All requests completed in 56.22 seconds

Latency Statistics:
----------------------------------------------------------------------------------------------------
Kind                 Count    Total(s)   Avg(s)     Min(s)     Max(s)     P95(s)     P99(s)
----------------------------------------------------------------------------------------------------
API /tail_logs       100      5367.31    53.67      53.45      56.06      53.76      56.04

# Master
All requests completed in 90.30 seconds

Latency Statistics:
----------------------------------------------------------------------------------------------------
Kind                 Count    Total(s)   Avg(s)     Min(s)     Max(s)     P95(s)     P99(s)
----------------------------------------------------------------------------------------------------
API /tail_logs       100      7838.55    78.39      43.78      90.20      90.00      90.10

Resources:

# This PR
PEAK USAGE:
Peak CPU: 100.0%
Peak Memory: 1.50GB (11.8%)
Memory Delta: 0.6GB
Peak Short Executor Memory: 0.16GB
Peak Short Executor Memory Average: 0.16GB
Peak Long Executor Memory: 0.00GB
Peak Long Executor Memory Average: 0.00GB

# Master
PEAK USAGE:
Peak CPU: 100.0%
Peak Memory: 7.92GB (53.7%)
Memory Delta: 6.7GB
Peak Short Executor Memory: 0.20GB
Peak Short Executor Memory Average: 0.18GB
Peak Long Executor Memory: 0.19GB
Peak Long Executor Memory Average: 0.19GB

Memory efficiency improves by about 10x, going by the memory delta (0.6GB vs 6.7GB). However, the test found that logs on an AWS instance are significantly slower than logs on a Kubernetes instance (I switched the benchmark environment to AWS EC2 for accurate resource usage accounting). This might be because the AWS code path involves more RPCs/CPU cycles. I leave this as a follow-up since it is not directly relevant to this PR.
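For reference, the headline ratios can be recomputed from the figures reported above. This is plain arithmetic on the numbers in the benchmark output; the dictionary keys are labels chosen here for illustration.

```python
# Figures copied from the benchmark output above: (master, this PR).
reported = {
    'k8s_avg_latency_s': (116.75, 16.42),
    'aws_avg_latency_s': (78.39, 53.67),
    'aws_peak_memory_gb': (7.92, 1.50),
    'aws_memory_delta_gb': (6.7, 0.6),
}


def improvement(master: float, this_pr: float) -> float:
    """Ratio of the master figure to this PR's figure, rounded to 2 places."""
    return round(master / this_pr, 2)


ratios = {name: improvement(*pair) for name, pair in reported.items()}

if __name__ == '__main__':
    for name, ratio in ratios.items():
        print(f'{name}: {ratio}x')
```

The roughly 10x memory figure matches the memory delta ratio more closely than the peak memory ratio, so which of the two the claim refers to is worth keeping in mind when comparing runs.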

Tests

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@aylei aylei changed the title from "[API server] handle logs request in event loop" to "[API server] handle logs request in coroutine" Apr 27, 2025
@aylei aylei marked this pull request as ready for review April 27, 2025 12:50
@aylei
Collaborator Author

aylei commented Apr 27, 2025

/smoke-test -k test_minimal

@aylei aylei requested a review from Michaelvll April 27, 2025 13:06

Successfully merging this pull request may close these issues.

[API Server] optimize the cost of sky logs on server