Add retry mechanism to telemetry requests #617

Open: wants to merge 5 commits into base: telemetry

saishreeeee (Collaborator) commented Jun 25, 2025

What type of PR is this?

  • Refactor
  • Feature
  • Bug Fix
  • Other

Description

Retry mechanism for telemetry requests

How is this tested?

  • Unit tests
  • E2E Tests
  • Manually
  • N/A

Related Tickets & Documents

PECOBLR-586

Signed-off-by: Sai Shree Pradhan <[email protected]>
saishreeeee marked this pull request as draft June 25, 2025 06:28
saishreeeee self-assigned this Jun 25, 2025
saishreeeee changed the title from "Add retry mechanism for telemetry requests" to "Fit and finish of telemetry (retries and server side flag integration)" Jun 25, 2025
saishreeeee changed the title from "Fit and finish of telemetry (retries and server side flag integration)" to "Add retry mechanism to telemetry requests" Jun 25, 2025
Signed-off-by: Sai Shree Pradhan <[email protected]>
saishreeeee marked this pull request as ready for review June 25, 2025 12:40
saishreeeee marked this pull request as draft June 26, 2025 05:06
saishreeeee marked this pull request as ready for review June 27, 2025 07:14

from databricks.sql.telemetry.telemetry_client import TelemetryClientFactory

telemetry_client = TelemetryClientFactory.get_telemetry_client(session_id_hex)
Contributor:

Why this change?


# Check that the request was retried (should be 2 calls: initial + 1 retry)
assert mock_get_conn.return_value.getresponse.call_count == 2
assert "Retrying after" in caplog.text
Contributor:

Can we collapse these tests a bit, i.e. test 503, 429, the Retry-After header, the retry count, etc. in a single test (first response returns 503, second returns 429, and so on)? I appreciate the detailed testing, but I would like it to be more readable and maintainable in the long term.
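
One way the collapsed test suggested above could look, as a rough sketch only. `send_with_retry`, `make_response`, the `RETRYABLE` set, and the 0.5 s fallback delay are hypothetical stand-ins written for this example, not the connector's actual code; the real test would drive the telemetry client through the mocked connection instead. The point is that a single `side_effect` list can script 503, then 429 with a Retry-After header, then success.

```python
from unittest.mock import Mock

# Assumption: the statuses the retry policy treats as retryable.
RETRYABLE = {429, 503}


def send_with_retry(do_request, max_attempts=3, sleep=lambda seconds: None):
    """Hypothetical stand-in for the telemetry sender (not the real client)."""
    for attempt in range(1, max_attempts + 1):
        response = do_request()
        if response.status not in RETRYABLE or attempt == max_attempts:
            return response
        # Honor Retry-After when present, otherwise fall back to a fixed delay.
        sleep(float(response.headers.get("Retry-After", 0.5)))


def make_response(status, headers=None):
    """Build a mock HTTP response with just the fields the sender reads."""
    response = Mock()
    response.status = status
    response.headers = headers or {}
    return response


def test_retry_sequence():
    # One scripted sequence covers 503, 429 + Retry-After, and final success.
    do_request = Mock(side_effect=[
        make_response(503),
        make_response(429, {"Retry-After": "2"}),
        make_response(200),
    ])
    sleep = Mock()

    result = send_with_retry(do_request, max_attempts=3, sleep=sleep)

    assert result.status == 200
    assert do_request.call_count == 3
    assert [call.args[0] for call in sleep.call_args_list] == [0.5, 2.0]

    # Retry count: three 503s in a row exhaust the attempts.
    exhausted = Mock(side_effect=[make_response(503)] * 3)
    assert send_with_retry(exhausted, max_attempts=3).status == 503
    assert exhausted.call_count == 3
```

Asserting on call counts and on the recorded sleep values keeps the test independent of log wording.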

self.wait_for_async_request()
TelemetryClientFactory.close(client._session_id_hex)

# Based on the logs, 400 IS being retried (this is the actual behavior for CommandType.OTHER)
Contributor:

400 is being retried? Are you sure?


# 403 should not be retried based on the retry policy
mock_get_conn.return_value.getresponse.assert_called_once()
assert "Telemetry request failed with status code: 403" in caplog.text
Contributor:

I would like to collapse the 4xx error codes into a single test, with something like parametrized tests.
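
A possible shape for that, sketched with the standard library's `subTest` so it runs without extra dependencies; under pytest the same table of codes could feed `pytest.mark.parametrize` instead. `send_with_retry` and `RETRYABLE` are hypothetical stand-ins, and the list of codes is an assumption: only 403 comes from this PR, 401 and 404 are illustrative, and 400 is left out because its behavior is still in question in this thread.

```python
import unittest
from unittest.mock import Mock

# Assumption: the statuses the retry policy treats as retryable.
RETRYABLE = {429, 503}


def send_with_retry(do_request, max_attempts=3):
    """Hypothetical stand-in for the telemetry sender (not the real client)."""
    for attempt in range(1, max_attempts + 1):
        response = do_request()
        if response.status not in RETRYABLE or attempt == max_attempts:
            return response


class NonRetryable4xxTest(unittest.TestCase):
    # Illustrative table of codes; extend it as the policy is confirmed.
    CASES = [401, 403, 404]

    def test_4xx_not_retried(self):
        for status in self.CASES:
            with self.subTest(status=status):
                do_request = Mock(return_value=Mock(status=status))
                result = send_with_retry(do_request)
                self.assertEqual(result.status, status)
                # A non-retryable status must produce exactly one request.
                do_request.assert_called_once()
```

Adding a new status then means adding one entry to `CASES` rather than a new test function.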

@@ -31,6 +33,24 @@
logger = logging.getLogger(__name__)


class TelemetryHTTPAdapter(HTTPAdapter):
Contributor:

Let's see if we can skip this adapter.

TELEMETRY_RETRY_STOP_AFTER_ATTEMPTS_COUNT = 3
TELEMETRY_RETRY_DELAY_MIN = 0.5 # seconds
TELEMETRY_RETRY_DELAY_MAX = 5.0 # seconds
TELEMETRY_RETRY_STOP_AFTER_ATTEMPTS_DURATION = 30.0
Contributor:

Let's align these values across drivers, or at least keep them consistent across the Python driver.
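
To make the comparison across drivers concrete, here is a small sketch of the retry schedule these four values could imply. It rests on assumptions the PR does not confirm: that the delay doubles from the minimum and is capped at the maximum, that the attempt count includes the initial request, and that the duration limit applies to total time spent sleeping. `backoff_delays` is a hypothetical helper, not part of the driver.

```python
TELEMETRY_RETRY_STOP_AFTER_ATTEMPTS_COUNT = 3
TELEMETRY_RETRY_DELAY_MIN = 0.5  # seconds
TELEMETRY_RETRY_DELAY_MAX = 5.0  # seconds
TELEMETRY_RETRY_STOP_AFTER_ATTEMPTS_DURATION = 30.0  # seconds


def backoff_delays(
    attempts=TELEMETRY_RETRY_STOP_AFTER_ATTEMPTS_COUNT,
    delay_min=TELEMETRY_RETRY_DELAY_MIN,
    delay_max=TELEMETRY_RETRY_DELAY_MAX,
    max_duration=TELEMETRY_RETRY_STOP_AFTER_ATTEMPTS_DURATION,
):
    """Return the delays slept before each retry (hypothetical helper).

    Assumes exponential doubling from delay_min, capped at delay_max,
    and stops once the summed delays would exceed max_duration.
    """
    delays = []
    elapsed = 0.0
    # Assumption: `attempts` counts the initial request, so retries = attempts - 1.
    for retry in range(attempts - 1):
        delay = min(delay_min * 2 ** retry, delay_max)
        if elapsed + delay > max_duration:
            break
        delays.append(delay)
        elapsed += delay
    return delays


if __name__ == "__main__":
    print(backoff_delays())            # schedule with the values in this PR
    print(backoff_delays(attempts=6))  # shows where the 5.0 s cap kicks in
```

Under these assumptions, the values in this PR give at most two short retries, so the 30-second duration limit never comes into play; that is worth checking against whatever the other drivers use.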

mock_response.isclosed.return_value = False
return mock_response

@pytest.mark.usefixtures("caplog")
Contributor:

Do we really need caplog?

2 participants