-
Notifications
You must be signed in to change notification settings - Fork 112
Add retry mechanism to telemetry requests #617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: telemetry
Are you sure you want to change the base?
Conversation
Signed-off-by: Sai Shree Pradhan <[email protected]>
Signed-off-by: Sai Shree Pradhan <[email protected]>
Signed-off-by: Sai Shree Pradhan <[email protected]>
|
||
from databricks.sql.telemetry.telemetry_client import TelemetryClientFactory | ||
|
||
telemetry_client = TelemetryClientFactory.get_telemetry_client(session_id_hex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change?
|
||
# Check that the request was retried (should be 2 calls: initial + 1 retry) | ||
assert mock_get_conn.return_value.getresponse.call_count == 2 | ||
assert "Retrying after" in caplog.text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we collapse these tests a bit i.e. test 503, 429, retry-after header, retry count etc in a single test (first response returns 503, second does 429 etc), appreciate the detailed testing but i would like it to be more readable and maintainable in the long term
self.wait_for_async_request() | ||
TelemetryClientFactory.close(client._session_id_hex) | ||
|
||
# Based on the logs, 400 IS being retried (this is the actual behavior for CommandType.OTHER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
400 is being retried? are you sure?
|
||
# 403 should not be retried based on the retry policy | ||
mock_get_conn.return_value.getresponse.assert_called_once() | ||
assert "Telemetry request failed with status code: 403" in caplog.text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would like to collapse 4xx error codes into a single test with something like parametrized tests
@@ -31,6 +33,24 @@ | |||
logger = logging.getLogger(__name__) | |||
|
|||
|
|||
class TelemetryHTTPAdapter(HTTPAdapter): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's see if we can skip this adapter
TELEMETRY_RETRY_STOP_AFTER_ATTEMPTS_COUNT = 3 | ||
TELEMETRY_RETRY_DELAY_MIN = 0.5 # seconds | ||
TELEMETRY_RETRY_DELAY_MAX = 5.0 # seconds | ||
TELEMETRY_RETRY_STOP_AFTER_ATTEMPTS_DURATION = 30.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's align these values across drivers or have these consistent across the python driver
mock_response.isclosed.return_value = False | ||
return mock_response | ||
|
||
@pytest.mark.usefixtures("caplog") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need caplog?
What type of PR is this?
Description
Retry mechanism for telemetry requests
How is this tested?
Related Tickets & Documents
PECOBLR-586