Add functionality for export of latency logs via telemetry #608

Merged
90 commits merged on Jul 3, 2025
Commits (90)
65a75f4
added functionality for export of failure logs
saishreeeee Jun 10, 2025
5305308
changed logger.error to logger.debug in exc.py
saishreeeee Jun 11, 2025
ba83c33
Fix telemetry loss during Python shutdown
saishreeeee Jun 11, 2025
131db92
unit tests for export_failure_log
saishreeeee Jun 12, 2025
3abc40d
try-catch blocks to make telemetry failures non-blocking for connecto…
saishreeeee Jun 12, 2025
ffa4787
removed redundant try/catch blocks, added try/catch block to initiali…
saishreeeee Jun 12, 2025
cc077f3
skip null fields in telemetry request
saishreeeee Jun 12, 2025
2c6fd44
removed dup import, renamed func, changed a filter_null_values to lamda
saishreeeee Jun 12, 2025
89540a1
removed unnecassary class variable and a redundant try/except block
saishreeeee Jun 12, 2025
52a1152
public functions defined at interface level
saishreeeee Jun 12, 2025
3dcdcfa
changed export_event and flush to private functions
saishreeeee Jun 13, 2025
b2714c9
formatting
saishreeeee Jun 13, 2025
377a87b
changed connection_uuid to thread local in thrift backend
saishreeeee Jun 13, 2025
c9376b8
made errors more specific
saishreeeee Jun 13, 2025
bbfadf2
revert change to connection_uuid
saishreeeee Jun 13, 2025
9bce26b
reverting change in close in telemetry client
saishreeeee Jun 13, 2025
ef4514d
JsonSerializableMixin
saishreeeee Jun 13, 2025
8924835
isdataclass check in JsonSerializableMixin
saishreeeee Jun 13, 2025
65361e7
convert TelemetryClientFactory to module-level functions, replace Noo…
saishreeeee Jun 16, 2025
1722a77
renamed connection_uuid as session_id_hex
saishreeeee Jun 16, 2025
e841434
added NotImplementedError to abstract class, added unit tests
saishreeeee Jun 16, 2025
2f89266
formatting
saishreeeee Jun 16, 2025
5564bbb
added PEP-249 link, changed NoopTelemetryClient implementation
saishreeeee Jun 17, 2025
1e4e8cf
removed unused import
saishreeeee Jun 17, 2025
55b29bc
made telemetry client close a module-level function
saishreeeee Jun 17, 2025
93bf170
unit tests verbose
saishreeeee Jun 17, 2025
45f5ccf
debug logs in unit tests
saishreeeee Jun 17, 2025
8ff1c1f
debug logs in unit tests
saishreeeee Jun 17, 2025
8bdd324
removed ABC from mixin, added try/catch block around executor shutdown
saishreeeee Jun 17, 2025
f99f7ea
checking stuff
saishreeeee Jun 17, 2025
b972c8a
finding out
saishreeeee Jun 17, 2025
7ca3636
finding out more
saishreeeee Jun 17, 2025
0ac8ed2
more more finding out more nice
saishreeeee Jun 17, 2025
c457a09
locks are useless anyways
saishreeeee Jun 17, 2025
5f07a84
haha
saishreeeee Jun 17, 2025
1115e25
normal
saishreeeee Jun 17, 2025
de1ed87
:= looks like walrus horizontally
saishreeeee Jun 17, 2025
554aeaf
one more
saishreeeee Jun 17, 2025
fffac5f
walrus again
saishreeeee Jun 17, 2025
b77208a
old stuff without walrus seems to fail
saishreeeee Jun 17, 2025
733c288
manually do the walrussing
saishreeeee Jun 17, 2025
ca8b958
change 3.13t, v2
saishreeeee Jun 17, 2025
3eabac9
formatting, added walrus
saishreeeee Jun 17, 2025
fb9ef43
formatting
saishreeeee Jun 17, 2025
1e795aa
removed walrus, removed test before stalling test
saishreeeee Jun 17, 2025
2c293a5
changed order of stalling test
saishreeeee Jun 18, 2025
d237255
removed debugging, added TelemetryClientFactory
saishreeeee Jun 18, 2025
f101b19
remove more debugging
saishreeeee Jun 18, 2025
a094659
latency logs funcitionality
saishreeeee Jun 19, 2025
695a07d
merge
saishreeeee Jun 19, 2025
fc918d6
fixed type of return value in get_session_id_hex() in thrift backend
saishreeeee Jun 19, 2025
d7c75d7
debug on TelemetryClientFactory lock
saishreeeee Jun 19, 2025
b6b0f89
formatting
saishreeeee Jun 19, 2025
50a1206
type notation for _waiters
saishreeeee Jun 19, 2025
39a0530
called connection.close() in test_arraysize_buffer_size_passthrough
saishreeeee Jun 19, 2025
413427f
run all unit tests
saishreeeee Jun 19, 2025
6b1d1b8
more debugging
saishreeeee Jun 19, 2025
8f5e5ba
removed the connection.close() from that test, put debug statement be…
saishreeeee Jun 19, 2025
2dc00ba
more debug
saishreeeee Jun 19, 2025
1ff03d4
more more more
saishreeeee Jun 19, 2025
6ff07c8
why
saishreeeee Jun 19, 2025
395049a
whywhy
saishreeeee Jun 19, 2025
4466821
thread name
saishreeeee Jun 19, 2025
34b63e4
added teardown to all tests except finalizer test (gc collect)
saishreeeee Jun 20, 2025
49082fb
added the get_attribute functions to the classes
saishreeeee Jun 20, 2025
ed1db9d
removed tearDown, added connection.close() to first test
saishreeeee Jun 20, 2025
9fa5a89
finally
saishreeeee Jun 21, 2025
14433c4
remove debugging
saishreeeee Jun 22, 2025
ef4ca13
added test for export_latency_log, made mock of thrift backend with r…
saishreeeee Jun 23, 2025
152e0da
Merge branch 'telemetry' into PECOBLR-554
saishreeeee Jun 23, 2025
b5bf165
added multi threaded tests
saishreeeee Jun 23, 2025
307a8cc
formatting
saishreeeee Jun 23, 2025
0fd46d4
added TelemetryExtractor, removed multithreaded tests
saishreeeee Jun 25, 2025
f6f50b2
formatting
saishreeeee Jun 25, 2025
1163ebe
fixes in test
saishreeeee Jun 25, 2025
4b6ace0
fix in telemetry extractor
saishreeeee Jun 25, 2025
7171718
Merge branch 'telemetry' into PECOBLR-554
saishreeeee Jun 25, 2025
a059a03
Merge branch 'telemetry' into PECOBLR-554
saishreeeee Jun 25, 2025
4d56141
added doc strings to latency_logger, abstracted export_telemetry_log
saishreeeee Jun 30, 2025
27295c2
statement type, unit test fix
saishreeeee Jun 30, 2025
b558bc8
unit test fix
saishreeeee Jun 30, 2025
01853bc
statement type changes
saishreeeee Jul 1, 2025
45f74d0
test_fetches fix
saishreeeee Jul 1, 2025
149d4a8
added mocks to resolve the errors caused by log_latency decorator in …
saishreeeee Jul 1, 2025
e031663
removed function in test_fetches cuz it is only used once
saishreeeee Jul 1, 2025
142b9a8
added _safe_call which returns None in case of errors in the get func…
saishreeeee Jul 2, 2025
2a26965
removed the changes in test_client and test_fetches
saishreeeee Jul 2, 2025
ae90dee
removed the changes in test_fetches
saishreeeee Jul 2, 2025
acc9904
test_telemetry
saishreeeee Jul 3, 2025
a847122
removed test
saishreeeee Jul 3, 2025
18 changes: 17 additions & 1 deletion src/databricks/sql/client.py
@@ -61,7 +61,8 @@
DriverConnectionParameters,
HostDetails,
)

from databricks.sql.telemetry.latency_logger import log_latency
from databricks.sql.telemetry.models.enums import StatementType

logger = logging.getLogger(__name__)

@@ -745,6 +746,7 @@ def _handle_staging_operation(
session_id_hex=self.connection.get_session_id_hex(),
)

@log_latency(StatementType.SQL)
def _handle_staging_put(
self, presigned_url: str, local_file: str, headers: Optional[dict] = None
):
@@ -784,6 +786,7 @@ def _handle_staging_put(
+ "but not yet applied on the server. It's possible this command may fail later."
)

@log_latency(StatementType.SQL)
def _handle_staging_get(
self, local_file: str, presigned_url: str, headers: Optional[dict] = None
):
@@ -811,6 +814,7 @@ def _handle_staging_get(
with open(local_file, "wb") as fp:
fp.write(r.content)

@log_latency(StatementType.SQL)
def _handle_staging_remove(
self, presigned_url: str, headers: Optional[dict] = None
):
@@ -824,6 +828,7 @@ def _handle_staging_remove(
session_id_hex=self.connection.get_session_id_hex(),
)

@log_latency(StatementType.QUERY)
def execute(
self,
operation: str,
@@ -914,6 +919,7 @@ def execute(

return self

@log_latency(StatementType.QUERY)
def execute_async(
self,
operation: str,
@@ -1039,6 +1045,7 @@ def executemany(self, operation, seq_of_parameters):
self.execute(operation, parameters)
return self

@log_latency(StatementType.METADATA)
def catalogs(self) -> "Cursor":
"""
Get all available catalogs.
Expand All @@ -1062,6 +1069,7 @@ def catalogs(self) -> "Cursor":
)
return self

@log_latency(StatementType.METADATA)
def schemas(
self, catalog_name: Optional[str] = None, schema_name: Optional[str] = None
) -> "Cursor":
@@ -1090,6 +1098,7 @@ def schemas(
)
return self

@log_latency(StatementType.METADATA)
def tables(
self,
catalog_name: Optional[str] = None,
@@ -1125,6 +1134,7 @@ def tables(
)
return self

@log_latency(StatementType.METADATA)
def columns(
self,
catalog_name: Optional[str] = None,
@@ -1379,6 +1389,7 @@ def _fill_results_buffer(self):
self.results = results
self.has_more_rows = has_more_rows

@log_latency()
def _convert_columnar_table(self, table):
column_names = [c[0] for c in self.description]
ResultRow = Row(*column_names)
@@ -1391,6 +1402,7 @@ def _convert_columnar_table(self, table):

return result

@log_latency()
def _convert_arrow_table(self, table):
column_names = [c[0] for c in self.description]
ResultRow = Row(*column_names)
@@ -1433,6 +1445,7 @@ def _convert_arrow_table(self, table):
def rownumber(self):
return self._next_row_index

@log_latency()
def fetchmany_arrow(self, size: int) -> "pyarrow.Table":
"""
Fetch the next set of rows of a query result, returning a PyArrow table.
@@ -1475,6 +1488,7 @@ def merge_columnar(self, result1, result2):
]
return ColumnTable(merged_result, result1.column_names)

@log_latency()
def fetchmany_columnar(self, size: int):
"""
Fetch the next set of rows of a query result, returning a Columnar Table.
@@ -1500,6 +1514,7 @@ def fetchmany_columnar(self, size: int):

return results

@log_latency()
def fetchall_arrow(self) -> "pyarrow.Table":
"""Fetch all (remaining) rows of a query result, returning them as a PyArrow table."""
results = self.results.remaining_rows()
Expand All @@ -1526,6 +1541,7 @@ def fetchall_arrow(self) -> "pyarrow.Table":
return pyarrow.Table.from_pydict(data)
return results

@log_latency()
def fetchall_columnar(self):
"""Fetch all (remaining) rows of a query result, returning them as a Columnar table."""
results = self.results.remaining_rows()
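
None of these decorators change the public cursor API; they only record how long each decorated call takes and hand that duration to the telemetry client. A minimal sketch of the caller's view, with placeholder connection parameters, looks like this:

from databricks import sql

# Placeholder host, warehouse path, and token for illustration only.
with sql.connect(
    server_hostname="example.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/abc123",
    access_token="dapiXXXXXXXX",
) as connection:
    with connection.cursor() as cursor:
        # execute() is wrapped by @log_latency(StatementType.QUERY) above, so its
        # wall-clock duration is exported once the call finishes.
        cursor.execute("SELECT 1")
        # The fetch/convert paths carry a bare @log_latency() decorator, which
        # defaults to StatementType.NONE.
        rows = cursor.fetchall()
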
231 changes: 231 additions & 0 deletions src/databricks/sql/telemetry/latency_logger.py
@@ -0,0 +1,231 @@
import time
import functools
from typing import Optional
from databricks.sql.telemetry.telemetry_client import TelemetryClientFactory
from databricks.sql.telemetry.models.event import (
    SqlExecutionEvent,
)
from databricks.sql.telemetry.models.enums import ExecutionResultFormat, StatementType
from databricks.sql.utils import ColumnQueue, CloudFetchQueue, ArrowQueue
from uuid import UUID


class TelemetryExtractor:
    """
    Base class for extracting telemetry information from various object types.

    This class serves as a proxy that delegates attribute access to the wrapped object
    while providing a common interface for extracting telemetry-related data.
    """

    def __init__(self, obj):
        """
        Initialize the extractor with an object to wrap.

        Args:
            obj: The object to extract telemetry information from.
        """
        self._obj = obj

    def __getattr__(self, name):
        """
        Delegate attribute access to the wrapped object.

        Args:
            name (str): The name of the attribute to access.

        Returns:
            The attribute value from the wrapped object.
        """
        return getattr(self._obj, name)

    def get_session_id_hex(self):
        pass

    def get_statement_id(self):
        pass

    def get_is_compressed(self):
        pass

    def get_execution_result(self):
        pass

    def get_retry_count(self):
        pass


class CursorExtractor(TelemetryExtractor):
    """
    Telemetry extractor specialized for Cursor objects.

    Extracts telemetry information from database cursor objects, including
    statement IDs, session information, compression settings, and result formats.
    """

    def get_statement_id(self) -> Optional[str]:
        return self.query_id

    def get_session_id_hex(self) -> Optional[str]:
        return self.connection.get_session_id_hex()

    def get_is_compressed(self) -> bool:
        return self.connection.lz4_compression

    def get_execution_result(self) -> ExecutionResultFormat:
        if self.active_result_set is None:
            return ExecutionResultFormat.FORMAT_UNSPECIFIED

        if isinstance(self.active_result_set.results, ColumnQueue):
            return ExecutionResultFormat.COLUMNAR_INLINE
        elif isinstance(self.active_result_set.results, CloudFetchQueue):
            return ExecutionResultFormat.EXTERNAL_LINKS
        elif isinstance(self.active_result_set.results, ArrowQueue):
            return ExecutionResultFormat.INLINE_ARROW
        return ExecutionResultFormat.FORMAT_UNSPECIFIED

    def get_retry_count(self) -> int:
        if (
            hasattr(self.thrift_backend, "retry_policy")
            and self.thrift_backend.retry_policy
        ):
            return len(self.thrift_backend.retry_policy.history)
        return 0


class ResultSetExtractor(TelemetryExtractor):
    """
    Telemetry extractor specialized for ResultSet objects.

    Extracts telemetry information from database result set objects, including
    operation IDs, session information, compression settings, and result formats.
    """

    def get_statement_id(self) -> Optional[str]:
        if self.command_id:
            return str(UUID(bytes=self.command_id.operationId.guid))
        return None

    def get_session_id_hex(self) -> Optional[str]:
        return self.connection.get_session_id_hex()

    def get_is_compressed(self) -> bool:
        return self.lz4_compressed

    def get_execution_result(self) -> ExecutionResultFormat:
        if isinstance(self.results, ColumnQueue):
            return ExecutionResultFormat.COLUMNAR_INLINE
        elif isinstance(self.results, CloudFetchQueue):
            return ExecutionResultFormat.EXTERNAL_LINKS
        elif isinstance(self.results, ArrowQueue):
            return ExecutionResultFormat.INLINE_ARROW
        return ExecutionResultFormat.FORMAT_UNSPECIFIED

    def get_retry_count(self) -> int:
        if (
            hasattr(self.thrift_backend, "retry_policy")
            and self.thrift_backend.retry_policy
        ):
            return len(self.thrift_backend.retry_policy.history)
        return 0


def get_extractor(obj):
    """
    Factory function to create the appropriate telemetry extractor for an object.

    Determines the object type and returns the corresponding specialized extractor
    that can extract telemetry information from that object type.

    Args:
        obj: The object to create an extractor for. Can be a Cursor, ResultSet,
            or any other object.

    Returns:
        TelemetryExtractor: A specialized extractor instance:
            - CursorExtractor for Cursor objects
            - ResultSetExtractor for ResultSet objects
            - Raises NotImplementedError for all other objects
    """
    if obj.__class__.__name__ == "Cursor":
        return CursorExtractor(obj)
    elif obj.__class__.__name__ == "ResultSet":
        return ResultSetExtractor(obj)
    else:
        raise NotImplementedError(f"No extractor found for {obj.__class__.__name__}")


def log_latency(statement_type: StatementType = StatementType.NONE):
    """
    Decorator for logging execution latency and telemetry information.

    This decorator measures the execution time of a method and sends telemetry
    data about the operation, including latency, statement information, and
    execution context.

    The decorator automatically:
    - Measures execution time using high-precision performance counters
    - Extracts telemetry information from the method's object (self)
    - Creates a SqlExecutionEvent with execution details
    - Sends the telemetry data asynchronously via TelemetryClient

    Args:
        statement_type (StatementType): The type of SQL statement being executed.

    Usage:
        @log_latency(StatementType.SQL)
        def execute(self, query):
            # Method implementation
            pass

    Returns:
        function: A decorator that wraps methods to add latency logging.

    Note:
        The wrapped method's object (self) must be compatible with the
        telemetry extractor system (e.g., Cursor or ResultSet objects).
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            start_time = time.perf_counter()
            result = None
            try:
                result = func(self, *args, **kwargs)
                return result
            finally:

                def _safe_call(func_to_call):
                    """Calls a function and returns a default value on any exception."""
                    try:
                        return func_to_call()
                    except Exception:
                        return None

                end_time = time.perf_counter()
                duration_ms = int((end_time - start_time) * 1000)

                extractor = get_extractor(self)
                session_id_hex = _safe_call(extractor.get_session_id_hex)
                statement_id = _safe_call(extractor.get_statement_id)

                sql_exec_event = SqlExecutionEvent(
                    statement_type=statement_type,
                    is_compressed=_safe_call(extractor.get_is_compressed),
                    execution_result=_safe_call(extractor.get_execution_result),
                    retry_count=_safe_call(extractor.get_retry_count),
                )

                telemetry_client = TelemetryClientFactory.get_telemetry_client(
                    session_id_hex
                )
                telemetry_client.export_latency_log(
                    latency_ms=duration_ms,
                    sql_execution_event=sql_exec_event,
                    sql_statement_id=statement_id,
                )

        return wrapper

    return decorator
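
Reduced to its core, the pattern above is: time the wrapped call, then report the duration in a finally block without ever letting telemetry failures reach the caller. The standalone sketch below models that flow; FakeTelemetryClient, DemoCursor, log_latency_sketch, and the module-level _CLIENT are stand-ins invented for illustration, whereas the real code resolves a per-session client through TelemetryClientFactory.get_telemetry_client and gathers statement metadata via the extractor classes, each getter guarded by _safe_call so a failing accessor degrades to None instead of raising.

import functools
import time


class FakeTelemetryClient:
    """Stand-in sink that prints what a real client would export."""

    def export_latency_log(self, latency_ms, statement_type, statement_id):
        print(f"latency={latency_ms}ms type={statement_type} id={statement_id}")


_CLIENT = FakeTelemetryClient()


def log_latency_sketch(statement_type="NONE"):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            start = time.perf_counter()
            try:
                # The wrapped call's result (or exception) passes through untouched.
                return func(self, *args, **kwargs)
            finally:
                duration_ms = int((time.perf_counter() - start) * 1000)
                try:
                    # Telemetry failures are swallowed so they can never break
                    # the caller's query.
                    _CLIENT.export_latency_log(
                        duration_ms, statement_type, getattr(self, "query_id", None)
                    )
                except Exception:
                    pass

        return wrapper

    return decorator


class DemoCursor:
    query_id = "demo-statement-id"

    @log_latency_sketch("QUERY")
    def execute(self, operation):
        time.sleep(0.05)  # pretend to run a query
        return operation


if __name__ == "__main__":
    DemoCursor().execute("SELECT 1")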