Skip to content

Normalise type code #652

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 84 commits into from
Jul 31, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
84 commits
Select commit Hold shift + click to select a range
5bf5d4c
Separate Session related functionality from Connection class (#571)
varun-edachali-dbx May 28, 2025
400a8bd
Introduce Backend Interface (DatabricksClient) (#573)
varun-edachali-dbx May 30, 2025
3c78ed7
Implement ResultSet Abstraction (backend interfaces for fetch phase) …
varun-edachali-dbx Jun 3, 2025
9625229
Introduce Sea HTTP Client and test script (#583)
varun-edachali-dbx Jun 4, 2025
0887bc1
Introduce `SeaDatabricksClient` (Session Implementation) (#582)
varun-edachali-dbx Jun 9, 2025
6d63df0
Normalise Execution Response (clean backend interfaces) (#587)
varun-edachali-dbx Jun 11, 2025
ba8d9fd
Introduce models for `SeaDatabricksClient` (#595)
varun-edachali-dbx Jun 12, 2025
bb3f15a
Introduce preliminary SEA Result Set (#588)
varun-edachali-dbx Jun 12, 2025
19f1fae
Merge branch 'main' into sea-migration
varun-edachali-dbx Jun 17, 2025
6c5ba6d
remove invalid ExecuteResponse import
varun-edachali-dbx Jun 17, 2025
5e5147b
Separate Session related functionality from Connection class (#571)
varun-edachali-dbx May 28, 2025
57370b3
Introduce Backend Interface (DatabricksClient) (#573)
varun-edachali-dbx May 30, 2025
75752bf
Implement ResultSet Abstraction (backend interfaces for fetch phase) …
varun-edachali-dbx Jun 3, 2025
450b80d
remove un-necessary initialisation assertions
varun-edachali-dbx Jun 18, 2025
a926f02
remove un-necessary line break s
varun-edachali-dbx Jun 18, 2025
55ad001
more un-necessary line breaks
varun-edachali-dbx Jun 18, 2025
fa15730
constrain diff of test_closing_connection_closes_commands
varun-edachali-dbx Jun 18, 2025
019c7fb
reduce diff of test_closing_connection_closes_commands
varun-edachali-dbx Jun 18, 2025
726abe7
use pytest-like assertions for test_closing_connection_closes_commands
varun-edachali-dbx Jun 18, 2025
bf6d41c
ensure command_id is not None
varun-edachali-dbx Jun 18, 2025
5afa733
line breaks after multi-line pyfocs
varun-edachali-dbx Jun 18, 2025
e3dfd36
ensure non null operationHandle for commandId creation
varun-edachali-dbx Jun 18, 2025
63360b3
use command_id methods instead of explicit guid_to_hex_id conversion
varun-edachali-dbx Jun 18, 2025
13ffb8d
remove un-necessary artifacts in test_session, add back assertion
varun-edachali-dbx Jun 18, 2025
a74d279
Implement SeaDatabricksClient (Complete Execution Spec) (#590)
varun-edachali-dbx Jun 18, 2025
d759050
add from __future__ import annotations to remove string literals arou…
varun-edachali-dbx Jun 19, 2025
1e21434
move docstring of DatabricksClient within class
varun-edachali-dbx Jun 24, 2025
cd4015b
move ThriftResultSet import to top of file
varun-edachali-dbx Jun 24, 2025
ed8b610
make backend/utils __init__ file empty
varun-edachali-dbx Jun 24, 2025
94d951e
use from __future__ import annotations to remove string literals arou…
varun-edachali-dbx Jun 24, 2025
c20058e
use lazy logging
varun-edachali-dbx Jun 24, 2025
fe3acb1
replace getters with property tag
varun-edachali-dbx Jun 24, 2025
9fb6a76
Merge branch 'main' into backend-refactors
varun-edachali-dbx Jun 24, 2025
61dfc4d
set active_command_id to None, not active_op_handle
varun-edachali-dbx Jun 24, 2025
64fb9b2
align test_session with pytest instead of unittest
varun-edachali-dbx Jun 24, 2025
cbf63f9
Merge branch 'main' into sea-migration
varun-edachali-dbx Jun 26, 2025
59b4825
remove duplicate test, correct active_command_id attribute
varun-edachali-dbx Jun 26, 2025
e380654
SeaDatabricksClient: Add Metadata Commands (#593)
varun-edachali-dbx Jun 26, 2025
677a7b0
SEA volume operations fix: assign `manifest.is_volume_operation` to `…
varun-edachali-dbx Jun 26, 2025
45585d4
Introduce manual SEA test scripts for Exec Phase (#589)
varun-edachali-dbx Jun 27, 2025
70c7dc8
Complete Fetch Phase (for `INLINE` disposition and `JSON_ARRAY` forma…
varun-edachali-dbx Jul 2, 2025
abf9aab
Merge branch 'main' into sea-migration
varun-edachali-dbx Jul 3, 2025
9b4b606
Merge branch 'main' into backend-refactors
varun-edachali-dbx Jul 3, 2025
4f11ff0
Introduce `row_limit` param (#607)
varun-edachali-dbx Jul 7, 2025
45f5c26
Merge branch 'main' into backend-refactors
varun-edachali-dbx Jul 10, 2025
2c9368a
formatting (black)
varun-edachali-dbx Jul 10, 2025
9b1b1f5
remove repetition from Session.__init__
varun-edachali-dbx Jul 10, 2025
77e23d3
Merge branch 'backend-refactors' into sea-migration
varun-edachali-dbx Jul 11, 2025
3bd3aef
fix merge artifacts
varun-edachali-dbx Jul 11, 2025
6d4701f
correct patch paths
varun-edachali-dbx Jul 11, 2025
dc1cb6d
fix type issues
varun-edachali-dbx Jul 14, 2025
5d04cd0
Merge branch 'main' into sea-migration
varun-edachali-dbx Jul 15, 2025
922c448
explicitly close result queue
varun-edachali-dbx Jul 15, 2025
1a0575a
Complete Fetch Phase (`EXTERNAL_LINKS` disposition and `ARROW` format…
varun-edachali-dbx Jul 16, 2025
c07beb1
SEA Session Configuration Fix: Explicitly convert values to `str` (#…
varun-edachali-dbx Jul 16, 2025
640cc82
SEA: add support for `Hybrid` disposition (#631)
varun-edachali-dbx Jul 17, 2025
8fbca9d
SEA: Reduce network calls for synchronous commands (#633)
varun-edachali-dbx Jul 19, 2025
806e5f5
SEA: Decouple Link Fetching (#632)
varun-edachali-dbx Jul 21, 2025
b57c3f3
Chunk download latency (#634)
saishreeeee Jul 21, 2025
ef5836b
acquire lock before notif + formatting (black)
varun-edachali-dbx Jul 21, 2025
4fd2a3f
Merge branch 'main' into sea-migration
varun-edachali-dbx Jul 23, 2025
26f8947
fix imports
varun-edachali-dbx Jul 23, 2025
2d44596
add get_chunk_link s
varun-edachali-dbx Jul 23, 2025
99e7435
simplify description extraction
varun-edachali-dbx Jul 23, 2025
54ec080
pass session_id_hex to ThriftResultSet
varun-edachali-dbx Jul 23, 2025
f9f9f31
revert to main's extract description
varun-edachali-dbx Jul 23, 2025
51cef2b
validate row count for sync query tests as well
varun-edachali-dbx Jul 23, 2025
387102d
guid_hex -> hex_guid
varun-edachali-dbx Jul 23, 2025
d53d1ea
reduce diff
varun-edachali-dbx Jul 23, 2025
c7810aa
reduce diff
varun-edachali-dbx Jul 23, 2025
b3072bd
reduce diff
varun-edachali-dbx Jul 23, 2025
8be5264
set .value in compression
varun-edachali-dbx Jul 23, 2025
80692e3
reduce diff
varun-edachali-dbx Jul 23, 2025
83e45ae
is_direct_results -> has_more_rows
varun-edachali-dbx Jul 25, 2025
a636bdc
type normalisation for SEA
varun-edachali-dbx Jul 27, 2025
c23d540
Merge branch 'main' into normalise-code
varun-edachali-dbx Jul 29, 2025
152f565
fix type codes by using Thrift ttypes
varun-edachali-dbx Jul 29, 2025
bd2e83b
remove excess call to session_id_hex
varun-edachali-dbx Jul 29, 2025
b4f9e2d
remove session_id_hex args
varun-edachali-dbx Jul 29, 2025
533f74b
document disparity mapping
varun-edachali-dbx Jul 29, 2025
29e4846
ensure valid interval return
varun-edachali-dbx Jul 29, 2025
6481851
more verbose logging for type conversion fail
varun-edachali-dbx Jul 29, 2025
1332a43
Revert "more verbose logging for type conversion fail"
varun-edachali-dbx Jul 29, 2025
44be1ac
stop throwing errors from type conversion
varun-edachali-dbx Jul 29, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions src/databricks/sql/backend/sea/backend.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
WaitTimeout,
MetadataCommands,
)
from databricks.sql.backend.sea.utils.normalize import normalize_sea_type_to_thrift
from databricks.sql.thrift_api.TCLIService import ttypes

if TYPE_CHECKING:
Expand Down Expand Up @@ -322,6 +323,11 @@ def _extract_description_from_manifest(
# Format: (name, type_code, display_size, internal_size, precision, scale, null_ok)
name = col_data.get("name", "")
type_name = col_data.get("type_name", "")

# Normalize SEA type to Thrift conventions before any processing
type_name = normalize_sea_type_to_thrift(type_name, col_data)

# Now strip _TYPE suffix and convert to lowercase
type_name = (
type_name[:-5] if type_name.endswith("_TYPE") else type_name
).lower()
Expand Down
19 changes: 9 additions & 10 deletions src/databricks/sql/backend/sea/result_set.py
Original file line number Diff line number Diff line change
Expand Up @@ -92,20 +92,19 @@ def _convert_json_types(self, row: List[str]) -> List[Any]:
converted_row = []

for i, value in enumerate(row):
column_name = self.description[i][0]
column_type = self.description[i][1]
precision = self.description[i][4]
scale = self.description[i][5]

try:
converted_value = SqlTypeConverter.convert_value(
value, column_type, precision=precision, scale=scale
)
converted_row.append(converted_value)
except Exception as e:
logger.warning(
f"Error converting value '{value}' to {column_type}: {e}"
)
converted_row.append(value)
converted_value = SqlTypeConverter.convert_value(
value,
column_type,
column_name=column_name,
precision=precision,
scale=scale,
)
converted_row.append(converted_value)

return converted_row

Expand Down
75 changes: 44 additions & 31 deletions src/databricks/sql/backend/sea/utils/conversion.py
Original file line number Diff line number Diff line change
Expand Up @@ -50,60 +50,65 @@ def _convert_decimal(

class SqlType:
"""
SQL type constants
SQL type constants based on Thrift TTypeId values.

The list of types can be found in the SEA REST API Reference:
https://docs.databricks.com/api/workspace/statementexecution/executestatement
These correspond to the normalized type names that come from the SEA backend
after normalize_sea_type_to_thrift processing (lowercase, without _TYPE suffix).
"""

# Numeric types
BYTE = "byte"
SHORT = "short"
INT = "int"
LONG = "long"
FLOAT = "float"
DOUBLE = "double"
DECIMAL = "decimal"
TINYINT = "tinyint" # Maps to TTypeId.TINYINT_TYPE
SMALLINT = "smallint" # Maps to TTypeId.SMALLINT_TYPE
INT = "int" # Maps to TTypeId.INT_TYPE
BIGINT = "bigint" # Maps to TTypeId.BIGINT_TYPE
FLOAT = "float" # Maps to TTypeId.FLOAT_TYPE
DOUBLE = "double" # Maps to TTypeId.DOUBLE_TYPE
DECIMAL = "decimal" # Maps to TTypeId.DECIMAL_TYPE

# Boolean type
BOOLEAN = "boolean"
BOOLEAN = "boolean" # Maps to TTypeId.BOOLEAN_TYPE

# Date/Time types
DATE = "date"
TIMESTAMP = "timestamp"
INTERVAL = "interval"
DATE = "date" # Maps to TTypeId.DATE_TYPE
TIMESTAMP = "timestamp" # Maps to TTypeId.TIMESTAMP_TYPE
INTERVAL_YEAR_MONTH = (
"interval_year_month" # Maps to TTypeId.INTERVAL_YEAR_MONTH_TYPE
)
INTERVAL_DAY_TIME = "interval_day_time" # Maps to TTypeId.INTERVAL_DAY_TIME_TYPE

# String types
CHAR = "char"
STRING = "string"
CHAR = "char" # Maps to TTypeId.CHAR_TYPE
VARCHAR = "varchar" # Maps to TTypeId.VARCHAR_TYPE
STRING = "string" # Maps to TTypeId.STRING_TYPE

# Binary type
BINARY = "binary"
BINARY = "binary" # Maps to TTypeId.BINARY_TYPE

# Complex types
ARRAY = "array"
MAP = "map"
STRUCT = "struct"
ARRAY = "array" # Maps to TTypeId.ARRAY_TYPE
MAP = "map" # Maps to TTypeId.MAP_TYPE
STRUCT = "struct" # Maps to TTypeId.STRUCT_TYPE

# Other types
NULL = "null"
USER_DEFINED_TYPE = "user_defined_type"
NULL = "null" # Maps to TTypeId.NULL_TYPE
UNION = "union" # Maps to TTypeId.UNION_TYPE
USER_DEFINED = "user_defined" # Maps to TTypeId.USER_DEFINED_TYPE


class SqlTypeConverter:
"""
Utility class for converting SQL types to Python types.
Based on the types supported by the Databricks SDK.
Based on the Thrift TTypeId types after normalization.
"""

# SQL type to conversion function mapping
# TODO: complex types
TYPE_MAPPING: Dict[str, Callable] = {
# Numeric types
SqlType.BYTE: lambda v: int(v),
SqlType.SHORT: lambda v: int(v),
SqlType.TINYINT: lambda v: int(v),
SqlType.SMALLINT: lambda v: int(v),
SqlType.INT: lambda v: int(v),
SqlType.LONG: lambda v: int(v),
SqlType.BIGINT: lambda v: int(v),
SqlType.FLOAT: lambda v: float(v),
SqlType.DOUBLE: lambda v: float(v),
SqlType.DECIMAL: _convert_decimal,
Expand All @@ -112,30 +117,34 @@ class SqlTypeConverter:
# Date/Time types
SqlType.DATE: lambda v: datetime.date.fromisoformat(v),
SqlType.TIMESTAMP: lambda v: parser.parse(v),
SqlType.INTERVAL: lambda v: v, # Keep as string for now
SqlType.INTERVAL_YEAR_MONTH: lambda v: v, # Keep as string for now
SqlType.INTERVAL_DAY_TIME: lambda v: v, # Keep as string for now
# String types - no conversion needed
SqlType.CHAR: lambda v: v,
SqlType.VARCHAR: lambda v: v,
SqlType.STRING: lambda v: v,
# Binary type
SqlType.BINARY: lambda v: bytes.fromhex(v),
# Other types
SqlType.NULL: lambda v: None,
# Complex types and user-defined types return as-is
SqlType.USER_DEFINED_TYPE: lambda v: v,
SqlType.USER_DEFINED: lambda v: v,
}

@staticmethod
def convert_value(
value: str,
sql_type: str,
column_name: Optional[str],
**kwargs,
) -> object:
"""
Convert a string value to the appropriate Python type based on SQL type.

Args:
value: The string value to convert
sql_type: The SQL type (e.g., 'int', 'decimal')
sql_type: The SQL type (e.g., 'tinyint', 'decimal')
column_name: The name of the column being converted
**kwargs: Additional keyword arguments for the conversion function

Returns:
Expand All @@ -155,6 +164,10 @@ def convert_value(
return converter_func(value, precision, scale)
else:
return converter_func(value)
except (ValueError, TypeError, decimal.InvalidOperation) as e:
logger.warning(f"Error converting value '{value}' to {sql_type}: {e}")
except Exception as e:
warning_message = f"Error converting value '{value}' to {sql_type}"
if column_name:
warning_message += f" in column {column_name}"
warning_message += f": {e}"
logger.warning(warning_message)
return value
50 changes: 50 additions & 0 deletions src/databricks/sql/backend/sea/utils/normalize.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
"""
Type normalization utilities for SEA backend.

This module provides functionality to normalize SEA type names to match
Thrift type naming conventions.
"""

from typing import Dict, Any

# SEA types that need to be translated to Thrift types
# The list of all SEA types is available in the REST reference at:
# https://docs.databricks.com/api/workspace/statementexecution/executestatement
# The list of all Thrift types can be found in the ttypes.TTypeId definition
# The SEA types that do not align with Thrift are explicitly mapped below
SEA_TO_THRIFT_TYPE_MAP = {
"BYTE": "TINYINT",
"SHORT": "SMALLINT",
"LONG": "BIGINT",
"INTERVAL": "INTERVAL", # Default mapping, will be overridden if type_interval_type is present
}


def normalize_sea_type_to_thrift(type_name: str, col_data: Dict[str, Any]) -> str:
"""
Normalize SEA type names to match Thrift type naming conventions.

Args:
type_name: The type name from SEA (e.g., "BYTE", "LONG", "INTERVAL")
col_data: The full column data dictionary from manifest (for accessing type_interval_type)

Returns:
Normalized type name matching Thrift conventions
"""
# Early return if type doesn't need mapping
if type_name not in SEA_TO_THRIFT_TYPE_MAP:
return type_name

normalized_type = SEA_TO_THRIFT_TYPE_MAP[type_name]

# Special handling for interval types
if type_name == "INTERVAL":
type_interval_type = col_data.get("type_interval_type")
if type_interval_type:
return (
"INTERVAL_YEAR_MONTH"
if any(t in type_interval_type.upper() for t in ["YEAR", "MONTH"])
else "INTERVAL_DAY_TIME"
)

return normalized_type
4 changes: 1 addition & 3 deletions tests/unit/test_client.py
Original file line number Diff line number Diff line change
Expand Up @@ -262,9 +262,7 @@ def test_negative_fetch_throws_exception(self):
mock_backend = Mock()
mock_backend.fetch_results.return_value = (Mock(), False, 0)

result_set = ThriftResultSet(
Mock(), Mock(), mock_backend
)
result_set = ThriftResultSet(Mock(), Mock(), mock_backend)

with self.assertRaises(ValueError) as e:
result_set.fetchmany(-1)
Expand Down
6 changes: 4 additions & 2 deletions tests/unit/test_downloader.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,12 +26,14 @@ class DownloaderTests(unittest.TestCase):
def _setup_time_mock_for_download(self, mock_time, end_time):
"""Helper to setup time mock that handles logging system calls."""
call_count = [0]

def time_side_effect():
call_count[0] += 1
if call_count[0] <= 2: # First two calls (validation, start_time)
return 1000
else: # All subsequent calls (logging, duration calculation)
return end_time

mock_time.side_effect = time_side_effect

@patch("time.time", return_value=1000)
Expand Down Expand Up @@ -104,7 +106,7 @@ def test_run_get_response_not_ok(self, mock_time):
@patch("time.time")
def test_run_uncompressed_successful(self, mock_time):
self._setup_time_mock_for_download(mock_time, 1000.5)

http_client = DatabricksHttpClient.get_instance()
file_bytes = b"1234567890" * 10
settings = Mock(link_expiry_buffer_secs=0, download_timeout=0, use_proxy=False)
Expand Down Expand Up @@ -133,7 +135,7 @@ def test_run_uncompressed_successful(self, mock_time):
@patch("time.time")
def test_run_compressed_successful(self, mock_time):
self._setup_time_mock_for_download(mock_time, 1000.2)

http_client = DatabricksHttpClient.get_instance()
file_bytes = b"1234567890" * 10
compressed_bytes = b'\x04"M\x18h@d\x00\x00\x00\x00\x00\x00\x00#\x14\x00\x00\x00\xaf1234567890\n\x00BP67890\x00\x00\x00\x00'
Expand Down
60 changes: 60 additions & 0 deletions tests/unit/test_sea_backend.py
Original file line number Diff line number Diff line change
Expand Up @@ -550,6 +550,66 @@ def test_extract_description_from_manifest(self, sea_client):
assert description[1][1] == "int" # type_code
assert description[1][6] is None # null_ok

def test_extract_description_from_manifest_with_type_normalization(
self, sea_client
):
"""Test _extract_description_from_manifest with SEA to Thrift type normalization."""
manifest_obj = MagicMock()
manifest_obj.schema = {
"columns": [
{
"name": "byte_col",
"type_name": "BYTE",
},
{
"name": "short_col",
"type_name": "SHORT",
},
{
"name": "long_col",
"type_name": "LONG",
},
{
"name": "interval_ym_col",
"type_name": "INTERVAL",
"type_interval_type": "YEAR TO MONTH",
},
{
"name": "interval_dt_col",
"type_name": "INTERVAL",
"type_interval_type": "DAY TO SECOND",
},
{
"name": "interval_default_col",
"type_name": "INTERVAL",
# No type_interval_type field
},
]
}

description = sea_client._extract_description_from_manifest(manifest_obj)
assert description is not None
assert len(description) == 6

# Check normalized types
assert description[0][0] == "byte_col"
assert description[0][1] == "tinyint" # BYTE -> tinyint

assert description[1][0] == "short_col"
assert description[1][1] == "smallint" # SHORT -> smallint

assert description[2][0] == "long_col"
assert description[2][1] == "bigint" # LONG -> bigint

assert description[3][0] == "interval_ym_col"
assert description[3][1] == "interval_year_month" # INTERVAL with YEAR/MONTH

assert description[4][0] == "interval_dt_col"
assert description[4][1] == "interval_day_time" # INTERVAL with DAY/TIME

assert description[5][0] == "interval_default_col"
assert description[5][1] == "interval" # INTERVAL without subtype

def test_filter_session_configuration(self):
"""Test that _filter_session_configuration converts all values to strings."""
session_config = {
Expand Down
Loading
Loading