Collaborator

@msrathore-db msrathore-db commented Sep 3, 2025

Description

Adds support for the Variant type. Users can now insert into and read from Variant columns.

Because pandas is one of the most important use cases for SQLAlchemy, additional tests cover pandas-specific usage. Users must explicitly pass the dtype parameter to DataFrame.to_sql() to declare which columns are of Variant type.

Testing

Added unit and E2E tests for the DatabricksVariant type, covering:

  • Table creation (DDL)
  • Data insertion and retrieval via ORM and pandas
  • Data comparison and round-trip correctness

Related tickets and documents

PECOBLR-666

Collaborator

@jprakash-db jprakash-db left a comment

Can you add the Variant type to the sqlalchemy_example.py file, so that users have an example?

        return tuple(value)
    elif isinstance(value, dict):
-       return tuple(value.items())
+       return tuple(sorted(value.items()))

Collaborator

Is the sorting needed? The response from the server comes back in the same order that we inserted, right?

Collaborator Author

This is needed because parse_json changes the insertion order, so the comparison has to be made after sorting.
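As a minimal sketch of why sorting helps, the snippet below normalizes nested values into tuples whose dict entries are sorted by key, so two values that differ only in key order compare equal. The helper name `to_hashable` and the sample data are hypothetical and not taken from the PR; the sketch also assumes all dict keys are strings.

```python
def to_hashable(value):
    """Recursively convert lists and dicts into hashable, order-stable tuples."""
    if isinstance(value, list):
        # List order is meaningful, so it is preserved.
        return tuple(to_hashable(v) for v in value)
    if isinstance(value, dict):
        # Sort by key so the original insertion order no longer matters.
        return tuple(sorted((k, to_hashable(v)) for k, v in value.items()))
    return value


# Hypothetical data: same content, different key order.
inserted = {"name": "a", "tags": ["x", "y"], "meta": {"b": 1, "a": 2}}
returned = {"meta": {"a": 2, "b": 1}, "tags": ["x", "y"], "name": "a"}

same_after_sorting = to_hashable(inserted) == to_hashable(returned)
```

Without the `sorted` call, the two tuples would list their entries in different orders and the comparison would fail even though the data is the same.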

Comment on lines +256 to +258
for key in ['variant_simple_col', 'variant_nested_col', 'variant_array_col', 'variant_mixed_col']:
    if compare[key] is not None:
        compare[key] = json.loads(compare[key])

Collaborator

Is this part even needed?

Collaborator Author

Yes. The value comes back as a string, so it is better to parse it from JSON before checking that the output matches.
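A small sketch of that check, assuming the fetched row carries Variant values as JSON strings. The helper `parse_variant_columns` and the sample rows are hypothetical; only the column names and the `json.loads` step come from the snippet under review.

```python
import json


def parse_variant_columns(row, variant_columns):
    """Return a copy of row with the given columns parsed from JSON text."""
    parsed = dict(row)
    for key in variant_columns:
        if parsed[key] is not None:
            parsed[key] = json.loads(parsed[key])
    return parsed


# Hypothetical data: what was inserted versus what a read might return.
expected = {"id": 1, "variant_simple_col": {"a": 1, "b": "x"}, "variant_array_col": None}
fetched = {"id": 1, "variant_simple_col": '{"b":"x","a":1}', "variant_array_col": None}

result = parse_variant_columns(fetched, ["variant_simple_col", "variant_array_col"])
```

Comparing the raw string against the original dict would always fail, while the parsed dict compares equal regardless of key order.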

Comment on lines 437 to 450
def literal_processor(self, dialect):
    """Process literal values for SQL generation.

    For VARIANT columns, use PARSE_JSON() to properly insert data.
    """
    def process(value):
        if value is None:
            return "NULL"
        try:
            escaped = self.pe.escape_string(json.dumps(value, ensure_ascii=False, separators=(',', ':')))
        except (TypeError, ValueError) as e:
            raise ValueError(f"Cannot serialize value {value} to JSON: {e}")
        # Wrap the escaped JSON text so the server parses it into a VARIANT.
        return f"PARSE_JSON('{escaped}')"

    # Return the callable itself; formatting `process` into the string would embed the function's repr.
    return process

Collaborator

When you have a bind processor, why do you need a literal processor? Both seem to be doing the same thing.

Collaborator Author

The literal processor is called only when the literal_binds parameter is set to true for a SQLAlchemy query. We can remove this section, since the default value for that parameter is false.
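The JSON serialization step that both processors share can be tried in isolation. The sketch below reuses the `json.dumps` arguments and the error handling from the snippet under review; the function name `to_compact_json` is hypothetical, and SQL escaping is deliberately left out because it depends on the dialect's escaper.

```python
import json


def to_compact_json(value):
    """Serialize value to compact JSON text, or return None for a NULL value."""
    if value is None:
        return None
    try:
        # ensure_ascii=False keeps non-ASCII characters as-is;
        # the separators drop the default spaces after ',' and ':'.
        return json.dumps(value, ensure_ascii=False, separators=(",", ":"))
    except (TypeError, ValueError) as e:
        raise ValueError(f"Cannot serialize value {value!r} to JSON: {e}") from e


compact = to_compact_json({"a": [1, 2], "b": "é"})
```

Values that `json` cannot encode, such as a Python set, surface as a `ValueError` with the original cause attached.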

Comment on lines +268 to +272
dtype_mapping = {
    "variant_simple_col": DatabricksVariant,
    "variant_nested_col": DatabricksVariant,
    "variant_array_col": DatabricksVariant,
    "variant_mixed_col": DatabricksVariant

Collaborator

What if there are other types apart from Variant, such as int or array? Does this dtype mapping need to be provided only for the Variant columns, or for all columns?

Collaborator Author

It's better to provide the entire mapping. Without it, data for complex types is stored as a string. General types such as int and float do not need to be mapped explicitly.
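One way to find the columns that need an explicit entry is to look for dict or list values in the data before writing it. This is only a sketch: the helper `columns_needing_explicit_dtype` and the sample records are hypothetical, and which type each column should map to remains the caller's decision.

```python
def columns_needing_explicit_dtype(records):
    """Return names of columns that hold at least one dict or list value."""
    names = []
    for record in records:
        for column, value in record.items():
            if isinstance(value, (dict, list)) and column not in names:
                names.append(column)
    return names


# Hypothetical rows mixing simple and complex values.
records = [
    {"id": 1, "score": 1.5, "variant_simple_col": {"k": "v"}, "variant_array_col": [1, 2]},
    {"id": 2, "score": 2.0, "variant_simple_col": None, "variant_array_col": [3]},
]

complex_columns = columns_needing_explicit_dtype(records)
```

Each returned name would then be keyed to a type such as DatabricksVariant in the dtype mapping, while columns like `id` and `score` can be left to the default type inference.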

Collaborator

@jprakash-db jprakash-db left a comment

LGTM. Thanks for making the changes

@msrathore-db msrathore-db merged commit a6f4460 into main Sep 5, 2025
10 of 11 checks passed