-
Notifications
You must be signed in to change notification settings - Fork 11
[PECOBLR-666]Added support for Variant datatype in SQLAlchemy #42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the variant type in the sqlalchemy_example.py file, so that users have an example
| return tuple(value) | ||
| elif isinstance(value, dict): | ||
| return tuple(value.items()) | ||
| return tuple(sorted(value.items())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the sorting needed? the response from the server is in the same order that we insert right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed because parse_json makes changes to the order of the insertion. So we need to verify after sorting itself,
| for key in ['variant_simple_col', 'variant_nested_col', 'variant_array_col', 'variant_mixed_col']: | ||
| if compare[key] is not None: | ||
| compare[key] = json.loads(compare[key]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this part even needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes because we get a string so it's better to verify after converting it from JSON to check if the output matches.
| def literal_processor(self, dialect): | ||
| """Process literal values for SQL generation. | ||
| For VARIANT columns, use PARSE_JSON() to properly insert data. | ||
| """ | ||
| def process(value): | ||
| if value is None: | ||
| return "NULL" | ||
| try: | ||
| return self.pe.escape_string(json.dumps(value, ensure_ascii=False, separators=(',', ':'))) | ||
| except (TypeError, ValueError) as e: | ||
| raise ValueError(f"Cannot serialize value {value} to JSON: {e}") | ||
|
|
||
| return f"PARSE_JSON('{process}')" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you have bind processor, why do you need literal processor? both seems to be doing the same thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Literal processor is called when literal_bind parameter is set to true for a SQL alchemy query. We can remove this section since the default value for the parameter is false
| dtype_mapping = { | ||
| "variant_simple_col": DatabricksVariant, | ||
| "variant_nested_col": DatabricksVariant, | ||
| "variant_array_col": DatabricksVariant, | ||
| "variant_mixed_col": DatabricksVariant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if there are other types apart from variant, such as int or array,etc. Does this dtype mapping need to provided for only the variant columns or for all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to provide the entire mapping. If we do not provide this mapping then the data is stored as a string for complex type. However for the general types like int, float, etc we do not need to explicitly map
merged changes from main
…essor for variant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for making the changes
Description
Added support for Variant type. Users can now insert and read from the Variant type columns.
Pandas being one of the most important use cases for SQLAlchemy, additional testing has been added to test for Pandas specific use cases. User needs to explicitly provide the
dtypeparameter forpandas.to_sql()that'll define which columns are of variant type.Testing
Added unit and E2E tests for
DatabricksVarianttypeRelated tickets and documents
PECOBLR-666