-
Notifications
You must be signed in to change notification settings - Fork 294
Diff'ing of tables with alphanumeric pk of varying length not as effective #252
Description
Describe the bug
When diff'ing tables with alphanumeric pk of varying lengths, sometimes it runs successfully, while other times it throws error ValueError: Cannot apply String_FixedAlphanum(length=10) to 10002079489, 9984661391
I guess data-diff gets the max length of the pk column based on the first segment it gets after splitting the data in the table, so if the first segment contains the value with the max length, it runs successfully but fails when the max length is not in the first segment.
I am following the python command (this is not the exact command, but the error is in the diff_tables function):
import logging
logging.basicConfig(level=logging.INFO)
from data_diff import connect_to_table, diff_tables
table1 = connect_to_table("presto:///", "table_name", "id")
table2 = connect_to_table("snowflake:///", "table_name", "id")
for different_row in diff_tables(table1, table2):
plus_or_minus, columns = different_row
print(plus_or_minus, columns)
Here is the tracestack:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 174, in _parse_key_range_result
return cls(mn), cls(mx) + 1
File "/usr/local/lib/python3.10/site-packages/data_diff/databases/database_types.py", line 118, in make_value
raise ValueError(f"Expected alphanumeric value of length {self.length}, but got '{value}'.")
ValueError: Expected alphanumeric value of length 10, but got '10002079489'.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/eb_presto_snowflake_datadiff.py", line 85, in execute
for different_row in diff_tables(
File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 167, in diff_tables
raise error
File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 118, in diff_tables
min_key1, max_key1 = self._parse_key_range_result(key_type, next(key_ranges))
File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 176, in _parse_key_range_result
raise type(e)(f"Cannot apply {key_type} to {mn}, {mx}.") from e
ValueError: Cannot apply String_FixedAlphanum(length=10) to 10002079489, 9984661391.
I have another table with alphanumeric pk of varying length that doesn't run successfully at all. I get similar error:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 174, in _parse_key_range_result
return cls(mn), cls(mx) + 1
File "/usr/local/lib/python3.10/site-packages/data_diff/databases/database_types.py", line 118, in make_value
raise ValueError(f"Expected alphanumeric value of length {self.length}, but got '{value}'.")
ValueError: Expected alphanumeric value of length 12, but got '1000000291643'.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/eb_presto_snowflake_datadiff.py", line 85, in execute
for different_row in diff_tables(
File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 167, in diff_tables
raise error
File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 118, in diff_tables
min_key1, max_key1 = self._parse_key_range_result(key_type, next(key_ranges))
File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 176, in _parse_key_range_result
raise type(e)(f"Cannot apply {key_type} to {mn}, {mx}.") from e
ValueError: Cannot apply String_FixedAlphanum(length=12) to 1000000291643, 999999989543
Describe the environment
I am using MacOS Monterey v12.6, data-diff v0.2.8.
I am comparing tables in Presto vs Snowflake