Skip to content
This repository was archived by the owner on May 17, 2024. It is now read-only.
This repository was archived by the owner on May 17, 2024. It is now read-only.

Diff'ing of tables with alphanumeric pk of varying length not as effective #252

@matthiasekundayo-eb

Description

@matthiasekundayo-eb

Describe the bug

When diff'ing tables with alphanumeric pk of varying lengths, sometimes it runs successfully, while other times it throws error ValueError: Cannot apply String_FixedAlphanum(length=10) to 10002079489, 9984661391 I guess data-diff gets the max length of the pk column based on the first segment it gets after splitting the data in the table, so if the first segment contains the value with the max length, it runs successfully but fails when the max length is not in the first segment.

I am following the python command (this is not the exact command, but the error is in the diff_tables function):

import logging
logging.basicConfig(level=logging.INFO)

from data_diff import connect_to_table, diff_tables

table1 = connect_to_table("presto:///", "table_name", "id")
table2 = connect_to_table("snowflake:///", "table_name", "id")

for different_row in diff_tables(table1, table2):
    plus_or_minus, columns = different_row
    print(plus_or_minus, columns)

Here is the tracestack:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 174, in _parse_key_range_result
    return cls(mn), cls(mx) + 1
  File "/usr/local/lib/python3.10/site-packages/data_diff/databases/database_types.py", line 118, in make_value
    raise ValueError(f"Expected alphanumeric value of length {self.length}, but got '{value}'.")
ValueError: Expected alphanumeric value of length 10, but got '10002079489'.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/eb_presto_snowflake_datadiff.py", line 85, in execute
    for different_row in diff_tables(
  File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 167, in diff_tables
    raise error
  File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 118, in diff_tables
    min_key1, max_key1 = self._parse_key_range_result(key_type, next(key_ranges))
  File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 176, in _parse_key_range_result
    raise type(e)(f"Cannot apply {key_type} to {mn}, {mx}.") from e
ValueError: Cannot apply String_FixedAlphanum(length=10) to 10002079489, 9984661391.

I have another table with alphanumeric pk of varying length that doesn't run successfully at all. I get similar error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 174, in _parse_key_range_result
    return cls(mn), cls(mx) + 1
  File "/usr/local/lib/python3.10/site-packages/data_diff/databases/database_types.py", line 118, in make_value
    raise ValueError(f"Expected alphanumeric value of length {self.length}, but got '{value}'.")
ValueError: Expected alphanumeric value of length 12, but got '1000000291643'.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/eb_presto_snowflake_datadiff.py", line 85, in execute
    for different_row in diff_tables(
  File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 167, in diff_tables
    raise error
  File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 118, in diff_tables
    min_key1, max_key1 = self._parse_key_range_result(key_type, next(key_ranges))
  File "/usr/local/lib/python3.10/site-packages/data_diff/diff_tables.py", line 176, in _parse_key_range_result
    raise type(e)(f"Cannot apply {key_type} to {mn}, {mx}.") from e
ValueError: Cannot apply String_FixedAlphanum(length=12) to 1000000291643, 999999989543

Describe the environment
I am using MacOS Monterey v12.6, data-diff v0.2.8.
I am comparing tables in Presto vs Snowflake

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions