Feat: Add support for concurrent table diff across all impacted models #4256

Open
wants to merge 4 commits into
base: main
from

themisvaltinos
Contributor

This update adds support for the first point in this issue: #4198
When the table_diff command is run without specifying a model (for example, sqlmesh table_diff source:prod), it diffs all models impacted directly or indirectly. It uses the context diff between the two environments to identify the affected models, runs table_diff on each one to determine which tables have changed, and finally displays the corresponding table diffs.

  • If the --show-sample flag is included, the output also includes sample rows.
  • If the engine supports it, all the table diffs run concurrently.
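
A minimal sketch of the fan-out described above, not the code in this PR. `ModelDiff`, `diff_impacted_models`, `diff_one`, and `concurrent_tasks` are hypothetical names used only for illustration; the real implementation works with SQLMesh's own `TableDiff` objects and engine settings.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ModelDiff:
    """Hypothetical stand-in for a per-model table diff result."""

    model: str
    changed_rows: int


def diff_impacted_models(
    impacted: List[str],
    diff_one: Callable[[str], ModelDiff],
    concurrent_tasks: int = 1,
) -> List[ModelDiff]:
    """Diff every impacted model and keep only the ones that changed."""
    if not impacted:
        return []
    # With a single worker the diffs effectively run one after another,
    # which models an engine that does not support concurrency.
    workers = max(1, min(concurrent_tasks, len(impacted)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map() yields results in the same order as the input list.
        results = list(pool.map(diff_one, impacted))
    return [d for d in results if d.changed_rows > 0]
```

Passing `concurrent_tasks=1` keeps the same code path for engines without concurrency support, so the caller does not need a separate sequential branch.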

@@ -197,7 +197,7 @@ def show_intervals(self, snapshot_intervals: t.Dict[Snapshot, SnapshotIntervals]


 class DifferenceConsole(abc.ABC):
-    """Console for displaying environment differences"""
+    """Console for displaying differences"""
Member

@benfdking is this how you intended for this interface to be used?

)

table_diffs: t.List[TableDiff] = []
with ThreadPoolExecutor(
Member

Contributor Author

yes good point


def run_diff(snapshot_name: str) -> TableDiff:
    return self.table_diff(
        source=source,
Member

So are we going to fetch both environments N more times where N is the number of snapshots that are different?

Contributor Author

You're right, I didn't realise that. I should refactor this so it doesn't call table_diff, or break it up into methods to guarantee nothing is called twice.
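
One possible shape for that refactor, sketched under assumptions rather than taken from the PR: load both environments once, then let each worker diff only its own pair of tables. `Env`, `load_env`, `diff_tables`, and `diff_all` are hypothetical names and do not correspond to SQLMesh APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical: an environment maps a model name to its physical table name.
Env = Dict[str, str]


def diff_all(
    load_env: Callable[[str], Env],
    diff_tables: Callable[[str, str], bool],
    source: str,
    target: str,
    models: List[str],
    max_workers: int = 4,
) -> Dict[str, bool]:
    """Return, per model, whether its source and target tables differ."""
    # Each environment is loaded exactly once, before any worker starts,
    # so the number of fetches does not grow with the number of models.
    source_env = load_env(source)
    target_env = load_env(target)

    def run_diff(model: str) -> Tuple[str, bool]:
        # Workers only read the already-loaded environments.
        return model, diff_tables(source_env[model], target_env[model])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_diff, models))
```

With this split, diffing N models costs two environment fetches in total instead of two per model.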
