Skip to content

Conversation

Davda-James
Copy link
Contributor

Description

skip reprocessing of entire rows based on finegrained lineage information

Closes Issue: #1057

  • Ran precommit-hooks
  • Written tests for change

Copy link
Member

@georgeh0 georgeh0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for having a try on this!

This is something really touching the core engine and multiple components, so its complexity is more than many other issues (that's why I didn't put a "help wanted" label on it).

I shared more thoughts on this thread:
#1057 (comment)

Please let me know if you have any thoughts. Thanks!

Comment on lines +1130 to +1133
let mut lineage_fp = Fingerprinter::default();
lineage_fp = lineage_fp.with(&flow_inst.export_ops)?;
lineage_fp = lineage_fp.with(&flow_schema.schema)?;
let lineage_fingerprint = lineage_fp.into_fingerprint();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will miss certain information, like the spec changes (e.g. chunk size is changed for SplitRecursively), or behavior version change.

It will also be affected unnecessarily by some information, e.g. a field name (e.g. users may rename products to files in this example), and also ordering of ops (users may reorder some functions which will affect fields ordering in schema, they may also reordering collectors and export ops).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants