Skip to content

feat(cdk): handle include_files from configured catalog #512

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: ac8/file-api/connector-builder-support-3
Choose a base branch
from

Conversation

aldogonzalez8
Copy link
Contributor

What

There is a scenario where the user may not want to include files from a file stream, so they will only get metadata. We were not handling this.

How

Read value from the stream configured catalog and check the include_files value, if false/none use NoopWriter, if true, it is fine to use the FileSystemWriter.

This only happens for read operations.

@aldogonzalez8 aldogonzalez8 self-assigned this Apr 26, 2025
@github-actions github-actions bot added the enhancement New feature or request label Apr 26, 2025
@aldogonzalez8 aldogonzalez8 requested a review from maxi297 April 27, 2025 15:43
Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally, it seems good. I have one design concern. Let me know what you think

@@ -229,7 +230,9 @@ def connection_checker(self) -> ConnectionChecker:
f"Expected to generate a ConnectionChecker component, but received {check_stream.__class__}"
)

def streams(self, config: Mapping[str, Any]) -> List[Stream]:
def streams(
self, config: Mapping[str, Any], catalog: Optional[ConfiguredAirbyteCatalog] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit ambivalent on modifying this interface because now, it does not align with the parent interface so it seems like we are breaking compatibility and it'll be hard to maintain with doing isinstance before calling which means that we will lose most of the benefits of inheritance.

Could we pass the catalog at ModelToComponentFactory.__init__ as a mandatory field? It seems like we are already passing the information here. This would avoid the interface problem, the problem of managing an optional catalog and the problem of passing include_files through kwargs. We might want to do it here too if it's not too much trouble.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants