[DEVX-375]: Data Ingestion Pipeline #5
sainivedh
left a comment
added some comments
phatvo9
left a comment
Nice work @sanjaychelliah. I left some comments.
sainivedh
left a comment
lgtm. Tremendous effort!
Overview
The primary goal is to let users run data ETL processes easily with the clarifai-datautils library. Used together with the Clarifai Python SDK, the library lets users load text files (PDF, DOC, etc.), transform and chunk them, and upload the chunks to the Clarifai Platform. The requirement was to give users pipelines that they can define and then use to ingest data chunks into the Platform. Internally, this implementation uses the unstructured library.
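To illustrate the pipeline idea (an ordered list of transformation steps applied to loaded text), here is a minimal, stdlib-only sketch. It is not the clarifai-datautils API: the `Pipeline` class and the `clean_whitespace` and `chunk_by_words` steps are hypothetical names chosen for illustration. In the real library, parsing is done with unstructured and uploading goes through the Clarifai Python SDK; both are left out here.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical step type: each step maps a list of text elements to a new list.
Step = Callable[[List[str]], List[str]]


@dataclass
class Pipeline:
    """Hypothetical ETL pipeline (not the library's API): runs steps in order."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self, elements: List[str]) -> List[str]:
        # Feed the output of each step into the next one.
        for step in self.steps:
            elements = step(elements)
        return elements


def clean_whitespace(elements: List[str]) -> List[str]:
    # Collapse runs of whitespace and drop elements that end up empty.
    cleaned = (" ".join(e.split()) for e in elements)
    return [e for e in cleaned if e]


def chunk_by_words(max_words: int) -> Step:
    # Build a step that joins all elements and splits them into
    # chunks of at most max_words words each.
    def step(elements: List[str]) -> List[str]:
        words = " ".join(elements).split()
        return [" ".join(words[i:i + max_words])
                for i in range(0, len(words), max_words)]
    return step


if __name__ == "__main__":
    # Assumption: in practice the elements would come from a PDF/DOC parser,
    # and the resulting chunks would be uploaded with the Clarifai Python SDK.
    pipeline = Pipeline("text_ingest", [clean_whitespace, chunk_by_words(4)])
    raw = ["  Clarifai   datautils ", "", "loads, transforms and chunks text files."]
    for chunk in pipeline.run(raw):
        print(chunk)
```

Keeping each step as a plain callable means users can define their own pipelines by listing steps in order. Chunking here is by word count only. A real ingestion pipeline would more likely chunk along document structure, such as titles or sections.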
Usage
Added
TODO: