Refactored `ParsedMetadata`/`ChunkMetadata` to reflect all options #1132
Pull Request Overview
This PR refactors the metadata structure for parsing and chunking operations to consolidate configuration options into hash-like summary fields instead of separate type fields.
- Replaced `parse_type` and `chunk_type` fields with comprehensive `summary` fields that include all relevant options
- Removed deprecated `ParsingOptions` enum and marked `chunking_algorithm` as deprecated
- Added `multimodal` parameter to autogenerated index names to ensure unique indexes for different parsing configurations
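To make the first bullet concrete, here is a minimal sketch of what a hash-like summary string could look like. It is not the code from this PR: `ParsedMetadataSketch`, `make_summary`, the `|` separator, and the `dpi` option are all hypothetical names and choices made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedMetadataSketch:
    """Hypothetical stand-in, not the actual paperqa ParsedMetadata."""

    name: str
    summary: str  # single field capturing the parser and all of its options


def make_summary(parser_name: str, **options: object) -> str:
    """Join the parser name and its options into one deterministic string.

    Options are sorted by key so the same configuration always yields
    the same summary, regardless of keyword order.
    """
    parts = [f"{key}={options[key]!r}" for key in sorted(options)]
    return "|".join([parser_name, *parts])


# 'dpi' is an assumed example option, not taken from the PR.
pdf_meta = ParsedMetadataSketch(
    name="pdf",
    summary=make_summary("pymupdf", multimodal=True, dpi=150),
)
print(pdf_meta.summary)  # pymupdf|dpi=150|multimodal=True
```

Compared with a bare type identifier, a string like this lets two parses of the same file be told apart whenever any option differs.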
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
tests/test_paperqa.py | Updates test assertions to use new summary field format and adds multimodal parameter test |
src/paperqa/types.py | Refactors ChunkMetadata and ParsedMetadata to replace type fields with summary fields |
src/paperqa/settings.py | Removes deprecated ParsingOptions, marks chunking_algorithm as deprecated, adds multimodal to index naming |
src/paperqa/readers.py | Updates parsing and chunking logic to generate summary strings instead of type identifiers |
src/paperqa/docs.py | Updates condition to check summary field instead of parse_type |
packages/paper-qa-pypdf/tests/test_paperqa_pypdf.py | Updates test to check summary field format |
packages/paper-qa-pypdf/src/paperqa_pypdf/reader.py | Generates summary string with multimodal information |
packages/paper-qa-pymupdf/tests/test_paperqa_pymupdf.py | Updates test to check summary field format |
packages/paper-qa-pymupdf/src/paperqa_pymupdf/reader.py | Generates detailed summary string with parsing parameters |
8e01f4d to 9e1662f
return self

class ParsingOptions(StrEnum):
The premise behind this has been deprecated, since it's now captured in the input settings.
In the future if there's some compatibility relationship between parsings and chunking algorithms we'll need to add this code back in. It's why it was originally left here FYI.
Yeah, I know what you mean. We can restore it in the future when needed.
5293235 to ddc4a06
This PR cleans up some tech debt in our parsing/chunking schemes:
- Replacing the type fields with a `Metadata.summary` field
- Removing `ParsingOptions` and deprecating `chunking_algorithm`
- Adding `multimodal` to the autogenerated index name, with a test
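
As a rough illustration of why `multimodal` belongs in the autogenerated index name, the sketch below hashes a settings dictionary into a name. The function `index_name_sketch`, the `pqa_index_` prefix, the setting keys, and the hashing scheme are assumptions for this example only and may not match how paperqa actually builds index names.

```python
import hashlib
import json


def index_name_sketch(settings: dict[str, object]) -> str:
    """Derive a deterministic index name from settings (hypothetical scheme)."""
    # sort_keys makes the encoding independent of dict insertion order.
    encoded = json.dumps(settings, sort_keys=True).encode("utf-8")
    return "pqa_index_" + hashlib.sha256(encoded).hexdigest()[:16]


# Assumed setting keys, chosen only for illustration.
text_only = index_name_sketch({"paper_directory": "papers", "multimodal": False})
with_media = index_name_sketch({"paper_directory": "papers", "multimodal": True})

# If 'multimodal' were left out of the hashed settings, both configurations
# would map to the same index and could reuse each other's parsed content.
print(text_only != with_media)  # True
```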