-
Notifications
You must be signed in to change notification settings - Fork 0
Use 250 additional metadata #308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pull Request Test Coverage Report for Build 20236255255Details
💛 - Coveralls |
matt-bernhardt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggested change is about the logic in the date_range method, and whether there's a pathway for valid ranges to be missed. If I've misunderstood the data we're dealing with, though, I'm happy to change this to an approval as-is.
The other comments ask minor concerns I have that can be either ignored or dealt with in future tickets:
- Whether we need something actionable if we encounter archivespace records that are unexpected
- Whether we need consistent tags around summary elements between timdex, primo, and geo records
I also want to make sure that I'm not missing something about the handling (and non-display) of timdex citation fields.
I mentioned this in Slack, but will repeat here for posterity - thanks for breaking up the field changes into separate commits. That made this review vastly easier to understand in context.
|
|
||
| # In the highly unlikely event that there is more than one collection identifier, there's something weird going | ||
| # on with the record and we should skip it. | ||
| return if relevant_ids.count > 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking question: Do we need to alert or note anything when this happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. There is potential value in being consistent and I know the code you just landed reports on things like this that we don't expect to happen, but I feel like this is more defensive programming than anything else and the alerts. If someone in archives somehow catalogs two collection identifiers into a single collection I'm not sure if this is the best place to detect and report on that. I guess the potential value is in better understanding if our assumption this shouldn't happen is true (i.e. if it happens a lot, we learned something interesting?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meh, I've changed my mind when looking more closely at this again and will follow the logic you laid out for similar situations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added logic borrowed from multiple pmid/dois
| # If the record *does* have more than one creation/pub date, just take the first one. Note: ASpace records often | ||
| # have more than one. Sometimes they are duplicates, sometimes they are different. For now we will just take the | ||
| # first. | ||
| relevant_date = relevant_dates.first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little concerned that this block will miss valid date ranges that may be in the data. The scenario is:
relevant_datesgets populated with multiple creation / publication entries- The first of these entries is not a range, but later ones are
- Because we only grab the first entry in this list, and it is not a range, this method returns empty while a valid range got discarded.
It feels like a more robust approach would be to do all the filtering prior to the first return clause, something like .select { |date| $w[creation publication.include?(date['kind']) && date['range'].present? } ?
To be clear, I haven't looked at whether this condition exists in Timdex records - if dates are all either ranges or not-ranges, then this concern may be irrelevant and what you've written here is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. This logic was ported from Bento but Bento only ever dealt with ASpace records so it is likely we need to revisit the logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated using general idea you proposed and added a test to confirm behavior.
| <% end %> | ||
|
|
||
| <% if result[:summary].present? %> | ||
| <div class="result-summary truncate-list"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking concern: I see that we're using a div element here, while the geodata partial uses a paragraph element for the summary. I don't know if that's meaningful, and is ultimately a question for Dave I think, but there's a part of me that worries about unintended divergence in the templates that provide each instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GeoData needs to be completely reworked once we stabilize USE. I think consistency within the USE partials is the main focus right now, and yeah I'm letting Dave determine ultimate markup.
| kind | ||
| value | ||
| } | ||
| citation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking question about the citation field specifically:
Adding the citation field here, and the update to the citation method in the Timdex normalizer, don't seem to be paired with any relevant change to the timdex result partial - we're only adding a comment there that things aren't ready to display yet? (It feels like the method in the normalizer was never called, if the query never populated the field)
I agree with the comment you've added that citations in timdex need more work to be useful in this way, so I'm treating the changes in this work as part of that, but wanted to state my reading here in case I've missed a change in the UI logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct. I initially implemented the citation in the timdex partial but the data is so wonky in the citation I took it out of the view. I left it in the query intact so it will be easier to show people the weird data in the near future without having to rebuild cassettes.
| <% end %> | ||
|
|
||
| <% if result[:summary].present? %> | ||
| <div class="result-summary truncate-list"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same non-blocking concern here as for the timdex result partial above, about whether the markup should be uniform between timdex / primo / geo partials.
1929e58 to
ff3f750
Compare
matt-bernhardt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new logic in the date range handling looks good, and the test makes sense. Thanks!
![]()
Why are these changes being introduced: * Collection information is important context for users and staff Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/USE-250 How does this address that need: * Updates GraphQL query to retrieve identifiers, which includes collection number * Adds method to normalize_timdex_record to append collection number to record title
Why are these changes being introduced: * ArchivesSpace records use ranges more than individual creation dates Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/USE-250 How does this address that need: * Updates graphql query to return date ranges * Adds new normalized `date_range` field to timdex records * Updates result views to display date ranges * Update existing date range helper method to be specific to geodata Document any side effects to this change: * GeoData uses different logic I didn't want to think too hard about now so I renamed the existing method to `geo_date_range` to make it clear it is only for geodata
Why are these changes being introduced: * Summary information is useful to users in determining the relevance of a record. Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/USE-250 How does this address that need: * Displays sumamries if they exist in the normalized records and uses CSS to limit the number of lines shown in the same way we do in GeoData records. Document any side effects to this change: Added standard `line-clamp` to summaries for browsers that support it rather than just the WebKit-specific version.
Why are these changes being introduced: * This additional information is sometimes helpful to users Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/USE-250 How does this address that need: * Returns summaries from TIMDEX * Adjusts normalization rules * Updates result partials to show new data Document any side effects to this change: * Primo CDI and other sources have different formatting for the subjects (`;` separated rather than arrays of strings). We could adjust this in the future, but getting feedback on which is desired so we can normalize them both to the same format would be helpful.
Why are these changes being introduced: * Citation provides important context to articles Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/USE-250 How does this address that need: * Adds citation display to Primo results partial * Updates Primo normalization to pull citation from `ispartof` field * Removes a test that was checking for constructed citations of different formats that is no longer relevant Document any side effects to this change: * Pulling citation from Primo `ispartof` rather than trying to construct it from multiple fields.
Note: a bunch of seemingly unused cassettes were deleted and did not regenerate
Also adds logic to log to sentry if multiple collection IDs are detected
0dc2169 to
849a498
Compare
Please see individual commit messages and ticket for full scope of this work.
https://mitlibraries.atlassian.net/browse/USE-250
Adds citation (Primo only), subject headings, modifies titles for ASpace records, adds date ranges, a summaries.
Developer
Accessibility
New ENV
Approval beyond code review
Additional context needed to review
E.g., if the PR includes updated dependencies and/or data
migration, or how to confirm the feature is working.
Code Reviewer
Code
added technical debt.
Documentation
(not just this pull request message).
Testing