Use 250 additional metadata #308

JPrevost · 2025-12-12T16:50:46Z

Please see individual commit messages and ticket for full scope of this work.

https://mitlibraries.atlassian.net/browse/USE-250

Adds citation (Primo only), subject headings, modifies titles for ASpace records, adds date ranges, a summaries.

Developer

Accessibility

ANDI or WAVE has been run in accordance to our guide.
This PR contains no changes to the view layer.
New issues flagged by ANDI or WAVE have been resolved.
New issues flagged by ANDI or WAVE have been ticketed (link in the Pull Request details above).
No new accessibility issues have been flagged.

New ENV

All new ENV is documented in README.
All new ENV has been added to Heroku Pipeline, Staging and Prod.
ENV has not changed.

Approval beyond code review

UXWS/stakeholder approval has been confirmed.
UXWS/stakeholder review will be completed retroactively.
UXWS/stakeholder review is not needed.

Additional context needed to review

E.g., if the PR includes updated dependencies and/or data
migration, or how to confirm the feature is working.

Code Reviewer

Code

I have confirmed that the code works as intended.
Any CodeClimate issues have been fixed or confirmed as
added technical debt.

Documentation

The commit message is clear and follows our guidelines
(not just this pull request message).
The documentation has been updated or is unnecessary.
New dependencies are appropriate or there were no changes.

Testing

There are appropriate tests covering any new functionality.
No additional test coverage is required.

coveralls · 2025-12-12T16:52:23Z

Pull Request Test Coverage Report for Build 20236255255

Details

19 of 22 (86.36%) changed or added relevant lines in 3 files are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage decreased (-0.3%) to 97.906%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
app/models/normalize_timdex_record.rb	16	19	84.21%

Files with Coverage Reduction	New Missed Lines	%
app/models/normalize_primo_record.rb	1	99.35%

Totals
Change from base Build 20235711196:	-0.3%
Covered Lines:	1169
Relevant Lines:	1194

💛 - Coveralls

matt-bernhardt

My suggested change is about the logic in the date_range method, and whether there's a pathway for valid ranges to be missed. If I've misunderstood the data we're dealing with, though, I'm happy to change this to an approval as-is.

The other comments ask minor concerns I have that can be either ignored or dealt with in future tickets:

Whether we need something actionable if we encounter archivespace records that are unexpected
Whether we need consistent tags around summary elements between timdex, primo, and geo records

I also want to make sure that I'm not missing something about the handling (and non-display) of timdex citation fields.

I mentioned this in Slack, but will repeat here for posterity - thanks for breaking up the field changes into separate commits. That made this review vastly easier to understand in context.

matt-bernhardt · 2025-12-12T19:16:01Z

app/models/normalize_timdex_record.rb

+
+    # In the highly unlikely event that there is more than one collection identifier, there's something weird going
+    # on with the record and we should skip it.
+    return if relevant_ids.count > 1


Non-blocking question: Do we need to alert or note anything when this happens?

I don't think so. There is potential value in being consistent and I know the code you just landed reports on things like this that we don't expect to happen, but I feel like this is more defensive programming than anything else and the alerts. If someone in archives somehow catalogs two collection identifiers into a single collection I'm not sure if this is the best place to detect and report on that. I guess the potential value is in better understanding if our assumption this shouldn't happen is true (i.e. if it happens a lot, we learned something interesting?)

Meh, I've changed my mind when looking more closely at this again and will follow the logic you laid out for similar situations

Added logic borrowed from multiple pmid/dois

matt-bernhardt · 2025-12-12T19:24:28Z

app/models/normalize_timdex_record.rb

+    # If the record *does* have more than one creation/pub date, just take the first one. Note: ASpace records often
+    # have more than one. Sometimes they are duplicates, sometimes they are different. For now we will just take the
+    # first.
+    relevant_date = relevant_dates.first


I'm a little concerned that this block will miss valid date ranges that may be in the data. The scenario is:

relevant_dates gets populated with multiple creation / publication entries

The first of these entries is not a range, but later ones are

Because we only grab the first entry in this list, and it is not a range, this method returns empty while a valid range got discarded.

It feels like a more robust approach would be to do all the filtering prior to the first return clause, something like .select { |date| $w[creation publication.include?(date['kind']) && date['range'].present? } ?

To be clear, I haven't looked at whether this condition exists in Timdex records - if dates are all either ranges or not-ranges, then this concern may be irrelevant and what you've written here is fine.

Good point. This logic was ported from Bento but Bento only ever dealt with ASpace records so it is likely we need to revisit the logic.

Updated using general idea you proposed and added a test to confirm behavior.

matt-bernhardt · 2025-12-12T20:17:22Z

app/views/search/_result.html.erb

      <% end %>
+
+      <% if result[:summary].present? %>
+        <div class="result-summary truncate-list">


Non-blocking concern: I see that we're using a div element here, while the geodata partial uses a paragraph element for the summary. I don't know if that's meaningful, and is ultimately a question for Dave I think, but there's a part of me that worries about unintended divergence in the templates that provide each instance.

GeoData needs to be completely reworked once we stabilize USE. I think consistency within the USE partials is the main focus right now, and yeah I'm letting Dave determine ultimate markup.

matt-bernhardt · 2025-12-12T20:39:28Z

app/models/timdex_search.rb

            kind
            value
          }
+          citation


Non-blocking question about the citation field specifically:

Adding the citation field here, and the update to the citation method in the Timdex normalizer, don't seem to be paired with any relevant change to the timdex result partial - we're only adding a comment there that things aren't ready to display yet? (It feels like the method in the normalizer was never called, if the query never populated the field)

I agree with the comment you've added that citations in timdex need more work to be useful in this way, so I'm treating the changes in this work as part of that, but wanted to state my reading here in case I've missed a change in the UI logic.

You are correct. I initially implemented the citation in the timdex partial but the data is so wonky in the citation I took it out of the view. I left it in the query intact so it will be easier to show people the weird data in the near future without having to rebuild cassettes.

matt-bernhardt · 2025-12-12T20:57:45Z

app/views/search/_result_primo.html.erb

+      <% end %>
+
+      <% if result[:summary].present? %>
+        <div class="result-summary truncate-list">


Same non-blocking concern here as for the timdex result partial above, about whether the markup should be uniform between timdex / primo / geo partials.

matt-bernhardt

The new logic in the date range handling looks good, and the test makes sense. Thanks!

Why are these changes being introduced: * Collection information is important context for users and staff Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/USE-250 How does this address that need: * Updates GraphQL query to retrieve identifiers, which includes collection number * Adds method to normalize_timdex_record to append collection number to record title

Why are these changes being introduced: * ArchivesSpace records use ranges more than individual creation dates Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/USE-250 How does this address that need: * Updates graphql query to return date ranges * Adds new normalized `date_range` field to timdex records * Updates result views to display date ranges * Update existing date range helper method to be specific to geodata Document any side effects to this change: * GeoData uses different logic I didn't want to think too hard about now so I renamed the existing method to `geo_date_range` to make it clear it is only for geodata

Why are these changes being introduced: * Summary information is useful to users in determining the relevance of a record. Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/USE-250 How does this address that need: * Displays sumamries if they exist in the normalized records and uses CSS to limit the number of lines shown in the same way we do in GeoData records. Document any side effects to this change: Added standard `line-clamp` to summaries for browsers that support it rather than just the WebKit-specific version.

Why are these changes being introduced: * This additional information is sometimes helpful to users Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/USE-250 How does this address that need: * Returns summaries from TIMDEX * Adjusts normalization rules * Updates result partials to show new data Document any side effects to this change: * Primo CDI and other sources have different formatting for the subjects (`;` separated rather than arrays of strings). We could adjust this in the future, but getting feedback on which is desired so we can normalize them both to the same format would be helpful.

Why are these changes being introduced: * Citation provides important context to articles Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/USE-250 How does this address that need: * Adds citation display to Primo results partial * Updates Primo normalization to pull citation from `ispartof` field * Removes a test that was checking for constructed citations of different formats that is no longer relevant Document any side effects to this change: * Pulling citation from Primo `ispartof` rather than trying to construct it from multiple fields.

Note: a bunch of seemingly unused cassettes were deleted and did not regenerate

Also adds logic to log to sentry if multiple collection IDs are detected

mitlib temporarily deployed to timdex-ui-pi-use-250-ad-k1l7bd December 12, 2025 16:55 Inactive

JPrevost requested review from djanelle-mit, jazairi and matt-bernhardt December 12, 2025 17:00

matt-bernhardt self-assigned this Dec 12, 2025

matt-bernhardt requested changes Dec 12, 2025

View reviewed changes

JPrevost force-pushed the use-250-additional-metadata branch from 1929e58 to ff3f750 Compare December 12, 2025 21:40

JPrevost temporarily deployed to timdex-ui-pi-use-250-ad-k1l7bd December 12, 2025 21:40 Inactive

JPrevost temporarily deployed to timdex-ui-pi-use-250-ad-k1l7bd December 12, 2025 21:46 Inactive

JPrevost requested a review from matt-bernhardt December 12, 2025 21:52

matt-bernhardt approved these changes Dec 12, 2025

View reviewed changes

JPrevost added 8 commits December 15, 2025 09:42

TIMDEX Cassette regeneration

1d8eb2d

Note: a bunch of seemingly unused cassettes were deleted and did not regenerate

Ensure date_ranges only select dates with ranges

2564dce

Also adds logic to log to sentry if multiple collection IDs are detected

Adds test for date range

849a498

JPrevost force-pushed the use-250-additional-metadata branch from 0dc2169 to 849a498 Compare December 15, 2025 14:43

JPrevost temporarily deployed to timdex-ui-pi-use-250-ad-k1l7bd December 15, 2025 14:43 Inactive

JPrevost merged commit d38b405 into main Dec 15, 2025
5 checks passed

JPrevost deleted the use-250-additional-metadata branch December 15, 2025 14:56

Use 250 additional metadata #308

Use 250 additional metadata #308

Uh oh!

Conversation

JPrevost commented Dec 12, 2025

Developer

Accessibility

New ENV

Approval beyond code review

Additional context needed to review

Code Reviewer

Code

Documentation

Testing

Uh oh!

coveralls commented Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Test Coverage Report for Build 20236255255

Details

💛 - Coveralls

Uh oh!

matt-bernhardt left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

matt-bernhardt left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

coveralls commented Dec 12, 2025 •

edited

Loading