
Conversation


@mccle mccle commented Nov 12, 2025

Fixes #8599.

Description

PersistentDataset currently casts all MetaTensor objects to torch.Tensor and forces torch.load to run with weights_only=True. This makes it impossible to save metadata to, or load it from, cached files, which may be necessary for accurate post-transform operations.

To address this, this PR adds track_meta and weights_only arguments directly to PersistentDataset; they are passed through to convert_to_tensor and torch.load, respectively. A ValueError is raised when track_meta=True and weights_only=True, since MetaTensor objects cannot be loaded with weights_only=True and the cached files would otherwise be continually deleted and rewritten.

These changes restore the ability to cache MetaTensor objects by giving explicit control over data casting and torch.load behavior. The defaults, track_meta=False and weights_only=True, preserve the current behavior of PersistentDataset.
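The incompatibility between the two flags can be illustrated with a short standalone sketch (plain Python, not MONAI's actual implementation; the function name is hypothetical):

```python
def validate_cache_flags(track_meta: bool, weights_only: bool) -> None:
    """Illustrative stand-in for the constructor check described above.

    MetaTensor objects cannot be deserialized by torch.load(weights_only=True),
    so caching them under that combination would invalidate the cache on every
    read, causing the files to be continually deleted and rewritten.
    """
    if track_meta and weights_only:
        raise ValueError(
            "track_meta=True is incompatible with weights_only=True: "
            "MetaTensor objects cannot be loaded with weights_only=True."
        )

# The defaults (track_meta=False, weights_only=True) are a valid combination.
validate_cache_flags(track_meta=False, weights_only=True)
```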

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.


coderabbitai bot commented Nov 12, 2025

Walkthrough

The pull request introduces two new optional parameters to PersistentDataset: track_meta and weights_only. These flags control metadata preservation and tensor loading behavior during caching operations. The constructor enforces a validation rule that raises ValueError when both track_meta=True and weights_only=True are set simultaneously. The parameters are propagated through cache loading (via torch.load) and cache saving (via torch.save) paths. Test coverage is expanded with parameterized test cases validating the new flag combinations and error conditions.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Review the validation constraint logic in the constructor—understand why track_meta=True and weights_only=True cannot coexist
  • Verify parameter propagation paths in _cachecheck method for both cache read and write operations
  • Confirm test cases properly exercise all valid combinations and error scenarios

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
  • Title check — Passed: Title clearly summarizes the main change: adding track_meta and weights_only arguments to PersistentDataset for MetaTensor support.
  • Description check — Passed: Description follows template with all required sections completed: fix reference, detailed explanation, types of changes marked appropriately, and test/documentation updates confirmed.
  • Linked Issues check — Passed: PR addresses all objectives from #8599: restores MetaTensor caching capability, prevents metadata loss, allows control over torch.load behavior, and maintains backward compatibility with safe defaults.
  • Out of Scope Changes check — Passed: All changes are scoped to PersistentDataset implementation and corresponding tests; no unrelated modifications detected.
  • Docstring Coverage — Passed: Docstring coverage is 100.00%, above the required 80.00% threshold.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/data/test_persistentdataset.py (1)

179-211: Consider validating metadata preservation.

The test correctly validates the type of the returned object, but doesn't verify that metadata is actually preserved when track_meta=True. Consider adding an assertion to check that the MetaTensor contains expected metadata (e.g., affine, filename).

Example enhancement:

             im = test_dataset[0]["image"]
             self.assertIsInstance(im, expected_type)
+            if track_meta and isinstance(im, MetaTensor):
+                self.assertIsNotNone(im.meta.get("filename_or_obj"))
monai/data/dataset.py (1)

446-503: Consider adding support for track_meta and weights_only in CacheNTransDataset.

CacheNTransDataset inherits _cachecheck from PersistentDataset, which uses torch.save/torch.load. Users may want to cache MetaTensors with this dataset type as well.

Add the parameters to the constructor:

 def __init__(
     self,
     data: Sequence,
     transform: Sequence[Callable] | Callable,
     cache_n_trans: int,
     cache_dir: Path | str | None,
     hash_func: Callable[..., bytes] = pickle_hashing,
     pickle_module: str = "pickle",
     pickle_protocol: int = DEFAULT_PROTOCOL,
     hash_transform: Callable[..., bytes] | None = None,
     reset_ops_id: bool = True,
+    track_meta: bool = False,
+    weights_only: bool = True,
 ) -> None:

Then pass them to super:

 super().__init__(
     data=data,
     transform=transform,
     cache_dir=cache_dir,
     hash_func=hash_func,
     pickle_module=pickle_module,
     pickle_protocol=pickle_protocol,
     hash_transform=hash_transform,
     reset_ops_id=reset_ops_id,
+    track_meta=track_meta,
+    weights_only=weights_only,
 )
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between a8a7e57 and 341538d.

📒 Files selected for processing (2)
  • monai/data/dataset.py (5 hunks)
  • tests/data/test_persistentdataset.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • tests/data/test_persistentdataset.py
  • monai/data/dataset.py
🪛 Ruff (0.14.4)
monai/data/dataset.py

295-298: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: packaging
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: build-docs
  • GitHub Check: flake8-py3 (codeformat)
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: quick-py3 (windows-latest)
  • GitHub Check: flake8-py3 (mypy)
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: min-dep-os (windows-latest)
  • GitHub Check: min-dep-os (macOS-latest)
  • GitHub Check: min-dep-py3 (3.10)
  • GitHub Check: min-dep-os (ubuntu-latest)
  • GitHub Check: min-dep-py3 (3.9)
  • GitHub Check: min-dep-pytorch (2.8.0)
  • GitHub Check: min-dep-py3 (3.12)
  • GitHub Check: min-dep-py3 (3.11)
  • GitHub Check: min-dep-pytorch (2.5.1)
  • GitHub Check: min-dep-pytorch (2.6.0)
  • GitHub Check: min-dep-pytorch (2.7.1)
🔇 Additional comments (7)
tests/data/test_persistentdataset.py (2)

23-23: LGTM!

MetaTensor import is necessary for type assertions in the new test cases.


46-52: LGTM!

Test cases comprehensively cover all combinations of track_meta and weights_only flags, including the invalid combination that should raise ValueError.

monai/data/dataset.py (5)

233-234: LGTM!

New parameters have appropriate defaults that preserve backward compatibility.


269-278: LGTM!

Documentation clearly explains the new parameters and their interaction.


294-300: Validation logic is correct.

The check prevents the invalid combination that would cause cache thrashing. Error message is clear.

Minor: Static analysis suggests defining exception messages as constants or within exception classes, but this is a style preference and can be deferred.


398-398: LGTM!

Correctly propagates weights_only to torch.load.


419-419: LGTM!

Correctly propagates track_meta to convert_to_tensor when writing cache.



Development

Successfully merging this pull request may close these issues.

PersistentDataset not usable anymore (v1.5.1) ?
