
Conversation

@kunigori

Adds SafeTensors-based serialization for PyTorch models (addresses #2532) and
implements metadata-driven loading to integrate cleanly with the materializer
workflow (per @bcdurak's feedback).

Changes

  • ✅ Add safetensors optional extra in pyproject.toml
  • ✅ Save state_dict to .safetensors when available; fall back to .pt with a warning
  • ✅ Write metadata.json (class_path, serialization_format; init_args / init_kwargs / factory_path reserved for Phase 2)
  • ✅ Use TemporaryDirectory + copy_dir() for remote stores
  • ✅ load() always returns nn.Module
  • ✅ Backward compat: supports weights.pt, checkpoint.pt, and legacy entire_model.pt

New artifact layout

artifact_uri/
├─ weights.safetensors   # or weights.pt on fallback
└─ metadata.json         # class_path + format
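
For reference, here is a minimal sketch of the save path under the Phase 1 assumptions (save_model, its directory argument, and the logger wiring are illustrative, not the PR's actual code):

import json
import logging
import os

import torch

logger = logging.getLogger(__name__)

def save_model(model: torch.nn.Module, directory: str) -> None:
    # Record enough metadata to reconstruct the model at load time.
    metadata = {
        "class_path": f"{type(model).__module__}.{type(model).__qualname__}",
        "serialization_format": "safetensors",
    }
    try:
        from safetensors.torch import save_file
        save_file(model.state_dict(), os.path.join(directory, "weights.safetensors"))
    except ImportError:
        # safetensors extra not installed: fall back to pickle-based .pt.
        logger.warning("safetensors not installed; falling back to torch.save")
        metadata["serialization_format"] = "pickle"
        torch.save(model.state_dict(), os.path.join(directory, "weights.pt"))
    with open(os.path.join(directory, "metadata.json"), "w") as f:
        json.dump(metadata, f)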

Metadata

{
  "class_path": "my_package.models.MyModel",
  "serialization_format": "safetensors",
  "init_args": [],
  "init_kwargs": {},
  "factory_path": null
}
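
And a hedged sketch of the metadata-driven load path (load_model is an illustrative name; Phase 1 only consumes class_path and serialization_format, and assumes a zero-argument __init__):

import importlib
import json
import os

import torch

def load_model(directory: str) -> torch.nn.Module:
    with open(os.path.join(directory, "metadata.json")) as f:
        metadata = json.load(f)
    # Import the model class from its dotted path and instantiate it
    # (Phase 1 assumes a zero-argument __init__).
    module_name, _, class_name = metadata["class_path"].rpartition(".")
    model_cls = getattr(importlib.import_module(module_name), class_name)
    model = model_cls()
    if metadata["serialization_format"] == "safetensors":
        from safetensors.torch import load_file
        state_dict = load_file(os.path.join(directory, "weights.safetensors"))
    else:
        state_dict = torch.load(os.path.join(directory, "weights.pt"))
    model.load_state_dict(state_dict)
    return model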

Why SafeTensors?

  • Security: Avoids pickle-based code execution risks
  • Performance: Faster, memory-mapped weight loads
  • Compatibility: Works with S3/GCS/Azure via artifact stores

Tests

Local run:

pytest tests/unit/integrations/pytorch/materializers/test_pytorch_module_materializer.py -v
# 4 passed in 1.88s

Coverage:

  • Round-trip with safetensors (see the test sketch after this list)
  • Pickle fallback path
  • Metadata-driven load
  • Legacy formats (weights.pt, checkpoint.pt, entire_model.pt)
  • Clear error when safetensors extra is missing at load
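
For flavor, a self-contained round-trip sketch (TinyModel and the direct safetensors calls are illustrative, not lifted from the PR's test file):

import torch
from safetensors.torch import load_file, save_file

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

def test_state_dict_round_trip(tmp_path):
    model = TinyModel()
    path = str(tmp_path / "weights.safetensors")
    save_file(model.state_dict(), path)
    # Reload into a fresh instance and compare tensors exactly.
    fresh = TinyModel()
    fresh.load_state_dict(load_file(path))
    for key, tensor in model.state_dict().items():
        assert torch.equal(tensor, fresh.state_dict()[key])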

Known limitations (Phase 1)

  • Zero-argument __init__() requirement: models needing config should use
    a factory method (planned for Phase 2); a workaround sketch follows this list

  • Legacy artifacts without metadata (weights.pt / checkpoint.pt) require an
    explicit data_type:

    model = materializer.load(data_type=MyModel)

  • Legacy entire_model.pt is loaded and returned as a Module directly
    (no data_type needed)
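
Until Phase 2 lands, one hedged workaround for the zero-argument constraint is to pin configuration in a zero-argument subclass (ConfiguredNet / DefaultConfiguredNet are made-up names):

import torch

class ConfiguredNet(torch.nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        self.layer = torch.nn.Linear(hidden, 1)

class DefaultConfiguredNet(ConfiguredNet):
    # Zero-argument wrapper: the Phase 1 loader can call this without args.
    def __init__(self):
        super().__init__(hidden=128)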

Documentation

Happy to add a short guide covering why/how/limits/troubleshooting.
Which file should I update?

  • docs/book/component-guide/materializers/pytorch.md (materializer behavior)?
  • docs/book/integration-guide/pytorch.md (integration landing)?

Or would you prefer a new section?

Future work (separate PRs)

  • Phase 2: Support init_args / init_kwargs / factory functions
  • Phase 3: PyTorch Lightning materializer
  • Phase 4: HuggingFace Transformers support

Checklist

  • Tests pass locally
  • Code formatted (ruff check --fix + ruff format)
  • Also ran project scripts: bash scripts/format.sh and bash scripts/lint.sh
  • Type hints added (mypy clean)
  • Backward compatibility maintained
  • Rebased on develop
  • Documentation updated (pending guidance on location)
  • CLA signed

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@CLAassistant

CLAassistant commented Nov 11, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


yusuke kunimitsu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@kunigori kunigori changed the base branch from main to develop November 11, 2025 04:18
@schustmi
Contributor

Hey @kunigori, thanks for the PR! Can you please base your changes on the develop branch and then also change the target of this PR?
