RuntimeError: Placeholder storage has not been allocated on MPS device! #18090

@InfernalAzazel

Bug description

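Running the official example code on an Apple Silicon machine (MPS) raises a RuntimeError. Training completes, but the final inference call on the encoder fails.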

What version are you seeing the problem on?

v2.0

How to reproduce the bug

# main.py
# ! pip install torchvision
import torch, torch.nn as nn, torch.utils.data as data, torchvision as tv, torch.nn.functional as F
import lightning as L
from torch import Tensor

# --------------------------------
# Step 1: Define a LightningModule
# --------------------------------
# A LightningModule (nn.Module subclass) defines a full *system*
# (ie: an LLM, diffusion model, autoencoder, or simple image classifier).
# define any number of nn.Modules (or use your current ones)
encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 3))
decoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 28 * 28))


class LitAutoEncoder(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop. It is independent of forward
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer


# -------------------
# Step 2: Define data
# -------------------
dataset = tv.datasets.MNIST(".", download=True, transform=tv.transforms.ToTensor())
train, val = data.random_split(dataset, [55000, 5000])

# -------------------
# Step 3: Train
# -------------------
autoencoder = LitAutoEncoder()
trainer = L.Trainer(limit_train_batches=100, max_epochs=1)
trainer.fit(autoencoder, data.DataLoader(train), data.DataLoader(val))

checkpoint = "./lightning_logs/version_0/checkpoints/epoch=0-step=100.ckpt"
autoencoder = LitAutoEncoder.load_from_checkpoint(checkpoint, encoder=encoder, decoder=decoder)

# choose your trained nn.Module
encoder = autoencoder.encoder
encoder.eval()

# embed 4 fake images!
fake_image_batch = Tensor(4, 28 * 28)
embeddings = encoder(fake_image_batch)
print("⚡" * 20, "\nPredictions (4 image embeddings):\n", embeddings, "\n", "⚡" * 20)

Error messages and logs

/Users/kylin/Library/Caches/pypoetry/virtualenvs/lightning-demo-01-7el2zk6k-py3.10/bin/python /Users/kylin/Documents/code/github/lightning_demo_01/main.py 
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/Users/kylin/Library/Caches/pypoetry/virtualenvs/lightning-demo-01-7el2zk6k-py3.10/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py:67: UserWarning: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `lightning.pytorch` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
  warning_cache.warn(
/Users/kylin/Library/Caches/pypoetry/virtualenvs/lightning-demo-01-7el2zk6k-py3.10/lib/python3.10/site-packages/lightning/pytorch/trainer/configuration_validator.py:68: UserWarning: You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.
  rank_zero_warn("You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.")

  | Name    | Type       | Params
---------------------------------------
0 | encoder | Sequential | 100 K 
1 | decoder | Sequential | 101 K 
---------------------------------------
202 K     Trainable params
0         Non-trainable params
202 K     Total params
0.810     Total estimated model params size (MB)
/Users/kylin/Library/Caches/pypoetry/virtualenvs/lightning-demo-01-7el2zk6k-py3.10/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:432: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Epoch 0: 100%|██████████| 100/100 [00:00<00:00, 194.97it/s, v_num=1]
`Trainer.fit` stopped: `max_epochs=1` reached.
Traceback (most recent call last):
  File "/Users/kylin/Documents/code/github/lightning_demo_01/main.py", line 65, in <module>
    embeddings = encoder(fake_image_batch)
  File "/Users/kylin/Library/Caches/pypoetry/virtualenvs/lightning-demo-01-7el2zk6k-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/kylin/Library/Caches/pypoetry/virtualenvs/lightning-demo-01-7el2zk6k-py3.10/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/Users/kylin/Library/Caches/pypoetry/virtualenvs/lightning-demo-01-7el2zk6k-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/kylin/Library/Caches/pypoetry/virtualenvs/lightning-demo-01-7el2zk6k-py3.10/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Placeholder storage has not been allocated on MPS device!

Process finished with exit code 1
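
A possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the log shows the Trainer used the MPS device, so the encoder's weights are likely on MPS when it is called, while `Tensor(4, 28 * 28)` allocates its data on the CPU. A device mismatch of that kind is consistent with the error raised in `F.linear`. The sketch below uses plain PyTorch only (no Lightning, no checkpoint) and creates the input on whatever device the model's parameters live on. The `device` and `model_device` names are introduced here for illustration.

```python
import torch
import torch.nn as nn

# Same encoder architecture as in the reproduction script.
encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 3))

# Assumption: mimic what the Trainer does by moving the module to MPS when
# it is available; fall back to CPU so the sketch runs anywhere.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
encoder = encoder.to(device).eval()

# Look up where the weights actually are and allocate the input there,
# instead of relying on the CPU default of Tensor(4, 28 * 28).
model_device = next(encoder.parameters()).device
fake_image_batch = torch.rand(4, 28 * 28, device=model_device)

with torch.no_grad():
    embeddings = encoder(fake_image_batch)

print(embeddings.shape, embeddings.device)
```

In the reproduction script, the equivalent change would be to build `fake_image_batch` with `device=autoencoder.device`, or to move the loaded module to the CPU with `.to("cpu")` before calling it. Passing `map_location` to `load_from_checkpoint` may also help, although here the module-level `encoder` and `decoder` objects are shared with the trained model, so they may already sit on MPS after `trainer.fit`.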

Environment

Current environment
  • CUDA:
    • GPU: None
    • available: False
    • version: None
  • Lightning:
    • lightning: 2.0.5
    • lightning-cloud: 0.5.37
    • lightning-utilities: 0.9.0
    • pytorch-lightning: 2.0.5
    • torch: 2.0.1
    • torchmetrics: 1.0.1
    • torchvision: 0.15.2
  • Packages:
    • aiohttp: 3.8.4
    • aiosignal: 1.3.1
    • anyio: 3.7.1
    • arrow: 1.2.3
    • async-timeout: 4.0.2
    • attrs: 23.1.0
    • backoff: 2.2.1
    • beautifulsoup4: 4.12.2
    • blessed: 1.20.0
    • certifi: 2023.5.7
    • charset-normalizer: 3.2.0
    • click: 8.1.5
    • croniter: 1.4.1
    • dateutils: 0.6.12
    • deepdiff: 6.3.1
    • encoder: 1.1
    • exceptiongroup: 1.1.2
    • fastapi: 0.100.0
    • filelock: 3.12.2
    • frozenlist: 1.4.0
    • fsspec: 2023.6.0
    • h11: 0.14.0
    • idna: 3.4
    • inquirer: 3.1.3
    • itsdangerous: 2.1.2
    • jinja2: 3.1.2
    • lightning: 2.0.5
    • lightning-cloud: 0.5.37
    • lightning-utilities: 0.9.0
    • markdown-it-py: 3.0.0
    • markupsafe: 2.1.3
    • mdurl: 0.1.2
    • mpmath: 1.3.0
    • multidict: 6.0.4
    • networkx: 3.1
    • numpy: 1.25.1
    • ordered-set: 4.1.0
    • packaging: 23.1
    • pillow: 10.0.0
    • pip: 23.1.2
    • psutil: 5.9.5
    • pydantic: 1.10.11
    • pygments: 2.15.1
    • pyjwt: 2.7.0
    • python-dateutil: 2.8.2
    • python-editor: 1.0.4
    • python-multipart: 0.0.6
    • pytorch-lightning: 2.0.5
    • pytz: 2023.3
    • pyyaml: 6.0
    • readchar: 4.0.5
    • requests: 2.31.0
    • rich: 13.4.2
    • setuptools: 68.0.0
    • six: 1.16.0
    • sniffio: 1.3.0
    • soupsieve: 2.4.1
    • starlette: 0.27.0
    • starsessions: 1.3.0
    • sympy: 1.12
    • torch: 2.0.1
    • torchmetrics: 1.0.1
    • torchvision: 0.15.2
    • tqdm: 4.65.0
    • traitlets: 5.9.0
    • typing-extensions: 4.7.1
    • urllib3: 2.0.3
    • uvicorn: 0.23.0
    • wcwidth: 0.2.6
    • websocket-client: 1.6.1
    • websockets: 11.0.3
    • wheel: 0.40.0
    • yarl: 1.9.2
  • System:
    • OS: Darwin
    • architecture:
      • 64bit
    • processor: arm
    • python: 3.10.12
    • release: 22.5.0
    • version: Darwin Kernel Version 22.5.0: Thu Jun 8 22:22:23 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T6020

More info

No response
