[distributed] NotImplementedError: Operator aten._scaled_dot_product_fused_attention_overrideable.default does not have a sharding strategy registered. #1556

@PenghuiCheng

🚀 The feature, motivation and pitch

Error:
NotImplementedError: Operator aten._scaled_dot_product_fused_attention_overrideable.default does not have a sharding strategy registered.
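For anyone unfamiliar with this class of error, here is a toy sketch of how a missing registry entry produces it. It assumes a plain dictionary mapping operator names to strategy functions. All names in it (`STRATEGY_REGISTRY`, `register_strategy`, `dispatch`, and the string placements) are hypothetical and are not PyTorch APIs; the real DTensor dispatch is more involved.

```python
# Toy model only: none of these names are PyTorch APIs.
from typing import Callable, Dict

# Hypothetical registry mapping operator names to sharding-strategy functions.
STRATEGY_REGISTRY: Dict[str, Callable[[str], str]] = {}


def register_strategy(op_name: str):
    """Decorator that records a strategy function under op_name."""
    def wrapper(fn: Callable[[str], str]) -> Callable[[str], str]:
        STRATEGY_REGISTRY[op_name] = fn
        return fn
    return wrapper


def dispatch(op_name: str, input_placement: str) -> str:
    """Look up the strategy for op_name and apply it.

    Raises NotImplementedError with the same wording as the report
    when no strategy was registered for the operator.
    """
    if op_name not in STRATEGY_REGISTRY:
        raise NotImplementedError(
            f"Operator {op_name} does not have a sharding strategy registered."
        )
    return STRATEGY_REGISTRY[op_name](input_placement)


@register_strategy("aten.add.Tensor")
def add_strategy(input_placement: str) -> str:
    # Elementwise op in this toy model: output keeps the input placement.
    return input_placement
```

In this toy model, calling `dispatch` with an operator name that was never passed to `register_strategy` fails with the message shown above, while a registered operator goes through.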

Failing test case:
test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_transformer_req_grad_seq_parallel_float32_thaw_norm__output

tp_examples.log

Alternatives

Operator aten._scaled_dot_product_fused_attention_overrideable.default does not have a sharding strategy registered.

Additional context

No response
