simplify embedding + first transformer block TP #314

wanchaol · 2024-05-07T21:16:51Z

As titled: we can directly specify that the rowwise-parallel embedding's output layout is sharded on the sequence dim, so the first transformer block no longer needs its own prepare-input step.

Switching to output_layouts = Shard(1) also triggers a reduce_scatter instead of an allreduce for the embedding layer, which could give some small perf wins.
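
To illustrate why the two paths produce the same shard, here is a minimal pure-Python sketch (no torch, no real communication). It simulates a vocabulary-sharded embedding lookup on two "ranks" and compares all-reduce followed by a local chunk against a single reduce-scatter. Every name in it (`local_lookup`, `all_reduce`, `chunk_seq`, `reduce_scatter`, the toy table) is hypothetical and exists only for this illustration. The batch dim is omitted, so the sequence dim is dim 0 here rather than dim 1.

```python
# Toy simulation only: these helpers are illustrative, not PyTorch or torchtitan APIs.
WORLD_SIZE = 2
VOCAB, DIM = 4, 2

# Full embedding table; row v is [v, 10 * v] so results are easy to read.
table = [[float(v), float(10 * v)] for v in range(VOCAB)]
rows_per_rank = VOCAB // WORLD_SIZE


def local_lookup(rank, tokens):
    """Row-wise sharded lookup: tokens outside this rank's vocab slice yield zeros."""
    lo, hi = rank * rows_per_rank, (rank + 1) * rows_per_rank
    return [table[t] if lo <= t < hi else [0.0] * DIM for t in tokens]


def all_reduce(partials):
    """Every rank receives the full element-wise sum over all ranks."""
    seq_len = len(partials[0])
    total = [[sum(p[s][d] for p in partials) for d in range(DIM)]
             for s in range(seq_len)]
    return [total for _ in partials]


def chunk_seq(x, rank):
    """Keep only this rank's contiguous slice of the sequence dim."""
    n = len(x) // WORLD_SIZE
    return x[rank * n:(rank + 1) * n]


def reduce_scatter(partials):
    """Sum over ranks, but each rank receives only its own sequence slice."""
    n = len(partials[0]) // WORLD_SIZE
    return [
        [[sum(p[s][d] for p in partials) for d in range(DIM)]
         for s in range(r * n, (r + 1) * n)]
        for r in range(WORLD_SIZE)
    ]


tokens = [3, 0, 1, 2]  # one sequence of four token ids
partials = [local_lookup(r, tokens) for r in range(WORLD_SIZE)]

# Before: replicate the embedding output, then shard it for the first block.
old_path = [chunk_seq(x, r) for r, x in enumerate(all_reduce(partials))]
# After: produce the sequence-sharded output directly.
new_path = reduce_scatter(partials)

print(old_path == new_path)
```

Each rank ends up holding the same sequence shard either way; the difference is that the reduce-scatter path never materializes the full summed output on every rank.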

tianyu-l

nice!

bdhirsh · 2024-05-08T13:06:29Z

cool :)

…first transformer block" Following changes in pytorch/torchtitan#314, to apply a reduce-scatter instead of the more expensive all-reduce + local chunk. cross PR with pytorch/tutorials#2871 [ghstack-poisoned]
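
As a rough sense of why the all-reduce is the more expensive option, the sketch below uses the textbook ring-algorithm cost model, in which an all-reduce is a reduce-scatter followed by an all-gather. This is an assumption for illustration: the algorithm actually chosen by the communication backend may differ, and the function name and the tensor shape in the usage lines are hypothetical.

```python
def ring_bytes_per_rank(collective, tensor_bytes, world_size):
    """Bytes each rank sends under an idealized ring algorithm (assumed model)."""
    # One ring pass moves (N - 1) / N of the tensor through each rank.
    one_pass = (world_size - 1) / world_size * tensor_bytes
    if collective == "reduce_scatter":
        return one_pass
    if collective == "all_reduce":
        # Ring all-reduce = reduce-scatter pass + all-gather pass.
        return 2 * one_pass
    raise ValueError(f"unknown collective: {collective}")


# Hypothetical activation: batch 8, sequence 2048, hidden dim 4096, 2 bytes per element.
activation_bytes = 8 * 2048 * 4096 * 2
for name in ("all_reduce", "reduce_scatter"):
    print(name, ring_bytes_per_rank(name, activation_bytes, world_size=8))
```

Under this model the reduce-scatter sends half as many bytes per rank as the all-reduce, and the local chunk afterwards is no longer needed.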

awgu · 2024-06-29T15:36:03Z

Reminder to self: we should update the comment and remove the "# 3. Shard the first transformer block's inputs" step.

facebook-github-bot added the CLA Signed label May 7, 2024

wanchaol requested review from awgu and tianyu-l May 7, 2024 21:17

revert toml changes

0b9a3a7

wanchaol requested a review from bdhirsh May 7, 2024 21:18

tianyu-l approved these changes May 8, 2024

wanchaol merged commit f5a3ad7 into main May 8, 2024

tianyu-l mentioned this pull request May 16, 2024

[Tensor Parallel] update examples to simplify embedding + first transformer block pytorch/examples#1259

Merged

tianyu-l mentioned this pull request Jun 25, 2024

remove folding and unfolding of batch dim and sequence dim in model.py #190

Merged
