
During multi-GPU training, will each card execute the text embedding caching operation once? #4089


Description

@pixeli99
text_encoders = [text_encoder_one, text_encoder_two]
tokenizers = [tokenizer_one, tokenizer_two]
train_dataset = get_train_dataset(args, accelerator)

# Pre-compute the prompt embeddings for the whole dataset before training starts.
compute_embeddings_fn = functools.partial(
    compute_embeddings,
    text_encoders=text_encoders,
    tokenizers=tokenizers,
    proportion_empty_prompts=args.proportion_empty_prompts,
)

# The main process runs the map first; the other ranks enter the block afterwards.
with accelerator.main_process_first():
    train_dataset = train_dataset.map(compute_embeddings_fn, batched=True)

During training, each card appears to execute the code above, which takes up a lot of disk space. Is this expected behavior, or have I misunderstood something? Currently, training on fill50k with 8 cards requires 15 GB of storage per card, i.e. about 15 GB * 8 in total.
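For context, here is a minimal, self-contained sketch of how I understand `accelerator.main_process_first()` is supposed to interact with the `datasets` map cache. The dataset and the `add_length` function below are placeholders, not from the training script; the sketch assumes all ranks share one filesystem and the same datasets cache directory.

import functools

from accelerate import Accelerator
from datasets import Dataset

accelerator = Accelerator()

def add_length(batch):
    # Stand-in for compute_embeddings_fn: any deterministic batched transform.
    return {"length": [len(t) for t in batch["text"]]}

dataset = Dataset.from_dict({"text": ["a", "bb", "ccc"]})

with accelerator.main_process_first():
    # Rank 0 enters first, runs map, and writes the Arrow cache file.
    # The other ranks enter afterwards; if the fingerprint matches, map
    # should load that cache instead of recomputing and writing its own copy.
    dataset = dataset.map(add_length, batched=True)

If that understanding is right, rank 0 should write the cache once and the other 7 ranks should reuse it, so I would not expect 8 separate copies of the embeddings on disk.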
