
During multi-GPU training, will each card execute the text embedding caching operation once? #4089


Description

@pixeli99
text_encoders = [text_encoder_one, text_encoder_two]
tokenizers = [tokenizer_one, tokenizer_two]
train_dataset = get_train_dataset(args, accelerator)

# Pre-compute the prompt embeddings for the whole dataset before training starts.
compute_embeddings_fn = functools.partial(
    compute_embeddings,
    text_encoders=text_encoders,
    tokenizers=tokenizers,
    proportion_empty_prompts=args.proportion_empty_prompts,
)

# The main process runs the map first; the other ranks enter the block afterwards.
with accelerator.main_process_first():
    train_dataset = train_dataset.map(compute_embeddings_fn, batched=True)

During training, each card appears to execute the code above, which takes up a lot of disk space. Is this expected behavior, or have I misunderstood something? Currently, training on fill50k with 8 cards requires 15 GB of storage per card, i.e. about 15 GB * 8 in total.
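For context, here is a minimal, self-contained sketch of how I understand `accelerator.main_process_first()` is supposed to interact with the `datasets` map cache. The dataset and the `add_length` function below are placeholders, not from the training script; the sketch assumes all ranks share one filesystem and the same datasets cache directory.

import functools

from accelerate import Accelerator
from datasets import Dataset

accelerator = Accelerator()

def add_length(batch):
    # Stand-in for compute_embeddings_fn: any deterministic batched transform.
    return {"length": [len(t) for t in batch["text"]]}

dataset = Dataset.from_dict({"text": ["a", "bb", "ccc"]})

with accelerator.main_process_first():
    # Rank 0 enters first, runs map, and writes the Arrow cache file.
    # The other ranks enter afterwards; if the fingerprint matches, map
    # should load that cache instead of recomputing and writing its own copy.
    dataset = dataset.map(add_length, batched=True)

If that understanding is right, rank 0 should write the cache once and the other 7 ranks should reuse it, so I would not expect 8 separate copies of the embeddings on disk.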
