[wwb] Add text embeddings pipeline #2787

sbalandi · 2025-10-03T00:43:29Z

Description

Added the ability to run a text embeddings pipeline via wwb, along with some model-specific logic for Qwen/Qwen3-Embedding-0.6B.
Similarity is calculated with torch.nn.functional.cosine_similarity.
Embeddings are saved to a separate folder, one .npy file per generation.
The options --embeds_pooling_type, --embeds_normalize, and --embeds_padding_side were added.
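
For readers unfamiliar with the metric, here is a minimal, dependency-free sketch of what a cosine similarity check between a reference embedding and a candidate embedding computes, and what L2 normalization (the effect one would expect from --embeds_normalize) does to a vector. This is not the PR's code: the function names `l2_normalize` and `cosine_similarity` and the sample vectors are illustrative assumptions, and the epsilon handling may differ slightly from torch.nn.functional.cosine_similarity.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (hypothetical helper, not from the PR)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def cosine_similarity(a, b, eps=1e-8):
    """Cosine of the angle between two equal-length vectors.

    Assumption: the denominator is clamped with eps to avoid division by
    zero; the exact clamping in torch may differ.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


# Illustrative vectors only; real embeddings have hundreds of dimensions.
reference = [1.0, 2.0, 2.0]
candidate = [2.0, 4.0, 4.0]    # same direction as reference, different length
orthogonal = [2.0, -1.0, 0.0]  # dot product with reference is zero

same_direction_score = cosine_similarity(reference, candidate)
orthogonal_score = cosine_similarity(reference, orthogonal)
unit = l2_normalize([3.0, 4.0])
```

Because cosine similarity ignores vector length, normalizing both embeddings first does not change the score; normalization mainly matters when the stored vectors are later compared with a plain dot product.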

Example run for BAAI/bge-small-en-v1.5:
wwb.py --base-model BAAI/bge-small-en-v1.5 --model-type text-embedding --gt-data gt_embedds.csv -v --output ./output_embeds/ --embeds_pooling_type mean --embeds_normalize --embeds_padding_side left

Example run for Qwen/Qwen3-Embedding-0.6B (--embeds_pooling_type last_token is important):
wwb.py --base-model Qwen/Qwen3-Embedding-0.6B --model-type text-embedding --gt-data gt_embedds.csv -v --output ./output_embeds/ --embeds_pooling_type last_token --embeds_normalize --embeds_padding_side left
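
To illustrate why the pooling type and padding side interact, here is a small sketch of mean pooling versus last-token pooling over one padded sequence. It is an assumption-laden illustration in plain Python, not the code from this PR: `mean_pool`, `last_token_pool`, and the toy hidden states are hypothetical, and the left-padding check follows the general pattern used for last-token embedding models rather than any confirmed wwb implementation.

```python
def mean_pool(hidden, mask):
    """Average the token vectors whose attention mask entry is 1."""
    kept = [vec for vec, m in zip(hidden, mask) if m]
    if not kept:
        raise ValueError("sequence has no real tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def last_token_pool(hidden, mask):
    """Return the vector of the last real (non-padding) token."""
    if mask[-1]:
        # Left padding: the final position always holds a real token.
        return list(hidden[-1])
    # Right padding: the last real token sits at index sum(mask) - 1.
    last = sum(mask) - 1
    if last < 0:
        raise ValueError("sequence has no real tokens")
    return list(hidden[last])


# Toy per-token hidden states (2 dimensions) and an all-zero padding vector.
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
pad = [0.0, 0.0]

left_hidden, left_mask = [pad] + tokens, [0, 1, 1, 1]
right_hidden, right_mask = tokens + [pad], [1, 1, 1, 0]

left_last = last_token_pool(left_hidden, left_mask)
right_last = last_token_pool(right_hidden, right_mask)
left_mean = mean_pool(left_hidden, left_mask)
naive_right_last = right_hidden[-1]  # picks the padding vector by mistake
```

With left padding the last real token is simply the final position, which is why that combination is the convenient one for last-token pooling; with right padding, a naive `hidden[-1]` would return a padding vector unless the attention mask is consulted.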

Ticket: CVS-173900

Checklist:

Tests have been updated or added to cover the new code
This patch fully addresses the ticket.
I have made corresponding changes to the documentation

sbalandi requested a review from apaniukov October 3, 2025 00:43

github-actions bot added the category: WWB label (PR changes WWB) Oct 3, 2025

[wwb] Add text embeddings pipeline

899f1af

sbalandi force-pushed the qwen3_wwb branch from b6274d7 to 899f1af Compare October 3, 2025 00:46

update

26c3943

sbalandi requested a review from as-suvorov October 3, 2025 19:08
