This repository was archived by the owner on Jul 16, 2025. It is now read-only.

feat: Add Gemini Embeddings #347

Merged: 1 commit, Jun 27, 2025

Conversation

@valtzu valtzu commented Jun 26, 2025

Add Embeddings support for Google Gemini.

Only batch embedding implemented.

Related to #28

@valtzu valtzu force-pushed the gemini-embeddings branch from 0ab04bb to 46306d7 Compare June 26, 2025 19:38
Contributor Author

valtzu commented Jun 26, 2025

Moved the extra parameters to the model's $options, since they are mostly model-specific.

I also noticed that we probably need to parameterize the Stores and pass $dimensions there, because using the fixed 768-dimension models with the 1536-dimension vector column in the MariaDB store fails with 👇

Incorrect vector value: 'I\x16\xD7\xBC\x0F\xDB,\xBC\xFF\x0F\xA5<\x8C\xFE\x95\xBD~#^\xBC;\x84'\xBC\x11\xFD\x00<[\x82\x15<\xD5\x9A\x0A="\x12R;\x0A\x10~<...' for column `my_database`.`my_table`.`embedding`

But that'll of course be a separate PR.
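
The mismatch described above could also be caught before the insert reaches the database. A minimal sketch, assuming a hypothetical helper `assertDimensions()` (not part of llm-chain) and assuming the vector column was created for 1536 dimensions:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical guard, not part of llm-chain: fail early when an embedding
 * does not have the number of dimensions the vector column was created with.
 *
 * @param list<float> $vector
 */
function assertDimensions(array $vector, int $expected): void
{
    $actual = count($vector);

    if ($actual !== $expected) {
        throw new InvalidArgumentException(sprintf(
            'Vector has %d dimensions, but the store expects %d.',
            $actual,
            $expected
        ));
    }
}

// A 768-dimension embedding checked against a column assumed to hold 1536 dimensions.
$embedding = array_fill(0, 768, 0.0);

try {
    assertDimensions($embedding, 1536);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

A check like this only produces a readable error; it does not remove the need to make the store's dimensions configurable.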

public function __construct(string $name = self::GEMINI_EMBEDDING_EXP_03_07, array $options = [])
{
    if (self::GEMINI_EMBEDDING_EXP_03_07 === $name) {
        $options['dimensions'] ??= 1536;
Contributor

I am not a big fan of rebuilding the api validation, if it changes we need to keep up.

What happens if you send a wrong dimensions value?

Contributor Author

This is not really about API validation. The Store implementations (at least MariaDB) currently have hardcoded dimensions and therefore do not work with the model's default settings.

This can probably be removed once the MariaDB store accepts dimensions in the $options of initialize()
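
For readers unfamiliar with `??=`, here is a minimal sketch of the defaulting behaviour in the hunk above. The function name `withDefaultDimensions()` and the constant value are hypothetical stand-ins, since the real constant value is not shown in this thread:

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the model name constant used in the PR.
const EXPERIMENTAL_MODEL = 'experimental-model';

/**
 * Applies the 1536-dimension default only for the experimental model,
 * and only when the caller has not chosen a value.
 *
 * @param array<string, mixed> $options
 *
 * @return array<string, mixed>
 */
function withDefaultDimensions(string $name, array $options): array
{
    if (EXPERIMENTAL_MODEL === $name) {
        // ??= assigns only when the key is missing or null.
        $options['dimensions'] ??= 1536;
    }

    return $options;
}

var_dump(withDefaultDimensions(EXPERIMENTAL_MODEL, []));
var_dump(withDefaultDimensions(EXPERIMENTAL_MODEL, ['dimensions' => 768]));
```

An explicit value from the caller is never overwritten, which is why this is a default rather than a validation of the API's accepted values.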

Contributor

Let's merge the options PR first then 👍🏻

Contributor Author

☝️ #348

> I am not a big fan of rebuilding the api validation

Should I remove TaskType enum then also?

Contributor

To me the enum is ok, WDYT @chr-hertel ?

Member

Yeah, I also like the idea of providing that; nice for DX.
We should just make sure not to block users when Google releases new task types.
Does the implementation still support a plain string for that option instead of an enum?
With HuggingFace I went with an interface instead:
https://github.com/php-llm/llm-chain/blob/main/src/Platform/Bridge/HuggingFace/Task.php

Contributor Author

Ok, thanks – I kept the enum around – and yes, string is accepted too so we're not restricting future model/parameter usage.

Also rebased after #348, I think this PR is good to go now 🚀
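
A minimal sketch of the "enum or plain string" idea discussed above, assuming PHP 8.1+ backed enums. `ExampleTaskType` and `taskTypeValue()` are hypothetical names for illustration, not the PR's actual code, and only two cases are shown:

```php
<?php

declare(strict_types=1);

// Hypothetical enum for illustration; the PR's actual TaskType cases are not shown in this thread.
enum ExampleTaskType: string
{
    case RetrievalQuery = 'RETRIEVAL_QUERY';
    case SemanticSimilarity = 'SEMANTIC_SIMILARITY';
}

/**
 * Accepts either an enum case (nice for DX) or a plain string,
 * so task types released later are not blocked by the library.
 */
function taskTypeValue(ExampleTaskType|string $taskType): string
{
    return $taskType instanceof ExampleTaskType ? $taskType->value : $taskType;
}

echo taskTypeValue(ExampleTaskType::RetrievalQuery), PHP_EOL;
echo taskTypeValue('SOME_FUTURE_TASK_TYPE'), PHP_EOL;
```

The string branch is passed through untouched, so no validation is duplicated on the client side.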

public const EMBEDDING_001 = 'embedding-001';

/**
 * @param array{dimensions?: int, task_type?: TaskType|string, title?: string} $options
Contributor

Do we use title somewhere? If not I would remove it from the array shape

Contributor Author

I would think it'd be useful for the user of this library, like in the example

Contributor Author

Ok I misread your comment, I thought it was about declaring params here with array shape in general

Yes, title is used. However, I think you are right that it does not belong in the model options, since a single model instance is probably intended for several documents

I'll take it from ModelClient::request()'s $options instead
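
A rough sketch of that split, with model-level settings coming from the model's options and `title` coming from the per-request options. The function `buildBatchPayload()` is hypothetical, and the payload field names (`requests`, `content.parts`, `taskType`, `title`, `outputDimensionality`) reflect my understanding of the Gemini `batchEmbedContents` request format, so treat them as assumptions rather than as the PR's implementation:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical payload builder, not the PR's code.
 *
 * @param list<string>         $texts
 * @param array<string, mixed> $modelOptions   e.g. dimensions, task_type (shared by all documents)
 * @param array<string, mixed> $requestOptions e.g. title (specific to this call)
 *
 * @return array{requests: list<array<string, mixed>>}
 */
function buildBatchPayload(string $modelName, array $texts, array $modelOptions, array $requestOptions): array
{
    $requests = [];

    foreach ($texts as $text) {
        // Drop unset options so only explicitly provided values are sent.
        $requests[] = array_filter([
            'model' => 'models/'.$modelName,
            'content' => ['parts' => [['text' => $text]]],
            'outputDimensionality' => $modelOptions['dimensions'] ?? null,
            'taskType' => $modelOptions['task_type'] ?? null,
            'title' => $requestOptions['title'] ?? null,
        ], static fn ($value): bool => null !== $value);
    }

    return ['requests' => $requests];
}

$payload = buildBatchPayload(
    'embedding-001',
    ['First paragraph', 'Second paragraph'],
    ['task_type' => 'RETRIEVAL_DOCUMENT'],
    ['title' => 'My document'],
);

echo json_encode($payload, JSON_PRETTY_PRINT), PHP_EOL;
```

Keeping `title` out of the model options means the same model instance can be reused for documents with different titles.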

@valtzu valtzu force-pushed the gemini-embeddings branch 2 times, most recently from 614fbc1 to a982687 Compare June 26, 2025 20:51
@valtzu valtzu force-pushed the gemini-embeddings branch from a982687 to 034da76 Compare June 27, 2025 15:09
@valtzu valtzu force-pushed the gemini-embeddings branch from 034da76 to e3eb6fc Compare June 27, 2025 15:10
Member

@chr-hertel chr-hertel left a comment

Thanks again @valtzu - great addition :)

@chr-hertel chr-hertel merged commit 74647ec into php-llm:main Jun 27, 2025
7 checks passed
OskarStark added a commit to symfony/ai that referenced this pull request Jul 1, 2025
This PR was merged into the main branch.

Discussion
----------

feat: add Gemini Embeddings

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | yes
| Docs?         |
| Issues        |
| License       | MIT

Cherry picking php-llm/llm-chain#347

> Add Embeddings support for Google Gemini.
>
> Only batch embedding implemented.
>
> Related to #28

Commits
-------

17e25d1 feat: add Gemini Embeddings (#347)

3 participants