
@jiayin-nvidia (Contributor)

What does this PR do?

NVIDIA asymmetric embedding models (e.g., `nvidia/llama-3.2-nv-embedqa-1b-v2`) require an `input_type` parameter that is not part of the standard OpenAI embeddings API. This PR makes `input_type="query"` the default and updates the documentation to recommend the `embedding` API for passage embeddings.

Resolves #2892
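To show what the default means at the request level, here is a minimal sketch assuming a raw OpenAI-style `/v1/embeddings` JSON body sent to a NIM. The helper `build_embedding_request` is hypothetical and is not the Llama Stack adapter code; it only illustrates an `input_type` field being added with `"query"` as the fallback. The `"passage"` value is assumed from NVIDIA's query/passage convention for asymmetric retrieval models.

```python
from typing import Any, Optional


def build_embedding_request(
    model: str,
    inputs: list[str],
    input_type: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build an OpenAI-style embeddings body for a NIM."""
    payload: dict[str, Any] = {"model": model, "input": inputs}
    # The standard OpenAI API has no `input_type`, but asymmetric NIMs need it,
    # so fall back to "query" when the caller does not provide one.
    payload["input_type"] = input_type or "query"
    return payload


MODEL = "nvidia/llama-3.2-nv-embedqa-1b-v2"
query_req = build_embedding_request(MODEL, ["What is Llama Stack?"])
passage_req = build_embedding_request(
    MODEL,
    ["Llama Stack provides building blocks for AI applications."],
    input_type="passage",  # assumed value for document-side embeddings
)
print(query_req["input_type"], passage_req["input_type"])
```

With the OpenAI Python SDK, a non-standard field like this can be sent through the `extra_body` argument of `client.embeddings.create(...)`, because it is not one of the SDK's typed parameters.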

Test Plan

pytest -s -v tests/integration/inference/test_openai_embeddings.py --stack-config="inference=nvidia" --embedding-model="nvidia/llama-3.2-nv-embedqa-1b-v2" --env NVIDIA_API_KEY={nvidia_api_key} --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"

@meta-cla bot added the CLA Signed label Aug 19, 2025
@jiayin-nvidia changed the title from "fix: openai embedding incompatibility for asymmetric embedding NIMs" to "fix: fix openai_embedding for asymmetric embedding NIMs" Aug 19, 2025
@jiayin-nvidia changed the title from "fix: fix openai_embedding for asymmetric embedding NIMs" to "fix: fix openai_embeddings for asymmetric embedding NIMs" Aug 19, 2025
@mattf merged commit 55e9959 into llamastack:main Aug 20, 2025 (23 checks passed)
franciscojavierarceo added a commit to franciscojavierarceo/llama-stack that referenced this pull request Aug 21, 2025
Signed-off-by: Francisco Javier Arceo <[email protected]>

chore: Enable keyword search for Milvus inline (llamastack#3073)

With milvus-io/milvus-lite#294, Milvus Lite supports keyword search using BM25. When keyword search was introduced, we explicitly disabled it for inline Milvus. This PR removes that check and enables `inline::milvus` for tests.
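To make "keyword search using BM25" concrete, the following is a textbook Okapi BM25 scorer in plain Python. It is not Milvus Lite's implementation: the whitespace tokenizer, the smoothed IDF variant, and the parameters `k1=1.5` and `b=0.75` are common defaults assumed here, and `bm25_scores` is a hypothetical name.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = query.lower().split()
    scores: list[float] = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            freq = tf[term]
            if freq == 0:
                continue
            # Smoothed IDF: the "+1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # k1 saturates term frequency; b controls document-length normalization.
            denom = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "milvus lite supports keyword search",
    "vector search uses embeddings",
    "hybrid search combines keyword and vector search",
]
scores = bm25_scores("keyword search", docs)
print([round(s, 3) for s in scores])
```

A document that shares no terms with the query scores 0 under BM25, whereas embedding-based vector search can still match it by meaning; combining the two is the idea behind a hybrid search mode.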


Run llama stack with `inline::milvus` enabled:

```
pytest tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes --stack-config=http://localhost:8321 --embedding-model=all-MiniLM-L6-v2 -v
```

```
INFO     2025-08-07 17:06:20,932 tests.integration.conftest:64 tests: Setting DISABLE_CODE_SANDBOX=1 for macOS
=========================================================================================== test session starts ============================================================================================
platform darwin -- Python 3.12.11, pytest-7.4.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'macOS-14.7.6-arm64-arm-64bit', 'Packages': {'pytest': '7.4.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.23.8', 'cov': '6.0.0', 'timeout': '2.2.0', 'socket': '0.7.0', 'html': '3.1.1', 'langsmith': '0.3.39', 'anyio': '4.8.0', 'metadata': '3.0.0'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: asyncio-0.23.8, cov-6.0.0, timeout-2.2.0, socket-0.7.0, html-3.1.1, langsmith-0.3.39, anyio-4.8.0, metadata-3.0.0
asyncio: mode=Mode.AUTO
collected 3 items

tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-vector] PASSED                                                   [ 33%]
tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-keyword] PASSED                                                  [ 66%]
tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-hybrid] PASSED                                                   [100%]

============================================================================================ 3 passed in 4.75s =============================================================================================
```

Signed-off-by: Varsha Prasad Narsing <[email protected]>
Co-authored-by: Francisco Arceo <[email protected]>

chore: Fixup main pre commit (llamastack#3204)

build: Bump version to 0.2.18

chore: Faster npm pre-commit (llamastack#3206)

Adds npm to the pre-commit.yml installation and caches the UI.
Removes the Node installation step from pre-commit.


Signed-off-by: Francisco Javier Arceo <[email protected]>

checking in for tonight, WIP moving to the Agents API

Signed-off-by: Francisco Javier Arceo <[email protected]>

remove log

Signed-off-by: Francisco Javier Arceo <[email protected]>

updated

Signed-off-by: Francisco Javier Arceo <[email protected]>

fix: disable ui-prettier & ui-eslint (llamastack#3207)

chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (llamastack#3061)

This PR adds a pre-commit step that enforces use of the `llama_stack` logger.

Currently, different parts of the codebase use different loggers. Since a custom `llama_stack` logger exists and is already used in the codebase, it is better to standardize on it.
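A hook of this kind can be sketched as a small Python checker. The regex, the function names `find_violations` and `main`, and the message text are hypothetical; the hook actually added in llamastack#3061 may match different patterns or exempt certain files.

```python
import re
from pathlib import Path

# Hypothetical rule: direct calls to the stdlib logger factory are disallowed,
# steering contributors toward the project's own llama_stack logger.
FORBIDDEN = re.compile(r"\blogging\.getLogger\s*\(")
MESSAGE = "use the llama_stack logger instead of logging.getLogger"


def find_violations(path: str, source: str) -> list[str]:
    """Return one 'path:line: message' string per offending line."""
    return [
        f"{path}:{lineno}: {MESSAGE}"
        for lineno, line in enumerate(source.splitlines(), start=1)
        if FORBIDDEN.search(line)
    ]


def main(paths: list[str]) -> int:
    """Check each file and return a non-zero status if any violation is found."""
    violations: list[str] = []
    for path in paths:
        violations.extend(find_violations(path, Path(path).read_text(encoding="utf-8")))
    for violation in violations:
        print(violation)
    return 1 if violations else 0
```

pre-commit passes the matched filenames as command-line arguments and treats a non-zero exit status as a failed hook, so a script entry point would call `sys.exit(main(sys.argv[1:]))`.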

Signed-off-by: Mustafa Elbehery <[email protected]>
Co-authored-by: Matthew Farrellee <[email protected]>

fix: fix `openai_embeddings` for asymmetric embedding NIMs (llamastack#3205)

NVIDIA asymmetric embedding models (e.g.,
`nvidia/llama-3.2-nv-embedqa-1b-v2`) require an `input_type` parameter
that is not part of the standard OpenAI embeddings API. This PR makes
`input_type="query"` the default and updates the documentation to
recommend the `embedding` API for passage embeddings.

Resolves llamastack#2892

```
pytest -s -v tests/integration/inference/test_openai_embeddings.py --stack-config="inference=nvidia" --embedding-model="nvidia/llama-3.2-nv-embedqa-1b-v2" --env NVIDIA_API_KEY={nvidia_api_key} --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```

cleaning up

Signed-off-by: Francisco Javier Arceo <[email protected]>

updating session manager to cache messages locally

Signed-off-by: Francisco Javier Arceo <[email protected]>

fix linter

Signed-off-by: Francisco Javier Arceo <[email protected]>

more cleanup

Signed-off-by: Francisco Javier Arceo <[email protected]>

Labels

CLA Signed

Development

Successfully merging this pull request may close these issues.

openai_embeddings does not support asymmetric embedding models for NVIDIA NIMs