Conversation

@jaideepr97
Contributor

@jaideepr97 jaideepr97 commented Nov 8, 2025

What does this PR do?

Adds support for enforcing tool usage via the responses API. See the official documentation at https://platform.openai.com/docs/api-reference/responses/create#responses_create-tool_choice for details.
Note: at present, this PR only supports `file_search` and `web_search` as options for enforcing built-in tool usage.

Closes #3548
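
To illustrate the request shape this feature targets, here is a hypothetical sketch that builds a responses-API payload forcing one of the supported built-in tools. The helper name and validation are illustrative, not part of this PR; note that `file_search` in the real API typically also needs `vector_store_ids` in the tool entry, omitted here for brevity.

```python
# Hypothetical sketch: build a responses.create payload that forces a
# built-in tool via tool_choice. Only file_search and web_search are
# supported by this PR, so anything else is rejected here.
def build_forced_tool_request(model: str, prompt: str, tool_type: str) -> dict:
    """Return a request body whose tool_choice forces a built-in tool."""
    if tool_type not in ("file_search", "web_search"):
        raise ValueError(f"unsupported built-in tool: {tool_type}")
    return {
        "model": model,
        "input": prompt,
        # the tool must also be listed in tools for the choice to be valid
        "tools": [{"type": tool_type}],
        # object-form tool_choice forces this specific built-in tool
        "tool_choice": {"type": tool_type},
    }
```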

Test Plan

./scripts/unit-tests.sh tests/unit/providers/agents/meta_reference/test_response_tool_context.py

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Nov 8, 2025
@github-actions
Contributor

github-actions bot commented Nov 8, 2025

✱ Stainless preview builds

This PR will update the llama-stack-client SDKs with the following commit message.

feat: add support for tool_choice to repsponses api


⚠️ llama-stack-client-node studio · code · diff

There was a regression in your SDK.
generate ⚠️ · build ✅ · lint ✅ · test ✅

npm install https://pkg.stainless.com/s/llama-stack-client-node/581b2bfc85cd9218b7476171de1c5031e0b7d8a5/dist.tar.gz
New diagnostics (5 warning, 7 note)

⚠️ Python/DuplicateDeclaration: We generated two separated types under the same name: `InputOpenAIResponseMessageOutput`. If they are the referring to the same type, they should be extracted to the same ref and be declared as a model. Otherwise, they should be renamed with `x-stainless-naming`
⚠️ Python/DuplicateDeclaration: We generated two separated types under the same name: `InputListOpenAIResponseMessageUnionOpenAIResponseInputFunctionToolCallOutputOpenAIResponseMessageInput`. If they are the referring to the same type, they should be extracted to the same ref and be declared as a model. Otherwise, they should be renamed with `x-stainless-naming`
⚠️ Python/DuplicateDeclaration: We generated two separated types under the same name: `DataOpenAIResponseMessageOutput`. If they are the referring to the same type, they should be extracted to the same ref and be declared as a model. Otherwise, they should be renamed with `x-stainless-naming`
⚠️ Python/NameNotAllowed: Encountered response property `model_type` which may conflict with Pydantic properties.

Pydantic uses model_ as a protected namespace that shouldn't be used for attributes of our own API's models.
Please rename it using the 'renameValue' transform.

⚠️ Python/NameNotAllowed: Encountered response property `model_type` which may conflict with Pydantic properties.

Pydantic uses model_ as a protected namespace that shouldn't be used for attributes of our own API's models.
Please rename it using the 'renameValue' transform.

💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceAllowedTools` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceFileSearch` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceWebSearch` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceFunctionTool` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceMCPTool` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
llama-stack-client-kotlin studio · code · diff

Your SDK built successfully.
generate ⚠️ · lint ✅ · test ❗

New diagnostics (10 note)
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceAllowedTools` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceFileSearch` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceWebSearch` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceFunctionTool` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceMCPTool` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceCustomTool` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Go/SchemaUnionDiscriminatorMissing: This union schema has more than one object variant, but no [`discriminator`](https://www.stainless.com/docs/reference/openapi-support#discriminator) property, so deserializing the union may be inefficient or ambiguous.
💡 Java/SchemaUnionDiscriminatorMissing: This union schema has more than one object variant, but no [`discriminator`](https://www.stainless.com/docs/reference/openapi-support#discriminator) property, so deserializing the union may be inefficient or ambiguous.
💡 Java/SchemaUnionDiscriminatorMissing: This union schema has more than one object variant, but no [`discriminator`](https://www.stainless.com/docs/reference/openapi-support#discriminator) property, so deserializing the union may be inefficient or ambiguous.
💡 Java/SchemaUnionDiscriminatorMissing: This union schema has more than one object variant, but no [`discriminator`](https://www.stainless.com/docs/reference/openapi-support#discriminator) property, so deserializing the union may be inefficient or ambiguous.
llama-stack-client-go studio · code · diff

Your SDK built successfully.
generate ⚠️ · lint ❗ · test ❗

go get github.com/stainless-sdks/llama-stack-client-go@56702f16003886e559b06f15b1e1ef7e64dce679
New diagnostics (7 note)
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceAllowedTools` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceFileSearch` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceWebSearch` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceFunctionTool` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceMCPTool` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceCustomTool` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Go/SchemaUnionDiscriminatorMissing: This union schema has more than one object variant, but no [`discriminator`](https://www.stainless.com/docs/reference/openapi-support#discriminator) property, so deserializing the union may be inefficient or ambiguous.
llama-stack-client-python studio · code · diff

Your SDK built successfully.
generate ⚠️ · build ⏳ · lint ⏳ · test ⏳

New diagnostics (7 note)
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceAllowedTools` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceFileSearch` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceWebSearch` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceFunctionTool` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceMCPTool` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Model/Recommended: `#/components/schemas/OpenAIResponseInputToolChoiceCustomTool` could potentially be defined as a [model](https://www.stainless.com/docs/guides/configure#models) within `#/resources/responses`.
💡 Go/SchemaUnionDiscriminatorMissing: This union schema has more than one object variant, but no [`discriminator`](https://www.stainless.com/docs/reference/openapi-support#discriminator) property, so deserializing the union may be inefficient or ambiguous.

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
Last updated: 2025-12-02 14:49:15 UTC

@jaideepr97 force-pushed the tool-choice branch 4 times, most recently from 9bab29b to 55bd671, on November 8, 2025 09:10
@jaideepr97 changed the title from "feat: add support for tool_choice to repsponses api" to "feat: add support for tool_choice to responses api" on Nov 8, 2025
@jaideepr97 force-pushed the tool-choice branch 2 times, most recently from 3fd6509 to a7e1132, on November 10, 2025 16:21
@jaideepr97 marked this pull request as ready for review on November 10, 2025 19:47
Contributor

@ashwinb ashwinb left a comment


A bunch of inline comments. Thanks for this PR!

@mergify

mergify bot commented Nov 18, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @jaideepr97 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@jaideepr97 jaideepr97 force-pushed the tool-choice branch 5 times, most recently from 6de7dc7 to 9de2b1f Compare November 22, 2025 13:59
@jaideepr97
Contributor Author

jaideepr97 commented Nov 22, 2025

Added unit tests, but removed the integration test from this PR for now since it requires client changes to pass. I'm guessing we will need to merge this PR, update the client, and then add integration tests in a follow-up PR.

cc @ashwinb lmk if there is a different way to proceed here

@jaideepr97
Contributor Author

Through some anecdotal testing I've been able to reproduce the same output when running queries that specify tool_choice, both against OpenAI directly and when routing through Llama Stack.
Also tested against a locally hosted Qwen3 model.

Collaborator

@cdoern cdoern left a comment


A few questions/comments. Looking good overall!

response_format: OpenAIResponseFormatParam
tool_context: ToolContext | None
responses_tool_choice: OpenAIResponseInputToolChoice | None = None
chat_tool_choice: str | dict[str, Any] | None = None
Collaborator


Lots of different types here in this union; is this going to be hard to enforce?

Contributor Author

@jaideepr97 jaideepr97 Nov 24, 2025


aren't we enforcing the type check by setting this union?

Contributor


I think if we just did not have chat_tool_choice here in this struct and let it be a local in the working loop, it might be clearer? then you can even make responses_tool_choice be just tool_choice?

Contributor Author


ack, updated accordingly

@jaideepr97 force-pushed the tool-choice branch 4 times, most recently from 1e02307 to 6fe37f7, on December 1, 2025 14:26
@jaideepr97 force-pushed the tool-choice branch 2 times, most recently from 13756c8 to 811e573, on December 1, 2025 19:11
@jaideepr97
Contributor Author

@ashwinb would you have bandwidth to give this a second look?

@ashwinb
Contributor

ashwinb commented Dec 1, 2025

Will review in detail soon. One quick comment: could you update the PR summary and remove the "Closes: " bit. We should close the issue only once we have landed client types and added integration tests.

)
# chat_tool_choice can be str, dict-like object, or None
if isinstance(chat_tool_choice, str | type(None)):
self.ctx.chat_tool_choice = chat_tool_choice
Contributor


hm, why is this mutation to ctx necessary? in general, the "ctx" should be considered an immutable thing which is just a bag of parameters computed initially before hitting the main processing loop

Contributor Author


ack, not updating the ctx anymore
Though again, this was done following the example of other fields like chat_tools, which are also modified and stored in the ctx earlier during the same processing loop.

break

n_iter += 1
# After first iteration, reset tool_choice to "auto" to let model decide freely
Contributor


this feels like a very model-specific thing baked deeply into the implementation with the API or documentation not making any note of it. does OpenAI talk about it, for example? I don't think we should do this at all since it is very surprising behavior

Contributor Author


Yeah, to be honest this particular fix came from claude-4.5-thinking; I wouldn't have come up with it myself.

Prior to this change, when I queried an OpenAI model via Llama Stack with tool_choice enforced, I was not getting usable results: it ended up in an infinite loop of calling the same function over and over. After this fix, I got the same quality of results through Llama Stack as when querying OpenAI directly, so it seems like an important fix for maintaining parity. I think it's important that Llama Stack not introduce any performance deterioration when a user routes OpenAI queries through it. Having this fix in didn't seem to affect the results I saw when testing against Qwen either, though that was by no means an exhaustive test.

I also understand your concern regarding this, so I'm not sure how to proceed here.

Contributor Author

@jaideepr97 jaideepr97 Dec 2, 2025


For reference:

Here is a tool_choice query I'm making against gpt-4o:

respB = client.responses.create(
    model=args.model,
    tools=[
        {
            "type": "mcp",
            "server_label": MCP_LABEL,
            "server_url": MCP_SERVER_URL,
            "require_approval": "never",
        }
    ],
    tool_choice={
        "type": "mcp",
        "server_label": MCP_LABEL,
        "name": "namespaces_list",
    },
    input=[
        {
            "role": "user",
            "content": (
                "List what kubernetes MCP tools you are allowed to use in this context. "
                "Tell me something about the cluster. Try to call only the MCP tools that "
                "you have access to, and tell me which tools you called. If none are "
                "available, explain why."
            ),
        }
    ],
)

pretty_print_result("B: no restriction at the MCP tool (server) level, tool choice is mcp with server label and tool name", respB)

Response without this fix:

=== B: no restriction at the MCP tool (server) level, tool choice is mcp with server label and tool name ===
Output text:
 
mcp_call[1]: kubernetes :: namespaces_list :: {}
mcp_call[2]: kubernetes :: namespaces_list :: {}
mcp_call[3]: kubernetes :: namespaces_list :: {}
mcp_call[4]: kubernetes :: namespaces_list :: {}
mcp_call[5]: kubernetes :: namespaces_list :: {}
mcp_call[6]: kubernetes :: namespaces_list :: {}
mcp_call[7]: kubernetes :: namespaces_list :: {}
mcp_call[8]: kubernetes :: namespaces_list :: {}
mcp_call[9]: kubernetes :: namespaces_list :: {}
mcp_call[10]: kubernetes :: namespaces_list :: {}

Response with this fix:

=== B: no restriction at the MCP tool (server) level, tool choice is mcp with server label and tool name ===
Output text:
 Here is what I found about the Kubernetes cluster using the tools available:

### Tools Used
1. **Namespaces List**: This tool lists all the Kubernetes namespaces in the current cluster.
2. **Events List**: This tool lists all the Kubernetes events in the current cluster from all namespaces.

### Cluster Information

#### Namespaces
The cluster currently has the following namespaces:
- **default**: Active
- **kube-node-lease**: Active
- **kube-public**: Active
- **kube-system**: Active
- **local-path-storage**: Active

#### Events
Some of the notable events happening in the cluster include:
- **Node kind-control-plane**: 
  - Kubelet started successfully.
  - Warning about failed node allocatable limits update.
  - Node is ready with sufficient memory, no disk pressure, and sufficient PID.
  - Registered the node successfully.
  
- **Pod Events**:
  - Pods in the kube-system namespace like `coredns` and `kube-proxy` have been scheduled, container images pulled successfully, and started without issues.
  - There are some scheduling warnings due to untolerated taints.
  
- **Resource Management Events**:
  - Controller manager and scheduler leadership has been successfully maintained.

These observations reflect a running cluster with active namespace management and event tracking. This combination ensures efficient operations and identification of potential issues for administrative action.
mcp_call[1]: kubernetes :: namespaces_list :: {}
mcp_call[2]: kubernetes :: events_list :: {}

self.mcp_tool_to_server[t.name] = mcp_tool

# Add to reverse mapping for efficient server_label lookup
if mcp_tool.server_label not in self.server_label_to_tools:
Contributor


this is more of a lazy question: when does it happen that the initial dict is not sufficient to cover mcp_tool -- i.e., you are seeing a new tool during the loop? maybe I am just confused

Contributor Author


By "initial dict" are you maybe referring to mcp_tool_to_server? Because this line is populating server_label_to_tools, which is a separate reverse mapping that makes it easy to look up all the tools associated with a given server.

I added this construct because processing the MCP tool choice requires quickly looking up all tools associated with a given MCP server_label, and it seemed easier to build and maintain this reverse mapping as we see and process new tools rather than rebuild it on the fly from mcp_tool_to_server each time, which could get computationally expensive.
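
The forward/reverse mapping pair described above can be sketched as follows (an illustrative structure under assumed names; the PR's actual classes and fields may differ):

```python
from collections import defaultdict


# Illustrative sketch of the two mappings discussed above: the forward map
# (tool name -> server label) and the incrementally maintained reverse map
# (server label -> tool names), so resolving an MCP tool_choice that names
# only a server_label is a single dict lookup instead of a rebuild.
class ToolIndex:
    def __init__(self) -> None:
        self.mcp_tool_to_server: dict[str, str] = {}
        self.server_label_to_tools: dict[str, list[str]] = defaultdict(list)

    def register(self, tool_name: str, server_label: str) -> None:
        # update both mappings as each tool is seen during processing
        self.mcp_tool_to_server[tool_name] = server_label
        self.server_label_to_tools[server_label].append(tool_name)

    def tools_for_server(self, server_label: str) -> list[str]:
        # O(1) lookup of all tools exposed by one MCP server
        return self.server_label_to_tools.get(server_label, [])
```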

Contributor

@ashwinb ashwinb left a comment


This is looking much better, thank you for the iteration @jaideepr97!

Development

Successfully merging this pull request may close these issues.

Responses Tool Choice