-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Flatten allOf properties for OpenAI compatibility #3451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
d7e0e18 to
47707cd
Compare
@pulphix Can you please share examples of schemas that currently result in API errors, that this PR fixes? If this only applies to OpenAI-"compatible" APIs and not OpenAI itself, we'll need to make sure it only applies to those APIs. |
|
|
||
| def transform(self, schema: JsonSchema) -> JsonSchema: | ||
| # Flatten object-only allOf to improve compatibility | ||
| schema = flatten_allof(schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not touch the Google transformer as we have a lot of changes underway in #3357.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed Change
|
|
||
| def transform(self, schema: JsonSchema) -> JsonSchema: # noqa C901 | ||
| # First, flatten object-only allOf constructs to avoid unsupported combinators in strict mode | ||
| schema = flatten_allof(schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be implemented on JsonSchemaTransformer with a keyword argument we can pass in __init__, similar to existing transformation flags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
tests/models/test_openai.py
Outdated
| transformer = OpenAIJsonSchemaTransformer(schema, strict=True) | ||
|
|
||
| # Mock super().walk() to return a result without $defs to test the fallback path | ||
| with patch.object(transformer.__class__.__bases__[0], 'walk', return_value={'$ref': '#/$defs/MyModel'}): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is a bit unclear to me and transformer.__class__.__bases__[0] feels a bit brittle. What scenario are we testing exactly? Can't we get there without patching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need a separate test file here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| ], | ||
| } | ||
|
|
||
| transformer = transformer_factory(schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unnecessary as we only test one transformer :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| transformed = transformer.walk() | ||
|
|
||
| # allOf should have been flattened by the transformer | ||
| assert 'allOf' not in transformed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use == snapshot() as much as possible so we get to see the exact result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
tests/test_json_schema_flattener.py
Outdated
| ], | ||
| } | ||
|
|
||
| flattened = flatten_allof(copy.deepcopy(schema)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to not use the function directly, only the transformer classes.
And as mentioned above: please use a single snapshot() assertion (using the inline-snapshot library) rather than many asserts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
I'm trying to integrate with notion MCP and getting the following error: {
"type": "function",
"function": {
"name": "notion__notion-update-page",
"description": "## Overview\nUpdate a Notion page's properties or content.\n## Properties\nNotion page properties are a JSON map of property names to SQLite values.\nFor pages in a database:\n- ALWAYS use the \"fetch\" tool first to get the data source schema and the\texact property names.\n- Provide a non-null value to update a property's value.\n- Omitted properties are left unchanged.\n\n**IMPORTANT**: Some property types require expanded formats:\n- Date properties: Split into \"date:{property}:start\", \"date:{property}:end\" (optional), and \"date:{property}:is_datetime\" (0 or 1)\n- Place properties: Split into \"place:{property}:name\", \"place:{property}:address\", \"place:{property}:latitude\", \"place:{property}:longitude\", and \"place:{property}:google_place_id\" (optional)\n- Number properties: Use JavaScript numbers (not strings)\n- Checkbox properties: Use \"__YES__\" for checked, \"__NO__\" for unchecked\n\n**Special property naming**: Properties named \"id\" or \"url\" (case insensitive) must be prefixed with \"userDefined:\" (e.g., \"userDefined:URL\", \"userDefined:id\")\nFor pages outside of a database:\n- The only allowed property is \"title\",\twhich is the title of the page in inline markdown format.\n\n## Content\nNotion page content is a string in Notion-flavored Markdown format. See the \"create-pages\" tool description for the full enhanced Markdown spec.\nBefore updating a page's content with this tool, use the \"fetch\" tool first to get the existing content to find out the Markdown snippets to use in the \"replace_content_range\" or \"insert_content_after\" commands.\n## Examples\n\t\t<example description=\"Update page properties\">\n\t\t{\n\t\t\t\"page_id\": \"f336d0bc-b841-465b-8045-024475c079dd\",\n\t\t\t\"command\": \"update_properties\",\n\t\t\t\"properties\": {\n\t\t\t\t\"title\": \"New Page Title\",\n\t\t\t\t\"status\": \"In Progress\",\n\t\t\t\t\"priority\": 5,\n\t\t\t\t\"checkbox\": \"__YES__\",\n\t\t\t\t\"date:deadline:start\": \"2024-12-25\",\n\t\t\t\t\"date:deadline:is_datetime\": 0,\n\t\t\t\t\"place:office:name\": \"HQ\",\n\t\t\t\t\"place:office:latitude\": 37.7749,\n\t\t\t\t\"place:office:longitude\": -122.4194\n\t\t\t}\n\t\t}\n\t\t</example>\n\t\t<example description=\"Replace the entire content of a page\">\n\t\t{\n\t\t\t\"page_id\": \"f336d0bc-b841-465b-8045-024475c079dd\",\n\t\t\t\"command\": \"replace_content\",\n\t\t\t\"new_str\": \"# New Section\nUpdated content goes here\"\n\t\t}\n\t\t</example>\n\t\t<example description=\"Replace specific content in a page\">\n\t\t{\n\t\t\t\"page_id\": \"f336d0bc-b841-465b-8045-024475c079dd\",\n\t\t\t\"command\": \"replace_content_range\",\n\t\t\t\"selection_with_ellipsis\": \"# Old Section...end of section\",\n\t\t\t\"new_str\": \"# New Section\nUpdated content goes here\"\n\t\t}\n\t\t</example>\n\t\t<example description=\"Insert content after specific text\">\n\t\t{\n\t\t\t\"page_id\": \"f336d0bc-b841-465b-8045-024475c079dd\",\n\t\t\t\"command\": \"insert_content_after\",\n\t\t\t\"selection_with_ellipsis\": \"## Previous section...\",\n\t\t\t\"new_str\": \"\n## New Section\nContent to insert goes here\"\n\t\t}\n\t\t</example>\n**Note**: For selection_with_ellipsis, provide only the first ~10 characters, an ellipsis, and the last ~10 characters. Ensure the selection is unique; use longer snippets if needed to avoid ambiguity.",
"parameters": {
"type": "object",
"properties": {
"data": {
"allOf": [
{
"type": "object",
"properties": {
"page_id": {
"type": "string",
"description": "The ID of the page to update, with or without dashes."
}
},
"required": [
"page_id"
]
},
{
"anyOf": [
{
"type": "object",
"properties": {
"command": {
"type": "string",
"enum": [
"update_properties"
]
},
"properties": {
"type": "object",
"additionalProperties": {
"type": [
"string",
"number",
"null"
]
},
"description": "A JSON object that updates the page's properties. For pages in a database, use the SQLite schema definition shown in <database>. For pages outside of a database, the only allowed property is \"title\", which is the title of the page in inline markdown format. Use null to remove a property's value."
}
},
"required": [
"command",
"properties"
],
"additionalProperties": false
},
{
"type": "object",
"properties": {
"command": {
"type": "string",
"enum": [
"replace_content"
]
},
"new_str": {
"type": "string",
"description": "The new string to replace all content with."
}
},
"required": [
"command",
"new_str"
],
"additionalProperties": false
},
{
"type": "object",
"properties": {
"command": {
"type": "string",
"enum": [
"replace_content_range"
]
},
"selection_with_ellipsis": {
"type": "string",
"description": "Unique start and end snippet of the string to replace in the page content, including whitespace. DO NOT provide the entire string to replace. Instead, provide up to the first ~10 characters of the string to replace, an ellipsis, and then up to the last ~10 characters of the string to replace. Make sure you provide enough of the start and end snippet to uniquely identify the string to replace. For example, to replace an entire section, use \"old_start_and_end_snippet\":\"# Section heading...last paragraph.\""
},
"new_str": {
"type": "string",
"description": "The new string to replace the old string with."
}
},
"required": [
"command",
"selection_with_ellipsis",
"new_str"
],
"additionalProperties": false
},
{
"type": "object",
"properties": {
"command": {
"type": "string",
"enum": [
"insert_content_after"
]
},
"selection_with_ellipsis": {
"type": "string",
"description": "Unique start and end snippet of the string to match in the page content, including whitespace. DO NOT provide the entire string to match. Instead, provide up to the first ~10 characters of the string to match, an ellipsis, and then up to the last ~10 characters of the string to match. Make sure you provide enough of the start and end snippet to uniquely identify the string to match. For example, to match an entire section, use \"selection_with_ellipsis\":\"# Section heading...last paragraph.\""
},
"new_str": {
"type": "string",
"description": "The new content to insert."
}
},
"required": [
"command",
"selection_with_ellipsis",
"new_str"
],
"additionalProperties": false
}
]
}
],
"description": "The data required for updating a page"
}
},
"required": [
"data"
],
"additionalProperties": false
},
"strict": true
}
} |
eb160dc to
cea2b10
Compare
|
|
||
| JsonSchema = dict[str, Any] | ||
|
|
||
| __all__ = ['JsonSchemaTransformer', 'InlineDefsJsonSchemaTransformer', 'flatten_allof'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make flatten_allof private and not export it
| return s | ||
|
|
||
|
|
||
| def flatten_allof(schema: JsonSchema) -> JsonSchema: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may not need this method at all if the JSON transformer can call _recurse_flatten_allof directly
|
|
||
| # The transformer should resolve the $ref and use the transformed schema from $defs | ||
| # (not the original self.defs, which was the bug we fixed) | ||
| assert isinstance(result, dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use snapshot, I want to see the entire result
| # Simulate edge case: super().walk() returns $defs without root_key | ||
| # This shouldn't happen in normal flow, but we test the fallback path | ||
| with patch.object( | ||
| transformer.__class__.__bases__[0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel comfortable with this; if we can't come up with a real schema that gets us to that line, do we need it at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DouweM that was the old code before my change so I kept it as a fallback.
If you think is not anymore necessary we can remove the fallback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pulphix If I understand the new code correctly, if a ref was in self.defs previously, it's now guaranteed to be in result['$defs'], so the change should be fine without the fallback. Although we could keep the or {} that we had previously just to be sure.
We use self.defs in a few more places though, so if we now can't rely on that anymore, should we also update those other uses? Or actually set self.defs to the updated value after flattening? It could be worth verifying the interaction between the new flag and prefer_inlined_defs or recursive_defs.
| 'a': {'type': 'string'}, | ||
| 'b': {'type': 'integer'}, | ||
| }, | ||
| 'additionalProperties': True, # When any member has dict additionalProperties, result should be True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that true? If I understand the allOf schema correctly, it would only accept an object with a b key and no other keys, or an empty object.
| { | ||
| 'type': 'object', | ||
| # No properties, required, or patternProperties | ||
| 'additionalProperties': False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this imply that no properties are allowed at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DouweM
Correct, I think is not a good test, I'll change it with a more realistic one
| # Simulate edge case: super().walk() returns $defs without root_key | ||
| # This shouldn't happen in normal flow, but we test the fallback path | ||
| with patch.object( | ||
| transformer.__class__.__bases__[0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pulphix If I understand the new code correctly, if a ref was in self.defs previously, it's now guaranteed to be in result['$defs'], so the change should be fine without the fallback. Although we could keep the or {} that we had previously just to be sure.
We use self.defs in a few more places though, so if we now can't rely on that anymore, should we also update those other uses? Or actually set self.defs to the updated value after flattening? It could be worth verifying the interaction between the new flag and prefer_inlined_defs or recursive_defs.
Summary
This PR adds an
allOfflattener for JSON Schemas used by PydanticAI tools and structured outputs, and keeps the repo compliant withcontributing.md(tests + pre-commit all green).Motivation
Many OpenAI(-compatible) providers, especially in strict structured-output mode, expect a flat object schema and either reject or mis-handle JSON Schema combinators like
allOf/oneOf/anyOfin tool/function parameters. Without flattening, perfectly valid schemas generated by Pydantic can cause API errors or unstable model behavior.What this PR does
allOf flattener for object schemas
In
_json_schema.py, we add logic to rewriteallOfbetween object schemas into a singletype: "object"schema that is semantically equivalent in the supported cases:propertiesrequiredpatternPropertiesadditionalPropertiesconservatively:falseonly if all members haveadditionalProperties: falseAdditional behavior:
allOfcontains non-object members or obviously conflicting structures, the flattener leaves it untouched to avoid changing semantics.Provider integration
OpenAIJsonSchemaTransformernow applies the flattener at the start oftransform, so schemas passed to OpenAI (and OpenAI-compatible providers) no longer contain object-levelallOfbefore strict-mode adjustments.GoogleJsonSchemaTransformerdoes the same before applying Gemini-specific schema tweaks (inlining$defs, removing unsupported keywords, etc.).Testing
allOfallOfcontains non-object members or conflicting structurespre-commithooks run clean, keeping the repo compliant withcontributing.md.