From deebfd06403195b449ec90e5b02e7a05f26f2bd6 Mon Sep 17 00:00:00 2001
From: "Henry H. Andrews"
Date: Wed, 22 May 2024 15:24:19 -0700
Subject: [PATCH 1/5] Appendix on converting data types to strings (3.0.4)

It's very unclear how numbers, booleans, and other non-UTF-8-string
values are converted to strings, particularly for the form media types.
This adds a brief appendix that acknowledges the lack of
standardization, and points to resources for the few cases that do have
specifications. It highlights concerns with relying on certain JSON
Schema keywords or values for serialization, and suggests defining
schemas of type string and requiring applications to perform the
conversion prior to schema validation as a way to control the results.
This also clarifies that schema validation occurs before serialization.
---
 versions/3.0.4.md | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/versions/3.0.4.md b/versions/3.0.4.md
index dc61a1a941..f667d093c2 100644
--- a/versions/3.0.4.md
+++ b/versions/3.0.4.md
@@ -1049,6 +1049,7 @@ There are four possible parameter locations specified by the `in` field:
 
 The rules for serialization of the parameter are specified in one of two ways.
 Parameter Objects MUST include either a `content` field or a `schema` field, but not both.
+See [Appendix B](#dataTypeConversion) for a discussion of converting values of various types to string representations.
 
 ###### Common Fixed Fields
 
@@ -1618,6 +1619,7 @@ An `encoding` attribute is introduced to give you control over the serialization
 #### Encoding Object
 
 A single encoding definition applied to a single schema property.
+See [Appendix B](#dataTypeConversion) for a discussion of converting values of various types to string representations.
 
 ##### Fixed Fields
 Field Name | Type | Description
@@ -3700,6 +3702,29 @@ Version | Date | Notes
 
 ## Appendix B: Data Type Conversion
 
+Serializing typed data to plain text, which can occur in `text/plain` message bodies or `multipart` parts, as well as in the `application/x-www-form-urlencoded` format in either URL query strings or message bodies, involves significant implementation- or application-defined behavior.
+
+Schema Objects validate data based on the [JSON Schema data model](https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-00#section-4.2.1), which only recognizes four primitive data types: strings (which are UTF-8 except in [extremely limited circumstances](https://datatracker.ietf.org/doc/html/rfc8259#section-8.1)), numbers, booleans, and `null`.
+Notably, integers are not a distinct type from other numbers, with `type: integer` being a convenience defined mathematically, rather than based on the presence or absence of a decimal point in any string representation.
+
+The Parameter and Encoding Objects offer features to control how to arrange values from array or object types.
+They can also be used to control how strings are further encoded to avoid reserved or illegal characters.
+However, there is no general-purpose specification for converting schema-validated non-UTF-8 primitive data types (or entire arrays or objects) to strings.
+
+Two cases do offer standards-based guidance:
+
+* [RFC3987 §3.1](https://datatracker.ietf.org/doc/html/rfc3987#section-3.1) provides guidance for converting non-Unicode strings to UTF-8, particularly in the context of URIs (and by extension, the form media types which use the same encoding rules)
+* [RFC6570 §2.3](https://www.rfc-editor.org/rfc/rfc6570#section-2.3) specifies which values, including but not limited to `null`, are considered _undefined_ and therefore treated specially in the expansion process when serializing based on that specification
+
+To control the serialization of numbers, booleans, and `null` (or other values RFC6570 deems to be undefined) more precisely, schemas can be defined as `type: string` and constrained using `pattern`, `enum`, `format`, and other keywords to communicated how applications must pre-convert their data prior to schema validation.
+The resulting strings would not require any further type conversion.
+
+The `format` keyword can assist in serialization.
+Some formats (such as `date-time` or `byte`) are unambiguous, while others (such as [`decimal`](https://spec.openapis.org/registry/format/decimal.html) in the [Format Registry](https://spec.openapis.org/registry/format/)) are less clear.
+However, care must be taken with `format` to ensure that the specific formats are supported by all relevant tools as unrecognized formats are ignored.
+
+Requiring input as pre-formatted, schema-validated strings also improves round-trip interoperability as not all programming languages and environments support the same data types.
+
 ## Appendix C: Using RFC6570 Implementations
 
 Serialization is defined in terms of RFC6570 URI Templates in two scenarios:

From b6e72042052889b16ed38a48a3c4f2859ee97dc4 Mon Sep 17 00:00:00 2001
From: Henry Andrews
Date: Thu, 23 May 2024 07:33:14 -0700
Subject: [PATCH 2/5] Make Object names links (review feedback)

Co-authored-by: Ralf Handl
---
 versions/3.0.4.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/versions/3.0.4.md b/versions/3.0.4.md
index f667d093c2..1c545797aa 100644
--- a/versions/3.0.4.md
+++ b/versions/3.0.4.md
@@ -3707,7 +3707,7 @@ Serializing typed data to plain text, which can occur in `text/plain` message bo
 Schema Objects validate data based on the [JSON Schema data model](https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-00#section-4.2.1), which only recognizes four primitive data types: strings (which are UTF-8 except in [extremely limited circumstances](https://datatracker.ietf.org/doc/html/rfc8259#section-8.1)), numbers, booleans, and `null`.
 Notably, integers are not a distinct type from other numbers, with `type: integer` being a convenience defined mathematically, rather than based on the presence or absence of a decimal point in any string representation.
 
-The Parameter and Encoding Objects offer features to control how to arrange values from array or object types.
+The [Parameter Object](#parameterObject) and [Encoding Object](#encodingObject) offer features to control how to arrange values from array or object types.
 They can also be used to control how strings are further encoded to avoid reserved or illegal characters.
 However, there is no general-purpose specification for converting schema-validated non-UTF-8 primitive data types (or entire arrays or objects) to strings.
 
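Illustration for the context lines above: a minimal sketch, assuming Python as the application language, of why the appendix stresses that `type: integer` is defined mathematically. The helper names (`is_schema_integer`, `naive_string`, `canonical_integer_string`) are hypothetical and belong neither to the OpenAPI Specification nor to any JSON Schema library, and `str()` stands in for just one possible implementation-defined conversion.

```python
def is_schema_integer(value):
    """Hypothetical check: True for any number with a zero fractional part,
    which is the mathematical reading of `type: integer` described above."""
    if isinstance(value, bool):
        # Python treats bool as a subclass of int; the data model does not.
        return False
    if isinstance(value, int):
        return True
    return isinstance(value, float) and value.is_integer()


def naive_string(value):
    """One possible implementation-defined conversion: Python's str()."""
    return str(value)


def canonical_integer_string(value):
    """An application-chosen conversion that never emits a decimal point."""
    if not is_schema_integer(value):
        raise ValueError(f"not an integer in the schema sense: {value!r}")
    return str(int(value))


# Both values satisfy the mathematical definition of an integer, yet the
# naive conversion gives them different string representations.
for sample in (1, 1.0):
    print(is_schema_integer(sample), naive_string(sample),
          canonical_integer_string(sample))
```

Under these assumptions, `1` and `1.0` both pass the integer check but serialize differently with `str()`, which is the kind of variation the appendix leaves to implementations; the explicit conversion removes it.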
From 1b6c4262e90dd8ec5d925ffe4a2cf51599a6d4aa Mon Sep 17 00:00:00 2001
From: Henry Andrews
Date: Thu, 23 May 2024 07:33:44 -0700
Subject: [PATCH 3/5] Grammatical typo (review feedback)

Co-authored-by: Ralf Handl
---
 versions/3.0.4.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/versions/3.0.4.md b/versions/3.0.4.md
index 1c545797aa..6b4d526901 100644
--- a/versions/3.0.4.md
+++ b/versions/3.0.4.md
@@ -3716,7 +3716,7 @@ Two cases do offer standards-based guidance:
 * [RFC3987 §3.1](https://datatracker.ietf.org/doc/html/rfc3987#section-3.1) provides guidance for converting non-Unicode strings to UTF-8, particularly in the context of URIs (and by extension, the form media types which use the same encoding rules)
 * [RFC6570 §2.3](https://www.rfc-editor.org/rfc/rfc6570#section-2.3) specifies which values, including but not limited to `null`, are considered _undefined_ and therefore treated specially in the expansion process when serializing based on that specification
 
-To control the serialization of numbers, booleans, and `null` (or other values RFC6570 deems to be undefined) more precisely, schemas can be defined as `type: string` and constrained using `pattern`, `enum`, `format`, and other keywords to communicated how applications must pre-convert their data prior to schema validation.
+To control the serialization of numbers, booleans, and `null` (or other values RFC6570 deems to be undefined) more precisely, schemas can be defined as `type: string` and constrained using `pattern`, `enum`, `format`, and other keywords to communicate how applications must pre-convert their data prior to schema validation.
 The resulting strings would not require any further type conversion.
 
 The `format` keyword can assist in serialization.

From 1cf5b0b199b644fa3bcfbf3bea5bad9083b51f8f Mon Sep 17 00:00:00 2001
From: "Henry H. Andrews"
Date: Thu, 23 May 2024 12:06:56 -0700
Subject: [PATCH 4/5] Use correct versions of JSON Schema and JSON specs

JSON Schema was accidentally the draft for 3.1, while this spec uses
the older 7159 for JSON instead of 8259, so I have matched that rather
than change it or cite both.
---
 versions/3.0.4.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/versions/3.0.4.md b/versions/3.0.4.md
index 6b4d526901..2e2be7050b 100644
--- a/versions/3.0.4.md
+++ b/versions/3.0.4.md
@@ -3704,7 +3704,7 @@ Version | Date | Notes
 
 Serializing typed data to plain text, which can occur in `text/plain` message bodies or `multipart` parts, as well as in the `application/x-www-form-urlencoded` format in either URL query strings or message bodies, involves significant implementation- or application-defined behavior.
 
-Schema Objects validate data based on the [JSON Schema data model](https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-00#section-4.2.1), which only recognizes four primitive data types: strings (which are UTF-8 except in [extremely limited circumstances](https://datatracker.ietf.org/doc/html/rfc8259#section-8.1)), numbers, booleans, and `null`.
+Schema Objects validate data based on the [JSON Schema data model](https://datatracker.ietf.org/doc/html/draft-wright-json-schema-00#section-4.2), which only recognizes four primitive data types: strings (which are [only broadly interoperable as UTF-8](https://datatracker.ietf.org/doc/html/rfc7159#section-8.1)), numbers, booleans, and `null`.
 Notably, integers are not a distinct type from other numbers, with `type: integer` being a convenience defined mathematically, rather than based on the presence or absence of a decimal point in any string representation.
 
 The [Parameter Object](#parameterObject) and [Encoding Object](#encodingObject) offer features to control how to arrange values from array or object types.

From aacbbc942e9b233f6aec658abaa7ad8bf3783b68 Mon Sep 17 00:00:00 2001
From: "Henry H. Andrews"
Date: Fri, 24 May 2024 17:13:43 -0700
Subject: [PATCH 5/5] Add note about RFC6570 type conversions

The spec doesn't address it, but implementations often have their own
rules.
---
 versions/3.0.4.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/versions/3.0.4.md b/versions/3.0.4.md
index 2e2be7050b..5beacd3522 100644
--- a/versions/3.0.4.md
+++ b/versions/3.0.4.md
@@ -3716,6 +3716,9 @@ Two cases do offer standards-based guidance:
 * [RFC3987 §3.1](https://datatracker.ietf.org/doc/html/rfc3987#section-3.1) provides guidance for converting non-Unicode strings to UTF-8, particularly in the context of URIs (and by extension, the form media types which use the same encoding rules)
 * [RFC6570 §2.3](https://www.rfc-editor.org/rfc/rfc6570#section-2.3) specifies which values, including but not limited to `null`, are considered _undefined_ and therefore treated specially in the expansion process when serializing based on that specification
 
+Implementations of RFC6570 often have their own conventions for converting non-string values, but these are implementation-specific and not defined by the RFC itself.
+This is one reason for the OpenAPI Specification to leave these conversions as implementation-defined: It allows using RFC6570 implementations regardless of how they choose to perform the conversions.
+
 To control the serialization of numbers, booleans, and `null` (or other values RFC6570 deems to be undefined) more precisely, schemas can be defined as `type: string` and constrained using `pattern`, `enum`, `format`, and other keywords to communicate how applications must pre-convert their data prior to schema validation.
 The resulting strings would not require any further type conversion.
 
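Illustration for the pre-conversion approach described in the final hunk: a minimal sketch, assuming Python, of an application converting its values to strings before schema validation. The schemas `FLAG_SCHEMA` and `AMOUNT_SCHEMA`, the helpers `flag_to_string` and `amount_to_string`, and the tiny `matches_schema` checker are all hypothetical; the checker handles only `type: string`, `enum`, and `pattern`, and is not a substitute for a real JSON Schema validator.

```python
import re
from decimal import Decimal

# Hypothetical `type: string` schemas that pin down the exact text form.
FLAG_SCHEMA = {"type": "string", "enum": ["true", "false"]}
AMOUNT_SCHEMA = {"type": "string", "pattern": r"^-?[0-9]+\.[0-9]{2}$"}


def matches_schema(value, schema):
    """Tiny stand-in for a validator: checks string type, enum, pattern."""
    if not isinstance(value, str):
        return False
    if "enum" in schema and value not in schema["enum"]:
        return False
    # JSON Schema patterns are not implicitly anchored, hence re.search.
    if "pattern" in schema and re.search(schema["pattern"], value) is None:
        return False
    return True


def flag_to_string(flag):
    """Application-defined pre-conversion of a boolean."""
    return "true" if flag else "false"


def amount_to_string(amount):
    """Application-defined pre-conversion to exactly two decimal places."""
    return f"{Decimal(amount):.2f}"


# The pre-converted strings validate; a language-default conversion of the
# same values may not, because str(True) is "True" and str(19.9) is "19.9".
print(matches_schema(flag_to_string(True), FLAG_SCHEMA))
print(matches_schema(str(True), FLAG_SCHEMA))
print(matches_schema(amount_to_string("19.9"), AMOUNT_SCHEMA))
print(matches_schema(str(19.9), AMOUNT_SCHEMA))
```

In this sketch the schema, rather than an RFC6570 library or a form encoder, decides what the serialized text looks like, so no further type conversion is needed once validation passes.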