
Formalize how to express null value in xml #3959

Open
ahmednfwela opened this issue Jul 16, 2024 · 10 comments · May be fixed by #4612
Labels
media and encoding Issues regarding media type support and how to encode data (outside of query/path params) xml

@ahmednfwela

Since XML has no concept of null, how can we handle validating a null value (both as an attribute and as an element)?

Consider the following 3.0 schema:

person:
  type: object
  required:
    - name
    - attrName
  properties:
    name:
      type: string
      nullable: true
    attrName:
      type: string
      nullable: true
      xml:
        attribute: true

Notice how required here prevents us from omitting the attribute/element.

I have thought about this, and here are some of the approaches I came up with:

For elements

Approach 1: self closing tags

<person>
  <name />
</person>

Pros: Makes sense to whoever reads it.
Cons: Nothing I can think of, except that some XML parsers may treat a self-closing tag as equivalent to an empty string and not distinguish between them, which means a null doesn't survive round-tripping.
e.g.
<name /> gets written back out as: <name></name>
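A quick check with Python's standard xml.etree.ElementTree shows the behavior. This parser is used only as an illustration, and other parsers may expose the difference differently. The two spellings parse to the same tree, and which spelling gets written back is a serializer setting rather than part of the data:

```python
import xml.etree.ElementTree as ET

# Parse both spellings of an element with no content.
self_closing = ET.fromstring("<name />")
start_end = ET.fromstring("<name></name>")

for elem in (self_closing, start_end):
    # Both yield the same tree: no text and no child elements.
    print(repr(elem.text), len(elem))  # None 0

# The spelling on the way out is a serializer option, not part of the data.
print(ET.tostring(start_end, encoding="unicode"))  # <name />
print(ET.tostring(self_closing, encoding="unicode", short_empty_elements=False))  # <name></name>
```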

Approach 2: empty string

<person>
  <name></name>
</person>

Cons: If the property is of type string, it's not possible to distinguish between a non-null empty string and a null.
Workaround: force strings to be wrapped in double quotes, e.g.

  • this is null:
<name></name>
  • this is empty string:
<name>""</name>
  • this is valid string:
<name>"hello"</name>
  • this is an invalid string:
<name>hello</name>

Of course, this workaround is problematic, since most parsers consider "" a valid two-character string.
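To make the workaround concrete, here is a minimal Python decoder for it. decode_quoted is a hypothetical helper. The convention as described also does not say how to escape a string that itself starts and ends with a quote. The last two lines show the problem. A generic XML consumer reads the content of <name>""</name> as the two-character string "", not as the empty string:

```python
import xml.etree.ElementTree as ET
from typing import Optional

def decode_quoted(text: Optional[str]) -> Optional[str]:
    """Hypothetical decoder for the quote-wrapping convention listed above."""
    if not text:
        return None  # <name></name>: no content means null
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1]  # strip the wrapping quotes
    raise ValueError(f"string content is not quoted: {text!r}")

raw = ET.fromstring('<name>""</name>').text
print(repr(raw))                 # '""'  (what a generic XML consumer sees)
print(repr(decode_quoted(raw)))  # ''    (what the convention means)
```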

Approach 3: special marker attribute

<name xsi:nil="true"></name>
<name xsi:nil="true"/>

Pros: Can represent nulls consistently without having to check the contents of the element; this is also how XML Schema does it.
Cons: Size overhead of having to use xsi:nil="true" everywhere null is used.
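The following is a minimal Python sketch of approach 3 using xml.etree.ElementTree. It assumes a string-typed property, and to_element and from_element are hypothetical helper names. The xsi prefix is bound to the standard XML Schema instance namespace, http://www.w3.org/2001/XMLSchema-instance. Because xsi:nil is a boolean attribute, the parser accepts "1" as well as "true":

```python
import xml.etree.ElementTree as ET
from typing import Optional

XSI = "http://www.w3.org/2001/XMLSchema-instance"
NIL = f"{{{XSI}}}nil"  # ElementTree's {namespace}local-name form
ET.register_namespace("xsi", XSI)

def to_element(tag: str, value: Optional[str]) -> ET.Element:
    """Serialize a nullable string; null becomes xsi:nil="true"."""
    elem = ET.Element(tag)
    if value is None:
        elem.set(NIL, "true")
    else:
        elem.text = value
    return elem

def from_element(elem: ET.Element) -> Optional[str]:
    """Parse back; only the marker decides null, never the content."""
    flag = (elem.get(NIL) or "").strip()
    if flag in ("true", "1"):
        return None
    return elem.text or ""  # an empty element without the marker is ""

for v in (None, "", "hello"):
    xml = ET.tostring(to_element("name", v), encoding="unicode")
    print(xml, "->", repr(from_element(ET.fromstring(xml))))
```

With this scheme the empty string and null stay distinct. "" serializes to a plain empty element, while null carries the xsi:nil marker.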

For attributes

Approach 1: empty string

<person attrName="" />

Approach 2: disallow nullable attributes altogether

Make xml.attribute: true and nullable: true mutually exclusive.
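A simple check over the schema could enforce approach 2. The sketch below assumes the schema has already been loaded into a Python dict, for example from the YAML above. nullable_attributes is a hypothetical helper, and it only follows nested properties, not $ref, allOf, items and so on:

```python
from typing import Any, Dict, Iterator

def nullable_attributes(schema: Dict[str, Any], path: str = "#") -> Iterator[str]:
    """Yield paths of properties combining xml.attribute: true with nullable: true."""
    for name, prop in schema.get("properties", {}).items():
        here = f"{path}/properties/{name}"
        if prop.get("nullable") and prop.get("xml", {}).get("attribute"):
            yield here
        yield from nullable_attributes(prop, here)  # descend into nested objects

person = {
    "type": "object",
    "required": ["name", "attrName"],
    "properties": {
        "name": {"type": "string", "nullable": True},
        "attrName": {"type": "string", "nullable": True, "xml": {"attribute": True}},
    },
}
print(list(nullable_attributes(person)))  # ['#/properties/attrName']
```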

@ralfhandl ralfhandl added the xml label Jul 17, 2024
@ralfhandl
Contributor

@ahmednfwela Thanks for reporting this, and for the detailed research and references.

Preliminary analysis for elements

XML 1.0, section 3.1 "Start-Tags, End-Tags, and Empty-Element Tags" states that the element forms <name></name> and <name/> are equivalent and represent an element with no content, aka an empty element, so approaches 1 and 2 are equivalent.

The meaning of "empty" seems to depend on context/implementation; for string-valued elements "empty" means the empty string.

So approach 3 (xsi:nil="true") seems to be the way forward.
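Python's xml.etree.ElementTree, used here only as one example serializer, illustrates the equivalence. An element whose text is the empty string and one with no text at all serialize to exactly the same output. Without a marker such as xsi:nil, a reader cannot recover which one was meant:

```python
import xml.etree.ElementTree as ET

empty = ET.Element("name")
empty.text = ""        # intended: the empty string
missing = ET.Element("name")
missing.text = None    # intended: null

a = ET.tostring(empty, encoding="unicode")
b = ET.tostring(missing, encoding="unicode")
print(a, b, a == b)  # <name /> <name /> True
```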

Preliminary analysis for attributes

XML 1.0, section 3.3.3 "Attribute-Value Normalization" describes an algorithm that MUST be applied before the value of an attribute is passed to the application or checked for validity. This algorithm begins with a normalized value consisting of the empty string, then appends to it. Thus attribute values are always strings, potentially the empty string, and never null.

So approach 2 (disallow nullable attributes) seems to be the way forward.
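In Python's xml.etree.ElementTree, for example, attribute values reach the application only as strings. The only other state an attribute can be in is absent:

```python
import xml.etree.ElementTree as ET

person = ET.fromstring('<person attrName="" />')

# A present attribute always comes back as a string, possibly empty.
print(repr(person.get("attrName")))   # ''
# An absent attribute has no value at all; None here is ElementTree's
# "not found" result, not a null carried by the document.
print(repr(person.get("otherName")))  # None
```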

@ralfhandl ralfhandl added this to the v3.2.0 milestone Jul 25, 2024
@handrews handrews added the media and encoding Issues regarding media type support and how to encode data (outside of query/path params) label Jul 29, 2024
@handrews
Member

handrews commented May 5, 2025

@ralfhandl For attributes, is there an Option 3: omit the attribute?

@handrews
Member

handrews commented May 5, 2025

I am asking because due to compatibility reasons, we cannot forbid type: "null" on XML attributes in 3.x, so we need an alternative to recommend as a SHOULD. Mapping null to the empty string does not seem ideal.

@ahmednfwela
Author

@handrews How would this be distinguishable from required: false?

@handrews
Member

handrews commented May 5, 2025

@ahmednfwela in practice, it's probably not. But we can't do the preferred option of forbidding null for attributes in 3.2 because that would break compatibility with 3.1. (we don't want another incompatibility in a minor release after 3.0->3.1).

@ahmednfwela
Author

So we could say that, for attributes, setting nullable: true causes the value of required to be ignored.

@handrews
Member

handrews commented May 5, 2025

@ahmednfwela we can't do anything that contradicts specified behavior in OAS 3.1.

@ahmednfwela
Author

ahmednfwela commented May 6, 2025

@ralfhandl For attributes, is there an Option 3: omit the attribute?

@handrews But this already contradicts OAS 3.1, as a schema with required: true + nullable: true would be invalid in 3.1 if the attribute were omitted.

@handrews
Member

handrews commented May 6, 2025

Hmm... I do see your point, @ahmednfwela. I'm kind of at the point of throwing my hands up on this and saying that if you try to use type: "null" on an attribute, the results are implementation-defined. We need to say that for both of these anyway for compatibility, but at least this way we can give a SHOULD for elements.

@lornajane lornajane modified the milestones: v3.2.0, v3.3.0 May 15, 2025
@handrews
Member

@ahmednfwela I've continued to think about this, and I think that constructs like:

properties:
  someElement:
    required:
    - someAttribute
    properties:
      someAttribute:
        type: [number, "null"]
        xml:
          attribute: true

do not prevent handling null by omitting the attribute.

There are two representations of the data here: The in-memory data which needs to be in a structure that can be modeled by JSON Schema, and the XML serialization, which does not. The XML Object tells us how to map between the two.

The JSON Schema constraints apply to the in-memory data. As long as the mapping from null to a missing attribute and back to null is well-defined, the required constraint is satisfied. This basically turns the question around into the following form:

  • Serialization:
    • First validate the in-memory representation; then, if the in-memory value is null, omit the attribute
  • Parsing:
    • If the schema supports type: "null" and the attribute is missing, set the in-memory value to null
    • If the schema does not support type: "null" and the attribute is missing, then there is no available mapping, and if the attribute is also required then validation fails
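The serialization and parsing rules above can be sketched in Python with xml.etree.ElementTree. This is only an illustration, and it makes several assumptions that are not part of the proposal. The schema's type list and required flag are passed in directly. Values stay as strings, so conversion to number is omitted. The names serialize_attr, parse_attr and ABSENT are hypothetical. A plain ValueError stands in for a validation failure:

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence

# Hypothetical sentinel for "property not present" in the in-memory object,
# kept distinct from None, which stands for a JSON null.
ABSENT = object()

def serialize_attr(elem: ET.Element, name: str, value: Optional[str]) -> None:
    """Write one attribute; assumes the in-memory value was already validated."""
    if value is not None:
        elem.set(name, value)
    # A null value is serialized by omitting the attribute entirely.

def parse_attr(elem: ET.Element, name: str, types: Sequence[str], required: bool):
    """Map an attribute back to in-memory data."""
    raw = elem.get(name)
    if raw is not None:
        return raw  # type conversion (e.g. to a number) is left out here
    if "null" in types:
        return None  # missing attribute + schema allows null -> null
    if required:
        raise ValueError(f"required attribute {name!r} is missing")
    return ABSENT  # no mapping: the property is simply not present

elem = ET.Element("someElement")
serialize_attr(elem, "someAttribute", None)
xml = ET.tostring(elem, encoding="unicode")
print(xml)  # <someElement />
print(parse_attr(ET.fromstring(xml), "someAttribute", ["number", "null"], required=True))  # None
```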

There really is no reason to try to have a special null representation for attributes. Null is purely an in-memory data structure construct, and the only logical option is to omit the attribute.

There is the awkwardness that the value to which a missing attribute is parsed depends on the schema, but a 1:1 mapping is not possible. Mapping to the empty string would produce a similar overloading, but in that case the round-trip would be lossy. There is no way to tell whether an attribute with an empty string and a schema with "type": ["string", "null"] should be parsed as the empty string or as null. The more logical option is the empty string, which means null would round-trip to the empty string. With the above approach, null round-trips correctly.
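The round-trip argument can be checked with a small Python sketch using xml.etree.ElementTree. The roundtrip helper and its null_as_empty switch are hypothetical. The schema is assumed to be type: [string, "null"], so a missing attribute parses as null:

```python
import xml.etree.ElementTree as ET
from typing import Optional

def roundtrip(value: Optional[str], null_as_empty: bool) -> Optional[str]:
    """Serialize a nullable string attribute, then parse it back.

    null_as_empty=True:  null is written as attr="" (the rejected mapping).
    null_as_empty=False: null is written by omitting the attribute.
    """
    elem = ET.Element("e")
    if value is not None:
        elem.set("a", value)
    elif null_as_empty:
        elem.set("a", "")
    parsed = ET.fromstring(ET.tostring(elem, encoding="unicode"))
    # Missing attribute -> None (null is allowed); otherwise the string itself.
    return parsed.get("a")

for v in (None, "", "x"):
    print(repr(v), repr(roundtrip(v, True)), repr(roundtrip(v, False)))
```

Only the empty-string mapping loses information: null comes back as "", while omitting the attribute returns every value unchanged.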

As with all of the options explored, this is not ideal. But it strikes me as more consistent than any of the others. It round-trips correctly, and while some schema constructs don't make much sense, that is true of JSON Schema in general and is therefore acceptable. You can write a lot of schemas that are redundant or impossible.

4 participants