Whitespace formatting isn't valid and idempotent with ignore sensitivity #768

@gebsh

In XML, only \t, \n, \r, and the space character (U+0020) are considered whitespace and are affected by the xml:space attribute. However, when formatting an XML document with the xmlWhitespaceSensitivity option set to ignore, @prettier/plugin-xml uses String.prototype.trim() to remove whitespace characters. trim() also strips other characters, such as U+00A0 No-Break Space, so text that should be preserved is removed.

plugin-xml/src/printer.js

Lines 281 to 288 in 68b3430

path.each((charDataPath) => {
  const chardata = charDataPath.getValue();
  if (!chardata.TEXT) {
    return;
  }
  const content = chardata.TEXT.trim();
  const printed = group(

For example, this document has a <text> element with 4 trailing U+00A0 No-Break Space characters:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text>foo    </text>
  <text>bar</text>
</paragraph>

Formatting it removes these 4 trailing characters:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text>foo</text>
  <text>bar</text>
</paragraph>
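
One possible way to avoid this is to trim only the four XML whitespace characters instead of calling trim(). The following is a minimal sketch under that assumption, not the plugin's actual fix; trimXmlWhitespace is a hypothetical helper name:

```javascript
// XML 1.0 whitespace: space, tab, line feed, carriage return.
const XML_LEADING_WS = /^[ \t\n\r]+/;
const XML_TRAILING_WS = /[ \t\n\r]+$/;

// Hypothetical helper (not part of @prettier/plugin-xml):
// strips only XML whitespace from both ends of a string.
function trimXmlWhitespace(text) {
  return text.replace(XML_LEADING_WS, "").replace(XML_TRAILING_WS, "");
}

// "foo" followed by 4 no-break spaces and a line feed.
const sample = "  foo\u00A0\u00A0\u00A0\u00A0\n";

// trim() removes the no-break spaces along with the XML whitespace.
console.log(sample.trim().length); // 3

// The XML-only trim keeps the 4 no-break spaces.
console.log(trimXmlWhitespace(sample).length); // 7
```

Whether such a helper fits at the quoted call site in printer.js depends on the surrounding code, which is not shown in full here.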

The trim() call also makes formatting non-idempotent: for a document containing elements whose only content is non-breaking spaces, the output depends on how many times the formatter is run. Given this input:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text>    </text>
</paragraph>

This is the output after formatting the input once:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text></text>
</paragraph>

And this is the output after formatting it twice:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text />
</paragraph>
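
The two-pass difference can be checked mechanically. The sketch below is only an illustration: isIdempotent is a hypothetical helper, and toyFormat is a toy stand-in that mimics the observed behavior (trim on one pass, self-closing an already empty element on the next). It is not the plugin's real logic.

```javascript
// Hypothetical helper: true when a second formatting pass leaves the
// output of the first pass unchanged. `format` is assumed to be a
// synchronous string-to-string function.
function isIdempotent(format, input) {
  const once = format(input);
  const twice = format(once);
  return once === twice;
}

// Toy stand-in for a formatter (NOT the plugin's real logic):
// 1. self-closes elements that are already literally empty;
// 2. trims the text of the remaining simple elements with trim().
function toyFormat(xml) {
  return xml
    .replace(/<(\w+)><\/\1>/g, "<$1 />")
    .replace(
      /<(\w+)>([^<]*)<\/\1>/g,
      (_, tag, text) => `<${tag}>${text.trim()}</${tag}>`
    );
}

// An element containing only 4 no-break spaces.
const nbspOnly = "<text>\u00A0\u00A0\u00A0\u00A0</text>";

console.log(toyFormat(nbspOnly));               // <text></text>
console.log(toyFormat(toyFormat(nbspOnly)));    // <text />
console.log(isIdempotent(toyFormat, nbspOnly)); // false
```

To run the same check against the real formatter, `format` would wrap Prettier with the XML plugin and xmlWhitespaceSensitivity set to ignore. Prettier 3's format is asynchronous, so the helper would need to await both calls.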

Here's a list of affected characters:
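
These are the characters that String.prototype.trim() strips but that XML does not treat as whitespace. A minimal sketch for enumerating them, assuming a current JavaScript engine and limiting the scan to the Basic Multilingual Plane; isXmlChar is a hypothetical helper based on the Char production of XML 1.0:

```javascript
// XML 1.0 whitespace: U+0020, U+0009, U+000A, U+000D.
const XML_WHITESPACE = new Set([0x20, 0x09, 0x0a, 0x0d]);

// Hypothetical helper: is the code point allowed in an XML 1.0 document?
// (Char production, restricted to the BMP for this sketch.)
function isXmlChar(cp) {
  return (
    cp === 0x09 ||
    cp === 0x0a ||
    cp === 0x0d ||
    (cp >= 0x20 && cp <= 0xd7ff) ||
    (cp >= 0xe000 && cp <= 0xfffd)
  );
}

const affected = [];
for (let cp = 0; cp <= 0xffff; cp++) {
  // A single character that trims to "" is one that trim() strips.
  const strippedByTrim = String.fromCharCode(cp).trim() === "";
  if (strippedByTrim && isXmlChar(cp) && !XML_WHITESPACE.has(cp)) {
    affected.push("U+" + cp.toString(16).toUpperCase().padStart(4, "0"));
  }
}

console.log(affected.join(" "));
```

The result includes U+00A0 No-Break Space, the Unicode space separators, the line and paragraph separators U+2028 and U+2029, and U+FEFF. Engines built against older Unicode versions may report a slightly different set.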

And an XML document that has each of these characters repeated 4 times in separate <text> elements:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>



</text>
  <text>



</text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text></text>
</paragraph>
