Description
In XML, only \t, \n, \r, and the space character (U+0020) are considered whitespace and are affected by the xml:space attribute. However, when formatting an XML document with the xmlWhitespaceSensitivity option set to "ignore", @prettier/plugin-xml uses String.prototype.trim() to strip whitespace. trim() also removes characters that XML does not treat as whitespace, so text that should be preserved is removed.
Lines 281 to 288 in 68b3430
path.each((charDataPath) => {
  const chardata = charDataPath.getValue();
  if (!chardata.TEXT) {
    return;
  }
  const content = chardata.TEXT.trim();
  const printed = group(
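A minimal sketch of one possible fix, assuming the intent is to strip only the four characters that XML treats as whitespace. The helper name `trimXmlWhitespace` is hypothetical and not part of @prettier/plugin-xml; it would stand in for the `chardata.TEXT.trim()` call in the snippet above.

```javascript
// Hypothetical helper (not part of @prettier/plugin-xml).
// Strips only XML 1.0 whitespace: space, tab, line feed, carriage return.
function trimXmlWhitespace(text) {
  return text.replace(/^[ \t\n\r]+|[ \t\n\r]+$/g, "");
}

// "foo" preceded by XML whitespace and followed by 4 U+00A0 characters.
const sample = "\n  foo\u00A0\u00A0\u00A0\u00A0";

console.log(sample.trim().length);              // 3: the no-break spaces are lost
console.log(trimXmlWhitespace(sample).length);  // 7: the no-break spaces survive
```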
For example, this document has a <text> element with 4 trailing U+00A0 No-Break Space characters:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
<text>foo </text>
<text>bar</text>
</paragraph>
Formatting it removes these 4 trailing characters:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
<text>foo</text>
<text>bar</text>
</paragraph>
Because of this behavior, formatting is not idempotent for documents with elements that contain only non-breaking spaces: the output depends on how many times the formatter is run. Given this input:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
<text> </text>
</paragraph>
This is the output after formatting the input once:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
<text></text>
</paragraph>
And this is the output after formatting it twice:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
<text />
</paragraph>
Here's a list of affected characters:
- U+00A0 No-Break Space
- U+1680 Ogham Space Mark
- U+2000 En Quad
- U+2001 Em Quad
- U+2002 En Space
- U+2003 Em Space
- U+2004 Three-Per-Em Space
- U+2005 Four-Per-Em Space
- U+2006 Six-Per-Em Space
- U+2007 Figure Space
- U+2008 Punctuation Space
- U+2009 Thin Space
- U+200A Hair Space
- U+2028 Line Separator
- U+2029 Paragraph Separator
- U+202F Narrow No-Break Space
- U+205F Medium Mathematical Space
- U+3000 Ideographic Space
- U+FEFF Zero Width No-Break Space
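The list above can be cross-checked with a short script. This is a sketch that assumes a reasonably recent JavaScript engine; the names `XML_WHITESPACE` and `trimmedButNotXmlWhitespace` are illustrative. It scans the Basic Multilingual Plane for characters that `String.prototype.trim()` removes but that XML does not treat as whitespace.

```javascript
// The four characters XML 1.0 treats as whitespace: tab, LF, CR, space.
const XML_WHITESPACE = new Set([0x09, 0x0a, 0x0d, 0x20]);

// Returns the BMP code points that trim() strips although XML keeps them.
function trimmedButNotXmlWhitespace() {
  const found = [];
  for (let cp = 0; cp <= 0xffff; cp++) {
    if (XML_WHITESPACE.has(cp)) continue;
    // A single-character string that trims to "" is whitespace for trim().
    if (String.fromCharCode(cp).trim() === "") found.push(cp);
  }
  return found;
}

for (const cp of trimmedButNotXmlWhitespace()) {
  console.log("U+" + cp.toString(16).toUpperCase().padStart(4, "0"));
}
```

The output may also include U+000B and U+000C. JavaScript trims them, but they are not legal characters in an XML 1.0 document, so they cannot occur in the input.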
Here is an XML document that has each of these characters repeated 4 times in separate <text> elements:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text>
</text>
<text>
</text>
<text> </text>
<text> </text>
<text> </text>
<text></text>
</paragraph>
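Because most of these characters are invisible and easily lost when copying, a test document like the one above may be easier to regenerate than to paste. The following is a sketch under the assumption that each character is written literally and repeated 4 times; `AFFECTED` and `buildTestDocument` are illustrative names, and the two-space indentation is a choice made here.

```javascript
// Code points from the list of affected characters.
const AFFECTED = [
  0x00a0, 0x1680, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
  0x2007, 0x2008, 0x2009, 0x200a, 0x2028, 0x2029, 0x202f, 0x205f, 0x3000,
  0xfeff,
];

// Builds an XML document with one <text> element per code point,
// each containing that character repeated `repeat` times.
function buildTestDocument(codePoints, repeat = 4) {
  const rows = codePoints.map(
    (cp) => `  <text>${String.fromCodePoint(cp).repeat(repeat)}</text>`
  );
  return [
    '<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>',
    "<paragraph>",
    ...rows,
    "</paragraph>",
    "",
  ].join("\n");
}

console.log(buildTestDocument(AFFECTED));
```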