
tidy-html5 generates invalid HTML5 when href attributes contain illegal characters #352

@ncouture


When tidy-html5 is asked to fix URIs with the fix-uri option (which is enabled by default), href attribute values in its output still contain illegal characters.

Case in point, from the fix-uri documentation:

fix-uri

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should check attribute values that carry URIs for illegal characters and if such are found, escape them as HTML4 recommends.
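For comparison, here is a minimal Python sketch of the escaping one might expect for the href in this report. It is not tidy's implementation: the allowed set is an assumption taken from the WHATWG URL Standard's ASCII "URL code points", `%` and `#` are passed through on the further assumption that they are an existing escape and a fragment delimiter, and `escape_uri_chars` is a hypothetical name. Every other character is written as the %HH escapes of its UTF-8 bytes, which is what HTML 4 recommends for non-ASCII characters in URI attribute values.

```python
import string

# ASCII "URL code points" as listed in the WHATWG URL Standard:
# ASCII alphanumerics plus the punctuation below.
URL_ASCII_CODE_POINTS = frozenset(
    string.ascii_letters + string.digits + "!$&'()*+,-./:;=?@_~"
)

# Assumption of this sketch: '%' (start of an existing escape) and
# '#' (fragment delimiter) are left untouched.
PASSTHROUGH = URL_ASCII_CODE_POINTS | {"%", "#"}


def escape_uri_chars(value: str) -> str:
    """Hypothetical helper: percent-encode characters outside PASSTHROUGH.

    Each such character is encoded to UTF-8 and every byte is written
    as %HH (uppercase hex).
    """
    out = []
    for ch in value:
        if ch in PASSTHROUGH:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(escape_uri_chars(":|"))     # :%7C
print(escape_uri_chars("a b/é"))  # a%20b/%C3%A9
```

Under these assumptions, `:|` would become `:%7C`, since `|` (U+007C) is not a URL code point.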

Example and reproduction steps

 $ echo '<meta charset="utf-8"><title>xyz<a href=":|">invalid HTML5</a>' | tidy --doct --tidy-mark no --fix-uri yes - 2> /dev/null 

Output:

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>xyz</title>
</head>
<body>
<a href=":|">invalid HTML5</a>
</body>
</html>
$ echo $(!!) | html5check -h

Output:

Error: Bad value “:|” for attribute “href” on element “a”: Illegal character in path segment: not a URL code point.
From line 1, column 88; to line 1, column 100
There were errors. (Tried in the text/html mode.)

html5check: https://about.validator.nu/html5check.py
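To spot values like this before running the validator, a rough pre-check can list the offending characters. The Python sketch below is a simplification and not the validator's algorithm: `illegal_url_chars` is a hypothetical helper, the allowed set is assumed to be the WHATWG URL Standard's ASCII URL code points, `%` and `#` are skipped rather than validated, and non-ASCII characters are accepted without further checks.

```python
from __future__ import annotations

import string

# ASCII "URL code points" per the WHATWG URL Standard (assumed allowed set).
URL_ASCII_CODE_POINTS = frozenset(
    string.ascii_letters + string.digits + "!$&'()*+,-./:;=?@_~"
)


def illegal_url_chars(value: str) -> list[tuple[int, str]]:
    """Return (index, character) pairs for ASCII characters in `value`
    that are not URL code points.

    Simplifications: '%' and '#' are never reported, and non-ASCII
    characters are always accepted.
    """
    return [
        (i, ch)
        for i, ch in enumerate(value)
        if ch.isascii() and ch not in URL_ASCII_CODE_POINTS and ch not in "%#"
    ]


print(illegal_url_chars(":|"))    # [(1, '|')]
print(illegal_url_chars(":%7C"))  # []
```

Run against the href from this report, it flags the `|` at index 1, while the percent-encoded form `:%7C` reports nothing.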
