Unescaped & emitted despite using **output-xhtml** key bindings in 5.6.0 in PHP bindings #704
Comments
@TysonAndre thank you for your issue, but I am sorry, I am a little confused as to what the actual
As you point out passing the simple string
And if I pass the following to current tidy, with or without
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>Is #704-1</title>
</head>
<body>
hello & bye
</body>
</html>
I get back
Although I have not yet checked any references about
This suggests the commit 3a524f1, made for #207, should have also extended to
But that seems the opposite of what you are seeing in the PHP binding
Now I can see some reasons for this, maybe? You are parsing a
What do you get if you pass my above sample to the PHP bindings? I am guessing you should get
And I just remembered I have built a unix php 7.3.0-dev on 20180212, installed in
So I put your above sample
And that further reminded me that I also built it in Windows, using MSVC, installed in
Same result as in unix! But maybe I do not quite understand the issue... please provide more feedback... thanks...
Strange. I was able to reproduce with 3a30f6a (master).
I'm still seeing
Yes. For example, the below script would be rejected by PHP's libxml parsing (If you come across any options to accept unambiguous ampersands, they'd be useful to me)
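The rejection is not specific to PHP: a conforming XML parser treats a bare `&` in character data as a well-formedness error. The following is a minimal sketch using Python's standard library as a stand-in for PHP's libxml-based parsing (Python's `xml.etree.ElementTree` is expat-based, not libxml, and the `is_well_formed` helper name is hypothetical):

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        # Wrap in a dummy root so a bare text fragment can be parsed at all.
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        # A bare '&' that does not start an entity reference ends up here.
        return False


raw_ok = is_well_formed("hello & bye")          # unescaped ampersand
escaped_ok = is_well_formed("hello &amp; bye")  # escaped ampersand
print(raw_ok, escaped_ok)
```

Under these assumptions the unescaped fragment is rejected and the escaped one is accepted, which is why XHTML output that keeps a bare `&` cannot be fed to an XML parser downstream.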
The default setting for the W3C validator is HTML. Did you select XML fragments? EDIT: I see "in some contexts". Is it possible to add a flag to force escaping as
@TysonAndre thank you for the further feedback...
Let me deal with the validator first. As I am sure you are aware there are two validators, run by different technologies. There is the
So if you pass just your fragment,
And if you pass just the fragment to
If you check the default,
As stated, this implies that ampersand is valid in HTML5, and XHTML5...
Now I can not locate specific W3C that documents what we could call an
And as you pointed out in the
Now I have pushed 3 tests to my repo: in_704.html - your initial fragment, in_704-1.html - the above full xhtml5 doc, and in_704-2.html - full html5 document. These files can be validated using a URL like - https://rawgit.com/geoffmcl/tidy-test/master/test/input5/in_704-1.html, or in_704-2.html, both of which PASS.
And this leads me to the suggestion that the fix for #207 is incomplete, in that at present tidy will warn and escape
So your further feedback leaves 2 things -
The first is simple. As discussed in #673, and have now added PR #705, the
And if I checkout the
It is only if I checkout, build and install the 'issue-673' branch will I get
Now concerning the building of tidy, and php, first you show the tidy build as
Well first the
I use
Then I am not sure what
That gets me
And it is worth doing something like
You should only have a current active link
Now concerning the building of php I used
And in that bin dir, if I run
And as stated this php gives me
So this clearly explains why we are seeing something different. That is commit 67eaeb6 is not in release
And as indicated, only
Would need to think about this option a little, and consequences, but it should be possible... if there is a need, a strong use case... You suggest because php's libxml doesn't accept that, but is that not a case to fix libxml, not add an option to tidy...
Does this add anything? thanks...
Sorry. I left out the install prefix assuming it was implied. To be clear, I used an install prefix in all commands. I also checked the tidy version, and I'm also confused about that. There may be something I missed. The linked spec appears to be for html5, and has limited references to XHTML. In any case, the library I'm using works with html fragments.
I would support a new option to do that.
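Until such an option exists, one possible stopgap is to post-process the serialized output and escape any `&` that does not already begin a character reference. This is only a sketch of that idea, not something tidy or the PHP bindings provide; the `escape_bare_ampersands` name and the regular expression are assumptions for illustration, and the pattern does not account for CDATA sections, comments, or script content:

```python
import re

# Assumed pattern: an '&' NOT followed by a numeric reference (&#38; / &#x26;)
# or a named reference (&amp;, &nbsp;, ...) terminated by ';'.
_BARE_AMP = re.compile(r"&(?!(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);)")


def escape_bare_ampersands(markup: str) -> str:
    """Escape ampersands that do not start a character reference."""
    return _BARE_AMP.sub("&amp;", markup)


print(escape_bare_ampersands("hello & bye"))
print(escape_bare_ampersands("already &amp; escaped, &#38; too"))
```

Existing references are left alone, so running the function twice does not double-escape. Named references other than the five predefined XML entities (for example `&nbsp;`) pass through unchanged and would still need a DTD or separate handling before a plain XML parser accepts them.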
@TysonAndre have now merged PR #705 and hope that closes this... and #673. If I missed something please feel free to re-open, or open a new issue... thanks
This occurred when I upgraded from tidy-html5 5.4.0 to 5.6.0. I also upgraded php at the same time to 7.1.14, but php-src/ext/tidy has no recent changes
#207 (comment) seems related. I don't see any discussion about how xhtml would be affected in #207
http://validator.w3.org/check would warn about
hello & bye
as an XHTML fragment. If XHTML5 is HTML5 represented as XML, shouldn't that be automatically fixed by tidy-html5?
The below script reproduces this bug with PHP 7.1.9 (running tidy-html5 5.4.0) and 7.1.14 (built with tidy-html5 5.6.0)
More details: This seems specific to the PHP bindings (Or maybe CLI bindings are setting options that I missed).
echo 'hello & bye' | ./tidy -asxhtml
works properly for me, generating hello &amp; bye in both tidy-html5 5.4.0 and 5.6.0 (with make clean then rebuilding with git checkouts).
When I rebuilt php 7.1.14's tidy extension with tidy-html5-5.4.0, the result was hello &amp; bye, as expected/wanted.
For details on how I built php, see #673 (comment)