Test 427664 has different output on ARM, Intel platforms, maybe others... #266
Comments
@vielmetti thanks for reporting this difference. As you point out, this does not seem to happen on Linux (64-bit Intel), nor on Windows (32-bit and 64-bit builds), but I will carefully re-test these 3 situations. I do not think it happened on Linux (32-bit Intel) either, but that machine went down, so I cannot test there again. So I am not sure it is a 32 vs 64 bit situation, nor memory byte order... although I guess ARM is big-endian? Which is opposite to Intel...

We are talking about a byte sequence after the word 'name', between the double quotes.

First, looking at the source - Input: input\in_427664.html. Editor shows: Ã1/2

But that is strange, in that the comment in the source says -

So the first question is why, how, and when did this file get changed? I will check its history... Hmmm, it is also C3 31 2F 32 in the 2009 SF CVS source, so it was inherited this way by this repo. But that puzzle has nothing to do with the different tidy text output shown here.

Then, looking at testbase: testbase\msg_427664.txt. Editor shows: name "??1/2"

WOW, now we are getting " ef bf bf ef bf bf 31 2f 32 "! And when I run this test in Windows and Linux, I get that same strange output, so no diff.

Unfortunately I think the values shown first in this post have been

But as stated, I think these are

So I need you to -
Please, not all of it, just the sequence that causes the trouble. And I do not think you need to build other test cases at this stage. Quite likely, if we can solve this one difference in character encoding, it might solve them all.

Meantime I will try to understand why

Welcome to the crazy world of character encodings ;=))
I have not yet had a chance to dig further into this 427664 difference. While the above is about a difference that seems to be between Intel and ARM CPUs, I have now identified two more differences between Linux and Windows, both on 64-bit Intel machines...
It seems the only solutions for 1. are to exclude it, or to mangle the Windows app a little to use unix path separators for this filename output, which is not too difficult to fix, and to fix the unix script to avoid the leading

The output in 2. has different line numbers, and even has a different doctype. Very, very strange! I need to investigate more...

Errata: Above item 2 erroneously had 878205, now fixed.
@vielmetti just an update on these test differences...

test 676205

Found the problem with test 676205. As indicated, it seemed like different files were being processed, and that was it. If the file input/in_676205.html, which has no doctype, is deleted, the test input will be the correct one, namely input/in_676205.xhtml, and the output will then exactly match that in testcases/out_676205.html. I guess this was a difference in how the test file was selected by the Linux and Windows scripts: Linux was selecting the HTML, and Windows the XHTML, with the correct doctype. Will push this fix, well actually just a deletion, soonest...

test 431895

As explained above, the difference in test 431895 cannot be avoided, since it is the one case where the filename is output to the message file, and thus there can always be differences between the platform path separators. Correct it for one platform, and the others can show a difference. The only choices here are -
Any help on option 3 is very welcome... it should not be too difficult.

test 427664

As both my machines have Intel CPUs, I cannot replicate this problem, so I can do nothing more to try to understand the difference on an ARM. However, recently there was a change in the internal moving of characters to the lexer, issue #286, and maybe this would change something to do with this? All the recent development in the

So if you get a chance to pull the latest
Well, fixing 431895 using option 3 turned out easier than expected. The TidyEmacsFile name is set for each file in the input -

The following patch should do it!
If someone wants to pick this up and test it, and maybe improve the getEmacsFilename() function, that would be great. I only quickly tested it in Windows. Thanks...
Also, it would be neat if we could get some kind of simple command (
@mcepl sorry, almost missed this comment... not sure neat qualifies as a

We have opened a discussion on the tests and testing procedures, see #330. Once we have all the tests sorted out, fixed, running correctly, testing and showing what we want, yes, it would be possible to add a

As a Windows developer, I have already scripted that to -
The equivalent windows cmake command to
So either would work for me. If you want to add that to the CMakeLists.txt in your fork, then I would appreciate a patch or PR, and it will certainly be tried. But as stated, please add that to the #330 discussion.

It seems the only other open discussion here is also on those very tests, so I will close this one, and cross-reference it from there...
This is one of two ARM related issues that I have isolated, where the output of tidy is different between two machines.

5 systems under test: 4 on Travis (Mac and Linux, gcc and clang compilers) and 1 on Raspberry Pi 2 (Raspbian/Hypriot with gcc 4.6.3).

The test is 427664 "Missing attr values cause NULL segfault". There is no segfault, but the output of the tidy command is different, specifically this test case:

On all 4 Intel systems under test, this passes. On the Pi 2, I get this diff:
It looks like a byte ordering issue in the message output. The output is coming from localize.c at https://github.com/htacg/tidy-html5/blob/master/src/localize.c#L85 and it looks like it is triggered from lexer.c at https://github.com/htacg/tidy-html5/blob/master/src/lexer.c#L3921.
Again, this all works on all of the 64 bit Intel systems that I have, and fails only on 32 bit ARM. There's nothing obviously wrong in the code, so my next step is to create some more interesting failing test cases.