stack exhaustion in tidy-html5 5.1.25 #343
Comments
@gaa-cifasis, hi Gus, yes this is a Tidy Feature ;=)) I guess if I searched back far enough in the bugs, in SF, I would find it mentioned, maybe several times... It is really easy to create a ridiculously long html sequence that will exhaust the stack... but it is usually not a very practical html file... You are welcome to commence a re-write of library tidy and do this another way. The current library uses recursive calls as the DOM-like stack/tree... I too could think of other, maybe better ways, like all of us probably, but this would be a major re-write of the parser!!! I remember reading, way back in history, that if you do run across such a ridiculous beast in the wild there are maybe ways to increase the stack allocation of the executable, if you really must use But hey, thanks for generating that beautiful 1.4 MB sample... and I look forward to ideas on a tidy re-write to handle ANYTHING ;=))
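To make the "do this another way" idea above concrete, here is a minimal sketch of walking a node tree with an explicit, heap-allocated stack instead of recursive calls. It is not tidy's code: `DemoNode` and `demo_count_nodes` are hypothetical names invented for this illustration, and the two-pointer node layout (first child, next sibling) is an assumption. A real parser rewrite would be far more involved, since it would also have to carry per-element parser state on that stack.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type for illustration only; not tidy's real Node. */
typedef struct DemoNode {
    struct DemoNode *content; /* first child, or NULL */
    struct DemoNode *next;    /* next sibling, or NULL */
} DemoNode;

/* Counts root, its later siblings, and all of their descendants without
 * recursion. Pending nodes live in a growable heap array, so nesting depth
 * is bounded by available memory rather than by the call stack.
 * Returns -1 if memory for the work stack cannot be allocated. */
static long demo_count_nodes(DemoNode *root)
{
    size_t cap = 16, top = 0;
    long count = 0;
    DemoNode **stack;

    if (root == NULL)
        return 0;

    stack = malloc(cap * sizeof *stack);
    if (stack == NULL)
        return -1;
    stack[top++] = root;

    while (top > 0) {
        DemoNode *n = stack[--top];
        count++;

        /* Make room for up to two pushes (sibling and child). */
        if (top + 2 > cap) {
            DemoNode **grown = realloc(stack, 2 * cap * sizeof *stack);
            if (grown == NULL) {
                free(stack);
                return -1;
            }
            stack = grown;
            cap *= 2;
        }
        if (n->next != NULL)
            stack[top++] = n->next;
        if (n->content != NULL)
            stack[top++] = n->content; /* child is popped first: depth-first */
    }

    free(stack);
    return count;
}
```

With this shape, a chain of a few hundred thousand nested nodes is walked in a loop, where the equivalent recursive walk would need one call frame per nesting level.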
How can this be a "Feature"?
@benkasminbullock, well only in the sense that it has existed since the first tidy I have, Dave Raggett's tidy04aug00, so this stack exhaustion is built into the fabric of tidy code... built into the way it works... so it is a I do not want to label it a But this big-bad-bug, or limitation of tidy has no solution other than a full re-write, a redesign, of the parser code. It is maybe a If I was in your repo I could add labels like I have added
@gaa-cifasis, @benkasminbullock in combing through, and closing all the old SF bugs, found this was mentioned as far back as 2005-12-01, by Lee Jensen, Bug 742! Now what Lee mentioned, and we discussed, was that maybe there could be configuration options like Already in my MSVC Debug code I have a static counter for the depth of say the calls to
A second run stopped at 5753, then 5750,... but, for sure it stops! Naturally each exit from
Of course using a static counter is not the best, since we would also have to ensure it is cleared for each document, but it could be a simple variable as part of the TidyDocument lexer structure, which is created new for each document... And we might need to count more than just Anyway, this seems like a viable possibility... What do others think? Should we bother? Thoughts, comments, even a PR welcome... thanks...
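The per-document depth counter discussed above could look roughly like the following sketch. Everything here is hypothetical and for illustration only: `DemoDoc`, `demo_doc_init`, `demo_parse_element` and `demo_parse_document` are invented names, the "markup" is just nested parentheses, and the limit is an assumed configurable value, not an existing tidy option.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-document state; stands in for a field that could live
 * in the document/lexer structure. Not tidy's real struct. */
typedef struct {
    unsigned depth;     /* current number of nested parse calls */
    unsigned max_depth; /* assumed configurable limit */
    int limit_hit;      /* set once the limit has been reached */
} DemoDoc;

static void demo_doc_init(DemoDoc *doc, unsigned max_depth)
{
    doc->depth = 0;
    doc->max_depth = max_depth;
    doc->limit_hit = 0;
}

/* Toy recursive parser: '(' opens an element, ')' closes it, anything else
 * is text. Each call checks the counter before recursing further and
 * decrements it again on every exit path. */
static const char *demo_parse_element(DemoDoc *doc, const char *p)
{
    if (doc->depth >= doc->max_depth) {
        doc->limit_hit = 1; /* refuse to go deeper instead of crashing */
        return p;
    }
    doc->depth++;
    while (*p != '\0' && !doc->limit_hit) {
        if (*p == '(') {
            p = demo_parse_element(doc, p + 1);
        } else if (*p == ')') {
            p++;
            break;
        } else {
            p++;
        }
    }
    doc->depth--;
    return p;
}

/* Returns 0 on success, -1 if the nesting limit stopped the parse.
 * The counter is reset per document, so no static state is needed. */
static int demo_parse_document(DemoDoc *doc, const char *text)
{
    doc->depth = 0;
    doc->limit_hit = 0;
    demo_parse_element(doc, text);
    assert(doc->depth == 0); /* every entry was matched by an exit */
    return doc->limit_hit ? -1 : 0;
}
```

Because a safe limit depends on frame size, compiler, and platform stack size, a sketch like this only makes sense with the limit exposed as a setting rather than hard-coded.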
Would the static counter be configurable? Otherwise we would need a really good sample from different processors, as this would certainly give different results on x86 versus 68020 (assuming we could compile it).
@balthisar good idea to close this... since it would be quite an effort to do something meaningful in a cross platform, multi-cpu way... But really the stacked calls are the heart of tidy functionality... the best way would be a total rewrite! Employing an entirely different methodology... Did not know we had a
Hi Lilly,
Re: libtidy stack overflow
As discussed in this issue - circa 2016 -
As can be seen, there has been some discussion
So at this time, there is NO SOLUTION pending...
We, at Tidy, would certainly appreciate any further
Regards,
PS: Will cross post this there...
On 30/07/17 23:53, Lilly Random wrote:
5.9.9 fixes this.
Hello,
We found a stack exhaustion in tidy-html5 (version: 5.1.25). You can find a test case to reproduce it here [1.4MB]. Technical details are here:
$ gdb -ex 'tty /dev/null' --args ./tidy exhaustion.html
(gdb) run
Starting program: ./tidy exhaustion.html
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff4e53b2b in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.0
(gdb)
(gdb) bt
#0 0x00007ffff4e53b2b in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.0
#1 0x00007ffff4e60443 in malloc () from /usr/lib/x86_64-linux-gnu/libasan.so.0
#2 0x000000000047c581 in defaultAlloc (allocator=0x721de0 <prvTidyg_default_allocator>, size=2048)
#3 0x000000000040ecf6 in messagePos (doc=0x607c00018900, level=TidyWarning, line=6169, col=44, msg=0x497f60 "nested emphasis %s",
#4 0x000000000040f9da in messageNode (doc=0x607c00018900, level=TidyWarning, node=0x601600f086d0, msg=0x497f60 "nested emphasis %s")
#5 0x00000000004113d5 in prvTidyReportWarning (doc=0x607c00018900, element=0x601600f086d0, node=0x601600f08620, code=9)
#6 0x000000000043bcdd in prvTidyParseInline (doc=0x607c00018900, element=0x601600f086d0, mode=MixedContent)
#7 0x0000000000436ecf in ParseTag (doc=0x607c00018900, node=0x601600f086d0, mode=MixedContent)
#8 0x000000000043e07a in prvTidyParseInline (doc=0x607c00018900, element=0x601600f08780, mode=MixedContent)
#9 0x0000000000436ecf in ParseTag (doc=0x607c00018900, node=0x601600f08780, mode=MixedContent)
#10 0x000000000043e07a in prvTidyParseInline (doc=0x607c00018900, element=0x601600f08830, mode=MixedContent)
#11 0x0000000000436ecf in ParseTag (doc=0x607c00018900, node=0x601600f08830, mode=MixedContent)
#12 0x000000000043e07a in prvTidyParseInline (doc=0x607c00018900, element=0x601600f088e0, mode=MixedContent)
#13 0x0000000000436ecf in ParseTag (doc=0x607c00018900, node=0x601600f088e0, mode=MixedContent)
#14 0x000000000043e07a in prvTidyParseInline (doc=0x607c00018900, element=0x601600f08990, mode=MixedContent)
#15 0x0000000000436ecf in ParseTag (doc=0x607c00018900, node=0x601600f08990, mode=MixedContent)
#16 0x000000000043e07a in prvTidyParseInline (doc=0x607c00018900, element=0x601600f08a40, mode=MixedContent)
#17 0x0000000000436ecf in ParseTag (doc=0x607c00018900, node=0x601600f08a40, mode=MixedContent)
#18 0x000000000043e07a in prvTidyParseInline (doc=0x607c00018900, element=0x601600f08f10, mode=MixedContent)
#19 0x0000000000436ecf in ParseTag (doc=0x607c00018900, node=0x601600f08f10, mode=MixedContent)
#20 0x000000000043e07a in prvTidyParseInline (doc=0x607c00018900, element=0x601600f08fc0, mode=MixedContent)
#21 0x0000000000436ecf in ParseTag (doc=0x607c00018900, node=0x601600f08fc0, mode=MixedContent)
#22 0x000000000043e07a in prvTidyParseInline (doc=0x607c00018900, element=0x601600f09070, mode=MixedContent)
#23 0x0000000000436ecf in ParseTag (doc=0x607c00018900, node=0x601600f09070, mode=MixedContent)
---Type <return> to continue, or q <return> to quit---q
(.. really long back trace)
In my opinion, tidy-html5 shouldn't crash just because an input can force it to execute a really long sequence of nested function calls (maybe tail recursion can help?).
Regards,
Gus.