
GH-130396: Work around for broken pthread_get_stackaddr_np on Emscripten #131088


Merged


hoodmane
Contributor

@hoodmane hoodmane commented Mar 11, 2025

Emscripten reports having `pthread_get_stackaddr_np` but it does nothing. I opened an upstream PR to actually fill in the stack information: emscripten-core/emscripten#23887

Until we can update to an Emscripten version that includes this fix, we can work around it by using `#define`s to replace `pthread_get_stackaddr_np` with the modified version.
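For illustration, a minimal sketch of what that `#define` technique can look like, assuming Emscripten's `emscripten_stack_get_base()` from `<emscripten/stack.h>`; the wrapper name is hypothetical, and the actual replacement lives in this PR's diff:

```c
// Sketch only: illustrates the #define redirection technique; not the
// literal patch. Assumes <emscripten/stack.h> provides
// emscripten_stack_get_base(), which reports the high end (base) of the
// wasm spill stack for the current thread.
#ifdef __EMSCRIPTEN__
#include <emscripten/stack.h>
#include <pthread.h>

static void *
_Py_pthread_get_stackaddr_np(pthread_t thread)  // hypothetical name
{
    (void)thread;  // Emscripten can only report the current thread's stack
    return (void *)emscripten_stack_get_base();
}

// Redirect all later callers to the working replacement.
#define pthread_get_stackaddr_np _Py_pthread_get_stackaddr_np
#endif
```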

Contributor

@freakboy3742 freakboy3742 left a comment


The general approach makes sense to me - although I'd like confirmation from @markshannon that he's OK with this as well.

In terms of functionality: this definitely works a lot better - I'm not getting the `-sASSERTIONS` error we were seeing in our pairing session earlier this week. However, it's not passing a full test run for me, for reasons that seem related to (or at least intertwined with) this change:

$ ./python.sh -m test -v --rerun -W
...
0:02:30 load avg: 0.10 [250/485] test_ioctl skipped
0:02:30 load avg: 0.10 [251/485] test_ipaddress
0:02:30 load avg: 0.10 [251/485] test_ipaddress passed
0:02:30 load avg: 0.10 [252/485] test_isinstance
wasm://wasm/02937f2a:1


RangeError: Maximum call stack size exceeded
    at wasm://wasm/02937f2a:wasm-function[12560]:0x65658a
    at wasm://wasm/02937f2a:wasm-function[2087]:0x1ceea7
    at wasm://wasm/02937f2a:wasm-function[2128]:0x1d01e1
    at wasm://wasm/02937f2a:wasm-function[3676]:0x2b02da
    at wasm://wasm/02937f2a:wasm-function[3677]:0x2b03c7
    at wasm://wasm/02937f2a:wasm-function[2289]:0x1db42d
    at wasm://wasm/02937f2a:wasm-function[3290]:0x26f6ef
    at wasm://wasm/02937f2a:wasm-function[3291]:0x27f317
    at wasm://wasm/02937f2a:wasm-function[760]:0x16d46b
    at wasm://wasm/02937f2a:wasm-function[759]:0x16d397

Node.js v20.17.0

It happens a lot earlier in the test run if you use `-uall` (as we would be doing in a CI config):

$ ./python.sh -m test -v -uall --rerun -W
...
0:00:22 load avg: 0.10 [ 98/485] test_build_details
0:00:22 load avg: 0.10 [ 98/485] test_build_details passed
0:00:22 load avg: 0.10 [ 99/485] test_builtin
wasm://wasm/02937f2a:1


RangeError: Maximum call stack size exceeded
    at wasm://wasm/02937f2a:wasm-function[2011]:0x1c23ef
    at wasm://wasm/02937f2a:wasm-function[3218]:0x266cf6
    at wasm://wasm/02937f2a:wasm-function[2011]:0x1c241d
    at wasm://wasm/02937f2a:wasm-function[3218]:0x266cf6
    at wasm://wasm/02937f2a:wasm-function[2011]:0x1c241d
    at wasm://wasm/02937f2a:wasm-function[3218]:0x266cf6
    at wasm://wasm/02937f2a:wasm-function[2011]:0x1c241d
    at wasm://wasm/02937f2a:wasm-function[3218]:0x266cf6
    at wasm://wasm/02937f2a:wasm-function[2011]:0x1c241d
    at wasm://wasm/02937f2a:wasm-function[3218]:0x266cf6

Node.js v20.17.0

@bedevere-app

bedevere-app bot commented Mar 12, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase `I have made the requested changes; please review again`. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@hoodmane
Contributor Author

Pyodide gives much better stack traces on this sort of failure. I think one thing it does is call _Py_DumpTraceback(). I'll try to add that logic.
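A rough sketch of that idea, assuming a handler can run at the point where the fatal wasm trap is caught; `_Py_DumpTraceback()` is CPython's existing async-signal-safe traceback dumper, while the hook function and its wiring are hypothetical:

```c
// Sketch only: how one might dump the Python traceback when a wasm trap
// (e.g. "Maximum call stack size exceeded") takes the process down.
#ifndef Py_BUILD_CORE
#  define Py_BUILD_CORE
#endif
#include "Python.h"
#include "pycore_traceback.h"  // _Py_DumpTraceback(); internal header

static void
dump_python_stack_on_trap(void)  // hypothetical hook
{
    PyThreadState *tstate = PyGILState_GetThisThreadState();
    if (tstate != NULL) {
        // fd 2 == stderr; safe to call without touching the Python heap.
        _Py_DumpTraceback(2, tstate);
    }
}
```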

@hoodmane
Contributor Author

When I ran the tests locally with this fix I think they all passed, though I didn't pass -uall.

@freakboy3742
Contributor

> When I ran the tests locally with this fix I think they all passed, though I didn't pass -uall.

I've been able to reproduce this failure locally (on macOS) and on the buildbot machine (linux).

@hoodmane
Contributor Author

Indeed, I can reproduce it too. Maybe let's just add `@support.skip_emscripten_stack_overflow()` to `test_infinite_cycle_in_bases` for now; it's only one more recursion test that is broken...

@freakboy3742
Contributor

With the fix from #131158 applied, running on macOS, the test suite completes, but I get three test failures (test_os, test_shutil, and test_unicode_file_functions). However, they all appear to be due to filesystem access issues, unrelated to this PR.

I'm also seeing failures with `-uall` in test_builtin, test_capi, and test_descr; there are probably others, but that's as far as I got by manually disabling individual tests with `-x`. However, as with the test_isinstance failure, I get the impression that these are more traditional stack overflow issues that need `@support.skip_emscripten_stack_overflow()` markers (some of them already have `skip_wasi_stack_overflow`). We can clean up those and the filesystem tests in a separate PR.

The new stack traces are really pretty though 😄

So - I'm happy to sign off on this; once we've got confirmation from @markshannon that he doesn't object to the approach, we can merge and we're one step closer to a buildbot!

@hoodmane
Contributor Author

The fix for this just barely missed inclusion into Emscripten 4.0.5 but it will definitely be in 4.0.6.

@markshannon
Member

This approach seems fine to me.

Protecting against stack overflow is very much a "practicality beats purity" thing.
We want extremely few, ideally no, cases where the VM crashes, and within that constraint we want to maximize the usable stack.

I used `pthread_get_stackaddr_np` because it happens to work for Linux and similar systems. If there is something else that works better for the WebAssembly platforms, please feel free to use that.
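For context, a minimal sketch of the general pattern under discussion, assuming a platform that provides both `pthread_get_stackaddr_np()` and `pthread_get_stacksize_np()` and a downward-growing stack; the helper name and margin value are illustrative, not CPython's actual code:

```c
// Sketch of deriving a soft stack limit for overflow checks.
// Assumes pthread_get_stackaddr_np() returns the high end (base)
// of a downward-growing stack.
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

static uintptr_t
soft_stack_limit(void)  // hypothetical helper
{
    pthread_t self = pthread_self();
    uintptr_t base = (uintptr_t)pthread_get_stackaddr_np(self);
    size_t size = pthread_get_stacksize_np(self);
    size_t margin = size / 8;  // illustrative safety margin
    // Addresses below this are treated as "too deep".
    return base - size + margin;
}

// Typical check at a recursion point:
//   char local;
//   if ((uintptr_t)&local < soft_stack_limit()) { /* raise RecursionError */ }
```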

@hoodmane
Contributor Author

The goal of Emscripten is to behave as much like Linux as possible, subject to the architecture and platform constraints and the desire to minimize code size. So in cases like this, finding discrepancies with normal Linux behavior and resolving them in Emscripten is ideal.

If there were a cheap way to ask about the true stack in addition to the spill stack, that would be great, but as it is, at least we can ask about the spill stack correctly. Maybe now, if we set the size of the spill stack to be fairly small, the stack checks will work a bit more often.
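For reference, Emscripten's spill stack size is a link-time setting; a hedged example of shrinking it, where the value and file names are illustrative only:

```
emcc main.o -o python.mjs -sSTACK_SIZE=524288   # illustrative 512 KiB spill stack
```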

@freakboy3742 freakboy3742 merged commit f3e275f into python:main Mar 13, 2025
47 checks passed
@hoodmane hoodmane deleted the emscripten-pthread_get_stackaddr_np branch March 14, 2025 09:24
plashchynski pushed a commit to plashchynski/cpython that referenced this pull request Mar 17, 2025
GH-130396: Work around for broken pthread_get_stackaddr_np on Emscripten (python#131088)

Implements a workaround implementation of `pthread_get_stackaddr_np` for Emscripten.
This will be replaced by an implementation that will be included in Emscripten 4.0.6.
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
GH-130396: Work around for broken pthread_get_stackaddr_np on Emscripten (python#131088)

Implements a workaround implementation of `pthread_get_stackaddr_np` for Emscripten.
This will be replaced by an implementation that will be included in Emscripten 4.0.6.