Skip to content

Parser stack overflow on WASI with --with-pydebug #131770

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
FFY00 opened this issue Mar 26, 2025 · 12 comments
Open

Parser stack overflow on WASI with --with-pydebug #131770

FFY00 opened this issue Mar 26, 2025 · 12 comments
Assignees
Labels
OS-wasi release-blocker topic-parser type-bug An unexpected behavior, bug, or error

Comments

@FFY00
Copy link
Member

FFY00 commented Mar 26, 2025

Bug report

Bug description:

$ python Tools/wasm/wasi.py configure-build-python -- --with-pydebug
...
$ python Tools/wasm/wasi.py make-build-python
...
$ python Tools/wasm/wasi.py configure-host -- --with-pydebug
...
$ python Tools/wasm/wasi.py make-host
...
$ wasmtime run --dir .::/ cross-build/wasm32-wasip1/python.wasm -m test.test_compile
Traceback (most recent call last):
  File "/Lib/runpy.py", line 189, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
                               ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/Lib/runpy.py", line 159, in _get_module_details
    code = loader.get_code(mod_name)
  File "<frozen importlib._bootstrap_external>", line 896, in get_code
  File "<frozen importlib._bootstrap_external>", line 826, in source_to_code
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
MemoryError: Parser stack overflowed - Python source too complex to parse

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Linked PRs

@FFY00 FFY00 added type-bug An unexpected behavior, bug, or error OS-wasi topic-parser labels Mar 26, 2025
@FFY00
Copy link
Member Author

FFY00 commented Mar 26, 2025

I bisected this to 0142236, specifically, the following call, which uses the new _Py_ReachedRecursionLimitWithMargin function.

def add_level(self) -> None:
self.print("if (p->level++ == MAXSTACK || _Py_ReachedRecursionLimitWithMargin(PyThreadState_Get(), 1)) {")
with self.indent():
self.print("_Pypegen_stack_overflow(p);")
self.print("}")

PYOS_STACK_MARGIN_BYTES seems a bit low, causing it to fail. In my last run, for example, I am seeing p->level at 209, which is quite far from the MAXSTACK limit. I don't know exactly how the WASM double stack model works, but I suspect that our PYOS_STACK_MARGIN value might be a bit low. I think expecting WASI to have a big enough stack for the parser to be able to parse all regular Python source files in the repo should be a fairly reasonable expectation.

cc @markshannon

@brettcannon
Copy link
Member

So what are our options at this point?

  • Play with numbers until we find one that passes the test suite (that's normally what I do when our stack protection starts failing)?
  • Turn off the stack protection under WASI for what was changed (Implement stack overflow protection for webassembly #130397)?
  • Rip it out as I believe @hoodmane has suggested it might not actually work as intended?
  • Give up in debug builds in WASI as it's the most constant pain point and I don't know how useful it is?

I just worry the longer we don't fix #131769 the more pain it will take to get debug builds working again. Plus I don't want to go too far into the betas w/ this not working.

@FFY00
Copy link
Member Author

FFY00 commented Apr 22, 2025

In my opinion, I think we should disable the stack protection until WASM provides a mechanism that allow us to implement it correctly.

Due to WASM being sandboxed, stack overflows are not as big of a concern there compared to native runtimes, so I don't think we are losing a particularly large amount of value by disabling this functionality.

That said, I do understand it might be frustrating for @markshannon, as the one who implemented it, so I'd appreciate his thoughts on this.

@brettcannon
Copy link
Member

@hugovk I made this a deferred blocker as I don't think it matters for a beta since it's a behind-the-scenes mechanism, but I don't want to reach final w/o a solution as not being able to a debug build in WASI doesn't seem great.

@brettcannon
Copy link
Member

brettcannon commented Apr 30, 2025

I tried playing with various values to get a debug build to work to no avail.

#elif defined(__wasi__)
/* Web assembly has two stacks, so this isn't really a size */
# define PYOS_LOG2_STACK_MARGIN 9

With a value of 29 I get:

Exception ignored in the internal traceback machinery:
Traceback (most recent call last):
  File "/Lib/traceback.py", line 7, in <module>
    import textwrap
  File "/Lib/textwrap.py", line 8, in <module>
    import re
  File "/Lib/re/__init__.py", line 127, in <module>
    import functools
  File "/Lib/functools.py", line 517, in <module>
    _CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])
  File "/Lib/collections/__init__.py", line 447, in namedtuple
    __new__ = eval(code, namespace)
MemoryError: Parser stack overflowed - Python source too complex to parse
Traceback (most recent call last):
  File "/Lib/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Lib/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/Lib/test/__main__.py", line 1, in <module>
    from test.libregrtest.main import main
  File "/Lib/test/libregrtest/main.py", line 3, in <module>
    import re
  File "/Lib/re/__init__.py", line 127, in <module>
    import functools
  File "/Lib/functools.py", line 517, in <module>
    _CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])
  File "/Lib/collections/__init__.py", line 447, in namedtuple
    __new__ = eval(code, namespace)
object address  : 0x190ff20
object refcount : 6
object type     : 0x12e7698
object type name: MemoryError
object repr     : MemoryError('Parser stack overflowed - Python source too complex to parse')
lost sys.stderr

And with a value of 30 I get (truncated):

Fatal Python error: _Py_CheckRecursiveCall: Unrecoverable stack overflow (used 128 kB) while calling a Python object

You can use #133219 to get a pydebug of WASI again.

@brettcannon
Copy link
Member

A comment from Hood about the mechanism being a problem: #131855 (comment)

@markshannon
Copy link
Member

With a value of 29 I get:

Increasing PYOS_LOG2_STACK_MARGIN from 9 to 29 is increasing the stack limit 1 million times. Is that what you wanted to do?

@brettcannon
Copy link
Member

With a value of 29 I get:

Increasing PYOS_LOG2_STACK_MARGIN from 9 to 29 is increasing the stack limit 1 million times. Is that what you wanted to do?

Not really, I just want the test to pass without raising a MemoryError or blowing the stack like it used to.

@brettcannon
Copy link
Member

FYI I marked this as a deferred blocker, but I would like to get it resolved before b2, so once b1 comes out I plan to make this a release blocker.

@brettcannon
Copy link
Member

From #130397, the following snippet fails:

def test_format_layoutlist(self):
    def sample(indent=0, indent_size=2):
        return ttk._format_layoutlist(
        [('a', {'other': [1, 2, 3], 'children':
            [('b', {'children':
                [('c', {'children':
                    [('d', {'nice': 'opt'})], 'something': (1, 2)
                })]
            })]
        })], indent=indent, indent_size=indent_size)[0]

with the error message of:

➲ cross-build/wasm32-wasip1/python.sh tester.py
RecursionError: Stack overflow (used 128 kB) while calling a Python object
Normalization failed: type=MemoryError args=<unknown>

Also from the same issue, this sort of parsing failure shows up under test_ast, test_ttk_textonly, and test_unparse (but the last one only occurs when running the entire test suite).

@brettcannon
Copy link
Member

Now that b1 is out I have made this a release blocker for b2.

@markshannon
Copy link
Member

This looks like a WASI configuration problem. configure.ac contains the line AS_VAR_APPEND([LDFLAGS_NODIST], [" -z stack-size=16777216 -Wl,--stack-first -Wl,--initial-memory=41943040"]) which looks like it sets the wasm stack to 16M, but LLVM sets the C stack size to only 130k. We should probably reduce the wasm stack size and increase the C stack size.

@hoodmane

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OS-wasi release-blocker topic-parser type-bug An unexpected behavior, bug, or error
Projects
Development

No branches or pull requests

3 participants