Skip to content

GH-113655: Lower the C recursion limit for some platforms #124264

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 23, 2024

Conversation

thesamesam
Copy link
Contributor

@thesamesam thesamesam commented Sep 20, 2024

Lower the C recursion limit for HPPA, PPC64 and SPARC, as they use
relatively large stack frames that cause e.g. test_descr to hit
a stack overflow. According to quick testing, it seems that values
around 8000 are max for HPPA and PPC64 (ELFv1 ABI) and 7000 for SPARC64.
To keep things safe, let's use 5000 for PPC64 and 4000 for SPARC.

Co-authored-by: Sam James [email protected]

@@ -206,7 +206,7 @@ struct _ts {
// A debug build is likely built with low optimization level which implies
// higher stack memory usage than a release build: use a lower limit.
# define Py_C_RECURSION_LIMIT 500
#elif defined(__s390x__)
#elif defined(__BIG_ENDIAN__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining why big endian has a lower limit, with gh-113655 reference.

@vstinner
Copy link
Member

We currently reduce the C recursion limit just for s390x (which is big-endian), but in Gentoo, we hit the same issues on ppc64be, hppa, and sparc.
At least for ppc64, it seems like the ABI used on ppc64be means you end up with bigger stack frames.

That's strange that all big endian archs allocates more memory on the stack. What can explain that only big endian archs are affected?

@mgorny
Copy link
Contributor

mgorny commented Sep 20, 2024

I think it's a coincidence, and the logic is more related to the fact that big endian architectures aren't tested much. Perhaps the more appropriate logic would be to have a "more conservative" value in the default branch, and make it higher for platforms that we know are safe.

@mgorny
Copy link
Contributor

mgorny commented Sep 20, 2024

Ok, so I've played a bit with godbolt and got the following numbers for a dummy function with no args or variables:

  • ARM (32-bit): 8 bytes (BE/LE)
  • AArch64: 16 bytes (BE/LE)
  • x86: 8 bytes (+ 4 bytes in 32-bit programs)
  • LoongArch: 16 bytes
  • MIPS: 24/32 bytes (depending on ABI, BE/LE)
  • ppc64 (BE): 128 bytes
  • ppc64le: 112 bytes
  • ppc32: 16 bytes
  • riscv: 16 bytes (32/64-bit)
  • s390x: 160 bytes
  • s390: 96 bytes
  • SPARC: 96 bytes
  • SPARC64: 176 bytes

So, looks like ppc64/s390/SPARC are the platforms with unusually large stack frames. However, I don't see any obvious relation between byte order and stack frame size here. ppc64le's is only a little smaller than ppc64 BE, so I have no clue why we're hitting the crash with one but not the other — but maybe it just happened to be on the border. On all other biendian arches, stack frame seems to be of the same size on both variants.


I've used the following code:

void b() {
}

void a() {
    b();
}

Tested with various compilers, compiler options (-mbig-endian, -mlittle-endian, -m32…). Enabled "Stack Usage" via "Add new…"

@vstinner
Copy link
Member

So, looks like ppc64/s390/SPARC are the platforms with unusually large stack frames

Would it be possible to only use lower limit on these 3 archs? Rather than testing "big endian".

@mgorny
Copy link
Contributor

mgorny commented Sep 20, 2024

Yeah, I suppose so. @thesamesam also mentioned HPPA, but godbolt doesn't cover that.

@@ -206,7 +206,7 @@ struct _ts {
// A debug build is likely built with low optimization level which implies
// higher stack memory usage than a release build: use a lower limit.
# define Py_C_RECURSION_LIMIT 500
#elif defined(__s390x__)
#elif defined(__BIG_ENDIAN__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#elif defined(__BIG_ENDIAN__)
#elif defined(__hppa__) || defined(__powerpc64__) || defined(__s390__) || defined(__sparc__)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we could try to use different values for different arches, given that ppc64 could probably use a higher value. Probably worth double-checking SPARC64, as it has even bigger frames than s390x.

@thesamesam
Copy link
Contributor Author

That's strange that all big endian archs allocates more memory on the stack. What can explain that only big endian archs are affected?

I saw this question just before I went to bed and my assumption (which I wanted to prove/disprove today) was that it was an unfortunate artefact of ABI design and that if we ended up looking at the ABIs for newer (and therefore LE) arches, we'd end up seeing some notable differences.

But mgorny (thank you!) beat me to it and I think his analysis is right. I'll look a bit more then update this PR. Thanks again both.

@mgorny
Copy link
Contributor

mgorny commented Sep 21, 2024

Ok, now for some real hardware testing of when test_descr starts crashing. The highest recursion limits I was able to use (with regular build from main, gcc, default optimization level, GNU/Linux):

  • amd64/x86/aarch64/ppc32: 32k
  • ppc64 (be): 9k (8k with PGO + forced -O2)
  • ppc64le: 11k
  • s390x: 12k (7k with PGO + -O2)
  • sparc64: 7k (also with PGO + -O2)
  • pa-risc: over 10k (host too slow to test more)
  • alpha: over 10k (host too slow to test more)

It looks like amd64/x86/etc. are hitting some other issue that looks like a 16-bit signed counter. The values I've gotten for PPC64 suggests that LE was on the edge already. Surprisingly, my value for s390x is much higher than the one currently used — FWICS it was lowered due to buildbots failing.

Honestly, I don't know what values to put here.

@mgorny
Copy link
Contributor

mgorny commented Sep 22, 2024

@vstinner, with Sam's permission, I have pushed a commit that lowers the limits on PPC64 and SPARC instead.

Lower the C recursion limit for HPPA, PPC64 and SPARC, as they use
relatively large stack frames that cause e.g. `test_descr` to hit
a stack overflow.  According to quick testing, it seems that values
around 8000 are max for HPPA and PPC64 (ELFv1 ABI) and 7000 for SPARC64.
To keep things safe, let's use 5000 for PPC64 and 4000 for SPARC.

Co-authored-by: Sam James <[email protected]>
@thesamesam
Copy link
Contributor Author

Thanks! Happy now.

@vstinner Can you take another look please?

@thesamesam thesamesam requested a review from vstinner September 23, 2024 05:42
@vstinner vstinner merged commit 0e89f7a into python:main Sep 23, 2024
36 checks passed
@vstinner
Copy link
Member

Merged, thanks. The final commit is way better than the first iteration only checking for big endian ;-)

@thesamesam thesamesam deleted the stack-size-endian branch September 23, 2024 07:04
@thesamesam thesamesam changed the title GH-113655: Lower the C recursion limit for all big-endian platforms GH-113655: Lower the C recursion limit for some platforms Sep 23, 2024
@mgorny
Copy link
Contributor

mgorny commented Sep 23, 2024

Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants