Request SvPV_helper to be always inline #23571


Open · wants to merge 2 commits into base: blead

Conversation

khwilliamson
Contributor

I compiled toke.s using -O0 and looked at the output. It had a full copy of SvPV_helper in it. But this function is called with various constants that make much of the code dead when constant folded. That was not being done here.

This commit changes that by requesting inlining. Looking at a revised toke.s, I saw that SvPV_helper was inlined at multiple places in the code, and that each occurrence had significantly fewer instructions than the original. I infer that the compiler did the constant folding and pruned the dead code for each instance.

  • This set of changes does not require a perldelta entry.

It's usually a bad idea to try to work around a limitation in common
code by copy-pasting and then modifying to taste.  Fixes/improvements
to the common code rarely get propagated to the outlier.

I wrote code in 1ef9039 that did just this for the prototype
definition of SvPV_helper, because the place where it really belongs,
embed.fnc, couldn't (and still doesn't) handle function pointers as
arguments (patches welcome).

I should have at least added a comment to the common code noting the
existence of this outlier.

It turns out that that limitation can be worked around by declaring a
typedef of the pointer, and then using that in embed.fnc.

That's what this commit does.

This commit removes the final instance of duplicating the work of
embed.fnc in the core, except for some in the regex system whose
comments say the reason is to avoid making a typedef public.  I haven't
investigated these further.
@bulk88
Contributor

bulk88 commented Aug 15, 2025

> I compiled toke.s using -O0 and looked at the output. It had a full copy of SvPV_helper in it. But this function is called with various constants that make much of the code dead when constant folded. That was not being done here.
>
> This commit changes to force inlining. Looking at a revised toke.s, I saw that SvPV_helper was inserted multiple times in the code, and that each occurrence had significantly fewer instructions than the original. I infer that the compiler did the constant folding and pruned out the dead code for each instance.

I don't see the rationale for this ticket.

-O0 or -Od should ALWAYS disable the C keyword inline or a vendor extension like __forceinline; how else can you single-step, semicolon by semicolon, through that inline function in a C debugger and examine its autos/args in the variable watch window?

-DDEBUGGING builds have gotten too annoying for me to ever use, because there was such an aggressive campaign to PERL_STATIC_INLINE every last macro, more than was really needed. Yes, some macros have a high historical chance of multi-eval macro-in-macro accidents, but other macros that have never once had such an accident still got converted to PERL_STATIC_INLINE.

So -DDEBUGGING builds are now impossible for me to single-step, because of so many NOOP lines (sequence points, aka semicolons), and so many layers of perl interp source code dedicated to code prettiness: inline-func-in-inline-func-in-inline-func for no engineering reason except that someone once wrote on a C++ blog that "#define is evil".

If -O0 or -Od were allowing inlining/optimizations to still happen, wouldn't that be a bug? Or else, why use static inline at all? Just go back to #define macro expressions to add one more 0 constant arg to the inner macro/function, and the same for 0x1 and 0x2 constants; don't make brand-new static inlines that just add ",0" or "flags|0x4" and nothing more.

Don't forget the huge PERL_ARGS_ASSERT_SAVESVPV; or PERL_ARGS_ASSERT_MORTAL_GETENV; statement, which is real executable code in a -DDEBUGGING build and which you have to advance through in most tiny static inline functions; they are suddenly not so tiny once that PERL_ARGS_ASSERT_SAVESVPV; becomes a real conditional logic tree.

If the F11 key-press count is too annoying for line-advancing through a statement that no p5p person has needed to debug in the last 30 years, or a tiny STATIC_INLINE is distracting to a human, then on Perls built with -DDEBUGGING or -O0 or -Od that tiny static inline should be unrolled/hoisted into a higher caller, turned into a #define (fewest changed lines of code), or unrolled/copy-pasted into its 1 or 2 primary callers.

#define stops single-steppability on all CCs universally; inline or __forceinline working at -O0 sounds like an immediate bug ticket with the CC vendor: why can't I single-step or set a breakpoint in -O0 mode in this code example?

If __forceinline works at -O0 or -Od, that sounds like taking advantage, for your own benefit, of a bug in that particular C compiler project, and that bug/feature could disappear at any time in the future.

Plus, who knows whether your bug works on 3-5 other OSes/toolchains/CC brands, or how old that CC or gdb is, etc. It's easier to switch to #define than to answer whether the bug existed 10 years ago, 5 years ago, and on 4-6 OSes you don't have access to, including some rare, weird commercial Unix OSes.

@bulk88
Contributor

bulk88 commented Aug 15, 2025

#define abc(a,b,c) STMT_START { a_t _a = (a); 0; 0; } STMT_END is effectively __forceinline, but you don't need to play guessing games about which MSVC or GCC or Clang from what year does what at -O1, -O0, and -O2.

#define abc(a,b,c) STMT_START { a_t _a = (a); 0; 0; } STMT_END can't replace a static inline that returns an rvalue, though.

@bulk88
Contributor

bulk88 commented Aug 15, 2025

Another C compiler design question: let's say a developer creates a -O0 binary to debug a SEGV and emails it to a non-IT customer, because only the customer can reproduce the SEGV, not the dev. The customer runs the binary, it SEGVs, and his OS makes an automated coredump file. The customer emails that file back to the dev, who then opens the coredump in his IDE.

Now comes the question: are the names of inline and of non-standardized, not-available-everywhere __forceinline C function calls REQUIRED to appear on that dev's screen when his IDE decodes the backtrace/C stack return addresses from the coredump the customer emailed him? Remember, the dev created a -O0 binary just for this customer and emailed it to the customer!

@bulk88
Contributor

bulk88 commented Aug 18, 2025

This PR is using the wrong tool, or at least the tool with the highest risk of breakage/not accomplishing the goal of the commit message. The foolproof solutions for reducing single-step GUI IDE "noise" lines, so they aren't stopped on by step or step-into, are listed above. The __forceinline token has no guarantee of being honored by any CC. A force inline in a force inline, recursively? Wouldn't that be infinite recursion? No: the CC can refuse to do the inline to break the chain.

C preprocessor solutions work everywhere, on every CC, always.

An infinitely recursive C preprocessor TU would just blow out its 4GB/16GB address space limit, and either the CC errors out or the kernel OOMs/SEGVs it. Force inline is only a tool for correcting bad decisions made by a CC, after looking at the machine-code output and seeing it didn't do what you thought it would do.

There are regen.pl-created latin1/utf8 static inline functions in the re_*.c files that are 8KB-12KB of amd64-optimized machine code by themselves. It's laughable to say a CC should obey the dev in those cases. CPP is the best way to stop visually noisy, pointless single-step lines in a C debugger. For determining which line number a breakpoint lands on, the symbol engine is ultimately following the newline characters of the pre-CPP .c files; it's not really following the machine code, the .i file, or the AST parse-tree grammar nodes.

@khwilliamson changed the title from "Force SvPV_helper to be always inline" to "Request SvPV_helper to be always inline" on Aug 18, 2025
@tonycoz
Contributor

tonycoz commented Aug 18, 2025

-O0 is doing what it says it does; I don't see any need to override it.

I'd be more inclined to produce more trivial Perl_SvPV* variants.

@khwilliamson
Contributor Author

> other macros historically have never ever gotten a macro in macro accident, but still got converted to PERL_STATIC_INLINE.

It appears to me you are suggesting that we should not be proactive in avoiding potential problems before they bite the people who rely on our product working reliably. I'm confident that is not Perl's official policy.

And it appears to me that you misunderstand the point of the conversion to inline functions. I'm unsure what you mean by "macro in macro"; I'm guessing it is that if a macro expands to a call of another macro, there is a possibility of name collision. That is indeed solved by making things into functions. But the reason we have been converting macros into functions is that functions work when called with an argument that is an expression with side effects, while macros may very well have hard-to-debug failures.

Your opinion about macros being the way to go is anathema to some others on this project. I myself think there is a place for both. But I strongly believe that any API macro should not evaluate any of its arguments more than once each. Last I looked, we had something like a hundred that do.

I do not recall ever feeling the need to grouse about the assertions in debugging. And I think that considerations of how current IDEs work should be well down on our list of our priorities of what we do on this project.

The reason this particular function was created was because quite a few macros do portions of it, but some went down a maze of twisty little passages, and some went down a maze of little twisty passages. By having a single function that does the whole procedure, there is just one place to maintain. Our primary goal has to be correctness. And that implies correctness ongoing as the project evolves. Having this single function makes it easier to do this; going back to multiple macros doing portions of the implementation makes it more complex to maintain and more likely for bugs to slip in.

My experience with compilers is that they evaluate at compile time as much of every expression as they can, and use the result as a starting point for run-time code generation. If I have the expression 1 + 1, my experience is that the compiler uses 2, even under -O0, instead of generating an actual addition instruction. Similarly, if an expression evaluates to if (0) { ... }, my experience is that the compiler omits the dead code.

The various macros call this function with compile-time constants. If the compiler evaluates those at compile time, it will find dead code and omit it.

All this commit does is tell the compiler we want it to try hard to inline the function, and hence do the constant folding at the various calls to it in the file. And that worked with a common compiler, gcc. If it doesn't work, correctness is not lost. We do this in other places too: ask for more than some compilers give, and fall back to the best behavior actually available. I don't see the problem with it; if the compiler is capable of doing more, great; if not, we haven't lost anything.


3 participants