Compile libpython with -fno-semantic-interposition #83161
Comments
The Fedora packaging has been modified to compile libpython with the -fno-semantic-interposition flag: it makes Python up to 1.3x faster without touching a single line of the C code! See pyperformance results:

The main drawback is that -fno-semantic-interposition prevents overriding Python symbols with a custom library preloaded via LD_PRELOAD, for example overriding the PyErr_Occurred() function.

We (authors of the Fedora change) failed to find any use case for LD_PRELOAD. To be honest, I found *one* user in the last 10 years who used LD_PRELOAD to track memory allocations in Python 2.7. This use case is no longer relevant in Python 3 with PEP 445, which provides a supported C API to override Python memory allocators or to install hooks on them. Moreover, tracemalloc is a nice way to track memory allocations.

Is anyone aware of any special use of LD_PRELOAD for libpython?

To be clear: -fno-semantic-interposition only impacts libpython. All other libraries still respect LD_PRELOAD. For example, it is still possible to override glibc malloc/free.

Why does -fno-semantic-interposition make Python faster? There are multiple reasons. First of all, libpython makes a lot of function calls to libpython. Really a lot, especially in the hot code paths. Without -fno-semantic-interposition, function calls within libpython have to go through "interposition": for example, "Procedure Linkage Table" (PLT) indirection on Linux. This prevents function inlining, which has a major impact on performance (missed optimization).

In short, even with PGO and LTO, libpython function calls have two performance "penalties": the extra indirection on each call and the inability to inline it.
I'm comparing the performance of a "statically linked Python" (Debian/Ubuntu choice: don't use ./configure --enable-shared, python is not linked to libpython) to a "dynamically linked Python" (Fedora choice: use ./configure --enable-shared, python is dynamically linked to libpython).

With -fno-semantic-interposition, function calls are direct and can be inlined when appropriate. You don't have to trust me, look at the pyperformance benchmark results ;-)

When using ./configure --enable-shared (libpython), the "python" binary is exactly one function call and that's all:

int main(int argc, char **argv)
{
    return Py_BytesMain(argc, argv);
}

So 100% of the time is spent in libpython. For a longer rationale, see the accepted Fedora change: |
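To make the inlining point concrete, here is a minimal sketch in C. The names demo_incr() and demo_sum_incr() are hypothetical and are not part of CPython; they only stand in for two exported functions living in the same shared library, as libpython functions do. When such code is built with GCC as a shared library, the call from demo_sum_incr() to demo_incr() must by default stay interposable, so it is normally emitted as a real call through the PLT. With -fno-semantic-interposition the compiler is allowed to call it directly or inline it. The result is the same either way; only the generated code differs.

```c
#include <assert.h>

/* Hypothetical stand-in for an exported libpython function.
   Under the default ELF rules, a preloaded library may replace
   this symbol at run time. */
int demo_incr(int x)
{
    return x + 1;
}

/* Hypothetical hot caller in the same library. Without
   -fno-semantic-interposition, GCC building with -fPIC has to assume
   demo_incr() could be replaced, so it does not inline the call. */
long demo_sum_incr(int n)
{
    assert(n >= 0);
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += demo_incr(i);
    }
    return total;
}
```

One way to see the difference, assuming GCC on Linux, is to compile the file twice with "gcc -O2 -fPIC -shared", once adding -fno-semantic-interposition, and compare the disassembly of demo_sum_incr().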
Maybe we need to offer a way to *opt out* from -fno-semantic-interposition. For example, ./configure --with-interposition. The default would be --without-interposition. |
I have seen people using LD_PRELOAD to interpose auditing functions that can modify the actual call into libpython, to interpose faster versions of some functions, or to collect metrics (although there are better ways). If we do this by default, these use cases will break once functions are inlined. |
Could these users use a "./configure --with-interposition --enable-shared" build? |
Sure, but the problem is the default value, no? Maybe it should only be default when using --with-optimizations |
I think it will add to the complexity of the --with-optimizations flag which already implies PGO and LTO. Maybe an opt-in flag would be better IMHO. |
That is why I was suggesting it: --with-optimizations for me means "activate everything that you can to make python faster". |
Just for a quick datapoint: llvm/clang do this by default and you need an explicit flag to opt out. It seems to me that the performance gains here really outweigh any weird usage of LD_PRELOAD. |
I am convinced of this assertion, but other users may not be. I think the discussion is about how to provide/activate the option in the least intrusive way, without breaking too many use cases. To be honest, I think it would be very rare for users to use LD_PRELOAD in this way, so I am fine if we activate it by default. But I still think it would be good to discuss these cases and take them into consideration :) |
In the case of malloc, all code that allocates memory needs to use malloc/calloc/realloc. This is the official and only way to allocate memory. But we do not guarantee that the Python core uses only public C API like PyErr_Occurred(). It can use a lower-level, more efficient but less safe C API internally. It can replace the function with a macro which accesses internal structures directly (when compiling the core only). And this is actually the case. Overriding the public C API functions does not always have an effect on the core. So I think that adding -fno-semantic-interposition will likely not break many things which were not broken before. But this should be discussed on Python-Dev. I am sure some C API functions are intended to be overridden. |
To confirm what you said: if we take the specific example of PyErr_Occurred(), I recently added a new _PyErr_Occurred() function which is declared as a static inline function. _PyErr_Occurred() cannot be overridden.

static inline PyObject* _PyErr_Occurred(PyThreadState *tstate)
{
    assert(tstate != NULL);
    return tstate->curexc_type;
} |
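The pattern discussed here, a public function backed by a static inline helper that the core calls directly, can be sketched as follows. This is only an illustration under stated assumptions: demo_tstate, demo_err_occurred_fast(), demo_err_occurred() and demo_core_has_error() are hypothetical names, and the struct is a drastically simplified stand-in for the real PyThreadState.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified thread state (not the real PyThreadState). */
typedef struct {
    void *curexc_type;   /* non-NULL while an exception is set */
} demo_tstate;

/* Internal fast path: static inline, so no symbol is exported and
   LD_PRELOAD has nothing to override. The compiler can inline it. */
static inline void *demo_err_occurred_fast(demo_tstate *tstate)
{
    assert(tstate != NULL);
    return tstate->curexc_type;
}

/* Public entry point: an ordinary exported function. */
void *demo_err_occurred(demo_tstate *tstate)
{
    return demo_err_occurred_fast(tstate);
}

/* Core code calls the inline helper, so replacing the public
   demo_err_occurred() symbol would not change what the core sees. */
int demo_core_has_error(demo_tstate *tstate)
{
    return demo_err_occurred_fast(tstate) != NULL;
}
```

In this sketch, a preloaded library could still replace demo_err_occurred() for external callers, but demo_core_has_error() would keep using the inline helper, which illustrates the point that overriding a public function does not necessarily affect the core.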
Pablo:
IMHO if someone has to go so far into "hacking" Python, they should recompile Python with specific options. I'm not sure that using LD_PRELOAD to get "faster versions of some functions" is the best approach in terms of performance, but I expect it to be convenient :-)
Charalampos:
It doesn't enable LTO, only PGO :-) We had to disable LTO because of multiple compiler bugs in recent years.
Serhiy:
Is it a theoretical use case, or are you aware of such a use case currently being used in the wild?
Ammar Askar:
Oh, that's really interesting, thanks! |
We wrote an article about the -fno-semantic-interposition flag that we use with GCC on RHEL8 and Fedora: In short, when Python is built with --enable-shared, the flag allows the compiler to inline code and so make further optimizations. |
Yes, this should become part of --with-optimizations when building on a platform using a compiler that (a) supports it and (b) where it matters. If this is only relevant on --enable-shared builds (not the default), I'd also make it conditional on that. I never use --enable-shared builds myself. |
And to echo others: do not worry about LD_PRELOAD users trying to override internals. That is not a supported use case. It is always a hack. Anyone using it knows this. |
Hey Victor, should we try to land this in Python 3.10? Given that no one has brought up any big concerns aside from LD_PRELOAD-based hacks, and that clang already has this as the default, I think it's relatively safe to make it a default for --with-optimizations. |
+1 |
Victor is on vacation for some weeks, so I am creating a PR to push this forward. |
I was too eager in reviewing this :( I'm researching how to make this conditional in autotools. |
Since Fedora and RHEL started building Python with -fno-semantic-interposition, we have not received any user bug report about the LD_PRELOAD use case. IMO we can safely consider that no user relies on LD_PRELOAD to override libpython symbols. Thanks for implementing the feature, Pablo and Petr! |