This repository was archived by the owner on Feb 13, 2025. It is now read-only.

select.select() consuming excessive process time on Ubuntu & MacOS #234

Open
adde1 opened this issue Jun 2, 2020 · 8 comments

@adde1

adde1 commented Jun 2, 2020

Hi,

I have been using Stackless for some time, but now I am stuck and need to ask for help. In short, the call to select.select() consumes excessive processing time (equal to the wall-clock time) in some scenarios. It seems to happen when the system gets busy, but I have not been able to narrow it down further than that.

The behaviour is not consistent across platforms and versions of Stackless. When I started using Stackless back around 2.7.2 I did not have this performance problem. I first got problems on MacOS around 2.7.9, but since I was about to finish my big project at the time anyway, I just switched to working on Ubuntu. But now I get similar symptoms on Ubuntu as well.

The core loop of my project has not changed significantly since the start. I also don't know what I could have done wrong on the Python side to make select.select() behave almost as if it were implemented as a busy loop (but only in some cases).

I would like to move to Conda because for my new project I need numpy, scipy, and pygame at the same time (as well as a Fortran compiler), but with the current issues I am kind of stuck.

The behaviour I get is as follows:

Ubuntu 12, 14, 16 - Stackless built locally - Intel 2500K

  • Unfortunately the machine died some time ago, and I don't remember the exact Stackless version (probably 2.7.2).
  • Running project test suite (multi-threaded): good performance, moderate CPU load
  • Running "empty loop" (framework only): low CPU load
  • Running "zita" (pygame application + framework): good performance, moderate CPU load that disappeared when idle

Ubuntu 18 - Conda environment - Ryzen 3700

  • Python 2.7.16 Stackless 3.1b3 060516 |Anaconda, Inc.| (default, Mar 23 2019, 22:01:13)
    [GCC 7.3.0] on linux2
  • Running project test suite (multi-threaded): bad performance, high CPU load
  • Running "empty loop" (framework only): low CPU load
  • Running "zita" (pygame application + framework): decent performance, high % CPU load that persists when idle

Ubuntu 18 - Stackless built locally - Ryzen 3700

  • Python 2.7.16 Stackless 3.1b3 060516 (default, Aug 17 2019, 14:48:39)
    [GCC 7.4.0] on linux2
  • Running project test suite (multi-threaded): bad performance, high CPU load
  • Running "empty loop" (framework only): low CPU load
  • Running "zita" (pygame application + framework): good performance, moderate CPU load that disappears when idle

MacOS - Conda environment - Intel Core i5 (ca. 2013)

  • Python 2.7.15 Stackless 3.1b3 060516 |Anaconda, Inc.| (default, Oct 5 2018, 08:25:48)
    [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
  • Running project test suite (multi-threaded): bad performance, high CPU load
  • Running "empty loop" (framework only): low CPU load
  • Running "zita" (pygame application + framework): poor graphics performance, high CPU load that disappears when idle

MacOS - Downloaded installer - Intel Core i5 (ca. 2013)

  • Python 2.7.9 Stackless 3.1b3 060516 (default, Oct 22 2016, 20:25:12)
    [GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
  • Running project test suite (multi-threaded): bad performance, high CPU load
  • Running "empty loop" (framework only): low CPU load
  • Running "zita" (pygame application + framework): poor graphics performance, high CPU load that disappears when idle

Sorry for the vague error report, but I just don't have a lot to go on. Any help will be appreciated.

Thank you in advance and best regards,

Andreas

@kristjanvalur
Collaborator

Hi there. So, you are using the plain old select.select(), I gather, and no special Stackless features? From your description the problem seems limited to Ubuntu 18 on conda, with which I am not familiar. Why do you think this problem is peculiar to Stackless? Does regular Python show the same problem?

@adde1
Author

adde1 commented Jun 5, 2020

Hi Kristjan,

I am using select.select() to switch between sockets (for interprocess/intermachine communication) and Stackless channels/tasklets. There is also a scheduling function so I rely on the timeout of select.select() for it to wake up. My guess is that you would find something similar at the core of any framework supporting inter process communication and concurrency.
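A minimal sketch of that kind of loop, for readers who have not seen one. This is not the framework's actual code: the `Loop` class, its method names, and the 0.5 second default wait are hypothetical, and plain callbacks stand in for the Stackless tasklets and channels that a real implementation would wake up.

```python
import heapq
import select
import socket
import time


class Loop(object):
    """Hypothetical event loop: sockets via select(), timers via a heap."""

    def __init__(self):
        self.readers = {}   # socket -> callback(sock)
        self.timers = []    # heap of (deadline, seq, callback)
        self._seq = 0       # tie-breaker so callbacks are never compared

    def add_reader(self, sock, callback):
        self.readers[sock] = callback

    def call_later(self, delay, callback):
        self._seq += 1
        heapq.heappush(self.timers, (time.time() + delay, self._seq, callback))

    def run_once(self, max_wait=0.5):
        # Block in select() until a socket is readable or the next timer
        # is due, whichever comes first.
        timeout = max_wait
        if self.timers:
            timeout = min(max_wait, max(0.0, self.timers[0][0] - time.time()))
        readable, _, _ = select.select(list(self.readers), [], [], timeout)
        for sock in readable:
            # A Stackless program would presumably hand the data to a
            # tasklet here (assumption); the sketch just calls a function.
            self.readers[sock](sock)
        # Run every timer whose deadline has passed.
        now = time.time()
        while self.timers and self.timers[0][0] <= now:
            _, _, callback = heapq.heappop(self.timers)
            callback()
```

The point of the structure is that the process should be asleep inside select() whenever no socket is ready and no timer is due, so an idle loop is expected to use almost no CPU.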

The framework makes a fair amount of use of tasklets and cooperative scheduling, enough so that running on standard python is not an option and migrating to a thread based approach would be a fairly steep investment.

Of course I cannot rule out that the problem is in Ubuntu, but given the fundamental nature of select.select() and that I see the same issues on both MacOS and Linux I think it is a less likely source.

Similarly with Python, I was assuming that standard Python implements select.select() as a fairly direct call to the underlying select() system call, so there should not be many sources of bugs here. But I have also not looked at the Python implementation (and to be honest it is probably beyond my skills in C anyway).

Three quick questions for trying to pin down the problem:

  1. Does the Stackless implementation do anything special that in any way affects the select.select() statement in Python?
  2. Is there any other more modern way to incorporate sockets with stackless for concurrency that does not include a call to select.select()?
  3. Back in the days, I remember seeing a reference implementation of the socket module for Stackless. Is that still around, or was that incorporated into the Stackless distribution?

Thank you in advance and best regards :-)

And oh, I used to maintain a Windows dev environment too that unfortunately died some time ago. I'll see if I can resurrect that and if the problem exists on Windows or not.

@kristjanvalur
Collaborator

kristjanvalur commented Jun 5, 2020 via email

@kristjanvalur
Collaborator

kristjanvalur commented Jun 5, 2020 via email

@adde1
Author

adde1 commented Jun 5, 2020

Thank you Kristjan,

Thank you for the confirmation that Stackless does not modify the select.select() call!

I'll try switching to poll, and dig around a bit more.

I'll keep this issue open for a little while longer and report back with my findings.

Again, thank you :-)

@adde1
Author

adde1 commented Jun 7, 2020

Hi,

I have now:

  1. Tested the old code on Windows 10, with Stackless from conda. It works perfectly (like it used to on the other platforms as well)
  2. Instrumented the code to see that there was no bug in the delta-time calculation. I even get the same high CPU load when I lock the timeout to 0.5 seconds (resulting in 2 iterations per second when there is no communication).
  3. Tried replacing select.select() with select.poll(). It did not make any noticeable difference - the problem still persists.
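For anyone who wants to repeat this kind of check, here is a rough sketch of how the CPU cost of a single wait can be measured. It is not the instrumentation used in the project: the helper names are made up for illustration, and the 0.5 second timeout simply mirrors the fixed timeout mentioned above. Note that `os.times()` reports CPU time for the whole process, so work done by other threads during the wait is included in the figure.

```python
import os
import select
import socket
import time


def cpu_seconds():
    """User + system CPU time consumed by this process so far."""
    t = os.times()
    return t[0] + t[1]


def measure(wait, timeout):
    """Call wait(timeout) once; return (wall seconds, CPU seconds) spent."""
    wall0, cpu0 = time.time(), cpu_seconds()
    wait(timeout)
    return time.time() - wall0, cpu_seconds() - cpu0


# A connected pair whose read end never becomes readable, so every wait
# runs to its full timeout.
idle_sock, _peer = socket.socketpair()


def wait_select(timeout):
    select.select([idle_sock], [], [], timeout)


def wait_poll(timeout):
    poller = select.poll()            # not available on every platform
    poller.register(idle_sock, select.POLLIN)
    poller.poll(int(timeout * 1000))  # poll() takes milliseconds


if __name__ == "__main__":
    for name, wait in [("select", wait_select), ("poll", wait_poll)]:
        wall, cpu = measure(wait, 0.5)
        print("%-6s wall=%.3fs cpu=%.3fs" % (name, wall, cpu))
```

In a healthy single-threaded process the CPU figure should stay close to zero while the wall-clock figure is close to the timeout. A CPU figure that tracks the wall clock matches the symptom described in this issue.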

The core loop looks pretty much exactly like Kristjan describes. And it has been working for years (until recently).

The only lead I have is that with the same version of Stackless (Python 2.7.16 Stackless 3.1b3 060516) I get different results depending on whether I use the build provided by conda or the build I made locally. They were built with different compilers (GCC 7.3.0 vs. GCC 7.4.0) and perhaps with some differences in the dependencies that got linked in. But I don't know what to make of that.

If anyone has any thought on what to try, please let me know.

Cheers,

Andreas

@kristjanvalur
Collaborator

kristjanvalur commented Jun 7, 2020 via email

@adde1
Author

adde1 commented Oct 30, 2021

Hi,

After doing a bit of other work in C, I mustered up the courage to dig into the implementation of selectmodule.c.

At least on Debian, the problem with the excessive load was solved when I commented out Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS. So I believe the problem is within Python and not the operating system.

I have not (yet) tried to track down why Python seems to go into some infinite loop when the other threads are allowed. I am worried this may be over my head. But we will see...
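One thing that may be worth ruling out while tracking this down: Py_BEGIN_ALLOW_THREADS releases the GIL, so other Python threads are free to run while select() blocks, and process-wide CPU counters include their work. The sketch below only illustrates that measurement effect and is not a diagnosis of this issue. The function name and the spinning worker thread are hypothetical.

```python
import os
import select
import socket
import threading
import time


def cpu_seconds():
    """User + system CPU time for the whole process (all threads)."""
    t = os.times()
    return t[0] + t[1]


def cpu_during_select(timeout, with_busy_thread):
    """Return process CPU seconds consumed while select() blocks."""
    stop = threading.Event()

    def spin():
        # Deliberately busy worker: it runs whenever it can get the GIL.
        while not stop.is_set():
            pass

    worker = None
    if with_busy_thread:
        worker = threading.Thread(target=spin)
        worker.daemon = True
        worker.start()

    # The read end never becomes readable, so select() waits the full timeout.
    idle_sock, peer = socket.socketpair()
    try:
        cpu0 = cpu_seconds()
        select.select([idle_sock], [], [], timeout)
        return cpu_seconds() - cpu0
    finally:
        stop.set()
        if worker is not None:
            worker.join()
        idle_sock.close()
        peer.close()


if __name__ == "__main__":
    print("no other threads: %.3fs" % cpu_during_select(0.5, False))
    print("one busy thread:  %.3fs" % cpu_during_select(0.5, True))
```

With the busy thread running, the CPU time charged during the select() call is expected to be close to the timeout even though select() itself is idle. If the test suite shows the same pattern, the load may come from another thread rather than from select().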

Cheers
