Bug report
Bug description:
Hello, @jaraco! Commit 019143f slows down ConfigParser.read() by a factor of 2 to 6.
A new _Line object is created for every line read from a file, and the same regular expression is compiled again for each object in _strip_inline. This accounts for about 60% of the speed loss. The simplest solution would be to add a __call__ method and, preferably, to create a single _Line object with an empty string value when initializing RawConfigParser. Alternatively, the _Line object could be abandoned altogether.
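To illustrate the idea, here is a minimal sketch of the two approaches: building the pattern for every line object versus compiling it once in a reusable callable. The class names, the comment prefixes, and the pattern are hypothetical and are not taken from the configparser source. Note that `re.compile` keeps an internal cache, so the per-object variant mostly pays for rebuilding the pattern string and for the cache lookup rather than for a full recompilation.

```python
import re
import timeit

# Hypothetical prefixes and class names; this is NOT the configparser source.
COMMENT_PREFIXES = ("#", ";")


class LineCompilingEachTime(str):
    """Builds and compiles its comment-stripping regex on every use."""

    def strip_inline(self) -> str:
        pattern = re.compile(
            "|".join(rf"(^|\s)({re.escape(p)})" for p in COMMENT_PREFIXES)
        )
        match = pattern.search(self)
        return self[: match.start()] if match else str(self)


class LineStripper:
    """Compiles the regex once; the instance is then called for each line."""

    def __init__(self, prefixes=COMMENT_PREFIXES):
        self._pattern = re.compile(
            "|".join(rf"(^|\s)({re.escape(p)})" for p in prefixes)
        )

    def __call__(self, text: str) -> str:
        match = self._pattern.search(text)
        return text[: match.start()] if match else text


if __name__ == "__main__":
    lines = ["key = value ; comment", "plain = 1", "# whole-line comment"] * 1000
    strip = LineStripper()
    per_object = timeit.timeit(
        lambda: [LineCompilingEachTime(text).strip_inline() for text in lines],
        number=10,
    )
    shared = timeit.timeit(lambda: [strip(text) for text in lines], number=10)
    print(f"compile per object: {per_object:.3f}s, compile once: {shared:.3f}s")
```

The absolute timings depend on the interpreter and machine, so the sketch only prints them for comparison.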
The remaining 40% of the performance loss comes from using cached_property for _Line.clean (10%), writing to _ReadState attributes instead of local variables (15%), and breaking up the previous giant loop into new _handle functions (15%).
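The cost of attribute writes compared with local variables can be checked in isolation. The sketch below is a hypothetical micro-benchmark, not code from configparser: `count_with_state` keeps its loop bookkeeping on an object (a `SimpleNamespace` standing in for a state holder), while `count_with_locals` does the same work with plain locals.

```python
import timeit
from types import SimpleNamespace


def count_with_state(lines):
    """Keep loop bookkeeping as attributes on a state object."""
    state = SimpleNamespace(lineno=0, non_empty=0)
    for line in lines:
        state.lineno += 1  # attribute load and store on every iteration
        if line.strip():
            state.non_empty += 1
    return state.lineno, state.non_empty


def count_with_locals(lines):
    """Same bookkeeping, kept in local variables."""
    lineno = 0
    non_empty = 0
    for line in lines:
        lineno += 1
        if line.strip():
            non_empty += 1
    return lineno, non_empty


if __name__ == "__main__":
    sample = ["key = value", "", "other = 1"] * 10_000
    for func in (count_with_state, count_with_locals):
        elapsed = timeit.timeit(lambda: func(sample), number=20)
        print(f"{func.__name__}: {elapsed:.3f}s")
```

Both functions return the same result, so any difference in the printed timings comes from how the counters are stored.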
I discovered this while writing an update to webbrowser, which needed to parse hundreds of small .desktop files. At first I did not understand why execution time differed so much between distributions, so I developed three versions of the program:
(M) Multiprocessing
(O) Original Routine
(T) Threading
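For readers who want to reproduce a comparison of this kind, the following is a rough sketch of how such variants could be structured. It is not the program used for the measurements in this report; `parse_one`, `parse_sequential`, and `parse_with_pool` are hypothetical names.

```python
import configparser
from concurrent.futures import Executor, ThreadPoolExecutor


def parse_one(path: str) -> list[str]:
    """Parse a single file and return its section names (empty on failure)."""
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(path, encoding="utf-8")
    except (UnicodeDecodeError, configparser.Error):
        return []
    return parser.sections()


def parse_sequential(paths: list[str]) -> list[list[str]]:
    """Sequential variant: one file after another in the calling thread."""
    return [parse_one(path) for path in paths]


def parse_with_pool(
    paths: list[str], executor_cls: type[Executor] = ThreadPoolExecutor
) -> list[list[str]]:
    """Pooled variant: hand each file to a worker of the given executor."""
    with executor_cls() as pool:
        return list(pool.map(parse_one, paths))
```

Passing `concurrent.futures.ProcessPoolExecutor` as `executor_cls` would give a multiprocessing variant, since `parse_one` is a module-level function that returns a plain list.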
I measured their performance using timeit:
Linux arch 6.10.6-arch1-1
Parsing 186 files
Python 3.11.11
(M) 5 loops, best of 5: 62.4 msec per loop
(O) 2 loops, best of 5: 110 msec per loop
(T) 2 loops, best of 5: 130 msec per loop
Python 3.12.8
(M) 5 loops, best of 5: 66.5 msec per loop
(O) 2 loops, best of 5: 118 msec per loop
(T) 2 loops, best of 5: 140 msec per loop
Python 3.13.1
(M) 2 loops, best of 5: 125 msec per loop
(O) 1 loop, best of 5: 222 msec per loop
(T) 1 loop, best of 5: 248 msec per loop
Python 3.13.1 Free-threaded
(M) 1 loop, best of 5: 331 msec per loop
(O) 1 loop, best of 5: 648 msec per loop
(T) 1 loop, best of 5: 340 msec per loop
As you can see, the regression between 3.11 and 3.13 is a factor of 2 to 6. An isolated comparison of the old and new configparser on free-threaded Python 3.13 confirms the roughly sixfold slowdown:
Python 3.13t (Old)
(M) 10 loops, best of 5: 26.7 msec per loop
(O) 5 loops, best of 5: 59.4 msec per loop
(T) 10 loops, best of 5: 26 msec per loop
Python 3.13t (New)
(M) 2 loops, best of 5: 137 msec per loop
(O) 1 loop, best of 5: 361 msec per loop
(T) 2 loops, best of 5: 125 msec per loop
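The slowdown factors follow directly from the isolated timings. The short calculation below only divides the reported numbers; the dictionary names are illustrative.

```python
# Timings in milliseconds, copied from the isolated Python 3.13t runs above.
old = {"M": 26.7, "O": 59.4, "T": 26.0}
new = {"M": 137.0, "O": 361.0, "T": 125.0}

# Slowdown factor per variant: new time divided by old time.
slowdown = {variant: new[variant] / old[variant] for variant in old}

for variant, factor in slowdown.items():
    print(f"({variant}) {old[variant]:g} -> {new[variant]:g} msec: {factor:.1f}x slower")
# (M) 26.7 -> 137 msec: 5.1x slower
# (O) 59.4 -> 361 msec: 6.1x slower
# (T) 26 -> 125 msec: 4.8x slower
```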
I also attach a small reproducible test, just a module calling read():
import configparser

files = [
    "python3.13.desktop",
    "python3.10.desktop",
    "htop.desktop",
    "byobu.desktop",
    "com.gexperts.Tilix.desktop",
    "snap-handle-link.desktop",
    "io.snapcraft.SessionAgent.desktop",
    "remote-viewer.desktop",
    "python3.12.desktop",
    "google-chrome.desktop",
    "vim.desktop",
    "python3.11.desktop",
    "virt-manager.desktop",
    "info.desktop",
    "ubuntu-desktop-installer_ubuntu-desktop-installer.desktop",
    "firefox_firefox.desktop",
]


def main() -> None:
    parser = configparser.ConfigParser(interpolation=None)
    for shortcut in files:
        try:
            parser.clear()
            if not parser.read(shortcut, encoding="utf-8"):
                continue
        except (UnicodeDecodeError, configparser.Error):
            continue


if __name__ == "__main__":
    main()
Archive with the above mentioned .desktop files:
shortcuts.zip
And a program for generating your own .desktop paths on Linux/BSD:
import glob
import os

XDG_DATA_HOME = os.environ.get(
    "XDG_DATA_HOME", os.path.expanduser("~/.local/share")
)
XDG_DATA_DIRS = os.environ.get(
    "XDG_DATA_DIRS", "/usr/local/share/:/usr/share/"
)
XDG_DATA_DIRS = XDG_DATA_DIRS.split(os.pathsep)


def main() -> list[str]:
    files = []
    for appdata in (XDG_DATA_HOME, *XDG_DATA_DIRS):
        shortcuts = os.path.join(appdata, "applications")
        if not os.path.isdir(shortcuts):
            continue
        shortcuts = os.path.join(shortcuts, "**", "*.desktop")
        files.extend(glob.iglob(shortcuts, recursive=True))
    return files


if __name__ == "__main__":
    print(main())
Just run this example with different interpreters and you will see the difference:
$ python3.11 -m timeit -s "import nixconfig" "nixconfig.main()"
50 loops, best of 5: 5.27 msec per loop
$ python3.12 -m timeit -s "import nixconfig" "nixconfig.main()"
50 loops, best of 5: 5.33 msec per loop
$ python3.13 -m timeit -s "import nixconfig" "nixconfig.main()"
20 loops, best of 5: 11.2 msec per loop
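To see where the extra time goes on a given interpreter, the parsing call can be profiled with the standard cProfile module. The sketch below is one possible way to do that; it uses a synthetic in-memory input instead of real .desktop files, and `SAMPLE` and `profile_read` are made-up names.

```python
import configparser
import cProfile
import io
import pstats

# Synthetic input standing in for a .desktop file; the contents are made up.
SAMPLE = "[Desktop Entry]\n" + "".join(f"Key{i}=value {i}\n" for i in range(2000))


def profile_read(text: str = SAMPLE, top: int = 10) -> str:
    """Profile one read_string() call and return the top entries as text."""
    parser = configparser.ConfigParser(interpolation=None)
    profiler = cProfile.Profile()
    profiler.enable()
    parser.read_string(text)
    profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time so the most expensive call chains come first.
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_read())
```

Running the same script under 3.12 and 3.13 and comparing the two reports should show which internal functions account for the difference.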
At this point I leave the solution to you, as I do not have an architectural vision for configparser. However, I am willing to help by proposing solutions in a pull request.
CPython versions tested on:
3.11, 3.12, 3.13
Operating systems tested on:
Linux