Extremely slow with large repositories. #621

Closed
AnalogFeelings opened this issue Mar 20, 2022 · 14 comments

@AnalogFeelings

This program is very slow when running it on large repositories, for example ReactOS.

@spenserblack
Collaborator

@o2sh Should we revisit this?

spenserblack added the enhancement label Mar 24, 2022

@o2sh
Owner

o2sh commented Mar 24, 2022

Sure, but after #211 and #309, I am running out of ideas. 😞

ping @CephalonRho @yoichi @HallerPatrick

@shuni64
Contributor

shuni64 commented Mar 25, 2022

A lot of time still appears to be spent reading commit information in Repo::get_logs, most of which is only used by Repo::get_authors.

Reading commits in parallel seems like a way to speed this up a bit, but I'm not sure if this is possible with libgit2.
git-repository from gitoxide seems like a promising alternative.

Otherwise some form of caching could work, but I'm not sure if this is a good idea.

@o2sh
Owner

o2sh commented Mar 25, 2022

gitoxide is indeed a promising alternative.
It might be an idea worth exploring even if it means losing some features like mailmap.

@spenserblack
Collaborator

mailmap is on their radar, at least. I guess we can open an issue to show them that there's interest for it 😆

If we do drop .mailmap support, reopening #447, we should probably make a release before we drop support. That way users that have used the .mailmap feature (#596) will have a release that's as up-to-date as possible before the breaking change.

@o2sh
Owner

o2sh commented Mar 25, 2022

I created the issue GitoxideLabs/gitoxide#363.

If we do drop .mailmap support, reopening #447, we should probably make a release before we drop support. That way users that have used the .mailmap feature (#596) will have a release that's as up-to-date as possible before the breaking change.

That's a perfect plan 💯

@Byron
Collaborator

Byron commented Mar 26, 2022

Reading commits in parallel seems like a way to speed this up a bit, but I'm not sure if this is possible with libgit2.
git-repository from gitoxide seems like a promising alternative.

It looks like major speedups are possible even on a single thread for commit graph traversal. For example, I'd expect onefetch to go from ~19s on the v5.16 Linux kernel checkout to something like 11s.

If more or all operations are done in parallel, like the syntax analysis, the whole run should become as fast as the slowest of these operations, which is the commit graph traversal at ~7s.

I wouldn't know how to parallelize the commit graph traversal though. The only way I can imagine this working is to traverse different branches on multiple threads. That usually comes with the overhead of keeping the threads from doing duplicate work, which requires a parallel hashset (like dashmap) and limits the number of threads that are effective there. It also depends on the graph: a linear history with a single trunk can't be sped up at all. Anyway, it sounds like an interesting task to implement high-performance parallel traversal; maybe gitoxide can provide one once all other options are exhausted here.
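
To make the shared-set idea concrete, here is a minimal sketch under loud assumptions: it walks a toy in-memory graph, not a real repository, and it does not use libgit2 or gitoxide. `Graph` and `count_reachable` are hypothetical names, and a `Mutex<HashSet>` stands in for a concurrent set such as dashmap so that only the standard library is needed.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;
use std::thread;

/// Hypothetical toy commit graph: commit id -> parent ids.
type Graph = HashMap<u32, Vec<u32>>;

/// Walk the history from each tip on its own thread and return how many
/// distinct commits are reachable from any of the tips.
fn count_reachable(graph: &Graph, tips: &[u32]) -> usize {
    // Shared "seen" set. Every insert takes the lock, which is the
    // contention a concurrent map is meant to reduce.
    let seen: Mutex<HashSet<u32>> = Mutex::new(HashSet::new());

    thread::scope(|s| {
        for &tip in tips {
            let seen = &seen;
            s.spawn(move || {
                let mut stack = vec![tip];
                while let Some(id) = stack.pop() {
                    // `insert` returns false when some thread has already
                    // claimed this commit, so we do not descend again.
                    if !seen.lock().unwrap().insert(id) {
                        continue;
                    }
                    if let Some(parents) = graph.get(&id) {
                        stack.extend(parents.iter().copied());
                    }
                }
            });
        }
    });

    seen.into_inner().unwrap().len()
}
```

Whichever thread inserts a commit first is the one that pushes its parents, so each commit is expanded exactly once. With a single tip and a linear history only one thread ever has work, which matches the point above that such a graph cannot be sped up this way.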

[Screenshot: Screen Shot 2022-03-26 at 09 15 31]

The above is my profiling run on the Linux kernel v5.16. Most of it is the commit graph traversal, the spike towards the end is tokei, and there is ~1s of releasing memory (which can and should probably be avoided with process::abort() or by leaking the values with mem::forget()).

All in all, I think that by switching to gitoxide and running just the syntax analysis in parallel, along with process::abort() to avoid memory deallocation at the end, one should be able to get roughly 2x the speed on the Linux kernel.
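
As a rough illustration of the leaking approach, the sketch below builds a heap-heavy value, reduces it to the one number that is needed, and then hands it to `mem::forget` so that no destructors run. `summarize_and_leak` and the `commit-N` strings are made up for the example and are not taken from onefetch.

```rust
use std::mem;

/// Build many small heap allocations, keep only a summary, then leak the
/// data instead of dropping it.
fn summarize_and_leak(n: usize) -> usize {
    // Stand-in for per-commit data held until the end of the run.
    let commits: Vec<String> = (0..n).map(|i| format!("commit-{}", i)).collect();

    let total_len: usize = commits.iter().map(|c| c.len()).sum();

    // Skip the destructors for `commits`. The operating system reclaims the
    // memory when the process exits, so this only makes sense right before
    // exit.
    mem::forget(commits);

    total_len
}
```

`process::abort()` goes further and ends the process immediately without running any destructors, but it also reports abnormal termination to the caller, so whether that is acceptable for a command-line tool is a separate decision.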

@o2sh
Owner

o2sh commented Mar 26, 2022

Thanks a lot for your input @Byron. I've created two issues as a follow-up: #628 and #629.

@Byron
Collaborator

Byron commented Mar 27, 2022

I couldn't resist doing a quick measurement on ReactOS, which seems tame compared to the Linux kernel. The numbers, however, are even more promising.

[Screenshot: Screen Shot 2022-03-27 at 08 14 38]

It seems that ~1.5s are spent on the commit graph traversal, a task which could be accomplished in ~0.4s with gitoxide.

[Screenshot: Screen Shot 2022-03-27 at 08 15 09]

Interestingly, tokei finished in ~0.5s, so it appears that if both ran in parallel, the onefetch invocation should finish in about 0.5s, down from ~1.5s. Can't wait to see this happen!
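
The "as fast as the slowest task" reasoning can be sketched with scoped threads from the standard library. Everything here is a placeholder: `traverse_commits` and `count_lines` are hypothetical stand-ins that sleep briefly and return the commit and line counts ReactOS reports in this thread, and they do not call gitoxide or tokei.

```rust
use std::thread;
use std::time::Duration;

/// Placeholder for the commit graph traversal (hypothetical).
fn traverse_commits() -> usize {
    thread::sleep(Duration::from_millis(40));
    81_668
}

/// Placeholder for the language analysis (hypothetical).
fn count_lines() -> usize {
    thread::sleep(Duration::from_millis(50));
    4_900_543
}

/// Run both tasks at the same time and return (commits, lines of code).
fn gather_stats() -> (usize, usize) {
    thread::scope(|s| {
        // The traversal runs on a worker thread...
        let commits = s.spawn(traverse_commits);
        // ...while the analysis runs on the current thread.
        let lines = count_lines();
        (commits.join().unwrap(), lines)
    })
}
```

Because the two tasks overlap, the wall-clock time of `gather_stats` is close to the longer of the two sleeps rather than their sum.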

Byron added this to the v2.13.0 milestone Apr 6, 2022

@o2sh
Owner

o2sh commented Jul 10, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

o2sh added the stale label Jul 10, 2022

@Byron
Collaborator

Byron commented Jul 10, 2022

For completeness, here are the final values, before…

 /usr/bin/time -lp onefetch-pre-gitoxide
                 ++++++                    Sebastian Thiel ~ git version 2.32.1 (Apple Git-133)
              ++++++++++++                 ----------------------------------------------------
          ++++++++++++++++++++             Project: reactos (57 branches, 274 tags)
       ++++++++++++++++++++++++++          HEAD: 2ba6b09 (master, origin/master)
    ++++++++++++++++++++++++++++++++       Version: 0.4.14-release
 +++++++++++++************+++++++++++++    Created: 26 years ago
+++++++++++******************++++++++;;;   Languages:
+++++++++**********************++;;;;;;;              ● C (89.6 %) ● C++ (9.4 %)
++++++++*********++++++******;;;;;;;;;;;              ● CMake (0.6 %) ● Python (0.2 %)
+++++++********++++++++++**;;;;;;;;;;;;;              ● JavaScript (0.1 %) ● HTML (0.0 %)
+++++++*******+++++++++;;;;;;;;;;;;;;;;;              ● Other (0.1 %)
+++++++******+++++++;;;;;;;;;;;;;;;;;;;;   Authors: 8% Amine Khaldi 6777
+++++++*******+++:::::;;;;;;;;;;;;;;;;;;            7% Timo Kreuzer 5536
+++++++********::::::::::**;;;;;;;;;;;;;            6% Eric Kohl 4808
++++++++*********::::::******;;;;;;;;;;;   Last change: 3 months ago
++++++:::**********************::;;;;;;;   Contributors: 361
+++::::::::******************::::::::;;;   Repo: https://github.com/reactos/reactos
 :::::::::::::************:::::::::::::    Commits: 81668
    ::::::::::::::::::::::::::::::::       Lines of code: 4900543
       ::::::::::::::::::::::::::          Size: 401.85 MiB (26712 files)
          ::::::::::::::::::::             License: BSD-2-Clause-Views, GPL-2.0-only, GPL-3.0-only, LGPL-2.1-only, LGPL-3.0-only
              ::::::::::::
                 ::::::

real 1.61
user 2.56
sys 0.78
           196935680  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               12570  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   7  voluntary context switches
               10044  involuntary context switches
         32654906025  instructions retired
          9907271625  cycles elapsed
           113985408  peak memory footprint

…in 1.61s with a 114 MB peak memory footprint, and after…

❯ /usr/bin/time -lp onefetch
                 ++++++                    Sebastian Thiel ~ git version 2.32.1 (Apple Git-133)
              ++++++++++++                 ----------------------------------------------------
          ++++++++++++++++++++             Project: reactos (57 branches, 274 tags)
       ++++++++++++++++++++++++++          HEAD: 2ba6b097543 (master, origin/master)
    ++++++++++++++++++++++++++++++++       Version: 0.4.14-release
 +++++++++++++************+++++++++++++    Created: 26 years ago
+++++++++++******************++++++++;;;   Languages:
+++++++++**********************++;;;;;;;              ● C (89.6 %) ● C++ (9.4 %)
++++++++*********++++++******;;;;;;;;;;;              ● CMake (0.6 %) ● Python (0.2 %)
+++++++********++++++++++**;;;;;;;;;;;;;              ● JavaScript (0.1 %) ● HTML (0.0 %)
+++++++*******+++++++++;;;;;;;;;;;;;;;;;              ● Other (0.1 %)
+++++++******+++++++;;;;;;;;;;;;;;;;;;;;   Authors: 8% Amine Khaldi 6777
+++++++*******+++:::::;;;;;;;;;;;;;;;;;;            7% Timo Kreuzer 5536
+++++++********::::::::::**;;;;;;;;;;;;;            6% Eric Kohl 4808
++++++++*********::::::******;;;;;;;;;;;   Last change: 3 months ago
++++++:::**********************::;;;;;;;   Contributors: 361
+++::::::::******************::::::::;;;   Repo: https://github.com/reactos/reactos
 :::::::::::::************:::::::::::::    Commits: 81668
    ::::::::::::::::::::::::::::::::       Lines of code: 4900543
       ::::::::::::::::::::::::::          Size: 401.85 MiB (26712 files)
          ::::::::::::::::::::             License: BSD-2-Clause-Views, GPL-2.0-only, GPL-3.0-only, LGPL-2.1-only, LGPL-3.0-only
              ::::::::::::
                 ::::::

real 0.62
user 2.23
sys 0.77
           127516672  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                8541  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   8  voluntary context switches
               10711  involuntary context switches
         26983149694  instructions retired
          8683590898  cycles elapsed
            43878208  peak memory footprint

…in 0.62s with a 44 MB peak memory footprint.
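
For anyone who wants to check the arithmetic from the two `time` outputs, here is a small sketch. It assumes that the `peak memory footprint` lines are reported in bytes and uses decimal megabytes; `to_mb` and `speedup` are helper names invented for the example.

```rust
/// Convert a byte count to decimal megabytes.
fn to_mb(bytes: u64) -> f64 {
    bytes as f64 / 1_000_000.0
}

/// How many times faster the second run is, given wall-clock seconds.
fn speedup(before_s: f64, after_s: f64) -> f64 {
    before_s / after_s
}
```

With the values above, `to_mb(113_985_408)` is about 114, `to_mb(43_878_208)` is about 44, and `speedup(1.61, 0.62)` is about 2.6.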

@o2sh
Owner

o2sh commented Jul 10, 2022

@Byron 😮 ❤️

BTW, do you have an ETA for:

  • config 'user.name', remote information
  • git status/pending changes

tracking issue --> GitoxideLabs/gitoxide#364

@Byron
Collaborator

Byron commented Jul 10, 2022

config 'user.name', remote information

This will probably be available this month as I am currently working hard to get git-config a big step closer to 1.0. All the building blocks are there already, so this can happen earlier with a few more lines of code in onefetch.

git status/pending changes

This one is further away, as this year is entirely dedicated to cloning-related issues and integration into cargo, which doesn't yet involve worktree status. That said, if it goes well I will use the extra time to get it ready earlier.

o2sh removed the stale label Jul 17, 2022

@o2sh
Owner

o2sh commented Oct 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

o2sh added the stale label Oct 16, 2022
o2sh closed this as not planned Oct 23, 2022