Perf: Traverse commit history instead of reading the packfile #698

Merged: 2 commits, Feb 18, 2019

Conversation

ajnavarro
Contributor

To avoid showing orphan commits and to improve performance on the commits table, instead of reading all the objects from each repository and then discarding commits, we can run the go-git equivalent of `git log --all` and traverse the commit history, which avoids reading a lot of data from disk.

Closes #617.
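
For illustration, here is a minimal sketch of the difference between the two strategies, using a toy in-memory commit graph rather than go-git itself. The `commit` type, the `sampleStore` data, and the functions `allCommits` and `reachableCommits` are hypothetical names made up for this example; in the PR the actual change is the switch from `repo.CommitObjects()` to `repo.Log(&git.LogOptions{All: true})`.

```go
package main

import (
	"fmt"
	"sort"
)

// commit is a toy stand-in for a git commit: a hash plus its parent hashes.
type commit struct {
	hash    string
	parents []string
}

// allCommits mimics scanning every object in storage: it returns every
// commit, including ones that no reference can reach (orphans).
func allCommits(store map[string]commit) []string {
	out := make([]string, 0, len(store))
	for h := range store {
		out = append(out, h)
	}
	sort.Strings(out)
	return out
}

// reachableCommits mimics the idea behind `git log --all`: start from every
// reference tip and follow parent links, visiting each commit once.
func reachableCommits(store map[string]commit, refs []string) []string {
	seen := make(map[string]bool)
	stack := append([]string(nil), refs...)
	for len(stack) > 0 {
		h := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[h] {
			continue
		}
		c, ok := store[h]
		if !ok {
			continue // unknown parent; ignored in this sketch
		}
		seen[h] = true
		stack = append(stack, c.parents...)
	}
	out := make([]string, 0, len(seen))
	for h := range seen {
		out = append(out, h)
	}
	sort.Strings(out)
	return out
}

// sampleStore builds a tiny history: c1 <- c2 <- c3 (one branch tip),
// c2 <- f1 (another branch tip), and o1, which no reference points to.
func sampleStore() map[string]commit {
	return map[string]commit{
		"c1": {hash: "c1"},
		"c2": {hash: "c2", parents: []string{"c1"}},
		"c3": {hash: "c3", parents: []string{"c2"}},
		"f1": {hash: "f1", parents: []string{"c2"}},
		"o1": {hash: "o1", parents: []string{"c1"}}, // orphan
	}
}

func main() {
	store := sampleStore()
	refs := []string{"c3", "f1"} // hypothetical branch tips
	fmt.Println("all objects:", allCommits(store))
	fmt.Println("reachable:  ", reachableCommits(store, refs))
}
```

In this toy graph the full scan returns five commits, while the traversal from the two reference tips returns four and never touches the orphan `o1`.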

@ajnavarro ajnavarro force-pushed the perf/list-only-reachable-objects branch 4 times, most recently from 301f3ca to fd8ac0b Compare February 13, 2019 18:15
- i.iter, err = i.repo.CommitObjects()
+ i.iter, err =
+ 	i.repo.Log(&git.LogOptions{
+ 		All: true,
Contributor

Maybe we should force ordering by commit time (WDYT)?
By default, pre-order traversal is used, which I think works slightly differently from the native git implementation.

Contributor Author

But here we don't want to mimic `git log` output 100%; we want to reduce unnecessary disk reads. Ordering the commits will not give us any performance improvement.
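
To make the point of this thread concrete, here is a small sketch, again on a toy graph and not on go-git's real iterators, showing that a pre-order walk and a time-ordered listing visit the same set of commits and differ only in order. The `commit` type, the `when` field, and the functions `preOrder` and `byTime` are hypothetical names for this example. Sorting the collected commits afterwards is a simplification; a real time-ordered walk would more likely use a priority queue.

```go
package main

import (
	"fmt"
	"sort"
)

// commit is a toy commit with a logical timestamp.
type commit struct {
	parents []string
	when    int64
}

// preOrder visits a commit first, then each of its parents in turn,
// skipping commits that were already seen.
func preOrder(store map[string]commit, tip string) []string {
	var out []string
	seen := make(map[string]bool)
	var walk func(h string)
	walk = func(h string) {
		c, ok := store[h]
		if !ok || seen[h] {
			return
		}
		seen[h] = true
		out = append(out, h)
		for _, p := range c.parents {
			walk(p)
		}
	}
	walk(tip)
	return out
}

// byTime returns the same commits, newest first. It costs extra work
// (a sort here) without reducing how many commits are read.
func byTime(store map[string]commit, tip string) []string {
	out := preOrder(store, tip)
	sort.SliceStable(out, func(i, j int) bool {
		return store[out[i]].when > store[out[j]].when
	})
	return out
}

// sampleStore: r <- a1 <- a2, a1 <- b1, and m merges a2 and b1.
func sampleStore() map[string]commit {
	return map[string]commit{
		"r":  {when: 1},
		"a1": {parents: []string{"r"}, when: 2},
		"a2": {parents: []string{"a1"}, when: 3},
		"b1": {parents: []string{"a1"}, when: 4},
		"m":  {parents: []string{"a2", "b1"}, when: 5},
	}
}

func main() {
	store := sampleStore()
	fmt.Println("pre-order:", preOrder(store, "m"))
	fmt.Println("by time:  ", byTime(store, "m"))
}
```

Both listings contain the same five commits, so for a table that only needs the set of reachable commits the ordering step adds cost without saving any reads.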

Contributor

@kuba-- kuba-- left a comment

Fix tests

Signed-off-by: Antonio Jesus Navarro Perez <[email protected]>
To avoid showing orphan commits and to improve performance on the commits table, instead of reading all the objects from each repository and then discarding commits, we can run the go-git equivalent of `git log --all` and traverse the commit history, which avoids reading a lot of data from disk.

Signed-off-by: Antonio Jesus Navarro Perez <[email protected]>
@ajnavarro ajnavarro force-pushed the perf/list-only-reachable-objects branch from fd8ac0b to 9748ee3 Compare February 15, 2019 11:23
@ajnavarro
Contributor Author

@kuba-- @erizocosmico tests finally passing

@ajnavarro
Contributor Author

Some numbers using gitbase-regression with complexity=2:

| ID | remote:master | local:perf/list-only-reachable-objects |
| --- | --- | --- |
| query00 | 4.012231112s | 1.482114222s |
| query01 | 115.760157ms | 114.007921ms |
| query02 | 1.022439327s | 1.008603787s |
| query03 | 937.115316ms | 931.223158ms |
| query04 | 3.701316738s | 1.361501136s |
| query05 | 1.377949809s | 1.298451385s |
| query06 | 11m14.10874939s | 10m27.993923365s |
| query07 | 4.14140327s | 1.388045549s |
| query08 | 4.193708919s | 1.375377416s |
| query09 | 4.168560944s | 1.395838104s |
| query10 | 6.991478009s | 3.85651492s |
| query11 | 10m12.68998881s | 10m6.052184916s |
| query12 | 4.961075222s | 4.957561797s |
| query13 | 116.180972ms | 110.658067ms |
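
To read these numbers as ratios, the following sketch computes the speedup for a few of the rows above. The `speedup` helper is a hypothetical name for this example and is not part of gitbase-regression; it relies only on Go's standard `time.ParseDuration`, which accepts the duration format used in the table.

```go
package main

import (
	"fmt"
	"time"
)

// speedup parses two durations and returns before/after as a ratio.
func speedup(before, after string) (float64, error) {
	b, err := time.ParseDuration(before)
	if err != nil {
		return 0, err
	}
	a, err := time.ParseDuration(after)
	if err != nil {
		return 0, err
	}
	if a <= 0 {
		return 0, fmt.Errorf("non-positive duration %q", after)
	}
	return float64(b) / float64(a), nil
}

func main() {
	// Rows copied from the table above (remote:master vs. this branch).
	rows := []struct{ id, before, after string }{
		{"query00", "4.012231112s", "1.482114222s"},
		{"query06", "11m14.10874939s", "10m27.993923365s"},
		{"query12", "4.961075222s", "4.957561797s"},
	}
	for _, r := range rows {
		s, err := speedup(r.before, r.after)
		if err != nil {
			fmt.Println(r.id, "error:", err)
			continue
		}
		fmt.Printf("%s: %.2fx\n", r.id, s)
	}
}
```

For example, query00 comes out at roughly 2.7x faster, while query12 is essentially unchanged.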

@ajnavarro ajnavarro merged commit 5acf0ba into src-d:master Feb 18, 2019
@ajnavarro ajnavarro deleted the perf/list-only-reachable-objects branch February 18, 2019 08:54
Successfully merging this pull request may close these issues.

Interesting performance issue counting commits
3 participants