You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
list-objects: don't queue root trees unless revs->tree_objects is set
When traverse_commit_list() processes each commit, it queues the
commit's root tree in the pending array. Then, after all commits are
processed, it calls traverse_trees_and_blobs() to walk over the pending
list, calling process_tree() on each. But if revs->tree_objects is not
set, process_tree() just exists immediately!
We can save ourselves some work by not even bothering to queue these
trees in the first place. There are a few subtle points to make:
- we also detect commits with a NULL tree pointer here. But this isn't
an interesting check for broken commits, since the lookup_tree()
we'd have done during commit parsing doesn't actually check that we
have the tree on disk. So we're not losing any robustness.
- besides queueing, we also set the NOT_USER_GIVEN flag on the tree
object. This is used by the traverse_commit_list_filtered() variant.
But if we're not exploring trees, then we won't actually care about
this flag, which is used only inside process_tree() code-paths.
- queueing trees eventually leads to us queueing blobs, too. But we
don't need to check revs->blob_objects here. Even in the current
code, we still wouldn't find those blobs, because we'd never open up
the tree objects to list their contents.
- the user-visible impact to the caller is minimal. The pending trees
are all cleared by the time the function returns anyway, by
traverse_trees_and_blobs(). We do call a show_commit() callback,
which technically could be looking at revs->pending during the
callback. But it seems like a rather unlikely thing to do (if you
want the tree of the current commit, then accessing the tree struct
member is a lot simpler).
So this should be safe to do. Let's look at the benefits:
[before]
Benchmark #1: git -C linux rev-list HEAD >/dev/null
Time (mean ± σ): 7.651 s ± 0.021 s [User: 7.399 s, System: 0.252 s]
Range (min … max): 7.607 s … 7.683 s 10 runs
[after]
Benchmark #1: git -C linux rev-list HEAD >/dev/null
Time (mean ± σ): 7.593 s ± 0.023 s [User: 7.329 s, System: 0.264 s]
Range (min … max): 7.565 s … 7.634 s 10 runs
Not too impressive, but then we're really just avoiding sticking a
pointer into a growable array. But still, I'll take a free 0.75%
speedup.
Let's try it after running "git commit-graph write":
[before]
Benchmark #1: git -C linux rev-list HEAD >/dev/null
Time (mean ± σ): 1.458 s ± 0.011 s [User: 1.199 s, System: 0.259 s]
Range (min … max): 1.447 s … 1.481 s 10 runs
[after]
Benchmark #1: git -C linux rev-list HEAD >/dev/null
Time (mean ± σ): 1.126 s ± 0.023 s [User: 896.5 ms, System: 229.0 ms]
Range (min … max): 1.106 s … 1.181 s 10 runs
Now that's more like it. We saved over 22% of the total time. Part of
that is because the runtime is shorter overall, but the absolute
improvement is also much larger. What's going on?
When we fill in a commit struct using the commit graph, we don't bother
to set the tree pointer, and instead lazy-load it when somebody calls
get_commit_tree(). So we're not only skipping the pointer write to the
pending queue, but we're skipping the lazy-load of the tree entirely.
Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
0 commit comments