Interesting performance issue counting commits #617

Closed
@campoy

I cloned https://github.com/kubernetes/kubernetes in order to count how many commits I can find.

$ git clone https://github.com/kubernetes/kubernetes
$ cd kubernetes
$ git log | grep "^commit " | wc -l
72071

Counting the commits accessible from HEAD this way takes around 2 seconds on my MacBook Pro.

The next step was to do the same thing with gitbase.

$ time srcd sql "select count(*) from commits"
+----------+
| COUNT(*) |
+----------+
|    79991 |
+----------+
srcd sql "select count(*) from  commits;"  0.02s user 0.03s system 0% cpu 18.415 total
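Note that the two counts differ (72071 from `git log` versus 79991 from gitbase), so the two runs may not be measuring the same set of commits. One possible explanation, stated here as an assumption and not verified against gitbase, is that `git log` only walks commits reachable from HEAD while the `commits` table also covers commits reachable from other refs. A minimal sketch for checking this with plain git follows; `count_commits` is a hypothetical helper defined only for this example, while `git rev-list --count` is a standard git command.

```shell
#!/bin/sh
# count_commits is a hypothetical helper for illustration; it is not part of
# git or gitbase.
# Usage: count_commits <repo-dir> <rev-list arguments...>
count_commits() {
    repo=$1
    shift
    # rev-list --count prints only the number of commits it walks,
    # without formatting any log output.
    git -C "$repo" rev-list --count "$@"
}

# Example, run from inside the kubernetes clone:
#   count_commits . HEAD     # commits reachable from HEAD
#   count_commits . --all    # commits reachable from any ref
```

If the `--all` figure is closer to what gitbase reports, the timing comparison should use that as the git baseline. The sketch says nothing about why gitbase itself is slower.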

Lastly, I tried to see whether adding cores would help. Running on a GCP instance with 96 cores and way more RAM than we need, the same query gave:

+----------+
| COUNT(*) |
+----------+
|    79991 |
+----------+
1 row in set (22.90 sec)

It took longer than before! I assumed this was because my laptop has an SSD, while this instance was using an HDD, so I tried storing the dataset (just Kubernetes) in RAM. The result was interesting: it took even longer than before!

+----------+
| COUNT(*) |
+----------+
|    79991 |
+----------+
1 row in set (23.72 sec)

I have no idea why this is, but it goes completely against my expectations.

Labels: performance (Performance improvements)
