Closed
Description
I cloned https://github.com/kubernetes/kubernetes in order to count how many commits I can find.
$ git clone https://github.com/kubernetes/kubernetes
$ cd kubernetes
$ git log | grep "^commit " | wc -l
72071
Counting the commits reachable from HEAD this way takes around 2 seconds on my MacBook Pro.
The next step is doing the same thing with gitbase.
$ time srcd sql "select count(*) from commits"
+----------+
| COUNT(*) |
+----------+
| 79991 |
+----------+
srcd sql "select count(*) from commits;" 0.02s user 0.03s system 0% cpu 18.415 total
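The two counts above differ (72071 vs 79991), so the two commands may not be measuring the same work. `git log` only walks commits reachable from HEAD, whereas the `commits` table presumably also includes commits reachable only from other refs (that reading of the table is an assumption, not something verified here). A minimal sketch of the difference on the git side, using a throwaway repository; the temporary directory, the `side` branch, and the demo identity are all made up for illustration:

```shell
# Build a throwaway repository with one commit that is NOT reachable from HEAD.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # hypothetical identity for the demo
git config user.name "Demo"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
main_branch=$(git rev-parse --abbrev-ref HEAD)

# Add a commit that exists only on a side branch, then go back.
git checkout -q -b side
git commit -q --allow-empty -m "only on side"
git checkout -q "$main_branch"

head_count=$(git rev-list --count HEAD)    # commits reachable from HEAD only
all_count=$(git rev-list --count --all)    # commits reachable from any ref
echo "HEAD: $head_count, all refs: $all_count"
```

Running `git rev-list --count --all` in the Kubernetes clone would show whether the extra commits come from other refs. If they do, that count is probably the fairer baseline for the timing comparison.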
Lastly, I tried to see whether adding cores would help. Running the same query on a GCP instance with 96 cores and way more RAM than we need gave:
+----------+
| COUNT(*) |
+----------+
| 79991 |
+----------+
1 row in set (22.90 sec)
It takes longer than before! I assumed it was because my laptop has an SSD while this instance was using an HDD, so I tried storing the dataset (just Kubernetes) in RAM. The result was interesting: it took even longer than before!
+----------+
| COUNT(*) |
+----------+
| 79991 |
+----------+
1 row in set (23.72 sec)
I have no idea why this is, but it goes completely against my expectations.