
Interesting performance issue counting commits #617


Closed
campoy opened this issue Nov 14, 2018 · 7 comments · Fixed by #698

@campoy
Contributor

campoy commented Nov 14, 2018

I cloned https://github.com/kubernetes/kubernetes in order to count how many commits I can find.

$ git clone https://github.com/kubernetes/kubernetes
$ cd kubernetes
$ git log | grep "^commit " |  wc -l
72071

Counting the commits reachable from HEAD in this way takes around 2 seconds on my MacBook Pro.
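(For what it's worth, `git rev-list --count HEAD` reports the same number directly, without formatting the full log and grepping it:)

```shell
# Count commits reachable from HEAD; equivalent to the
# `git log | grep "^commit " | wc -l` pipeline above.
git rev-list --count HEAD
```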

The next step is doing the same thing with gitbase.

$ time srcd sql "select count(*) from commits"
+----------+
| COUNT(*) |
+----------+
|    79991 |
+----------+
srcd sql "select count(*) from  commits;"  0.02s user 0.03s system 0% cpu 18.415 total

Lastly, I tried to see whether adding cores would help. Running on a GCP instance with 96 cores and far more RAM than we need, the analysis takes even longer:

+----------+
| COUNT(*) |
+----------+
|    79991 |
+----------+
1 row in set (22.90 sec)

It takes longer than before! I assumed it was because my laptop has an SSD while this instance was using an HDD ... so I tried storing the dataset (just Kubernetes) in RAM. The result was interesting ... as in it took even longer!

+----------+
| COUNT(*) |
+----------+
|    79991 |
+----------+
1 row in set (23.72 sec)

I have no idea why this is, but it goes completely against my expectations.

@campoy campoy added the performance Performance improvements label Nov 14, 2018
@jfontan
Contributor

jfontan commented Nov 15, 2018

Right now each partition is one repository, so increasing the number of cores does not help. Also, these many-core systems usually have cores that are slightly slower than those in a top-of-the-line laptop.

@ajnavarro
Contributor

ajnavarro commented Nov 15, 2018

Not important, but with that command you are counting only the commits in master's history.
The correct command is:

$ time git log --format=oneline --all | wc -l
80028

real	0m1.063s
user	0m1.064s
sys	0m0.200s

Running the same count, but using go-git instead of git:

package main

import (
    "fmt"
    "io"

    "gopkg.in/src-d/go-git.v4"
    . "gopkg.in/src-d/go-git.v4/_examples"
)

func main() {
    // Open the repository in the current directory.
    r, err := git.PlainOpen(".git")
    CheckIfError(err)

    // Iterate over every commit object in the object database,
    // not just those reachable from references.
    iter, err := r.CommitObjects()
    CheckIfError(err)
    defer iter.Close()

    for {
        c, err := iter.Next()
        if err == io.EOF {
            break
        }
        CheckIfError(err)

        fmt.Println(c.Hash.String())
    }
}

Result:

$ time /tmp/list-commits | wc -l
80028

real	0m20.113s
user	0m19.868s
sys	0m2.220s

So we can say that we should focus our efforts on improving go-git.

Binary used to list hashes from a repository:

list-commits.zip

@smola
Contributor

smola commented Nov 15, 2018

We're debugging a problem that might be making go-git abnormally slow when iterating objects on the kubernetes repository specifically, but everything @ajnavarro and @jfontan said still applies.

@smola
Contributor

smola commented Nov 15, 2018

In a repo of this size, at least 50% of the time is spent reading object headers in the packfile to check whether they are commits or not. That means that, on top of the actual commits, we still need to check more than 700k object headers for their type. Why is this faster with git log? Because git log, or even git log --all, does not iterate over all commit objects; it just walks history from the references, ignoring dangling commits or commits only referenced from the reflog.

If we assume that gitbase users are not interested in dangling commits or the reflog, which I think is a fair assumption, we could change the implementation of the commits table to return all commits reachable from all known references. I think this might buy us a big speedup on big repositories.
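The distinction is easy to reproduce with plain git: rev-list walks only from references, while `cat-file --batch-all-objects` enumerates the whole object database, dangling commits included. A throwaway-repo sketch (note `--batch-all-objects` needs git 2.6+):

```shell
# Create a throwaway repo with one reachable and one dangling commit.
repo=$(mktemp -d) && cd "$repo" && git init -q
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m first
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m second
git reset -q --hard HEAD~1   # "second" is now dangling (reflog only)

# Walking from references misses the dangling commit...
git rev-list --count --all                                         # -> 1

# ...while enumerating the object database still finds it.
git cat-file --batch-check --batch-all-objects | grep -c ' commit '   # -> 2
```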

@campoy
Contributor Author

campoy commented Nov 16, 2018

It might be worth considering, yeah. Maybe under a flag, so the old behavior is still available for those that need it?

@smola
Contributor

smola commented Nov 19, 2018

Actually, the current behavior is undesirable unless you are using gitbase to do some weird analysis on local development repositories rather than fresh clones. If we change it, I wouldn't expose the old behavior at all.

@ajnavarro ajnavarro added the blocked Some other issue is blocking this label Nov 19, 2018
@ajnavarro ajnavarro added blocked Some other issue is blocking this and removed blocked Some other issue is blocking this labels Nov 27, 2018
@kuba-- kuba-- assigned kuba-- and unassigned kuba-- Dec 17, 2018
@ajnavarro ajnavarro removed the blocked Some other issue is blocking this label Feb 12, 2019
@ajnavarro
Contributor

To make this possible we need this PR merged in go-git, so that gitbase works with siva files and with repositories whose references point to missing objects: src-d/go-git#1067

Numbers executing select count(*) from commits;:

Current master:

mysql> select count(*) from commits;
+----------+
| COUNT(*) |
+----------+
|    83167 |
+----------+
1 row in set (21,05 sec)

Using repo.LogAll:

+----------+
| COUNT(*) |
+----------+
|    78171 |
+----------+
1 row in set (3,92 sec)
