Description
Problem
CRC-32 is too small for 170GB checkpoint files (as noted in PR #1944 when file size was closer to 60GB).
The Proposed Solution
I suggested replacing CRC-32 and adding concurrency in a comment in checkpointer.go:
flow-go/ledger/complete/wal/checkpointer.go, lines 294 to 295 in c26a026:

    // TODO: evaluate alternatives to CRC32 since checkpoint file is many GB in size.
    // TODO: add concurrency if the performance gains are enough to offset complexity.

We can replace CRC-32 with a 256-bit hash and use concurrency to offset the loss of speed. Go's standard library provides SHA-256 and SHA-512/256. BLAKE2s (up to 256-bit digests) and BLAKE2b (up to 512-bit digests) are popular and reasonably fast on most modern CPUs.
BLAKE3 is very fast, but its implementation would come from a third party and should be tested extensively with very large files.
If cryptographic hashes are too slow, noncryptographic hashes with a smaller digest size are another possibility. However, best-practice recommendations for file verification since 2012 have been to use a 256-bit or larger hash such as SHA-2 or SHA-3.
The text of PR #1944 has more context. Among other things, it mentioned the possibility of replacing CRC-32 with BLAKE2 or BLAKE2s after adding concurrency.

