fix: Consider all transactions in chunk hash #62
Conversation
Walkthrough

daChunkV7.Hash() now includes each block's 8-byte big-endian block number and every transaction's TxHash (validated as 32 bytes) when assembling the data for the final Keccak256 digest; invalid TxHash lengths return an error. The v7 chunk hash is retained only as a unique identifier and is not used by the protocol.
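For illustration, here is a minimal, self-contained sketch of the pre-image layout described above. The `blockNumbers`, `encodedBlocks`, and `txHashes` parameters are hypothetical stand-ins for the real `daChunkV7` fields, not the actual implementation:

```go
package main

import (
	"encoding/binary"
	"fmt"

	"golang.org/x/crypto/sha3"
)

// chunkHashSketch mirrors the described scheme: for each block, append the
// 8-byte big-endian block number and its encoded block context; then append
// every 32-byte transaction hash; finally take Keccak256 of the whole buffer.
func chunkHashSketch(blockNumbers []uint64, encodedBlocks [][]byte, txHashes [][]byte) ([32]byte, error) {
	var data []byte
	for i, num := range blockNumbers {
		var tmp [8]byte
		binary.BigEndian.PutUint64(tmp[:], num)
		data = append(data, tmp[:]...)
		data = append(data, encodedBlocks[i]...)
	}
	for _, h := range txHashes {
		if len(h) != 32 {
			return [32]byte{}, fmt.Errorf("invalid tx hash length: %d", len(h))
		}
		data = append(data, h...)
	}
	hasher := sha3.NewLegacyKeccak256()
	hasher.Write(data)
	var out [32]byte
	copy(out[:], hasher.Sum(nil))
	return out, nil
}

func main() {
	h, err := chunkHashSketch([]uint64{100}, [][]byte{{0x01, 0x02}}, [][]byte{make([]byte, 32)})
	fmt.Println(h, err)
}
```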
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Caller
    participant daChunkV7 as daChunkV7.Hash()
    participant Blocks as blocks[]
    participant Keccak as Keccak256
    Note over daChunkV7: v7 flow — include block numbers + all tx hashes
    Caller->>daChunkV7: Hash()
    daChunkV7->>Blocks: iterate blocks
    loop for each block
        daChunkV7->>daChunkV7: append 8-byte BE block number
        daChunkV7->>Blocks: iterate txData in block
        loop for each txData
            daChunkV7->>daChunkV7: decode txData.TxHash
            alt TxHash length == 32
                daChunkV7->>daChunkV7: append txHash to dataBytes
            else invalid length
                daChunkV7-->>Caller: return error
            end
        end
    end
    daChunkV7->>Keccak: hash(dataBytes)
    Keccak-->>daChunkV7: digest
    daChunkV7-->>Caller: digest
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pull Request Overview
This PR fixes chunk hash collisions in v7 and v8 chunk hashing by updating the hash function to include all transaction types and block numbers. The collision issue occurred on mainnet when consecutive blocks with nearly identical headers resulted in identical chunk hashes.
- Updates chunk hash computation to include block numbers and all transaction types (not just L1 messages)
- Adds test coverage to verify chunk hash uniqueness across different block numbers
- Improves hash collision resistance for rollup-relayer's internal bookkeeping
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| encoding/codecv7_types.go | Updates Hash() method to include block numbers and all transaction hashes in the computation |
| encoding/codecv7_test.go | Adds TestChunkHashUnique to verify hash uniqueness between chunks with different block numbers |
Actionable comments posted: 0
🧹 Nitpick comments (4)
encoding/codecv7_types.go (3)
452-480: Hash construction: consider streaming Keccak or at least validating blocks/txs shape and improving error context.

The current approach builds a large `dataBytes` slice, which can be costly for big chunks. It also doesn't guard against `len(blocks) != len(transactions)` and emits a low-signal error (`"unexpected hash"`). Suggest:
- Stream hashing to avoid holding all bytes in memory.
- Validate blocks/transactions list lengths match up-front.
- Emit actionable errors, including block/tx indices and actual byte length found.
Apply this refactor (body of Hash plus better errors):
```diff
@@
-func (c *daChunkV7) Hash() (common.Hash, error) {
-	var dataBytes []byte
-
-	// concatenate block contexts
-	for _, block := range c.blocks {
-		// append block number
-		var tmp [8]byte
-		binary.BigEndian.PutUint64(tmp[:], block.Number())
-		dataBytes = append(dataBytes, tmp[:]...)
-
-		// append encoded block context
-		encodedBlock := block.Encode()
-		dataBytes = append(dataBytes, encodedBlock...)
-	}
-
-	// concatenate tx hashes
-	for _, blockTxs := range c.transactions {
-		for _, txData := range blockTxs {
-			hashBytes := common.FromHex(txData.TxHash)
-			if len(hashBytes) != common.HashLength {
-				return common.Hash{}, fmt.Errorf("unexpected hash: %s", txData.TxHash)
-			}
-			dataBytes = append(dataBytes, hashBytes...)
-		}
-	}
-
-	hash := crypto.Keccak256Hash(dataBytes)
-	return hash, nil
-}
+func (c *daChunkV7) Hash() (common.Hash, error) {
+	// Sanity check: the outer dims should match (blocks to tx-lists).
+	if len(c.transactions) != len(c.blocks) {
+		return common.Hash{}, fmt.Errorf("blocks/txs length mismatch: blocks=%d txLists=%d", len(c.blocks), len(c.transactions))
+	}
+
+	// Stream to Keccak to avoid large intermediate allocations.
+	hasher := sha3.NewLegacyKeccak256()
+
+	// Blocks
+	var tmp [8]byte
+	for _, block := range c.blocks {
+		binary.BigEndian.PutUint64(tmp[:], block.Number())
+		_, _ = hasher.Write(tmp[:])
+		_, _ = hasher.Write(block.Encode())
+	}
+
+	// Transactions
+	for bi, blockTxs := range c.transactions {
+		for ti, txData := range blockTxs {
+			hashBytes := common.FromHex(txData.TxHash)
+			if len(hashBytes) != common.HashLength {
+				return common.Hash{}, fmt.Errorf("invalid tx hash length (block %d, tx %d): got %d bytes, want %d (%q)",
+					bi, ti, len(hashBytes), common.HashLength, txData.TxHash)
+			}
+			_, _ = hasher.Write(hashBytes)
+		}
+	}
+
+	sum := hasher.Sum(nil)
+	return common.BytesToHash(sum), nil
+}
```

And add the import:

```diff
@@
 	"github.com/scroll-tech/go-ethereum/crypto"
 	"github.com/scroll-tech/go-ethereum/crypto/kzg4844"
+	"golang.org/x/crypto/sha3"
```
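Side note on the suggested refactor: it relies on the fact that incremental writes to a Keccak sponge produce the same digest as hashing the concatenation of the same bytes in one shot, so streaming should yield the same hash as the current collect-then-hash approach. A small standalone check, illustrative only, using `golang.org/x/crypto/sha3` directly rather than the repository's helpers:

```go
package main

import (
	"bytes"
	"fmt"

	"golang.org/x/crypto/sha3"
)

func main() {
	a := []byte("block-context")
	b := []byte("tx-hash-bytes")

	// Streamed: write the pieces one after another.
	streamed := sha3.NewLegacyKeccak256()
	streamed.Write(a)
	streamed.Write(b)

	// One-shot: hash the concatenation of the same pieces.
	oneShot := sha3.NewLegacyKeccak256()
	oneShot.Write(append(append([]byte{}, a...), b...))

	fmt.Println(bytes.Equal(streamed.Sum(nil), oneShot.Sum(nil))) // true
}
```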
467-476: Decode and validate tx hashes: good strictness; add indices to error for debuggability.

Parsing via `common.FromHex` and enforcing exactly 32 bytes is robust. Improve the error message with block/tx indices for faster triage. Minimal tweak:
```diff
-			if len(hashBytes) != common.HashLength {
-				return common.Hash{}, fmt.Errorf("unexpected hash: %s", txData.TxHash)
-			}
+			if len(hashBytes) != common.HashLength {
+				return common.Hash{}, fmt.Errorf("invalid tx hash length (block %d, tx %d): got %d bytes, want %d (%q)",
+					bi, ti, len(hashBytes), common.HashLength, txData.TxHash)
+			}
```
452-480: Guard against large allocations (alternative to streaming): pre-size buffer.

If you prefer to keep the current "collect then hash" style, pre-size `dataBytes` to avoid repeated growth.

```diff
-	var dataBytes []byte
+	// Pre-size: 8 bytes for each block number + 52-byte block ctx + 32 bytes per tx.
+	total := len(c.blocks)*(8+daBlockV7BlockContextEncodedLength)
+	for _, blockTxs := range c.transactions {
+		total += len(blockTxs) * common.HashLength
+	}
+	dataBytes := make([]byte, 0, total)
```

encoding/codecv7_test.go (1)
51-87: Broaden coverage: assert determinism, tx inclusion, and error path.

Given the PR goal "Consider all transactions in chunk hash," add tests to:
- Prove tx list changes alter the hash (and that order matters).
- Prove identical chunks have identical hashes (determinism).
- Exercise the error path for invalid TxHash length.
Example additions:
```go
func TestChunkHashDeterministic(t *testing.T) {
	b := &daBlockV7{daBlockV0: daBlockV0{number: 42}}
	ch1 := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{}})
	ch2 := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{}})
	h1, err1 := ch1.Hash()
	h2, err2 := ch2.Hash()
	require.NoError(t, err1)
	require.NoError(t, err2)
	require.Equal(t, h1, h2)
}

func TestChunkHashChangesWithTransactions(t *testing.T) {
	b := &daBlockV7{daBlockV0: daBlockV0{number: 1}}
	// 32-byte zero hash
	txA := &types.TransactionData{TxHash: "0x" + strings.Repeat("00", 32)}
	// 32-byte one hash
	txB := &types.TransactionData{TxHash: "0x" + strings.Repeat("11", 32)}
	chNoTx := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{}})
	chTxA := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{txA}})
	chTxB := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{txB}})
	chTxAB := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{txA, txB}})
	chTxBA := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{txB, txA}})
	hNo, _ := chNoTx.Hash()
	hA, _ := chTxA.Hash()
	hB, _ := chTxB.Hash()
	hAB, _ := chTxAB.Hash()
	hBA, _ := chTxBA.Hash()
	require.NotEqual(t, hNo, hA)
	require.NotEqual(t, hA, hB)
	require.NotEqual(t, hAB, hBA) // order matters
}

func TestChunkHashInvalidTxHashLength(t *testing.T) {
	b := &daBlockV7{daBlockV0: daBlockV0{number: 1}}
	// 31-byte (invalid) hash
	bad := &types.TransactionData{TxHash: "0x" + strings.Repeat("00", 31)}
	ch := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{bad}})
	_, err := ch.Hash()
	require.Error(t, err)
	require.Contains(t, err.Error(), "invalid tx hash length")
}
```

If you'd like, I can push these as a follow-up commit.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- encoding/codecv7_test.go (1 hunks)
- encoding/codecv7_types.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-18T03:40:09.800Z
Learnt from: colinlyguo
PR: scroll-tech/da-codec#25
File: encoding/codecv1_types.go:105-116
Timestamp: 2024-10-18T03:40:09.800Z
Learning: The code in `encoding/codecv1_types.go`, specifically the `Encode` method in `daBatchV1`, has been updated. Previous comments regarding hardcoded byte offsets may be outdated.
Applied to files:
- encoding/codecv7_test.go
- encoding/codecv7_types.go
🧬 Code graph analysis (1)
encoding/codecv7_test.go (1)
encoding/interfaces.go (1)
DABlock (14-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: tests
🔇 Additional comments (3)
encoding/codecv7_types.go (2)
449-466: Including the 8-byte block number in the chunk hash input is a solid fix.

Prefixing each encoded block context with the big-endian block number removes prior collisions between otherwise identical block contexts at different heights. This is backward-safe given the v7 note that chunk hashes are not protocol-significant anymore (unique identifier only).
452-480: No call sites ignore the new error return on daChunkV7.Hash()

I've checked all direct invocations of daChunkV7.Hash():

- encoding/codecv7_test.go (lines 66 and 83) call `chunk.Hash()` and immediately assert `require.NoError(t, err)`, so errors are handled.
- There are no other standalone calls to `daChunkV7.Hash()` in the codebase that omit error handling.

Since every use of the updated signature handles the returned error, this concern is resolved.
encoding/codecv7_test.go (1)
51-87: Nice regression test ensuring block number affects the chunk hash.

The test clearly demonstrates the intended non-collision when only the block number differs. Good use of minimal blocks and explicit checks.
Purpose or design rationale of this PR
The current v7 and v8 chunk hashing is prone to chunk hash collisions, which has already happened on mainnet: two consecutive blocks with mostly identical header fields (including the timestamp), each containing enough transactions to fill a chunk, produced two consecutive chunks with identical hashes.
The reason for the collision is that some chunk contents are not considered during hashing, most importantly the block number and the L2 transaction hashes.
This PR updates the chunk hash function to ensure that chunk hashes are unique. Since chunk hashes are only used for `rollup-relayer`'s internal bookkeeping, and they are no longer used in the contracts, provers, or the l2geth rollup-verifier, we can upgrade directly to this new hash.
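To make the collision mechanism concrete, here is a small illustrative sketch (not the repository's code): hashing only the encoded block context yields the same digest for two blocks that differ solely in height, while prefixing the 8-byte big-endian block number, as this PR does, makes the digests distinct.

```go
package main

import (
	"encoding/binary"
	"fmt"

	"golang.org/x/crypto/sha3"
)

// keccak hashes the concatenation of the given byte slices with legacy Keccak-256.
func keccak(parts ...[]byte) []byte {
	h := sha3.NewLegacyKeccak256()
	for _, p := range parts {
		h.Write(p)
	}
	return h.Sum(nil)
}

func main() {
	// Two consecutive blocks with identical header fields encode to the same context bytes.
	ctx := []byte("identical-encoded-block-context")

	// Old scheme: the block number is not part of the pre-image, so the digests collide.
	fmt.Printf("old: %x == %x\n", keccak(ctx), keccak(ctx))

	// New scheme: prefix each context with the 8-byte big-endian block number.
	var n1, n2 [8]byte
	binary.BigEndian.PutUint64(n1[:], 1000)
	binary.BigEndian.PutUint64(n2[:], 1001)
	fmt.Printf("new: %x != %x\n", keccak(n1[:], ctx), keccak(n2[:], ctx))
}
```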