Conversation

@Thegaram (Contributor) commented Aug 22, 2025

Purpose or design rationale of this PR

The current v7 and v8 chunk hashing is prone to chunk hash collisions, and one has occurred on mainnet: two consecutive blocks with mostly identical header fields (including the timestamp), each containing enough transactions to fill a chunk, produced two consecutive chunks with identical hashes.

The collision arises because some chunk contents are not included in the hash input, most importantly the block number and the L2 transaction hashes.

This PR updates the chunk hash function to ensure that chunk hashes are unique. Since chunk hashes are only used for the rollup-relayer's internal bookkeeping, and are no longer used in the contracts, the provers, or the l2geth rollup-verifier, we can switch directly to the new hash.
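As an illustration, here is a minimal standalone sketch of the new hash input layout. The helper `chunkHash` and its parameters are hypothetical; the real implementation is `daChunkV7.Hash()` in `encoding/codecv7_types.go`, which buffers the bytes and hashes them with go-ethereum's `crypto.Keccak256Hash` rather than streaming:

package main

import (
	"encoding/binary"
	"fmt"

	"golang.org/x/crypto/sha3"
)

// chunkHash hashes, for each block, an 8-byte big-endian block number
// followed by its encoded block context, then every transaction hash
// (L1 messages and L2 transactions alike) in chunk order.
func chunkHash(blockNumbers []uint64, blockContexts [][]byte, txHashes [][32]byte) [32]byte {
	h := sha3.NewLegacyKeccak256()
	var num [8]byte
	for i, ctx := range blockContexts {
		binary.BigEndian.PutUint64(num[:], blockNumbers[i])
		h.Write(num[:]) // the block number disambiguates otherwise identical contexts
		h.Write(ctx)
	}
	for _, txHash := range txHashes {
		h.Write(txHash[:])
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	ctx := []byte("identical-block-context")
	h1 := chunkHash([]uint64{1}, [][]byte{ctx}, nil)
	h2 := chunkHash([]uint64{2}, [][]byte{ctx}, nil)
	fmt.Println(h1 != h2) // true: the block number alone changes the chunk hash
}

This is exactly the collision scenario from mainnet: with the old layout the two calls in main would hash identical inputs.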

PR title

Your PR title must follow conventional commits (as we are doing squash merge for each PR), so it must start with one of the following types:

  • fix: A bug fix

Breaking change label

Does this PR have the breaking-change label?

  • No, this PR is not a breaking change

Summary by CodeRabbit

  • Bug Fixes

    • Hash computation now includes block numbers and all transaction hashes, so resulting chunk identifiers may differ from prior releases.
    • Stricter validation: transactions with missing or invalid hashes now raise errors instead of being skipped, improving robustness and fail-fast behavior.
    • Chunk hash is treated as a unique identifier rather than a protocol-used value; downstream verification/caching may be affected.
  • Tests

    • Added a test ensuring chunks with different block numbers produce distinct hashes.


coderabbitai bot commented Aug 22, 2025

Walkthrough

daChunkV7.Hash() now includes each block's 8-byte big-endian block number and every transaction's TxHash (validated as 32 bytes) when assembling the data for the final Keccak256 digest; invalid TxHash lengths return an error. The v7 chunk hash is retained only as a unique identifier and is not used by the protocol.
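For reference, a hedged sketch of the strict TxHash validation described above, pulled out as a standalone helper (assumed imports: fmt and github.com/scroll-tech/go-ethereum/common; not the repo's exact code, though the error string matches the diff below):

// validateTxHash mirrors the fail-fast check: the hex-decoded TxHash must be
// exactly common.HashLength (32) bytes, otherwise hashing aborts with an error.
func validateTxHash(txHash string) ([]byte, error) {
	hashBytes := common.FromHex(txHash)
	if len(hashBytes) != common.HashLength {
		return nil, fmt.Errorf("unexpected hash: %s", txHash)
	}
	return hashBytes, nil
}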

Changes

Hash computation & data layout (encoding/codecv7_types.go):
Hash now prefixes each block with an 8-byte big-endian block number, processes all txData (no L1-only filter), decodes and validates each TxHash as 32 bytes, appends the block numbers and tx hashes to dataBytes, and computes Keccak256 over the aggregated bytes. It errors on invalid hash lengths. No public signatures changed.

Tests (encoding/codecv7_test.go):
Added TestChunkHashUnique, which constructs two DA chunks that differ only by block number and asserts that their computed hashes differ; a sketch follows below.
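A hedged sketch of that uniqueness check (constructor and type names are taken from the review snippets further down in this thread; treat them as assumptions about the test helpers):

func TestChunkHashUniqueSketch(t *testing.T) {
	// Two chunks that are identical except for the block number.
	b1 := &daBlockV7{daBlockV0: daBlockV0{number: 1}}
	b2 := &daBlockV7{daBlockV0: daBlockV0{number: 2}}
	c1 := newDAChunkV7([]DABlock{b1}, [][]*types.TransactionData{{}})
	c2 := newDAChunkV7([]DABlock{b2}, [][]*types.TransactionData{{}})

	h1, err := c1.Hash()
	require.NoError(t, err)
	h2, err := c2.Hash()
	require.NoError(t, err)
	require.NotEqual(t, h1, h2) // the block number must affect the chunk hash
}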

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant daChunkV7 as daChunkV7.Hash()
  participant Blocks as blocks[]
  participant Keccak as Keccak256

  Note over daChunkV7: v7 flow — include block numbers + all tx hashes
  Caller->>daChunkV7: Hash()
  daChunkV7->>Blocks: iterate blocks
  loop for each block
    daChunkV7->>daChunkV7: append 8-byte BE block number
    daChunkV7->>Blocks: iterate txData in block
    loop for each txData
      daChunkV7->>daChunkV7: decode txData.TxHash
      alt TxHash length == 32
        daChunkV7->>daChunkV7: append txHash to dataBytes
      else invalid length
        daChunkV7-->>Caller: return error
      end
    end
  end
  daChunkV7->>Keccak: hash(dataBytes)
  Keccak-->>daChunkV7: digest
  daChunkV7-->>Caller: digest

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Suggested reviewers

  • georgehao
  • colinlyguo

Poem

I nibbled bytes beneath the moon, precise and spry,
Block numbers tucked like carrots, one by one, nearby.
Every tx invited—thirty-two bytes to sing,
Keccak hums the warren's new identifying ring. 🐇✨


colinlyguo previously approved these changes Aug 22, 2025
@Thegaram Thegaram requested a review from Copilot August 25, 2025 14:30

@Copilot Copilot AI left a comment


Pull Request Overview

This PR fixes chunk hash collisions in v7 and v8 chunk hashing by updating the hash function to include all transaction types and block numbers. The collision issue occurred on mainnet when consecutive blocks with nearly identical headers resulted in identical chunk hashes.

  • Updates chunk hash computation to include block numbers and all transaction types (not just L1 messages)
  • Adds test coverage to verify chunk hash uniqueness across different block numbers
  • Improves hash collision resistance for rollup-relayer's internal bookkeeping

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

encoding/codecv7_types.go: Updates the Hash() method to include block numbers and all transaction hashes in the computation.
encoding/codecv7_test.go: Adds TestChunkHashUnique to verify hash uniqueness between chunks with different block numbers.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
encoding/codecv7_types.go (3)

452-480: Hash construction: consider streaming Keccak or at least validating blocks/txs shape and improving error context.

The current approach builds a large dataBytes slice, which can be costly for big chunks. It also doesn’t guard against len(blocks) != len(transactions) and emits a low-signal error ("unexpected hash"). Suggest:

  • Stream hashing to avoid holding all bytes in memory.
  • Validate blocks/transactions list lengths match up-front.
  • Emit actionable errors, including block/tx indices and actual byte length found.

Apply this refactor (body of Hash plus better errors):

@@
-func (c *daChunkV7) Hash() (common.Hash, error) {
-	var dataBytes []byte
-
-	// concatenate block contexts
-	for _, block := range c.blocks {
-		// append block number
-		var tmp [8]byte
-		binary.BigEndian.PutUint64(tmp[:], block.Number())
-		dataBytes = append(dataBytes, tmp[:]...)
-
-		// append encoded block context
-		encodedBlock := block.Encode()
-		dataBytes = append(dataBytes, encodedBlock...)
-	}
-
-	// concatenate tx hashes
-	for _, blockTxs := range c.transactions {
-		for _, txData := range blockTxs {
-			hashBytes := common.FromHex(txData.TxHash)
-			if len(hashBytes) != common.HashLength {
-				return common.Hash{}, fmt.Errorf("unexpected hash: %s", txData.TxHash)
-			}
-			dataBytes = append(dataBytes, hashBytes...)
-		}
-	}
-
-	hash := crypto.Keccak256Hash(dataBytes)
-	return hash, nil
-}
+func (c *daChunkV7) Hash() (common.Hash, error) {
+	// Sanity check: the outer dims should match (blocks to tx-lists).
+	if len(c.transactions) != len(c.blocks) {
+		return common.Hash{}, fmt.Errorf("blocks/txs length mismatch: blocks=%d txLists=%d", len(c.blocks), len(c.transactions))
+	}
+
+	// Stream to Keccak to avoid large intermediate allocations.
+	hasher := sha3.NewLegacyKeccak256()
+
+	// Blocks
+	var tmp [8]byte
+	for _, block := range c.blocks {
+		binary.BigEndian.PutUint64(tmp[:], block.Number())
+		_, _ = hasher.Write(tmp[:])
+		_, _ = hasher.Write(block.Encode())
+	}
+
+	// Transactions
+	for bi, blockTxs := range c.transactions {
+		for ti, txData := range blockTxs {
+			hashBytes := common.FromHex(txData.TxHash)
+			if len(hashBytes) != common.HashLength {
+				return common.Hash{}, fmt.Errorf("invalid tx hash length (block %d, tx %d): got %d bytes, want %d (%q)",
+					bi, ti, len(hashBytes), common.HashLength, txData.TxHash)
+			}
+			_, _ = hasher.Write(hashBytes)
+		}
+	}
+
+	sum := hasher.Sum(nil)
+	return common.BytesToHash(sum), nil
+}

And add the import:

@@
 	"github.com/scroll-tech/go-ethereum/crypto"
 	"github.com/scroll-tech/go-ethereum/crypto/kzg4844"
+	"golang.org/x/crypto/sha3"

467-476: Decode and validate tx hashes: good strictness; add indices to error for debuggability.

Parsing via common.FromHex and enforcing exactly 32 bytes is robust. Improve the error message with block/tx indices for faster triage.

Minimal tweak (this also requires naming the loop indices, e.g. `for bi, blockTxs := range c.transactions` and `for ti, txData := range blockTxs`):

-			if len(hashBytes) != common.HashLength {
-				return common.Hash{}, fmt.Errorf("unexpected hash: %s", txData.TxHash)
-			}
+			if len(hashBytes) != common.HashLength {
+				return common.Hash{}, fmt.Errorf("invalid tx hash length (block %d, tx %d): got %d bytes, want %d (%q)",
+					bi, ti, len(hashBytes), common.HashLength, txData.TxHash)
+			}

452-480: Guard against large allocations (alternative to streaming): pre-size buffer.

If you prefer to keep the current “collect then hash” style, pre-size dataBytes to avoid repeated growth.

-	var dataBytes []byte
+	// Pre-size: 8 bytes for each block number + 52-byte block ctx + 32 bytes per tx.
+	total := len(c.blocks)*(8+daBlockV7BlockContextEncodedLength)
+	for _, blockTxs := range c.transactions {
+		total += len(blockTxs) * common.HashLength
+	}
+	dataBytes := make([]byte, 0, total)
encoding/codecv7_test.go (1)

51-87: Broaden coverage: assert determinism, tx inclusion, and error path.

Given the PR goal “Consider all transactions in chunk hash,” add tests to:

  • Prove tx list changes alter the hash (and that order matters).
  • Prove identical chunks have identical hashes (determinism).
  • Exercise the error path for invalid TxHash length.

Example additions:

func TestChunkHashDeterministic(t *testing.T) {
	b := &daBlockV7{daBlockV0: daBlockV0{number: 42}}
	ch1 := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{}})
	ch2 := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{}})
	h1, err1 := ch1.Hash()
	h2, err2 := ch2.Hash()
	require.NoError(t, err1)
	require.NoError(t, err2)
	require.Equal(t, h1, h2)
}

func TestChunkHashChangesWithTransactions(t *testing.T) {
	b := &daBlockV7{daBlockV0: daBlockV0{number: 1}}
	// 32-byte zero hash
	txA := &types.TransactionData{TxHash: "0x" + strings.Repeat("00", 32)}
	// 32-byte one hash
	txB := &types.TransactionData{TxHash: "0x" + strings.Repeat("11", 32)}

	chNoTx := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{}})
	chTxA  := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{txA}})
	chTxB  := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{txB}})
	chTxAB := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{txA, txB}})
	chTxBA := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{txB, txA}})

	hNo, _ := chNoTx.Hash()
	hA, _  := chTxA.Hash()
	hB, _  := chTxB.Hash()
	hAB, _ := chTxAB.Hash()
	hBA, _ := chTxBA.Hash()

	require.NotEqual(t, hNo, hA)
	require.NotEqual(t, hA, hB)
	require.NotEqual(t, hAB, hBA) // order matters
}

func TestChunkHashInvalidTxHashLength(t *testing.T) {
	b := &daBlockV7{daBlockV0: daBlockV0{number: 1}}
	// 31-byte (invalid) hash
	bad := &types.TransactionData{TxHash: "0x" + strings.Repeat("00", 31)}
	ch := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{bad}})
	_, err := ch.Hash()
	require.Error(t, err)
	require.Contains(t, err.Error(), "invalid tx hash length")
}

If you’d like, I can push these as a follow-up commit.

📥 Commits

Reviewing files that changed from the base of the PR and between 01efe0d and 9e1dd6c.

📒 Files selected for processing (2)
  • encoding/codecv7_test.go (1 hunks)
  • encoding/codecv7_types.go (1 hunks)
🔇 Additional comments (3)
encoding/codecv7_types.go (2)

449-466: Including the 8-byte block number in the chunk hash input is a solid fix.

Prefixing each encoded block context with the big-endian block number removes prior collisions between otherwise identical block contexts at different heights. This is backward-safe given the v7 note that chunk hashes are not protocol-significant anymore (unique identifier only).


452-480: No call sites ignore the new error return on daChunkV7.Hash()

I’ve checked all direct invocations of daChunkV7.Hash():

  • encoding/codecv7_test.go (lines 66 and 83) call chunk.Hash() and immediately assert require.NoError(t, err), so errors are handled.
  • There are no other standalone calls to daChunkV7.Hash() in the codebase that omit error handling.

Since every use of the updated signature handles the returned error, this concern is resolved.

encoding/codecv7_test.go (1)

51-87: Nice regression test ensuring block number affects the chunk hash.

The test clearly demonstrates the intended non-collision when only the block number differs. Good use of minimal blocks and explicit checks.

@Thegaram Thegaram requested a review from yiweichi August 25, 2025 15:22
@Thegaram Thegaram merged commit b4cce5c into main Aug 26, 2025
4 checks passed
@Thegaram Thegaram deleted the Thegaram-patch-1 branch August 26, 2025 11:24