Conversation

@Thegaram (Contributor) commented Aug 22, 2025

Purpose or design rationale of this PR

The current v7 and v8 chunk hashing is prone to chunk hash collisions, and one has occurred on mainnet: two consecutive blocks with mostly identical header fields (including the timestamp), each containing enough transactions to fill a chunk, produced two consecutive chunks with identical hashes.

The collision arises because some chunk contents are not included in the hash input, most importantly the block number and the L2 transaction hashes.

This PR updates the chunk hash function to ensure that chunk hashes are unique. Since chunk hashes are only used for the rollup-relayer's internal bookkeeping, and are no longer used in the contracts, the provers, or the l2geth rollup-verifier, we can switch directly to the new hash.
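As an illustration, here is a minimal standalone sketch of the new hash input layout. The helper `chunkHash` and its parameters are hypothetical; the real implementation is `daChunkV7.Hash()` in `encoding/codecv7_types.go`, which buffers the bytes and hashes them with go-ethereum's `crypto.Keccak256Hash` rather than streaming:

package main

import (
	"encoding/binary"
	"fmt"

	"golang.org/x/crypto/sha3"
)

// chunkHash hashes, for each block, an 8-byte big-endian block number
// followed by its encoded block context, then every transaction hash
// (L1 messages and L2 transactions alike) in chunk order.
func chunkHash(blockNumbers []uint64, blockContexts [][]byte, txHashes [][32]byte) [32]byte {
	h := sha3.NewLegacyKeccak256()
	var num [8]byte
	for i, ctx := range blockContexts {
		binary.BigEndian.PutUint64(num[:], blockNumbers[i])
		h.Write(num[:]) // the block number disambiguates otherwise identical contexts
		h.Write(ctx)
	}
	for _, txHash := range txHashes {
		h.Write(txHash[:])
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	ctx := []byte("identical-block-context")
	h1 := chunkHash([]uint64{1}, [][]byte{ctx}, nil)
	h2 := chunkHash([]uint64{2}, [][]byte{ctx}, nil)
	fmt.Println(h1 != h2) // true: the block number alone changes the chunk hash
}

This is exactly the collision scenario from mainnet: with the old layout the two calls in main would hash identical inputs.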

PR title

Your PR title must follow conventional commits (as we are doing squash merge for each PR), so it must start with one of the following types:

  • fix: A bug fix

Breaking change label

Does this PR have the breaking-change label?

  • No, this PR is not a breaking change

Summary by CodeRabbit

  • Bug Fixes

    • Hash computation now includes block numbers and all transaction hashes, so resulting chunk identifiers may differ from prior releases.
    • Stricter validation: transactions with missing or invalid hashes now raise errors instead of being skipped, improving robustness and fail-fast behavior.
    • Chunk hash is treated as a unique identifier rather than a protocol-used value; downstream verification/caching may be affected.
  • Tests

    • Added a test ensuring chunks with different block numbers produce distinct hashes.


coderabbitai bot commented Aug 22, 2025

Walkthrough

daChunkV7.Hash() now includes each block's 8-byte big-endian block number and every transaction's TxHash (validated as 32 bytes) when assembling the data for the final Keccak256 digest; invalid TxHash lengths return an error. The v7 chunk hash is retained only as a unique identifier and is not used by the protocol.
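For reference, a hedged sketch of the strict TxHash validation described above, pulled out as a standalone helper (assumed imports: fmt and github.com/scroll-tech/go-ethereum/common; not the repo's exact code, though the error string matches the diff below):

// validateTxHash mirrors the fail-fast check: the hex-decoded TxHash must be
// exactly common.HashLength (32) bytes, otherwise hashing aborts with an error.
func validateTxHash(txHash string) ([]byte, error) {
	hashBytes := common.FromHex(txHash)
	if len(hashBytes) != common.HashLength {
		return nil, fmt.Errorf("unexpected hash: %s", txHash)
	}
	return hashBytes, nil
}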

Changes

Hash computation & data layout (encoding/codecv7_types.go):
Hash now prefixes each block with an 8-byte big-endian block number, processes all txData (no L1-only filter), decodes and validates each TxHash as 32 bytes, appends the block numbers and tx hashes to dataBytes, and computes Keccak256 over the aggregated bytes. It errors on invalid hash lengths. No public signatures changed.

Tests (encoding/codecv7_test.go):
Added TestChunkHashUnique, which constructs two DA chunks that differ only by block number and asserts that their computed hashes differ; a sketch follows below.
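A hedged sketch of that uniqueness check (constructor and type names are taken from the review snippets further down in this thread; treat them as assumptions about the test helpers):

func TestChunkHashUniqueSketch(t *testing.T) {
	// Two chunks that are identical except for the block number.
	b1 := &daBlockV7{daBlockV0: daBlockV0{number: 1}}
	b2 := &daBlockV7{daBlockV0: daBlockV0{number: 2}}
	c1 := newDAChunkV7([]DABlock{b1}, [][]*types.TransactionData{{}})
	c2 := newDAChunkV7([]DABlock{b2}, [][]*types.TransactionData{{}})

	h1, err := c1.Hash()
	require.NoError(t, err)
	h2, err := c2.Hash()
	require.NoError(t, err)
	require.NotEqual(t, h1, h2) // the block number must affect the chunk hash
}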

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant daChunkV7 as daChunkV7.Hash()
  participant Blocks as blocks[]
  participant Keccak as Keccak256

  Note over daChunkV7: v7 flow — include block numbers + all tx hashes
  Caller->>daChunkV7: Hash()
  daChunkV7->>Blocks: iterate blocks
  loop for each block
    daChunkV7->>daChunkV7: append 8-byte BE block number
    daChunkV7->>Blocks: iterate txData in block
    loop for each txData
      daChunkV7->>daChunkV7: decode txData.TxHash
      alt TxHash length == 32
        daChunkV7->>daChunkV7: append txHash to dataBytes
      else invalid length
        daChunkV7-->>Caller: return error
      end
    end
  end
  daChunkV7->>Keccak: hash(dataBytes)
  Keccak-->>daChunkV7: digest
  daChunkV7-->>Caller: digest

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Suggested reviewers

  • georgehao
  • colinlyguo

Poem

I nibbled bytes beneath the moon, precise and spry,
Block numbers tucked like carrots, one by one, nearby.
Every tx invited—thirty-two bytes to sing,
Keccak hums the warren's new identifying ring. 🐇✨


colinlyguo previously approved these changes Aug 22, 2025
@Thegaram Thegaram requested a review from Copilot August 25, 2025 14:30

@Copilot Copilot AI left a comment


Pull Request Overview

This PR fixes chunk hash collisions in v7 and v8 chunk hashing by updating the hash function to include all transaction types and block numbers. The collision issue occurred on mainnet when consecutive blocks with nearly identical headers resulted in identical chunk hashes.

  • Updates chunk hash computation to include block numbers and all transaction types (not just L1 messages)
  • Adds test coverage to verify chunk hash uniqueness across different block numbers
  • Improves hash collision resistance for rollup-relayer's internal bookkeeping

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

encoding/codecv7_types.go: Updates the Hash() method to include block numbers and all transaction hashes in the computation.
encoding/codecv7_test.go: Adds TestChunkHashUnique to verify hash uniqueness between chunks with different block numbers.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
encoding/codecv7_types.go (3)

452-480: Hash construction: consider streaming Keccak or at least validating blocks/txs shape and improving error context.

The current approach builds a large dataBytes slice, which can be costly for big chunks. It also doesn’t guard against len(blocks) != len(transactions) and emits a low-signal error ("unexpected hash"). Suggest:

  • Stream hashing to avoid holding all bytes in memory.
  • Validate blocks/transactions list lengths match up-front.
  • Emit actionable errors, including block/tx indices and actual byte length found.

Apply this refactor (body of Hash plus better errors):

@@
-func (c *daChunkV7) Hash() (common.Hash, error) {
-	var dataBytes []byte
-
-	// concatenate block contexts
-	for _, block := range c.blocks {
-		// append block number
-		var tmp [8]byte
-		binary.BigEndian.PutUint64(tmp[:], block.Number())
-		dataBytes = append(dataBytes, tmp[:]...)
-
-		// append encoded block context
-		encodedBlock := block.Encode()
-		dataBytes = append(dataBytes, encodedBlock...)
-	}
-
-	// concatenate tx hashes
-	for _, blockTxs := range c.transactions {
-		for _, txData := range blockTxs {
-			hashBytes := common.FromHex(txData.TxHash)
-			if len(hashBytes) != common.HashLength {
-				return common.Hash{}, fmt.Errorf("unexpected hash: %s", txData.TxHash)
-			}
-			dataBytes = append(dataBytes, hashBytes...)
-		}
-	}
-
-	hash := crypto.Keccak256Hash(dataBytes)
-	return hash, nil
-}
+func (c *daChunkV7) Hash() (common.Hash, error) {
+	// Sanity check: the outer dims should match (blocks to tx-lists).
+	if len(c.transactions) != len(c.blocks) {
+		return common.Hash{}, fmt.Errorf("blocks/txs length mismatch: blocks=%d txLists=%d", len(c.blocks), len(c.transactions))
+	}
+
+	// Stream to Keccak to avoid large intermediate allocations.
+	hasher := sha3.NewLegacyKeccak256()
+
+	// Blocks
+	var tmp [8]byte
+	for _, block := range c.blocks {
+		binary.BigEndian.PutUint64(tmp[:], block.Number())
+		_, _ = hasher.Write(tmp[:])
+		_, _ = hasher.Write(block.Encode())
+	}
+
+	// Transactions
+	for bi, blockTxs := range c.transactions {
+		for ti, txData := range blockTxs {
+			hashBytes := common.FromHex(txData.TxHash)
+			if len(hashBytes) != common.HashLength {
+				return common.Hash{}, fmt.Errorf("invalid tx hash length (block %d, tx %d): got %d bytes, want %d (%q)",
+					bi, ti, len(hashBytes), common.HashLength, txData.TxHash)
+			}
+			_, _ = hasher.Write(hashBytes)
+		}
+	}
+
+	sum := hasher.Sum(nil)
+	return common.BytesToHash(sum), nil
+}

And add the import:

@@
 	"github.com/scroll-tech/go-ethereum/crypto"
 	"github.com/scroll-tech/go-ethereum/crypto/kzg4844"
+	"golang.org/x/crypto/sha3"

467-476: Decode and validate tx hashes: good strictness; add indices to error for debuggability.

Parsing via common.FromHex and enforcing exactly 32 bytes is robust. Improve the error message with block/tx indices for faster triage.

Minimal tweak (this also requires naming the loop indices, e.g. `for bi, blockTxs := range c.transactions` and `for ti, txData := range blockTxs`):

-			if len(hashBytes) != common.HashLength {
-				return common.Hash{}, fmt.Errorf("unexpected hash: %s", txData.TxHash)
-			}
+			if len(hashBytes) != common.HashLength {
+				return common.Hash{}, fmt.Errorf("invalid tx hash length (block %d, tx %d): got %d bytes, want %d (%q)",
+					bi, ti, len(hashBytes), common.HashLength, txData.TxHash)
+			}

452-480: Guard against large allocations (alternative to streaming): pre-size buffer.

If you prefer to keep the current “collect then hash” style, pre-size dataBytes to avoid repeated growth.

-	var dataBytes []byte
+	// Pre-size: 8 bytes for each block number + 52-byte block ctx + 32 bytes per tx.
+	total := len(c.blocks)*(8+daBlockV7BlockContextEncodedLength)
+	for _, blockTxs := range c.transactions {
+		total += len(blockTxs) * common.HashLength
+	}
+	dataBytes := make([]byte, 0, total)
encoding/codecv7_test.go (1)

51-87: Broaden coverage: assert determinism, tx inclusion, and error path.

Given the PR goal “Consider all transactions in chunk hash,” add tests to:

  • Prove tx list changes alter the hash (and that order matters).
  • Prove identical chunks have identical hashes (determinism).
  • Exercise the error path for invalid TxHash length.

Example additions:

func TestChunkHashDeterministic(t *testing.T) {
	b := &daBlockV7{daBlockV0: daBlockV0{number: 42}}
	ch1 := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{}})
	ch2 := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{}})
	h1, err1 := ch1.Hash()
	h2, err2 := ch2.Hash()
	require.NoError(t, err1)
	require.NoError(t, err2)
	require.Equal(t, h1, h2)
}

func TestChunkHashChangesWithTransactions(t *testing.T) {
	b := &daBlockV7{daBlockV0: daBlockV0{number: 1}}
	// 32-byte zero hash
	txA := &types.TransactionData{TxHash: "0x" + strings.Repeat("00", 32)}
	// 32-byte one hash
	txB := &types.TransactionData{TxHash: "0x" + strings.Repeat("11", 32)}

	chNoTx := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{}})
	chTxA  := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{txA}})
	chTxB  := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{txB}})
	chTxAB := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{txA, txB}})
	chTxBA := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{txB, txA}})

	hNo, _ := chNoTx.Hash()
	hA, _  := chTxA.Hash()
	hB, _  := chTxB.Hash()
	hAB, _ := chTxAB.Hash()
	hBA, _ := chTxBA.Hash()

	require.NotEqual(t, hNo, hA)
	require.NotEqual(t, hA, hB)
	require.NotEqual(t, hAB, hBA) // order matters
}

func TestChunkHashInvalidTxHashLength(t *testing.T) {
	b := &daBlockV7{daBlockV0: daBlockV0{number: 1}}
	// 31-byte (invalid) hash
	bad := &types.TransactionData{TxHash: "0x" + strings.Repeat("00", 31)}
	ch := newDAChunkV7([]DABlock{b}, [][]*types.TransactionData{{bad}})
	_, err := ch.Hash()
	require.Error(t, err)
	require.Contains(t, err.Error(), "invalid tx hash length")
}

If you’d like, I can push these as a follow-up commit.

📥 Commits

Reviewing files that changed from the base of the PR and between 01efe0d and 9e1dd6c.

📒 Files selected for processing (2)
  • encoding/codecv7_test.go (1 hunks)
  • encoding/codecv7_types.go (1 hunks)
🔇 Additional comments (3)
encoding/codecv7_types.go (2)

449-466: Including the 8-byte block number in the chunk hash input is a solid fix.

Prefixing each encoded block context with the big-endian block number removes prior collisions between otherwise identical block contexts at different heights. This is backward-safe given the v7 note that chunk hashes are not protocol-significant anymore (unique identifier only).


452-480: No call sites ignore the new error return on daChunkV7.Hash()

I’ve checked all direct invocations of daChunkV7.Hash():

  • encoding/codecv7_test.go (lines 66 and 83) call chunk.Hash() and immediately assert require.NoError(t, err), so errors are handled.
  • There are no other standalone calls to daChunkV7.Hash() in the codebase that omit error handling.

Since every use of the updated signature handles the returned error, this concern is resolved.

encoding/codecv7_test.go (1)

51-87: Nice regression test ensuring block number affects the chunk hash.

The test clearly demonstrates the intended non-collision when only the block number differs. Good use of minimal blocks and explicit checks.

@Thegaram Thegaram requested a review from yiweichi August 25, 2025 15:22
@Thegaram Thegaram merged commit b4cce5c into main Aug 26, 2025
4 checks passed
@Thegaram Thegaram deleted the Thegaram-patch-1 branch August 26, 2025 11:24