@ilitteri
Contributor

Motivation

In a previous PR, DB checkpoints were introduced to ensure old state availability in the current path-based fashion. Every time a batch is sealed, a checkpoint whose state is the state of the latest block of the sealed batch is created to be used in the next batch.

The checkpoint is needed in two different steps of the batch commitment: batch preparation (essentially building the batch) and witness generation. Both steps need an unmodified checkpoint as their starting point, yet each must modify it in order to re-execute the batch.

Because batch preparation occurs before witness generation, we opted to create a one-time checkpoint from the main checkpoint. Batch preparation can modify this copy if needed (sometimes the batch is already available in the DB and nothing has to be re-executed). Witness generation then modifies the original checkpoint directly, since nothing else needs it afterward.

Once the one-time checkpoint has served its purpose, it is removed. Currently, if batch preparation fails, the one-time checkpoint is not removed. When batch preparation is retried, it tries to create the one-time checkpoint again, which fails because the directory already exists. We need to either skip creating the one-time checkpoint again or remove the existing one.

Description

Remove the existing one-time checkpoint if it already exists.
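
A minimal sketch of the "remove, then create" approach described above, using only the standard library. The helper name `recreate_checkpoint_dir` is hypothetical, and `fs::create_dir` stands in for the real checkpoint creation, which is not shown in this conversation.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: make sure `path` is a fresh, empty directory.
fn recreate_checkpoint_dir(path: &Path) -> io::Result<()> {
    // A directory left over from a failed attempt would make creation fail,
    // so remove it first.
    if path.exists() {
        fs::remove_dir_all(path)?;
    }
    // Stand-in for the real checkpoint creation (an assumption of this sketch).
    fs::create_dir(path)
}
```

Calling the helper twice on the same path succeeds both times, and anything left in the directory by the first attempt is gone after the second.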

@ilitteri ilitteri self-assigned this Oct 28, 2025
@ilitteri ilitteri requested a review from a team as a code owner October 28, 2025 16:14
Copilot AI review requested due to automatic review settings October 28, 2025 16:14
@github-actions github-actions bot added the L2 Rollup client label Oct 28, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR addresses a bug where batch preparation retries would fail due to an existing one-time checkpoint directory. The fix ensures that if a one-time checkpoint already exists (from a previous failed attempt), it is removed before attempting to create a new one.

Key Changes:

  • Added a check to detect if a one-time checkpoint directory already exists
  • Implemented removal of existing one-time checkpoint directories before creating new ones


if one_time_checkpoint_path.exists() {
    remove_dir_all(&one_time_checkpoint_path).map_err(|e| {
        CommitterError::FailedToCreateCheckpoint(format!(
Copilot AI Oct 28, 2025

The error type FailedToCreateCheckpoint is misleading when removing an existing checkpoint. Consider creating a more specific error variant like FailedToRemoveCheckpoint or using a more generic error message that reflects the cleanup operation.

Suggested change
- CommitterError::FailedToCreateCheckpoint(format!(
+ CommitterError::FailedToRemoveCheckpoint(format!(
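
For illustration, a sketch of what a dedicated removal variant could look like. The `CommitterError` enum below is a cut-down stand-in, not the project's real type; `FailedToRemoveCheckpoint` is only the name proposed in the suggestion above, and `remove_checkpoint` is a hypothetical helper.

```rust
use std::fmt;
use std::fs;
use std::path::Path;

/// Cut-down stand-in for the real error type (an assumption of this sketch).
#[allow(dead_code)]
#[derive(Debug)]
enum CommitterError {
    FailedToCreateCheckpoint(String),
    FailedToRemoveCheckpoint(String),
}

impl fmt::Display for CommitterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::FailedToCreateCheckpoint(msg) => write!(f, "Failed to create checkpoint: {msg}"),
            Self::FailedToRemoveCheckpoint(msg) => write!(f, "Failed to remove checkpoint: {msg}"),
        }
    }
}

/// Hypothetical helper: report removal failures with the removal-specific variant.
fn remove_checkpoint(path: &Path) -> Result<(), CommitterError> {
    fs::remove_dir_all(path).map_err(|e| {
        CommitterError::FailedToRemoveCheckpoint(format!(
            "Failed to remove one-time checkpoint directory {path:?}: {e}"
        ))
    })
}
```

With a separate variant, a log line for a failed cleanup no longer reads as a failed creation.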


github-actions bot commented Oct 28, 2025

Lines of code report

Total lines added: 0
Total lines removed: 9
Total lines changed: 9

Detailed view
+--------------------------------------------+-------+------+
| File                                       | Lines | Diff |
+--------------------------------------------+-------+------+
| ethrex/crates/l2/sequencer/l1_committer.rs | 973   | -9   |
+--------------------------------------------+-------+------+

@avilagaston9
Contributor

The PR was changed to use a random one_time_checkpoint_path on each attempt, preventing the following error:

2025-10-28T17:55:42.302454Z ERROR ethrex_l2::sequencer::l1_committer: L1 Committer Error: Committer failed retrieve block from storage: Failed to open RocksDB: IO error: lock hold by current process, acquire time 1761674125 acquiring thread 50: /root/.local/share/ethrex/temp_checkpoint_batch_1/LOCK: No locks available
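
A sketch of one way to get a different path on each attempt, so a retry never reuses a directory whose LOCK file may still be held. The PR's actual implementation is not shown here. This version builds a unique suffix from the process id, a timestamp, and a counter rather than true randomness, and the function name and suffix layout are assumptions; only the `temp_checkpoint_batch_` prefix comes from the log above.

```rust
use std::path::{Path, PathBuf};
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Per-process counter so two calls in the same nanosecond still differ.
static ATTEMPT_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Hypothetical helper: build a checkpoint path that is unique per attempt.
fn unique_checkpoint_path(base_dir: &Path, batch_number: u64) -> PathBuf {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let attempt = ATTEMPT_COUNTER.fetch_add(1, Ordering::Relaxed);
    base_dir.join(format!(
        "temp_checkpoint_batch_{batch_number}_{}_{nanos}_{attempt}",
        process::id()
    ))
}
```

Because every attempt gets its own directory, a directory left behind by a failed attempt cannot collide with the next one, though it still has to be cleaned up at some point.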

Comment on lines 307 to 319
    .inspect_err(|_| {
        if one_time_checkpoint_path.exists() {
            // Remove one-time checkpoint directory
            let _ = remove_dir_all(&one_time_checkpoint_path);
        }
    })?;

if one_time_checkpoint_path.exists() {
    remove_dir_all(&one_time_checkpoint_path).map_err(|e| {
        CommitterError::FailedToCreateCheckpoint(format!(
            "Failed to remove one-time checkpoint directory {one_time_checkpoint_path:?}: {e}"
        ))
    })?;

If we remove the one_time_checkpoint_path whether or not the call returns an error, we can do it in one place.
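
One possible shape for that single cleanup point, as a sketch: run the fallible step, remove the directory once, then return the original result. The wrapper name `with_one_time_checkpoint` and its signature are hypothetical and do not come from the PR.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical wrapper: run `work` against the one-time checkpoint and
/// clean the directory up afterwards, on success and on failure alike.
fn with_one_time_checkpoint<T, E>(
    path: &Path,
    work: impl FnOnce(&Path) -> Result<T, E>,
) -> Result<T, E> {
    let result = work(path);
    // Best-effort cleanup in one place; a removal failure is ignored here
    // so that it never masks the outcome of `work`.
    if path.exists() {
        let _ = fs::remove_dir_all(path);
    }
    result
}
```

Ignoring the removal error is a design choice of this sketch. Surfacing it instead would require deciding which error wins when both the work and the cleanup fail.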


Done in bd44892!

@ilitteri ilitteri added this pull request to the merge queue Oct 29, 2025
Merged via the queue into main with commit ee97e8e Oct 29, 2025
27 checks passed
@ilitteri ilitteri deleted the fix_checkpoint branch October 29, 2025 23:49
ManuelBilbao pushed a commit that referenced this pull request Oct 30, 2025
Co-authored-by: avilagaston9 <[email protected]>
Co-authored-by: Gianbelinche <[email protected]>
Labels

L2 Rollup client

Projects

Status: Done

5 participants