eth/downloader: fix case where skeleton reorgs below the filled block #29358
It detects a bug in the skeleton sync code. I have to think about how to fix it properly (or if you want to fix it yourself, go ahead!). It's a nice unit test!
Great insight! I will attempt a fix.
96f95fa to 70be2dd
I have pushed a fix for this test case: when a skeleton sync has just linked, we assume that a filled block within the range of the skeleton means there is a fork. We rewind the blockchain to the common ancestor of the skeleton and the filled block before proceeding with state/block backfilling. I've noticed that this case triggers for any fork longer than one block.
This seems to fix the test, and it seems to be the correct fix to me:
diff --git a/eth/downloader/skeleton.go b/eth/downloader/skeleton.go
index 873ee950b6..9f8953f4db 100644
--- a/eth/downloader/skeleton.go
+++ b/eth/downloader/skeleton.go
@@ -1132,6 +1132,15 @@ func (s *skeleton) cleanStales(filled *types.Header) error {
 	if number+1 == s.progress.Subchains[0].Tail {
 		return nil
 	}
+	// If the latest fill was on a different subchain, it means the backfiller
+	// was interrupted before it got to do any meaningful work, no cleanup
+	header := rawdb.ReadSkeletonHeader(s.db, filled.Number.Uint64())
+	if header == nil {
+		return fmt.Errorf("filled header outside of skeleton range")
+	} else if header.Hash() != filled.Hash() {
+		log.Debug("Filled header on different sidechain", "number", number, "filled", filled.Hash(), "skeleton", header.Hash())
+		return nil
+	}
 	var (
 		start uint64
 		end   uint64

I.e. when doing the skeleton cleanup, we check whether the filled header is actually within the range of what we were meant to backfill. If not, the backfill was a noop (possibly because we started and stopped it so quickly that it didn't have time to do any meaningful work). In that case, just don't clean up anything.

An alternative would be for the backfiller itself to mandatorily do a chain reorg in such cases, but that leaks a bit of chain management into the backfiller, which for now is handled entirely by blockchain.ImportXXX, triggered by the downloader. So whilst triggering a chain reorg in the backfiller should work just as well, it introduces a second chain mutation point and I'd rather keep it to one.

What's a bit unclear is whether the backfiller can return a header above the skeleton chain. In theory not (because that would mean it filled something it didn't know about), but that would error out with the new code, so we should make sure.
It's totally possible. If I understand correctly, there are four potential scenarios that could occur during stale header cleanup: (a) (b) (c) (d)
The fix looks good to me, although I would vote for a tiny change. If the
diff --git a/eth/downloader/skeleton.go b/eth/downloader/skeleton.go
index 873ee950b6..04421a2bf5 100644
--- a/eth/downloader/skeleton.go
+++ b/eth/downloader/skeleton.go
@@ -1132,6 +1132,16 @@ func (s *skeleton) cleanStales(filled *types.Header) error {
 	if number+1 == s.progress.Subchains[0].Tail {
 		return nil
 	}
+	// If the latest fill was on a different subchain, it means the backfiller
+	// was interrupted before it got to do any meaningful work, no cleanup
+	header := rawdb.ReadSkeletonHeader(s.db, filled.Number.Uint64())
+	if header == nil {
+		log.Debug("Filled header outside of skeleton range", "number", number, "head", s.progress.Subchains[0].Head, "tail", s.progress.Subchains[0].Tail)
+		return nil
+	} else if header.Hash() != filled.Hash() {
+		log.Debug("Filled header on different sidechain", "number", number, "filled", filled.Hash(), "skeleton", header.Hash())
+		return nil
+	}
 	var (
 		start uint64
 		end   uint64
4ad005f to 6ab5719
…d, rewind the chain to the shared ancestor before restarting state/block backfilling.
6ab5719 to d1aa652
…nis non-existent or different than the corresponding skeleton header. add test case that beacon syncs a chain, beacon syncs to a separate fork Co-authored-by: Péter Szilágyi <[email protected]>
d1aa652 to 9f6de6d
Okay, this should be good for merge.
LGTM
…ethereum#29358)
This change adds a testcase and fixes a corner-case in the skeleton sync.
With this change, when doing the skeleton cleanup, we check if the filled header is actually within the range of what we were meant to backfill. If not, it means the backfill was a noop (possibly because we started and stopped it so quickly that it didn't have time to do any meaningful work). In that case, just don't clean up anything.
---------
Co-authored-by: Péter Szilágyi <[email protected]>
Lifting this test-case from #29281 to provide some visibility on what I am stuck on and trying to debug.
The test syncs a chain, then syncs to a fork of the chain. Expect that the sync origin after this is at the fork block.
Logs indicate that the first sync succeeds. The sync to the fork succeeds in retrieving the skeleton but later fails after trying to start the backfiller: