
Commit fda1723

Authored by: ABastionOfSanity, holiman, karalabe, jmank88, fjl
V1.10.26 rebase wip (#289)
* eth/protocols/snap: fix problems due to idle-but-busy peers (ethereum#25651)
* eth/protocols/snap: throttle trie heal requests when peers DoS us (ethereum#25666)
* eth/protocols/snap: throttle trie heal requests when peers DoS us
* eth/protocols/snap: lower heal throttle log to debug
  Co-authored-by: Martin Holst Swende <[email protected]>
* eth/protocols/snap: fix comment
  Co-authored-by: Martin Holst Swende <[email protected]>
* trie: check childrens' existence concurrently for snap heal (ethereum#25694)
* eth: fix a rare datarace on CHT challenge reply / shutdown (ethereum#25831)
* eth/filters: change filter block to be by-ref (ethereum#26054)
  This PR changes the block field in the filter to be a pointer, to disambiguate between empty hash and no hash.
* rpc: handle wrong HTTP batch response length (ethereum#26064)
* params: release geth v1.10.26 stable
* V1.10.25 statediff v4 wip (#275)
* Statediff Geth Handle conflicts (#244)
* Handle conflicts
* Update go mod file versions
* Make lint changes
  Disassociate block number from the indexer object
  Update ipld-eth-db ref
  Refactor builder code to make it reusable
  Use prefix comparison for account selective statediffing
  Update builder unit tests
  Add mode to write to CSV files in statediff file writer (#249)
* Change file writing mode to csv files
* Implement writer interface for file indexer
* Implement option for csv or sql in file mode
* Close files in CSV writer
* Add tests for CSV file mode
* Implement CSV file for watched addresses
* Separate test configs for CSV and SQL
* Refactor common code for file indexer tests
  Update indexer to include block hash in receipts and logs (#256)
* Update indexer to include block hash in receipts and logs
* Upgrade ipld-eth-db image in docker-compose to run tests
  Use watched addresses from direct indexing params by default while serving statediff APIs (#262)
* Use watched addresses from direct indexing params in statediff APIs by default
* Avoid using indexer object when direct indexing is off
* Add nil check before accessing watched addresses from direct indexing params
  Rebase missed these changes needed at 1.10.20
  Flags cleanup for CLI changes and linter complaints
  Linter appeasements to achieve perfection
  enforce go 1.18 for check (#267)
* enforce go 1.18 for check
* tests on 1.18 as well
* adding db yml for possible change in docker-compose behavior in yml parsing
  Add indexer tests for handling non canonical blocks (#254)
* Add indexer tests for header and transactions in a non canonical block
* Add indexer tests for receipts in a non-canonical block and refactor
* Add indexer tests for logs in a non-canonical block
* Add indexer tests for state and storage nodes in a non-canonical block
* Add indexer tests for non-canonical block at another height
* Avoid passing address of a pointer
* Update refs in GitHub workflow
* Add genesis file path to stack-orchestrator config in GitHub workflow
* Add descriptive comments
  fix non-deterministic ordering in unit tests
  Refactor indexer tests to avoid duplicate code (#270)
* Refactor indexer tests to avoid duplicate code
* Refactor file mode indexer tests
* Fix expected db stats for sqlx after tx closure
* Refactor indexer tests for legacy block
* Refactor mainnet indexer tests
* Refactor tests for watched addressess methods
* Fix query in legacy indexer test
  rebase and resolve onto 1.10.23...
  still error out of index related to GetLeafKeys
  changed trie.Commit behavior was subtle about not flushing to disk without an Update
* no merge nodeset throws nil
* linter appeasement
  Cerc refactor (#281)
* first pass cerc refactor in cicd
* 1st attempt to publish binary to git.vdb.to from github release
* docker build step mangled
* docker build step mangled
* wrong username for docker login... which still succeeded
* circcicd is not cerccicd
* bad hostname
  adding manual override of binary publish to git.vdb.to for development/emergency (#282)
  Cerc io publish fix (#284)
* adding manual override of binary publish to git.vdb.to for development/emergency
* Create manual_binary_publish.yaml (#283)
* github did not pick up workflow added outside of its UI and I still cannot spell cerc right
  rawdb helper functions for cold levelDB sync export
  Jenkins reborn (#285)
* initial build and output testing... lots of trial and error
* clean up for working (but failing) unit test geth with ubuntu foundation image
* linter problem on comments in version
* trying linter appeasement with gofmt output on versions.go

Co-authored-by: Martin Holst Swende <[email protected]>
Co-authored-by: Péter Szilágyi <[email protected]>
Co-authored-by: Jordan Krage <[email protected]>
Co-authored-by: Felix Lange <[email protected]>
1 parent 70d6dbb commit fda1723

File tree: 8 files changed (+267 −71 lines)

eth/filters/filter.go

Lines changed: 5 additions & 5 deletions
@@ -34,8 +34,8 @@ type Filter struct {
     addresses []common.Address
     topics    [][]common.Hash

-    block      common.Hash // Block hash if filtering a single block
-    begin, end int64       // Range interval if filtering multiple blocks
+    block      *common.Hash // Block hash if filtering a single block
+    begin, end int64        // Range interval if filtering multiple blocks

     matcher *bloombits.Matcher
 }
@@ -78,7 +78,7 @@ func (sys *FilterSystem) NewRangeFilter(begin, end int64, addresses []common.Add
 func (sys *FilterSystem) NewBlockFilter(block common.Hash, addresses []common.Address, topics [][]common.Hash) *Filter {
     // Create a generic filter and convert it into a block filter
     filter := newFilter(sys, addresses, topics)
-    filter.block = block
+    filter.block = &block
     return filter
 }
@@ -96,8 +96,8 @@ func newFilter(sys *FilterSystem, addresses []common.Address, topics [][]common.
 // first block that contains matches, updating the start of the filter accordingly.
 func (f *Filter) Logs(ctx context.Context) ([]*types.Log, error) {
     // If we're doing singleton block filtering, execute and return
-    if f.block != (common.Hash{}) {
-        header, err := f.sys.backend.HeaderByHash(ctx, f.block)
+    if f.block != nil {
+        header, err := f.sys.backend.HeaderByHash(ctx, *f.block)
         if err != nil {
             return nil, err
         }
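Note: the filter change above (ethereum#26054) works because a nil pointer and a zero value are distinguishable in Go, whereas a bare zero-valued common.Hash is ambiguous. A minimal, self-contained sketch of the idea, using a stand-in Hash type rather than the real go-ethereum types:

    package main

    import "fmt"

    // Hash stands in for common.Hash; an all-zero Hash is a legal value, so it
    // cannot double as a "no block given" sentinel.
    type Hash [32]byte

    // filter mirrors the shape of the patched Filter: a nil *Hash means "range
    // filter", a non-nil one means "single block filter", even if that block's
    // hash happens to be all zeroes.
    type filter struct {
        block *Hash
    }

    func main() {
        var zero Hash // zero-valued hash, still a concrete value

        byRange := filter{block: nil}
        byBlock := filter{block: &zero}

        fmt.Println(byRange.block != nil) // false: no block requested
        fmt.Println(byBlock.block != nil) // true: a (zero) block was requested
    }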

eth/handler.go

Lines changed: 14 additions & 4 deletions
@@ -391,11 +391,16 @@ func (h *handler) runEthPeer(peer *eth.Peer, handler eth.Handler) error {
     if h.checkpointHash != (common.Hash{}) {
         // Request the peer's checkpoint header for chain height/weight validation
         resCh := make(chan *eth.Response)
-        if _, err := peer.RequestHeadersByNumber(h.checkpointNumber, 1, 0, false, resCh); err != nil {
+
+        req, err := peer.RequestHeadersByNumber(h.checkpointNumber, 1, 0, false, resCh)
+        if err != nil {
             return err
         }
         // Start a timer to disconnect if the peer doesn't reply in time
         go func() {
+            // Ensure the request gets cancelled in case of error/drop
+            defer req.Close()
+
             timeout := time.NewTimer(syncChallengeTimeout)
             defer timeout.Stop()
@@ -437,10 +442,15 @@ func (h *handler) runEthPeer(peer *eth.Peer, handler eth.Handler) error {
     // If we have any explicit peer required block hashes, request them
     for number, hash := range h.requiredBlocks {
         resCh := make(chan *eth.Response)
-        if _, err := peer.RequestHeadersByNumber(number, 1, 0, false, resCh); err != nil {
+
+        req, err := peer.RequestHeadersByNumber(number, 1, 0, false, resCh)
+        if err != nil {
             return err
         }
-        go func(number uint64, hash common.Hash) {
+        go func(number uint64, hash common.Hash, req *eth.Request) {
+            // Ensure the request gets cancelled in case of error/drop
+            defer req.Close()
+
             timeout := time.NewTimer(syncChallengeTimeout)
             defer timeout.Stop()
@@ -469,7 +479,7 @@ func (h *handler) runEthPeer(peer *eth.Peer, handler eth.Handler) error {
                 peer.Log().Warn("Required block challenge timed out, dropping", "addr", peer.RemoteAddr(), "type", peer.Name())
                 h.removePeer(peer.ID())
             }
-        }(number, hash)
+        }(number, hash, req)
     }
     // Handle incoming messages until the connection is torn down
     return handler(peer)
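Note: both handler.go hunks follow the same pattern: keep the request handle returned by RequestHeadersByNumber, hand it to the challenge goroutine, and defer its Close so the pending request is always cancelled when the goroutine exits. A schematic sketch of that pattern with a hypothetical request type (not the real eth.Request API):

    package main

    import (
        "fmt"
        "time"
    )

    // request is a hypothetical stand-in for a cancellable network request handle.
    type request struct{ id uint64 }

    func (r *request) Close() { fmt.Println("request", r.id, "cancelled") }

    // challenge mimics the handler change: the goroutine owns the request handle
    // and releases it no matter how it exits (reply, timeout or error).
    func challenge(req *request, resCh <-chan string, timeout time.Duration) {
        go func(req *request) {
            defer req.Close() // ensure the request gets cancelled in every case

            timer := time.NewTimer(timeout)
            defer timer.Stop()

            select {
            case res := <-resCh:
                fmt.Println("got reply:", res)
            case <-timer.C:
                fmt.Println("challenge timed out")
            }
        }(req)
    }

    func main() {
        resCh := make(chan string)
        challenge(&request{id: 1}, resCh, 50*time.Millisecond)
        time.Sleep(100 * time.Millisecond) // let the timeout path run
    }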

eth/protocols/snap/sync.go

Lines changed: 155 additions & 42 deletions
@@ -21,10 +21,12 @@ import (
     "encoding/json"
     "errors"
     "fmt"
+    gomath "math"
     "math/big"
     "math/rand"
     "sort"
     "sync"
+    "sync/atomic"
     "time"

     "github.com/ethereum/go-ethereum/common"
@@ -78,6 +80,29 @@
     // and waste round trip times. If it's too high, we're capping responses and
     // waste bandwidth.
     maxTrieRequestCount = maxRequestSize / 512
+
+    // trienodeHealRateMeasurementImpact is the impact a single measurement has on
+    // the local node's trienode processing capacity. A value closer to 0 reacts
+    // slower to sudden changes, but it is also more stable against temporary hiccups.
+    trienodeHealRateMeasurementImpact = 0.005
+
+    // minTrienodeHealThrottle is the minimum divisor for throttling trie node
+    // heal requests to avoid overloading the local node and excessively expanding
+    // the state trie breadth wise.
+    minTrienodeHealThrottle = 1
+
+    // maxTrienodeHealThrottle is the maximum divisor for throttling trie node
+    // heal requests to avoid overloading the local node and excessively expanding
+    // the state trie breadth wise.
+    maxTrienodeHealThrottle = maxTrieRequestCount
+
+    // trienodeHealThrottleIncrease is the multiplier for the throttle when the
+    // rate of arriving data is higher than the rate of processing it.
+    trienodeHealThrottleIncrease = 1.33
+
+    // trienodeHealThrottleDecrease is the divisor for the throttle when the
+    // rate of arriving data is lower than the rate of processing it.
+    trienodeHealThrottleDecrease = 1.25
 )

 var (
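Note: taken together, the constants above describe a clamped multiplicative-increase / multiplicative-decrease controller for the heal request size. A rough, standalone simulation of how the divisor drifts between its bounds; the 1024 ceiling is an assumption (maxTrieRequestCount = maxRequestSize / 512 with a 512 KiB maxRequestSize), and the numbers fed to step() are made up:

    package main

    import "fmt"

    const (
        minThrottle      = 1.0
        maxThrottle      = 1024.0 // assumed value of maxTrieRequestCount
        throttleIncrease = 1.33
        throttleDecrease = 1.25
    )

    // step applies one throttle adjustment: grow the divisor while the node is
    // falling behind (pending > 2*rate), shrink it otherwise, clamped to bounds.
    func step(throttle, pending, rate float64) float64 {
        if pending > 2*rate {
            throttle *= throttleIncrease
        } else {
            throttle /= throttleDecrease
        }
        if throttle > maxThrottle {
            throttle = maxThrottle
        } else if throttle < minThrottle {
            throttle = minThrottle
        }
        return throttle
    }

    func main() {
        throttle := maxThrottle // start fully throttled, tune downward
        for i := 0; i < 20; i++ {
            throttle = step(throttle, 100, 200) // processing keeps up: relax the throttle
        }
        fmt.Printf("throttle after 20 healthy rounds: %.2f\n", throttle)
    }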
@@ -431,6 +456,11 @@ type Syncer struct {
     trienodeHealReqs map[uint64]*trienodeHealRequest // Trie node requests currently running
     bytecodeHealReqs map[uint64]*bytecodeHealRequest // Bytecode requests currently running

+    trienodeHealRate      float64   // Average heal rate for processing trie node data
+    trienodeHealPend      uint64    // Number of trie nodes currently pending for processing
+    trienodeHealThrottle  float64   // Divisor for throttling the amount of trienode heal data requested
+    trienodeHealThrottled time.Time // Timestamp the last time the throttle was updated
+
     trienodeHealSynced uint64             // Number of state trie nodes downloaded
     trienodeHealBytes  common.StorageSize // Number of state trie bytes persisted to disk
     trienodeHealDups   uint64             // Number of state trie nodes already processed
@@ -476,9 +506,10 @@ func NewSyncer(db ethdb.KeyValueStore) *Syncer {
         trienodeHealIdlers: make(map[string]struct{}),
         bytecodeHealIdlers: make(map[string]struct{}),

-        trienodeHealReqs: make(map[uint64]*trienodeHealRequest),
-        bytecodeHealReqs: make(map[uint64]*bytecodeHealRequest),
-        stateWriter:      db.NewBatch(),
+        trienodeHealReqs:     make(map[uint64]*trienodeHealRequest),
+        bytecodeHealReqs:     make(map[uint64]*bytecodeHealRequest),
+        trienodeHealThrottle: maxTrienodeHealThrottle, // Tune downward instead of insta-filling with junk
+        stateWriter:          db.NewBatch(),

         extProgress: new(SyncProgress),
     }
@@ -1321,6 +1352,10 @@ func (s *Syncer) assignTrienodeHealTasks(success chan *trienodeHealResponse, fai
         if cap > maxTrieRequestCount {
             cap = maxTrieRequestCount
         }
+        cap = int(float64(cap) / s.trienodeHealThrottle)
+        if cap <= 0 {
+            cap = 1
+        }
         var (
             hashes = make([]common.Hash, 0, cap)
             paths  = make([]string, 0, cap)
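Note: the effect of the divisor is direct: the per-request trie node count is the normal cap divided by the current throttle, floored at one node, so a fully throttled syncer asks peers for a single node at a time. A quick sketch of that arithmetic (the 1024 base cap is again an assumption):

    package main

    import "fmt"

    // throttledCap divides the nominal request cap by the throttle divisor,
    // never dropping below a single trie node per request.
    func throttledCap(cap int, throttle float64) int {
        c := int(float64(cap) / throttle)
        if c <= 0 {
            c = 1
        }
        return c
    }

    func main() {
        fmt.Println(throttledCap(1024, 1))    // 1024: unthrottled
        fmt.Println(throttledCap(1024, 16))   // 64: moderately throttled
        fmt.Println(throttledCap(1024, 1024)) // 1: maximally throttled
    }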
@@ -2090,6 +2125,10 @@
 // processTrienodeHealResponse integrates an already validated trienode response
 // into the healer tasks.
 func (s *Syncer) processTrienodeHealResponse(res *trienodeHealResponse) {
+    var (
+        start = time.Now()
+        fills int
+    )
     for i, hash := range res.hashes {
         node := res.nodes[i]

@@ -2098,6 +2137,8 @@
             res.task.trieTasks[res.paths[i]] = res.hashes[i]
             continue
         }
+        fills++
+
         // Push the trie node into the state syncer
         s.trienodeHealSynced++
         s.trienodeHealBytes += common.StorageSize(len(node))
@@ -2121,6 +2162,50 @@
         log.Crit("Failed to persist healing data", "err", err)
     }
     log.Debug("Persisted set of healing data", "type", "trienodes", "bytes", common.StorageSize(batch.ValueSize()))
+
+    // Calculate the processing rate of one filled trie node
+    rate := float64(fills) / (float64(time.Since(start)) / float64(time.Second))
+
+    // Update the currently measured trienode queueing and processing throughput.
+    //
+    // The processing rate needs to be updated uniformly independent if we've
+    // processed 1x100 trie nodes or 100x1 to keep the rate consistent even in
+    // the face of varying network packets. As such, we cannot just measure the
+    // time it took to process N trie nodes and update once, we need one update
+    // per trie node.
+    //
+    // Naively, that would be:
+    //
+    //   for i:=0; i<fills; i++ {
+    //     healRate = (1-measurementImpact)*oldRate + measurementImpact*newRate
+    //   }
+    //
+    // Essentially, a recursive expansion of HR = (1-MI)*HR + MI*NR.
+    //
+    // We can expand that formula for the Nth item as:
+    //   HR(N) = (1-MI)^N*OR + (1-MI)^(N-1)*MI*NR + (1-MI)^(N-2)*MI*NR + ... + (1-MI)^0*MI*NR
+    //
+    // The above is a geometric sequence that can be summed to:
+    //   HR(N) = (1-MI)^N*(OR-NR) + NR
+    s.trienodeHealRate = gomath.Pow(1-trienodeHealRateMeasurementImpact, float64(fills))*(s.trienodeHealRate-rate) + rate
+
+    pending := atomic.LoadUint64(&s.trienodeHealPend)
+    if time.Since(s.trienodeHealThrottled) > time.Second {
+        // Periodically adjust the trie node throttler
+        if float64(pending) > 2*s.trienodeHealRate {
+            s.trienodeHealThrottle *= trienodeHealThrottleIncrease
+        } else {
+            s.trienodeHealThrottle /= trienodeHealThrottleDecrease
+        }
+        if s.trienodeHealThrottle > maxTrienodeHealThrottle {
+            s.trienodeHealThrottle = maxTrienodeHealThrottle
+        } else if s.trienodeHealThrottle < minTrienodeHealThrottle {
+            s.trienodeHealThrottle = minTrienodeHealThrottle
+        }
+        s.trienodeHealThrottled = time.Now()
+
+        log.Debug("Updated trie node heal throttler", "rate", s.trienodeHealRate, "pending", pending, "throttle", s.trienodeHealThrottle)
+    }
 }

 // processBytecodeHealResponse integrates an already validated bytecode response
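Note: the closed form used for s.trienodeHealRate above is the unrolled per-node exponential moving average. Writing MI for trienodeHealRateMeasurementImpact, OR for the old rate and NR for the newly measured rate, the derivation is a plain geometric sum:

    \begin{aligned}
    HR(0) &= OR, \qquad HR(k) = (1-MI)\,HR(k-1) + MI \cdot NR \\
    HR(N) &= (1-MI)^N \, OR + MI \cdot NR \sum_{k=0}^{N-1} (1-MI)^k \\
          &= (1-MI)^N \, OR + MI \cdot NR \cdot \frac{1-(1-MI)^N}{MI} \\
          &= (1-MI)^N \, OR + NR\,\bigl(1-(1-MI)^N\bigr) \\
          &= (1-MI)^N (OR - NR) + NR
    \end{aligned}

which is exactly the gomath.Pow expression applied with N = fills.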
@@ -2248,14 +2333,18 @@ func (s *Syncer) OnAccounts(peer SyncPeer, id uint64, hashes []common.Hash, acco
     // Whether or not the response is valid, we can mark the peer as idle and
     // notify the scheduler to assign a new task. If the response is invalid,
     // we'll drop the peer in a bit.
+    defer func() {
+        s.lock.Lock()
+        defer s.lock.Unlock()
+        if _, ok := s.peers[peer.ID()]; ok {
+            s.accountIdlers[peer.ID()] = struct{}{}
+        }
+        select {
+        case s.update <- struct{}{}:
+        default:
+        }
+    }()
     s.lock.Lock()
-    if _, ok := s.peers[peer.ID()]; ok {
-        s.accountIdlers[peer.ID()] = struct{}{}
-    }
-    select {
-    case s.update <- struct{}{}:
-    default:
-    }
     // Ensure the response is for a valid request
     req, ok := s.accountReqs[id]
     if !ok {
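Note: the same defer-based refactor is applied to each of the response callbacks that follow (onByteCodes, OnStorage, OnTrieNodes, onHealByteCodes). The point is ordering: the peer used to be returned to the idle set at the top of the callback, before the response had been validated and delivered; moving the idle-marking into a defer means the peer only becomes idle once the whole response has been handled, which appears to be what the "fix problems due to idle-but-busy peers" entry (ethereum#25651) refers to. A stripped-down sketch of the pattern with a hypothetical miniature syncer, not the real snap.Syncer:

    package main

    import (
        "fmt"
        "sync"
    )

    // syncer is a hypothetical miniature of the snap Syncer: a peer set, an idle
    // set and an update channel used to wake the scheduler.
    type syncer struct {
        lock   sync.Mutex
        peers  map[string]struct{}
        idlers map[string]struct{}
        update chan struct{}
    }

    // onResponse mirrors the refactored callbacks: the peer is returned to the
    // idle pool in a defer, i.e. only after the whole response has been handled,
    // rather than up front while the delivery is still in flight.
    func (s *syncer) onResponse(peer string, deliver func()) {
        defer func() {
            s.lock.Lock()
            defer s.lock.Unlock()
            if _, ok := s.peers[peer]; ok {
                s.idlers[peer] = struct{}{}
            }
            select {
            case s.update <- struct{}{}:
            default:
            }
        }()
        deliver() // validation and delivery run while the peer still counts as busy
    }

    func main() {
        s := &syncer{
            peers:  map[string]struct{}{"p1": {}},
            idlers: map[string]struct{}{},
            update: make(chan struct{}, 1),
        }
        s.onResponse("p1", func() { fmt.Println("processing response") })
        fmt.Println("idle peers:", len(s.idlers)) // 1
    }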
@@ -2360,14 +2449,18 @@ func (s *Syncer) onByteCodes(peer SyncPeer, id uint64, bytecodes [][]byte) error
     // Whether or not the response is valid, we can mark the peer as idle and
     // notify the scheduler to assign a new task. If the response is invalid,
     // we'll drop the peer in a bit.
+    defer func() {
+        s.lock.Lock()
+        defer s.lock.Unlock()
+        if _, ok := s.peers[peer.ID()]; ok {
+            s.bytecodeIdlers[peer.ID()] = struct{}{}
+        }
+        select {
+        case s.update <- struct{}{}:
+        default:
+        }
+    }()
     s.lock.Lock()
-    if _, ok := s.peers[peer.ID()]; ok {
-        s.bytecodeIdlers[peer.ID()] = struct{}{}
-    }
-    select {
-    case s.update <- struct{}{}:
-    default:
-    }
     // Ensure the response is for a valid request
     req, ok := s.bytecodeReqs[id]
     if !ok {
@@ -2469,14 +2562,18 @@ func (s *Syncer) OnStorage(peer SyncPeer, id uint64, hashes [][]common.Hash, slo
     // Whether or not the response is valid, we can mark the peer as idle and
     // notify the scheduler to assign a new task. If the response is invalid,
     // we'll drop the peer in a bit.
+    defer func() {
+        s.lock.Lock()
+        defer s.lock.Unlock()
+        if _, ok := s.peers[peer.ID()]; ok {
+            s.storageIdlers[peer.ID()] = struct{}{}
+        }
+        select {
+        case s.update <- struct{}{}:
+        default:
+        }
+    }()
     s.lock.Lock()
-    if _, ok := s.peers[peer.ID()]; ok {
-        s.storageIdlers[peer.ID()] = struct{}{}
-    }
-    select {
-    case s.update <- struct{}{}:
-    default:
-    }
     // Ensure the response is for a valid request
     req, ok := s.storageReqs[id]
     if !ok {
@@ -2596,14 +2693,18 @@ func (s *Syncer) OnTrieNodes(peer SyncPeer, id uint64, trienodes [][]byte) error
     // Whether or not the response is valid, we can mark the peer as idle and
     // notify the scheduler to assign a new task. If the response is invalid,
     // we'll drop the peer in a bit.
+    defer func() {
+        s.lock.Lock()
+        defer s.lock.Unlock()
+        if _, ok := s.peers[peer.ID()]; ok {
+            s.trienodeHealIdlers[peer.ID()] = struct{}{}
+        }
+        select {
+        case s.update <- struct{}{}:
+        default:
+        }
+    }()
     s.lock.Lock()
-    if _, ok := s.peers[peer.ID()]; ok {
-        s.trienodeHealIdlers[peer.ID()] = struct{}{}
-    }
-    select {
-    case s.update <- struct{}{}:
-    default:
-    }
     // Ensure the response is for a valid request
     req, ok := s.trienodeHealReqs[id]
     if !ok {
@@ -2639,10 +2740,12 @@ func (s *Syncer) OnTrieNodes(peer SyncPeer, id uint64, trienodes [][]byte) error

     // Cross reference the requested trienodes with the response to find gaps
     // that the serving node is missing
-    hasher := sha3.NewLegacyKeccak256().(crypto.KeccakState)
-    hash := make([]byte, 32)
-
-    nodes := make([][]byte, len(req.hashes))
+    var (
+        hasher = sha3.NewLegacyKeccak256().(crypto.KeccakState)
+        hash   = make([]byte, 32)
+        nodes  = make([][]byte, len(req.hashes))
+        fills  uint64
+    )
     for i, j := 0, 0; i < len(trienodes); i++ {
         // Find the next hash that we've been served, leaving misses with nils
         hasher.Reset()
@@ -2654,16 +2757,22 @@ func (s *Syncer) OnTrieNodes(peer SyncPeer, id uint64, trienodes [][]byte) error
         }
         if j < len(req.hashes) {
             nodes[j] = trienodes[i]
+            fills++
             j++
             continue
         }
         // We've either ran out of hashes, or got unrequested data
         logger.Warn("Unexpected healing trienodes", "count", len(trienodes)-i)
+
         // Signal this request as failed, and ready for rescheduling
         s.scheduleRevertTrienodeHealRequest(req)
         return errors.New("unexpected healing trienode")
     }
     // Response validated, send it to the scheduler for filling
+    atomic.AddUint64(&s.trienodeHealPend, fills)
+    defer func() {
+        atomic.AddUint64(&s.trienodeHealPend, ^(fills - 1))
+    }()
     response := &trienodeHealResponse{
         paths: req.paths,
         task:  req.task,
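Note: the trienodeHealPend bookkeeping above uses the documented sync/atomic idiom for unsigned decrements: AddUint64 has no signed delta, so subtracting n is done by adding ^(n-1), the two's-complement of n. A minimal demonstration:

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    func main() {
        var pending uint64

        fills := uint64(42)
        atomic.AddUint64(&pending, fills) // account for nodes handed to the scheduler

        // Decrement by fills: adding ^(fills-1) wraps the unsigned counter
        // back down to pending-fills.
        atomic.AddUint64(&pending, ^(fills - 1))

        fmt.Println(atomic.LoadUint64(&pending)) // 0
    }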
@@ -2691,14 +2800,18 @@ func (s *Syncer) onHealByteCodes(peer SyncPeer, id uint64, bytecodes [][]byte) e
     // Whether or not the response is valid, we can mark the peer as idle and
     // notify the scheduler to assign a new task. If the response is invalid,
     // we'll drop the peer in a bit.
+    defer func() {
+        s.lock.Lock()
+        defer s.lock.Unlock()
+        if _, ok := s.peers[peer.ID()]; ok {
+            s.bytecodeHealIdlers[peer.ID()] = struct{}{}
+        }
+        select {
+        case s.update <- struct{}{}:
+        default:
+        }
+    }()
     s.lock.Lock()
-    if _, ok := s.peers[peer.ID()]; ok {
-        s.bytecodeHealIdlers[peer.ID()] = struct{}{}
-    }
-    select {
-    case s.update <- struct{}{}:
-    default:
-    }
     // Ensure the response is for a valid request
     req, ok := s.bytecodeHealReqs[id]
     if !ok {
