Commit c3f46d5
[SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost
### What changes were proposed in this pull request?
This PR majorly proposes to reject the block manager re-registration if the executor has been already considered lost/dead from the scheduler backend.
Along with the major proposal, this PR also includes a few other changes:
* Only post `SparkListenerBlockManagerAdded` event when the registration succeeds
* Return an "invalid" executor id when the re-registration fails
* Do not report all blocks when the re-registration fails
### Why are the changes needed?
BlockManager re-registration from lost executor (terminated/terminating executor or orphan executor) has led to some known issues, e.g., false-active executor shows up in UP (SPARK-35011), [block fetching to the dead executor](#32114 (comment)). And since there's no re-registration from the lost executor itself, it's meaningless to have BlockManager re-registration when the executor is already lost.
Regarding the corner case where the re-registration event comes earlier before the lost executor is actually removed from the scheduler backend, I think it is not possible. Because re-registration will only be required when the BlockManager doesn't see the block manager in `blockManagerInfo`. And the block manager will only be removed from `blockManagerInfo` whether when the executor is already know lost or removed by the driver proactively. So the executor should always be removed from the scheduler backend first before the re-registration event comes.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test
Closes #38876 from Ngone51/fix-blockmanager-reregister.
Authored-by: Yi Wu <[email protected]>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>1 parent 7cb8288 commit c3f46d5
File tree
6 files changed
+119
-16
lines changed- core/src
- main/scala/org/apache/spark/storage
- test/scala/org/apache/spark/storage
6 files changed
+119
-16
lines changedLines changed: 8 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
637 | 637 | | |
638 | 638 | | |
639 | 639 | | |
640 | | - | |
641 | | - | |
642 | | - | |
| 640 | + | |
| 641 | + | |
| 642 | + | |
| 643 | + | |
| 644 | + | |
| 645 | + | |
| 646 | + | |
| 647 | + | |
643 | 648 | | |
644 | 649 | | |
645 | 650 | | |
| |||
Lines changed: 2 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
147 | 147 | | |
148 | 148 | | |
149 | 149 | | |
| 150 | + | |
| 151 | + | |
150 | 152 | | |
Lines changed: 17 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
74 | 74 | | |
75 | 75 | | |
76 | 76 | | |
77 | | - | |
| 77 | + | |
| 78 | + | |
78 | 79 | | |
79 | 80 | | |
80 | | - | |
81 | | - | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
82 | 96 | | |
83 | 97 | | |
84 | 98 | | |
| |||
Lines changed: 34 additions & 7 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
117 | 117 | | |
118 | 118 | | |
119 | 119 | | |
120 | | - | |
121 | | - | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
122 | 124 | | |
123 | 125 | | |
124 | 126 | | |
| |||
572 | 574 | | |
573 | 575 | | |
574 | 576 | | |
575 | | - | |
| 577 | + | |
| 578 | + | |
576 | 579 | | |
577 | 580 | | |
578 | 581 | | |
| |||
583 | 586 | | |
584 | 587 | | |
585 | 588 | | |
586 | | - | |
| 589 | + | |
| 590 | + | |
| 591 | + | |
| 592 | + | |
| 593 | + | |
| 594 | + | |
587 | 595 | | |
588 | 596 | | |
589 | 597 | | |
| |||
616 | 624 | | |
617 | 625 | | |
618 | 626 | | |
| 627 | + | |
| 628 | + | |
619 | 629 | | |
620 | | - | |
621 | | - | |
622 | | - | |
| 630 | + | |
| 631 | + | |
| 632 | + | |
| 633 | + | |
| 634 | + | |
| 635 | + | |
| 636 | + | |
| 637 | + | |
| 638 | + | |
| 639 | + | |
| 640 | + | |
| 641 | + | |
| 642 | + | |
| 643 | + | |
| 644 | + | |
| 645 | + | |
| 646 | + | |
| 647 | + | |
| 648 | + | |
| 649 | + | |
623 | 650 | | |
624 | 651 | | |
625 | 652 | | |
| |||
Lines changed: 2 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
63 | 63 | | |
64 | 64 | | |
65 | 65 | | |
66 | | - | |
| 66 | + | |
| 67 | + | |
67 | 68 | | |
68 | 69 | | |
69 | 70 | | |
| |||
Lines changed: 56 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
295 | 295 | | |
296 | 296 | | |
297 | 297 | | |
298 | | - | |
| 298 | + | |
299 | 299 | | |
300 | 300 | | |
301 | 301 | | |
| |||
361 | 361 | | |
362 | 362 | | |
363 | 363 | | |
| 364 | + | |
| 365 | + | |
| 366 | + | |
| 367 | + | |
| 368 | + | |
| 369 | + | |
| 370 | + | |
| 371 | + | |
| 372 | + | |
| 373 | + | |
| 374 | + | |
| 375 | + | |
| 376 | + | |
| 377 | + | |
| 378 | + | |
| 379 | + | |
| 380 | + | |
| 381 | + | |
| 382 | + | |
| 383 | + | |
| 384 | + | |
| 385 | + | |
| 386 | + | |
| 387 | + | |
| 388 | + | |
| 389 | + | |
| 390 | + | |
| 391 | + | |
| 392 | + | |
| 393 | + | |
| 394 | + | |
| 395 | + | |
| 396 | + | |
| 397 | + | |
| 398 | + | |
| 399 | + | |
| 400 | + | |
| 401 | + | |
364 | 402 | | |
365 | 403 | | |
366 | 404 | | |
| |||
669 | 707 | | |
670 | 708 | | |
671 | 709 | | |
| 710 | + | |
| 711 | + | |
| 712 | + | |
| 713 | + | |
| 714 | + | |
| 715 | + | |
| 716 | + | |
| 717 | + | |
| 718 | + | |
| 719 | + | |
| 720 | + | |
| 721 | + | |
| 722 | + | |
| 723 | + | |
| 724 | + | |
| 725 | + | |
672 | 726 | | |
673 | 727 | | |
674 | 728 | | |
| |||
2207 | 2261 | | |
2208 | 2262 | | |
2209 | 2263 | | |
2210 | | - | |
| 2264 | + | |
2211 | 2265 | | |
2212 | 2266 | | |
2213 | 2267 | | |
| |||
0 commit comments