YARN-11342. [Federation] Refactor getNewApplication, submitApplication Use FederationActionRetry. #5005
…pplication Use FederationActionRetry.
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
@goiri Can you help review this pr? Thank you very much! |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
|
||
| <property> | ||
| <name>yarn.router.submit.interval.time</name> | ||
| <value>10</value> |
10ms?
I will modify the configuration file.
| |`yarn.router.admin.address` | `0.0.0.0:8052` | Admin address at the router. | | ||
| |`yarn.router.webapp.https.address` | `0.0.0.0:8091` | Secure webapp address at the router. | | ||
| |`yarn.router.submit.retry` | `3` | The number of retries in the router before we give up. | | ||
| |`yarn.router.submit.interval.time` | `10` | The interval between two retry, the default value is 10ms. | |
10ms?
Thanks for your suggestion, I will modify the code.
| .runWithRetries(cleanUpRetryCountNum, cleanUpRetrySleepTime); | ||
| return ((FederationActionRetry<Boolean>) (retry) -> | ||
| invokeCleanUpFinishApp(appId, isQuery, request)) | ||
| .runWithRetries(cleanUpRetryCountNum, cleanUpRetrySleepTime); |
indentation looks incorrect.
I will fix it.
|
🎊 +1 overall
This message was automatically generated. |
| * cluster is composed of only 1 bad SubCluster. | ||
| */ | ||
| @Test | ||
| public void testGetNewApplicationOneBadSC() |
Single line
Thanks for your suggestion, I will fix it.
|
🎊 +1 overall
This message was automatically generated. |
|
@goiri Thank you very much for helping to review the code! |
…n Use FederationActionRetry. (apache#5005)
JIRA: YARN-11342. [Federation] Refactor FederationClientInterceptor#submitApplication Use FederationActionRetry.
In this PR, the code for getNewApplication and submitApplication is refactored to use FederationActionRetry.
Code readability is improved: the execution logic is encapsulated in two methods, invokeGetNewApplication and invokeSubmitApplication, both of which are idempotent.
The SubmitRetries logic is optimized: when the number of retries defined by the user is greater than the number of active SubClusters in the cluster, the smaller of the two values is used.
The audit log logic is partly optimized: all failed retries are now recorded in the audit log to facilitate troubleshooting.
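
The retry pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: `ActionRetry` is a hypothetical stand-in modeled on how `FederationActionRetry` is called in the snippets under review (a lambda taking the retry index, then `runWithRetries(count, sleepTime)`), and `effectiveRetries`, `submit`, the round-robin SubCluster choice, and the `failedLog` list are all assumptions made for the example. The real Router selects SubClusters through its federation policy and writes to the Router audit log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RetrySketch {

  /** Hypothetical stand-in for FederationActionRetry; not the Hadoop interface itself. */
  @FunctionalInterface
  interface ActionRetry<T> {
    /** One attempt; receives the zero-based retry index. */
    T run(int retry) throws Exception;

    /** Runs the action, retrying up to retryCount times after the first failure. */
    default T runWithRetries(int retryCount, long retrySleepTimeMs) throws Exception {
      int retry = 0;
      while (true) {
        try {
          return run(retry);
        } catch (Exception e) {
          if (++retry > retryCount) {
            throw e; // retries exhausted, surface the last failure
          }
          Thread.sleep(retrySleepTimeMs);
        }
      }
    }
  }

  /** Caps the configured retry count at the number of active SubClusters. */
  public static int effectiveRetries(int configuredRetries, int activeSubClusters) {
    return Math.min(configuredRetries, activeSubClusters);
  }

  /**
   * Assumed submit flow: try one SubCluster per attempt (round-robin here, purely
   * for illustration) and record every failed attempt, as an audit log would.
   */
  public static String submit(List<String> subClusters, List<String> failedLog,
      int configuredRetries, long sleepMs) throws Exception {
    int retries = effectiveRetries(configuredRetries, subClusters.size());
    return ((ActionRetry<String>) (retry) -> {
      String subCluster = subClusters.get(retry % subClusters.size());
      if (subCluster.startsWith("bad")) {
        failedLog.add(subCluster); // every failed retry is recorded
        throw new IllegalStateException("submit failed on " + subCluster);
      }
      return subCluster;
    }).runWithRetries(retries, sleepMs);
  }

  public static void main(String[] args) throws Exception {
    List<String> failed = new ArrayList<>();
    String chosen = submit(Arrays.asList("bad-1", "bad-2", "good-3"), failed, 10, 1L);
    System.out.println("submitted to " + chosen + ", failed attempts: " + failed);
  }
}
```

Because each attempt may be repeated, the action passed to the retry wrapper has to be safe to run more than once, which is why the description stresses that both invoke methods are idempotent.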