Commit bc09ac9

Willymontaz and William Montaz authored
Add a condition in RMAppManager to force application queue placement based on ACL during upgrade (apache#6)
This commit is only useful for performing the upgrade from 2 to 3. Once the upgrade is finalized, this specific code can (and should) be removed. See below for a detailed explanation.

In HDP2, when an application was assigned to a queue through acl queue mappings, the assignment was done at the CapacityScheduler level and did not modify the ApplicationSubmissionContext. Because of this, the application was serialized in the ZKStateStore with the queue it was submitted to at submit time, which is 'default' (we ask users not to declare the queue but to let the acl queue mappings do the job). This was not a problem for RM v2: even though it read the queue 'default', it would eventually place the application on the correct queue.

In HDP3, however, this mechanism has been rationalized into dedicated classes and no longer lives in the CapacityScheduler (the code is a lot cleaner, by the way). As a result, queue placement is evaluated before the application is completely submitted, so RM v3 serializes the final queue to which the application was assigned. It also means that with v3, at restart, the RM reads the queue field that was serialized in the state store and tries to place the application on it. It does not re-evaluate the acl queue mapping.

When we upgrade from v2 to v3, the RM will read many applications placed on queue 'default' and will not try to perform acl queue mapping. Since there is no queue named 'default', the RM will KILL all the applications that were running.

The fix consists in detecting that the queue returned from the state store is 'default' and, if so, evaluating the placement of the application, which allows migration without killing existing apps. Once all applications have been started/serialized again (through app state changes) in the ZKStateStore, this code has no value anymore.

Co-authored-by: William Montaz <[email protected]>
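The recovery behaviour described in the message can be illustrated with a small, self-contained sketch. It is not the ResourceManager code: `RecoveryPlacementSketch`, `resolveQueueOnRecovery`, and the user-to-queue map are hypothetical stand-ins for the state store entry and the acl queue mapping, and the fallback to the stored queue when no mapping matches is an assumption made only to keep the example total.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RecoveryPlacementSketch {
    // Queue name that v2 left in the state store, per the commit message.
    static final String LEGACY_QUEUE = "default";

    /**
     * Hypothetical model of queue resolution at recovery time.
     * userToQueue stands in for the acl queue mapping (an assumption).
     */
    static String resolveQueueOnRecovery(String storedQueue, String user,
                                         Map<String, String> userToQueue) {
        if (LEGACY_QUEUE.equals(storedQueue)) {
            // Serialized by v2 before placement happened: re-evaluate the mapping.
            // Assumption: keep the stored value when no mapping matches.
            return userToQueue.getOrDefault(user, storedQueue);
        }
        // Serialized by v3 after placement: trust the stored queue as-is.
        return storedQueue;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("alice", "analytics");

        // App written by v2: stored on 'default', re-placed through the mapping.
        System.out.println(resolveQueueOnRecovery("default", "alice", mapping)); // prints "analytics"
        // App written by v3: the stored queue is already final.
        System.out.println(resolveQueueOnRecovery("etl", "alice", mapping));     // prints "etl"
    }
}
```

Once every application has been re-serialized with its final queue, the first branch is never taken, which matches the message's point that the workaround can be removed after the upgrade.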
1 parent f348a8b commit bc09ac9

1 file changed: +3 -3 lines changed

hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java

Lines changed: 3 additions & 3 deletions
@@ -84,7 +84,7 @@
 /**
  * This class manages the list of applications for the resource manager.
  */
-public class RMAppManager implements EventHandler<RMAppManagerEvent>,
+public class RMAppManager implements EventHandler<RMAppManagerEvent>,
                                         Recoverable {

   private static final Logger LOG =
@@ -399,13 +399,13 @@ private RMAppImpl createAndPopulateNewRMApp(
       RMAppState recoveredFinalState) throws YarnException {

     ApplicationPlacementContext placementContext = null;
-    if (recoveredFinalState == null) {
+    if (recoveredFinalState == null || (isRecovery && submissionContext.getQueue().equals("default"))) {
       placementContext = placeApplication(rmContext.getQueuePlacementManager(),
           submissionContext, user, isRecovery);
     }

     // We only replace the queue when it's a new application
-    if (!isRecovery) {
+    if (!isRecovery || submissionContext.getQueue().equals("default")) {
       copyPlacementQueueToSubmissionContext(placementContext,
           submissionContext);
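The two changed conditions can be read in isolation as boolean predicates. The sketch below restates them outside the ResourceManager so their truth table is easy to check. `PlacementConditions`, `shouldPlace`, and `shouldCopyQueue` are hypothetical names, and the literal-first `equals` is a deliberate difference from the diff: it assumes the stored queue could be null, in which case `submissionContext.getQueue().equals("default")` would throw a NullPointerException.

```java
final class PlacementConditions {
    // Literal from the diff; the constant name is ours.
    static final String DEFAULT_QUEUE = "default";

    /** Mirrors the first changed condition: should placement be evaluated? */
    static boolean shouldPlace(boolean hasRecoveredFinalState,
                               boolean isRecovery, String queue) {
        // Literal-first equals returns false for a null queue instead of throwing.
        return !hasRecoveredFinalState
            || (isRecovery && DEFAULT_QUEUE.equals(queue));
    }

    /** Mirrors the second changed condition: should the placed queue be copied back? */
    static boolean shouldCopyQueue(boolean isRecovery, String queue) {
        return !isRecovery || DEFAULT_QUEUE.equals(queue);
    }

    public static void main(String[] args) {
        // New submission: both steps run, as before the commit.
        System.out.println(shouldPlace(false, false, "default"));  // prints "true"
        System.out.println(shouldCopyQueue(false, "default"));     // prints "true"
        // Recovered app stored on 'default' with a final state: now re-placed.
        System.out.println(shouldPlace(true, true, "default"));    // prints "true"
        // Recovered app stored on its final queue: the queue is not overwritten.
        System.out.println(shouldCopyQueue(true, "analytics"));    // prints "false"
    }
}
```

Read this way, the only cases whose outcome differs from the pre-commit conditions are recovered applications whose stored queue is 'default'.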
