@@ -51,7 +51,7 @@ import org.apache.spark.util.random.{SamplingUtils, XORShiftRandom}
  * findSplits() method during initialization, after which each continuous feature becomes
  * an ordered discretized feature with at most maxBins possible values.
  *
- * The main loop in the algorithm operates on a queue of nodes (nodeQueue). These nodes
+ * The main loop in the algorithm operates on a queue of nodes (nodeStack). These nodes
  * lie at the periphery of the tree being trained. If multiple trees are being trained at once,
  * then this queue contains nodes from all of them. Each iteration works roughly as follows:
  * On the master node:
@@ -162,11 +162,10 @@ private[spark] object RandomForest extends Logging {
  }

  /*
- FILO queue of nodes to train: (treeIndex, node)
- We make this FILO by always inserting nodes by appending (+=) and removing with dropRight.
+ Stack of nodes to train: (treeIndex, node)
  The reason this is FILO is that we train many trees at once, but we want to focus on
  completing trees, rather than training all simultaneously. If we are splitting nodes from
- 1 tree, then the new nodes to split will be put at the end of this list, so we will continue
+ 1 tree, then the new nodes to split will be put at the top of this stack, so we will continue
  training the same tree in the next iteration. This focus allows us to send fewer trees to
  workers on each iteration; see topNodesForGroup below.
  */
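
The FILO behaviour described in the comment above can be illustrated with a minimal, self-contained sketch. This is not the code from this change: `NodeStackSketch`, `Node`, and `processingOrder` are hypothetical names, an immutable `List` stands in for the stack, and the sketch pops one node per iteration, whereas the comment implies the real loop handles a group of nodes per iteration (see topNodesForGroup).

```scala
object NodeStackSketch {
  // Hypothetical stand-in for a tree node; not Spark's actual node class.
  final case class Node(id: Int, depth: Int)

  // Pops (treeIndex, node) pairs in FILO order, splitting each node into two
  // children until maxDepth is reached. Returns the tree index of every node
  // in the order it was processed.
  def processingOrder(numTrees: Int, maxDepth: Int): List[Int] = {
    // Seed the stack with the root of every tree; the head of the List is the top.
    var stack: List[(Int, Node)] =
      (0 until numTrees).map(t => (t, Node(1, 0))).toList
    val order = List.newBuilder[Int]

    while (stack.nonEmpty) {
      val (treeIndex, node) = stack.head
      stack = stack.tail
      order += treeIndex

      if (node.depth < maxDepth) {
        // Children go on top of the stack, so the same tree is processed next.
        val left = Node(2 * node.id, node.depth + 1)
        val right = Node(2 * node.id + 1, node.depth + 1)
        stack = (treeIndex, left) :: (treeIndex, right) :: stack
      }
    }
    order.result()
  }

  def main(args: Array[String]): Unit = {
    // Prints List(0, 0, 0, 1, 1, 1): tree 0 is finished before tree 1 is touched.
    println(processingOrder(numTrees = 2, maxDepth = 1))
  }
}
```

With a FIFO queue the same input would interleave the two trees (0, 1, 0, 0, 1, 1), which is the behaviour the comment says the stack is meant to avoid.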