-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-21902][CORE] Uniform calling for DiskBlockManager.getFile #19133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can one of the admins verify this patch? |
// This method should be kept in sync with | ||
// org.apache.spark.network.shuffle.ExternalShuffleBlockResolver#getFile(). | ||
def getFile(filename: String): File = { | ||
private def getFileInternal(filename: String): File = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this rename?
Since this PR includes two types of changes, I think that it would be good to split this into two PRs if both of them are necessary. |
Ping @kiszk Cloud you help take a look at this? Thanks too much. |
This is not a necessary fix. We usually don't do such changes without really fix anything. |
Actually , initially i put this together with PR-19171 since i found the api is not unify when fix that problem.All right i will close this one.Cloud you help review 19171? @jerryshao Thanks. |
## What changes were proposed in this pull request? As logging below, actually exception will be hidden when removeBlockInternal throw an exception. `2017-08-31,10:26:57,733 WARN org.apache.spark.storage.BlockManager: Putting block broadcast_110 failed due to an exception 2017-08-31,10:26:57,734 WARN org.apache.spark.broadcast.BroadcastManager: Failed to create a new broadcast in 1 attempts java.io.IOException: Failed to create local dir in /tmp/blockmgr-5bb5ac1e-c494-434a-ab89-bd1808c6b9ed/2e. at org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:70) at org.apache.spark.storage.DiskStore.remove(DiskStore.scala:115) at org.apache.spark.storage.BlockManager.removeBlockInternal(BlockManager.scala:1339) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:910) at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948) at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:726) at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:1233) at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:122) at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34) at org.apache.spark.broadcast.BroadcastManager$$anonfun$newBroadcast$1.apply$mcVI$sp(BroadcastManager.scala:60) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:58) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1415) at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1002) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:924) at org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:771) at org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:770) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) at org.apache.spark.scheduler.DAGScheduler.submitWaitingChildStages(DAGScheduler.scala:770) at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1235) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1662) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1620) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1609) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)` In this pr i will print exception first make troubleshooting more conveniently. PS: This one split from [PR-19133](apache#19133) ## How was this patch tested? Exsist unit test Author: zhoukang <[email protected]> Closes apache#19171 from caneGuy/zhoukang/print-rootcause.
What changes were proposed in this pull request?
Originally this pr is for SPARK-21902 which i think calling for DiskBlockManager.getFile should be uniform and we should print root cause when call 'Blockmanager#doPut' failed.
Since @kiszk commented,i split this one into an other pr PR-19171
How was this patch tested?
Exsist unit test