Conversation

caneGuy
Contributor

@caneGuy caneGuy commented Sep 5, 2017

What changes were proposed in this pull request?

Originally this PR was for SPARK-21902: I think calls to DiskBlockManager.getFile should be uniform, and we should print the root cause when BlockManager#doPut fails.
Since @kiszk commented, I split that part out into another PR, PR-19171.

How was this patch tested?

Existing unit tests

@AmplabJenkins

Can one of the admins verify this patch?

// This method should be kept in sync with
// org.apache.spark.network.shuffle.ExternalShuffleBlockResolver#getFile().
def getFile(filename: String): File = {
private def getFileInternal(filename: String): File = {
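For context, the diff above presumably routes the filename-based lookup through a single private helper so that every code path resolves files the same way. A minimal sketch of that pattern follows; the class and the BlockId-style overload here are simplified stand-ins, not the real DiskBlockManager:

```scala
import java.io.File

// Sketch of the rename pattern: public overloads delegate to one
// private method, so the lookup logic lives in exactly one place.
// This is an illustration, not Spark's actual implementation.
class DiskDirsSketch(root: String) {
  private def getFileInternal(filename: String): File =
    new File(root, filename)

  def getFile(filename: String): File = getFileInternal(filename)

  // In Spark this overload would take a BlockId and pass blockId.name;
  // a plain String stands in here.
  def getFileForBlock(blockName: String): File = getFileInternal(blockName)
}
```

Both overloads now return identical results for the same name, which is the uniformity the PR title refers to.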
Member


Why do we need this rename?

@kiszk
Member

kiszk commented Sep 8, 2017

Since this PR includes two types of changes, I think that it would be good to split this into two PRs if both of them are necessary.

@caneGuy caneGuy changed the title [SPARK-21902] Uniform calling for DiskBlockManager.getFile and print root cause for doPut [SPARK-21902] Uniform calling for DiskBlockManager.getFile Sep 9, 2017
@caneGuy caneGuy changed the title [SPARK-21902] Uniform calling for DiskBlockManager.getFile [SPARK-21902][CORE] Uniform calling for DiskBlockManager.getFile Sep 11, 2017
@caneGuy
Contributor Author

caneGuy commented Sep 15, 2017

Ping @kiszk Could you help take a look at this? Thanks so much.

@jerryshao
Contributor

This is not a necessary fix. We usually don't make such changes when they don't really fix anything.

@caneGuy
Contributor Author

caneGuy commented Sep 15, 2017

Actually, I initially put this together with PR-19171, since I found the API was not uniform while fixing that problem. All right, I will close this one. Could you help review 19171? @jerryshao Thanks.

@caneGuy caneGuy closed this Sep 15, 2017
ghost pushed a commit to dbtsai/spark that referenced this pull request Sep 15, 2017
## What changes were proposed in this pull request?

As the log below shows, the original exception is hidden when removeBlockInternal throws an exception.
`2017-08-31,10:26:57,733 WARN org.apache.spark.storage.BlockManager: Putting block broadcast_110 failed due to an exception
2017-08-31,10:26:57,734 WARN org.apache.spark.broadcast.BroadcastManager: Failed to create a new broadcast in 1 attempts
java.io.IOException: Failed to create local dir in /tmp/blockmgr-5bb5ac1e-c494-434a-ab89-bd1808c6b9ed/2e.
        at org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:70)
        at org.apache.spark.storage.DiskStore.remove(DiskStore.scala:115)
        at org.apache.spark.storage.BlockManager.removeBlockInternal(BlockManager.scala:1339)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:910)
        at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
        at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:726)
        at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:1233)
        at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:122)
        at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88)
        at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
        at org.apache.spark.broadcast.BroadcastManager$$anonfun$newBroadcast$1.apply$mcVI$sp(BroadcastManager.scala:60)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
        at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:58)
        at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1415)
        at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1002)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:924)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:771)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:770)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at org.apache.spark.scheduler.DAGScheduler.submitWaitingChildStages(DAGScheduler.scala:770)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1235)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1662)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1620)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1609)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)`

In this PR, the original exception is logged first, which makes troubleshooting more convenient.
PS:
This one split from [PR-19133](apache#19133)
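The masking problem described above can be illustrated with a small sketch. The method names here (putBody, cleanup) are hypothetical stand-ins for BlockManager.doPut and removeBlockInternal, not Spark's real signatures:

```scala
// Sketch of the exception-masking problem and the fix: log the
// root cause BEFORE running cleanup, so that an exception thrown
// by cleanup cannot hide the original failure.
object DoPutSketch {
  def doPut(putBody: () => Unit, cleanup: () => Unit): Unit = {
    try {
      putBody()
    } catch {
      case original: Exception =>
        // Print the root cause first (the real code would use a logger).
        println(s"Putting block failed due to exception: $original")
        try cleanup() catch {
          case cleanupFailure: Exception =>
            // Without this guard, cleanupFailure would propagate
            // and the original exception would be lost.
            println(s"Cleanup also failed: $cleanupFailure")
        }
        throw original
    }
  }
}
```

With this shape, the caller always sees the original exception, and the cleanup failure (like the "Failed to create local dir" IOException in the log above) is reported alongside it instead of replacing it.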

## How was this patch tested?
Existing unit tests

Author: zhoukang <[email protected]>

Closes apache#19171 from caneGuy/zhoukang/print-rootcause.
@caneGuy caneGuy deleted the zhoukang/uniform-api branch September 25, 2017 12:07