Commit 16adaf6

sujith71955 authored and gatorsmile committed
[SPARK-22601][SQL] Data load is getting displayed successful on providing non existing nonlocal file path
## What changes were proposed in this pull request?

When a user tries to load data from a nonexistent HDFS file path, the system does not validate the path and the LOAD command reports success. This is misleading to the user. A validation already exists for a nonexistent local file path. This PR adds the same validation for a nonexistent HDFS file path.

## How was this patch tested?

A unit test has been added to verify the fix, and snapshots were added after verifying it on a Spark YARN cluster.

Author: sujith71955 <[email protected]>

Closes #19823 from sujith71955/master_LoadComand_Issue.
1 parent dc36542 commit 16adaf6

2 files changed: +17 -1 lines changed


sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala

Lines changed: 8 additions & 1 deletion
@@ -340,7 +340,7 @@ case class LoadDataCommand(
         uri
       } else {
         val uri = new URI(path)
-        if (uri.getScheme() != null && uri.getAuthority() != null) {
+        val hdfsUri = if (uri.getScheme() != null && uri.getAuthority() != null) {
           uri
         } else {
           // Follow Hive's behavior:
@@ -380,6 +380,13 @@ case class LoadDataCommand(
           }
           new URI(scheme, authority, absolutePath, uri.getQuery(), uri.getFragment())
         }
+        val hadoopConf = sparkSession.sessionState.newHadoopConf()
+        val srcPath = new Path(hdfsUri)
+        val fs = srcPath.getFileSystem(hadoopConf)
+        if (!fs.exists(srcPath)) {
+          throw new AnalysisException(s"LOAD DATA input path does not exist: $path")
+        }
+        hdfsUri
       }

     if (partition.nonEmpty) {
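
The changed condition decides whether a non-local path is used as given or first resolved against defaults, following Hive's behavior. The sketch below isolates that condition using only `java.net.URI`, so it can be tried without Spark or Hadoop. The object name `UriQualification`, the helper `isFullyQualified`, and the sample paths (including the `namenode:8020` authority) are hypothetical and not part of the commit.

```scala
import java.net.URI

object UriQualification {
  // Mirrors the condition in the diff: a path counts as fully qualified
  // only when it carries both a scheme and an authority.
  // (Helper name is hypothetical; the commit inlines this check.)
  def isFullyQualified(path: String): Boolean = {
    val uri = new URI(path)
    uri.getScheme() != null && uri.getAuthority() != null
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample paths, chosen only for illustration.
    val samples = Seq(
      "hdfs://namenode:8020/data/file.csv", // scheme and authority present
      "/doesnotexist.csv",                  // neither present
      "hdfs:///data/file.csv"               // scheme present, authority empty
    )
    samples.foreach { p =>
      println(s"$p -> fully qualified: ${isFullyQualified(p)}")
    }
  }
}
```

A path such as `/doesnotexist.csv`, the one used in the new test, has neither a scheme nor an authority, so it takes the `else` branch and is rebuilt into a full URI before the existence check added by this commit runs.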

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala

Lines changed: 9 additions & 0 deletions
@@ -2141,4 +2141,13 @@ class HiveDDLSuite
       }
     }
   }
+
+  test("load command for non local invalid path validation") {
+    withTable("tbl") {
+      sql("CREATE TABLE tbl(i INT, j STRING)")
+      val e = intercept[AnalysisException](
+        sql("load data inpath '/doesnotexist.csv' into table tbl"))
+      assert(e.message.contains("LOAD DATA input path does not exist"))
+    }
+  }
 }
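
The test relies on a fail-fast pattern: check that the source exists and throw with a recognizable message before any data is moved. A minimal sketch of that pattern follows, with two loudly stated substitutions: `java.nio.file.Files.exists` stands in for Hadoop's `FileSystem.exists`, and the hypothetical `InputPathNotFound` stands in for Spark's `AnalysisException`. The names `LoadPathValidation` and `validateInputPath` are likewise assumptions made for illustration; this is not the code path Spark executes.

```scala
import java.nio.file.{Files, Paths}

object LoadPathValidation {
  // Hypothetical stand-in for the AnalysisException thrown in the commit.
  final class InputPathNotFound(message: String) extends Exception(message)

  // Fail fast when the source path is missing and return the path otherwise.
  // Files.exists is a local-filesystem stand-in; the commit itself calls
  // Hadoop's FileSystem.exists on the resolved HDFS URI.
  def validateInputPath(path: String): String = {
    if (!Files.exists(Paths.get(path))) {
      throw new InputPathNotFound(s"LOAD DATA input path does not exist: $path")
    }
    path
  }
}
```

Keeping the message prefix stable matters here because the test asserts on the text `LOAD DATA input path does not exist` rather than on the full message, which also contains the user-supplied path.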
