Commit faedcd9
[SPARK-41970] Introduce SparkPath for typesafety
### What changes were proposed in this pull request?
This PR proposes a strongly typed `SparkPath` that encapsulates a URL-encoded string. It has helper methods for creating Hadoop paths, URIs, and URI-encoded strings.
The intent is to identify and fix various bugs in the way that Spark handles these paths. To do this we introduced the `SparkPath` type to `PartitionedFile` (a widely used class), and then worked through the resulting compile errors. In doing so we fixed various bugs.
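As a rough illustration of the idea, the sketch below wraps a URL-encoded string in its own type so it cannot be mistaken for an unencoded path string. It is not the `SparkPath` class from this commit: the name `EncodedPath` and all of its methods are hypothetical, and it uses only `java.net.URI` so that it runs without Hadoop on the classpath.

```scala
import java.net.URI

// Hypothetical wrapper for illustration only; not the SparkPath added by this commit.
// The private constructor forces callers to say which kind of input they have.
final class EncodedPath private (val urlEncoded: String) {
  // The stored string is already encoded, so the single-argument URI constructor is safe.
  def toUri: URI = new URI(urlEncoded)

  // Decoded path component, e.g. with "%20" turned back into a space.
  def decodedPath: String = toUri.getPath

  override def toString: String = urlEncoded
}

object EncodedPath {
  // Input is assumed to be URL-encoded already.
  def fromUrlString(encoded: String): EncodedPath = new EncodedPath(encoded)

  def fromUri(uri: URI): EncodedPath = new EncodedPath(uri.toString)

  // Input path is assumed to be unencoded; the multi-argument URI
  // constructor percent-encodes characters that are illegal in a path.
  def fromParts(scheme: String, authority: String, rawPath: String): EncodedPath =
    fromUri(new URI(scheme, authority, rawPath, null, null))
}
```

With a type like this in a method signature, a caller holding a plain `String` has to pick a named constructor, which makes the encoded-or-not decision explicit at the call site.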
### Why are the changes needed?
Given `val str = "s3://bucket/path with space/a"`, there is a difference between `new Path(str)` and `new Path(new URI(str))`, and thus a difference between `new URI(str)` and `new Path(str).toUri`.
Both `URI` and `Path` are symmetric in construction and `toString`, but are not interchangeable. Spark confuses these two kinds of path string (URI-encoded vs. not). This PR attempts to use types to disambiguate them.
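The encoded-versus-unencoded distinction can be reproduced with `java.net.URI` alone. The sketch below is illustrative and does not use Hadoop's `Path`; the object name `PathEncodingDemo` is made up for this example. It shows that the single-argument `URI` constructor expects an already-encoded string, that the multi-argument constructor encodes for you, and that feeding an encoded string into the encoding constructor double-encodes it.

```scala
import java.net.URI
import scala.util.Try

// Illustrative only: uses java.net.URI, not Hadoop's Path.
object PathEncodingDemo {
  val raw = "s3://bucket/path with space/a"

  // The single-argument constructor parses its input as an already-encoded URI,
  // so a literal space makes it throw URISyntaxException.
  val parsedRaw: Try[URI] = Try(new URI(raw))

  // The multi-argument constructor treats the path as unencoded and
  // percent-encodes illegal characters such as spaces.
  val encoded: URI = new URI("s3", "bucket", "/path with space/a", null, null)

  // Mixing the two up: passing an encoded path where an unencoded one
  // is expected encodes the '%' again.
  val doubleEncoded: URI = new URI("s3", "bucket", encoded.getRawPath, null, null)

  def main(args: Array[String]): Unit = {
    println(parsedRaw.isFailure)
    println(encoded)
    println(encoded.getPath)
    println(doubleEncoded)
  }
}
```

The last value is the kind of bug a dedicated path type is meant to rule out: both strings look like paths, but only one of them round-trips.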
### Does this PR introduce _any_ user-facing change?
This PR proposes changing the public API of `PartitionedFile` and various other methods in the name of type safety. It needs to be clear to callers of an API what type of path string is expected.
### How was this patch tested?
We rely on existing tests, and update the default temp path creation to include paths with spaces.
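A minimal sketch of why a temp path containing a space is a useful test fixture, assuming a filesystem that allows spaces in directory names. It uses only `java.nio.file`, not Spark's own test utilities, and the object name and prefix are hypothetical.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper for illustration; Spark's actual temp-path utilities are not shown here.
object TempPathWithSpace {
  // The prefix contains a space, so the string form and the URI form of the path differ.
  def create(): Path = Files.createTempDirectory("spark test ")

  def main(args: Array[String]): Unit = {
    val dir = create()
    println(dir)        // plain path string, contains a literal space
    println(dir.toUri)  // URI form, with the space percent-encoded
    Files.delete(dir)
  }
}
```

Any code path that silently treats one form as the other will fail against such a directory, whereas a path without special characters would hide the confusion.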
Closes #39488 from databricks-david-lewis/SPARK_PATH.
Authored-by: David Lewis <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent 498b3ec commit faedcd9
File tree
42 files changed, +216 -133 lines changed
- connector/avro/src
- main/scala/org/apache/spark/sql
- avro
- v2/avro
- test/scala/org/apache/spark/sql/avro
- core/src/main/scala/org/apache/spark
- deploy/worker
- paths
- rpc
- mllib/src/main/scala/org/apache/spark/ml/source/image
- sql
- core/src
- main/scala/org/apache/spark/sql
- execution
- datasources
- binaryfile
- csv
- json
- orc
- parquet
- v2
- csv
- orc
- parquet
- streaming
- test/scala/org/apache/spark/sql
- execution/datasources
- binaryfile
- streaming
- hive/src/main/scala/org/apache/spark/sql/hive/orc