
Commit 672a9b7

Updates Kotlin to 1.5.21
1 parent a888ea4 commit 672a9b7

File tree: 2 files changed (+106, -76 lines)


README.md

Lines changed: 105 additions & 75 deletions
@@ -1,12 +1,15 @@
# Kotlin for Apache® Spark™ [![Maven Central](https://img.shields.io/maven-central/v/org.jetbrains.kotlinx.spark/kotlin-spark-api-parent.svg?label=Maven%20Central)](https://search.maven.org/search?q=g:org.jetbrains.kotlinx.spark%20AND%20v:1.0.1) [![official JetBrains project](http://jb.gg/badges/incubator.svg)](https://confluence.jetbrains.com/display/ALL/JetBrains+on+GitHub)

Your next API to work with [Apache Spark](https://spark.apache.org/).

This project adds a missing layer of compatibility between [Kotlin](https://kotlinlang.org/) and [Apache Spark](https://spark.apache.org/). It allows Kotlin developers to use familiar language features such as data classes and lambda expressions as simple expressions in curly braces or method references.

We have opened a Spark Project Improvement Proposal: [Kotlin support for Apache Spark](http://issues.apache.org/jira/browse/SPARK-32530#) to work with the community towards getting Kotlin support as a first-class citizen in Apache Spark. We encourage you to voice your opinions and participate in the discussion.

## Table of Contents

@@ -29,139 +32,153 @@ We have opened a Spark Project Improvement Proposal: [Kotlin support for Apache
## Supported versions of Apache Spark

| Apache Spark | Scala | Kotlin for Apache Spark         |
|:------------:|:-----:|:-------------------------------:|
| 3.0.0+       | 2.12  | kotlin-spark-api-3.0.0:1.0.1    |
| 2.4.1+       | 2.12  | kotlin-spark-api-2.4_2.12:1.0.1 |
| 2.4.1+       | 2.11  | kotlin-spark-api-2.4_2.11:1.0.1 |

## Releases

The list of Kotlin for Apache Spark releases is available [here](https://github.com/JetBrains/kotlin-spark-api/releases/). The Kotlin for Spark artifacts adhere to the following convention:
`[Apache Spark version]_[Scala core version]:[Kotlin for Apache Spark API version]`

[![Maven Central](https://img.shields.io/maven-central/v/org.jetbrains.kotlinx.spark/kotlin-spark-api-parent.svg?label=Maven%20Central)](https://search.maven.org/search?q=g:%22org.jetbrains.kotlinx.spark%22%20AND%20a:%22kotlin-spark-api-3.0.0_2.12%22)

## How to configure Kotlin for Apache Spark in your project

You can add Kotlin for Apache Spark as a dependency to your project: `Maven`, `Gradle`, `SBT`, and `Leiningen` are supported.

Here's an example `pom.xml`:

```xml
<dependency>
    <groupId>org.jetbrains.kotlinx.spark</groupId>
    <artifactId>kotlin-spark-api-3.0.0</artifactId>
    <version>${kotlin-spark-api.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>${spark.version}</version>
</dependency>
```

Note that `core` is being compiled against Scala version `2.12`.
You can find a complete example with `pom.xml` and `build.gradle` in the [Quick Start Guide](https://github.com/JetBrains/kotlin-spark-api/wiki/Quick-Start-Guide).
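
For Gradle, here is a minimal sketch of an equivalent `build.gradle.kts` dependencies block, assuming the same coordinates as the Maven example above; the version values are taken from the compatibility table, not from the linked guide:

```kotlin
// build.gradle.kts: a sketch, not copied from the Quick Start Guide.
val kotlinSparkApiVersion = "1.0.1" // from the compatibility table above
val sparkVersion = "3.0.0"          // any 3.0.0+ release should match

dependencies {
    implementation("org.jetbrains.kotlinx.spark:kotlin-spark-api-3.0.0:$kotlinSparkApiVersion")
    implementation("org.apache.spark:spark-sql_2.12:$sparkVersion")
}
```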

Once you have configured the dependency, you only need to add the following import to your Kotlin file:

```kotlin
import org.jetbrains.kotlinx.spark.api.*
```

## Kotlin for Apache Spark features

### Creating a SparkSession in Kotlin

```kotlin
val spark = SparkSession
    .builder()
    .master("local[2]")
    .appName("Simple Application").orCreate
```

### Creating a Dataset in Kotlin

```kotlin
spark.toDS("a" to 1, "b" to 2)
```

The example above produces `Dataset<Pair<String, Int>>`.
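
The same works for data classes; here is a small sketch (the `Person` class is a hypothetical example, not part of the API):

```kotlin
// Person is a hypothetical example class used only for illustration.
data class Person(val name: String, val age: Int)

val people = spark.toDS(Person("Jane", 29), Person("John", 31)) // Dataset<Person>
```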

### Null safety

There are several aliases in the API, such as `leftJoin`, `rightJoin`, etc. These are null-safe by design. For example, `leftJoin` is aware of nullability and returns `Dataset<Pair<LEFT, RIGHT?>>`. Note that we force `RIGHT` to be nullable so that you, as a developer, can handle this situation. `NullPointerException`s are hard to debug in Spark, and we're doing our best to make them as rare as possible.
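
As a sketch of what this buys you in practice (the data classes are hypothetical, and the exact `leftJoin` arguments shown here are an assumption rather than a verified signature):

```kotlin
// Hypothetical data classes for illustration only.
data class Customer(val id: Int, val name: String)
data class Order(val customerId: Int, val total: Double)

withSpark {
    val customers = dsOf(Customer(1, "Jane"), Customer(2, "John"))
    val orders = dsOf(Order(1, 9.99))

    // RIGHT is forced to be nullable: customer 2 has no orders,
    // so it comes back paired with null instead of failing later.
    val joined: Dataset<Pair<Customer, Order?>> =
        customers.leftJoin(orders, customers.col("id") eq orders.col("customerId"))

    joined.show()
}
```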

### `withSpark` function

We provide you with the useful function `withSpark`, which accepts everything that may be needed to run Spark: properties, name, master location, and so on. It also accepts a block of code to execute inside the Spark context.

After the work block ends, `spark.stop()` is called automatically.

```kotlin
withSpark {
    dsOf(1, 2)
        .map { it to it }
        .show()
}
```

`dsOf` is just one more way to create a `Dataset` (`Dataset<Int>`) from varargs.
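
Here is a sketch of passing configuration to `withSpark`; the parameter names (`props`, `master`, `appName`) are assumptions based on the description above, not a verified signature:

```kotlin
// Parameter names here are assumptions, not a verified signature.
withSpark(
    props = mapOf("spark.sql.shuffle.partitions" to 4),
    master = "local[2]",
    appName = "Simple Application"
) {
    dsOf(1, 2).show()
}
```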

### `withCached` function

It can easily happen that we need to fork our computation into several paths. To compute things only once, we should call the `cache` method. However, it becomes difficult to control when we're using the cached `Dataset` and when not. It is also easy to forget to unpersist cached data, which can break things unexpectedly or take up more memory than intended.

To solve these problems, we've added the `withCached` function:

```kotlin
withSpark {
    dsOf(1, 2, 3, 4, 5)
        .map { it to (it + 2) }
        .withCached {
            showDS()

            filter { it.first % 2 == 0 }.showDS()
        }
        .map { c(it.first, it.second, (it.first + it.second) * 2) }
        .show()
}
```

Here we're showing the cached `Dataset` for debugging purposes, then filtering it. The `filter` method returns the filtered `Dataset`, and then the cached `Dataset` is unpersisted, so we have more memory to call the `map` method and collect the resulting `Dataset`.

### `toList` and `toArray` methods

For more idiomatic Kotlin code we've added `toList` and `toArray` methods to this API. You can still use the `collect` method as in the Scala API; however, the result should be cast to `Array`. This is because `collect` returns a Scala array, which is not the same as a Java/Kotlin one.
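
A minimal sketch of the difference, assuming both extensions mirror `collect` but return Kotlin-friendly types:

```kotlin
withSpark {
    val ds = dsOf(1, 2, 3)

    val asList: List<Int> = ds.toList()    // idiomatic Kotlin List
    val asArray: Array<Int> = ds.toArray() // a Java/Kotlin Array, not a Scala one
}
```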

### Column infix/operator functions

Similar to the Scala API for `Columns`, many of the operator functions could be ported over. For example:

```kotlin
dataset.select(col("colA") + 5)
dataset.select(col("colA") / col("colB"))

dataset.where(col("colA") `===` 6)
// or alternatively
dataset.where(col("colA") eq 6)
```

In short, all supported operators are:

- `==`,
- `!=`,
- `eq` / `` `===` ``,
- `neq` / `` `=!=` ``,
- `-col(...)`,
- `!col(...)`,
- `gt`,
- `lt`,
- `geq`,
- `leq`,
- `or`,
- `and` / `` `&&` ``,

@@ -173,43 +190,56 @@ In short, all supported operators are:

Secondly, there are some quality-of-life additions as well:

In Kotlin, Ranges are often used to solve inclusive/exclusive situations for a range. So, you can now do:

```kotlin
dataset.where(col("colA") inRangeOf 0..2)
```

Also, for columns containing map- or array-like types:

```kotlin
dataset.where(col("colB")[0] geq 5)
```
Finally, thanks to Kotlin reflection, we can provide a type- and refactor-safe way to create `TypedColumn`s, and with those, a new `Dataset` from pieces of another using the `selectTyped()` function, added to the API:

```kotlin
val dataset: Dataset<YourClass> = ...
val newDataset: Dataset<Pair<TypeA, TypeB>> = dataset.selectTyped(col(YourClass::colA), col(YourClass::colB))
```

### `reduceGroups`

We had to implement the `reduceGroups` operator for Kotlin separately as the `reduceGroupsK` function, because otherwise it caused resolution ambiguity between the Kotlin, Scala, and Java APIs, which was quite hard to solve.

We have a special example of working with this function in the [Groups example](https://github.com/JetBrains/kotlin-spark-api/edit/main/examples/src/main/kotlin/org/jetbrains/kotlinx/spark/examples/Group.kt).
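
A small sketch of the idea; only the `reduceGroupsK` name comes from this section, while the grouping call and types are illustrative assumptions (see the linked Groups example for the real usage):

```kotlin
withSpark {
    // Group pairs by their first element, then reduce each group by
    // summing the second elements. reduceGroupsK stands in for Spark's
    // reduceGroups, renamed to avoid the resolution ambiguity.
    dsOf(1 to 2, 1 to 3, 2 to 4)
        .groupByKey { it.first }
        .reduceGroupsK { a, b -> a.first to (a.second + b.second) }
        .show()
}
```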

## Examples

For more, check out the [examples](https://github.com/JetBrains/kotlin-spark-api/tree/master/examples/src/main/kotlin/org/jetbrains/kotlinx/spark/examples) module. To get up and running quickly, check out this [tutorial](https://github.com/JetBrains/kotlin-spark-api/wiki/Quick-Start-Guide).

## Reporting issues/Support

Please use [GitHub issues](https://github.com/JetBrains/kotlin-spark-api/issues) for filing feature requests and bug reports. You are also welcome to join the [kotlin-spark channel](https://kotlinlang.slack.com/archives/C015B9ZRGJF) in the Kotlin Slack.

## Code of Conduct

This project and the corresponding community are governed by the [JetBrains Open Source and Community Code of Conduct](https://confluence.jetbrains.com/display/ALL/JetBrains+Open+Source+and+Community+Code+of+Conduct). Please make sure you read it.

## License

Kotlin for Apache Spark is licensed under the [Apache 2.0 License](LICENSE).

pom.xml

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
     <packaging>pom</packaging>

     <properties>
-        <kotlin.version>1.5.20</kotlin.version>
+        <kotlin.version>1.5.21</kotlin.version>
         <dokka.version>1.4.32</dokka.version>
         <atrium.version>0.16.0</atrium.version>
         <kotest.version>4.6.0</kotest.version>
