# Kotlin for Apache® Spark™ [](https://search.maven.org/search?q=g:org.jetbrains.kotlinx.spark%20AND%20v:1.1.0)[](https://confluence.jetbrains.com/display/ALL/JetBrains+on+GitHub)[](https://gitter.im/JetBrains/kotlin-spark-api?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
Your next API to work with [Apache Spark](https://spark.apache.org/).
## How to configure Kotlin for Apache Spark in your project
Here's an example `pom.xml`:
```xml
<dependency>
    <groupId>org.jetbrains.kotlinx.spark</groupId>
    <artifactId>kotlin-spark-api-3.1</artifactId>
    <version>${kotlin-spark-api.version}</version>
</dependency>
```
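For Gradle users, the equivalent coordinates should look roughly like the sketch below (Gradle Kotlin DSL; the version value assumes the 1.1.0 release referenced in the badge above):

```kotlin
// build.gradle.kts — a sketch; substitute the version you need.
dependencies {
    implementation("org.jetbrains.kotlinx.spark:kotlin-spark-api-3.1:1.1.0")
}
```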
The Kotlin Spark API also supports Kotlin Jupyter notebooks.
To use it, simply add
```jupyterpython
%use spark
```
to the top of your notebook. This will get the latest version of the API, together with the latest version of Spark.
To define a certain version of Spark or the API itself, simply add it like this:
```jupyterpython
%use spark(spark=3.2, v=1.1.0)
```
Inside the notebook a Spark session will be initiated automatically. This can be accessed via the `spark` value.
`sc: JavaSparkContext` can also be accessed directly. The API operates pretty similarly.
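As a quick illustration, both handles can be used right away in a cell. This is a sketch with arbitrary values; `spark` and `sc` are the values described above:

```kotlin
// A notebook cell; `spark` and `sc` are provided automatically by %use spark.
val ds = spark.dsOf(1, 2, 3)               // create a Dataset from the implicit session
ds.show()

val rdd = sc.parallelize(listOf(4, 5, 6))  // use the JavaSparkContext directly
println(rdd.count())                       // prints 3
```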
There is also support for HTML rendering of Datasets and simple (Java)RDDs.
Check out the [example](examples/src/main/kotlin/org/jetbrains/kotlinx/spark/examples/JupyterExample.ipynb) as well.
To use Spark Streaming abilities, instead use
```jupyterpython
%use spark-streaming
```
This does not start a Spark session right away, meaning you can call `withSparkStreaming(batchDuration) {}` in whichever cell you want.
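For example, a streaming cell could look roughly like this. This is a minimal sketch that assumes a text source on `localhost:9999` and that the block receiver exposes the streaming context as `ssc`:

```kotlin
// A sketch of a streaming cell; imports are handled by %use spark-streaming.
withSparkStreaming(batchDuration = Durations.seconds(1)) {
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print() // print the number of lines received per batch
}
```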
Check out the [example](examples/src/main/kotlin/org/jetbrains/kotlinx/spark/examples/streaming/JupyterStreamingExample.ipynb).
## Kotlin for Apache Spark features
```kotlin
spark.dsOf("a" to 1, "b" to 2)
```
The example above produces `Dataset<Pair<String, Int>>`. While Kotlin Pairs and Triples are supported, Scala Tuples are recommended for better support.
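For example, a Dataset of Scala tuples can be built with the `X` infix helper, the same one used in the `withSpark` example further down:

```kotlin
// Produces Dataset<Tuple2<String, Int>> instead of Dataset<Pair<String, Int>>.
spark.dsOf("a" X 1, "b" X 2)
```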
### Null safety
There are several aliases in the API, like `leftJoin`, `rightJoin`, etc. These are null-safe by design.
For example, `leftJoin` is aware of nullability and returns `Dataset<Pair<LEFT, RIGHT?>>`.
Note that we force `RIGHT` to be nullable so that you, as a developer, can handle this situation.
`NullPointerException`s are hard to debug in Spark, and we're doing our best to make them as rare as possible.
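Here's a small sketch of how that can look in practice. The join-condition form and the `first`/`second` column names for encoded Kotlin Pairs are assumptions for illustration:

```kotlin
// A sketch of a null-safe left join; data and column names are illustrative.
withSpark {
    val left = dsOf(1 to "one", 2 to "two")
    val right = dsOf(1 to "uno")
    left.leftJoin(right, left.col("first").equalTo(right.col("first")))
        .collectAsList() // List<Pair<Pair<Int, String>, Pair<Int, String>?>>
        .forEach { (l, r) -> println("$l -> ${r ?: "no match"}") }
}
```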
In Spark, you might also come across Scala-native `Option<*>` or Java-compatible `Optional<*>` classes.
We provide `getOrNull()` and `getOrElse()` functions for these so you can put Kotlin's null safety to good use.
Similarly, you can also create `Option<*>`s and `Optional<*>`s like `T?.toOptional()` if a Spark function requires it.
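A quick sketch of these helpers with illustrative values (only the functions named above are used; exact signatures are simplified here):

```kotlin
import java.util.Optional

// Wrap a value when a Spark API wants an Optional, and unwrap it back
// into Kotlin's null safety with getOrNull().
val present: Optional<Int> = 42.toOptional()
val value: Int? = present.getOrNull()  // 42

val absent: Optional<Int> = Optional.empty()
val fallback: Int? = absent.getOrNull() // null instead of an exception
```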
### withSpark function
We provide you with the useful function `withSpark`, which accepts everything that may be needed to run Spark: properties, name, master location, and so on. It also accepts a block of code to execute inside the Spark context.
Do not use this when running the Kotlin Spark API from a Jupyter notebook.
```kotlin
withSpark {
    dsOf(1, 2)
        .map { it X it } // creates Tuple2<Int, Int>
        .show()
}
```