Commit fa757ee
[SPARK-20883][SPARK-20376][SS] Refactored StateStore APIs and added conf to choose implementation
## What changes were proposed in this pull request?
A bunch of changes to the StateStore APIs and implementation.
Current state store API has a bunch of problems that causes too many transient objects causing memory pressure.
- `StateStore.get(): Option` forces creation of Some/None objects for every get. Changed this to return the row or null.
- `StateStore.iterator(): (UnsafeRow, UnsafeRow)` forces creation of new tuple for each record returned. Changed this to return a UnsafeRowTuple which can be reused across records.
- `StateStore.updates()` requires the implementation to keep track of updates, while this is used minimally (only by Append mode in streaming aggregations). Removed updates() and updated StateStoreSaveExec accordingly.
- `StateStore.filter(condition)` and `StateStore.remove(condition)` has been merge into a single API `getRange(start, end)` which allows a state store to do optimized range queries (i.e. avoid full scans). Stateful operators have been updated accordingly.
- Removed a lot of unnecessary row copies Each operator copied rows before calling StateStore.put() even if the implementation does not require it to be copied. It is left up to the implementation on whether to copy the row or not.
Additionally,
- Added a name to the StateStoreId so that each operator+partition can use multiple state stores (different names)
- Added a configuration that allows the user to specify which implementation to use.
- Added new metrics to understand the time taken to update keys, remove keys and commit all changes to the state store. These metrics will be visible on the plan diagram in the SQL tab of the UI.
- Refactored unit tests such that they can be reused to test any implementation of StateStore.
## How was this patch tested?
Old and new unit tests
Author: Tathagata Das <[email protected]>
Closes #18107 from tdas/SPARK-20376.1 parent 4bb6a53 commit fa757ee
File tree
12 files changed
+695
-588
lines changed- sql
- catalyst/src/main/scala/org/apache/spark/sql/internal
- core/src
- main/scala/org/apache/spark/sql/execution/streaming
- state
- test/scala/org/apache/spark/sql
- execution/streaming/state
- streaming
12 files changed
+695
-588
lines changedLines changed: 11 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
552 | 552 | | |
553 | 553 | | |
554 | 554 | | |
| 555 | + | |
| 556 | + | |
| 557 | + | |
| 558 | + | |
| 559 | + | |
| 560 | + | |
| 561 | + | |
| 562 | + | |
| 563 | + | |
555 | 564 | | |
556 | 565 | | |
557 | 566 | | |
| |||
828 | 837 | | |
829 | 838 | | |
830 | 839 | | |
| 840 | + | |
| 841 | + | |
831 | 842 | | |
832 | 843 | | |
833 | 844 | | |
| |||
Lines changed: 23 additions & 16 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
109 | 109 | | |
110 | 110 | | |
111 | 111 | | |
| 112 | + | |
112 | 113 | | |
113 | 114 | | |
114 | 115 | | |
| 116 | + | |
115 | 117 | | |
116 | 118 | | |
117 | 119 | | |
| |||
191 | 193 | | |
192 | 194 | | |
193 | 195 | | |
194 | | - | |
195 | | - | |
| 196 | + | |
| 197 | + | |
196 | 198 | | |
197 | 199 | | |
198 | | - | |
199 | | - | |
| 200 | + | |
| 201 | + | |
200 | 202 | | |
201 | 203 | | |
202 | 204 | | |
| |||
205 | 207 | | |
206 | 208 | | |
207 | 209 | | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
208 | 215 | | |
209 | 216 | | |
210 | 217 | | |
211 | 218 | | |
212 | | - | |
| 219 | + | |
213 | 220 | | |
214 | 221 | | |
215 | 222 | | |
216 | 223 | | |
217 | | - | |
| 224 | + | |
218 | 225 | | |
219 | | - | |
| 226 | + | |
220 | 227 | | |
221 | 228 | | |
222 | 229 | | |
| |||
249 | 256 | | |
250 | 257 | | |
251 | 258 | | |
252 | | - | |
253 | | - | |
254 | | - | |
255 | | - | |
| 259 | + | |
256 | 260 | | |
257 | 261 | | |
258 | 262 | | |
259 | | - | |
| 263 | + | |
260 | 264 | | |
261 | 265 | | |
262 | 266 | | |
| |||
269 | 273 | | |
270 | 274 | | |
271 | 275 | | |
272 | | - | |
| 276 | + | |
273 | 277 | | |
274 | 278 | | |
275 | 279 | | |
| |||
280 | 284 | | |
281 | 285 | | |
282 | 286 | | |
283 | | - | |
284 | | - | |
| 287 | + | |
| 288 | + | |
285 | 289 | | |
286 | 290 | | |
287 | 291 | | |
288 | 292 | | |
| 293 | + | |
289 | 294 | | |
290 | 295 | | |
291 | 296 | | |
292 | 297 | | |
293 | 298 | | |
294 | | - | |
| 299 | + | |
| 300 | + | |
| 301 | + | |
295 | 302 | | |
296 | 303 | | |
297 | 304 | | |
| |||
0 commit comments