Skip to content

Commit 45a1935

Browse files
anishshri-dbHeartSaVioR
authored andcommitted
[SPARK-43364][SS][DOCS] Add docs for RocksDB state store memory management
### What changes were proposed in this pull request? Add docs for RocksDB state store memory management ### Why are the changes needed? Docs only change ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? N/A Closes #41042 from anishshri-db/task/SPARK-43364. Authored-by: Anish Shrigondekar <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]>
1 parent 63cb093 commit 45a1935

File tree

1 file changed

+27
-0
lines changed

1 file changed

+27
-0
lines changed

docs/structured-streaming-programming-guide.md

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2360,8 +2360,35 @@ Here are the configs regarding to RocksDB instance of the state store provider:
23602360
<td>The maximum number of MemTables in RocksDB, both active and immutable. Value of -1 means that RocksDB internal default values will be used</td>
23612361
<td>-1</td>
23622362
</tr>
2363+
<tr>
2364+
<td>spark.sql.streaming.stateStore.rocksdb.boundedMemoryUsage</td>
2365+
<td>Whether total memory usage for RocksDB state store instances on a single node is bounded.</td>
2366+
<td>false</td>
2367+
</tr>
2368+
<tr>
2369+
<td>spark.sql.streaming.stateStore.rocksdb.maxMemoryUsageMB</td>
2370+
<td>Total memory limit in MB for RocksDB state store instances on a single node.</td>
2371+
<td>500</td>
2372+
</tr>
2373+
<tr>
2374+
<td>spark.sql.streaming.stateStore.rocksdb.writeBufferCacheRatio</td>
2375+
<td>Total memory to be occupied by write buffers as a fraction of memory allocated across all RocksDB instances on a single node using maxMemoryUsageMB.</td>
2376+
<td>0.5</td>
2377+
</tr>
2378+
<tr>
2379+
<td>spark.sql.streaming.stateStore.rocksdb.highPriorityPoolRatio</td>
2380+
<td>Total memory to be occupied by blocks in high priority pool as a fraction of memory allocated across all RocksDB instances on a single node using maxMemoryUsageMB.</td>
2381+
<td>0.1</td>
2382+
</tr>
23632383
</table>
23642384

2385+
##### RocksDB State Store Memory Management
2386+
RocksDB allocates memory for different objects such as memtables, block cache and filter/index blocks. If left unbounded, RocksDB memory usage across multiple instances could grow indefinitely and potentially cause OOM (out-of-memory) issues.
2387+
RocksDB provides a way to limit the memory usage for all DB instances running on a single node by using the write buffer manager functionality.
2388+
If you want to cap RocksDB memory usage in your Spark Structured Streaming deployment, this feature can be enabled by setting the `spark.sql.streaming.stateStore.rocksdb.boundedMemoryUsage` config to `true`.
2389+
You can also determine the max allowed memory for RocksDB instances by setting the `spark.sql.streaming.stateStore.rocksdb.maxMemoryUsageMB` value to a static number or as a fraction of the physical memory available on the node.
2390+
Limits for individual RocksDB instances can also be configured by setting `spark.sql.streaming.stateStore.rocksdb.writeBufferSizeMB` and `spark.sql.streaming.stateStore.rocksdb.maxWriteBufferNumber` to the required values. By default, RocksDB internal defaults are used for these settings.
2391+
23652392
##### Performance-aspect considerations
23662393

23672394
1. You may want to disable the track of total number of rows to aim the better performance on RocksDB state store.

0 commit comments

Comments
 (0)