@@ -12,15 +12,15 @@ Read Configuration Options
1212
1313.. _spark-input-conf:
1414
15- Input Configuration
16- --------------------
15+ Read Configuration
16+ ------------------
1717
1818You can configure the following properties to read from MongoDB:
1919
2020.. note::
2121
22- If you use ``SparkConf`` to set the connector's input configurations,
23-   prefix ``spark.mongodb.input.`` to each property.
22+ If you use ``SparkConf`` to set the connector's read configurations,
23+   prefix ``spark.mongodb.read.`` to each property.
2424
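The prefixing rule in the note above can be sketched as a small helper. This is illustrative only and not part of the connector's API; the ``spark.mongodb.read.`` prefix comes from the note, while ``to_spark_conf`` is a hypothetical name:

```python
# Illustrative helper (not part of the connector): shows how each read
# option name maps to a SparkConf property name once the
# "spark.mongodb.read." prefix from the note above is applied.
READ_PREFIX = "spark.mongodb.read."

def to_spark_conf(options):
    """Prefix each connector read option for use with SparkConf."""
    return {READ_PREFIX + name: value for name, value in options.items()}

conf = to_spark_conf({"database": "test", "collection": "myCollection"})
print(conf["spark.mongodb.read.database"])  # test
```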
2525.. list-table::
2626 :header-rows: 1
@@ -36,7 +36,7 @@ You can configure the following properties to read from MongoDB:
3636 address, or UNIX domain socket. If the connection string doesn't
3737 specify a ``port``, it uses the default MongoDB port, ``27017``.
3838
39-       You can append the remaining input options to the ``uri``
39+       You can append the remaining read options to the ``uri``
4040 setting. See :ref:`configure-input-uri`.
4141
4242 * - ``database``
@@ -80,58 +80,45 @@ You can configure the following properties to read from MongoDB:
8080
8181 The connector provides the following partitioners:
8282
83- - ``MongoDefaultPartitioner``
84- **Default**. Wraps the MongoSamplePartitioner and provides
85- help for users of older versions of MongoDB.
86-
87- - ``MongoSamplePartitioner``
88- **Requires MongoDB 3.2**. A general purpose partitioner for
83+ - ``SamplePartitioner``
84+ A general purpose partitioner for
8985 all deployments. Uses the average document size and random
9086 sampling of the collection to determine suitable
91- partitions for the collection. For configuration
92- settings for the MongoSamplePartitioner, see
93-         :ref:`conf-mongosamplepartitioner`.
87+ partitions for the collection. To learn more about the
88+ ``SamplePartitioner``, see the
89+         :ref:`configuration settings description <conf-samplepartitioner>`.
9490
95-       - ``MongoShardedPartitioner``
91+       - ``ShardedPartitioner``
9692 A partitioner for sharded clusters. Partitions the
9793 collection based on the data chunks. Requires read access
98- to the ``config`` database. For configuration settings for
99- the MongoShardedPartitioner, see
100- :ref:`conf-mongoshardedpartitioner`.
101-
102- - ``MongoSplitVectorPartitioner``
103- A partitioner for standalone or replica sets. Uses the
104- :dbcommand:`splitVector` command on the standalone or the
105- primary to determine the partitions of the database.
106- Requires privileges to run :dbcommand:`splitVector`
107- command. For configuration settings for the
108- MongoSplitVectorPartitioner, see
109- :ref:`conf-mongosplitvectorpartitioner`.
110-
111- - ``MongoPaginateByCountPartitioner``
112- A slow, general purpose partitioner for all deployments.
113- Creates a specific number of partitions. Requires a query
114- for every partition. For configuration settings for the
115- MongoPaginateByCountPartitioner, see
116- :ref:`conf-mongopaginatebycountpartitioner`.
94+ to the ``config`` database. To learn more about the
95+ ``ShardedPartitioner``, see the
96+ :ref:`configuration settings description <conf-shardedpartitioner>`.
11797
118-       - ``MongoPaginateBySizePartitioner``
98+       - ``PaginateBySizePartitioner``
11999 A slow, general purpose partitioner for all deployments.
120100 Creates partitions based on data size. Requires a query
121- for every partition. For configuration settings for the
122- MongoPaginateBySizePartitioner, see
123- :ref:`conf-mongopaginatebysizepartitioner`.
101+ for every partition. To learn more about the
102+ ``PaginateBySizePartitioner``, see the
103+ :ref:`configuration settings description <conf-paginatebysizepartitioner>`.
104+
105+ - ``PaginateIntoPartitionsPartitioner``
106+ A partitioner that creates a specified number of
107+ partitions for a collection. Uses the total number of
108+ documents and the average document size in a collection to
109+ calculate the number of documents to include in each
110+ partition. To learn more about the
111+ ``PaginateIntoPartitionsPartitioner``, see the
112+ :ref:`configuration settings description <conf-paginateintopartitionspartitioner>`.
124113
125114 You can also specify a custom partitioner implementation. For
126- custom implementations of the ``MongoPartitioner`` trait, provide
127- the full class name. If you don't provide package names, then
128- this property uses the default package,
129- ``com.mongodb.spark.rdd.partitioner``.
115+ custom implementations of the ``Partitioner`` interface, provide
116+ the full class name.
130117
131118 To configure options for the various partitioners, see
132119 :ref:`partitioner-conf`.
133120
134-     **Default:** ``MongoDefaultPartitioner``
121+     **Default:** ``SamplePartitioner``
135122
136123 * - ``partitionerOptions``
137124 - The custom options to configure the partitioner.
@@ -182,9 +169,10 @@ Partitioner Configuration
182169~~~~~~~~~~~~~~~~~~~~~~~~~
183170
184171.. _conf-mongosamplepartitioner:
172+ .. _conf-samplepartitioner:
185173
186- ``MongoSamplePartitioner`` Configuration
187- ````````````````````````````````````````
174+ ``SamplePartitioner`` Configuration
175+ ```````````````````````````````````
188176
189177.. include:: /includes/sparkconf-partitioner-options-note.rst
190178
@@ -195,19 +183,20 @@ Partitioner Configuration
195183 * - Property name
196184 - Description
197185
198- * - ``partitionKey``
199- - The field by which to split the collection data. The field
200- should be indexed and contain unique values.
186+ * - ``partitioner.options.partitionKey``
187+ - A comma-delimited list of fields by which to split the
188+ collection data. The fields should be indexed and contain unique
189+ values.
201190
202191 **Default:** ``_id``
203192
204- * - ``partitionSizeMB``
193+    * - ``partitioner.options.partitionSizeMB``
205194 - The size (in MB) for each partition. Smaller partition sizes
206195 create more partitions containing fewer documents.
207196
208197 **Default:** ``64``
209198
210- * - ``samplesPerPartition``
199+    * - ``partitioner.options.samplesPerPartition``
211200 - The number of sample documents to take for each partition in
212201 order to establish a ``partitionKey`` range for each partition.
213202
@@ -230,41 +219,29 @@ Partitioner Configuration
230219.. example::
231220
232221 For a collection with 640 documents with an average document
233-    size of 0.5 MB, the default ``MongoSamplePartitioner`` configuration
222+    size of 0.5 MB, the default ``SamplePartitioner`` configuration
234223 values creates 5 partitions with 128 documents per partition.
235224
236225 The MongoDB Spark Connector samples 50 documents (the default 10
237226 per intended partition) and defines 5 partitions by selecting
238227 ``partitionKey`` ranges from the sampled documents.
239228
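The arithmetic behind the example above can be checked directly. A minimal sketch, assuming the default ``SamplePartitioner`` settings (``partitionSizeMB=64``, ``samplesPerPartition=10``) and the example's figures:

```python
# Worked arithmetic for the example above, using the default
# SamplePartitioner settings (partitionSizeMB=64, samplesPerPartition=10).
avg_doc_size_mb = 0.5
total_docs = 640
partition_size_mb = 64
samples_per_partition = 10

# 64 MB per partition / 0.5 MB per document = 128 documents per partition.
docs_per_partition = int(partition_size_mb / avg_doc_size_mb)
# 640 documents / 128 per partition = 5 partitions.
num_partitions = total_docs // docs_per_partition
# 10 samples per intended partition * 5 partitions = 50 sampled documents.
total_samples = samples_per_partition * num_partitions

print(docs_per_partition, num_partitions, total_samples)  # 128 5 50
```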
240229.. _conf-mongoshardedpartitioner:
230+ .. _conf-shardedpartitioner:
241231
242- ``MongoShardedPartitioner`` Configuration
243- `````````````````````````````````````````
244-
245- .. include:: /includes/sparkconf-partitioner-options-note.rst
246-
247- .. list-table::
248- :header-rows: 1
249- :widths: 35 65
250-
251- * - Property name
252- - Description
253-
254- * - ``shardkey``
255- - The field by which to split the collection data. The field
256- should be indexed.
257-
258- **Default:** ``_id``
232+ ``ShardedPartitioner`` Configuration
233+ ````````````````````````````````````
259234
260- .. important::
235+ The ``ShardedPartitioner`` automatically determines
236+ partitions to use based on your shard configuration.
261237
262- This property is not compatible with hashed shard keys.
238+ This partitioner is not compatible with hashed shard keys.
263239
264- .. _conf-mongosplitvectorpartitioner:
240+ .. _conf-mongopaginatebysizepartitioner:
241+ .. _conf-paginatebysizepartitioner:
265242
266- ``MongoSplitVectorPartitioner`` Configuration
267- `````````````````````````````````````````````
243+ ``PaginateBySizePartitioner`` Configuration
244+ ```````````````````````````````````````````
268245
269246.. include:: /includes/sparkconf-partitioner-options-note.rst
270247
@@ -275,22 +252,23 @@ Partitioner Configuration
275252 * - Property name
276253 - Description
277254
278- * - ``partitionKey``
279- - The field by which to split the collection data. The field
280- should be indexed and contain unique values.
255+ * - ``partitioner.options.partitionKey``
256+ - A comma-delimited list of fields by which to split the
257+ collection data. The fields should be indexed and contain unique
258+ values.
281259
282260 **Default:** ``_id``
283261
284- * - ``partitionSizeMB``
262+    * - ``partitioner.options.partitionSizeMB``
285263 - The size (in MB) for each partition. Smaller partition sizes
286264 create more partitions containing fewer documents.
287265
288266 **Default:** ``64``
289267
290- .. _conf-mongopaginatebycountpartitioner:
268+ .. _conf-paginateintopartitionspartitioner:
291269
292- ``MongoPaginateByCountPartitioner`` Configuration
293- `````````````````````````````````````````````````
270+ ``PaginateIntoPartitionsPartitioner`` Configuration
271+ ```````````````````````````````````````````````````
294272
295273.. include:: /includes/sparkconf-partitioner-options-note.rst
296274
@@ -301,41 +279,15 @@ Partitioner Configuration
301279 * - Property name
302280 - Description
303281
304- * - ``partitionKey``
305- - The field by which to split the collection data. The field
306- should be indexed and contain unique values.
282+ * - ``partitioner.options.partitionKey``
283+ - A comma-delimited list of fields by which to split the
284+ collection data. The fields should be indexed and contain unique
285+ values.
307286
308287 **Default:** ``_id``
309288
310- * - ``numberOfPartitions``
311- - The number of partitions to create. A greater number of
312- partitions means fewer documents per partition.
313-
314- **Default:** ``64``
315-
316- .. _conf-mongopaginatebysizepartitioner:
317-
318- ``MongoPaginateBySizePartitioner`` Configuration
319- ````````````````````````````````````````````````
320-
321- .. include:: /includes/sparkconf-partitioner-options-note.rst
322-
323- .. list-table::
324- :header-rows: 1
325- :widths: 35 65
326-
327- * - Property name
328- - Description
329-
330- * - ``partitionKey``
331- - The field by which to split the collection data. The field
332- should be indexed and contain unique values.
333-
334- **Default:** ``_id``
335-
336- * - ``partitionSizeMB``
337- - The size (in MB) for each partition. Smaller partition sizes
338- create more partitions containing fewer documents.
289+ * - ``partitioner.options.maxNumberOfPartitions``
290+ - The number of partitions to create.
339291
340292 **Default:** ``64``
341293
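Putting the rows above together, the partitioner and its options might be set via ``SparkConf`` like this. This is an illustrative sketch, not taken verbatim from the page: it assumes the short partitioner name shown in the table's default and applies the ``spark.mongodb.read.`` prefix from the note at the top of this section, with sample option values:

.. code:: cfg

   spark.mongodb.read.partitioner=PaginateIntoPartitionsPartitioner
   spark.mongodb.read.partitioner.options.partitionKey=_id
   spark.mongodb.read.partitioner.options.maxNumberOfPartitions=10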
@@ -344,36 +296,36 @@ Partitioner Configuration
344296``uri`` Configuration Setting
345297-----------------------------
346298
347- You can set all :ref:`spark-input-conf` via the input ``uri`` setting.
299+ You can set all :ref:`spark-input-conf` via the read ``uri`` setting.
348300
349- Consider the following example, which sets the input
301+ Consider the following example, which sets the read
350302``uri`` setting via ``SparkConf``:
351303
352304.. note::
353305
354- If you use ``SparkConf`` to set the connector's input configurations,
355-    prefix ``spark.mongodb.input.`` to the setting.
306+    If you use ``SparkConf`` to set the connector's read configurations,
307+    prefix ``spark.mongodb.read.`` to the setting.
356308
357309.. code:: cfg
358310
359-    spark.mongodb.input.uri=mongodb://127.0.0.1/databaseName.collectionName?readPreference=primaryPreferred
311+    spark.mongodb.read.uri=mongodb://127.0.0.1/databaseName.collectionName?readPreference=primaryPreferred
360312
361313The configuration corresponds to the following separate configuration
362314settings:
363315
364316.. code:: cfg
365317
366-    spark.mongodb.input.uri=mongodb://127.0.0.1/
367-    spark.mongodb.input.database=databaseName
368-    spark.mongodb.input.collection=collectionName
369-    spark.mongodb.input.readPreference.name=primaryPreferred
318+    spark.mongodb.read.uri=mongodb://127.0.0.1/
319+    spark.mongodb.read.database=databaseName
320+    spark.mongodb.read.collection=collectionName
321+    spark.mongodb.read.readPreference.name=primaryPreferred
370322
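The equivalence between the single ``uri`` and the separate settings can be sketched with standard URL parsing. This is illustrative only (the connector performs its own parsing internally) and uses the example URI from above:

```python
from urllib.parse import urlparse, parse_qs

# Illustrative only: decompose the example read URI into the separate
# database, collection, and readPreference settings listed above.
uri = "mongodb://127.0.0.1/databaseName.collectionName?readPreference=primaryPreferred"
parts = urlparse(uri)
# The path holds "databaseName.collectionName"; split on the first dot.
database, _, collection = parts.path.lstrip("/").partition(".")
# Query parameters carry the remaining read options.
read_preference = parse_qs(parts.query)["readPreference"][0]

print(database, collection, read_preference)
# databaseName collectionName primaryPreferred
```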
371323If you specify a setting both in the ``uri`` and in a separate
372324configuration, the ``uri`` setting overrides the separate
373- setting. For example, given the following configuration, the input
325+ setting. For example, given the following configuration, the
374326database for the connection is ``foobar``:
375327
376328.. code:: cfg
377329
378-    spark.mongodb.input.uri=mongodb://127.0.0.1/foobar
379-    spark.mongodb.input.database=bar
330+    spark.mongodb.read.uri=mongodb://127.0.0.1/foobar
331+    spark.mongodb.read.database=bar