@@ -46,6 +46,10 @@ You can configure the following properties when reading data from MongoDB in str
4646 * - ``collection``
4747 - | **Required.**
4848 | The collection name configuration.
49+ | You can specify multiple collections by separating the collection names
50+ with a comma.
51+ |
52+ | To learn more about specifying multiple collections, see :ref:`spark-specify-multiple-collections`.
4953
5054 * - ``comment``
5155 - | The comment to append to the read operation. Comments appear in the
@@ -168,7 +172,7 @@ You can configure the following properties when reading a change stream from Mon
168172 omit the ``fullDocument`` field and publishes only the value of the
169173 field.
170174 - If you don't specify a schema, the connector infers the schema
171- from the change stream document rather than from the underlying collection .
175+ from the change stream document.
172176
173177 **Default**: ``false``
174178
@@ -203,4 +207,91 @@ You can configure the following properties when reading a change stream from Mon
203207Specifying Properties in ``connection.uri``
204208-------------------------------------------
205209
206- .. include:: /includes/connection-read-config.rst
210+ .. include:: /includes/connection-read-config.rst
211+
212+ .. _spark-specify-multiple-collections:
213+
214+ Specifying Multiple Collections in the ``collection`` Property
215+ --------------------------------------------------------------
216+
217+ You can specify multiple collections in the ``collection`` change stream
218+ configuration property by separating the collection names
219+ with a comma. Do not add a space between the collection names unless the
220+ space is part of the collection name.
221+
222+ Specify multiple collections as shown in the following example:
223+
224+ .. code-block:: java
225+
226+ ...
227+ .option("spark.mongodb.collection", "collectionOne,collectionTwo")
228+
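The ``...`` in the example above stands in for the rest of the read configuration. As a fuller sketch, the option might appear in a change stream read such as the following, where the ``SparkSession`` variable ``spark``, the connection URI, and the database and collection names are placeholders:

.. code-block:: java

   // Sketch of a change stream read from two collections; the URI,
   // database, and collection names are placeholders.
   Dataset<Row> df = spark.readStream()
           .format("mongodb")
           .option("spark.mongodb.connection.uri", "<connection-uri>")
           .option("spark.mongodb.database", "<database>")
           .option("spark.mongodb.collection", "collectionOne,collectionTwo")
           .load();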
229+ If a collection name is "*", or if the name contains a comma or a backslash (\\),
230+ you must escape these characters as follows:
231+
232+ - If the name of a collection used in your ``collection`` configuration
233+ option contains a comma, the {+connector-short+} treats the name as two
234+ different collections. To avoid this, you must escape the comma by preceding
235+ it with a backslash (\\). Escape a collection named "my,collection" as follows:
236+
237+ .. code-block:: java
238+
239+ "my\,collection"
240+
241+ - If the name of a collection used in your ``collection`` configuration
242+ option is "*", the {+connector-short+} interprets it as a specification
243+ to scan all collections. To avoid this, you must escape the asterisk by preceding it
244+ with a backslash (\\). Escape a collection named "*" as follows:
245+
246+ .. code-block:: java
247+
248+ "\*"
249+
250+ - If the name of a collection used in your ``collection`` configuration
251+ option contains a backslash (\\), the
252+ {+connector-short+} treats the backslash as an escape character, which
253+ might change how it interprets the value. To avoid this, you must escape
254+ the backslash by preceding it with another backslash. Escape a collection named "\\collection" as follows:
255+
256+ .. code-block:: java
257+
258+ "\\collection"
259+
260+ .. note::
261+
262+ When specifying the collection name as a string literal in Java, you must
263+ further escape each backslash with another one. For example, escape a collection
264+ named "\\collection" as follows:
265+
266+ .. code-block:: java
267+
268+ "\\\\collection"
269+
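The two levels of escaping described above (one for the connector, one for the Java string literal) can be combined as in the following sketch. The collection names "my,collection" and "\\collection" are hypothetical examples:

.. code-block:: java

   // Minimal sketch combining connector escaping with Java string-literal
   // escaping. The collection names used here are hypothetical.
   public class CollectionNameEscaping {
       public static void main(String[] args) {
           // "my,collection": escape the comma for the connector (\,),
           // then double the backslash in the Java literal.
           String withComma = "my\\,collection";    // connector receives: my\,collection

           // "\collection": escape the backslash for the connector (\\),
           // then double each backslash in the Java literal.
           String withBackslash = "\\\\collection"; // connector receives: \\collection

           // Join the escaped names with an unescaped comma to list both.
           String optionValue = withComma + "," + withBackslash;
           System.out.println(optionValue);         // prints: my\,collection,\\collection
       }
   }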
270+ You can stream from all collections in the database by passing an
271+ asterisk (*) as a string for the collection name.
272+
273+ Specify all collections as shown in the following example:
274+
275+ .. code-block:: java
276+
277+ ...
278+ .option("spark.mongodb.collection", "*")
279+
280+ If you create a collection while streaming from all collections, the new
281+ collection is automatically included in the stream.
282+
283+ You can drop collections at any time while streaming from multiple collections.
284+
285+ .. important:: Inferring the Schema with Multiple Collections
286+
287+ If you set the ``change.stream.publish.full.document.only``
288+ option to ``true``, the {+connector-short+} infers the schema of a ``DataFrame``
289+ by using the schema of the scanned documents.
290+
291+ Schema inference happens at the beginning of streaming, and does not take
292+ into account collections that are created during streaming.
293+
294+ When streaming from multiple collections and inferring the schema, the
295+ connector samples each collection sequentially. Streaming from a large
296+ number of collections can noticeably slow schema inference. This
297+ performance impact occurs only while the connector infers the schema.