Commit eb36be7

DOCSP-41467 - handle corrupt reads (#212) (#213)
(cherry picked from commit 9338913) Co-authored-by: Mike Woofter <[email protected]>
1 parent bb71d1f commit eb36be7

2 files changed: +51 -1 lines changed

source/batch-mode/batch-read-config.txt

Lines changed: 25 additions & 0 deletions
@@ -53,6 +53,31 @@ You can configure the following properties when reading data from MongoDB in bat
        |
        | **Default:** None

+   * - ``mode``
+     - | The parsing strategy to use when handling documents that don't match the
+         expected schema. This option accepts the following values:
+
+         - ``ReadConfig.ParseMode.FAILFAST``: Throws an exception when parsing a document that
+           doesn't match the schema.
+         - ``ReadConfig.ParseMode.PERMISSIVE``: Sets fields to ``null`` when data types don't match
+           the schema. To store each invalid document as an extended JSON string,
+           combine this value with the ``columnNameOfCorruptRecord`` option.
+         - ``ReadConfig.ParseMode.DROPMALFORMED``: Ignores any document that doesn't match
+           the schema.
+
+       |
+       | **Default:** ``ReadConfig.ParseMode.FAILFAST``
+
+   * - ``columnNameOfCorruptRecord``
+     - | If you set the ``mode`` option to ``ReadConfig.ParseMode.PERMISSIVE``,
+         this option specifies the name of the new column that stores the invalid
+         document as extended JSON. If you're using an explicit schema, it must
+         include the name of the new column. If you're
+         using an inferred schema, the {+connector-short+} adds the new column to the
+         end of the schema.
+       |
+       | **Default:** None
+
    * - ``mongoClientFactory``
      - | MongoClientFactory configuration key.
        | You can specify a custom implementation which must implement the

source/streaming-mode/streaming-read-config.txt

Lines changed: 26 additions & 1 deletion
@@ -50,13 +50,38 @@ You can configure the following properties when reading data from MongoDB in str
          with a comma.
        |
        | To learn more about specifying multiple collections, see :ref:`spark-specify-multiple-collections`.
-
+
    * - ``comment``
      - | The comment to append to the read operation. Comments appear in the
          :manual:`output of the Database Profiler. </reference/database-profiler>`
        |
        | **Default:** None

+   * - ``mode``
+     - | The parsing strategy to use when handling documents that don't match the
+         expected schema. This option accepts the following values:
+
+         - ``ReadConfig.ParseMode.FAILFAST``: Throws an exception when parsing a document that
+           doesn't match the schema.
+         - ``ReadConfig.ParseMode.PERMISSIVE``: Sets fields to ``null`` when data types don't match
+           the schema. To store each invalid document as an extended JSON string,
+           combine this value with the ``columnNameOfCorruptRecord`` option.
+         - ``ReadConfig.ParseMode.DROPMALFORMED``: Ignores any document that doesn't match
+           the schema.
+
+       |
+       | **Default:** ``ReadConfig.ParseMode.FAILFAST``
+
+   * - ``columnNameOfCorruptRecord``
+     - | If you set the ``mode`` option to ``ReadConfig.ParseMode.PERMISSIVE``,
+         this option specifies the name of the new column that stores the invalid
+         document as extended JSON. If you're using an explicit schema, it must
+         include the name of the new column. If you're
+         using an inferred schema, the {+connector-short+} adds the new column to the
+         end of the schema.
+       |
+       | **Default:** None
+
    * - ``mongoClientFactory``
      - | MongoClientFactory configuration key.
        | You can specify a custom implementation, which must implement the
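
Reviewer sketch (not part of the commit): the three ``mode`` values and the ``columnNameOfCorruptRecord`` option documented in both files can be illustrated with a small pure-Python simulation. It does not call the connector or Spark. The function name ``read_with_mode``, the ``SchemaMismatchError`` class, the plain mode strings, and the use of ``json.dumps`` as a stand-in for extended JSON are all assumptions made for illustration. Leaving the corrupt-record column as ``None`` for valid documents is also an assumption; the documentation text above does not state it.

```python
import json

# Mode labels named after the ReadConfig.ParseMode values in the diff.
# These strings are illustrative, not the connector's option syntax.
MODES = ("FAILFAST", "PERMISSIVE", "DROPMALFORMED")


class SchemaMismatchError(Exception):
    """Raised in FAILFAST mode when a document does not match the schema."""


def read_with_mode(docs, schema, mode="FAILFAST", corrupt_column=None):
    """Simulate the documented parsing strategies.

    docs: list of dicts standing in for MongoDB documents.
    schema: dict mapping field name to the expected Python type.
    corrupt_column: hypothetical analogue of columnNameOfCorruptRecord.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    rows = []
    for doc in docs:
        row, matches = {}, True
        for field, expected_type in schema.items():
            value = doc.get(field)
            if value is None or isinstance(value, expected_type):
                row[field] = value
            else:
                row[field] = None  # mismatched field becomes null
                matches = False
        if not matches:
            if mode == "FAILFAST":
                raise SchemaMismatchError(f"document does not match schema: {doc}")
            if mode == "DROPMALFORMED":
                continue  # ignore the whole document
        if mode == "PERMISSIVE" and corrupt_column is not None:
            # Appended last, mirroring the inferred-schema case where the
            # new column is added to the end of the schema.
            # Assumption: valid documents get None in this column.
            row[corrupt_column] = None if matches else json.dumps(doc)
        rows.append(row)
    return rows


docs = [{"_id": 1, "qty": 5}, {"_id": 2, "qty": "five"}]
schema = {"_id": int, "qty": int}

permissive_rows = read_with_mode(docs, schema, "PERMISSIVE", "_corrupt")
dropped_rows = read_with_mode(docs, schema, "DROPMALFORMED")
```

With these inputs, ``permissive_rows`` keeps both documents and stores the second one as a JSON string in ``_corrupt``, while ``dropped_rows`` contains only the first document. Calling ``read_with_mode(docs, schema)`` with the default ``FAILFAST`` raises on the second document.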
