DOCSP-41467 - handle corrupt reads (#212) (#213)

github-actions[bot] · mongoKart · web-flow · commit eb36be76b2f6 · 2024-09-11T08:31:41.000-05:00
(cherry picked from commit 9338913)
Co-authored-by: Mike Woofter <108414937+mongoKart@users.noreply.github.com>
diff --git a/source/batch-mode/batch-read-config.txt b/source/batch-mode/batch-read-config.txt
@@ -53,6 +53,31 @@ You can configure the following properties when reading data from MongoDB in bat
        |
        | **Default:** None 
 
+   * - ``mode``
+     - | The parsing strategy to use when handling documents that don't match the
+         expected schema. This option accepts the following values:
+
+       - ``ReadConfig.ParseMode.FAILFAST``: Throws an exception when parsing a document that
+         doesn't match the schema.
+       - ``ReadConfig.ParseMode.PERMISSIVE``: Sets fields to ``null`` when data types don't match
+         the schema. To store each invalid document as an extended JSON string,
+         combine this value with the ``columnNameOfCorruptRecord`` option.
+       - ``ReadConfig.ParseMode.DROPMALFORMED``: Ignores any document that doesn't match
+         the schema.
+
+       |
+       | **Default:** ``ReadConfig.ParseMode.FAILFAST``
+ 
+   * - ``columnNameOfCorruptRecord``
+     - | If you set the ``mode`` option to ``ReadConfig.ParseMode.PERMISSIVE``,
+         this option specifies the name of the new column that stores the invalid
+         document as extended JSON. If you're using an explicit schema, it must
+         include the name of the new column. If you're
+         using an inferred schema, the {+connector-short+} adds the new column to the
+         end of the schema. 
+       |
+       | **Default:** None 
+
    * - ``mongoClientFactory``
      - | MongoClientFactory configuration key.
        | You can specify a custom implementation which must implement the
diff --git a/source/streaming-mode/streaming-read-config.txt b/source/streaming-mode/streaming-read-config.txt
@@ -50,13 +50,38 @@ You can configure the following properties when reading data from MongoDB in str
          with a comma.
        |
        | To learn more about specifying multiple collections, see :ref:`spark-specify-multiple-collections`.
-
+   
    * - ``comment``
      - | The comment to append to the read operation. Comments appear in the 
          :manual:`output of the Database Profiler. </reference/database-profiler>`
        |
        | **Default:** None 
 
+   * - ``mode``
+     - | The parsing strategy to use when handling documents that don't match the
+         expected schema. This option accepts the following values:
+
+       - ``ReadConfig.ParseMode.FAILFAST``: Throws an exception when parsing a document that
+         doesn't match the schema.
+       - ``ReadConfig.ParseMode.PERMISSIVE``: Sets fields to ``null`` when data types don't match
+         the schema. To store each invalid document as an extended JSON string,
+         combine this value with the ``columnNameOfCorruptRecord`` option.
+       - ``ReadConfig.ParseMode.DROPMALFORMED``: Ignores any document that doesn't match
+         the schema.
+
+       |
+       | **Default:** ``ReadConfig.ParseMode.FAILFAST``
+ 
+   * - ``columnNameOfCorruptRecord``
+     - | If you set the ``mode`` option to ``ReadConfig.ParseMode.PERMISSIVE``,
+         this option specifies the name of the new column that stores the invalid
+         document as extended JSON. If you're using an explicit schema, it must
+         include the name of the new column. If you're
+         using an inferred schema, the {+connector-short+} adds the new column to the
+         end of the schema. 
+       |
+       | **Default:** None 
+  
    * - ``mongoClientFactory``
      - | MongoClientFactory configuration key.
        | You can specify a custom implementation, which must implement the