|
| 1 | +==================== |
| 2 | +Invalid Resume Token |
| 3 | +==================== |
| 4 | + |
| 5 | +.. default-domain:: mongodb |
| 6 | + |
| 7 | +.. contents:: On this page |
| 8 | + :local: |
| 9 | + :backlinks: none |
| 10 | + :depth: 2 |
| 11 | + :class: singlecol |
| 12 | + |
| 13 | +Overview |
| 14 | +-------- |
| 15 | + |
| 16 | +Learn how to recover from an invalid resume token |
| 17 | +in a MongoDB Kafka Connector source connector. |
| 18 | + |
| 19 | +Stack Trace |
| 20 | +~~~~~~~~~~~ |
| 21 | + |
| 22 | +The following stack trace indicates that the source connector has an invalid resume token: |
| 23 | + |
| 24 | +.. code-block:: text |
| 25 | + |
| 26 | + ... |
| 27 | + org.apache.kafka.connect.errors.ConnectException: ResumeToken not found. |
| 28 | + Cannot create a change stream cursor |
| 29 | + ... |
| 30 | + Command failed with error 286 (ChangeStreamHistoryLost): 'PlanExecutor |
| 31 | + error during aggregation :: caused by :: Resume of change stream was not |
| 32 | + possible, as the resume point may no longer be in the oplog |
| 33 | + ... |
| 34 | + |
| 35 | +Cause |
| 36 | +----- |
| 37 | + |
| 38 | +When the ID of your source connector's resume token does not correspond to any |
| 39 | +entry in your MongoDB deployment's :ref:`oplog <replica-set-oplog>`, |
| 40 | +your connector cannot determine where to resume processing your
| 41 | +MongoDB change stream. This issue most commonly occurs when you pause the source |
| 42 | +connector and fill the oplog, as outlined in the following scenario: |
| 43 | + |
| 44 | +#. You start a Kafka deployment with a MongoDB Kafka Connector source connector. |
| 45 | +#. You produce change stream events in MongoDB, and your connector stores a |
| 46 | + resume token corresponding to the most recent oplog entry in MongoDB. |
| 47 | +#. You pause your source connector. |
| 48 | +#. While your connector sits idle, you fill your MongoDB oplog such that MongoDB |
| 49 | + deletes the oplog entry corresponding to your resume token. |
| 50 | +#. You restart your source connector, and it is unable to resume
| 51 | + processing because the oplog entry for its resume token no longer exists.
| 52 | + |
| 53 | +For more information on the oplog, see the |
| 54 | +:ref:`MongoDB Manual <replica-set-oplog>`. |
| 55 | + |
| 56 | +.. TODO: update doc link to ref once page is written |
| 57 | + |
| 58 | +For more information on change streams, see the |
| 59 | +:doc:`guide on change streams </source-connector/fundamentals/change-streams>`. |
| 60 | + |
| 61 | +Solutions |
| 62 | +--------- |
| 63 | + |
| 64 | +You can recover from an invalid resume token using one of the following |
| 65 | +strategies: |
| 66 | + |
| 67 | +- :ref:`Temporarily Tolerate Errors <temporarily-tolerate-errors>` |
| 68 | +- :ref:`Delete Stored Offsets <troubleshoot-delete-stored-offsets>` |
| 69 | + |
| 70 | +.. _temporarily-tolerate-errors: |
| 71 | + |
| 72 | +Temporarily Tolerate Errors |
| 73 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 74 | + |
| 75 | +You can configure your source connector to tolerate errors |
| 76 | +while you produce a change stream event that updates the |
| 77 | +connector's resume token. This recovery strategy is the |
| 78 | +simplest, but there is a risk that your connector briefly |
| 79 | +ignores errors unrelated to the invalid resume token. If you |
| 80 | +aren't comfortable briefly tolerating errors |
| 81 | +in your deployment, you can |
| 82 | +:ref:`delete stored offsets <troubleshoot-delete-stored-offsets>` instead. |
| 83 | + |
| 84 | +To configure your source connector to temporarily tolerate errors: |
| 85 | + |
| 86 | +#. Set the ``errors.tolerance`` option to tolerate all errors: |
| 87 | + |
| 88 | + .. code-block:: properties
| 89 | +
| 90 | + errors.tolerance=all
| 91 | + |
| 92 | +#. Insert, update, or delete a document in the collection referenced by your source connector to |
| 93 | + produce a change stream event that updates your connector's resume token. |
| 94 | + |
| 95 | +#. Once you produce a change stream event, set the ``errors.tolerance`` |
| 96 | + option to no longer tolerate errors: |
| 97 | + |
| 98 | + .. code-block:: properties
| 99 | +
| 100 | + errors.tolerance=none
| 101 | + |
| 102 | +.. TODO: <Confirm linked page discusses errors.tolerance once it's written> |
| 103 | +.. TODO: update doc link to ref once page is written |
| 104 | + |
| 105 | +For more information on the ``errors.tolerance`` option, see the |
| 106 | +:doc:`guide on source connector configuration properties </source-connector/configuration-properties>`. |
| 107 | + |
| 108 | +.. _troubleshoot-delete-stored-offsets: |
| 109 | + |
| 110 | +Delete Stored Offsets |
| 111 | +~~~~~~~~~~~~~~~~~~~~~ |
| 112 | + |
| 113 | +You can delete your Kafka Connect offset data, which contains your resume token, |
| 114 | +to allow your connector to resume processing your change stream. This strategy is |
| 115 | +more complex than the preceding strategy, but does not risk tolerating errors |
| 116 | +unrelated to the invalid resume token. |
| 117 | + |
| 118 | +.. As far as I can tell, there is not a straightforward way to tell at runtime |
| 119 | + which mode you are in. The Data Engineer Persona likely knows how they |
| 120 | + configured their pipeline, but if they do not know they may |
| 121 | + have to attempt both choices. |
| 122 | + |
| 123 | +The steps to perform this strategy depend on whether you are running Kafka Connect |
| 124 | +in distributed mode or standalone mode. Click on the tab corresponding to the |
| 125 | +mode of your deployment: |
| 126 | + |
| 127 | +.. tabs:: |
| 128 | + |
| 129 | + .. tab:: Distributed |
| 130 | + :tabid: distributed |
| 131 | + |
| 132 | + #. Delete the topic specified in the ``offset.storage.topic`` property of your |
| 133 | + Kafka Connect deployment. For more information on deleting topics in Apache Kafka, see the |
| 134 | + `official Apache Kafka documentation <https://kafka.apache.org/081/documentation.html#basic_ops_add_topic>`__. |
| 135 | + |
| 136 | + #. Restart your source connector and continue to process change stream events. |
| 137 | + |
| 138 | + .. tab:: Standalone |
| 139 | + :tabid: standalone |
| 140 | + |
| 141 | + #. Delete the file referenced by the ``offset.storage.file.filename`` property of |
| 142 | + your Kafka Connect deployment. |
| 143 | + |
| 144 | + #. Restart your source connector and continue to process change stream events. |
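
The "Temporarily Tolerate Errors" steps in the diff above toggle ``errors.tolerance`` on and then off again. As a rough illustration, the following Python sketch shows one way that toggle might be scripted against the Kafka Connect REST API (``PUT /connectors/{name}/config``). The connector name ``mongo-source``, the worker URL ``http://localhost:8083``, the connection settings, and the helper names ``with_tolerance`` and ``build_update_request`` are all assumptions made for this example and do not come from the page under review. The sketch only builds the request; it does not send it.

```python
import json
import urllib.request


def with_tolerance(config: dict, tolerance: str) -> dict:
    """Return a copy of a connector config with errors.tolerance set."""
    if tolerance not in ("all", "none"):
        raise ValueError("errors.tolerance must be 'all' or 'none'")
    updated = dict(config)  # copy so the caller's config is not mutated
    updated["errors.tolerance"] = tolerance
    return updated


def build_update_request(connect_url: str, name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT /connectors/{name}/config request."""
    return urllib.request.Request(
        url=f"{connect_url}/connectors/{name}/config",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical connector settings, for illustration only.
base_config = {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "connection.uri": "mongodb://localhost:27017",
    "database": "exampleDb",
    "collection": "exampleColl",
}

# Step 1: tolerate all errors while a new change stream event is produced.
tolerant = with_tolerance(base_config, "all")
# Step 3: return to strict error handling afterwards.
strict = with_tolerance(tolerant, "none")

request = build_update_request("http://localhost:8083", "mongo-source", tolerant)
# To apply the change, send it with: urllib.request.urlopen(request)
```

Between the two updates you would still need to insert, update, or delete a document in the watched collection, as step 2 describes, so that the connector stores a fresh resume token before strict error handling is restored.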