Commit 09df146

Docsp 15820 resume from an invalid resume token (#129)

biniona-mongodb authored and melissamahoney-mongodb committed
Co-authored-by: Melissa Mahoney <[email protected]>
1 parent 75d3a41 commit 09df146

File tree: 2 files changed, +149 −1 lines changed


source/troubleshooting.txt

Lines changed: 5 additions & 1 deletion

@@ -2,4 +2,8 @@
 Troubleshooting
 ===============
 
-asdf
+.. toctree::
+   :titlesonly:
+   :maxdepth: 1
+
+   Invalid Resume Token </troubleshooting/recover-from-invalid-resume-token.txt>
source/troubleshooting/recover-from-invalid-resume-token.txt

Lines changed: 144 additions & 0 deletions

@@ -0,0 +1,144 @@
====================
Invalid Resume Token
====================

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol

Overview
--------

Learn how to recover from an invalid resume token
in a MongoDB Kafka Connector source connector.

Stack Trace
~~~~~~~~~~~

The following stack trace indicates that the source connector has an
invalid resume token:

.. code-block:: text

   ...
   org.apache.kafka.connect.errors.ConnectException: ResumeToken not found.
   Cannot create a change stream cursor
   ...
   Command failed with error 286 (ChangeStreamHistoryLost): 'PlanExecutor
   error during aggregation :: caused by :: Resume of change stream was not
   possible, as the resume point may no longer be in the oplog
   ...
Cause
-----

When the ID of your source connector's resume token does not correspond to any
entry in your MongoDB deployment's :ref:`oplog <replica-set-oplog>`,
your connector has no way to determine where to begin processing your
MongoDB change stream. This issue most commonly occurs when you pause the source
connector and fill the oplog, as outlined in the following scenario:

#. You start a Kafka deployment with a MongoDB Kafka Connector source connector.
#. You produce change stream events in MongoDB, and your connector stores a
   resume token corresponding to the most recent oplog entry in MongoDB.
#. You pause your source connector.
#. While your connector sits idle, you fill your MongoDB oplog such that MongoDB
   deletes the oplog entry corresponding to your resume token.
#. You restart your source connector, and it is unable to resume
   processing because its resume token does not exist in your MongoDB oplog.

For more information on the oplog, see the
:ref:`MongoDB Manual <replica-set-oplog>`.

.. TODO: update doc link to ref once page is written

For more information on change streams, see the
:doc:`guide on change streams </source-connector/fundamentals/change-streams>`.
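
The scenario above hinges on how much history your oplog retains. One way to
gauge that window is ``rs.printReplicationInfo()`` in ``mongosh``; this is an
illustrative check, and the reported figures depend entirely on your
deployment:

.. code-block:: javascript

   // Run in mongosh while connected to the replica set.
   // Prints the configured oplog size, the log length (the time span
   // between the first and last oplog entries), and their timestamps.
   rs.printReplicationInfo()

If the reported log length is shorter than the longest time you expect your
source connector to stay paused, the resume token is at risk of aging out of
the oplog.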

Solutions
---------

You can recover from an invalid resume token using one of the following
strategies:

- :ref:`Temporarily Tolerate Errors <temporarily-tolerate-errors>`
- :ref:`Delete Stored Offsets <troubleshoot-delete-stored-offsets>`

.. _temporarily-tolerate-errors:

Temporarily Tolerate Errors
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can configure your source connector to tolerate errors
while you produce a change stream event that updates the
connector's resume token. This recovery strategy is the
simplest, but there is a risk that your connector briefly
ignores errors unrelated to the invalid resume token. If you
aren't comfortable briefly tolerating errors
in your deployment, you can
:ref:`delete stored offsets <troubleshoot-delete-stored-offsets>` instead.

To configure your source connector to temporarily tolerate errors:

#. Set the ``errors.tolerance`` option to tolerate all errors:

   .. code-block:: properties

      errors.tolerance=all

#. Insert, update, or delete a document in the collection referenced by your
   source connector to produce a change stream event that updates your
   connector's resume token.

#. Once you produce a change stream event, set the ``errors.tolerance``
   option so that it no longer tolerates errors:

   .. code-block:: properties

      errors.tolerance=none
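
If you run Kafka Connect in distributed mode, one way to apply these steps is
to update the connector's configuration through the Kafka Connect REST API. The
following is a minimal sketch, not a complete configuration; the connector name
``mongo-source``, the worker address ``localhost:8083``, and the
``connection.uri`` value are hypothetical placeholders:

.. code-block:: shell

   # Hypothetical connector name and worker address.
   # The PUT body must contain the connector's full configuration,
   # not just the changed property.
   curl -X PUT -H "Content-Type: application/json" \
     --data '{
       "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
       "connection.uri": "mongodb://localhost:27017",
       "errors.tolerance": "all"
     }' \
     http://localhost:8083/connectors/mongo-source/config

Repeat the same call with ``"errors.tolerance": "none"`` after you produce the
change stream event.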

.. TODO: <Confirm linked page discusses errors.tolerance once it's written>
.. TODO: update doc link to ref once page is written

For more information on the ``errors.tolerance`` option, see the
:doc:`guide on source connector configuration properties </source-connector/configuration-properties>`.

.. _troubleshoot-delete-stored-offsets:

Delete Stored Offsets
~~~~~~~~~~~~~~~~~~~~~

You can delete your Kafka Connect offset data, which contains your resume token,
to allow your connector to resume processing your change stream. This strategy is
more complex than the preceding one, but it does not risk tolerating errors
unrelated to the invalid resume token.

.. As far as I can tell, there is not a straightforward way to tell at runtime
   which mode you are in. The Data Engineer Persona likely knows how they
   configured their pipeline, but if they do not know they may
   have to attempt both choices.

The steps for this strategy depend on whether you are running Kafka Connect
in distributed mode or standalone mode. Click the tab corresponding to the
mode of your deployment:

.. tabs::

   .. tab:: Distributed
      :tabid: distributed

      #. Delete the topic specified in the ``offset.storage.topic`` property of
         your Kafka Connect deployment. For more information on deleting topics
         in Apache Kafka, see the
         `official Apache Kafka documentation <https://kafka.apache.org/081/documentation.html#basic_ops_add_topic>`__.

      #. Restart your source connector and continue to process change stream events.

   .. tab:: Standalone
      :tabid: standalone

      #. Delete the file referenced by the ``offset.storage.file.filename``
         property of your Kafka Connect deployment.

      #. Restart your source connector and continue to process change stream events.
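
As a sketch, the deletion step in each mode might look like the following. The
topic name ``connect-offsets``, the broker address, and the offset file path
are assumptions drawn from common Kafka Connect defaults, not values from this
page; check your worker configuration for the actual values:

.. code-block:: shell

   # Distributed mode: delete the topic named by offset.storage.topic
   # (commonly "connect-offsets" in default worker configurations).
   bin/kafka-topics.sh --bootstrap-server localhost:9092 \
     --delete --topic connect-offsets

   # Standalone mode: delete the file named by offset.storage.file.filename
   # (commonly "/tmp/connect.offsets" in example worker configurations).
   rm /tmp/connect.offsets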
