@@ -99,7 +99,8 @@ has not been exceeded, the driver MUST retry a transaction that fails with an er
 "TransientTransactionError" label. Since retrying the entire transaction will entail invoking the callback again,
 drivers MUST document that the callback may be invoked multiple times (i.e. one additional time per retry attempt) and
 MUST document the risk of side effects from using a non-idempotent callback. If the retry timeout has been exceeded,
-drivers MUST NOT retry the transaction and allow `withTransaction` to propagate the error to its caller.
+drivers MUST NOT retry the transaction and allow `withTransaction` to propagate the error to its caller. When retrying,
+drivers MUST implement an exponential backoff with jitter following the algorithm described below.

 If an error bearing neither the UnknownTransactionCommitResult nor the TransientTransactionError label is encountered at
 any point, the driver MUST NOT retry and MUST allow `withTransaction` to propagate the error to its caller.
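The retry-eligibility rules in the hunk above can be sketched as a small predicate. This is a minimal TypeScript sketch, not driver code: the `LabeledError` shape and the `isRetryCandidate` name are hypothetical, and a matching label is only a necessary condition, since the numbered steps later in this diff decide which label permits which kind of retry (for example, an UnknownTransactionCommitResult raised by the callback itself is never retried).

```typescript
// Hypothetical error shape for illustration; real drivers expose labels
// through their own error types (the pseudocode uses hasErrorLabel).
interface LabeledError {
  errorLabels: readonly string[];
}

// The two error labels named in the specification text.
const RETRYABLE_LABELS = [
  "TransientTransactionError",
  "UnknownTransactionCommitResult",
] as const;

// Necessary (not sufficient) condition for withTransaction to retry: the
// retry timeout has not been exceeded and the error carries at least one of
// the two labels. An error with neither label is always propagated.
function isRetryCandidate(
  error: LabeledError,
  elapsedMS: number,
  timeoutMS: number,
): boolean {
  if (elapsedMS >= timeoutMS) {
    return false; // timeout exceeded: let the error reach the caller
  }
  return RETRYABLE_LABELS.some((label) => error.errorLabels.includes(label));
}
```

An error with an empty label list yields `false` regardless of elapsed time, matching the MUST NOT in the text.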
@@ -113,7 +114,11 @@ needed (e.g. user data to pass as a parameter to the callback).

 This method should perform the following sequence of actions:

-1. Record the current monotonic time, which will be used to enforce the 120-second timeout before later retry attempts.
+1. Define the following:
+   1. Record the current monotonic time, which will be used to enforce the 120-second or CSOT timeout before later retry
+      attempts.
+   2. Set `retry` to `0`. This will be used for backoff later in step 6.
+   3. Set `TIMEOUT_MS` to `timeoutMS` if given, otherwise 120 seconds.
 2. Invoke [startTransaction](../transactions/transactions.md#starttransaction) on the session. If TransactionOptions
    were specified in the call to `withTransaction`, those MUST be used for `startTransaction`. Note that
    `ClientSession.defaultTransactionOptions` will be used in the absence of any explicit TransactionOptions.
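Step 1 of the revised sequence in the hunk above can be sketched as initializing a small state record. In this TypeScript sketch the `RetryState` type, `initRetryState`, `elapsedMS`, and the injectable `now` clock are illustrative assumptions rather than a driver API. Treating `timeoutMS = 0` as "no timeout" follows the CSOT convention and is an assumption worth checking against the CSOT specification.

```typescript
// Default retry window when no CSOT timeoutMS is supplied: 120 seconds.
const DEFAULT_TIMEOUT_MS = 120_000;

// Hypothetical record holding the variables defined in step 1.
interface RetryState {
  startTime: number; // clock reading taken at the start, in milliseconds
  retry: number; // backoff counter, starts at 0
  timeoutMS: number; // TIMEOUT_MS
}

// Drivers SHOULD read a monotonic clock; Date.now() is only a portable
// default here, and the clock is injectable so the sketch is testable.
function initRetryState(
  timeoutMS: number | undefined,
  now: () => number = Date.now,
): RetryState {
  // Assumption: following CSOT, timeoutMS = 0 is read as "no timeout".
  let resolved: number;
  if (timeoutMS === undefined) {
    resolved = DEFAULT_TIMEOUT_MS;
  } else if (timeoutMS === 0) {
    resolved = Number.POSITIVE_INFINITY;
  } else {
    resolved = timeoutMS;
  }
  return { startTime: now(), retry: 0, timeoutMS: resolved };
}

// Elapsed time since step 1, read from the same clock.
function elapsedMS(state: RetryState, now: () => number = Date.now): number {
  return now() - state.startTime;
}
```

Injecting the clock keeps the sketch deterministic; a real driver would pass its monotonic time source instead.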
@@ -128,23 +133,35 @@ This method should perform the following sequence of actions:
 6. If the callback reported an error:
    1. If the ClientSession is in the "starting transaction" or "transaction in progress" state, invoke
       [abortTransaction](../transactions/transactions.md#aborttransaction) on the session.
+
    2. If the callback's error includes a "TransientTransactionError" label and the elapsed time of `withTransaction` is
-      less than 120 seconds, jump back to step two.
+      less than `TIMEOUT_MS`, calculate `backoffMS` as `jitter * min(BACKOFF_INITIAL * (1.5**retry), BACKOFF_MAX)`
+      where:
+
+      1. `jitter` is a random float in the range \[0, 1).
+      2. `retry` is the variable defined in step 1.
+      3. `BACKOFF_INITIAL` is 5ms.
+      4. `BACKOFF_MAX` is 500ms.
+
+      If the elapsed time + `backoffMS` > `TIMEOUT_MS`, raise the last known error. Otherwise, sleep for `backoffMS`,
+      increment `retry`, and jump back to step two.
+
    3. If the callback's error includes a "UnknownTransactionCommitResult" label, the callback must have manually
       committed a transaction, propagate the callback's error to the caller of `withTransaction` and return
       immediately.
+
    4. Otherwise, propagate the callback's error to the caller of `withTransaction` and return immediately.
 7. If the ClientSession is in the "no transaction", "transaction aborted", or "transaction committed" state, assume the
    callback intentionally aborted or committed the transaction and return immediately.
 8. Invoke [commitTransaction](../transactions/transactions.md#committransaction) on the session.
 9. If `commitTransaction` reported an error:
    1. If the `commitTransaction` error includes a "UnknownTransactionCommitResult" label and the error is not
-      MaxTimeMSExpired and the elapsed time of `withTransaction` is less than 120 seconds, jump back to step eight.
-      We will trust `commitTransaction` to apply a majority write concern on retry attempts (see:
+      MaxTimeMSExpired and the elapsed time of `withTransaction` is less than `TIMEOUT_MS`, jump back to step eight. We
+      will trust `commitTransaction` to apply a majority write concern on retry attempts (see:
       [Majority write concern is used when retrying commitTransaction](#majority-write-concern-is-used-when-retrying-committransaction)).

    2. If the `commitTransaction` error includes a "TransientTransactionError" label and the elapsed time of
-      `withTransaction` is less than 120 seconds, jump back to step two.
+      `withTransaction` is less than `TIMEOUT_MS`, jump back to step two.

    3. Otherwise, propagate the `commitTransaction` error to the caller of `withTransaction` and return immediately.
 10. The transaction was committed successfully. Return immediately.
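The backoff rule in step 6.2 can be sketched as two pure functions. The constants mirror `BACKOFF_INITIAL`, `BACKOFF_MAX`, and the 1.5 growth factor from the text; the function names, the `Decision` type, and the injectable `random` source are hypothetical, added so the jitter can be pinned in tests. The sketch covers only the callback-error path of step 6.2, not the commit retries in step 9.

```typescript
// Constants from step 6.2.
const BACKOFF_INITIAL_MS = 5;
const BACKOFF_MAX_MS = 500;
const BACKOFF_GROWTH = 1.5;

// backoffMS = jitter * min(BACKOFF_INITIAL * 1.5**retry, BACKOFF_MAX),
// with jitter drawn from [0, 1). `random` is injectable for testing.
function computeBackoffMS(
  retry: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(
    BACKOFF_INITIAL_MS * BACKOFF_GROWTH ** retry,
    BACKOFF_MAX_MS,
  );
  return random() * ceiling;
}

// Outcome of step 6.2 for a TransientTransactionError from the callback.
type Decision =
  | { kind: "raise" } // propagate the last known error
  | { kind: "sleepThenRetry"; backoffMS: number };

function decideAfterTransientError(
  elapsedMS: number,
  timeoutMS: number,
  retry: number,
  random: () => number = Math.random,
): Decision {
  // Step 6.2 only applies while the elapsed time is below TIMEOUT_MS.
  if (elapsedMS >= timeoutMS) {
    return { kind: "raise" };
  }
  const backoffMS = computeBackoffMS(retry, random);
  // Sleeping would overrun the deadline, so give up with the last error.
  if (elapsedMS + backoffMS > timeoutMS) {
    return { kind: "raise" };
  }
  // The caller then sleeps, increments retry, and jumps back to step two.
  return { kind: "sleepThenRetry", backoffMS };
}
```

With `random` pinned to `() => 0.5`, the first retry (`retry = 0`) backs off 2.5 ms, `retry = 2` backs off 5.625 ms, and any retry past the cap backs off 250 ms.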
@@ -154,23 +171,39 @@ This method should perform the following sequence of actions:
 This method can be expressed by the following pseudo-code:

 ```typescript
+var BACKOFF_INITIAL = 5; // 5ms initial backoff
+var BACKOFF_MAX = 500; // 500ms max backoff
 withTransaction(callback, options) {
     // Note: drivers SHOULD use a monotonic clock to determine elapsed time
     var startTime = Date.now(); // milliseconds since Unix epoch
+    // See the CSOT specification for information on calculating timeoutMS for a convenient transaction API call.
+    var timeout = getCSOTTimeoutIfSet() ?? 120_000;
+    var retry = 0;

     retryTransaction: while (true) {
+        if (retry > 0) {
+            var backoff = Math.random() * Math.min(BACKOFF_INITIAL * (1.5 ** (retry - 1)),
+                                                   BACKOFF_MAX);
+
+            if (Date.now() + backoff - startTime >= timeout) {
+                throw last_error;
+            }
+            sleep(backoff);
+        }
+        retry += 1;
         this.startTransaction(options); // may throw on error

         try {
             callback(this);
         } catch (error) {
+            var last_error = error;
             if (this.transactionState == STARTING ||
                 this.transactionState == IN_PROGRESS) {
                 this.abortTransaction();
             }

             if (error.hasErrorLabel("TransientTransactionError") &&
-                Date.now() - startTime < 120000) {
+                Date.now() - startTime < timeout) {
                 continue retryTransaction;
             }

@@ -198,12 +231,12 @@ withTransaction(callback, options) {
                  */
                 if (!isMaxTimeMSExpiredError(error) &&
                     error.hasErrorLabel("UnknownTransactionCommitResult") &&
-                    Date.now() - startTime < 120000) {
+                    Date.now() - startTime < timeout) {
                     continue retryCommit;
                 }

                 if (error.hasErrorLabel("TransientTransactionError") &&
-                    Date.now() - startTime < 120000) {
+                    Date.now() - startTime < timeout) {
                     continue retryTransaction;
                 }

@@ -324,8 +357,8 @@ exceed the user's original intention for `maxTimeMS`.
 The callback may be executed any number of times. Drivers are free to encourage their users to design idempotent
 callbacks.

-A previous design had no limits for retrying commits or entire transactions. The callback is always able indicate that
-`withTransaction` should return to its caller (without future retry attempts) by aborting the transaction directly;
+A previous design had no limits for retrying commits or entire transactions. The callback is always able to indicate
+that `withTransaction` should return to its caller (without future retry attempts) by aborting the transaction directly;
 however, that puts the onus on avoiding very long (or infinite) retry loops on the application. We expect the most
 common cause of retry loops will be due to TransientTransactionErrors caused by write conflicts, as those can occur
 regularly in a healthy application, as opposed to UnknownTransactionCommitResult, which would typically be caused by an
@@ -338,6 +371,16 @@ non-configurable default and is intentionally twice the value of MongoDB 4.0's d
 parameter (60 seconds). Applications that desire longer retry periods may call `withTransaction` additional times as
 needed. Applications that desire shorter retry periods should not use this method.

+### Backoff Benefits
+
+Previously, the driver would retry transactions immediately, which is fine for low levels of contention. However, as
+server load increases, immediate retries can result in retry storms that further overload an already strained server.
+
+Exponential backoff is a well-researched and widely accepted backoff strategy that is simple to implement. A low initial
+backoff (5 milliseconds) and growth factor (1.5x) were chosen specifically to mitigate latency under low levels of
+contention. Empirical evidence suggests that a 500-millisecond maximum backoff ensures that a transaction does not wait
+so long as to exceed the 120-second timeout, while still reducing load spikes.
+
 ## Backwards Compatibility

 The specification introduces a new method on the ClientSession class and does not introduce any backward breaking
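To put the chosen constants in perspective, the following sketch (illustrative only; the helper names are hypothetical) computes the largest backoff each retry can draw, that is, the bound as `jitter` approaches 1, and the worst-case cumulative sleep. It assumes the 5 ms / 1.5x / 500 ms values from step 6.2 and ignores the time spent running the transaction attempts themselves.

```typescript
const BACKOFF_INITIAL_MS = 5;
const BACKOFF_MAX_MS = 500;
const BACKOFF_GROWTH = 1.5;

// Largest backoff a given retry can draw (the bound as jitter approaches 1).
function maxBackoffMS(retry: number): number {
  return Math.min(BACKOFF_INITIAL_MS * BACKOFF_GROWTH ** retry, BACKOFF_MAX_MS);
}

// First retry value at which BACKOFF_MAX caps the exponential term.
function firstCappedRetry(): number {
  let retry = 0;
  while (BACKOFF_INITIAL_MS * BACKOFF_GROWTH ** retry < BACKOFF_MAX_MS) {
    retry += 1;
  }
  return retry;
}

// Worst-case total sleep over the first `retries` backoffs.
function worstCaseTotalSleepMS(retries: number): number {
  let total = 0;
  for (let r = 0; r < retries; r += 1) {
    total += maxBackoffMS(r);
  }
  return total;
}

// Number of worst-case backoffs whose total sleep reaches `budgetMS`.
function retriesToExhaust(budgetMS: number): number {
  let retries = 0;
  let total = 0;
  while (total < budgetMS) {
    total += maxBackoffMS(retries);
    retries += 1;
  }
  return retries;
}
```

Under these assumptions `firstCappedRetry()` returns 12 (the thirteenth backoff), the worst-case sleep before the cap applies is about 1.29 seconds, and `retriesToExhaust(120_000)` returns 250, so backoff alone consumes the 120-second budget only after roughly 250 retries.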
@@ -357,6 +400,8 @@ provides an implementation of a technique already described in the MongoDB 4.0 d

 ## Changelog

+- 2025-11-20: withTransaction applies exponential backoff when retrying.
+
 - 2024-09-06: Migrated from reStructuredText to Markdown.

 - 2023-11-22: Document error handling inside the callback.