firestore.BulkWriter.flush() has a race #1772

@thewildllama

Description

bulkWriter.flush() (and therefore bulkWriter.close()) has a race when checking whether there are more items to flush.

We noticed this in our Cloud Function application that tries to write 30,000 records to a collection.

I have a fix that is working here:
https://github.com/thewildllama/java-firestore/tree/bulk-writer-flush-fix

I detailed the problem in my commit message:

flush() checks the size of the current batch to see whether it still has work, but a race can occur when pendingOpsCount > maxPendingOpCount and operations have been placed in bufferedOperations.

When flush() is called, scheduleCurrentBatchLocked() used to be invoked with flush=true, which would call sendBatchLocked() with flush=true as well. If the call to sendBatchLocked() was not rate limited, it would add a listener to the actual batch write that called scheduleCurrentBatchLocked() again when flush == true.

If the current batch's size is 0, scheduleCurrentBatchLocked() simply
returns, ending the flush.

If pendingOpsCount > maxPendingOpCount and operations are being buffered into bufferedOperations instead of a batch, there is a chance that scheduleCurrentBatchLocked() will execute before any pending operations complete, while the current batch is still empty, silently ending the "flush" even though work remains.

This fix avoids that with a global flush state and a flush "endpoint" that marks the last operation that needs to be flushed. This also allows flush() to be called more than once (including a terminal call to close(), if it comes to that).

We will end up calling scheduleCurrentBatchLocked() more often from sendBatchLocked(), which happens via callback, so there will be some added overhead there. When we did 30,000 inserts in quick succession, many of the calls to sendBatchLocked() came from the rate-limit backoff branch, and those all had flush=false.

Environment details

OS type and version: Linux/macOS
Java version: 21
Version(s): com.google.cloud:google-cloud-firestore:3.22.0, com.google.cloud:google-cloud-firestore:3.24.2

Steps to reproduce

  1. Insert 30,000 items into a collection within a try-with-resources block
  2. The application hangs on exiting the try-with-resources block

Code example

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.core.type.TypeReference;
import com.google.cloud.firestore.BulkWriter;
import com.google.cloud.firestore.DocumentReference;
import java.time.LocalDate;
import java.util.Map;

@JsonIgnoreProperties(ignoreUnknown = true)
public record HistoricalReference(
    @JsonProperty("Identifier")
    String identifier,
    @JsonProperty("Start date")
    LocalDate startDate
    // ...
) { }

try (BulkWriter bulkWriter = firestore.bulkWriter()) {
    historicalReferences.forEach(
        historicalReference -> {
            DocumentReference docRef = collectionReference.document(historicalReference.identifier());
            bulkWriter.set(
                docRef,
                // Can't wait for record support!
                OBJECT_MAPPER.convertValue(historicalReference, new TypeReference<Map<String, Object>>() {})
            );
        }
    );
}
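
For context on where the hang appears: the try-with-resources statement above is roughly equivalent to the following hypothetical expansion, where leaving the block calls close(), and close() performs the terminal flush:

import com.google.cloud.firestore.BulkWriter;
import com.google.cloud.firestore.Firestore;
import java.util.List;

class Repro {
    void writeAll(Firestore firestore, List<HistoricalReference> historicalReferences) throws Exception {
        BulkWriter bulkWriter = firestore.bulkWriter();
        try {
            // ... enqueue the 30,000 set() calls exactly as above ...
        } finally {
            // close() waits for every enqueued write to finish; with the
            // race described above, buffered operations are never sent,
            // so this call never returns and the application hangs here.
            bulkWriter.close();
        }
    }
}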
