
Parse 4: Case-insensitive indexes cause huge performance problems for regex queries #6559

@rdhelms


Issue Description


TL;DR

  • Should we add a hint to regex queries to make sure they don't attempt to use the case-insensitive index?
  • When will a parse-server release that depends on parse@2.12.0 be published to npm (the change is already on master)?

TS;WM

As documented, Parse 4 now creates case_insensitive_email and case_insensitive_username indexes in addition to the email_1 and username_1 indexes it has always created. These case-insensitive indexes significantly improve the performance of queries that use MongoCollection.caseInsensitiveCollation() 👍
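For reference, here is a minimal sketch of the two kinds of index that now coexist on the same keys, written as index descriptions for the MongoDB Node.js driver's createIndexes. It assumes that MongoCollection.caseInsensitiveCollation() returns { locale: 'en_US', strength: 2 } and that users live in Parse's _User collection; the helper name userIndexDescriptions is hypothetical, and any other options Parse Server sets on these indexes (such as uniqueness) are deliberately left out.

```javascript
// Hypothetical helper: describes the index pairs Parse 4 keeps on _User.
// Assumption: this collation mirrors MongoCollection.caseInsensitiveCollation().
const CASE_INSENSITIVE_COLLATION = { locale: 'en_US', strength: 2 };

function userIndexDescriptions(fields = ['email', 'username']) {
  const descriptions = [];
  for (const field of fields) {
    // Plain index on the field, using the default (binary) comparison.
    descriptions.push({ key: { [field]: 1 }, name: `${field}_1` });
    // Same key, but compared case-insensitively (strength 2 ignores case).
    descriptions.push({
      key: { [field]: 1 },
      name: `case_insensitive_${field}`,
      collation: CASE_INSENSITIVE_COLLATION,
    });
  }
  return descriptions;
}

// Usage with the Node.js driver (needs a live connection):
// await db.collection('_User').createIndexes(userIndexDescriptions());
```

The point of the sketch is that both indexes share the key { email: 1 } (or { username: 1 }) and differ only in collation, which is what leaves the query planner with a choice to make.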

However...

The MongoDB documentation for $regex notes that regex queries are not collation-aware and cannot use case-insensitive indexes:

Case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilize case-insensitive indexes.

At first this doesn't seem to be a problem... why would a regex query use the case-insensitive index at all? Shouldn't it keep using the case-sensitive index, or no index? Well... sadly, for whatever reason, when both a case-insensitive and a case-sensitive index exist on the same key, the MongoDB query planner seems to select the case-insensitive index for regex queries, and the resulting query time is significantly worse. We saw regex queries run ~4x slower, and as CPU usage and the query backlog grew, this effectively crashed our server.

We're in conversation with the Mongo community about why this is the case, as it doesn't seem like the intended behavior. One possibility is that the regex query is selecting the most recently created index for the key being queried, regardless of its collation. We attempted to change the selected index by specifying a collation manually on the regex query, but this didn't seem to have any effect. Whether the query optimizer is behaving as expected is still unknown.
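One way to see which index the planner actually picks is to run the regex query with explain() (for example db.getCollection('_User').find({ email: /smith/i }).explain() in the mongo shell) and look for IXSCAN stages under queryPlanner.winningPlan. A minimal sketch of that check follows; winningIndexNames is a hypothetical helper, and sampleExplain is an abbreviated, made-up result that only mirrors the general shape of MongoDB explain output, not something captured from our server.

```javascript
// Hypothetical helper: collects the index names used by IXSCAN stages in a
// MongoDB explain() result, walking inputStage / inputStages recursively.
function winningIndexNames(explainResult) {
  const names = [];
  const visit = (stage) => {
    if (!stage) return;
    if (stage.stage === 'IXSCAN' && stage.indexName) names.push(stage.indexName);
    if (stage.inputStage) visit(stage.inputStage);
    for (const child of stage.inputStages || []) visit(child);
  };
  visit(explainResult && explainResult.queryPlanner && explainResult.queryPlanner.winningPlan);
  return names;
}

// Abbreviated, illustrative shape of an explain result for a regex query on email.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: 'FETCH',
      inputStage: { stage: 'IXSCAN', indexName: 'case_insensitive_email' },
    },
  },
};

console.log(winningIndexNames(sampleExplain)); // [ 'case_insensitive_email' ]
```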

Some good news...

We do seem to have found a workaround. Using the hint() cursor method, we were able to tell our regex queries to use the email_1 index instead of the case_insensitive_email index. On the Parse side, this looks like:

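const query = new Parse.Query(Parse.User);
query.matches('email', /smith/i); // hypothetical regex constraint, for illustration
query.hint('email_1'); // use the case-sensitive index, not case_insensitive_email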

This returned our regex query times back to normal and everything was great 👍
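For anyone querying MongoDB directly rather than through Parse, the same workaround can be expressed with the Node.js driver's hint option on find. This is a minimal sketch: findByEmailRegex is a hypothetical helper, and users is assumed to be a driver Collection for Parse's _User collection.

```javascript
// Hypothetical helper: regex lookup on email that pins the case-sensitive index.
// `users` is assumed to be a MongoDB Node.js driver Collection for _User.
function findByEmailRegex(users, pattern) {
  return users
    .find({ email: { $regex: pattern } }, { hint: 'email_1' }) // skip case_insensitive_email
    .toArray();
}

// Usage (needs a live connection):
// const rows = await findByEmailRegex(db.collection('_User'), /smith/i);
```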

So...I'm opening this issue to

  1. Make everyone aware that this seemed to be a huge performance regression, and what a possible workaround might be
  2. Discuss the possibility of setting the hint() within parse-server for regex queries, so that users don't have to do it themselves
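As a starting point for that discussion, the rule could look roughly like the sketch below: inspect the where clause and, if it applies $regex to email or username, return the matching case-sensitive index name to use as the hint. regexHintFor and the field-to-index map are hypothetical and not taken from parse-server's code; the sketch also assumes that an explicit user-supplied hint should always win.

```javascript
// Hypothetical rule for the proposal: pick a case-sensitive index hint when a
// Parse-style where clause runs $regex against a field that also has a
// case-insensitive index.
const CASE_SENSITIVE_INDEX = { email: 'email_1', username: 'username_1' };

function regexHintFor(where, existingHint) {
  if (existingHint) return existingHint; // never override an explicit hint
  for (const [field, indexName] of Object.entries(CASE_SENSITIVE_INDEX)) {
    const constraint = where && where[field];
    if (constraint && typeof constraint === 'object' && '$regex' in constraint) {
      return indexName;
    }
  }
  return undefined; // leave index selection to the query planner
}
```

One design caveat: forcing a hint also overrides the planner when other constraints in the same query might be served better by a different index, so a real implementation would probably want to restrict the rule to queries whose only indexed constraint is the regex.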

Finally, as a corollary to this, we also wanted to ask about the status of updating parse-server's parse dependency to 2.12.0. We see that the dependency has been updated on master, but it hasn't been published to npm yet. Where could we find information about when to expect changes merged to master to be published to npm? We ask because our hint() workaround only works with parse@2.12.0, the version that added support for Parse.Query.hint.
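Until that release is out, client code that relies on the workaround could guard the hint() call. A minimal sketch, assuming only what is stated above (Parse.Query.hint first shipped in parse 2.12.0); atLeastVersion and applyHintIfSupported are hypothetical names, and feature detection on the query object is preferred over comparing version strings.

```javascript
// Hypothetical helpers for guarding the hint() workaround.

// Compare plain "major.minor.patch" versions numerically (pre-release tags ignored).
function atLeastVersion(version, minimum) {
  const toParts = (v) => v.split('-')[0].split('.').map((n) => Number(n) || 0);
  const a = toParts(version);
  const b = toParts(minimum);
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true;
}

// Prefer feature detection: only call hint() if this SDK build provides it.
function applyHintIfSupported(query, indexName) {
  if (typeof query.hint === 'function') {
    query.hint(indexName);
    return true;
  }
  return false;
}

// Usage: applyHintIfSupported(new Parse.Query(Parse.User), 'email_1');
```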

Steps to reproduce

  • Spin up a server using Parse 4 pointing to a database with a large number of users (we were using over 4 million when doing our tests)
  • After the indexes have been created, run a find query using regex against the username or email fields.
  • Delete the case_insensitive_email and case_insensitive_username indexes
  • Run the same query again

For us, doing the above led to an almost 4x improvement in query time.
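To put a number on the difference between the two runs, one option is to time the same query several times before and after dropping the indexes and compare medians. The sketch below is a hypothetical harness: timeRuns, median, and the regexQuery in the comments are assumptions, not part of Parse; the runQuery callback would wrap something like a Parse.Query find with the regex constraint.

```javascript
const { performance } = require('perf_hooks');

// Median of a non-empty array of numbers.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Run an async query callback `runs` times in sequence, recording each duration in ms.
async function timeRuns(runQuery, runs = 5) {
  const durations = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await runQuery();
    durations.push(performance.now() - start);
  }
  return durations;
}

// Example (needs a server; regexQuery is a hypothetical Parse.Query):
// const before = median(await timeRuns(() => regexQuery.find({ useMasterKey: true })));
// ...drop case_insensitive_email and case_insensitive_username, then:
// const after = median(await timeRuns(() => regexQuery.find({ useMasterKey: true })));
// console.log(`slowdown: ${(before / after).toFixed(1)}x`);
```

Using the median rather than the mean keeps a single cold-cache run from dominating the comparison.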

Expected Results

We expected the regex queries to ignore the case-insensitive indexes entirely and to perform as they did on the parse-server version we ran before upgrading to Parse 4.

Actual Outcome

The regex queries are attempting to use the case-insensitive indexes, and are performing ~4x worse.

Environment Setup

  • Server

    • parse-server version : 4.1.0
    • Operating System: various
    • Hardware: various
    • Localhost or remote server?: localhost and Heroku
  • Database

    • MongoDB version: 3.6.12
    • Storage engine: WiredTiger
    • Hardware: unknown
    • Localhost or remote server?: mLab

Logs/Trace

We can add more detail here if anyone has trouble reproducing the issue.
