IKeyManager / XmlKeyManager - Fail intermittently in load balanced scenarios #50440

@Dhugal

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

The ASP.NET Core Data Protection APIs appear to fail intermittently in some load-balanced scenarios.
Specifically, key generation is not locked across instances, so a key generated on one instance is not always visible to the others.

In our case, we're using the ASP.NET Core Data Protection APIs in a load-balanced environment (Azure App Service with scaled-out instances running an ASP.NET Core 7 app).

Occasionally, multiple instances start simultaneously and each does the following:

  1. Reads the existing keys. There may be none, or the existing keys may have expired.
  2. Creates a new key, adds it to the store, and then reads the store again.
  3. The keys added by the later instances are not available on the earlier ones.

As we're using these for auth cookies, this means that authenticated users hitting some instances are not validated and therefore have to log back into the application.

The following shows an example of what I believe is happening in a simple scenario with 2 instances:

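| Instance A                      | Instance B                      |
|---------------------------------|---------------------------------|
| Read Keys (No Valid Keys Found) |                                 |
|                                 | Read Keys (No Valid Keys Found) |
| Create Key A                    |                                 |
| Read Keys                       |                                 |
|                                 | Create Key B                    |
|                                 | Read Keys                       |
| Key A loaded                    | Keys A & B loaded               |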

This means a user authenticated on Instance B may get issued a cookie using Key B and their subsequent requests to Instance A (via a load balancer) fail authentication (but requests to B work).
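
The interleaving described above can be reproduced with a small, language-neutral model. This is only a sketch in Python, not the ASP.NET Core API: `SharedKeyStore`, `Instance`, and `simulate_race` are hypothetical names, the store is an in-memory list standing in for the shared key repository, and key expiry is ignored.

```python
class SharedKeyStore:
    """Stand-in for the key repository shared by all instances (assumption)."""

    def __init__(self):
        self.keys = []

    def read_all(self):
        return list(self.keys)  # snapshot of the store at this moment

    def add(self, key_id):
        self.keys.append(key_id)


class Instance:
    """One app instance with its own in-memory key ring."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.key_ring = []

    def read_keys(self):
        self.key_ring = self.store.read_all()
        return self.key_ring

    def create_key(self):
        key_id = f"Key {self.name}"
        self.store.add(key_id)
        return key_id

    def can_unprotect(self, key_id):
        # A cookie can only be validated if its key is in the local key ring.
        return key_id in self.key_ring


def simulate_race():
    store = SharedKeyStore()
    a, b = Instance("A", store), Instance("B", store)
    # Both instances read before either has written: no valid keys found.
    a.read_keys()
    b.read_keys()
    # A creates its key and re-reads before B has created anything.
    a.create_key()
    a.read_keys()
    # B creates its key and re-reads, so it sees both keys.
    key_b = b.create_key()
    b.read_keys()
    return a, b, key_b


a, b, key_b = simulate_race()
print(a.key_ring)              # ['Key A']
print(b.key_ring)              # ['Key A', 'Key B']
print(a.can_unprotect(key_b))  # False: a cookie issued by B fails on A
```

In this model, Instance A never learns about Key B until it next refreshes its key ring, which matches the failed authentication seen when the load balancer routes a request to A.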

We're trying to mitigate this by implementing a cross-instance lock (in our case we use a DistributedLock, backed by SQL Server).

The problem is that we don't appear to have a good place to lock the process of reading and writing the new key, as they're separate methods. We can wrap the call to read the keys or create a key in a lock but that doesn't help as we can still get the overlap shown above between these two steps.

What we'd like to be able to do is apply a lock around the process which both reads the keys and generates a new one if required, so that the next instance will always get the key generated by the first one to start and will therefore not generate another one.

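| Instance A                      | Instance B              |
|---------------------------------|-------------------------|
| Acquire Lock                    |                         |
| Read Keys (No Valid Keys Found) | Attempt to Acquire Lock |
| Create Key A                    | Wait for lock           |
| Read Keys                       | Wait for lock           |
| Release Lock                    | Wait for lock           |
|                                 | Acquire Lock            |
|                                 | Read Keys               |
|                                 | Release Lock            |
| Key A loaded                    | Key A loaded            |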
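
To illustrate why the lock has to span both steps, here is a minimal Python sketch (not ASP.NET Core code). It assumes a `threading.Lock` as a stand-in for the cross-instance lock and a plain list as the shared store; the `run` function and the `threading.Barrier` are hypothetical test scaffolding, with the barrier forcing the worst-case ordering in which both instances read before either writes.

```python
import threading


def run(lock_spans_both_steps):
    store = []                      # shared key store (assumption: a list)
    lock = threading.Lock()         # stand-in for the cross-instance lock
    barrier = threading.Barrier(2)  # forces both reads to happen before any write

    def instance(name):
        if lock_spans_both_steps:
            # Read and conditional create happen under one lock.
            with lock:
                if not store:
                    store.append(f"Key {name}")
        else:
            # Each call is locked on its own, as when wrapping the
            # read and the create separately.
            with lock:
                found = list(store)
            barrier.wait()
            if not found:
                with lock:
                    store.append(f"Key {name}")

    threads = [threading.Thread(target=instance, args=(n,)) for n in "AB"]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store


print(len(run(lock_spans_both_steps=False)))  # 2: both instances created a key
print(len(run(lock_spans_both_steps=True)))   # 1: the second instance reuses the first key
```

Locking each call individually still leaves the gap between "read" and "create" unprotected, so both instances create a key. With a single lock around both steps, whichever instance runs second finds the first instance's key and does not create another.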

The following section of KeyRingProvider implements locking, which implies this scenario has been considered, but it doesn't cater for multi-instance scenarios:

https://github.com/dotnet/aspnetcore/blob/2dc991393c29a65df82efdb75e8467b7ace5bb74/src/DataProtection/DataProtection/src/KeyManagement/KeyRingProvider.cs#L180C17-L180C17

Expected Behavior

I believe this could be solved if the IKeyManager (and the default XmlKeyManager) provided a means to wrap/lock the process of reading the keys and determining if they're valid as well as generating a new one.

For example, a new method such as IKeyManager.ProcessingKeys (which would call GetAllKeys and, when required, CreateNewKey).

We could then override this and place a lock around the process.
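
The shape of that extension point might look like the following. This is a toy Python model of the idea only: `KeyManager`, `LockedKeyManager`, and `process_keys` are hypothetical names chosen to mirror the proposal, they are not existing ASP.NET Core members, and key expiry is again ignored.

```python
import threading


class KeyManager:
    """Toy key manager over a shared list; not the real IKeyManager API."""

    def __init__(self, store):
        self.store = store

    def get_all_keys(self):
        return list(self.store)

    def create_new_key(self):
        key = f"key-{len(self.store) + 1}"
        self.store.append(key)
        return key

    def process_keys(self):
        # Hypothetical combined step: read, create only if needed, re-read.
        keys = self.get_all_keys()
        if not keys:
            self.create_new_key()
            keys = self.get_all_keys()
        return keys


class LockedKeyManager(KeyManager):
    """Override that wraps the whole combined step in a lock."""

    def __init__(self, store, lock):
        super().__init__(store)
        self.lock = lock  # stand-in for a distributed lock

    def process_keys(self):
        with self.lock:
            return super().process_keys()


shared_store = []
shared_lock = threading.Lock()
managers = [LockedKeyManager(shared_store, shared_lock) for _ in range(2)]

results = []
threads = [
    threading.Thread(target=lambda m=m: results.append(m.process_keys()))
    for m in managers
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [['key-1'], ['key-1']]
```

Because the override covers the read and the conditional create as one unit, the caller only needs a single place to apply the lock, which is what the separate read and create methods do not offer today.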

Steps To Reproduce

No response

Exceptions (if any)

No response

.NET Version

7.0.400

Anything else?

No response
