
0500 Intermittent or Periodic Authentication Issue

Malcolm Stewart edited this page Nov 2, 2020 · 9 revisions


0500.1 Typical Error Messages

  • Cannot generate SSPI context
  • Login failed for user '(null)'
  • Login failed for user ''
  • Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'
  • Login failed for user 'JohnDoe'
  • Login failed for user 'Contoso\JohnDoe'
  • Login failed. The login is from an untrusted domain and cannot be used with Windows authentication.

In the context of this workflow, the word "Client" refers to the immediate client of SQL Server; for example, in a 3-tier application, the client could be a web server.

0500.2 Moving Parts

The initial goal is to try to isolate which of the moving parts is causing the problem.

[Diagram: Moving Parts]

0500.3 Appropriate Expectations

0500.3.1 This issue may take a while to resolve, depending on how frequently the problem occurs, whether it affects one of many clients, all clients, or the application server, and whether it happens more often at set times of day, e.g. during busy periods, backups, or reindexing.
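To see whether failures cluster at particular times of day (busy periods, backups, or reindexing jobs), it can help to tally the timestamps of failed logins by hour. The following is a minimal Python sketch; it assumes you have already extracted the failure timestamps (for example, from application logs or the SQL Server ERRORLOG) as ISO 8601 strings, and the function name and sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime


def failures_by_hour(timestamps):
    """Count failure timestamps per hour of day (0-23).

    `timestamps` is an iterable of ISO 8601 strings such as
    "2020-11-02T02:15:00" (an assumed input format).
    """
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    # Return every hour, including those with zero failures, so gaps are visible.
    return {hour: hours.get(hour, 0) for hour in range(24)}


if __name__ == "__main__":
    sample = [
        "2020-11-02T02:05:10",  # hypothetical failures during a nightly backup window
        "2020-11-02T02:40:55",
        "2020-11-03T02:12:31",
        "2020-11-03T09:01:02",
    ]
    for hour, count in failures_by_hour(sample).items():
        if count:
            print(f"{hour:02d}:00  {'#' * count}  ({count})")
```

A spike in one or two hours that lines up with a scheduled job is a strong hint about where to look next; a flat distribution suggests the cause is not tied to the schedule.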

0500.3.2 The most common causes are SQL Server performance problems or slow Domain Controller responses. If NTLM is used, LSASS can become a bottleneck, limiting how many new connections can be processed at once; additional requests get backed up and may time out. Some causes, such as antivirus software, can be difficult to prove, but they are common nonetheless and should be investigated even without hard proof if other avenues of inquiry do not show promise.
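The effect of a fixed number of concurrent authentication slots can be illustrated with a small queueing simulation. This Python sketch is not a model of LSASS internals: the slot count, per-request service time, and client timeout are hypothetical parameters chosen only to show how a burst of new connections can exceed the client timeout once requests start to back up.

```python
import heapq


def simulate_auth_queue(arrivals, service_time, slots, timeout):
    """Simulate first-come, first-served processing with a fixed number of slots.

    arrivals:     request arrival times in seconds
    service_time: seconds each authentication occupies a slot (hypothetical)
    slots:        requests that can be processed at once (hypothetical)
    timeout:      seconds a client waits for a slot before giving up
    Returns (completed, timed_out).
    """
    free_at = [0.0] * slots  # min-heap: time at which each slot becomes free
    completed = timed_out = 0
    for t in sorted(arrivals):
        start = max(t, free_at[0])  # earliest moment a slot is available
        if start - t > timeout:
            timed_out += 1  # the client gives up; no slot is consumed
            continue
        heapq.heapreplace(free_at, start + service_time)
        completed += 1
    return completed, timed_out


if __name__ == "__main__":
    burst = [0.0] * 10  # ten new connections arriving at the same moment
    print(simulate_auth_queue(burst, service_time=1.0, slots=2, timeout=2.5))   # (6, 4)
    print(simulate_auth_queue(burst, service_time=1.0, slots=10, timeout=2.5))  # (10, 0)
```

Increasing `service_time` (a slower domain controller) produces timeouts with the same slot count, which is why slow DC responses and a concurrency limit compound each other.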

0500.4 Pre-Work

0500.4.1 Please perform the initial data collection and narrowing steps in 0100 Initial Data Collection and Scoping Questions. This gives a macro perspective on the scope of the issue, such as whether it affects multiple computers or just one, or only the computers in a specific data center, which helps focus the troubleshooting steps. It also prepares you to discuss the issue with Microsoft Support, should you choose to contact them.

Review the public troubleshooting documents listed in section 0015 Self-Help Articles.

0500.4.2 Make sure you understand the application architecture. Write a succinct summary, similar to the following example:

  • There are two domains involved: CONTOSO and FABRIKAM.
  • The client (SPARKY.CONTOSO.COM) is Windows 2012.
  • The user (CONTOSO\JOHNDOE) runs Edge and connects to a web server (_HTTP://WEB01.CONTOSO.COM/Accounting) using integrated security.
  • The IIS app pool runs as (CONTOSO\WEB_SVC).
  • The web server connects to SQL Server 2014 (SQLProd01.FABRIKAM.COM\Accounting on port 1433) using the SqlClient .NET 4.6.2 Provider and delegates the user credentials to SQL Server via integrated security.
  • The SQL Server service account is FABRIKAM\SQL_SVC_01.
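Writing the summary in a structured form makes it easier to derive what each hop depends on. For example, Kerberos connections to the SQL Server in the summary rely on MSSQLSvc SPNs registered on the SQL Server service account. The following Python sketch lists the SPNs expected for an instance from the summary's host name, port, and instance name; the helper name is hypothetical, and the sketch covers only the SQL Server hop, not the web server's HTTP SPN or delegation settings.

```python
def expected_sql_spns(fqdn, port, instance=None):
    """Return MSSQLSvc SPNs for a SQL Server instance (hypothetical helper).

    Formats: MSSQLSvc/<FQDN>:<port> and, for a named instance,
    MSSQLSvc/<FQDN>:<instancename>.
    """
    spns = [f"MSSQLSvc/{fqdn}:{port}"]  # port-based form
    if instance:
        spns.append(f"MSSQLSvc/{fqdn}:{instance}")  # named-instance form
    return spns


if __name__ == "__main__":
    # Values taken from the example architecture summary above.
    summary = {
        "sql_fqdn": "SQLProd01.FABRIKAM.COM",
        "sql_instance": "Accounting",
        "sql_port": 1433,
        "sql_service_account": r"FABRIKAM\SQL_SVC_01",
    }
    for spn in expected_sql_spns(summary["sql_fqdn"], summary["sql_port"],
                                 summary["sql_instance"]):
        print(f"{spn}  ->  expected on {summary['sql_service_account']}")
```

The printed list can then be compared with the output of `setspn -L FABRIKAM\SQL_SVC_01`.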

0500.5 Order of Troubleshooting

In general, troubleshooting should be data driven, which may give way to empirical tests in a more focused context. If the issue is very intermittent and network traces will be difficult to capture, then the empirical methods may be applied first.

Since the issue is intermittent, we can assume that configuration, such as Kerberos SPNs, is essentially correct; a configuration error would typically cause the failure every time, not intermittently.

0500.5.1 Are multiple domains or data centers involved?
If multiple domains or data centers are involved, check whether users in the local domain or data center have a good experience while users in the other domain or data center do not. If so, the cause could be communication latency between data centers or between domain controllers. Use PING to check network latency, and use the RUNAS command with various users to test credential validation latency. These tests exclude SQL Server and can reveal a more fundamental issue with the networking infrastructure or domain controller performance.
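PING measures ICMP round-trip time. As a complementary check that needs no special privileges, the following Python sketch repeatedly times TCP connections to a service port, for example a domain controller's Kerberos (88) or LDAP (389) port, and summarizes the spread. The host name is a placeholder; TCP connect time approximates, but is not identical to, PING latency, and it does not exercise credential validation the way RUNAS does.

```python
import socket
import statistics
import time


def tcp_connect_times(host, port, samples=5, timeout=3.0):
    """Return TCP connect times in milliseconds (None for each failed attempt)."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            results.append(None)  # refused, unreachable, name lookup failure, or timeout
    return results


def summarize(times):
    """Summarize connect times, counting failures separately."""
    ok = [t for t in times if t is not None]
    return {
        "samples": len(times),
        "failures": len(times) - len(ok),
        "min_ms": min(ok) if ok else None,
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


if __name__ == "__main__":
    # Hypothetical domain controller name; run against a DC in each data center.
    print(summarize(tcp_connect_times("dc01.fabrikam.com", 88)))
```

Running the same script from a client in each data center against the same domain controllers makes cross-site latency differences easy to compare.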

0500.5.2 SQL Server ERRORLOG
The SQL Server ERRORLOG may reveal performance issues on SQL Server, such as entries indicating that I/O requests were taking longer than 15 seconds. If you find such entries, engage the SQL Performance team to have PSSDIAG run and the results analyzed. You may want to do this anyway if the network trace reveals delays in SQL Server responses.
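To find the long-I/O entries quickly, you can scan the ERRORLOG text. This Python sketch assumes the message wording "SQL Server has encountered N occurrence(s) of I/O requests taking longer than 15 seconds to complete ..." and a leading "YYYY-MM-DD hh:mm:ss.cc" timestamp on each line; verify both against your own ERRORLOG. The sample line is hypothetical, and ERRORLOG files may be UTF-16 encoded, so open them with the matching encoding.

```python
import re

# Assumed wording of the long-I/O warning; adjust if your build differs.
LONG_IO = re.compile(
    r"encountered (\d+) occurrence\(s\) of I/O requests taking longer than "
    r"(\d+) seconds"
)
# Assumed ERRORLOG line prefix: "YYYY-MM-DD hh:mm:ss.cc".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2})")


def find_long_io(lines):
    """Yield (timestamp or None, occurrences, seconds) for each long-I/O warning."""
    for line in lines:
        match = LONG_IO.search(line)
        if match:
            ts = TIMESTAMP.match(line)
            yield (ts.group(1) if ts else None, int(match.group(1)), int(match.group(2)))


if __name__ == "__main__":
    sample = [
        # Hypothetical ERRORLOG lines for illustration.
        r"2020-11-02 10:15:30.12 spid7s      SQL Server has encountered 3 occurrence(s) "
        r"of I/O requests taking longer than 15 seconds to complete on file "
        r"[D:\Data\Accounting.mdf] in database [Accounting] (5).",
        "2020-11-02 10:16:00.00 Logon       Login succeeded for user 'CONTOSO\\JohnDoe'.",
    ]
    for hit in find_long_io(sample):
        print(hit)
```

To scan a real file, pass an open file object instead of the sample list, for example `find_long_io(open(path, encoding="utf-16"))`.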

The ERRORLOG may also include other domain-related errors, such as the following, which indicate some sort of Active Directory performance or connectivity issue:

	SSPI handshake failed with error code 0x80090311 while establishing a connection with integrated security; the connection has been closed.
	SSPI handshake failed with error code 0x80090304 while establishing a connection with integrated security; the connection has been closed.

These codes translate as follows:
	Error -2146893039 (0x80090311): No authority could be contacted for authentication.
	Error -2146893052 (0x80090304): The Local Security Authority cannot be contacted.
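The same code can appear as a signed 32-bit decimal value or as an unsigned hexadecimal value, as in the translations above. The Python sketch below converts between the two forms and extracts the code from an "SSPI handshake failed" line; the lookup table contains only the two codes listed in this section, and the function names are hypothetical.

```python
import re

# Only the two codes discussed in this section; extend as needed.
SSPI_ERRORS = {
    0x80090311: "No authority could be contacted for authentication.",
    0x80090304: "The Local Security Authority cannot be contacted.",
}

HANDSHAKE = re.compile(r"SSPI handshake failed with error code (0x[0-9A-Fa-f]{8})")


def to_signed(code):
    """Interpret an unsigned 32-bit code as a signed 32-bit integer."""
    return code - 0x1_0000_0000 if code & 0x8000_0000 else code


def to_unsigned(code):
    """Map a signed 32-bit integer to its unsigned 32-bit (hex) form."""
    return code & 0xFFFF_FFFF


def describe_handshake_line(line):
    """Return (hex code, signed value, description), or None if the line does not match."""
    match = HANDSHAKE.search(line)
    if not match:
        return None
    code = int(match.group(1), 16)
    return f"0x{code:08X}", to_signed(code), SSPI_ERRORS.get(code, "Unknown code")


if __name__ == "__main__":
    line = ("SSPI handshake failed with error code 0x80090311 while establishing "
            "a connection with integrated security; the connection has been closed.")
    print(describe_handshake_line(line))
    # ('0x80090311', -2146893039, 'No authority could be contacted for authentication.')
    print(hex(to_unsigned(-2146893052)))  # 0x80090304
```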

0500.5.3 Examine Client System Event Log
The System event log may contain events from sources such as KERBEROS, LSA, and NETLOGON indicating that the computer was not able to connect to a domain controller for a period of time. To make them easier to find, filter on Error, Warning, and Critical events only. The event times should correlate with the time of the outage. If there is a match, the problem is likely an Active Directory issue.

In some cases, this may happen on the SQL Server itself, so check the System event log on that machine as well. The following example was logged on the SQL Server:

Source: NETLOGON
Date: 8/12/2012 8:22:16 PM
Event ID: 5719
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: SQLPROD01
Description:
This computer was not able to set up a secure session with a domain controller in domain CONTOSO due to the following: The remote procedure call was cancelled. This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.
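The filtering described above can also be scripted once the events have been exported. The following Python sketch assumes each event is a dictionary with `source`, `id`, `level`, and `time` fields; this is a hypothetical layout, not an Event Viewer export format. It keeps Critical, Error, and Warning events whose source contains "kerberos", "lsa", or "netlogon" (case-insensitive) and that fall within a window around the outage.

```python
from datetime import datetime, timedelta

LEVELS = {"Critical", "Error", "Warning"}
# Matched as case-insensitive substrings, so "LsaSrv" or
# "Microsoft-Windows-Security-Kerberos" also match.
SOURCES = ("kerberos", "lsa", "netlogon")


def events_near_outage(events, outage, window_minutes=15):
    """Return domain-related warnings/errors within +/- window_minutes of the outage."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for event in events:
        if event["level"] not in LEVELS:
            continue
        source = event["source"].lower()
        if not any(name in source for name in SOURCES):
            continue
        if abs(event["time"] - outage) <= window:
            hits.append(event)
    return sorted(hits, key=lambda e: e["time"])


if __name__ == "__main__":
    events = [
        # The NETLOGON 5719 example above.
        {"source": "NETLOGON", "id": 5719, "level": "Error",
         "time": datetime(2012, 8, 12, 20, 22, 16)},
        # Hypothetical unrelated event that should be filtered out.
        {"source": "Service Control Manager", "id": 7036, "level": "Information",
         "time": datetime(2012, 8, 12, 20, 23, 0)},
    ]
    outage = datetime(2012, 8, 12, 20, 25)  # hypothetical time users reported failures
    for e in events_near_outage(events, outage):
        print(e["time"], e["source"], e["id"])  # 2012-08-12 20:22:16 NETLOGON 5719
```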
