We just upgraded to ADFS 4.0 from 2.x. Now in the space of 3 weeks we've had two different ADFS servers lose their NetLogon secure channel session with a DC. The symptoms as found in the event log:
In the System log, a Tcpip event 4227 warning:
TCP/IP failed to establish an outgoing connection because the selected local endpoint was recently used to connect to the same remote endpoint. This error typically occurs when outgoing connections are opened and closed at a high rate, causing all available local ports to be used and forcing TCP/IP to reuse a local port for an outgoing connection. To minimize the risk of data corruption, the TCP/IP standard requires a minimum time period to elapse between successive connections from a given local endpoint to a given remote endpoint.
Then a minute later, also in the System log, a NETLOGON 5719 error:
This computer was not able to set up a secure session with a domain controller in domain NETID due to the following:
The RPC server is unavailable.
And immediately after, in the "AD FS/Admin" log event 342 errors of the form:
I am sending some of the perf counters to our Graphite system, and one of them is "\TCPv4\Connections Established". I see nothing unusual in this count around the time the Tcpip event 4227 was recorded. Maybe that event is a red herring.
NetLogon debug logging was not enabled so nothing to check there. I can turn up the debug log level so we have something the next time this happens, but I first wanted to ask if anyone else has seen this. My initial searching did not turn up much.
ADFS losing connection to DC
- Last Post 20 June 2018
Was MaxConcurrentApi altered on the ADFS server?
"nltest /dbflag:0x2080ffff" should be enough to catch MaxConcurrentApi related issues.
And if it's at the TCP layer, you might want to reduce the default TcpTimedWaitDelay from 120 seconds to somewhere around 30-60 seconds:
Thanks for your response. Yes, I've set the NetLogon debug flags on all of the servers in the farm. I haven't made any changes to MaxConcurrentApi though.
I did see articles on changing TcpTimedWaitDelay, but I don't want to mess with that unless the problem recurs and the NetLogon logs indicate that it would help.
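For a rough sense of what a TcpTimedWaitDelay change would buy, here is a back-of-the-envelope sketch. It assumes the default Windows dynamic port range (49152-65535) and that the ADFS server is the side closing the connections, so each closed connection holds its local port for the full delay. The function name and the numbers fed into it are illustrative only, not measurements from this farm.

```python
# Rough ceiling on new outbound connections per second to ONE remote endpoint
# (same remote IP and port) before a local port has to be reused while its
# previous connection is still in TIME_WAIT (the condition Tcpip 4227 reports).

# Assumption: default Windows dynamic port range, 49152-65535.
DYNAMIC_PORTS = 65535 - 49152 + 1  # 16384 ports


def max_connection_rate(time_wait_seconds: int, ports: int = DYNAMIC_PORTS) -> float:
    """Each closed connection pins its local port for time_wait_seconds,
    so the sustainable rate is roughly ports / time_wait_seconds."""
    if time_wait_seconds <= 0:
        raise ValueError("time_wait_seconds must be positive")
    return ports / time_wait_seconds


if __name__ == "__main__":
    # 120 s is the default mentioned above; 60 s and 30 s are the suggested range.
    for delay in (120, 60, 30):
        rate = max_connection_rate(delay)
        print(f"TcpTimedWaitDelay={delay:>3}s -> ~{rate:.0f} connections/s")
```

If the observed connection rate to a single DC is nowhere near these ceilings, the 4227 warning is less likely to be explained by plain port exhaustion, and the NetLogon debug log is the better place to look.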
Have you got Firepower or Palo Alto on the edge? Or some other filter? I haven’t seen this, but my gut is telling me this is a network device killing traffic because it’s classifying it as a DoS attack or something. I would make sure everything is whitelisted to the Office 365 IPs just to rule that out first.
Just to be clear: TcpTimedWaitDelay is not related to MaxConcurrentApi and in the past I have witnessed some application servers (