Disk Write Caching and DCs

Biju_Babu posted this 29 March 2017

Hello,

It is well known that AD doesn’t like disk write caching and that it should be disabled.

What I am curious about: some well-known blogs (here and here) say that when a server is promoted to a DC, write caching is disabled by default, and that if the disk supports SCSI FUA (Forced Unit Access) it may not be a concern. I couldn’t find an MS article on this, though. Is this true?

If so: on our VMware DCs, disk write caching still shows as enabled (though I don’t see event ID 1539 on any of these boxes). Is there a better way to check whether it is actually enabled? (dskcache.exe fails with an I/O error on VMware VMs.)

Thoughts? Thanks.
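Since dskcache.exe fails on the VMware guests, one read-only check from inside the guest is the registry value that, as far as I can tell, backs the Device Manager write-cache checkbox: UserWriteCacheSetting under each disk instance’s Device Parameters\Disk key. Treat the key and value names as an assumption and compare against Device Manager on a test box before trusting it. A minimal Python sketch:

```python
# Sketch: enumerate SCSI disk instances and report the write-cache policy
# value that appears to back the Device Manager checkbox.
# Assumption: the DWORD "UserWriteCacheSetting" under
# ...\Device Parameters\Disk reflects that checkbox (0 = off, 1 = on);
# when it is absent, the driver default applies. Run inside the guest.
import winreg

ENUM_SCSI = r"SYSTEM\CurrentControlSet\Enum\SCSI"

def subkeys(key):
    """Yield the names of all subkeys of an open registry key."""
    i = 0
    while True:
        try:
            yield winreg.EnumKey(key, i)
            i += 1
        except OSError:
            return

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, ENUM_SCSI) as scsi:
    for device in subkeys(scsi):                     # e.g. Disk&Ven_VMware&...
        if not device.lower().startswith("disk"):
            continue                                 # skip CD-ROMs etc.
        with winreg.OpenKey(scsi, device) as dev_key:
            for instance in subkeys(dev_key):
                params_path = "\\".join(
                    [ENUM_SCSI, device, instance, "Device Parameters", "Disk"])
                try:
                    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                                        params_path) as params:
                        value, _ = winreg.QueryValueEx(
                            params, "UserWriteCacheSetting")
                        state = "enabled" if value else "disabled"
                except OSError:
                    state = "not set (driver default / not exposed)"
                print(f"{device}\\{instance}: write caching {state}")
```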

Biju_Babu posted this 04 April 2017

Hello,

 

Any suggestions on the below question?

 

Rgds



 


Biju_Babu posted this 14 April 2017

Information shared by an MS expert, in case anyone is curious.



 



Disk caching

 

There is a significant difference between virtual disks and physical disks. Virtual disks are most likely fully controlled by the VM host, not by the guest itself. For example, your screenshot was of a VMware virtual SCSI drive. For comparison, here are the settings from a Hyper-V IDE drive:

 

[screenshot of the Hyper-V IDE disk’s Policies settings not included]
 

Notice changing the cache setting is not an option.  It’s controlled automatically by the enlightened Hyper-V driver.  Your VMware SCSI disk driver doesn’t even expose the write-caching policy.

 

However, our recommendation for virtual DCs is to use virtual SCSI and not virtual IDE:
https://technet.microsoft.com/en-us/library/virtualactivedirectorydomaincontrollervirtualizationhyperv(v=ws.10).aspx

That forces writes to occur to the media device, bypassing any caching. After that, it is up to the hardware of the VM host to ensure the data is written to storage in a durable manner. So the settings you’re looking for don’t exist, as they have no value in the scenario you’re looking at.
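(A toy illustration of the write-through point above, not anything from AD or ESE itself: at application level, “bypassing any caching” means the write call does not return until the OS reports the data has been flushed to the device, which is what fsync / Forced Unit Access provide. Whether the hypervisor and storage underneath honour that flush is exactly the part the guest cannot control.)

```python
# Toy illustration only: a "durable" append that does not return until the
# OS says the data has been flushed to the device. This mirrors the
# write-through / FUA semantics DC database and log writes rely on; it is
# not ESE code, and the layers below the guest must still honour the flush.
import os

def durable_append(path: str, data: bytes) -> None:
    """Append data, then block until the OS reports it is on stable storage."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()              # push Python's buffer down to the OS
        os.fsync(f.fileno())   # push the OS cache down to the device

durable_append("toy-transaction.log", b"transaction record\n")
```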

 

Also note that the ACID nature of the ESE database helps ensure that the DIT should always be in a consistent state, due to the roll-forward or roll-backward nature of its transaction log. It’s possible that in-flight data is still lost, but the database itself should have full integrity after startup and should not be left unmountable. See Extensible Storage Engine Architecture:
https://technet.microsoft.com/en-us/library/aa998171(v=exchg.65).aspx
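(Again purely illustrative, and nothing to do with ESE’s real on-disk format: the roll-forward idea is that every change is made durable in a log before the main file is touched, so after a crash you replay the log and the database ends up consistent, losing at most in-flight work. A toy sketch, with made-up file names:)

```python
# Toy write-ahead-log / roll-forward sketch -- illustrates the recovery idea
# only. The JSON "database" and file names are invented for the example;
# ESE's checkpointing, roll-back and on-disk format are far richer.
import json, os

LOG, DB = "toy.log", "toy.db.json"

def _load_db():
    try:
        with open(DB) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def _write_db(db):
    """Write the whole database to a temp file, fsync it, then swap it in
    atomically so the main file is never left half-written."""
    tmp = DB + ".tmp"
    with open(tmp, "w") as f:
        json.dump(db, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, DB)

def apply_update(key, value):
    """Log the change durably first, then apply it to the main file."""
    with open(LOG, "a") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())      # the log record is now on stable storage
    db = _load_db()
    db[key] = value
    _write_db(db)                   # crash here? recover() replays the log

def recover():
    """Roll forward: replay every logged change into the main file, then
    truncate the log. Replaying an already-applied change is harmless."""
    db = _load_db()
    if os.path.exists(LOG):
        with open(LOG) as log:
            for line in log:
                rec = json.loads(line)
                db[rec["key"]] = rec["value"]
        _write_db(db)
        open(LOG, "w").close()      # everything in the log is now applied

recover()                           # run at startup, before taking new updates
apply_update("dc01", "promoted")
```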

 

 


webster posted this 14 April 2017

I find this statement interesting:

 

"Maintain physical domain controllers in each of your domains. This mitigates the risk of

a virtualization platform malfunction that affects all host systems that use that platform."

 

I thought that was "old" thinking.

 

 

Webster

 


a-ko posted this 15 April 2017

Unless you’re using Shielded VMs (https://technet.microsoft.com/en-us/windows-server-docs/security/guarded-fabric-shielded-vm/guarded-fabric-and-shielded-vms), your DCs should be physical if you care about the security of the database.

 

That said, a separate VM infrastructure would also work, as long as the VM Administrators are considered Domain Admins in the process. If they’re not, then your DC strategy needs to change.

 

Physical makes it much easier…

 


ken posted this 18 April 2017

They’re pointing out that it’s another SPOF – if you can survive and restore, or you’re too small to justify the extra hardware/redundancy, then you can ignore that advice.

 


g4ugm posted this 18 April 2017

If everything else is in the virtual platform, and that is certainly a trend I have seen, then when it fails, having a physical DC is of little comfort….

Dave


ken posted this 19 April 2017

I agree, it’s certainly not some kind of “get out of jail free” card that will solve all your problems. However, there may be failure scenarios where something underneath your virtualisation platform has failed; if it’s hard-configured to use AD as an authentication store and you don’t have some kind of local “break glass” account, then you need to contact a DC somewhere to be able to get in and reconfigure/bring the device back up. Having separate virtualisation farms/clusters can also help mitigate this risk in a similar manner, though. It’s just like every other SPOF – you just need to look at whether eliminating it actually does anything to risk in your environment.

 

 


patrickg posted this 19 April 2017

Been running 100% virtual DCs for 7+ years without any problems. The bigger concern is “easy” fiscal budget cuts that introduce single-point-of-failure stacks. The “physical DC” scenario these days is a crutch for those who cannot trust their virtualization platform; taken a bit further, that reasoning would question why you are running any prod workloads as VMs.

 

Some general guidelines about spreading virtual DCs:

1) Across multiple virtual clusters
2) Across multiple pools of storage
3) Across multiple switching stacks
4) Across multiple VLANs
5) Don’t sync time with the hypervisor
6) Make the required registry changes to point the time servers somewhere other than the default (see the sketch after this list)
7) Don’t run snapshot-based backups on anything older than 2012 R2, and if doing snapshot-based backups on all DCs, make sure to stagger them (preferably, don’t back them all up)
8) On the VM configs, place RAM reservations for the full memory size of your DCs.
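For items 5 and 6 above, the relevant knobs live under the W32Time service key. A quick read-only Python check of where a virtual DC is actually getting time from (Type and NtpServer are the standard W32Time parameters; the VMICTimeProvider key is the Hyper-V time-sync provider and is an assumption here – it will simply read as absent on VMware):

```python
# Read-only check of a DC's time configuration. Value names Type/NtpServer
# are the documented W32Time parameters; VMICTimeProvider (Hyper-V guest
# time sync) is assumed and will show as "not present" elsewhere.
import winreg

W32TIME = r"SYSTEM\CurrentControlSet\Services\W32Time"

def reg_value(subkey, name):
    """Return a value from HKLM, or None if the key or value is absent."""
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
            return winreg.QueryValueEx(key, name)[0]
    except OSError:
        return None

sync_type = reg_value(W32TIME + r"\Parameters", "Type")       # NT5DS, NTP, NoSync...
ntp_peers = reg_value(W32TIME + r"\Parameters", "NtpServer")  # used when Type is NTP
hv_sync   = reg_value(W32TIME + r"\TimeProviders\VMICTimeProvider", "Enabled")

print(f"Sync type             : {sync_type}")
print(f"NTP peer list         : {ntp_peers}")
print(f"Hyper-V time provider : {'enabled' if hv_sync else 'disabled / not present'}")
```

Typically only the PDC emulator is pointed at external NTP peers (Type = NTP); the rest sync from the domain hierarchy (NT5DS), and on all of them the hypervisor time-sync integration should be off.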

 

If you’re overloading the write performance of a DC, then log less, move to SSDs, or add more DCs.

 

With regard to disabling write caching, I have it disabled on most DCs, but not all. I’ve seen little performance difference if the disk is 15k or better. DC writes are heavily security-log traffic; otherwise the workload is mostly read-focused.

 


~Patrick

 

 

 


ken posted this 21 April 2017

Hmm – I must tell our finance guys to cancel our insurance – we don’t need that crutch because we trust our fire suppression and security systems

 

/grin, duck & run

 

😊

 


patrickg posted this 22 April 2017

If the solution is designed for a multi-site outage and can still function, I’m not worried. If you want a large reference point, take a look at O365’s architecture.

 


~Patrick

 


GuyTe posted this 23 April 2017

Well… now imagine a Y2K18-style bug, a rogue ESX/Hyper-V admin, or a 0-day targeting your virtualization platform, and you end up either having a secondary virtualization platform (managed by the AD team???) for redundancy or going back to physical servers.

Sometimes a crutch is all you need.

 

Guy

 


ken posted this 24 April 2017

With all due respect, something like O365:

  1. Is built relatively recently, in something closer to a greenfields environment than the legacy that most large enterprises have to deal with
  2. Is built by a company that would place Active Directory and its capabilities at the heart of the solution
  3. Isn’t a particularly complex solution offering (in terms of the capabilities and SLAs it needs to offer)

Not to take away from the complexity or the sheer scale at which O365 operates – it is still a technical and engineering accomplishment of the first order – but I don’t think it’s the type of evidence one can use to say that mitigations to SPOFs are “a crutch”.

 

I’d hazard a guess that my environment is a bit more complex than O365, due to the accumulated legacy cruft, technology diversity, regulatory requirements and customer expectations (we probably handle as much money in a week as O365 bills in a year, and most customers would be fairly intolerant of even the smallest mistakes). Sometimes a “crutch” is just easier/cheaper than trying to work out whether you can survive without it, let alone redesigning everything to today’s best practises.

 

Cheers

Ken

 

 

