The Active Directory Database Epoch / Copy Protection

This blog post describes in detail the perhaps not so well-known Active Directory Database Epoch / Copy Protection.

This concept was introduced with the initial release of Active Directory Application Mode (ADAM), but for some good reasons it never made it into Active Directory Domain Services (AD DS) until Longhorn/WS2008. Some ADAM history follows. One of the changes in Windows Server 2008 was that AD was exposed as a Windows service, allowing admins to stop, start and restart the service on DCs. This behavior had existed since day one in ADAM, which was first released as a standalone package on the web in November 2003 and also targeted Windows XP. In Windows Server 2003 R2, ADAM Service Pack 1 (SP1) was included as a Windows component (on CD2) but still shipped as a download for other operating systems. ADAM Service Pack 2 (SP2) was the last version to ship, apart from some QFEs and security updates, before the ADAM source code was merged into the Directory Service (DS) source depot and integrated into Windows builds. It became available again with Windows Server 2008/Windows 7, rebranded as Active Directory Lightweight Directory Services (AD LDS), as an installable role with the operating system; no more downloads are available.

Given the above, these are the potential issues and damage that this feature tries to protect against.

Before Windows Server 2008 and ADAM, it wasn’t that easy for, let’s say, average sysadmins to manually restore or replace the database (DIT) using unsupported restore methods: you needed to boot into Directory Services Restore Mode (DSRM) because the DB was locked by LSASS/ESE, and the database was located under the C:\Windows folder by default.

Now, with these requirements gone as of Windows Server 2008 and ADAM/AD LDS, Microsoft wanted to prevent some scenarios:

Potentially foreseen scenarios:

• Stop service, copy off database, restart service, make changes that replicate, stop service, copy old database back in.

• Stop service, copy off database from instance1, stop second service, copy database over data files for instance2.

Both of these scenarios break the Active Directory replication model, because two distinct changes could get the same <OriginatingInvocationID>:<OriginatingUSN> pair (let’s call this the ChangeID). If two changes have the same ChangeID, one of them will fail to replicate: the DSA will claim to have already “seen” a change with that ChangeID from the previous instance of the database, and will fail to replicate the new change with this ChangeID. Likewise, other partners of this instance will assume they have already seen any new changes whose ChangeIDs match changes this DSA previously replicated out.
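The collision can be illustrated with a small sketch. This is a deliberately simplified model and not how the DRA is implemented: the `Change` and `Partner` classes, the invocation IDs "inv-A"/"inv-B" and the USN 101 are all made up for illustration, and a real partner tracks what it has seen per invocation ID rather than as a plain set of ChangeIDs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    invocation_id: str  # stands in for OriginatingInvocationID
    usn: int            # stands in for OriginatingUSN
    payload: str

    @property
    def change_id(self):
        # The <OriginatingInvocationID>:<OriginatingUSN> pair.
        return (self.invocation_id, self.usn)


@dataclass
class Partner:
    """Hypothetical replication partner that remembers applied ChangeIDs."""
    seen: set = field(default_factory=set)
    applied: list = field(default_factory=list)

    def replicate_in(self, change):
        if change.change_id in self.seen:
            return False  # believed to be already applied, so it is skipped
        self.seen.add(change.change_id)
        self.applied.append(change.payload)
        return True


partner = Partner()

# The DSA originates a change at USN 101 and replicates it out.
r1 = partner.replicate_in(Change("inv-A", 101, "description=old"))

# The database is swapped for an older copy, same invocation ID.
# The next originating write reuses USN 101 for a different change.
r2 = partner.replicate_in(Change("inv-A", 101, "description=new"))

# With a new invocation ID the same USN no longer collides.
r3 = partner.replicate_in(Change("inv-B", 101, "description=new"))

print(r1, r2, r3)
print(partner.applied)
```

In this model the second change is silently dropped because its ChangeID was already seen, while the third one goes through. That is why a new invocation ID is what makes a replaced database safe to replicate from again.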

The implementation: a database epoch is stored both in the database (DIT) and in the registry. During initialization of the DSA and the DB, a random value is written if no epoch exists yet; otherwise the current epoch + 1 is written. In both cases the value is written to the database (DIT), more specifically to the “epoch_col” column in the hiddentable, and to the “HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters\DSA Database Epoch” registry DWORD.

The rules are as follows:

  1. If both the registry and the database have NULL, it’s considered a match. The epoch is initially set with rand() in both the database and the registry.
  2. If the value stored in the database is greater than the value stored in the registry, it’s considered a match.
  3. If the value stored in the database and the value stored in the registry are equal, it’s considered a match.
  4. If either rule 2 or rule 3 applies, the epoch is advanced by 1 in both the database and the registry. If either update fails, the ESE update to the DIT is rolled back.
  5. If the “HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters\Disable DSA Database Epoch Check” registry DWORD is set to 1, it’s considered a match.
  6. If the values do not match per rules 1-3 above, a “restore” is forced to get a new invocationID for the DSA/DRA and the following event is logged:

Table 1: Epoch mismatch and restore initiated.

Event ID: 2524
Source: ActiveDirectory_DomainService
Category: Backup
Description: The Directory Server detected that the database has been replaced. This is an unsafe and unsupported operation. User Action: None. Active Directory Domain Services was able to recover the database in this instance, but this is not guaranteed in all circumstances. Replacing the database is strongly discouraged. The user is strongly encouraged to use the backup and restore facility to rollback the database.

The “restore” is forced by setting “state_col” in the “hiddentable” to “4” (aka “BackedupDIT”), as well as setting “usn_col” to the next USN - 1 and “backupexpiration_col” to the next day.

  • If retrieving the epoch fails for any reason other than the database column being null/nonexistent or the registry value being nonexistent, the DSA fails initialization and stops hard with the following error message:

Table 2: Epoch mismatch fatal

Event ID: 2542
Source: ActiveDirectory_DomainService
Category: Backup
Description: The Directory Server detected that the database has been replaced. This is an unsafe and unsupported operation. The service will stop until the problem is corrected. User Action: Restore the previous copy of the database that was in use on this machine. In the future, the user is strongly encouraged to use the backup and restore facility to rollback the database. This error can be suppressed and the database repaired by removing the following registry key.
Additional Data
Registry key: System\CurrentControlSet\Services\NTDS\Parameters
Registry value: DSA Database Epoch
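To make the rules easier to follow, here is a minimal sketch of the decision logic in Python. It is not the actual DSA code. The function name `check_epoch`, the use of `None` for NULL/nonexistent, the order in which the rules are evaluated, and what gets stored in the rule 5 and rule 6 cases are all assumptions on my part; the rules above only spell out the stored value for rules 1 and 4. The fatal case (event 2542) is left out.

```python
import random

MATCH = "match"
FORCED_RESTORE = "forced restore"  # the case that logs event 2524


def check_epoch(db_epoch, reg_epoch, disable_check=False):
    """Compare the epoch from the DIT with the one from the registry.

    None stands for NULL / nonexistent. Returns (outcome, epoch_to_store).
    """
    # Rule 1: no epoch anywhere yet, so seed a random initial value.
    # getrandbits(32) is only a stand-in for rand().
    if db_epoch is None and reg_epoch is None:
        return MATCH, random.getrandbits(32)

    # Rules 2 and 3: database epoch is greater than or equal to the registry epoch.
    if db_epoch is not None and reg_epoch is not None and db_epoch >= reg_epoch:
        return MATCH, db_epoch + 1  # rule 4: advance by 1 in both places

    # Rule 5: the check is disabled through the registry.
    # Assumption: the epoch is left untouched in this case.
    if disable_check:
        return MATCH, db_epoch

    # Rule 6: mismatch, a "restore" is forced to get a new invocationID.
    # Assumption: the sketch simply returns the database epoch unchanged.
    return FORCED_RESTORE, db_epoch


print(check_epoch(9, 9))        # normal restart -> ('match', 10)
print(check_epoch(7, 9))        # older copy put back -> ('forced restore', 7)
print(check_epoch(None, None))  # first init, random initial epoch
```

The second call is the "copy the old database back in" scenario from earlier: the registry has moved on to 9 while the replaced DIT still says 7, so the check fails and the restore path is taken.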

Note that this feature could have been implemented differently. Technically there is no need to advance the epoch on every init, nor even on every originating write to the database (DIT); it only really needs to change when a new originating write is replicated off the local DSA.

I wrote this blog post because I got a question: “So what happens when distribution DIT is mounted?” If you don’t know what the distribution DIT is, you can read about it here: https://blog.chrisse.se/?p=1005

The answer to the question can be figured out by reading this post (rule 1 above). The “epoch_col” is NULL in the distribution DIT, and when the DSA initializes on the distribution DIT for the first time the registry value doesn’t exist either. Per rule 1 that is considered a match, and a random value is written as the initial epoch to both the database (DIT) and the registry on the DSA.

Bonus: the “state_col” of a distribution DIT should be “1”.