Always ON Database Health Detection Scenario

Hi Mates,

Today I would like to share an interesting scenario and how you need to be cautious.

Database Level Health Detection has been introduced in SQL Server 2016 and it has been explained very well in the Link.

Recently as part of Migration we migrated the database from 2008 to 2016 and it is Microsoft Navision related database.

On the source itself the database has been restricted to 80 GB and as we restore the same properties got reflected on to target as well.

Before adding this database to Always ON we had nearly 10 databases part of AG Group.

All of a sudden today we got alert notification (you can check my blog for setting up ALWAYS ON Availability Group Failover alert) saying the AG group got failed over to other replica.

As part of Investigation I came to know that the recently added database has caused this .

WOW, How come  a database has led to Failover of other databases instead of it going offline?

Error Messages: --

Database XXXXX encountered an error (error type: 2 'DB_SHUTDOWN') causing failure of the availability group 'XXXXGRP'.  Refer to the SQL Server error log for information about the errors that were encountered.  If this condition persists, contact

Database XXXXX was shutdown due to error 3314 in routine 'XdesRMReadWrite::RollbackToLsn'. Restart for non-snapshot databases will be attempted after all connections to the database are aborted.

After digging further the error messages started initially with 1105.

Error Message:
Could not allocate space for object 'XXXX' in database 'XXXX' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, ad

Then later (after 2 hours) we encountered 1101 message:

Could not allocate a new page for database 'XXXX' because of insufficient disk space in filegroup 'PRIMARY'. Create the necessary space by dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for exist



















As per the above setting we have come across 1105 consistently and there after 1101 which triggered the fail-over. So be aware of such situations and make use of the setting.

I hope you learned something new. 

Comments

Durgaprasad said…
Nice article bro. Thanks for sharing