I recently ran into an issue with a poor performing server. One of my usual first steps when troubleshooting an issue like this is to ask SQL Server “Where does it hurt?” by querying the sys.dm_os_wait_stats dynamic management view. Since this view is cumulative, it is often helpful to look compare two snapshots of sys.dm_os_wait_stats taken a short time apart. My favorite tool to do this is the Triage Wait Stats in SQL Server script form the fine folks at Brent Ozar Unlimited.
Upon examination of the results I observed elevated PWAIT_MD_RELATION_CACHE
and MD_LAZYCACHE_RWLOCK
wait types. I had never encountered these wait in any significant amount previously. From searching the Internet I discovered this thread over at SQL Server Central.
It turns out that this is a bug (KB2926712) in SQL Server 2012 and 2014. The problem occurs when an ALTER INDEX ... ONLINE
command is aborted due to a deadlock or user action such as the KILL
command or canceling the query. Indeed in my case there was a regularly scheduled online index rebuild that had been chosen as the deadlock victim.
The immediate resolution is to either restart the instance of SQL Server or take problematic database offline and then bring it online again. This issue has been fixed in Cumulative Update 9 for SQL Server 2012 SP1 and Cumulative Update 1 for SQL Server 2014. The issue can be avoided in the mean time by issuing the ALTER INDEX ... ONLINE
with an elevated deadlock priority. This can be accomplished using the SET DEADLOCK_PRIORITY
command.
Comments