Wait statistics contains performance counters that report information about wait status. The term "wait" means that a thread running on a processor cannot proceed because a resource it requires is unavailable. And, each performance counters are mapped to a wait types.
Checking this is a good way to identify the issues and I would recommend to first look into wait stats when you have a problem. Below are the list of counters that Wait Statistics object contains:
Lock waits - processes waiting on a lock.
Log buffer waits - processes waiting for log buffer to be available.
Log write waits - processes waiting for log buffer to be written.
Memory grant queue waits - processes waiting for memory grant to become available.
Network IO waits - relevant to wait on network I/O.
Non-Page latch waits - relevant to non-page latches.
Page IO latch waits - relevant to page I/O latches.
Page latch waits - relevant to page latches, not including I/O latches.
Thread-safe memory objects waits - processes waiting on thread-safe memory allocators.
Transaction ownership waits - relevant to processes synchronizing access to transaction.
Wait for the worker - relevant to processes waiting for worker to become available.
Workspace synchronization waits - relevant to processes synchronizing access to workspace.
And, a comprehensive Wait Types and Latches Library that was documented by SQLskills team is a powerful method to get initial direction to the problem. Below are the most common:
Means parallel operations are taking place.
There's a skewed distribution on threads or workers are being blocked.
Avoid sudden involuntary changes.
> Do NOT change server wide MAXDOP to 1, this will disable parallelism.
Additional analysis:
> Are there non-zero ID thread showing CXPACKET waits?
> Correlated with PAGEIOLATCH - implies large scans. You may see ACCESS_METHOD_XX latch.
> You can ignore CXCONSUMER waits in 2016+
Possible root-cause:
Table scans because of missing non-clustered indexes or incorrect query plan.
Out of date statistics or carnality issue causing skewed work distribution.
Possible solution:
Make sure statistics are up to date and appropriate index exist.
Proper MAXDOP and Cost of Threshold allocation.
This indicates that the SPID is waiting for access to a thread-safe memory object. The serialization makes sure that while the users are allocating or freeing the memory from the memory object, any other SPIDs that are trying to perform the same task have to wait, and the CMEMTHREAD waittype is set when they are waiting. This waittype is most frequently noticed when ad hoc query plans are being quickly inserted into a procedure cache from many different connections to the instance of SQL Server.
Possible root-cause & solution:
Limit the data that must be inserted or removed from the procedure cache, such as:
> Explicitly parameterize queries so they can be reused.
> Using stored procedures where appropriate will help reduce pressure on the procedure cache.
These are all DBCC CHECKDB related latches.
Do NOT stop running consistency checks.
Possible root-cause:
DBCC_MULTIOBJECTSCANNER latch was identified as a contention point and fixed in 2012 and under a trace flag in SQL 2008 R2 (See Bob Ward's post)
DBCC_OBJECT_METADATA is a bottleneck with computed column indexes (See Paul Randal's post)
A thread is waiting for a lock that cannot be granted because another thread is holding an incompatible lock.
Do not assume that locking is the root cause.
Possible root-cause & solution:
Large escalation from a large update or table scan.
Unnecessary locks for the data being access.
Something is preventing a transaction from releasing its lock quickly.
You may consider using snapshot isolation, different isolation or locking hints.
See locking and blocking reference by Michael Swart.
Encountered during access to File Group Control Block (FGCB) when adding, removing, shrinking or growing in the file group.
Possible root-cause & solution:
Auto growth settings for a file are very low, requiring frequent growth and concurrent use of the filegroup.
File size are set too low for the rate of data entry into the database.
Occurs when a thread hits/exhausted the quantum limit (4ms duration) and voluntary yielded. This wait type is often mistaken for CPU pressure or Spinlock issues, however it could simply be constant CPU usage.
Possible root-cause:
Inefficient SQL statements with amount of CPU time/CPU pressure.
Look for queries doing a lot of logical I/O (e.g. heap table scans, inefficient index scans, full table scans, etc), and tune them.
Possible solution:
Optimize long running queries that is causing CPU pressure.
Enable hyper-threading to give more scheduler for potential concurrent work, especially for OLTP workloads.
Validate/check VM settings for issues.
There's an excessive page splits occurring in indexes.
Exclusive Page Latch used to synchronize access to BUF structures and their associated database pages. Typically, buffer latching occurs during operations that require serialization on a buffer page, (during a page split or during the allocation of a new page, for example). These latches are not held for the duration of a transaction like locks.
Possible root-cause & solution:
Change to a non-random index key
Avoid updating index records to be longer
Provision an index FILLFACTOR to alleviate page splits.
Note: This can also happen to Tempdb system tables with large temp table creation/deletion rate.
This is used for disk-to-memory transfers.
Query workload with multiple concurrent threads to repeatedly create/drop small temp tables and/or work tables.
Cause contention on allocation bitmaps
This can happen (rarely) in user database with VERY
Possible root-cause & solution:
Tune the SQL statement waiting on this waittype. Inefficient SQL statements read more data than necessary and will try to acquire more latches than necessary as a result.
Remove use of mixed extents in all database. (Usually happen in lower version (2000 & 2005) - Add TF118 as per KB Article 328551.
Add TF3226
Create multiple data files to spread and reduce contention.
> less than equals (<=) 8 cores should be #files =#cores
> greater than (>) 8 cores, use 8 files, increase by chunks of 4
> Reference KB 2154845
This waittype indicates an SPID is waiting for a Shared Latch.Latches are used as short-term light-weight synchronization objects and are not held for the duration of a transaction like a lock would be. This type of latch is used to access non-buffer pages - they are not used to synchronize access to buffer pages (PAGELATCH_x is used for that).
Possible root-cause & solution:
Check for memory pressure.
Look for LOG and PAGELATCH_UP wait types. LATCH_x waits can often be improved by solving LOG and PAGELATCH_UP contention.
In the absence of contention, partition the table or index in question to create multiple caches (the caches are per-index).
Converting Heaps into Clustered Indexes can help.
If using SQL 2000, upgrade to SP4 which contains some improvements for this latch.
THREADPOOL - waiting for a worker thread to become available because of heavily loaded system because of a lot of parallel queries running
RESOURCE_SEMAPHORE - waiting for a query execution memory grant like sorts. And, usually indicates concurrent, memory hungry queries.
MSQL_XP - waiting for extended stored procedure call to complete.
This means the SQL Server is waiting for a client to acknowledge receipt of sent data.
Do NOT assume that the problem is network latency.
Possible root-cause and solution:
Poorly-coded application not processing fast enough.
Queries with large set sets that might be processing results one record at a time (RBAR = Row-By-Agonizing_Row) and transmitted back to the client.
Could be FTP retrieving data from SQL Server, bulk inserts (BCP) and unloads, backups to network destinations, etc.
Last resort - check for network hardware issues (see batch request per seconds and bandwidth), incorrect duplex settings or TCP chimney offload problems.
The OLEDB (Object Linking and Embedding Database) mechanism are being used.
Do NOT assume that Linked Server are being used - validate if it's being used and check what is causing the delay.
Possible root-cause:
Queries that are being fired under the Linked Server.
DBCC CHECKDB and related commands use OLEDB internally.
Poor performance of Linked Server.
Waiting for a transaction log block buffer to flush to disk.
Do not assume that the transaction log file I/O system is overloaded.
DO NOT create additional transaction log file.
Possible root-cause & solution:
Common operations that cause log flushes are checkpoints and transaction commits.
Remove unused non-clustered indexes - this will reduce the logging overhead from maintaining DML operations.
Change index keys or introduce FILLFACTORs to reduce page splits
See if it is possible to commit less often in the application or delay commits until other read-only activity finishes.
Check if the transaction logs can be put on faster disks or disks with less contention.
Check if Recovery is set to SIMPLE.
Check to see if transaction logs are sized properly.
This means, the thread is called out to Windows to write to a file.
Do not assume that it's an I/O subsystem problem.
Possible root-cause & solution:
Enable instant file initialization and set manage growth appropriately
Zeroing a large transaction log file during a restore or log file growth.
This means, the thread usually called out to the OS. Thread is staying in SQL Server but does not want to give up CPU. Also, the thread status is RUNNING instead of SUSPENDED.
Possible root-cause & solution:
Increasing PREEMPTIVE_OS_CREATEFILE waits occur when using FILESTREAM on an incorrect prepared NTFS volume.
The thread calling windows to wait on state change of synchronization object.
It usually seen with ASYNC_NETWORK_IO wait.
Possible root-cause & solution:
Follow ASYNC_NETWORK_IO resolution.
Could also be transactional replication job (such as log reader to distribution agent jobs) - See Joe Sack's post.
Sources and good reads:
IEPTO2 - SQLSkills immersion event docs (Recommended Training "https://www.sqlskills.com/sql-server-training/ ")
https://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/
https://www.sqlskills.com/blogs/paul/capturing-wait-statistics-period-time/