To wrap up the Pluralsight course about Database performance anti-patterns, I’m going to collect a few tips and hints about SQL Server that I gathered from the course.
In this post, I will talk about indexes, the query plan cache and some SQL Server settings that help improve performance.
We distinguish three main types of indexes. Depending on the usage of the data and the performance requirements, you want to choose to implement a different kind of index in your database schema.
A covering index is an index that contains all the columns needed by a query. This enables fast execution because the query only needs to consult the index, with no additional lookup in the table. We typically want covering indexes to make searching fast.
We call an index non-covering when it does not contain all the columns requested by a query, and an additional key lookup is necessary to retrieve the non-covered columns. This is useful for search but with an additional cost.
A heap is not ordered and a table scan is required to find data. Because of this property, a heap is fast for INSERTs and slow for searching.
The overhead of indexing
Using indexes incurs storage and processing overhead. More storage is needed for the copies of sorted sets on the disk. Additionally, UPDATEs and INSERTs become slower because the indexes need to be re-sorted.
When considering adding an index on a column, you should take into account the ratio of reads to writes, knowing that:
- read-heavy tables perform better with indexes
- write-heavy tables perform worse with indexes
To find out where reads and writes are more frequent, audit the usage of your tables and columns.
Indexes often have the single most significant impact on the speed of a query, especially with large amounts of data.
It is usually a good idea to put an index on a column that has a foreign key constraint.
- it speeds up queries that do JOINs.
- it can even speed up inserts and updates by allowing SQL Server to verify that the foreign key constraint is met.
Also, it might make sense to have the foreign key as the primary column in the clustered index, so that lookups in the table by the primary index key are faster.
Order of index keys
The order of the keys in an index do matter. Consider the following index:
CREATE NONCLUSTERED INDEX [IDX_FullName] ON [Person].[Person] ( LastName ASC, FirstName ASC, MiddleName ASC )
Queries that search for the first name only cannot take advantage of this index because this index is first sorted by last name.
Let’s have a look at the following sorted index:
If we only look at the FirstName column, we see that the names are not sorted, since LastName takes precedence.
Query plan cache
SQL Server maintains the compiled query plans in a cache. When it receives a query, it looks in the cache if there is already a plan that can be reused. If a corresponding plan is found, SQL Server skips the optimisation step and can directly use the compiled plan instead.
Parameterised queries can take advantage of the query plans in the cache:
select * from Person where lastname = @last; -- set @last = 'Albert' select * from Person where lastname = @last; -- set @last = 'John' select * from Person where lastname = @last; -- set @last = 'Mary'
The query plan is compiled once and will be reused with the different parameters.
Typically, Entity Framework parameterises the queries to improve query plan reuse.
If literals are passed instead of parameters, a new plan will likely be compiled and you lose the possibility to reuse cached plans.
Queries with literals instead of parameters:
select * from Person where lastname = 'Albert'; select * from Person where lastname = 'John'; select * from Person where lastname = 'Mary';
These queries are likely to generate different query plans.
When working with legacy applications that generate queries with literals, you can use an option to force parameterisation.
With SQL Server, you can force parameterisation for a database. What this does is replace literals with parameters, so query plan reuse is more likely.
ALTER DATABASE DBName SET PARAMETERIZATION FORCED; ALTER DATABASE DBName SET PARAMETERIZATION SIMPLE; -- don't force anymore
You should be warned that the performance can actually become worse with this setting, and it should be used carefully (ie. do not “try” it directly in production 😉 ).
SQL Server automatic statistics update
The startup parameter -T2371 provides better handling of the automatic update of the statistics, using dynamic thresholds on row counts instead of a fixed percentage. The more rows there are in a table, the smaller the necessary percentage of modified rows in order to trigger a statistics update.
This flag is especially effective for large tables with many single INSERTs and UPDATEs.
You can set this flag by opening SQL Server Configuration Manager, selecting SQL Server Services, right-clicking on the server instance and choosing Properties. In the startup parameters tab, add -T2371.
In this post, we saw that there are three different kinds of indexes:
- covering indexes
- non-covering indexes
We also saw that it often makes sense to define indexes on foreign keys to speed up JOINs and possibly INSERTs and UPDATEs as well. Adding an index or not depends largely on whether the usage of the table is write- or read-heavy.
Finally, we saw SQL Server options that can prove beneficial to performance:
- forcing parameterisation
- enhancing automatic statistics update with dynamic thresholding on large tables