Category Archives: Database

A quick look at Entity Framework 7

Just a short post to collect a few things I gathered from the Pluralsight course Looking ahead to EF7.

Example project

Based on the demos of the course, I put together a small example application using Entity Framework 7 beta 6. The code can be found in my GitHub repository.

Configuring the DbContext

EF7 supports two ways to configure the DbContext:

  1. directly in the DbContext class
  2. by injecting the configuration options into the context’s constructor

Configuration in the DbContext

Here, the context configures itself (in this case, to use an in-memory database) in the OnConfiguring() method, which is invoked by EF.

namespace Context.EF7
{
    using Domain;
    using Microsoft.Data.Entity;
    using Microsoft.Data.Entity.Infrastructure;

    public class VideoStoreContext: DbContext
    {
        public DbSet<Movie> Movies { get; set; }
        public DbSet<Actor> Actors { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder.UseInMemoryDatabase();
        }
    }
}

Injecting the options into the DbContext

The alternative is to pass the options as a constructor parameter to the context.

namespace Context.EF7
{
    using Domain;
    using Microsoft.Data.Entity;
    using Microsoft.Data.Entity.Infrastructure;

    public class VideoStoreContext: DbContext
    {
        public DbSet<Movie> Movies { get; set; }
        public DbSet<Actor> Actors { get; set; }
 
        public VideoStoreContext(DbContextOptions options) : base(options)
        {

        }
    }
}

Providers

In the example above, we saw the use of the optionsBuilder:

optionsBuilder.UseInMemoryDatabase();

It is worth noting that UseInMemoryDatabase() is an extension method that gets pulled in when we include the entityframework.inmemory package.

To use SQL Server instead, we would include the package entityframework.sqlserver and configure the context as follows, using the UseSqlServer() extension method from that package:

optionsBuilder.UseSqlServer("Server=(localdb)\\mssqllocaldb; Database=EF7Demo; Trusted_Connection=True;");

Summary

When looking at both options above, we see that configuring the context with OnConfiguring() locks in the underlying provider (as well as other options).

The second way, by injecting the options into the constructor, is much more flexible. This approach should be preferred to using OnConfiguring().

That way, we can, for example, use SQL Server for the application and use an in-memory provider for the tests.

The application code would look something like this:

var optionsBuilder = new DbContextOptionsBuilder();
optionsBuilder.UseSqlServer("Server = (localdb)\\mssqllocaldb; Database=EF7Samurai; Trusted_Connection=True;");

using (var context = new VideoStoreContext(optionsBuilder.Options))
{
    // do cool stuff with the database here
}

And in the tests, we use an in-memory database:

var optionsBuilder = new DbContextOptionsBuilder();
optionsBuilder.UseInMemoryDatabase(); // extension method from package entityframework.inmemory

var context = new VideoStoreContext(optionsBuilder.Options);
// test stuff
context.Dispose();

Pretty cool, huh?

SQL Server tips

To wrap up the Pluralsight course on database performance anti-patterns, I’m going to collect a few tips and hints about SQL Server that I gathered from the course.

In this post, I will talk about indexes, the query plan cache and some SQL Server settings that help improve performance.

Indexes

We distinguish three main types of indexes. Depending on how the data is used and on the performance requirements, you will want to implement a different kind of index in your database schema.

Covering indexes

A covering index is an index that contains all the columns needed by a query. This enables fast execution because the query only needs to consult the index, with no additional lookup in the table. We typically want covering indexes to make searching fast.
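
For illustration, here is a hypothetical covering index on the Person table used later in this post, for a query that filters by last name and reads only the name columns; the INCLUDE clause stores the extra columns in the index itself:

CREATE NONCLUSTERED INDEX [IDX_LastName_Covering] ON [Person].[Person]
(
    LastName ASC
)
INCLUDE (FirstName, MiddleName);

-- covered: answered from the index alone, no lookup in the table
SELECT FirstName, MiddleName
FROM [Person].[Person]
WHERE LastName = 'Griffin';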

Non-covering indexes

We call an index non-covering when it does not contain all the columns requested by a query, so an additional key lookup is necessary to retrieve the non-covered columns. Such an index still speeds up searching, but at an additional cost.

Heaps

A heap (a table without a clustered index) is not ordered, and a table scan is required to find data. Because of this property, a heap is fast for INSERTs and slow for searching.

The overhead of indexing

Using indexes incurs storage and processing overhead. More storage is needed for the sorted copies of the data on disk. Additionally, UPDATEs and INSERTs become slower because the indexes need to be kept sorted.

When considering adding an index on a column, you should take into account the ratio of reads to writes, knowing that:

  • read-heavy tables perform better with indexes
  • write-heavy tables perform worse with indexes

To find out where reads and writes are more frequent, audit the usage of your tables and columns, for example with a query like the sketch below.
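
As a starting point for such an audit, SQL Server tracks per-index usage in a dynamic management view; a minimal sketch (the counters reset when the instance restarts):

SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name AS index_name,
       s.user_seeks + s.user_scans + s.user_lookups AS reads,
       s.user_updates AS writes
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i
    ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY writes DESC;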

Indexes often have the single most significant impact on the speed of a query, especially with large amounts of data.

Foreign keys

It is usually a good idea to put an index on a column that has a foreign key constraint.

  • it speeds up queries that do JOINs.
  • it can even speed up INSERTs and UPDATEs, because SQL Server can use the index to verify that the foreign key constraint is met.

Also, it might make sense to have the foreign key as the leading column of the clustered index, so that lookups in the table by that key are faster.
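
As a hypothetical example, an index on the foreign key used by the join examples later on this page could look like this:

CREATE NONCLUSTERED INDEX [IDX_EmailAddress_BusinessEntityID]
ON [Person].[EmailAddress] (BusinessEntityID ASC);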

Order of index keys

The order of the keys in an index does matter. Consider the following index:

CREATE NONCLUSTERED INDEX [IDX_FullName] ON [Person].[Person] 
(
    LastName ASC, FirstName ASC, MiddleName ASC
)

Queries that search by first name only cannot take advantage of this index, because it is sorted by last name first.

Let’s have a look at the following sorted index:

Index example:

LastName  FirstName  MiddleName
Griffin   Brian      D.
Griffin   Meg        B.
Griffin   Peter      A.
Smith     Francine   O.
Smith     Roger      X.
Smith     Stan       S.

If we only look at the FirstName column, we see that the names are not sorted, since LastName takes precedence.

Query plan cache

SQL Server maintains the compiled query plans in a cache. When it receives a query, it checks the cache to see whether there is already a plan that can be reused. If a corresponding plan is found, SQL Server skips the optimisation step and directly uses the compiled plan instead.
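
You can peek into the plan cache yourself; a minimal sketch listing cached plans by how often they were reused:

SELECT cp.usecounts, cp.objtype, st.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
ORDER BY cp.usecounts DESC;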

Parameterised queries can take advantage of the query plans in the cache:

select * from Person where lastname = @last; -- set @last = 'Albert'
select * from Person where lastname = @last; -- set @last = 'John'
select * from Person where lastname = @last; -- set @last = 'Mary'

The query plan is compiled once and will be reused with the different parameters.

Typically, Entity Framework parameterises the queries to improve query plan reuse.
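
For illustration, here is a minimal sketch using the AdventureWorks context from the posts further down this page. Because lastNameToFind is a variable, Entity Framework turns it into a SQL parameter, so the same plan is reused for every value:

public void PrintMatchingPeople(string lastNameToFind)
{
    using (var db = new AdventureWorks2012Entities())
    {
        // translated to roughly: SELECT ... FROM Person WHERE LastName = @p__linq__0
        var matchingPeople = db.People
            .Where(p => p.LastName == lastNameToFind)
            .ToList();

        Console.WriteLine(matchingPeople.Count);
    }
}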

If literals are passed instead of parameters, a new plan will likely be compiled for each query, and you lose the ability to reuse cached plans.

Queries with literals instead of parameters:

select * from Person where lastname = 'Albert';
select * from Person where lastname = 'John';
select * from Person where lastname = 'Mary';

These queries are likely to generate different query plans.

When working with legacy applications that generate queries with literals, you can use an option to force parameterisation.

Forcing parameterisation

With SQL Server, you can force parameterisation for a database. What this does is replace literals with parameters, so query plan reuse is more likely.

ALTER DATABASE DBName SET PARAMETERIZATION FORCED;
ALTER DATABASE DBName SET PARAMETERIZATION SIMPLE; -- don't force anymore

Be warned that performance can actually become worse with this setting, so it should be used carefully (i.e. do not “try” it directly in production 😉 ).

SQL Server automatic statistics update

The startup parameter -T2371 provides better handling of the automatic update of statistics, using dynamic thresholds on row counts instead of a fixed percentage: the more rows there are in a table, the smaller the percentage of modified rows needed to trigger a statistics update.

This flag is especially effective for large tables with many single INSERTs and UPDATEs.

You can set this flag by opening SQL Server Configuration Manager, selecting SQL Server Services, right-clicking on the server instance and choosing Properties. On the Startup Parameters tab, add -T2371.
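
Alternatively, the trace flag can be enabled on a running instance with DBCC TRACEON (note that, unlike the startup parameter, this does not survive a restart):

DBCC TRACEON (2371, -1);   -- -1 enables the flag globally
DBCC TRACESTATUS (2371);   -- check whether the flag is active
DBCC TRACEOFF (2371, -1);  -- disable it again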

Summary

In this post, we saw that there are three different kinds of indexes:

  1. covering indexes
  2. non-covering indexes
  3. heaps

We also saw that it often makes sense to define indexes on foreign keys to speed up JOINs and possibly INSERTs and UPDATEs as well. Adding an index or not depends largely on whether the usage of the table is write- or read-heavy.

Finally, we saw SQL Server options that can prove beneficial to performance:

  • forcing parameterisation
  • enhancing automatic statistics update with dynamic thresholding on large tables

Debugging SQL queries with Entity Framework 6

After taking a look at ways to improve the SQL queries generated by Entity Framework, and at how to dimension your data and retrieve it, I want to document a few more interesting tidbits I gathered from Russ Thomas’ course on Pluralsight.

This time, let’s talk about how to:

  • find out what SQL queries Entity Framework generates
  • log all the database commands
  • execute stored procedures and custom SQL queries from Entity Framework

Debugging and logging

When performance is not where it should be, it is useful to find out what SQL queries are being executed and to see where most of the time is being spent.

SQL Server profiler

A good way to find out what’s going on behind the scenes is to use SQL Server profiler to trace all calls to the database. That way, you can see what SQL statements are being executed, how often, and how much time they take.

Unfortunately, I couldn’t get my hands on a license yet, so I will leave this for later.

Using ToString() on IQueryables

With Entity Framework, you can very easily find out what the generated SQL is by calling the ToString() method on the query object itself:

using (var db = new AdventureWorks2012Entities())
{
    IQueryable<string> emails = from person in db.People
                                join email in db.EmailAddresses
                                on person.BusinessEntityID equals email.BusinessEntityID
                                select email.EmailAddress1;

    string generatedQuery = emails.ToString(); // show generated SQL query
    Console.WriteLine(generatedQuery);
}

The code above prints the following SQL statement:

SELECT 
    [Extent1].[EmailAddress] AS [EmailAddress]
    FROM [Person].[EmailAddress] AS [Extent1] 

We can see that Entity Framework optimised the query: since we are interested in all email addresses, the join on the Person table is not necessary and a simple select on EmailAddress is performed.

Logging the database operations with DbContext.Database.Log

Logging all database operations can be achieved by assigning the DbContext.Database.Log property with an action that takes a string as a parameter.

Database.Log examples:

using (var db = new CompanyContext())
using (var streamWriter = File.AppendText("database.log"))
{
    // different ways to log the database operations
    // (each assignment replaces the previous one; pick a single target)
    db.Database.Log = Console.WriteLine;
    db.Database.Log = s => Trace.WriteLine(s);
    db.Database.Log = (string message) => { /* custom logging */ };
    db.Database.Log = streamWriter.WriteLine;

    // perform database operations
    // var query = from person in db.People ...

    // disable logging
    db.Database.Log = (string message) => { /* do nothing */ };
}

You can log the way you choose and disable it later by assigning an empty Action to the Log property again.

Every interaction with the database is logged, including the time and the value of the parameters.

Creating tables

Opened connection at 25.06.2015 11:37:45 +02:00
Started transaction at 25.06.2015 11:37:45 +02:00

CREATE TABLE [dbo].[Departments] (
    [departmentID] [int] NOT NULL IDENTITY,
    [deptName] [nvarchar](max),
    [deptFloor] [nvarchar](max),
    CONSTRAINT [PK_dbo.Departments] PRIMARY KEY ([departmentID])
)
-- Executing at 25.06.2015 11:37:45 +02:00
-- Completed in 8 ms with result: 0

Inserting data

Opened connection at 25.06.2015 11:37:45 +02:00
Started transaction at 25.06.2015 11:37:45 +02:00

INSERT [dbo].[Departments]([deptName], [deptFloor])
VALUES (@0, NULL)
SELECT [departmentID]
FROM [dbo].[Departments]
WHERE @@ROWCOUNT > 0 AND [departmentID] = scope_identity()

-- @0: 'engineering' (Type = String, Size = -1)
-- Executing at 25.06.2015 11:37:45 +02:00
-- Completed in 1 ms with result: SqlDataReader
Committed transaction at 25.06.2015 11:37:45 +02:00
Closed connection at 25.06.2015 11:37:45 +02:00

Retrieving data

Opened connection at 25.06.2015 11:37:45 +02:00

SELECT 
    1 AS [C1], 
    [Extent1].[deptName] AS [deptName]
    FROM [dbo].[Departments] AS [Extent1]
-- Executing at 25.06.2015 11:37:45 +02:00
-- Completed in 0 ms with result: 0

The information found in this log can prove really useful for finding errors and problems with queries. It should probably not be enabled at all times for performance reasons, though.

Intercepting commands with IDbCommandInterceptor

Another very neat option is to write a class that implements the interface IDbCommandInterceptor and to register it with the database configuration’s command interceptors.

For every command sent to the database, the registered interceptors are called, and at that point you can take action depending on the type of command being executed.

We will be using this mechanism to log all interactions with the database, but it can also be used for other purposes, for example intercepting DELETEs and inserting an entry into a history table somewhere.

Before going into the details, let’s look at the interface:
IDbCommandInterceptor

namespace System.Data.Entity.Infrastructure.Interception
{
  public interface IDbCommandInterceptor : IDbInterceptor
  {
    void NonQueryExecuting(DbCommand command, DbCommandInterceptionContext<int> interceptionContext);

    void NonQueryExecuted(DbCommand command, DbCommandInterceptionContext<int> interceptionContext);

    void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext);

    void ReaderExecuted(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext);

    void ScalarExecuting(DbCommand command, DbCommandInterceptionContext<object> interceptionContext);

    void ScalarExecuted(DbCommand command, DbCommandInterceptionContext<object> interceptionContext);
  }
}

There are three types of commands:

  • Scalar: commands that return a single value such as a number
  • Reader: commands that return a data set
  • NonQuery: commands that do not return data, such as CREATEs and INSERTs

For each type of command, the distinction between -Executing (command start) and -Executed (command end) is made. That way, we can for example measure the execution time or recognise which commands are aborted and never finished.

We can also notice that commands of type Scalar have a result of type object, Reader a result of type DbDataReader, and NonQuery a result of type int.
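
As a minimal sketch of the timing idea (assuming the SetUserState()/FindUserState() helpers that shipped with EF 6.1), an interceptor could time Reader commands like this:

public class TimingInterceptor : IDbCommandInterceptor
{
    public void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        // start a stopwatch when the command starts executing
        interceptionContext.SetUserState("timer", Stopwatch.StartNew());
    }

    public void ReaderExecuted(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        // read it back when the command has finished
        var timer = (Stopwatch)interceptionContext.FindUserState("timer");
        Trace.WriteLine(string.Format("Reader command took {0} ms", timer.ElapsedMilliseconds));
    }

    // the four remaining members would follow the same pattern
    public void NonQueryExecuting(DbCommand command, DbCommandInterceptionContext<int> interceptionContext) { }
    public void NonQueryExecuted(DbCommand command, DbCommandInterceptionContext<int> interceptionContext) { }
    public void ScalarExecuting(DbCommand command, DbCommandInterceptionContext<object> interceptionContext) { }
    public void ScalarExecuted(DbCommand command, DbCommandInterceptionContext<object> interceptionContext) { }
}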

For this example, I will simply log all commands in a log file.

Implementation of IDbCommandInterceptor:

public class CustomSqlLogger : IDbCommandInterceptor
{
    public void NonQueryExecuting(DbCommand command, DbCommandInterceptionContext<int> interceptionContext)
    {
        LogSql(command, interceptionContext);
    }

    public void NonQueryExecuted(DbCommand command, DbCommandInterceptionContext<int> interceptionContext)
    {
        LogSql(command, interceptionContext);
    }

    public void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        LogSql(command, interceptionContext);
    }

    public void ReaderExecuted(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        LogSql(command, interceptionContext);
    }

    public void ScalarExecuting(DbCommand command, DbCommandInterceptionContext<object> interceptionContext)
    {
        LogSql(command, interceptionContext);
    }

    public void ScalarExecuted(DbCommand command, DbCommandInterceptionContext<object> interceptionContext)
    {
        LogSql(command, interceptionContext);
    }

    private void LogSql<T>(DbCommand dbCommand, DbCommandInterceptionContext<T> interceptionContext, [CallerMemberName] string callerMethodName = "")
    {
        using (var fileWriter = File.AppendText(@"sql_commands.log"))
        {
            fileWriter.WriteLine(
                "Intercepted {0}:\n{1}\nWith result: {2}\n\n",
                callerMethodName, dbCommand.CommandText, interceptionContext.Result);
        }
    }
}

Notice the clever use of CallerMemberNameAttribute to be able to print out the command type 😉

For this interceptor to be used, you need to register it in a specialised DbConfiguration class.

Custom database configuration class:

public class CustomDatabaseConfiguration : DbConfiguration
{
    public CustomDatabaseConfiguration()
    {
        AddInterceptor(new CustomSqlLogger());
    }
}

This configuration class is used automagically by Entity Framework.

Interceptors can also be injected via config file.

Configuration-based interceptor injection:

<entityFramework>
    <interceptors>
      <interceptor type="SQLLogger.CustomSqlLogger, SQLLogger"></interceptor>
    </interceptors>
</entityFramework>

Log output

This custom logging interceptor will produce the following output when running a sample application.

Scalar:

Intercepted ScalarExecuting:
SELECT Count(*)
FROM INFORMATION_SCHEMA.TABLES AS t
WHERE t.TABLE_SCHEMA + '.' + t.TABLE_NAME IN ('dbo.Departments','dbo.Employees')
    OR t.TABLE_NAME = 'EdmMetadata'
With result: 

Intercepted ScalarExecuted:
SELECT Count(*)
FROM INFORMATION_SCHEMA.TABLES AS t
WHERE t.TABLE_SCHEMA + '.' + t.TABLE_NAME IN ('dbo.Departments','dbo.Employees')
    OR t.TABLE_NAME = 'EdmMetadata'
With result: 2

The Scalar command has no result while executing, and returns the value 2 once finished.

Reader:

Intercepted ReaderExecuting:
SELECT 
    1 AS [C1], 
    [Extent1].[deptName] AS [deptName]
    FROM [dbo].[Departments] AS [Extent1]
With result: 

Intercepted ReaderExecuted:
SELECT 
    1 AS [C1], 
    [Extent1].[deptName] AS [deptName]
    FROM [dbo].[Departments] AS [Extent1]
With result: System.Data.SqlClient.SqlDataReader

For this Reader command, the query retrieving all departments returns a SqlDataReader as its result.

NonQuery:

Intercepted NonQueryExecuting:
CREATE TABLE [dbo].[Departments] (
    [departmentID] [int] NOT NULL IDENTITY,
    [deptName] [varchar](50),
    [deptFloor] [varchar](50),
    CONSTRAINT [PK_dbo.Departments] PRIMARY KEY ([departmentID])
)
With result: 0

Intercepted NonQueryExecuted:
CREATE TABLE [dbo].[Departments] (
    [departmentID] [int] NOT NULL IDENTITY,
    [deptName] [varchar](50),
    [deptFloor] [varchar](50),
    CONSTRAINT [PK_dbo.Departments] PRIMARY KEY ([departmentID])
)
With result: -1

This NonQuery command is executed by Entity Framework the first time the application is started, to create the necessary tables.

Entity Framework tips

Disable entity tracking with extension AsNoTracking()

AsNoTracking() is an extension method that allows you to optimise your queries for large data sets by telling Entity Framework not to track the results of the query.

With this option, the results are streamed directly into memory, no additional processing takes place, and the entities are not added to the DbContext.

This means that entities retrieved with such a query cannot be updated, since they are not known to the context. If you wish to modify and save entities that came from an AsNoTracking() query, you first need to attach them to the context and set the correct state before doing so.
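
A minimal sketch of that workflow, using the CompanyContext and Employee entity that appear elsewhere in this post (the property names are assumptions):

using (var db = new CompanyContext())
{
    // read without tracking
    var employee = db.Employees.AsNoTracking().First(e => e.lastName == "Griffin");

    employee.firstName = "Peter";

    db.Employees.Attach(employee);                   // make the entity known to the context
    db.Entry(employee).State = EntityState.Modified; // mark it as changed
    db.SaveChanges();                                // generates the UPDATE statement
}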

This option is especially useful when you are dealing with large quantities of data that you only want to read.

Say a company has a huge number of employees and has an overview page where all names are displayed:

IQueryable<string> query = db.Employees
                    .Select(e => e.lastName + " " + e.firstName)
                    .OrderBy(fullName => fullName)
                    .AsNoTracking();

Since we are only reading data, it makes sense to disable tracking for this query.

Execute stored procedures

When working with existing stored procedures (either legacy or by design), it is best practice to create corresponding methods in your DbContext subclass. The execution goes through ObjectContext.ExecuteFunction(), to which you pass the name of the procedure as well as the required parameters.

Executing a stored procedure:

public partial class CompanyContext : DbContext
{
    public CompanyContext(): base("name=CompanyDb") { }

    // ...

    public virtual int UpdateEmployeeStoredProcedure(Nullable<int> businessEntityID, string jobTitle, Nullable<System.DateTime> hireDate)
    {
        var businessEntityIDParameter = businessEntityID.HasValue ?
            new ObjectParameter("BusinessEntityID", businessEntityID) :
            new ObjectParameter("BusinessEntityID", typeof(int));

        var jobTitleParameter = jobTitle != null ?
            new ObjectParameter("JobTitle", jobTitle) :
            new ObjectParameter("JobTitle", typeof(string));

        var hireDateParameter = hireDate.HasValue ?
            new ObjectParameter("HireDate", hireDate) :
            new ObjectParameter("HireDate", typeof(System.DateTime));

        return ((IObjectContextAdapter)this).ObjectContext.ExecuteFunction("uspUpdateEmployee", businessEntityIDParameter, jobTitleParameter, hireDateParameter);
    }
}

You need to cast the DbContext to IObjectContextAdapter because DbContext implements that interface explicitly.

Execute custom SQL queries from Entity Framework

When you want precise control over a SQL query instead of relying on Entity Framework to generate it, you can create custom queries using DbContext.Database.SqlQuery<T>(). The type parameter T specifies the type that Entity Framework should try to map the results to.

Executing a custom SQL query:

public static void PrintEmployees(string firstNameToFind)
{
    using (var db = new CompanyContext())
    {
        DbRawSqlQuery<Employee> employeesSqlQuery = 
            db.Database.SqlQuery<Employee>("select top 42 * from Employees where firstName={0}",
              firstNameToFind);

        foreach (var employee in employeesSqlQuery)
        {
            Console.WriteLine(employee.firstName);
        }
    }
}

Summary

In this post, we saw four ways to debug queries generated by Entity Framework:

  • SQL Server profiler
  • using ToString() on the query object
  • logging database commands using DbContext.Database.Log
  • intercepting database commands by injecting a IDbCommandInterceptor

Additionally, we saw how to:

  • disable change tracking on queries with the extension method AsNoTracking()
  • call stored procedures using ObjectContext.ExecuteFunction() (via a cast to IObjectContextAdapter)
  • execute custom SQL queries with DbContext.Database.SqlQuery<T>()

Entity Framework 6 performance tips

After some initial ramblings exploring ways to avoid performance problems with Entity Framework 6, this post continues in the same spirit with the following menu:

  • Data types and sizes
  • Loading mechanisms:
    1. Lazy loading
    2. Eager loading
    3. Paging

Choosing the right size

It may seem unimportant at first, but choosing appropriate data types and sizes can make a huge difference when the database starts to grow. When tables contain thousands or millions of rows, an appropriate data size can make the database orders of magnitude smaller.

For example, use smallint instead of int if you know that the stored data will always fall within the range [-32768, 32767]. Use varchar(50) instead of nvarchar(500) when you know that the first names stored will not exceed 50 letters and cannot contain Unicode characters.

This not only saves valuable disk and memory storage, it also improves page life expectancy (pages stay in the buffer cache longer) and reduces the amount of data we have to read and transfer over the network.

Entity Framework code first defaults

When the database schema is generated from the code, Entity Framework errs on the safe side and sets the size of the data to the maximum possible:

  • nvarchar(max) for string
  • decimal(18,2) for decimal numeric values
  • int for integer numerical values

While this is safe and prevents bad surprises when a column turns out to be too small for the application, it will certainly also lead to pretty nasty memory usage if these defaults are not changed and the database starts to grow.

Another problem is that creating indexes on those columns becomes impossible:

CREATE NONCLUSTERED INDEX idx_FullName ON dbo.Employee
(
    LastName ASC, FirstName ASC
)

This will fail if either column has a size ‘max’ (for example varchar(max)).

Changing data type and size after the fact

Entity Framework is pretty flexible and will let you change the data types and sizes of your columns, as long as it can convert to and from the C# types.

When modifying the columns, we have the following options:

  • alter the table manually/via script
  • code annotations
  • overriding DbContext.OnModelCreating()

Alter the table manually/via script

As mentioned above, Entity Framework allows changes to the data type and size, as long as it is compatible with the C# types.

The problem with this approach is that you lose the flexibility of deploying the application to different environments: manually managing diverging database schemas is tedious and hence not recommended.

A better solution is to include it in a migration so that it is handled automatically by Entity Framework.
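
A hypothetical migration for the Departments table from the logging examples earlier on this page could look like this (the column sizes are illustrative):

public partial class ShrinkDepartmentColumns : DbMigration
{
    public override void Up()
    {
        // narrow deptName from nvarchar(max) to varchar(50)
        AlterColumn("dbo.Departments", "deptName", c => c.String(maxLength: 50, unicode: false));
    }

    public override void Down()
    {
        // back to the Entity Framework default
        AlterColumn("dbo.Departments", "deptName", c => c.String());
    }
}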

Code annotations

Code annotations are a simple and fast way to define size constraints that guide Entity Framework when generating the database schema. The MaxLength attribute defines the maximum size, and the Required attribute specifies that a column cannot be NULL:

class Employee
{
    [MaxLength(50)]
    [Required] // NOT NULL
    public string FirstName { get; set; }
}

Annotating the properties of your entities makes it straightforward to define appropriate limits to the size of the data.

If you want finer control over the generated schema, you can override the method OnModelCreating().

Overriding DbContext.OnModelCreating()

Entity Framework offers the possibility to override the method OnModelCreating() that is called when the model is created. In this method, you can use a model builder with a fluent API to define finer constraints on the properties of your entities:

public class CompanyContext : DbContext
{
    // ...
    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Employee>()
                .Property(u => u.firstName)
                .IsUnicode(false)
                .HasMaxLength(50)
                .IsRequired(); // NOT NULL

        modelBuilder.Entity<Employee>()
            .Property(e => e.salary)
            .HasPrecision(10, 2);
    }
    // ...
}

Loading mechanisms

Lazy loading is enabled by default in Entity Framework. When a navigation property is accessed, Entity Framework loads the corresponding entities from the database if necessary.

Lazy loading example
The following code sample loads all Persons from the database and iterates over them. The EmailAddresses are loaded lazily, separately for each Person.

public void PrintEmailAddressesLazyLoaded()
{
    using (var db = new AdventureWorks2012Entities())
    {
        IQueryable<Person> matchingPeople = 
            from people in db.People
            select people;

        foreach (Person p in matchingPeople)
        {
            string emailAddresses = string.Join(", ", p.EmailAddresses.Select(e => e.EmailAddress1)); // lazy load: one query per person
            Console.WriteLine(emailAddresses);
        }
    }
}

Eager loading example
As opposed to the previous example, the code below loads all email addresses from all people upfront, in a single query.

public void PrintEmailAddressesEagerLoaded()
{
    using (var db = new AdventureWorks2012Entities())
    {
        // all emails eagerly loaded
        var emails = from people in db.People
                     join email in db.EmailAddresses
                     on people.BusinessEntityID equals email.BusinessEntityID
                     select email.EmailAddress1;

        string emailAddresses = string.Join(", ", emails);
        Console.WriteLine(emailAddresses);
    }
}

Eager versus lazy loading

There is no one true answer to the question of whether to use lazy or eager loading. It depends on the context, the amount of data involved and the desired responsiveness.

The following sections list situations where either method is appropriate.

When to use eager loading

  • processing requires all the data
  • full scope of the request is known
  • automated processes

In cases where we know what we need and the data we need does not depend on external factors like user input, we can load all the necessary data eagerly.

When to use lazy loading

  • navigation is unknown (typically, user inputs)
  • scope of the request is unknown
  • batch work

When we do not know exactly what data we need because it depends on external factors (like user inputs), or when the data set is too large to be fully loaded, loading the data lazily when we need it is appropriate.

Paging: combining set-based thinking with lazy loading

When the data set is large and navigation depends on user input, it makes sense to implement paging: we only load part of the total result set and allow the user to see more if they wish.

In the code sample below, we use set-based thinking to define a query that returns only part of the whole result set. The data is loaded lazily in the sense that we only load the pages the user wants to see.

Paging example

public void PrintEmailAddressesPaged(int pageNumber = 0)
{
    const int pageSize = 10;
    using (var db = new AdventureWorks2012Entities())
    {
        IQueryable<string> emails = db.People
            .SelectMany(p => p.EmailAddresses)
            .OrderBy(e => e.EmailAddress1)
            .Select(e => e.EmailAddress1)
            .Skip(pageNumber * pageSize) // skip the previous pages
            .Take(pageSize);

        foreach (var e in emails)
        {
            Console.WriteLine(e);
        }
    }
}

Tip: pre-load the next page

In a scenario where we use paging to present a large set of data, we can improve the user’s perception of application responsiveness by pre-loading the next page (or wherever the user is most likely to navigate to next) in the background, while the user is looking at the current set of data.

That way, the data is already there and the user gets a snappy response from the application, which improves the user experience.
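
A minimal sketch of this idea, building on the paging query above (the loader class and its cache are hypothetical):

public class EmailPageLoader
{
    private const int PageSize = 10;
    private readonly ConcurrentDictionary<int, List<string>> cache =
        new ConcurrentDictionary<int, List<string>>();

    public List<string> GetPage(int pageNumber)
    {
        List<string> page;
        if (!cache.TryGetValue(pageNumber, out page))
        {
            page = LoadPage(pageNumber);
        }

        // warm the cache with the page the user is most likely to ask for next
        Task.Run(() => cache.TryAdd(pageNumber + 1, LoadPage(pageNumber + 1)));

        return page;
    }

    private static List<string> LoadPage(int pageNumber)
    {
        using (var db = new AdventureWorks2012Entities())
        {
            return db.People
                .SelectMany(p => p.EmailAddresses)
                .OrderBy(e => e.EmailAddress1)
                .Select(e => e.EmailAddress1)
                .Skip(pageNumber * PageSize)
                .Take(PageSize)
                .ToList(); // materialise before the context is disposed
        }
    }
}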

Disabling lazy loading in Entity Framework

It is possible to disable lazy loading altogether:

public class CompanyContext : DbContext
{
    public CompanyContext() : base("name=CompanyDb")
    {
        this.Configuration.LazyLoadingEnabled = false;
    }
    // ...
}

However, one must be aware that doing so has an important impact on the behaviour of the application: the places in the code where data was loaded lazily will simply get empty or null navigation properties. No error or warning is displayed.

The previous example method PrintEmailAddressesLazyLoaded() will simply have an empty output as a result.

When lazy loading is disabled, the code must load all data eagerly, like in the PrintEmailAddressesEagerLoaded() example.

Be deliberate!

Whichever loading mechanism you choose to use, you should be deliberate and use it knowingly, aware of the impacts and tradeoffs inherent to the choice.

Do not just use or reuse code without knowing how the data is retrieved from the database. This only leads to performance problems and surprises in the behaviour of the application.

Summary

In this post, we saw:

  • that choosing a sensible data type and size is important
  • that the defaults chosen by Entity Framework are too large and need to be adapted
  • that there are three ways to modify the type and size of the columns:
    1. manually
    2. using code annotations
    3. overriding OnModelCreating() in your DbContext subclass
  • examples of loading mechanisms and when to use which:
    1. lazy loading
    2. eager loading
    3. paging

As a developer, you should be aware of how the data is retrieved from the database and why it is done that way.

Getting more performance out of Entity Framework 6

I’ve been wanting to catch up on my database-related knowledge and have finally gotten around to setting aside enough time to study it. After a short refresher of basic SQL, I picked the following course on Pluralsight: Entity Framework Database Performance Anti-patterns by Russ Thomas.

This course contains lots of tips and advice on what to consider in order to keep data retrieval up to speed in a project. From the perspective of the software developer, it informs me of the necessary design considerations and trade-offs to make so that the user experience remains smooth and responsive.

Additionally, Russ explains how shifting your thinking from procedural to set-based helps find more efficient ways to retrieve data while still maintaining a high level of abstraction.

This post goes over the following subjects:

  • projection
  • nested LINQ smell
  • improving SQL queries generated by Entity Framework

YAGNI – You ain’t gonna need it

The classic mistake is to fetch way more data from the database than is actually needed.

Filtering the results in the client

public void PrintMatchingUserBusinessIds(string lastNameToFind)
{
    using (var db = new AdventureWorks2012Entities())
    {
        var matchingPeople = from people in db.People
                             select people;

        foreach (Person p in matchingPeople)
        {
            if (string.Equals(lastNameToFind, p.LastName))
            {
                Console.WriteLine(p.BusinessEntityID);
            }
        }
    }
}

The query that will get sent to SQL Server looks like this:

SELECT * FROM Person;

The problems we have here are that:

  1. we load all entries from the People table, even though we are only interested in those that have the wanted last name.
  2. all the columns from the People table are loaded and converted to objects, but we only use the property BusinessEntityID.

Let’s fix problem 1: filtering the results in the database

public void PrintMatchingUserBusinessIds(string lastNameToFind)
{
    using (var db = new AdventureWorks2012Entities())
    {
        var matchingPeople = from people in db.People
                             where people.LastName.Equals(lastNameToFind) // filter in the database
                             select people;

        foreach (Person p in matchingPeople)
        {
            Console.WriteLine(p.BusinessEntityID);
        }
    }
}

Now it’s a little better. The query filters the people in the database:

SELECT * FROM Person p
WHERE p.LastName = @lastname;

Projection

However, we are still fetching I-don’t-know-how-many columns, even though we need just BusinessEntityID. Let’s fix problem 2 using a projection on BusinessEntityID:

public void PrintMatchingUserBusinessIds(string lastNameToFind)
{
    using (var db = new AdventureWorks2012Entities())
    {
        var matchingPeople = from people in db.People
                             where people.LastName.Equals(lastNameToFind)
                             select new { people.BusinessEntityID }; // projection

        foreach (var p in matchingPeople)
        {
            Console.WriteLine(p.BusinessEntityID);                    
        }
    }
}

This time, Entity Framework will generate a SQL query that will look like this:

SELECT p.BusinessEntityID FROM Person p
WHERE p.LastName = @lastname;

@lastname being the parameter containing the last name we are searching for.

Non set-based work

When programming with Entity Framework or other ORMs, it’s easy to just think procedurally: I need to fetch these People objects, and then I am going to use this and that property from these objects.
The ORM does its job and abstracts the details of the database away from the coder, which is nice because it makes their life easier. However, we need to be aware of the impact our code has on the database and of how we can prevent performance problems.

Take a look at the code below:

public void PrintEmailAddresses(string lastNameToFind)
{
    using (var db = new AdventureWorks2012Entities())
    {
        var matchingPeople = 
            from people in db.People 
            where people.LastName.Equals(lastNameToFind)
            select people;

        foreach (var p in matchingPeople)
        {
            foreach (var address in p.EmailAddresses)
            {
                Console.WriteLine(address.EmailAddress1);
            }
        }
    }
}

At first, it may appear pretty harmless. We find a list of people with the given last name and print all their email addresses.

What we need to be careful about is that we are navigating the property EmailAddresses of the table Person: the email addresses are located in another table EmailAddress. In essence, Entity Framework will first fetch all the people with the given last name. We will then iterate over the list of people. When we try to access p.EmailAddresses, Entity Framework will send a query to get all the email addresses for the current person. Note that a separate query is executed for every single person.

-- fetch the matching people
SELECT * FROM Person p
WHERE p.LastName = @lastname;

-- fetch the email addresses for a person
SELECT * FROM  EmailAddress e
WHERE e.BusinessEntityID = @currentPersonBusinessEntityID;

It is inefficient to execute a separate query for each Person. Obviously we should join the tables Person and EmailAddress instead. And as we saw before, we are retrieving too much data with these SELECT * statements: we are only using the EmailAddress1 property (Note: the name of the property is EmailAddress1 to avoid a collision with the name of the table).

The name of the smell is nested LINQ query. Every time we have a LINQ query or a for/foreach loop inside a loop, something is likely to be wrong.

To avoid this kind of problem, the programmer needs to shift his thinking from procedural to set-based.

Set-based thinking

When coding, software engineers are used to thinking in a procedural way.

Procedural thinking:
Get a list of people with the specified last name and print their email addresses. We are thinking in terms of rows of data.

Set-based thinking:
Join tables Person and EmailAddress and get the email addresses for people with the specified last name. Here, we are thinking in terms of columns.

When it comes to databases, procedural thinking has a lot more overhead than set-based thinking: it produces more queries and tends to fetch more data than needed. With set-based thinking, we focus on how to get to the meat (the email address). With procedural thinking, we focus on obtaining People objects in order to use their EmailAddresses property.

Now what does set-based thinking look like in our example, and how do we make it better?

Back to the email addresses’ example

To improve the email address code sample, let’s now think in a set-based way: we have people, and we want to get to their email addresses. The columns we need are Person.LastName to match the last name and EmailAddress.EmailAddress1. To get from table Person to EmailAddress, we need to join them.

Our first try is to use Entity Framework’s Include method to tell it that it needs to navigate the EmailAddresses property.

Solution 1: Include() method

public void PrintEmailAddresses(string lastNameToFind)
{
    using (var db = new AdventureWorks2012Entities())
    {
        var matchingPeople = 
                     from people in db.People.Include("EmailAddresses") // navigate property EmailAddresses
                     where people.LastName.Equals(lastNameToFind)
                     select people;

        foreach (var p in matchingPeople)
        {
            foreach (var address in p.EmailAddresses)
            {
                Console.WriteLine(address.EmailAddress1);
            }
        }
    }
}

The code above will cause Entity Framework to perform a join:

SELECT * FROM Person p
INNER JOIN EmailAddress e ON p.BusinessEntityID = e.BusinessEntityID
WHERE p.LastName = @lastname;

This is better, but we are reading all the properties of Person and EmailAddress. Unfortunately, we cannot give more details to Entity Framework in the Include method and tell it to do a projection on EmailAddress1.

To have more control, we need to specify our query more precisely by defining the join and the projection explicitly. To do so, we can use the LINQ syntax.

Solution 2: Explicit join with LINQ

public void PrintEmailAddresses(string lastNameToFind)
{
    using (var db = new AdventureWorks2012Entities())
    {
        var emails = from people in db.People
                     join email in db.EmailAddresses                           // explicit join
                     on people.BusinessEntityID equals email.BusinessEntityID
                     where people.LastName.Equals(lastNameToFind)
                     select new { email.EmailAddress1 };                       // projection

        foreach (var e in emails)
        {
            Console.WriteLine(e.EmailAddress1);
        }
    }
}

Thanks to our improved LINQ query, Entity Framework now produces a more efficient SQL query:

SELECT p.BusinessEntityID, e.EmailAddress1 FROM Person p
INNER JOIN EmailAddress e ON p.BusinessEntityID = e.BusinessEntityID
WHERE p.LastName = @lastname;

We now obtain the wanted list of email addresses and have gotten rid of the nested LINQ smell by removing the inner foreach loop.

Note that with this approach, Entity Framework still includes p.BusinessEntityID in the select list, although we do not need it.

An alternative is to use a LINQ extension method chain and perform SelectMany() on EmailAddresses.

Solution 3: LINQ extension method chain with SelectMany()

public void PrintEmailAddresses(string lastNameToFind)
{
    using (var db = new AdventureWorks2012Entities())
    {
        IQueryable<string> emails = db.People
            .Where(p => p.LastName.Equals(lastNameToFind))
            .SelectMany(p => p.EmailAddresses)
            .Select(email => email.EmailAddress1);  // projection
            
        foreach (var e in emails)
        {
            Console.WriteLine(e);
        } 
    }
}

With that code, we actually do not specify the join explicitly: Entity Framework infers from our SelectMany(p => p.EmailAddresses) that it needs to join Person to EmailAddress. The SQL query now looks like this:

SELECT e.EmailAddress1 FROM Person p
INNER JOIN EmailAddress e ON p.BusinessEntityID = e.BusinessEntityID
WHERE p.LastName = @lastname;

We see here that the generated SQL query reads less information than the one from solution 2: p.BusinessEntityID is no longer in the select list. This query is closest to what one would write manually to get the email addresses.

Recap of the email example

The first code example was using a nested foreach loop, in which a navigation property was accessed, and resulted in a separate query being executed for each matching entry from the first table. This is inefficient and does not scale: for large result sets, the performance will really go down the drain.

Solution 1 showed improvement: using the method Include(“PropertyName”), we can inform Entity Framework that we will need information from another table, and it will generate a single query that performs a JOIN. This alternative improves scalability. Unfortunately, Entity Framework will read all the columns of both tables, and we have no way to tell it to do a projection.

In solution 2, we made up for the lack of projection from the previous solution by using LINQ syntax to explicitly join the tables and project the columns that we need. That way, we avoid reading data that we do not need. This solution scales well, and the amount of data we retrieve is reduced to almost the minimum. It is worth mentioning, however, that the primary key used to join the tables is included in the select list, which is not strictly needed.

Solution 3 uses a LINQ method chain with SelectMany() and is the most efficient solution: improving upon solution 2, a join is performed on the tables, and this time only the necessary columns are included in the select list: the primary key is only used in the JOIN.

Depending on the amount of data and the performance, either of solution 1 to 3 can be used. If you know the size of the result set will always remain small, solution 1 is the easiest to implement. When performance is needed, either solution 2 or 3 can be used, depending on your project’s preference in terms of code style and readability. If performance is critical, solution 3 will be slightly better since it saves a read on the primary key column.

Conclusion

In this post, we saw how to prevent performance problems with Entity Framework by:

  • using projection to avoid reading data we don’t use
  • applying set-based thinking to define how to get the data from the database
  • recognising and fixing the nested LINQ smell
  • using techniques to generate more efficient queries:
    1. Include() method
    2. Explicit join with LINQ
    3. LINQ method chain with SelectMany()