Entity Framework 6 performance tips

After some initial ramblings exploring ways to avoid performance problems with Entity Framework 6, this post continues in the same spirit with the following menu:

  • Data types and sizes
  • Loading mechanisms:
    1. Lazy loading
    2. Eager loading
    3. Paging

Choosing the right size

It may seem unimportant at first, but choosing appropriate data types and sizes can make a huge difference when the database starts to grow. When tables contains thousands and millions of rows, choosing an appropriate data size can lead to orders of magnitude smaller or bigger database sizes.

For example, use smallint instead of int if you know that the stored data will always be contained in the range [-32768, 32767]. Using varchar(50) instead of nvarchar(500) when you know that the first names stored will not exceed 50 letters and cannot contain unicode characters.

This not only saves valuable disk and memory storage, it also helps reduce the page life expectancy and reduces the amount of data we have to read and transfer over the network.

Entity Framework code first defaults

Entity Framework chose to err on the safe side and set the size of the data to the maximum possible, when the database schema is generated from the code, such as:

  • nvarchar(max) for string
  • decimal(18,2) for decimal numeric values
  • int for integer numerical values

While being safe and preventing bad surprises when the columns’ size is too small for the application, it will certainly also lead to a pretty nasty memory usage if these defaults are not changed and the database starts to grow.

Another problem is that creating indexes on those columns becomes impossible:

CREATE NONCLUSTERED INDEX idx_FullName ON dbo.Employee
(
    LastName ASC, FirstName ASC
)

This will fail if either column has a size ‘max’ (for example varchar(max)).

Changing data type and size after the fact

Entity Framework is pretty flexible and will let you change the data types and sizes of your columns, as long as it can convert to and from the C# types.

When modifying the columns, we have the following options:

  • alter the table manually/via script
  • code annotations
  • overriding DbContext.OnModelCreating()

Alter the table manually/via script

As mentioned above, Entity Framework allows changes to the data type and size, as long as it is compatible with the C# types.

The problem with this approach is that you lose the flexibility to be able to deploy the application to different environments. Manually managing differing database schemas is tedious and hence not recommended.

A better solution is to include it in a migration so that it is handled automatically by Entity Framework.

Code annotations

Code annotations is a simple and fast way to define size constraints to guide Entity Framework when generating the database schema. The MaxLength attribute defines the maximal size, and the Required attribute specifies that a column cannot be NULL:

class Department 
{
    [MaxLength(50)]
    [Required] // NOT NULL
    public string FirstName {get; set;}
}

Annotating the properties of your entities makes it straightforward to define appropriate limits to the size of the data.

If you want finer control over the generated schema, you can override the method OnModelCreating().

Overriding DbContext.OnModelCreating()

Entity Framework offers the possibility to override the method OnModelCreating() that is called when the model is created. In this method, you can use a model builder with a fluent API to define finer constraints on the properties of your entities:

public class CompanyContext : DbContext
{
    // ...
    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Employee>()
                .Property(u => u.firstName)
                .IsUnicode(false)
                .HasMaxLength(50)
                .IsRequired(); // NOT NULL

        modelBuilder.Entity<Employee>()
            .Property(e => e.salary)
            .HasPrecision(10, 2);
    }
    // ...
}

Loading mechanisms

Lazy loading is enabled by default in Entity Framework. When a property is accessed, Entity Framework will load the corresponding entities from the database if necessary.

Lazy loading example
The following code sample loads all Persons from the database and iterates over them. The EmailAddresses are loaded lazily, separately for each Person.

public void PrintEmailAddressesLazyLoaded()
{
    using (var db = new AdventureWorks2012Entities())
    {
        IQueryable<Person> matchingPeople = 
            from people in db.People
            select people;

        foreach (Person p in matchingPeople)
        {
            string emailAddresses = string.Join(", ", p.EmailAddresses); // EmailAddresses lazy loaded
            Console.WriteLine(emailAddresses);
        }
    }
}

Eager loading example
As opposed to the previous example, the code below loads all email addresses from all people upfront, in a single query.

public void PrintEmailAddressesEagerLoaded()
{
    using (var db = new AdventureWorks2012Entities())
    {
        // all emails eagerly loaded
        var emails = from people in db.People
                     join email in db.EmailAddresses
                     on people.BusinessEntityID equals email.BusinessEntityID
                     select new { email.EmailAddress1 };

        string emailAddresses = string.Join(", ", emails);
        Console.WriteLine(emailAddresses);
    }
}

Eager versus lazy loading

There is no one true answer to the question of using whether lazy or eager loading. It depends on the context, the amount of data involved and the desired responsivity.

The following sections list situations where either method is appropriate.

When to use eager loading

  • processing requires all the data
  • full scope of the request is known
  • automated processes

In cases where we know what we need and the data we need does not depend on external factors like user input, we can load all the necessary data eagerly.

When to use lazy loading

  • navigation is unknown (typically, user inputs)
  • scope of the request is unknown
  • batch work

When we do not know exactly what data we need because it depends on external factors (like user inputs), or when the data set is too large to be fully loaded, loading the data lazily when we need it is appropriate.

Paging: combining set-based thinking with lazy loading

When the data set is large and navigation depends on user inputs, it makes sense to implement paging to load the data: we only load part of the total result set to the user an allow him to see more if he wishes so.

In the code sample below, we combine set-based thinking to define a query that will return part of the whole result set. The data is loaded lazily in the sense that we only load the pages that the user wants to see.

Paging example

public void PrintEmailAddressesPaged(int pageNumber = 0)
{
    const int pageSize = 10;
    using (var db = new AdventureWorks2012Entities())
    {
        IQueryable<string> emails = db.People
            .SelectMany(p => p.EmailAddresses)
            .OrderBy(e => e.EmailAddress1)
            .Select(e => e.EmailAddress1)
            .Skip(pageNumber * pageSize) // skip the previous pages
            .Take(pageSize);

        foreach (var e in emails)
        {
            Console.WriteLine(e);
        }
    }
}

Tip: pre-load the next page

In a scenario where we use paging to present a large set of data to the user, we can improve the user’s perception of application responsivity by pre-loading the next page (or wherever is he most likely to navigate to next) in the background, while the user is looking at the current set of data.

That way, the data is already there and the user gets a snappy response from the application, which improves the user experience.

Disabling lazy loading in Entity Framework

It is possible to disable lazy loading altogether:

public class CompanyContext : DbContext
{
    public CompanyContext() : base("name=CompanyDb")
    {
        this.Configuration.LazyLoadingEnabled = false;
    }
    // ...
}

However, one must be aware that doing so has an important impact on the behaviour of the application: the places in the code where data was loaded lazily will simply obtain empty objects and empty lists of data. No error or warning is displayed.

The previous example method PrintEmailAddressesLazyLoaded() will simply have an empty output as a result.

When lazy loading is disabled, the code must load all data eagerly, like in the PrintEmailAddressesEagerLoaded() example.

Be deliberate!

Whichever loading mechanism you choose to use, you should be deliberate and use it knowingly, aware of the impacts and tradeoffs inherent to the choice.

Do not just use or reuse code without knowing how the data is retrieved from the database. This only leads to performance problems and surprises in the behaviour of the application.

Summary

In this post, we saw:

  • that choosing a sensible data type and size is important
  • that the defaults chosen by Entity Framework are too large and need to be adapted
  • that there are three ways to modify the type and size of the columns:
    1. manually
    2. using code annotations
    3. overriding OnModelCreating() in your DbContext subclass
  • example of loading mechanisms and when to use what:
    1. lazy loading
    2. eager loading
    3. paging

As a developer, you should be aware of how the data is retrieved from the database and why it is done that way.