Category Archives: Coding

A quick look at Entity Framework 7

Just a short post to recollect a few things I gathered from Pluralsight course Looking ahead to EF7.

Example project

Based on the demos of the course, I put together a small example application, using Entity Framework 7 beta6. The code is found on my github repository.

Configuring the DbContext

EF7 supports two ways to configure the DbContext:

  1. directly in the DbContext class
  2. by injecting the configuration options into the context’s constructor

Configuration in the DbContext

In this case, the context configures itself, in this case to use an in-memory database, with the OnConfiguring() method being invoked by EF.

namespace Context.EF7
{
    using Domain;
    using Microsoft.Data.Entity;
    using Microsoft.Data.Entity.Infrastructure;

    public class VideoStoreContext: DbContext
    {
        public DbSet<Movie> Movies { get; set; }
        public DbSet<Actor> Actors { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder.UseInMemoryDatabase();
        }
    }
}

Injecting the options into the DbContext

The alternative is to pass the options as a constructor parameter to the context.

namespace Context.EF7
{
    using Domain;
    using Microsoft.Data.Entity;
    using Microsoft.Data.Entity.Infrastructure;

    public class VideoStoreContext: DbContext
    {
        public DbSet<Movie> Movies { get; set; }
        public DbSet<Actor> Actors { get; set; }
 
        public VideoStoreContext(DbContextOptions options) : base(options)
        {

        }
    }
}

Providers

In the example above, we saw the use of the optionsBuilder:

optionsBuilder.UseInMemoryDatabase();

It is worth noting that UseInMemoryDatabase() is an extention method that gets pulled in when we include the entityframework.inmemory package.

To use SQL Server instead, we would include the package entityframework.sqlserver and configure the context as such, using the UseSqlServer() extension method from that dll:

optionsBuilder.UseSqlServer("Server=(localdb)\\mssqllocaldb; Database=EF7Demo; Trusted_Connection=True;");

Summary

When looking at both options above, we see that configuring the context with OnConfiguring() locks the underlying provider (as well as other options).

The second way, by injecting the options into the constructor, is much more flexible. This approach should be preferred to using OnConfiguring().

That way, we can, for example, use SQL Server for the application and use an in-memory provider for the tests.

The application code would look something like this:

var optionsBuilder = new DbContextOptionsBuilder();
optionsBuilder.UseSqlServer("Server = (localdb)\\mssqllocaldb; Database=EF7Samurai; Trusted_Connection=True;");

using (var context = new VideoStoreContext(optionsBuilder.Options))
{
    // do cool stuff with the database here
}

And in the tests, we use an in-memory database:

var optionsBuilder = new DbContextOptionsBuilder();
optionsBuilder.UseInMemoryDatabase(); // extension method from package entityframework.inmemory

var context = new VideoStoreContext(optionsBuilder.Options);
// test stuff
context.Dispose();

Pretty cool, huh?

Refactoring: extract creation logic into factory

This step of the command line video store refactoring series (1, 2 and 3) is pretty straightforward: we will further simplify the loop in MainClass by moving creation logic into a separate class.

The code for starting this exercise is here. The end of this step is there as well.

The state of affairs

The Run() method of the MainClass still contains a loop having several responsibilities:

while (true)
{
    string input = _in.ReadLine();
    if (string.IsNullOrEmpty(input))
    {
        break;
    }
    string[] rentalData = input.Split(' ');
    var rental = new Rental(movieRepository.GetByKey(int.Parse(rentalData[0])), int.Parse(rentalData[1]));

    // add frequent renter points
    frequentRenterPoints += rental.GetFrequentRenterPoints();

    // show figures for this rental
    result += "\t" + rental.GetMovieName() + "\t" + rental.GetAmount().ToString("0.0", CultureInfo.InvariantCulture) + "\n";
    totalAmount += rental.GetAmount();
}

The line creating the Rental object stands out by its relative complexity:

var rental = new Rental(movieRepository.GetByKey(int.Parse(rentalData[0])), int.Parse(rentalData[1]));

All by itself, it does the following:

  1. parses the entered rental data
  2. fetches the corresponding Movie
  3. creates the Rental

Due to the several responsibilities involved, we will have to modify MainClass if the low-level parsing of the rental data changes. Let’s prevent this by moving this logic to a separate class that takes care of this.

Introduce a RentalFactory class

Extract method

The first step is to extract a method (Ctrl + R, M) out the loop for the Rental creation statement. Since I know this method will get moved elsewhere, I’m making it public already:

public Rental CreateRental(string[] rentalData)
{
    var rental = new Rental(movieRepository.GetByKey(int.Parse(rentalData[0])), int.Parse(rentalData[1]));
    return rental;
}

However, this new method used the field movieRepository, which is only accessible in MainClass. To circumvent this, we can pass the repository by applying Introduce Parameter (Ctrl + R, P) on the field:

public Rental CreateRental(string[] rentalData, MovieRepository repository)
{
    var rental = new Rental(repository.GetByKey(int.Parse(rentalData[0])), int.Parse(rentalData[1]));
    return rental;
}

Create class RentalFactory

We have two alternatives for creating this new class:

  1. add a new field of type RentalFactory to MainClass and let Resharper generate an empty class for us (see example)
  2. use Extract Class on CreateRental()

The first option, that we used in the previous step, generates a new, empty class, to which we can then move our method. The second has the advantage of creating the new class and moving the method in one go.

Let’s do it.

Point to the method CreateRental() and choose Extract Class (Ctrl + Shift + R, E).

extract-class

MainClass now looks cleaner:

public MainClass(TextReader @in, TextWriter @out)
{
    _out = @out;
    _in = @in;
    rentalFactory = new RentalFactory();
}

// ...

Rental rental = RentalFactory.CreateRental(rentalData, movieRepository);

The generated RentalFactory class:

public class RentalFactory
{
    public RentalFactory()
    {
    }

    public Rental CreateRental(string[] rentalData, MovieRepository repository)
    {
        var rental = new Rental(repository.GetByKey(int.Parse(rentalData[0])), int.Parse(rentalData[1]));
        return rental;
    }
}

We got this done with minimal manual typing, nice.

However, we still have a little work to do:

public class MainClass
{
    // ...
    private readonly MovieRepository movieRepository = new MovieRepository();
    private readonly RentalFactory rentalFactory;
    
    // why the public getter???
    public RentalFactory RentalFactory
    {
        get
        {
            return rentalFactory;
        }
    }
    // ...
}

When extracting the RentalFactory class, Resharper created a new field, and encapsulated it with a public getter. This is not necessary, and I actually do not want the factory to be available outside the class.

Let’s get rid of the getter by applying Inline Method (Ctrl + R, I) to it:

Rental rental = rentalFactory.CreateRental(rentalData, movieRepository);

Does anyone know how to prevent Resharper from creating this getter? I did not find any setting which could deactivate this behaviour.

Change parameter to field

When looking at the rentalFactory.CreateRental() call above, we see that we need to pass in a MovieRepository so that the Rental can be created.

It would be better to pass the repository into the factory into the constructor, and not having to worry about it in subsequent CreateRental() calls.

To do so, let’s go into RentalFactory, and apply Introduce Field (Ctrl + R, F) on the repository parameter:

public class RentalFactory
{
    private MovieRepository repository;

    public RentalFactory()
    {
    }

    public Rental CreateRental(string[] rentalData, MovieRepository repository)
    {
        this.repository = repository;
        var rental = new Rental(repository.GetByKey(int.Parse(rentalData[0])), int.Parse(rentalData[1]));
        return rental;
    }
}

The field is assigned in the CreateRental() method, which is not ideal, but we can leave it like that for the moment.

Also, note the empty constructor, which is convenient to have. We are now going to add a new constructor where we can inject the MovieRepository. To do so, we will use the code snippet ctorf:

code-snippet-ctorf

This snippet creates a new constructor that takes all the classes’ fields as parameters and initialises them in one go. Quite handy to have.

public class RentalFactory
{
    private MovieRepository repository;

    public RentalFactory(MovieRepository repository)
    {
        this.repository = repository;
    }

    public RentalFactory()
    {
    }

    public Rental CreateRental(string[] rentalData, MovieRepository repository)
    {
        this.repository = repository;
        var rental = new Rental(repository.GetByKey(int.Parse(rentalData[0])), int.Parse(rentalData[1]));
        return rental;
    }
}

If we didn’t have the explicit empty constructor, we would have a compile error at this point, because MainClass calls the RentalFactory constructor without parameter.

Now, we can go to the usage of that empty constructor and insert the movieRepository.

public MainClass(TextReader @in, TextWriter @out)
{
    _out = @out;
    _in = @in;
    rentalFactory = new RentalFactory(movieRepository);
}

Cleanup

Like after every refactoring, we have some cleanup to do before calling it a day.

MainClass

The initialisation of the MovieRepository and RentalFactory are not consistent: the former is initialised at its declaration, whereas the latter is in the constructor.

Let’s make it consistent by moving the initialisation of the MovieRepository field to the constructor. Resharper offers a way to do it automagically by pressing Alt + Enter while the caret is either before or after the ‘=’ sign:

move-initialisation-to-constructor

public class MainClass
{
    // ...
    private readonly MovieRepository movieRepository;
    private readonly RentalFactory rentalFactory;
    // ...
    public MainClass(TextReader @in, TextWriter @out)
    {
        _out = @out;
        _in = @in;
        movieRepository = new MovieRepository();
        rentalFactory = new RentalFactory(movieRepository);
    }
    // ...
}

RentalFactory

There are a few things we can improve in the factory:

public class RentalFactory
{
    private MovieRepository repository;

    public RentalFactory(MovieRepository repository)
    {
        this.repository = repository;
    }

    public RentalFactory()
    {
    }

    public Rental CreateRental(string[] rentalData, MovieRepository repository)
    {
        this.repository = repository;
        var rental = new Rental(repository.GetByKey(int.Parse(rentalData[0])), int.Parse(rentalData[1]));
        return rental;
    }
}

Remove unused empty constructor

The parameterless constructor is empty and is never called. Remove it with Alt + Enter or Alt + Del.

Remove the first line of CreateRental()

CreateRental() still gets the MovieRepository passed as a parameter, and uses this, instead of the field, to fetch the movie. This is useless because the field gets initialised in the constructor.

We can safely delete the redundant field assignment. Additionally, we use field this.repository instead of parameter repository when creating the Rental:

public Rental CreateRental(string[] rentalData, MovieRepository repository)
{
    var rental = new Rental(this.repository.GetByKey(int.Parse(rentalData[0])), int.Parse(rentalData[1]));
    return rental;
}

Remove unused parameter repository from CreateRental()

The parameter repository is now unused and can be removed: Alt + Enter or Alt + Del on it.
After this, we can also remove the redundant this qualifier on the field (Alt + Enter).

Rename field repository to movieRepository

Let’s make the name of the field describe itself better by renaming it from repository to movieRepository. We do this by using the Rename refactoring on it (Ctrl + R, R) on the field. In the dialog box, choose the option to also rename related symbols, so that the constructor parameter gets renamed as well.

Rename method CreateRental() to CreateFrom()

The name of the method unnecessarily repeats the word ‘Rental’: we do not need to say it again when we are working with a class named RentalFactory. Let’s rename the method to CreateFrom() by using Ctrl + R, R on it.

That way, the naming makes more sense on the caller side: the factory creates the rental from the data.

rentalFactory.CreateFrom(rentalData);

Introduce local variables for movie and daysRented

The Rental constructor takes two parameters, and at the moment, it is not clear which parameter is what. We can the code more self-describing by introducing two local variables with sensible names.

Let’s introduce a new local variable on the first constructor parameter (Ctrl + R, V), and name it movie. Note that we can take advantage of the autocomplete the Resharper provides us to choose the name.

Now we do the same for the second parameter (Ctrl + R, V) and use the suggested name daysRented.

Movie movie = movieRepository.GetByKey(int.Parse(rentalData[0]));
int daysRented = int.Parse(rentalData[1]);

Note that we could take it further and introduce another variable for the key handed to the movieRepository. But since it is already pretty obvious that it is a key by just looking at the name of the method GetByKey(), we can leave it as is with a clean coder conscience.

Inline local variable rental

This is probably a matter of taste, but when we create a variable and do nothing else with it than to just return it, I prefer to simply inline that variable and return the corresponding expression.

This assumes however that the expression is simple enough to be understood on its own, without the need to clarify it by using a descriptive variable name.

This is the case here, it is obvious that we are instantiating a new Rental object.

We hence apply Inline Variable (Ctrl + R, I).

Make field movieRepository readonly

Again a matter of taste: field movieRepository can be made readonly.

I like to mark fields as readonly, because it gives the guarantee that the field is not going to be modified anywhere.

This also prevents the intentional or deliberate introduction of temporary field smells.

Are we clean now?

After performing the cleanup steps above, we are left with the following code for the RentalFactory:

public class RentalFactory
{
    private readonly MovieRepository movieRepository;

    public RentalFactory(MovieRepository movieRepository)
    {
        this.movieRepository = movieRepository;
    }

    public Rental CreateFrom(string[] rentalData)
    {
        Movie movie = movieRepository.GetByKey(int.Parse(rentalData[0]));
        int daysRented = int.Parse(rentalData[1]);
        return new Rental(movie, daysRented);
    }
}

In my opionion, the code is minimal, simple, and self-explanatory. Any suggestions for further improvements?

Summary

In this exercise, we removed the responsibility of parsing and creating rentals from the main class to a new factory class.

We improved the code by:

  • extracting methods
  • extracting a class
  • introducing a field
  • using code snippets
  • introducing local variables to make the code better speak for itself

The code for the end of this exercise can be found here.

In the next refactoring step, we will again use split loop to separate the remaining responsibilities from the main while loop.

Refactoring: split loop

This is the third episode of the command line video store refactoring exercises (1, 2).

The focus of this step is to see how to split loops (and other control structures like if) in order to be able to extract and separate the concerns.

The state of the code at the beginning of the exercise is here, and the code from the end of the exercise there.

The Problem

Currently, MainClass is responsible for the loading, the creation and the retrieval of the Movies. This is not ideal because it has too many responsibilities. MainClass should act as a controller and define the high-level processes of the video store and should not be burdened with additional tasks such as loading and providing access to the store’s movies.

MainClass loads and creates the Movies:

// read movies from file
var movies = new Dictionary<int, Movie>();
using (FileStream fs = File.Open(@"movies.cvs", FileMode.Open, FileAccess.Read))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader reader = new StreamReader(bs))
{
    while (!reader.EndOfStream)
    {
        string line = reader.ReadLine();
        string[] movieData = line.Split(';');
        var movie = new Movie(int.Parse(movieData[0]), movieData[1], movieData[2]);
        movies.Add(movie.Key, movie);

        _out.WriteLine(movie.Key + ": " + movie.Name);
    }
}

Further in MainClass, the movies dictionary is used to find the movie to pass on the Rental‘s constructor.

while (true)
{
    // ...
    string[] rentalData = input.Split(' ');
    var rental = new Rental(movies[int.Parse(rentalData[0])], int.Parse(rentalData[1]));
    // ...
}

The solution to The Problem

The task of providing access to domain objects is typically done by a repository. We will introduce a new class MovieRepository that will take care of loading and retrieving the Movies.

Several concerns in the loop

When we take a look at the while loop that reads user input and creates the Movies, we see that there is a second concern being handled: output to the user.

while (!reader.EndOfStream)
{
    string line = reader.ReadLine();
    string[] movieData = line.Split(';');
    var movie = new Movie(int.Parse(movieData[0]), movieData[1], movieData[2]);
    movies.Add(movie.Key, movie);

    _out.WriteLine(movie.Key + ": " + movie.Name);
}

Because of this, we cannot extract the loading from the display. When wanting to separate different concerns from a control structure such as a loop, we use the split loop pattern.

Split loop

The idea behind split loop is to duplicate the control structure at hand (once for every concern). Then, in each loop, we remove the duplicate concerns until there is only one concern per loop.

Typically, the first loop constructs a collection over which the duplicated loops will then iterate.

In this example, we have two concerns in a while loop. This loop builds a dictionary of Movies. To separate the display concern from the creation, we will add a new loop over the Movies that will take care of the display.

// read movies from file
var movies = new Dictionary<int, Movie>();
using (FileStream fs = File.Open(@"movies.cvs", FileMode.Open, FileAccess.Read))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader reader = new StreamReader(bs))
{
    while (!reader.EndOfStream)
    {
        string line = reader.ReadLine();
        string[] movieData = line.Split(';');
        var movie = new Movie(int.Parse(movieData[0]), movieData[1], movieData[2]);
        movies.Add(movie.Key, movie);
    }
    
}
// duplicated loop on the same collection
foreach (Movie movie in movies.Values)
{
    _out.WriteLine(movie.Key + ": " + movie.Name);
}

The code above shows how we split the loop in two: we added a new foreach loop over movies and moved the _out.WriteLine(); statement over there.

Now that each loop manages one concern, we can extract methods correspondingly.

Extract methods

After splitting the loop, we can extract the behaviours into separate methods. At the moment I am only interested in the loading of the Movies, so I will not extract a method for displaying them yet. I’ll call the loading method GetAll().

Another thing I am extracting a method for is the lookup in the movies dictionary when we create Rentals, which I’ll call GetByKey().

public Dictionary<int, Movie> GetAll()
{
    var movies = new Dictionary<int, Movie>();
    using (FileStream fs = File.Open(@"movies.cvs", FileMode.Open, FileAccess.Read))
    using (BufferedStream bs = new BufferedStream(fs))
    using (StreamReader reader = new StreamReader(bs))
    {
        while (!reader.EndOfStream)
        {
            string line = reader.ReadLine();
            string[] movieData = line.Split(';');
            var movie = new Movie(int.Parse(movieData[0]), movieData[1], movieData[2]);
            movies.Add(movie.Key, movie);
        }
    }
    return movies;
}

public Movie GetByKey(Dictionary<int, Movie> movies, int key)
{
    return movies[key];
}

Note that I made the methods public and non-static, which will make it straightforward to move them to another class later.

Move the methods to a new class

Now that we have prepared methods for the new repository, we need to create a new class. We need to keep in mind that we want to move the methods GetByKey() and GetAll() to this new class.

As we saw previously, Resharper suggests possible target classes when moving instance methods: fields and parameters of the methods.

In this case, we have either no parameter (GetAll()) or types that do not belong to our code (GetByKey(Dictionary<TKey, TValue>, int)). The remaining possibility is to have a field from a class that we have control on.

Create new class as a field

In order to prepare for the move, we need to create a new class. As mentioned before, we need a field in our class to provide a move target for the methods we extracted earlier.

MovieRepository is added as a field. Let Resharper generate the class from this statement (Alt + Enter):

public class MainClass
{
    private readonly TextReader _in;
    private readonly TextWriter _out;
    private readonly MovieRepository movieRepository = new MovieRepository(); // move target
    // ...
}

Move instance methods

Use the refactoring Move instance method (Ctrl + R, O) on methods GetAll() and GetByKey() and move them to MovieRepository.

public class MovieRepository
{
    public Dictionary<int, Movie> GetAll()
    {
        var movies = new Dictionary<int, Movie>();
        using (FileStream fs = File.Open(@"movies.cvs", FileMode.Open, FileAccess.Read))
        using (BufferedStream bs = new BufferedStream(fs))
        using (StreamReader reader = new StreamReader(bs))
        {
            while (!reader.EndOfStream)
            {
                string line = reader.ReadLine();
                string[] movieData = line.Split(';');
                var movie = new Movie(int.Parse(movieData[0]), movieData[1], movieData[2]);
                movies.Add(movie.Key, movie);
            }
        }
        return movies;
    }

    public Movie GetByKey(Dictionary<int, Movie> movies, int key)
    {
        return movies[key];
    }
}

Cleaning up

The MovieRepository class is now operational, but there is still some cleaning up to do.

Load the Movies once

The GetAll() method loads the movies from the file each time it is called. Let’s improve this by moving the loading code to the constructor and storing the Movies in a field that will be reused by the repository.

With the caret on the movies variable in GetAll(), and press Ctrl + R, F to Introduce field. Choose non-static and to intialise the field in the constructor.

Introduce field

We now have a movies field that is instantiated in a new constructor.

The next thing to do is to cut the using block in GetAll() and paste it into the constructor.

We obtain the following code:

public class MovieRepository
{
    private readonly Dictionary<int, Movie> movies;

    public MovieRepository()
    {
        movies = new Dictionary<int, Movie>();
        using (FileStream fs = File.Open(@"movies.cvs", FileMode.Open, FileAccess.Read))
        using (BufferedStream bs = new BufferedStream(fs))
        using (StreamReader reader = new StreamReader(bs))
        {
            while (!reader.EndOfStream)
            {
                string line = reader.ReadLine();
                string[] movieData = line.Split(';');
                var movie = new Movie(int.Parse(movieData[0]), movieData[1], movieData[2]);
                movies.Add(movie.Key, movie);
            }
        }
    }
    
    public Dictionary<int, Movie> GetAll()
    {
        return movies;
    }
    
    // ...
}

Parameter hides field

In the GetBykey() method, you can see that the parameter movies hides the field with the same name. Moreover, callers of this method should not have to provide the movies dictionary themselves.

public Movie GetByKey(Dictionary<int, Movie> movies, int key)
{
    return movies[key];
}

Disambiguate by qualifying the field with this.

public Movie GetByKey(Dictionary<int, Movie> movies, int key)
{
    return this.movies[key];
}

We can now remove the unused parameter by pressing Alt + Enter on it, Remove parameter (and update usages). Also remove the now unnecessary this qualifier.

public Movie GetByKey(int key)
{
    return movies[key];
}

Encapsulate Dictionary<>

The fact that MovieRepository uses a Dictionary to internally store the Movie is an implementation that we want to hide so that it can be changed when needed, with no impact on the users of the class.

Hence, the GetAll() method should not return a dictionary, but a list of Movies. That way, the consumers of the repository do not have to access the dictionary’s Values property, which will make usage more readable and simpler.

Let’s fix this by having GetAll() return movies.Values.ToList() instead of movies. Use Alt + Enter on the now erroneous return statement and choose Change type of method to List<Movie>:

public List<Movie> GetAll()
{
    return movies.Values.ToList();
}

After that, the usage in MainClass can be simplified to the following:

foreach (Movie movie in movieRepository.GetAll())
{
    _out.WriteLine(movie.Key + ": " + movie.Name);
}

Summary

In this exercise, we saw how to:

  • split loops (and other control structures) by duplicating them
  • introduce a field in the source class for the target type where we want to move instance methods to

The technique to split loops can be used also for other control flow structures like if, switch and such.

The full code for the end of this exercise is located there.

In the next exercise, we will continue to move responsibilities out of MainClass.

Refactoring: introduce new class and move behaviour to it

This post is the step following the beginning of the refactoring exercise on the command line video store code base (Java, C#).

In the previously described refactoring step, we removed a primitive obsession smell and introduced a new class Movie.

The code for the beginning of this exercise can be found on my repo on branch E2. The end of this refactoring step is located here.

This time, we will continue beating that unwieldy piece of procedural code into submission by further extracting objects out of it.

Making implicit concepts explicit

If we look at the current state, we may notice that the code contains an implicit concept.

In the big while loop, we “determine amounts for the rental”. But what exactly is done there? Rental data is read from the user’s input: the movie and the number of days it was rented for. Based on this, the code calculates frequent renter point and the amount that will be billed to the customer.

There is an opportunity to make the concept of Rental explicit: a Rental is comprised of a movie and a number of days, and we can calculate the renter points and the billed amount based on this data.

This just what we need to make the code object-oriented: data and behaviour. Let’s introduce a new domain class named Rental.

Parallel change for Rental

As we did with the Movie class, we introduce a new Rental class with the parallel change strategy. After the rental data is read from user input, we create a new Rental object, taking movie and daysRented as its constructor parameters (line 14):

decimal totalAmount = 0;
int frequentRenterPoints = 0;
string result = "Rental Record for " + customerName + "\n";
while (true)
{
    string input = _in.ReadLine();
    if (string.IsNullOrEmpty(input))
    {
        break;
    }
    string[] rentalData = input.Split(' ');
    Movie movie = movies[int.Parse(rentalData[0])];
    int daysRented = int.Parse(rentalData[1]);
    var rental = new Rental(movie, daysRented); // parallel change

    decimal thisAmount = 0;
    //determine amounts for rental
    switch (movie.Category)
    {
        case "REGULAR":
            thisAmount += 2;
            if (daysRented > 2)
                thisAmount += (daysRented - 2)*1.5m;
            break;
        case "NEW_RELEASE":
            thisAmount += daysRented*3;
            break;
        case "CHILDRENS":
            thisAmount += 1.5m;
            if (daysRented > 3)
                thisAmount += (daysRented - 3)*1.5m;
            break;
    }

    // add frequent renter points
    frequentRenterPoints++;
    // add bonus for a two day new release rental
    if (movie.Category.Equals("NEW_RELEASE") && daysRented > 1)
    {
        frequentRenterPoints++;
    }
    // show figures for this rental
    result += "\t" + movie.Name + "\t" + thisAmount.ToString("0.0", CultureInfo.InvariantCulture) + "\n";
    totalAmount += thisAmount;
}

We use Alt + Enter on the constructor to have Resharper generate the class:

public class Rental
{
    public readonly Movie movie;
    public readonly int daysRented;

    public Rental(Movie movie, int daysRented)
    {
        this.movie = movie;
        this.daysRented = daysRented;
    }
}

Note that we didn’t encapsulate the fields movie and daysRented yet: from the code, I am not sure yet if we will need access to those from outside the class. Since MainClass is using local variables for movie and daysRented, we know however that we need public access now. For this reason, I am making Rental‘s fields public, and will make them private if noone needs to access them after we move the behaviour into the Rental class. If access from outside is needed, I will encapsulate the field(s) with a public getter.

Preparing for narrow change

The goal now is to replace the usage of the individual variables movie and daysRented with accesses to our new class Rental.

Since those are used in several places inside the loop, the best is to do it with a narrow change: the idea behind it is to use an existing variable, to assign it something new (ie. the change we want to apply in several places), and then to inline that variable, so that our change takes effect everywhere where the variable was used.

The advantage is that we apply the change only once. It is also quicker and safer than using search and replace in the code: inlining the variable is applied only where it makes sense (scope of the variable). With search and replace, we need to manually check each occurrence and verify whether it’s safe or not to do the change.

In this example, we want to replace:

  • movie with rental.movie
  • daysRented with rental.daysRented

To do this, we need to have the following assignments:

Movie movie = rental.movie;
int daysRented = rental.daysRented;

from the following code:

string[] rentalData = input.Split(' ');
Movie movie = movies[int.Parse(rentalData[0])];
int daysRented = int.Parse(rentalData[1]);
var rental = new Rental(movie, daysRented); // 'inline here' on both parameters

Watch out, now here is the trick:

The first thing we do is to inline the movie and daysRented given to the Rental constructor. To do so, use the refactoring Inline here while the caret is on movie / daysRented, in order to inline just this occurrence of the variable.

NOTE: unfortunately, the Inline here refactoring is not available in Resharper 8.2.3. We need to resort to copying the values by hand.

After this inlining, we move the instantiation of Rental from line 4 to line 2. After that, we can do our narrow change and assign the new values to both variables. We obtain the following:

string[] rentalData = input.Split(' ');
var rental = new Rental(movies[int.Parse(rentalData[0])], int.Parse(rentalData[1]));
// narrow change on both variables
Movie movie = rental.movie;
int daysRented = rental.daysRented;
// now we can inline the two variables above to apply the changes everywhere

Now we simply inline both these variables, and our changes are applied everywhere in the scope.

The loop looks like this:

while (true)
{
    string input = this._in.ReadLine();
    if (string.IsNullOrEmpty(input))
    {
        break;
    }
    string[] rentalData = input.Split(' ');
    var rental = new Rental(movies[int.Parse(rentalData[0])], int.Parse(rentalData[1]));

    decimal thisAmount = 0;
    //determine amounts for rental
    switch (rental.movie.Category)
    {
        case "REGULAR":
            thisAmount += 2;
            if (rental.daysRented > 2)
                thisAmount += (rental.daysRented - 2)*1.5m;
            break;
        case "NEW_RELEASE":
            thisAmount += rental.daysRented*3;
            break;
        case "CHILDRENS":
            thisAmount += 1.5m;
            if (rental.daysRented > 3)
                thisAmount += (rental.daysRented - 3)*1.5m;
            break;
    }

    // add frequent renter points
    frequentRenterPoints++;
    // add bonus for a two day new release rental
    if (rental.movie.Category.Equals("NEW_RELEASE") && rental.daysRented > 1)
    {
        frequentRenterPoints++;
    }
    // show figures for this rental
    result += "\t" + rental.movie.Name + "\t" + thisAmount.ToString("0.0", CultureInfo.InvariantCulture) + "\n";
    totalAmount += thisAmount;
}

Move behaviour to the new class

This parallel change to use Rental now allows us to move behaviour (methods) there. The responsibilities of calculating the frequent renter points and the bill amount belongs to Rental.

Extract method: GetAmount()

Let’s start with the calculation of the rental amount. We can extract a separate method from the big loop in MainClass with the Extract method refactoring:

A few points to note here:

When extracting the method, do not make it static: we want it to be an instance method of Rental.

Thanks to the previous preparation with the narrow change, this method now has a single parameter rental. This is exactly what we need to make the move to that class possible.

In this case, I already made the method public during the extraction because I know that MainClass will call it. It is also possible to leave it private while extracting and make it public with the move refactoring.

Move method: GetAmount()

We can now apply the refactoring Move instance method (Ctrl + R, O) to the method GetAmount().

When applying this refactoring, Resharper gives a list of possible targets. The possible targets are:

  • the types present in the parameters of the method
  • the types of the fields of the source class,

as long as those types belong to us and we can modify them (it wouldn’t give String as a possible destination, for example).

In this case, MainClass has no appropriate field (TextReader and TextWriter are part of the .NET framework). Hence, the only possible target type is our Rental class from the method’s parameter:
Move instance method

Extract method: GetFrequentRenterPoints()

The next thing we can extract from the while loop is the calculation of the frequent renter points.

int frequentRenterPoints = 0;
while (true)
{
    // ...
    // add frequent renter points
    frequentRenterPoints++;
    // add bonus for a two day new release rental
    if (rental.movie.Category.Equals("NEW_RELEASE") && rental.daysRented > 1)
    {
        frequentRenterPoints++;
    }
    // ...
}

However, if we extract this piece of code, we get the following:

int frequentRenterPoints = 0;
while (true)
{
    // ...
    frequentRenterPoints = GetFrequentRenterPoints(frequentRenterPoints, rental);
    // ...
}
private int GetFrequentRenterPoints(int frequentRenterPoints, Rental rental)
{
    // add frequent renter points
    frequentRenterPoints++;
    // add bonus for a two day new release rental
    if (rental.movie.Category.Equals("NEW_RELEASE") && rental.daysRented > 1)
    {
        frequentRenterPoints++;
    }
    return frequentRenterPoints;
}

This is not good. Because the same variable frequentRenterPoints is used for all rentals, we have to pass in the current state of the points in order to get the new value.

Actually, what this method should do is return the renter points for just this rental, and then we aggregate the points from all rentals to get the total in frequentRenterPoints.

The Trick

The trick is to introduce a new local variable in this scope. In this new local variable, we calculate the points for only the current rental in the loop. At the end of the code block, we add the value of this new variable to the total:

// add frequent renter points
int currentRenterPoints = 1; // new local variable
// add bonus for a two day new release rental
if (rental.movie.Category.Equals("NEW_RELEASE") && rental.daysRented > 1)
{
    currentRenterPoints++;
}
frequentRenterPoints += currentRenterPoints; // move assignment to the end

Now, we can extract the method as follows:

while (true)
{
    // add frequent renter points
    frequentRenterPoints += GetFrequentRenterPoints(rental);
}
private int GetFrequentRenterPoints(Rental rental)
{
    int currentRenterPoints = 1;
    // add bonus for a two day new release rental
    if (rental.movie.Category.Equals("NEW_RELEASE") && rental.daysRented > 1)
    {
        currentRenterPoints++;
    }
    return currentRenterPoints;
}

The method GetFrequentRenterPoints() is now ready to be moved to the Rental class.

Move method: GetFrequentRenterPoints()

After moving GetFrequentRenterPoints(), the while loop looks like this:

while (true)
{
    // ...
    decimal thisAmount = rental.GetAmount();
    // add frequent renter points
    frequentRenterPoints += rental.GetFrequentRenterPoints();

    // show figures for this rental
    result += "\t" + rental.movie.Name + "\t" + thisAmount.ToString("0.0", CultureInfo.InvariantCulture) + "\n";
    totalAmount += thisAmount;
}

Cleaning up

Do not talk to strangers

When looking at the while loop above, we see a violation of the law of Demeter: MainClass is accessing rental.movie.Name! MainClass breaks encapsulation and knows details that are internal to Rental. To fix this, let’s extract a new method GetMovieName() and move it to Rental, where it belongs.

MainClass now accesses the rental object, instead of rental.movie:

result += "\t" + rental.GetMovieName() + "\t" + thisAmount.ToString("0.0", CultureInfo.InvariantCulture) + "\n";

Field visibility

When we go back to the Rental class, we see that we can now the two fields private. We don’t need to encapsulate them.
Rental now looks like this:

public class Rental
{
    private readonly Movie movie;
    private readonly int daysRented;

    public Rental(Movie movie, int daysRented)
    {
        this.movie = movie;
        this.daysRented = daysRented;
    }

    public decimal GetAmount()
    {
        decimal thisAmount = 0;
        //determine amounts for rental
        switch (movie.Category)
        {
            case "REGULAR":
                thisAmount += 2;
                if (daysRented > 2)
                {
                    thisAmount += (daysRented - 2) * 1.5m;
                }
                break;
            case "NEW_RELEASE":
                thisAmount += daysRented * 3;
                break;
            case "CHILDRENS":
                thisAmount += 1.5m;
                if (daysRented > 3)
                {
                    thisAmount += (daysRented - 3) * 1.5m;
                }
                break;
        }
        return thisAmount;
    }

    public int GetFrequentRenterPoints()
    {
        int currentRenterPoints = 1;
        // add bonus for a two day new release rental
        if (movie.Category.Equals("NEW_RELEASE") && daysRented > 1)
        {
            currentRenterPoints++;
        }
        return currentRenterPoints;
    }

    public string GetMovieName()
    {
        return movie.Name;
    }
}

Summary

During this refactoring, we

  • created a new class for rental-related responsibilities
  • used the parallel change strategy to introduce the new class
  • used narrow change to use the new Rental class everywhere the previous variables were used
  • extracted methods and moved them to the new class
  • introduced a local variable in order to be able to extract a method

The full code for this solved exercise is located here.

In the next step of this refactoring series, we will look at how to remove the responsibility of loading and creating Movies from the MainClass.

Refactoring: introduce new class with Parallel Change

This post is the first in a series of refactorings of the command line video store exercise. You can find the original version in Jave on Rick’s repository. I will be working on the corresponding C# version.

The goal of this series is to start with procedural code and turn it into object-oriented code. This series presumes you are using Resharper.

This exercise is divided into 8 steps. In this post, I will go over the first step. You can find the full code on the E1 branch. When this step is done, the code looks like the one on the E2 branch.

In this step, we will introduce a new Movie class and move the corresponding data there.

Before you start, I suggest you take a look at the full main class here.

The movies are loaded from a file:

public void Run()
{
    // read movies from file
    var movies = new Dictionary<int, string[]>();
    using (FileStream fs = File.Open(@"movies.cvs", FileMode.Open, FileAccess.Read))
    using (BufferedStream bs = new BufferedStream(fs))
    using (StreamReader reader = new StreamReader(bs))
    {
        while (!reader.EndOfStream)
        {
            string line = reader.ReadLine();
            string[] movie = line.Split(';');
            movies.Add(int.Parse(movie[0]), movie);
            _out.WriteLine(movie[0] + ": " + movie[1]);
        }
    }
    // ...
}

The code uses a dictionary to store the movies by an integer key. You can see that a movie is represented by a primitive structure: an array of string. This array contains the following information about a movie:

  • movie[0]: the key
  • movie[1]: the name
  • movie[2]: the category

We want to replace this structure (primitive obession smell) with a Movie class that contains the data above.

To do so, we will proceed by introducing a parallel change: create a new dictionary of Movie instead of string[].

The key to the parallel change is to find all writes to the old structure and duplicating those writes to the new structure. When all writes also go to the new structure, we can start replacing the reads from the old structure to the new one.

Introduce Movie class

To introduce the Movie class, duplicate the movies variable (Ctrl + D) to movies1 Dictionary<int, Movie>. Since the Movie class does not exist yet, the code does not compile. We will fix that soon.

In the loop where the movies dictionary is filled, we add a new Movie to movies1. The Movie constructor takes the three elements from the string array as parameters: key, name and category.

Once this is done, we are ready to generate the Movie class. Place the cursor on the Movie constructor and press Alt + Enter, Create class ‘Movie’ to let Resharper create a class for us:

Create Movie

Note that we introduced a local variable for key to avoid having the parse code three times inside the loop, and that we renamed the movie[] array to movieData[] for more clarity.

Now that we have the Movie class, we have to remove the NotImplementedException from the constructor.

By default, Resharper will create the new class inside the current file, so we will need to use Alt + Enter on the class name to move the class to a separate file.

We create and initialise fields by hitting Alt + Enter on each constructor parameter and by choosing the option Introduce and intialise field.

Since we know that these fields will be used by the MainClass, we can encapsulate them so that we have public getters. This is the result:

Now the code compiles again and the tests are still green.

Parallel change: replace reads

The next step is to replace the reads on the old string array structure with our new Movie accessors. In the while loop, we replace all reads to movie[] with the corresponding property from the movie1 instance of Movie.

Every thing looks good, the tests are still green. Now we can remove the reads and writes to the old structure, since we are now using the new one everywhere.

We notice that movie[] is not used anymore in the while loop: remove it. The old movies dictionary is now only used to add the old string array to it. Remove it as well. Now that the old structure is gone, we can rename movies1 and movie1 to movies and movie:

public void Run()
{
    // read movies from file
    var movies = new Dictionary<int, Movie>();
    using (FileStream fs = File.Open(@"movies.cvs", FileMode.Open, FileAccess.Read))
    using (BufferedStream bs = new BufferedStream(fs))
    using (StreamReader reader = new StreamReader(bs))
    {
        while (!reader.EndOfStream)
        {
            string line = reader.ReadLine();
            string[] movieData = line.Split(';');
            int key = int.Parse(movieData[0]);
            var movie = new Movie(key, movieData[1], movieData[2]);
            movies.Add(key, movie);

            _out.WriteLine(movie.Key + ": " + movie.Name);
        }
    }
    // ...
    while (true)
    {
        string input = _in.ReadLine();
        if (string.IsNullOrEmpty(input))
        {
            break;
        }
        string[] rental = input.Split(' ');
        Movie movie = movies[int.Parse(rental[0])];
        decimal thisAmount = 0;

        int daysRented = int.Parse(rental[1]);
        //determine amounts for rental
        switch (movie.Category)
        {
            case "REGULAR":
                thisAmount += 2;
                if (daysRented > 2)
                    thisAmount += (daysRented - 2)*1.5m;
                break;
            case "NEW_RELEASE":
                thisAmount += daysRented*3;
                break;
            case "CHILDRENS":
                thisAmount += 1.5m;
                if (daysRented > 3)
                    thisAmount += (daysRented - 3)*1.5m;
                break;
        }

        // add frequent renter points
        frequentRenterPoints++;
        // add bonus for a two day new release rental
        if (movie.Category.Equals("NEW_RELEASE") && daysRented > 1)
        {
            frequentRenterPoints++;
        }
        // show figures for this rental
        result += "\t" + movie.Name + "\t" + thisAmount.ToString("0.0", CultureInfo.InvariantCulture) + "\n";
        totalAmount += thisAmount;
    }
    // ...
}

There is no behaviour that we can move to the Movie class, so we will leave it at that for this step.

Summary

In this step, we saw how to:

  • recognise the code smell primitive obsession.
  • quickly creating a new class by typing “new MyClass(parameters)” and using Resharper to generate the class.
  • using the parallel change strategy:
    1. introduce a new structure,
    2. duplicate all writes to the old structure for the new structure,
    3. replace reads from the old structure to use the new structure,
    4. remove the old structure.

You can find the full code on this branch.

After this refactoring, the MainClass is still pretty messy and does too many different things. In the next step, we will introduce a new class for the concept of rental.

Debugging SQL queries with Entity Framework 6

After taking a look at ways to improve the SQL queries generated by Entity Framework and how to dimension your data and how to retrieve it, I want to document a few more interesting tidbits I gathered from Russ Thomas’ course on Pluralsight.

This time, let’s talk about how to:

  • find out what SQL queries Entity Framework generates
  • log all the database commands
  • execute stored procedures and custom SQL queries from Entity Framework

Debugging and logging

When the performance is not where it should be, it is useful to find out what SQL queries are being executed and see where most time is being spent.

SQL Server profiler

A good way to find out what’s going on behind the scenes is to use SQL Server profiler to trace all calls to the database. That way, you can see what SQL statements are being executed, how often, and how much time they take.

Unfortunately I couldn’t get my hands on a license yet so I will leave this for later.

Using ToString() on IQueryables

With Entity Framework, you can very easily find out what the generated SQL is by calling the ToString() method on the query object itself:

using (var db = new AdventureWorks2012Entities())
{
    IQueryable<string> emails = from person in db.People
                             join email in db.EmailAddresses
                             on person.BusinessEntityID equals email.BusinessEntityID
                             select email.EmailAddress1;

    string generatedQuery = emails.ToString(); // show generated SQL query
    Console.WriteLine(generatedQuery);
}

The code above prints the following SQL statement:

SELECT 
    [Extent1].[EmailAddress] AS [EmailAddress]
    FROM [Person].[EmailAddress] AS [Extent1] 

We can see that Entity Framework optimised the query: since we are interested in all email addresses, the join on the Person table is not necessary and a simple select on EmailAddress is performed.

Logging the database operations with DbContext.Database.Log

Logging all database operations can be achieved by assigning the DbContext.Database.Log property with an action that takes a string as a parameter.

Database.Log examples:

using (var db = new CompanyContext())
using (var streamWriter = File.AppendText("database.log"))
{
    // define how to log the database operations
    db.Database.Log = Console.WriteLine;
    db.Database.Log = s => Trace.WriteLine(s);
    db.Database.Log = (string message) => { /* custom loggging */ };
    db.Database.Log = streamWriter.WriteLine;
    
    // perform database operations
    // var query = from person in db.People ...

    // disable logging
    db.Database.Log = (string message) => { /* do nothing */ };
}

You can log the way you choose and disable it later by assigning an empty Action to the Log property again.

Every interaction with the database is logged, including the time and the value of the parameters.

Creating tables

Opened connection at 25.06.2015 11:37:45 +02:00
Started transaction at 25.06.2015 11:37:45 +02:00

CREATE TABLE [dbo].[Departments] (
    [departmentID] [int] NOT NULL IDENTITY,
    [deptName] [nvarchar](max),
    [deptFloor] [nvarchar](max),
    CONSTRAINT [PK_dbo.Departments] PRIMARY KEY ([departmentID])
)
-- Executing at 25.06.2015 11:37:45 +02:00
-- Completed in 8 ms with result: 0

Inserting data

Opened connection at 25.06.2015 11:37:45 +02:00
Started transaction at 25.06.2015 11:37:45 +02:00

INSERT [dbo].[Departments]([deptName], [deptFloor])
VALUES (@0, NULL)
SELECT [departmentID]
FROM [dbo].[Departments]
WHERE @@ROWCOUNT > 0 AND [departmentID] = scope_identity()

-- @0: 'engineering' (Type = String, Size = -1)
-- Executing at 25.06.2015 11:37:45 +02:00
-- Completed in 1 ms with result: SqlDataReader
Committed transaction at 25.06.2015 11:37:45 +02:00
Closed connection at 25.06.2015 11:37:45 +02:00

Retrieving data

Opened connection at 25.06.2015 11:37:45 +02:00

SELECT 
    1 AS [C1], 
    [Extent1].[deptName] AS [deptName]
    FROM [dbo].[Departments] AS [Extent1]
-- Executing at 25.06.2015 11:37:45 +02:00
-- Completed in 0 ms with result: 0

The information found in this log can prove really useful to find errors and problems with queries. It should probably not be enabled at all times for performances reasons, though.

Intercepting commands with IDbCommandInterceptor

Another very neat option is to implement a class that implements the interface IDbCommandInterceptor and to inject it in the database configuration’s command interceptors.

For every command sent to the database, the registered interceptors are called and at that point you take actions depending on the type of the command being executed.

We will be using this mechanism to log all interactions with the database, but it can also be used for different purposes. For example, intercepting DELETEs and inserting an entry into a history table somewhere.

Before going into the details, let’s look at the interface:
IDbCommandInterceptor

namespace System.Data.Entity.Infrastructure.Interception
{
  public interface IDbCommandInterceptor : IDbInterceptor
  {
    void NonQueryExecuting(DbCommand command, DbCommandInterceptionContext<int> interceptionContext);

    void NonQueryExecuted(DbCommand command, DbCommandInterceptionContext<int> interceptionContext);

    void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext);

    void ReaderExecuted(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext);

    void ScalarExecuting(DbCommand command, DbCommandInterceptionContext<object> interceptionContext);

    void ScalarExecuted(DbCommand command, DbCommandInterceptionContext<object> interceptionContext);
  }
}

There are three types of commands:

  • Scalar: commands that return a single value such as a number
  • Reader: commands that return a data set
  • NonQuery: commands with no return value, such as CREATEs and INSERTs

For each type of command, the distinction between -Executing (command start) and -Executed (command end) is made. That way, we can for example measure the execution time or recognise which commands are aborted and never finished.

We can also notice that commands of type Scalar have a result of type object, Reader a result of type DbDataReader, and NonQuery a result of type int.

For this example, I will simply log all commands in a log file.

Implementation of IDbCommandInterceptor:

public class CustomSqlLogger : IDbCommandInterceptor
{
    public void NonQueryExecuting(DbCommand command, DbCommandInterceptionContext<int> interceptionContext)
    {
        LogSql(command, interceptionContext);
    }

    public void NonQueryExecuted(DbCommand command, DbCommandInterceptionContext<int> interceptionContext)
    {
        LogSql(command, interceptionContext);
    }

    public void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        LogSql(command, interceptionContext);
    }

    public void ReaderExecuted(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        LogSql(command, interceptionContext);
    }

    public void ScalarExecuting(DbCommand command, DbCommandInterceptionContext<object> interceptionContext)
    {
        LogSql(command, interceptionContext);
    }

    public void ScalarExecuted(DbCommand command, DbCommandInterceptionContext<object> interceptionContext)
    {
        LogSql(command, interceptionContext);
    }

    private void LogSql<T>(DbCommand dbCommand, DbCommandInterceptionContext<T> interceptionContext, [CallerMemberName] string callerMethodName = "")
    {
        using (var fileWriter = File.AppendText(@"sql_commands.log"))
        {
            fileWriter.WriteLine(
                "Intercepted {0}:\n{1}\nWith result: {2}\n\n",
                callerMethodName, dbCommand.CommandText, interceptionContext.Result);
        }
    }
}

Notice the clever use of CallerMemberNameAttribute to be able to print out the command type 😉

For this interceptor to be used, you need to register it in a specialised DbConfiguration class.

Custom database configuration class:

public class CustomDatabaseConfiguration : DbConfiguration
{
    public CustomDatabaseConfiguration()
    {
        AddInterceptor(new CustomSqlLogger());
    }
}

This configuration class is used automagically by Entity Framework.

Interceptors can also be injected via config file.

Configuration-based interceptor injection:

<entityFramework>
    <interceptors>
      <interceptor type="SQLLogger.CustomSqlLogger, SQLLogger"></interceptor>
    </interceptors>
</entityFramework>

Log output

This custom logging interceptor will produce the following output when running a sample application.

Scalar:

Intercepted ScalarExecuting:
SELECT Count(*)
FROM INFORMATION_SCHEMA.TABLES AS t
WHERE t.TABLE_SCHEMA + '.' + t.TABLE_NAME IN ('dbo.Departments','dbo.Employees')
    OR t.TABLE_NAME = 'EdmMetadata'
With result: 

Intercepted ScalarExecuted:
SELECT Count(*)
FROM INFORMATION_SCHEMA.TABLES AS t
WHERE t.TABLE_SCHEMA + '.' + t.TABLE_NAME IN ('dbo.Departments','dbo.Employees')
    OR t.TABLE_NAME = 'EdmMetadata'
With result: 2

The Scalar command has no result while executing, and returns the value 2 once finished.

Reader:

Intercepted ReaderExecuting:
SELECT 
    1 AS [C1], 
    [Extent1].[deptName] AS [deptName]
    FROM [dbo].[Departments] AS [Extent1]
With result: 

Intercepted ReaderExecuted:
SELECT 
    1 AS [C1], 
    [Extent1].[deptName] AS [deptName]
    FROM [dbo].[Departments] AS [Extent1]
With result: System.Data.SqlClient.SqlDataReader

This Reader command gives a SqlDataReader as a result to the query to retrieve all departments.

NonQuery:

Intercepted NonQueryExecuting:
CREATE TABLE [dbo].[Departments] (
    [departmentID] [int] NOT NULL IDENTITY,
    [deptName] [varchar](50),
    [deptFloor] [varchar](50),
    CONSTRAINT [PK_dbo.Departments] PRIMARY KEY ([departmentID])
)
With result: 0

Intercepted NonQueryExecuted:
CREATE TABLE [dbo].[Departments] (
    [departmentID] [int] NOT NULL IDENTITY,
    [deptName] [varchar](50),
    [deptFloor] [varchar](50),
    CONSTRAINT [PK_dbo.Departments] PRIMARY KEY ([departmentID])
)
With result: -1

This NonQuery command is executed by the Entity Framework the first time the application is started, to create the necessary tables.

Entity Framework tips

Disable entity tracking with extension AsNoTracking()

AsNoTracking() is an extension method that allows you to optimise your queries for large data sets by telling Entity Framework not to track the results of the query.

With this option, the results are streamed directly into memory and no additional processing is made and the entities are not added to the DbContext.

This means that entities retrieved with such a query cannot be updated since they are not know to the context. If you wish to modify and save some entities that were retrieved from a AsNoTracking() query, you first need to attach it to the context and set the correct state before doing so.

This option is especially useful when you are dealing with large quantities of data that you only want to read.

Say a company has a huge number of employees and has an overview page where all names are displayed:

IQueryable<string> query = db.Employees
                    .Select(e => e.lastName + " " + e.firstName)
                    .OrderBy(fullName => fullName)
                    .AsNoTracking();

Since we are only reading data, it makes sense to disable tracking for this query.

Execute stored procedures

When working with existing stored procedures (either legacy or by design), it is best practice to create corresponding methods in your DbContext subclass. The execution is made through ObjectContext.ExecuteFunction(), which you feed the name of the procedure as well as the list of required parameters.

Executing a stored procedure:

public partial class CompanyContext : DbContext
{
    public CompanyContext(): base("name=CompanyDb") { }

    // ...

    public virtual int UpdateEmployeeStoredProcedure(Nullable<int> businessEntityID, string jobTitle, Nullable<System.DateTime> hireDate)
    {
        var businessEntityIDParameter = businessEntityID.HasValue ?
            new ObjectParameter("BusinessEntityID", businessEntityID) :
            new ObjectParameter("BusinessEntityID", typeof(int));

        var jobTitleParameter = jobTitle != null ?
            new ObjectParameter("JobTitle", jobTitle) :
            new ObjectParameter("JobTitle", typeof(string));

        var hireDateParameter = hireDate.HasValue ?
            new ObjectParameter("HireDate", hireDate) :
            new ObjectParameter("HireDate", typeof(System.DateTime));

        return ((IObjectContextAdapter)this).ObjectContext.ExecuteFunction("uspUpdateEmployee", businessEntityIDParameter, jobTitleParameter, hireDateParameter);
    }
}

You need to cast the DbContext to IObjectContextAdapter because the implementation of ObjectContext in DbContext is explicit.

Execute custom SQL queries from Entity Framework

When you want precise control over a SQL query instead of relying on Entity Framework to generate it, you can create custom queries using DbContext.Database.SqlQuery(). The type T specifies which type Entity Framework should try to convert the results to.

Executing a custom SQL query:

public static void PrintEmployees(string firstNameToFind)
{
    using (var db = new CompanyContext())
    {
        DbRawSqlQuery<Employee> employeesSqlQuery = 
            db.Database.SqlQuery<Employee>("select top 42 * from Employees where firstName={0}",
              firstNameToFind);

        foreach (var employee in employeesSqlQuery)
        {
            Console.WriteLine(employee.firstName);
        }
    }
}

Summary

In this post, we saw four ways to debug queries generated by Entity Framework:

  • SQL Server profiler
  • using ToString() on the query object
  • logging database commands using DbContext.Database.Log
  • intercepting database commands by injecting a IDbCommandInterceptor

Additionally, we saw how to:

  • disable change tracking on queries with the extension method AsNoTracking()
  • call stored procedures using DbContext.ObjectContext.ExecuteFunction()
  • execute custom SQL queries with DbContext.Database.SqlQuery<T>()

Entity Framework 6 performance tips

After some initial ramblings exploring ways to avoid performance problems with Entity Framework 6, this post continues in the same spirit with the following menu:

  • Data types and sizes
  • Loading mechanisms:
    1. Lazy loading
    2. Eager loading
    3. Paging

Choosing the right size

It may seem unimportant at first, but choosing appropriate data types and sizes can make a huge difference when the database starts to grow. When tables contains thousands and millions of rows, choosing an appropriate data size can lead to orders of magnitude smaller or bigger database sizes.

For example, use smallint instead of int if you know that the stored data will always be contained in the range [-32768, 32767]. Using varchar(50) instead of nvarchar(500) when you know that the first names stored will not exceed 50 letters and cannot contain unicode characters.

This not only saves valuable disk and memory storage, it also helps reduce the page life expectancy and reduces the amount of data we have to read and transfer over the network.

Entity Framework code first defaults

Entity Framework chose to err on the safe side and set the size of the data to the maximum possible, when the database schema is generated from the code, such as:

  • nvarchar(max) for string
  • decimal(18,2) for decimal numeric values
  • int for integer numerical values

While being safe and preventing bad surprises when the columns’ size is too small for the application, it will certainly also lead to a pretty nasty memory usage if these defaults are not changed and the database starts to grow.

Another problem is that creating indexes on those columns becomes impossible:

CREATE NONCLUSTERED INDEX idx_FullName ON dbo.Employee
(
    LastName ASC, FirstName ASC
)

This will fail if either column has a size ‘max’ (for example varchar(max)).

Changing data type and size after the fact

Entity Framework is pretty flexible and will let you change the data types and sizes of your columns, as long as it can convert to and from the C# types.

When modifying the columns, we have the following options:

  • alter the table manually/via script
  • code annotations
  • overriding DbContext.OnModelCreating()

Alter the table manually/via script

As mentioned above, Entity Framework allows changes to the data type and size, as long as it is compatible with the C# types.

The problem with this approach is that you lose the flexibility to be able to deploy the application to different environments. Manually managing differing database schemas is tedious and hence not recommended.

A better solution is to include it in a migration so that it is handled automatically by Entity Framework.

Code annotations

Code annotations is a simple and fast way to define size constraints to guide Entity Framework when generating the database schema. The MaxLength attribute defines the maximal size, and the Required attribute specifies that a column cannot be NULL:

class Department 
{
    [MaxLength(50)]
    [Required] // NOT NULL
    public string FirstName {get; set;}
}

Annotating the properties of your entities makes it straightforward to define appropriate limits to the size of the data.

If you want finer control over the generated schema, you can override the method OnModelCreating().

Overriding DbContext.OnModelCreating()

Entity Framework offers the possibility to override the method OnModelCreating() that is called when the model is created. In this method, you can use a model builder with a fluent API to define finer constraints on the properties of your entities:

public class CompanyContext : DbContext
{
    // ...
    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Employee>()
                .Property(u => u.firstName)
                .IsUnicode(false)
                .HasMaxLength(50)
                .IsRequired(); // NOT NULL

        modelBuilder.Entity<Employee>()
            .Property(e => e.salary)
            .HasPrecision(10, 2);
    }
    // ...
}

Loading mechanisms

Lazy loading is enabled by default in Entity Framework. When a property is accessed, Entity Framework will load the corresponding entities from the database if necessary.

Lazy loading example
The following code sample loads all Persons from the database and iterates over them. The EmailAddresses are loaded lazily, separately for each Person.

public void PrintEmailAddressesLazyLoaded()
{
    using (var db = new AdventureWorks2012Entities())
    {
        IQueryable<Person> matchingPeople = 
            from people in db.People
            select people;

        foreach (Person p in matchingPeople)
        {
            string emailAddresses = string.Join(", ", p.EmailAddresses); // EmailAddresses lazy loaded
            Console.WriteLine(emailAddresses);
        }
    }
}

Eager loading example
As opposed to the previous example, the code below loads all email addresses from all people upfront, in a single query.

public void PrintEmailAddressesEagerLoaded()
{
    using (var db = new AdventureWorks2012Entities())
    {
        // all emails eagerly loaded
        var emails = from people in db.People
                     join email in db.EmailAddresses
                     on people.BusinessEntityID equals email.BusinessEntityID
                     select new { email.EmailAddress1 };

        string emailAddresses = string.Join(", ", emails);
        Console.WriteLine(emailAddresses);
    }
}

Eager versus lazy loading

There is no one true answer to the question of using whether lazy or eager loading. It depends on the context, the amount of data involved and the desired responsivity.

The following sections list situations where either method is appropriate.

When to use eager loading

  • processing requires all the data
  • full scope of the request is known
  • automated processes

In cases where we know what we need and the data we need does not depend on external factors like user input, we can load all the necessary data eagerly.

When to use lazy loading

  • navigation is unknown (typically, user inputs)
  • scope of the request is unknown
  • batch work

When we do not know exactly what data we need because it depends on external factors (like user inputs), or when the data set is too large to be fully loaded, loading the data lazily when we need it is appropriate.

Paging: combining set-based thinking with lazy loading

When the data set is large and navigation depends on user inputs, it makes sense to implement paging to load the data: we only load part of the total result set to the user an allow him to see more if he wishes so.

In the code sample below, we combine set-based thinking to define a query that will return part of the whole result set. The data is loaded lazily in the sense that we only load the pages that the user wants to see.

Paging example

public void PrintEmailAddressesPaged(int pageNumber = 0)
{
    const int pageSize = 10;
    using (var db = new AdventureWorks2012Entities())
    {
        IQueryable<string> emails = db.People
            .SelectMany(p => p.EmailAddresses)
            .OrderBy(e => e.EmailAddress1)
            .Select(e => e.EmailAddress1)
            .Skip(pageNumber * pageSize) // skip the previous pages
            .Take(pageSize);

        foreach (var e in emails)
        {
            Console.WriteLine(e);
        }
    }
}

Tip: pre-load the next page

In a scenario where we use paging to present a large set of data to the user, we can improve the user’s perception of application responsivity by pre-loading the next page (or wherever is he most likely to navigate to next) in the background, while the user is looking at the current set of data.

That way, the data is already there and the user gets a snappy response from the application, which improves the user experience.

Disabling lazy loading in Entity Framework

It is possible to disable lazy loading altogether:

public class CompanyContext : DbContext
{
    public CompanyContext() : base("name=CompanyDb")
    {
        this.Configuration.LazyLoadingEnabled = false;
    }
    // ...
}

However, one must be aware that doing so has an important impact on the behaviour of the application: the places in the code where data was loaded lazily will simply obtain empty objects and empty lists of data. No error or warning is displayed.

The previous example method PrintEmailAddressesLazyLoaded() will simply have an empty output as a result.

When lazy loading is disabled, the code must load all data eagerly, like in the PrintEmailAddressesEagerLoaded() example.

Be deliberate!

Whichever loading mechanism you choose to use, you should be deliberate and use it knowingly, aware of the impacts and tradeoffs inherent to the choice.

Do not just use or reuse code without knowing how the data is retrieved from the database. This only leads to performance problems and surprises in the behaviour of the application.

Summary

In this post, we saw:

  • that choosing a sensible data type and size is important
  • that the defaults chosen by Entity Framework are too large and need to be adapted
  • that there are three ways to modify the type and size of the columns:
    1. manually
    2. using code annotations
    3. overriding OnModelCreating() in your DbContext subclass
  • example of loading mechanisms and when to use what:
    1. lazy loading
    2. eager loading
    3. paging

As a developer, you should be aware of how the data is retrieved from the database and why it is done that way.

Getting more performance out of Entity Framework 6

I’ve been wanting to catch up on my database-related knowledge and have finally gotten around to setting enough time aside to studying it. After a short refresher of basic SQL, I picked the following course on Pluralsight: Entity Framework Database Performance Anti-patterns by Russ Thomas.

This course contains lots of tips and advice on what to consider in order to keep data retrieval up to speed in a project. From the perspective of the software developer, it informs me of the necessary design considerations and trade-offs to make so that the user experience remains smooth and responsive.

Additionally, Russ explains how shifting your thinking from procedural to set-based helps find more efficient ways to retrieve data while still maintaining a high level of abstraction.

This post goes over the following subjects:

  • projection
  • nested LINQ smell
  • improving SQL queries generated by Entity Framework

YAGNI – You ain’t gonna need it

The classical mistake of the programmer is to fetch way more data from the database than what is actually needed.

Filtering the results in the client

public void PrintMatchingUserBusinessIds(string lastNameToFind)
{
    using (var db = new AdventureWorks2012Entities())
    {
         var matchingPeople = from people in db.People
            select people;

        foreach (Person p in matchingPeople)
        {
            if (string.Equals(lastNameToFind, p.LastName))
            {
                Console.WriteLine(p.BusinessEntityID);  
            }                    
        }
    }
}

The query that will get sent to SQL Server looks like this:

SELECT * FROM Person;

The problems we have here is that:

1. we load all entries from the People table, even though we are only interested in those that have the wanted last name.
2. all the columns from the People table are loaded and converted to objects, but we only use the property BusinessEntityID.

Let’s fix problem 1: filtering the results in the database

public void PrintMatchingUserBusinessIds(string lastNameToFind)
{
    using (var db = new AdventureWorks2012Entities())
    {
        var matchingPeople = from people in db.People
                     where people.LastName.Equals(lastNameToFind) // filter in the database
                     select people;

        foreach (Person p in matchingPeople)
        {
            Console.WriteLine(p.BusinessEntityID);
        }
    }
}

Now it’s a little better. The query filters the people in the database:

SELECT * FROM Person p
WHERE p.LastName = @lastname;

Projection

However, we are still fetching i-don’t-know-how-many-columns, but we need just BusinessEntityID. Let’s fix problem 2 using a projection on BusinessEntityID:

public void PrintMatchingUserBusinessIds(string lastNameToFind)
{
    using (var db = new AdventureWorks2012Entities())
    {
        var matchingPeople = from people in db.People
                     where people.LastName.Equals(lastNameToFind)
                     select new { people.BusinessEntityID }; // projection

        foreach (var p in matchingPeople)
        {
            Console.WriteLine(p.BusinessEntityID);                    
        }
    }
}

This time, Entity Framework will generate a SQL query that will look like this:

SELECT p.BusinessEntityID FROM Person p
WHERE p.LastName = @lastname;

@lastname being the parameter containing the last name we are searching for.

Non set-based work

When programming with Entity Framework or with other ORMs, it’s so easy to just think procedurally: I need to fetch these People objects, and then I am going to use this and that property from these objects.
The ORM does its job and abstracts the details of the database away from the coder, which is nice because it makes his life a easier. However, we need to be aware of the impact our code will have on the database and how we can prevent performance problems.

Take a look at the code below:

public void PrintEmailAddresses(string lastNameToFind)
{
    using (var db = new AdventureWorks2012Entities())
    {
        var matchingPeople = 
            from people in db.People 
            where people.LastName.Equals(lastNameToFind)
            select people;

        foreach (var p in matchingPeople)
        {
            foreach (var address in p.EmailAddresses)
            {
                Console.WriteLine(address.EmailAddress1);
            }
        }
    }
}

At first, it may appear pretty harmless. We find a list of people with the given last name and print all their email addresses.

What we need to be careful about is that we are navigating the property EmailAddresses of the table Person: the email addresses are located in another table EmailAddress. In essence, Entity Framework will first fetch all the people with the given last name. We will then iterate over the list of people. When we try to access p.EmailAddresses, Entity Framework will send a query to get all the email addresses for the current person. Note that a separate query is executed for every single person.

-- fetch the matching people
SELECT * FROM Person p
WHERE p.LastName = @lastname;

-- fetch the email addresses for a person
SELECT * FROM  EmailAddress e
WHERE e.BusinessEntityID = @currentPersonBusinessEntityID;

It is inefficient to execute a separate query for each Person. Obviously we should join the tables Person and EmailAddress instead. And as we saw before, we are retrieving too much data with these SELECT * statements: we are only using the EmailAddress1 property (Note: the name of the property is EmailAddress1 to avoid a collision with the name of the table).

The name of the smell is nested LINQ query. Every time we have a LINQ query or a for/foreach loop inside a loop, something is likely to be wrong.

To avoid this kind of problem, the programmer needs to shift his thinking from procedural to set-based.

Set-based thinking

When coding, software engineers are used to thinking in a procedural way.
Procedural thinking:
Get a list of people with the specified last name and print their email addresses.
We are thinking in terms of rows of data.

Set-based thinking:
Join tables Person and EmailAddress, get the email addresses for people with the specified last name. Here, we are thinking in terms of columns.

When it comes to databases, procedural thinking has a lot more overhead than set-based thinking: it produces more queries to the database and tends to fetch more data than needed. With the set-based thinking we focus on how to get to the meat (the email address). With procedural thinking we focus on obtaining People objects to use their EmailAddresses property.

Now what does set-based thinking look like in our example, and how do we make it better?

Back to the email addresses’ example

To improve the email address code sample, let’s now think in a set-based way: we have people, and we want to get to their email addresses. The columns we need are Person.LastName to match the last name and EmailAddress.EmailAddress1. To get from table Person to EmailAddress, we need to join them.

Our first try is to use Entity Framework’s Include method to tell it it needs navigate EmailAddresses.

Solution 1: Include() method

public void PrintEmailAddresses(string lastNameToFind)
{
    using (var db = new AdventureWorks2012Entities())
    {
        var matchingPeople = 
                     from people in db.People.Include("EmailAddresses") // navigate property EmailAddresses
                     where people.LastName.Equals(lastNameToFind)
                     select people;

        foreach (var p in matchingPeople)
        {
            foreach (var address in p.EmailAddresses)
            {
                Console.WriteLine(address.EmailAddress1);
            }
        }
    }
}

The code above will cause Entity Framework to perform a join:

SELECT * FROM Person p
INNER JOIN EmailAddress e ON p.BusinessEntityID = e.BusinessEntityID;
WHERE p.LastName = @lastname;

This is better, but we are reading all the properties of Person and EmailAddress. Unfortunately, we cannot give more details to Entity Framework in the Include method and tell it to do a projection on EmailAddress1.

To have more control, we need to specify our query more precisely by defining the join and the projection explicitly. To do so, we can use the LINQ syntax.

Solution 2: Explicit join with LINQ

public void PrintEmailAddresses(string lastNameToFind)
{
    using (var db = new AdventureWorks2012Entities())
    {
        var emails = from people in db.People
                     join email in db.EmailAddresses                           // explicit join
                     on people.BusinessEntityID equals email.BusinessEntityID
                     where people.LastName.Equals(lastNameToFind)
                     select new { email.EmailAddress1 };                       // projection

        foreach (var e in emails)
        {
            Console.WriteLine(e.EmailAddress1);
        }
    }
}

Thanks to our improved LINQ query, Entity Framework now produces a more efficient SQL query:

SELECT p.BusinessEntityID, e.EmailAddress1 FROM Person p
INENR JOIN EmailAddress e ON p.BusinessEntityID = e.BusinessEntityID
WHERE p.LastName = @lastname;

We now obtain the wanted list of email addresses and got rid of the nested LINQ smell by removing the inner foreach loop.

Note that with this approach, Entity Framework still includes p.BusinessEntityID in the select list, although we do not need it.

An alternative is to use a LINQ extension method chain and performing SelectMany() on EmailAddresses.

Solution 3: LINQ extension method chain with SelectMany()

public void PrintEmailAddresses(string lastNameToFind)
{
    using (var db = new AdventureWorks2012Entities())
    {
        IQueryable< string > emails = db.People
            .Where(p => p.LastName.Equals(lastNameToFind))
            .SelectMany(p => p.EmailAddresses)
            .Select(email => email.EmailAddress1);  // projection
            
        foreach (var e in emails)
        {
            Console.WriteLine(e);
        } 
    }
}

With that code, we actually do not specify the join explicitly: Entity Framework infers from our SelectMany(p => p.EmailAddresses) that it needs to join Person to EmailAddress. The SQL query now looks like this:

SELECT e.EmailAddress1 FROM Person p
INNER JOIN EmailAddress e ON p.BusinessEntityID = e.BusinessEntityID
WHERE p.LastName = @lastname;

We see here that the generated SQL query reads less information than the one from solution 2: p.BusinessEntityID is no longer in the select list. This query is closest to what one would write manually to get the email addresses.

Recap of the email example

The first code example was using a nested foreach loop, in which a navigation property was accessed, and resulted in a separate query being executed for each matching entry from the first table. This is inefficient and does not scale: for large result sets, the performance will really go down the drain.

Solution 1 showed improvement: using the method Include(“PropertyName”), we can inform Entity Framework that we will need information from another table and will generate a single query that will perform a JOIN. This alternative improves scalability. Unfortunately, Entity Framework will read all the columns of both tables and we have no way to tell it to do a projection.

In solution 2, we made up for the lack of projection from the previous solution by using LINQ syntax to explicitly join the tables and project the columns that we need. That way, we avoid read data that we do not need. This solution scales well and the amount of data we retrieve is reduced to almost the minimum. It is however noteworthy to mention that the primary key used to join the table is included in the select list, which is not strictly needed.

Solution 3 uses a LINQ method chain with SelectMany() and is the most efficient solution: improving upon solution 2, a join is performed on the tables, and this time only the necessary columns are included in the select list: the primary key is only used in the JOIN.

Depending on the amount of data and the performance, either of solution 1 to 3 can be used. If you know the size of the result set will always remain small, solution 1 is the easiest to implement. When performance is needed, either solution 2 or 3 can be used, depending on your project’s preference in terms of code style and readability. If performance is critical, solution 3 will be slightly better since it saves a read on the primary key column.

Conclusion

In this post, we saw how to prevent performance problems with Entity Framework by:

  • using projection to avoid reading data we don’t use
  • applying set-based thinking to define how to get the data from the database
  • recognising and fixing the nested LINQ smell
  • using techniques to generate more efficient queries:
    1. Include() method
    2. Explicit join with LINQ
    3. LINQ method chain with SelectMany()