Taking Entity Framework Core data seeding to the next level with Bogus

Introduction

If you have ever set up a database for a new project, you have likely used some form of data seeding to populate it with fake data. This makes testing your application a lot easier and saves a lot of time compared to writing and running business logic just to store new values in a table.

However, creating realistic seed data can be cumbersome. Let’s say that you want to see how your application would work when there are thousands or millions of rows in your table. You don’t want to type all that fake data by hand, right?

In this blog post you will learn how you can use Entity Framework Core (called ‘EF Core’ from now on) together with Bogus to generate amazing seed data quickly and easily!

You can definitely find other blog posts about this topic, but in this one I will also talk about determinism, which is something I couldn’t find much about online. I would say it is the most important thing to understand when generating realistic seed data.

Explaining Bogus

In this article I expect you to already know what Entity Framework Core is and to have some experience with it. If you don’t, the official EF Core documentation is a good place to start.

So, let’s talk about Bogus instead! I’ll try to cover a lot of basics and interesting features here, but I recommend that you take a look at the documentation as well!

Bogus is a .NET library (ported from faker.js) that allows you to generate fake data. It can generate realistic-looking data for all kinds of fields, such as names, addresses, phone numbers, etc. Its API is inspired by FluentValidation which makes Bogus very easy to use for simple and complex scenarios.

Code example

Using .NET 6 and top-level statements, a basic example of Bogus (v34.0.2) looks like this:

using Bogus;
using System;
using System.Collections.Generic;

GenerateData();
Console.WriteLine("----------");
GenerateData();

void GenerateData()
{
    var productId = 1;

    // The faker class is the "entrypoint" of the Bogus library!
    var productFaker = new Faker<Product>()
        .RuleFor(x => x.Id, f => productId++) // Each item that is generated will get an incremented Id
        .RuleFor(x => x.Name, f => f.Commerce.ProductName()); // Grab a random ProductName

    List<Product> products = productFaker.Generate(count: 2);

    products.ForEach(x => Console.WriteLine(x));
}

class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = null!;

    public override string ToString() => $"Id: {Id}, Name: {Name}";
}

/* Output:
Id: 1, Name: Sleek Frozen Salad
Id: 2, Name: Tasty Frozen Shirt
----------
Id: 1, Name: Ergonomic Plastic Bacon
Id: 2, Name: Unbranded Granite Pants
*/

This example uses only a few of the many features Bogus offers. Here we use the Commerce class, which contains generators for commerce-related data such as product names. There are a lot of other options like names, addresses, companies, dates, finance, images and so much more. You can even make your own generator or use free/paid community-made ones!
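
To give an impression of those other generators, here is a minimal sketch that fills a Customer class. The Customer class and its properties are hypothetical and only exist for this illustration; the generators used (Name, Internet, Address and Company) are some of the ones that ship with Bogus.

```csharp
using Bogus;
using System;

// Hypothetical Customer class, only used to show a few other generators.
var customerFaker = new Faker<Customer>()
    .RuleFor(x => x.FullName, f => f.Name.FullName())       // A random full name
    .RuleFor(x => x.Email, f => f.Internet.Email())         // A random email address
    .RuleFor(x => x.City, f => f.Address.City())            // A random city name
    .RuleFor(x => x.Company, f => f.Company.CompanyName()); // A random company name

Customer customer = customerFaker.Generate();
Console.WriteLine(customer);

class Customer
{
    public string FullName { get; set; } = null!;
    public string Email { get; set; } = null!;
    public string City { get; set; } = null!;
    public string Company { get; set; } = null!;

    public override string ToString() => $"{FullName} <{Email}>, {City}, {Company}";
}
```

The output is different on every run because no seed is set here. That is exactly the behaviour the Determinism section deals with.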

Determinism

A very important concept that often is forgotten when generating seed data is determinism. Determinism, in this context, means that you would get the same output from a random data generator over multiple executions. That might sound a bit contradictory, but it is exactly what we want in this scenario.

When you use EF Core together with seed data, you usually use migrations. A migration is a file that contains the changes you made to your database schema. The important thing to realize is that adding a migration also runs your data seeding generator again, and it must generate the same seed data as before. In our case, we want Bogus to output the exact same products if we run our previous code example again.

If Bogus generated different seed data each time you added a migration, each migration would try to update the seed data with different data. This would be very messy!

Luckily, Bogus has very good support for determinism. I’ll explain the basics here, but I recommend that you read the documentation about it as well!

Bogus has 2 ways to control the deterministic behaviour:

  • Global Seed
  • Local Seed

Regardless of which method you choose, it’s important to realize that you should never use System.Random together with Bogus. Bogus has its own randomizer logic which works together with the strategies mentioned above. The difference looks like this:

var names = new[] { "Laptop", "Phone", "Microphone", "Monitor" };

// Bad: This would not work together with Bogus's deterministic strategies:
var badProductFaker = new Faker<Product>()
    .RuleFor(x => x.Name, f => names[new System.Random().Next(0, names.Length)]);

// Good: This uses Bogus's built-in randomizer:
var goodProductFaker = new Faker<Product>()
    .RuleFor(x => x.Name, f => f.PickRandom(names));

Let’s move on to explaining Global Seed and Local Seed strategies.

To keep things brief, the Product class in the next few code examples will only contain a Name property.

Global Seed

The Global Seed strategy is quite simple. You set the static Randomizer.Seed property to a System.Random instance, and Bogus uses it to generate data. The seed you pass to Random can be any integer that you like. For example:

// Setting the Randomizer.Seed property means you are using the Global Seed strategy
Randomizer.Seed = new Random(1338);

var productFaker = new Faker<Product>()
    .RuleFor(x => x.Name, f => f.Commerce.ProductName());

var products = productFaker.Generate(count: 5);

You can run this code example multiple times and it will generate the same products every time. This is very easy to use, but the downside is that this will impact ALL data generation of other Faker instances as well, which you might not want.

A bigger downside, which is very important for EF Core data migrations, is that global seeds can’t deal with any schema changes. We want Bogus to generate data for new properties, while generating the same values as in the past for existing properties. However, if you were to add a Description property to the Product class using the Global Seed strategy, the names of the products will also change. This will mess up your migrations because adding properties to classes is something that happens all the time.

The author explains the reason as follows: “This is due to the newly added property which has the effect of shifting the entire global static pseudo-random sequence off by +1.”
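
You can reproduce this shift with a small experiment. The following is a minimal sketch, assuming a stripped-down Product class with only a Name and a Description; the second faker stands in for the class after the schema change. Resetting Randomizer.Seed before each run imitates two separate executions of the seeding code.

```csharp
using Bogus;
using System;

// "Before" the schema change: only a Name rule.
Randomizer.Seed = new Random(1338);
var before = new Faker<Product>()
    .RuleFor(x => x.Name, f => f.Commerce.ProductName())
    .Generate(3);

// "After" the schema change: same global seed, but with an extra Description rule.
Randomizer.Seed = new Random(1338);
var after = new Faker<Product>()
    .RuleFor(x => x.Name, f => f.Commerce.ProductName())
    .RuleFor(x => x.Description, f => f.Commerce.ProductDescription())
    .Generate(3);

// Print the names side by side so any drift is easy to spot.
for (var i = 0; i < before.Count; i++)
{
    Console.WriteLine($"{before[i].Name} | {after[i].Name}");
}

class Product
{
    public string Name { get; set; } = null!;
    public string Description { get; set; } = "";
}
```

If the two columns differ for some rows, you are looking at the drift that makes the Global Seed strategy a poor fit for migrations: the Description rule consumes values from the shared random sequence, so later names are drawn from a different position.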

To fix this problem, we want to use the Local Seed strategy, which we will explain next!

Local Seed

Let’s start out by saying that you can use the Global Seed and Local Seed strategies together. If you specify a Local Seed, it will override the Global Seed for that particular Faker instance.

To use the Local Seed strategy, you want to provide a seed to the Faker<T> instance itself. This means that each instance can have a different seed, and you can even provide a different seed every time you generate some data:

var productFaker = new Faker<Product>()
    .RuleFor(x => x.Name, f => f.Commerce.ProductName());

// Calling UseSeed() on a Faker instance means you are using the Local Seed strategy
var products = productFaker.UseSeed(1338).Generate(count: 5);

One thing that took me a long time to realize is that, while this is a valid example of Local Seed usage, it has the same problems as the Global Seed strategy when it comes to making schema changes. I had so much trouble with it that I assumed it must have been a bug and made a GitHub issue. Luckily, the author told me that I was simply using the library wrong. Oops!

So, what would be the right way to use the Local Seed strategy when it comes to dealing with schema changes? Well, it looks like this:

var productFaker = new Faker<Product>()
    .RuleFor(x => x.Name, f => f.Commerce.ProductName())
    // New property!
    .RuleFor(x => x.Description, f => f.Commerce.ProductDescription());

var products = Enumerable.Range(1, 5)
    .Select(i => SeedRow(productFaker, i)) // Each product will have the current index as its seed
    .ToList();

static T SeedRow<T>(Faker<T> faker, int rowId) where T : class
{
    var recordRow = faker.UseSeed(rowId).Generate();
    return recordRow;
}

The big change here is that each product will be generated using its own seed. Product 1 will have a seed of 1, product 2 will have a seed of 2, etc. This allows us to add a new Description column while keeping the generated Name for each product the same!
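
To convince yourself that this works, you can compare the names from a faker without the Description rule to the names from a faker with it. This is a minimal sketch, assuming a simplified Product class; the two fakers stand in for the class before and after the schema change.

```csharp
using Bogus;
using System;
using System.Linq;

// The faker as it looked before the Description property existed.
var nameOnlyFaker = new Faker<Product>()
    .RuleFor(x => x.Name, f => f.Commerce.ProductName());

// The faker after the schema change. The new rule is added last.
var withDescriptionFaker = new Faker<Product>()
    .RuleFor(x => x.Name, f => f.Commerce.ProductName())
    .RuleFor(x => x.Description, f => f.Commerce.ProductDescription());

// Every row gets its own seed, so rows can't influence each other.
var namesBefore = Enumerable.Range(1, 5)
    .Select(i => nameOnlyFaker.UseSeed(i).Generate().Name)
    .ToList();

var namesAfter = Enumerable.Range(1, 5)
    .Select(i => withDescriptionFaker.UseSeed(i).Generate().Name)
    .ToList();

Console.WriteLine(namesBefore.SequenceEqual(namesAfter)
    ? "Names are unchanged"
    : "Names changed");

class Product
{
    public string Name { get; set; } = null!;
    public string Description { get; set; } = "";
}
```

Because the Name rule is the first one to draw from each row's freshly seeded randomizer, this should report that the names are unchanged. The new rule only consumes random values after the name has already been picked.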

Final notes on determinism

Bogus has some important guidelines for getting determinism right. Make sure you pay attention to these:

  • Add new RuleFor rules last in Faker<T> declarations.
  • Avoid changing existing rules.
  • Always use Faker<T>.UseSeed(int) to avoid using the global static seed as a source for randomness.

Using Bogus and EF Core together in a demo application

So, as we determined from the Determinism section, we want to use Bogus together with the Local Seed Strategy so our migrations will stay clean whilst also supporting schema changes.

I’ve created a demo application that I will showcase in this section of the post. This demo application deals with edge cases and complicated relationships that you will also find in real applications.

[Figure: a class diagram showing a Product class that has a list of ProductCategory instances through a join table called ProductProductCategory. Caption: Our entity classes in this example]

The application has a Product with an Id, Name, CreationDate and Description. A Product can also have a list of Categories. A Category can also belong to multiple Products, which means we have a many-to-many relationship and need a join table called ProductProductCategory.

Let’s now take a look at the actual code so you can start using it in your projects!

info
The source code of this demo application can be found on GitHub.

Entities

The entities we will be using are defined in the Product.cs file:

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = null!;
    public DateTimeOffset CreationDate { get; set; }
    public ICollection<ProductProductCategory> ProductProductCategories { get; set; } = new List<ProductProductCategory>();
    public string Description { get; set; } = null!;
}

public class ProductProductCategory
{
    public int ProductId { get; set; }
    public int CategoryId { get; set; }

    public Product Product { get; set; } = null!;
    public ProductCategory Category { get; set; } = null!;
}

public class ProductCategory
{
    public int Id { get; set; }
    public string Name { get; set; } = null!;
}

EF Core configuration

After setting up our entities, it’s time to set up EF Core by defining tables and columns. I prefer to use the Fluent API for this, so that is what you will see in this code example.

You will also see a reference to the DatabaseSeeder class. This class will contain all the code required to generate our realistic seed data. How this works will be covered in the next section, but it’s important to see that we end up using the results from our DatabaseSeeder in EF Core’s HasData() methods, which will save the seeding data for our migrations.

public class BogusContext : DbContext
{
    public DbSet<Product> Products => Set<Product>();
    public DbSet<ProductCategory> ProductCategories => Set<ProductCategory>();
    public DbSet<ProductProductCategory> ProductProductCategories => Set<ProductProductCategory>();

    public BogusContext(DbContextOptions<BogusContext> options) : base(options) { }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Configure the tables
        modelBuilder.ApplyConfiguration(new ProductConfiguration());
        modelBuilder.ApplyConfiguration(new ProductProductCategoryConfiguration());
        modelBuilder.ApplyConfiguration(new ProductCategoryConfiguration());

        // Generate seed data with Bogus
        var databaseSeeder = new DatabaseSeeder();

        // Apply the seed data on the tables
        modelBuilder.Entity<Product>().HasData(databaseSeeder.Products);
        modelBuilder.Entity<ProductCategory>().HasData(databaseSeeder.ProductCategories);
        modelBuilder.Entity<ProductProductCategory>().HasData(databaseSeeder.ProductProductCategories);

        base.OnModelCreating(modelBuilder);
    }
}

internal class ProductConfiguration : IEntityTypeConfiguration<Product>
{
    public void Configure(EntityTypeBuilder<Product> builder)
    {
        builder.ToTable("Product");
        builder.HasKey(x => x.Id);
        builder.Property(x => x.Name).IsRequired();
        builder.Property(x => x.CreationDate).IsRequired();
        builder.Property(x => x.Description).IsRequired();
    }
}

internal class ProductCategoryConfiguration : IEntityTypeConfiguration<ProductCategory>
{
    public void Configure(EntityTypeBuilder<ProductCategory> builder)
    {
        builder.ToTable("ProductCategory");
        builder.HasKey(x => x.Id);
        builder.Property(x => x.Name).IsRequired();
    }
}

internal class ProductProductCategoryConfiguration : IEntityTypeConfiguration<ProductProductCategory>
{
    public void Configure(EntityTypeBuilder<ProductProductCategory> builder)
    {
        builder.ToTable("ProductProductCategory");

        builder.HasKey(x => new { x.ProductId, x.CategoryId });

        builder.HasOne(x => x.Product)
            .WithMany(x => x.ProductProductCategories)
            .HasForeignKey(x => x.ProductId);

        builder.HasOne(b => b.Category)
            .WithMany()
            .HasForeignKey(x => x.CategoryId);
    }
}

info
Take a look at the Alternatives section in case you dislike using migrations to set up seed data.

Setting up seed data

Now it’s time for the most exciting part! How are EF Core and Bogus configured together? For this, we will take a look at the DatabaseSeeder class:

public class DatabaseSeeder
{      
    public IReadOnlyCollection<Product> Products { get; } = new List<Product>();
    public IReadOnlyCollection<ProductCategory> ProductCategories { get; } = new List<ProductCategory>();
    public IReadOnlyCollection<ProductProductCategory> ProductProductCategories { get; } = new List<ProductProductCategory>();

    public DatabaseSeeder()
    {
        Products = GenerateProducts(amount: 1000);
        ProductCategories = GenerateProductCategories(amount: 50);
        ProductProductCategories = GenerateProductProductCategories(amount: 1000, Products, ProductCategories);
    }

    private static IReadOnlyCollection<Product> GenerateProducts(int amount)
    {
        var productId = 1;
        var productFaker = new Faker<Product>()
            .RuleFor(x => x.Id, f => productId++) // Each product will have an incrementing id.
            .RuleFor(x => x.Name, f => f.Commerce.ProductName())
            // The refDate is very important! Without it, it will generate a random date based
            // on the CURRENT date on your system.
            // Generating a date based on the system date is not deterministic!
            // So the solution is to pass in a constant date instead
            // which will be used to generate a random date
            .RuleFor(x => x.CreationDate, f => f.Date.FutureOffset(
                refDate: new DateTimeOffset(2023, 1, 16, 15, 15, 0, TimeSpan.FromHours(1))))
            .RuleFor(x => x.Description, f => f.Commerce.ProductDescription());

        var products = Enumerable.Range(1, amount)
            .Select(i => SeedRow(productFaker, i))
            .ToList();

        return products;
    }

    private static IReadOnlyCollection<ProductCategory> GenerateProductCategories(int amount)
    {
        var categoryId = 1;
        var categoryFaker = new Faker<ProductCategory>()
            .RuleFor(x => x.Id, f => categoryId++) // Each category will have an incrementing id.
            .RuleFor(x => x.Name, f => f.Commerce.Categories(1).First());

        var categories = Enumerable.Range(1, amount)
            .Select(i => SeedRow(categoryFaker, i))
            .ToList();

        return categories;
    }

    private static IReadOnlyCollection<ProductProductCategory> GenerateProductProductCategories(
        int amount,
        IEnumerable<Product> products,
        IEnumerable<ProductCategory> productCategories)
    {
        // Now we set up the faker for our join table.
        // We do this by grabbing a random product and category that were generated.
        var productProductCategoryFaker = new Faker<ProductProductCategory>()
            .RuleFor(x => x.ProductId, f => f.PickRandom(products).Id)
            .RuleFor(x => x.CategoryId, f => f.PickRandom(productCategories).Id);
 
        var productProductCategories = Enumerable.Range(1, amount)
            .Select(i => SeedRow(productProductCategoryFaker, i))
            // We do this GroupBy() + Select() to remove the duplicates
            // from the generated join table entities
            .GroupBy(x => new { x.ProductId, x.CategoryId })
            .Select(x => x.First())
            .ToList();

        return productProductCategories;
    }

    private static T SeedRow<T>(Faker<T> faker, int rowId) where T : class
    {
        var recordRow = faker.UseSeed(rowId).Generate();
        return recordRow;
    }
}
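
The refDate comment in GenerateProducts is easy to check in isolation. The following is a minimal sketch, assuming a Product class that only has a CreationDate; it generates the same row twice with the same seed and the same constant reference date.

```csharp
using Bogus;
using System;

// A constant reference date, just like in the DatabaseSeeder above.
var refDate = new DateTimeOffset(2023, 1, 16, 15, 15, 0, TimeSpan.FromHours(1));

var productFaker = new Faker<Product>()
    .RuleFor(x => x.CreationDate, f => f.Date.FutureOffset(refDate: refDate));

// Same seed + same refDate: both calls should produce the same date.
var first = productFaker.UseSeed(1).Generate().CreationDate;
var second = productFaker.UseSeed(1).Generate().CreationDate;

Console.WriteLine($"Same date on both runs: {first == second}");
Console.WriteLine($"Within one year after refDate: {first >= refDate && first <= refDate.AddYears(1)}");

class Product
{
    public DateTimeOffset CreationDate { get; set; }
}
```

Both checks should print True. If you leave out the refDate argument, the generated date depends on your system clock, so a migration created tomorrow would contain different dates than one created today.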

Of course, my simple DatabaseSeeder class is just an example of how you can set up Bogus to work with EF Core. I like this basic version because it shows that it isn’t coupled to EF Core. You can improve this class in any way you see fit. Some inspiration:

  • Change the class to support dependency injection if you ever need this in business logic.
  • Make it fit for unit test purposes by generating a single Product instead.

These will be left as exercises to the reader 😉.
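
If you want a starting point for the second idea, here is a minimal sketch. The TestProducts helper and its Create method are hypothetical names and not part of the demo application, and the Product class is simplified by leaving out the join table collection.

```csharp
using Bogus;
using System;

// Two calls with the same seed should give the same product.
var product = TestProducts.Create(seed: 7);
var sameProduct = TestProducts.Create(seed: 7);

Console.WriteLine($"Id: {product.Id}, Name: {product.Name}");
Console.WriteLine($"Same name for the same seed: {product.Name == sameProduct.Name}");

// Hypothetical helper for unit tests; not part of the demo application.
static class TestProducts
{
    // A constant reference date keeps the generated date deterministic.
    private static readonly DateTimeOffset RefDate =
        new DateTimeOffset(2023, 1, 16, 15, 15, 0, TimeSpan.FromHours(1));

    public static Product Create(int seed) => new Faker<Product>()
        .RuleFor(x => x.Id, f => seed) // Reuse the seed as the id to keep things simple
        .RuleFor(x => x.Name, f => f.Commerce.ProductName())
        .RuleFor(x => x.CreationDate, f => f.Date.FutureOffset(refDate: RefDate))
        .RuleFor(x => x.Description, f => f.Commerce.ProductDescription())
        .UseSeed(seed) // Local Seed strategy: one seed per generated product
        .Generate();
}

class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = null!;
    public DateTimeOffset CreationDate { get; set; }
    public string Description { get; set; } = null!;
}
```

A test can then ask for exactly the product it needs, and a failing test stays reproducible because the same seed always leads to the same product.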

Final result

Now that we have set up our seed data, we can run dotnet ef migrations add AddedSeedData, which will generate a migration containing the insert operations for your seed data. Go ahead and run dotnet ef database update to put the fake data in your database!

The final result will look something like this:

[Figure: results of a SQL query showing the generated products. Caption: The final products]

[Figure: results of a SQL query showing the generated categories. Caption: The final categories]

If you want to remove the seed data, simply remove the HasData() calls and create a new migration!

Alternatives

Seed data without migrations

Some might say that using migrations for seed data is a bad idea, because it will also be deployed to your production database. Some also prefer to keep their migrations “pure” and only let them contain database schema changes. These are valid points!

You might wonder how seed data is handled in those cases. A common approach is to keep a .sql file in the repository that contains the INSERT statements for the fake data. A team member can then run this script against their local database.

But how can you get that fake data into a SQL file? Well, EF Core has a feature for this called dotnet ef migrations script. You can use this in your terminal to create a SQL script of all of your migrations so it can be deployed to a database. You can use it by following these steps:

  • Set up seed data like you saw in this blog post

  • Generate a new migration with dotnet ef migrations add SeedData

    • The name doesn’t matter because we will be removing the migration again
  • Use dotnet ef migrations script -o seed_data.sql

  • Now you can open that file in your favorite text editor and simply remove everything except for the SQL code that is part of your seed data migration.

    • A short example of what the final result could look like:

      BEGIN TRANSACTION;
      GO
      
      IF EXISTS (SELECT * FROM [sys].[identity_columns] WHERE [name] IN (N'Id', N'CreationDate', N'Name') AND [object_id] = OBJECT_ID(N'[Product]'))
          SET IDENTITY_INSERT [Product] ON;
      INSERT INTO [Product] ([Id], [CreationDate], [Name])
      VALUES (1, '2023-10-25T06:30:07.5920852+01:00', N'Gorgeous Wooden Shoes'),
      (2, '2024-01-11T04:15:43.8510278+01:00', N'Licensed Cotton Keyboard'), 
      -- .... The rest of the Products would be inserted here (in batches)
      
      -- .... Other entities like ProductCategory would be inserted here
      COMMIT;
      GO
      
  • Now run dotnet ef migrations remove to remove the migration you just created

Now, I’m not saying that this idea is perfect. I get the irony of still using migrations to generate seed data even though the goal was to avoid migrations. If you have a good suggestion for this problem, let me know in the comments below!

Finishing up

I hope you learned something new today and that you will start using Bogus for all of your data seeding needs. If you want to learn more, take a look at some blog posts mentioned in the Bogus GitHub repository.

And don’t forget: if you enjoy using Bogus, consider supporting the project by donating and/or contributing!