How to (efficiently) delete millions of records in a SQL table

A while ago, I blogged about how to efficiently update millions of records in a SQL table, today I’d like to show you a simple trick to do the same when deleting millions of records in a SQL table.

The idea is the same, you don’t want to do it all at once, instead, you want to delete a fixed number of records at a time. Why do we need to do this? For starters, when you delete data from a table, a lot of things happen behind the scenes, and because of this, you need to be careful when removing data, especially, large amounts of data.

Many activities are performed by SQL when removing data, and because of this, it is important to know that certain items like the ones listed below can affect the locking and performance behavior of SQL delete statements:

  • Number of indexes in the table
  • Poor indexed foreign keys
  • Lock escalation mode
  • Isolation levels
  • Batch sizes of the delete statement
  • Trigger(s)
  • Temporal tables

One thing you need to consider is also cascade deletes, when you delete data from one table, SQL might also attempt to remove data from a few or many more tables with references to the data you are deleting. This is a good things as it prevents orphan records, but it has also the potential to affect the performance of your delete statements.

The list above shows you some of the main areas to consider when removing large amounts of data, but in this post, I will assume you have considered having the correct type of indexes to make sure that the indexes are helpful when deleting data, and not the reason for table blocking or worst, deadlocks.

I’d like to show you how to do batch deletes to only delete an specific number of records at a time. Performing the delete in batches, helps avoiding or reducing locking on the tables where the data is being removed. Below are some examples on how to do this in code, as well as when deleting data directly via SQL code.

Batch deletes with C#

n C#, you can use the Dapper library to delete data in batches using the “DELETE” statement in conjunction with a “WHERE” clause and a loop to iterate through a set of records.

Here is an example of how to delete data in batches of 1000 records using Dapper:

using (IDbConnection connection = new SqlConnection(connectionString))
{
    connection.Open();
    int batchSize = 1000;
    int currentIndex = 0;
    while (true)
    {
        string deleteSql = "DELETE FROM myTable WHERE Id IN (SELECT Id FROM myTable ORDER BY Id OFFSET @index ROWS FETCH NEXT @batch ROWS ONLY)";
        var parameters = new { index = currentIndex, batch = batchSize };
        int rowsAffected = connection.Execute(deleteSql, parameters);
        if (rowsAffected == 0)
        {
            break;
        }
        currentIndex += rowsAffected;
    }
}

Please note that this is a simple example for demonstration purposes only and it is recommended that you add additional error handling and logging in your actual implementation.

Batch deletes with SQL

Here is an example of a DELETE statement that can be used to delete data in batches of 1000 records within Microsoft SQL.

DECLARE @batchSize INT = 1000;
DECLARE @currentIndex INT = 0;

WHILE 1 = 1
BEGIN
    DELETE FROM myTable
    WHERE Id IN (
        SELECT Id
        FROM myTable
        ORDER BY Id
        OFFSET @currentIndex ROWS
        FETCH NEXT @batchSize ROWS ONLY
    );

    SET @currentIndex = @currentIndex + @batchSize;

    IF @@ROWCOUNT = 0
        BREAK;
END;

This query uses a WHILE loop to repeatedly execute the DELETE statement until all rows have been deleted. The number of rows deleted in each iteration is determined by the @batchSize variable. The loop continues to execute as long as the @@ROWCOUNT variable is not 0.

It is worth noting that this kind of query can be resource-intensive if the table being deleted from is very large, so it’s important to test the performance on a test environment before running it on production. However, if you have the need to delete millions of records, this is still much better than attempting to delete all records at once.

It’s also worth noting that this query will work on SQL Server. If you are using other SQL databases such as MySQL, PostgreSQL or Oracle, the syntax may be slightly different.

The optimal batch size for deleting records in SQL will depend on various factors such as the size of the table, the amount of available memory, and the overall performance of the database. In general, larger batch sizes can be more efficient as they reduce the number of times the database needs to perform the delete operation, but they also require more memory to hold the set of records being deleted.

A batch size of 1000 records is a common starting point, but it may not be the best option for your specific use case. You may want to experiment with different batch sizes such as 5000 and observe the performance of your database to determine the optimal batch size for your needs.

It’s important to keep in mind that larger batch sizes can consume more memory, especially if your table is large and contains a lot of data. Also, If you have indexes on the table, then deleting large batch of records at once may lead to a more significant impact on the performance of the database.

In addition, if you are working with a production database, you should also consider the possible impact on other queries or transactions that might be executing concurrently. It’s always a good practice to test your code on a test environment before deploying to a production environment.

Lastly, if you have a table or tables where the data needs to be deleted often, you might want to create a scheduled SQL job that will run weekly or as often as you need to, and delete the data that isn’t needed anymore. This will be helpful to avoid deleting large amounts of data at once, and it will also keep your table(s) from increasing in size without need.

Happy coding!

A simple explanation of the four basic concepts of Object-Oriented Programming (OOP)

As a software engineer, the concept of object-oriented programming is something you need to learn, understand, and be able to explain simply to anyone who asks. Therefore, most interviews for programming jobs will include questions about object-oriented programming.

In this post, I’d like to share a list and a simple description of the four concepts of object-oriented programming.

The four principles of object-oriented programming are encapsulation, abstraction, inheritance, and polymorphism.

Encapsulation

It means that a group of related properties, methods, and other members are part of a single unit or object. In other words, it’s like a package of goods where all of the items are used and needed for one purpose. It’s like going to an all-inclusive resort in Mexico that includes everything with one payment and in one place. A single package includes your hotel room, entertainment, food, gym, the beach, etc.

Abstraction

It is the principle used to describe how a program handles complexity by hiding the implementation details and displaying only essential features to a user. So, for example, you can flip a switch on your wall and see that the lights come up without knowing how electricity works. Likewise, you order things online, pay by entering a number, and these things show up at your door, all without you knowing how these things operate behind the scenes – all of these are real-life examples of abstractions.

Inheritance

It enables you to create a new class that reuses, extends, and modifies the behavior defined in another class.

The class whose members are inherited is called the base class, and the class that inherits those members is called the derived class.

In C#, all classes implicitly inherit from the Object class.

A real-life example of inheritance could be vehicles as the base class, and then we can have things like motorized and non-motorized vehicles as derived classes. The idea is that the base class can have generic properties (for example, all vehicles in this example have wheels) that can be applied and used by all derived classes, and then each derived class can have its own properties and behavior depending on its need.

  • Vehicles
    • Motorized
      • Car
      • Bus
    • Non-motorized
      • Bicycle
      • Cart
      • Rickshaw

Polymorphism

It means that you can have multiple classes and use them interchangeably, even though each class implements the same properties or methods in different ways. This gives you the ability to create more modular and extensible applications. With polymorphism, the program will call a specific method determined at runtime based on the type of the object.

A real-life example of this is blood groups. Humans all share many of the same characteristics, but we also have unique characteristics within a population, such as blood type. Human blood groups are examples of genetic polymorphism.

If you are interested in learning more about object-oriented programming, I recommend the books below:

computer

A simple explanation about Software Architecture and Software Design

I have been working as a software developer for many years now. In my experience, many people (including myself) confuse these two processes and talk about them interchangeably as if they are the same thing, but they are not. I hope this post helps clarify the distinction between these two important processes. For years I didn’t understand the importance of good software architecture, but with experience comes the realization of how vital solid architecture and software design is. It can save a lot of money and time and increase a software system’s performance, usability, security, and scalability.

Software Architecture

Software architecture focuses on the high-level infrastructure of a software system to allow for flexibility, security, and scalability of a system. The goal of software architecture is to make fundamental structural choices to meet the business and technical requirements. Therefore, the choices made in software architecture are important and need to be carefully analyzed as these will be too costly to change once a system is implemented.

There are many architectural styles or patterns in existence to accomplish these tasks. These architectural patterns help establish reusable groups of design decisions and constraints that are applied to a system to generate the desired output and quality.

Some of these architectural patterns include the following:

  • Client-server (n-tier architecture for example)
  • Event-driven
  • Layered
  • Microservices
  • Pipes and filters
  • Service-oriented

Once the architecture of a software system is complete, a software development team can use it as a guide or blueprint to make educated software design decisions before starting to write any code.

Software Design

Software design is where the implementation details of all parts that will make up a system are conceptualized. Software design is what happens after software architecture and before the programming starts. The result of software design is often a plan to implement tasks such as coding, testing, integration, and deployment.

Experienced software developers will review design considerations during this phase. These considerations should reflect the goals and expectations of all stakeholders. Some of these considerations include:

  • Extensibility
  • Modularity
  • Reliability
  • Reusability
  • Security
  • Performance
  • Scalability

Also, software design includes design patterns, which are reusable solutions to common problems within the context of software design. These design patterns are essentially best practices that software developers can use to solve common issues when designing a software application.

The following are some design patterns to speed up the development process with these proven development paradigms.

Some of the software design patterns can include the following:

  • Dependency Injection
  • Singleton
  • Factory Method
  • Facade
  • Decorator
  • Composite
  • Chain of responsibility
  • State
  • Event-based Asynchronous

If you want to learn more about the items listed above under architecture and design, I suggest looking online; there is plenty of information about each of these architectural and design patterns out there, and those details are outside of the scope of this post.

This post only overs software architect and software design at a high level, and if you are interested, I suggest you look into these two processes in detail. Below are some of the books I recommend on the subject:

Where are the Authentication views in AspNet Core App?

I am struggling with Visual Studio for Mac. I have been a Visual Studio user since it came in multiple CDs in a box. Visual Studio has been improved a lot, I like it more now than I did before. But there is a problem, the VS version for the Mac is a completely different product. I am both a Mac and PC user, I like both operating systems, Windows and Mac OS. However, I have been trying to use my personal laptop (MacBook Pro M1) for my personal software projects, and a lot of them were initially created with VS for Windows.

Visual Studio for Mac lacks some important things I use in my projects, for example, SQL Server Data Tools (SSDT) projects. I know that this is because we cannot install SQL in a Mac, and also know that there are alternatives to SSDT, like Azure Data Studio… but guess what? It’s not compatible with the ARM architecture which is what Apple’s new chip (M1) uses, and that’s the laptop I have 😕.

New Web project using Visual Studio for Mac

Anyways, that’s a topic for another post, this is about finding the Authentication views in an ASP.NET Core app. Since I couldn’t use this MacBook to continue working on my existing projects, I decided to use it only for new ones. That should be fine. So I have an idea for a book-sharing app, and I wanted to use VS for Mac to create this web app. I started with one of the templates and in less than a minute I had a working web app on my laptop. Next, I decided to scaffold the Authentication stuff to get authentication features and allow users to do stuff like the following:

  • Register
  • Login/Logout
  • Forgot Password
  • Reset Password
  • Confirm Email
  • Etc.

Scaffolding Identity features

I ran the following commands in my Terminal and on my project’s root directory to add the ability to scaffold the identity stuff. You might not need to do this if you have already installed the aspnet codegenerator:

dotnet tool install -g dotnet-aspnet-codegenerator

dotnet add package Microsoft.VisualStudio.Web.CodeGeneration.Design
dotnet add package Microsoft.EntityFrameworkCore.Design
dotnet add package Microsoft.AspNetCore.Identity.EntityFrameworkCore
dotnet add package Microsoft.AspNetCore.Identity.UI
dotnet add package Microsoft.EntityFrameworkCore.SqlServer
dotnet add package Microsoft.EntityFrameworkCore.Tools

dotnet aspnet-codegenerator identity -h

Where are my Identity views?

After this, I ran the following command to add the Authentication features to my project. The following command needs to be run inside your project’s directory:

dotnet aspnet-codegenerator identity -dc ReadMoreBooks.Data.ApplicationDbContext --files "Account.Register;Account.Login;Account.Logout"

After doing this, I didn’t see many changes in my solution, but when I ran my application again, I noticed I had the option to register, login, and logout. I registered a test user, and everything worked fine. Until I wanted to edit the page where a user confirms registration and also the page to reset the password. I was expecting to see ConfirmEmail and ResetPassword views under Areas > Identity > Pages > Account – but they weren’t there, only Register, Login, and Logout.

Where are they? After doing some research and spending a good amount of time searching for them in my solution, I found out that Microsoft hid these views to make projects smaller and more organized. This is the blog post where they announced that and that I clearly missed: https://devblogs.microsoft.com/dotnet/aspnetcore-2-1-identity-ui/

So, it turns out that by using the command line to scaffold these views I was already halfway from where I needed to be. What I mean by that is, by using the command tool (I think this is available through the UI but only in VS for Windows), all I had to do was either not specify which views I wanted to scaffold or specify all the views I wanted to include in my project so I could customize them.

The first command is what I ran initially, which helped create the Register, Login, and Logout views, and while everything else within the Authentication was available, the actual files were hidden.

dotnet aspnet-codegenerator identity -dc ReadMoreBooks.Data.ApplicationDbContext --files "Account.Register;Account.Login;Account.Logout"

Bringing back the Identity views into my project

So what I ran after was the following command, which added the views I wanted to my project so I could access them and customize them as much as I needed.

dotnet aspnet-codegenerator identity -dc ReadMoreBooks.Data.ApplicationDbContext --files "Account.ConfirmEmail;Account.ForgotPassword;Account.ForgotPasswordConfirmation;Account.Lockout;Account.LoginWith2fa;Account.ResetPassword;Account.ResetPasswordConfirmation"

That’s it! As soon as the command above was executed, all the authentication-related views appeared in my solution, just like in the old times. Perfect.

Now, I learned that I also have the option to run the same command without specifying the –files command, which will then add all of the available views related to Authentication to your project.

dotnet aspnet-codegenerator identity -dc ReadMoreBooks.Data.ApplicationDbContext

Conclusion and suggestions

So there you have it folks, a feature by Microsoft was a bug for me, at least for a while, until I understood what was happening, and then I learned how to navigate through the changes and get what I wanted.

If you want to learn more about the scaffolding options and how you can use them in your projects, check out this page from Microsoft: https://docs.microsoft.com/en-us/aspnet/core/security/authentication/scaffold-identity?view=aspnetcore-6.0&tabs=netcore-cli&viewFallbackFrom=aspnetcore-2.1

The moral of the story, read the documentation, keep up to date so you are aware of what’s changing and why, and maybe you’ll save some headaches and be more proficient with your coding. Hope this is useful. Cheers!

Don’t forget about character casing when comparing strings!

There are many issues I’ve experienced during the many, many years I’ve worked as a software developer. But one of the most recurring issues is, without a doubt, the mismatching of words due to character casing.

There are solutions to the character casing mismatch problem. For example, you can make your strings all lower case or upper case before comparing them. There are also many programming languages that have features to help with string comparisons.

The problem when comparing strings and case sensitivity

The issue is no longer the lack of solutions to avoid this problem. The problem is that these solutions require that you, the developer, be proactive by being alert and aware of case sensitivity when making string comparisons. For example, in C#, you have the StringComparer class, which includes properties like StringComparer.OrdinalIgnoreCase to help you ignore the character’s casing when comparing strings.

As a developer, you have to be alert and know when to ignore character casing. While there are many simple ways and tools built into programming languages, sometimes knowing when to do this might not be obvious.

For example, if you call GroupBy in C# and select the value you want your list to be grouped by, it will consider values such as “Abc” and “ABC” as unique, which might not be what you want to do.

In most cases, if you group a list of items by a specific string value, your intention is probably to treat the same values “Abc” and “ABC” as the same. Therefore, you’ll want to ignore the casing as the values are the same in this context.

Issues like the one with GroupBy in C# can go unnoticed until it causes problems. For example, I ran into this issue and didn’t realize the mistake until I tried to add the values of that grouped list to a dictionary and failed. The dictionary attempted to use the values “Abc” and “ABC” as the dictionary key, but it failed since these aren’t unique.

What can you do to avoid casing issues when comparing strings?

So what can you do about this? Code defensively. Every time you compare strings, consider character casing sensitivity and avoid it easily by converting all your strings to upper case or lower case before comparing. Second, be aware of the use cases where you are calling a built-in function such as GroupBy or ToDictionary as functions like this might be case-sensitive within your programming language.

With the programming language C#, you can use overloads that explicitly specify the string comparison rules for string comparisons. It works in this language by calling a method overload that has a parameter of type StringComparison.

In the example below, I’ll be using StringComparison.OrdinalIgnoreCase for comparisons for culture-agnostic string matching. The example shows you how not ignoring case sensitivity might give you unexpected results.

Examples in C#

Let’s declare a list of books with author names written using different casing

var books = new List<Book>()
{
new Book { Name = "Programa en donde sea", Author = "Ricardo" },
new Book { Name = "Empieza a programar", Author = "ricardo" },
new Book { Name = "Xyz", Author = "Joe" },
new Book { Name = "Despues de la programacion", Author = "RICARDO" },
new Book { Name = "Blah", Author = "Foo" }
};

Let’s group the list of books by Author, but since we are not doing anything to ignore case sensitivity, the result is not what’s expected – It returns five records instead of three as it treats all variations of the name Ricardo as unique values.

var notAUniqueListOfBooks = books.GroupBy(b => b.Author);

Now let’s group the same list of books by author, but this time let’s add a parameter to make the string comparison case insensitive. The result is only three records, that’s because it treats all the variations of the Author name Ricardo as the same value.

var aUniqueListOfBooks = books.GroupBy(b => b.Author, StringComparer.OrdinalIgnoreCase);

Let’s now create a dictionary from the list of books. This dictionary will use the Author value as the key, and both the book’s name and author as the value. The result is five items in the dictionary, again, because it treats each instance of the author name Ricardo as a unique value due to the difference in the casing.

var notAUniqueBookDictionary = books.ToDictionary(b => b.Author, b => b);

Finally, we’ll try to create a dictionary following the same attributes above, but this time, we’ll pass the parameter StringComparer.OrdinalIgnoreCase to make sure the comparison is case insensitive.

The result of this last one is an error with the following message:

“An item with the same key has already been added. Key: ricardo”

This is because since we are ignoring the casing in Author, we cannot create a dictionary as the key values are required to be unique and by ignoring the case of the different variations of the value Ricardo, these are no longer unique. They all end up being the same exact value.

var aUniqueBookDictionary = books.ToDictionary(b => b.Author, b => b, StringComparer.OrdinalIgnoreCase);

Finally, using the examples above, if you wanted to group by Author, and then create a list of all of their books including the name and author values then you could try using ToLookup, and pass the StringComparer parameter to make sure the string comparison in a case insensitive.

var aUniqueLookup = books.ToLookup(b => b.Author, b => b, StringComparer.OrdinalIgnoreCase);

The above will give you a dictionary where the key is the Author name and the value is a list of books including name and author. Also, by passing the StringComparer.OrdinalIgnoreCase parameter, we are making sure that the result is a unique list of values.

This is the result of our book list when converted into a Lookup object in C#. There are three keys, all unique, and under each key, we have a list of books that corresponds to the book’s author representing the Key value.

I hope this is useful, the code I used to test the examples above is all available here if you want to play with it and explore changing the values, parameters, etc. Cheers and happy coding!