How to (efficiently) delete millions of records in a SQL table

A while ago, I blogged about how to efficiently update millions of records in a SQL table, today I’d like to show you a simple trick to do the same when deleting millions of records in a SQL table.

The idea is the same, you don’t want to do it all at once, instead, you want to delete a fixed number of records at a time. Why do we need to do this? For starters, when you delete data from a table, a lot of things happen behind the scenes, and because of this, you need to be careful when removing data, especially, large amounts of data.

Many activities are performed by SQL when removing data, and because of this, it is important to know that certain items like the ones listed below can affect the locking and performance behavior of SQL delete statements:

  • Number of indexes in the table
  • Poor indexed foreign keys
  • Lock escalation mode
  • Isolation levels
  • Batch sizes of the delete statement
  • Trigger(s)
  • Temporal tables

One thing you need to consider is also cascade deletes, when you delete data from one table, SQL might also attempt to remove data from a few or many more tables with references to the data you are deleting. This is a good things as it prevents orphan records, but it has also the potential to affect the performance of your delete statements.

The list above shows you some of the main areas to consider when removing large amounts of data, but in this post, I will assume you have considered having the correct type of indexes to make sure that the indexes are helpful when deleting data, and not the reason for table blocking or worst, deadlocks.

I’d like to show you how to do batch deletes to only delete an specific number of records at a time. Performing the delete in batches, helps avoiding or reducing locking on the tables where the data is being removed. Below are some examples on how to do this in code, as well as when deleting data directly via SQL code.

Batch deletes with C#

n C#, you can use the Dapper library to delete data in batches using the “DELETE” statement in conjunction with a “WHERE” clause and a loop to iterate through a set of records.

Here is an example of how to delete data in batches of 1000 records using Dapper:

using (IDbConnection connection = new SqlConnection(connectionString))
{
    connection.Open();
    int batchSize = 1000;
    int currentIndex = 0;
    while (true)
    {
        string deleteSql = "DELETE FROM myTable WHERE Id IN (SELECT Id FROM myTable ORDER BY Id OFFSET @index ROWS FETCH NEXT @batch ROWS ONLY)";
        var parameters = new { index = currentIndex, batch = batchSize };
        int rowsAffected = connection.Execute(deleteSql, parameters);
        if (rowsAffected == 0)
        {
            break;
        }
        currentIndex += rowsAffected;
    }
}

Please note that this is a simple example for demonstration purposes only and it is recommended that you add additional error handling and logging in your actual implementation.

Batch deletes with SQL

Here is an example of a DELETE statement that can be used to delete data in batches of 1000 records within Microsoft SQL.

DECLARE @batchSize INT = 1000;
DECLARE @currentIndex INT = 0;

WHILE 1 = 1
BEGIN
    DELETE FROM myTable
    WHERE Id IN (
        SELECT Id
        FROM myTable
        ORDER BY Id
        OFFSET @currentIndex ROWS
        FETCH NEXT @batchSize ROWS ONLY
    );

    SET @currentIndex = @currentIndex + @batchSize;

    IF @@ROWCOUNT = 0
        BREAK;
END;

This query uses a WHILE loop to repeatedly execute the DELETE statement until all rows have been deleted. The number of rows deleted in each iteration is determined by the @batchSize variable. The loop continues to execute as long as the @@ROWCOUNT variable is not 0.

It is worth noting that this kind of query can be resource-intensive if the table being deleted from is very large, so it’s important to test the performance on a test environment before running it on production. However, if you have the need to delete millions of records, this is still much better than attempting to delete all records at once.

It’s also worth noting that this query will work on SQL Server. If you are using other SQL databases such as MySQL, PostgreSQL or Oracle, the syntax may be slightly different.

The optimal batch size for deleting records in SQL will depend on various factors such as the size of the table, the amount of available memory, and the overall performance of the database. In general, larger batch sizes can be more efficient as they reduce the number of times the database needs to perform the delete operation, but they also require more memory to hold the set of records being deleted.

A batch size of 1000 records is a common starting point, but it may not be the best option for your specific use case. You may want to experiment with different batch sizes such as 5000 and observe the performance of your database to determine the optimal batch size for your needs.

It’s important to keep in mind that larger batch sizes can consume more memory, especially if your table is large and contains a lot of data. Also, If you have indexes on the table, then deleting large batch of records at once may lead to a more significant impact on the performance of the database.

In addition, if you are working with a production database, you should also consider the possible impact on other queries or transactions that might be executing concurrently. It’s always a good practice to test your code on a test environment before deploying to a production environment.

Lastly, if you have a table or tables where the data needs to be deleted often, you might want to create a scheduled SQL job that will run weekly or as often as you need to, and delete the data that isn’t needed anymore. This will be helpful to avoid deleting large amounts of data at once, and it will also keep your table(s) from increasing in size without need.

Happy coding!

The Logbook – part 3.

I wrote a brief post about starting a logbook in January 2021 and then an update on April 2021. This is the latest update about my use of the logbook, the tools I use for it, and how I plan to use it this year.

The logbook is a mini daily planner that I use to log daily activities and anything you want or need. There aren’t any rules or templates. Write down what you want each day, and log your day as detailed or briefly as you wish.

My use of the logbook has been very casual. I don’t write on it daily, but I want to change that. However, it’s been helpful, especially when looking back and learning more about my almost daily life. The logbook helps me think and capture data about my everyday life.

This year, I have specific questions I want to answer daily using my logbook. For example:

  • How much money did I spend today?
  • How much water do I consume every day?
  • How many cups of coffee do I consume every day?
  • Ideas and random thoughts

I’ve been doing this for about two years, but casually. I plan on writing casually in my logbook, but questions like the above need to be answered daily, which is my goal.

Why is this important? Writing down this information forces you to think about this and avoid living your life on autopilot. Writing it down on paper is the easiest thing for me. The daily planner I use is small and flexible, and I can carry it with me every day and everywhere. In addition, writing down anything on paper helps you memorize it more than typing it on a keyboard.

Tools I use to capture data about my everyday life:

Good luck and happy logging!

“Just start”

Surely you’ve heard something like “just do it” or “just start,” but in many cases, starting is one of the most challenging tasks in life. In this post, I will share my own experience and struggles when starting and finishing projects.

It used to be easy for me to start and work on multiple projects. I remember staying up late often to work on these projects. Starting these projects was easy then; completing them while challenging was possible in most instances. For example, I developed a website where startups and small businesses could sign up, add their location, and then be shown on a map. Before that, I created a meetup group where I helped people find co-founders for their startups, and both were quite successful.

Those are just two examples, but over the years, I also started a successful blog back in the mid-2000s to talk about anything related to startups and technology in Austin, TX. The blog became a Meetup group that grew to over 3,000 people, and more than a hundred met monthly.

However, over time, I stopped working on these projects for different reasons. And while I thought of new projects and ideas all the time, I ended up with zero projects at some point. And I remember thinking. It’s okay. It is time to focus on my job and dedicate 100% of my time. Well, I got too complacent over the years. Working on a project besides your day job is healthy, and your day job could benefit from this.

Starting new projects is difficult, and there’s always an excuse, such as I need more time, my idea needs to be better, or I need to know where or how to start. The reality is, at least in my experience, that fear plays a significant role here—fear of not feeling capable enough, not competent enough, or not experienced enough. But guess what? Starting doesn’t require any of those things.

Start anything, work on it for some time, and then start something else. Ideally, you’ll finish what you start, but you’ll learn much from that experience even if you don’t. For example, writing my thoughts about an idea is a good start that works for me. The next thing I could do, depending on the project, is to manually create something to help the project, a social media account, a web page, and a list of items to complete. And if you need customers for your new project, start talking to potential customers, one at a time.

If you feel stuck and unable to start or move forward with a project, it may be time to think about why you feel that way. I fear wasting time on something that might not work, but deep down, I know there is value in starting and working on new projects. At the minimum, it is a learning experience, but it takes work to convince yourself of that.

Fear is the reason many dreams never come true. Of course, fear is not necessarily bad, but if you let fear drive your decisions, you’ll end up with many regrets and missed opportunities.

I started a new project a few months ago, the first one outside my regular job in many years. I am taking it slowly and am excited about the journey without overthinking the end result.

Not fearing the outcome of a project or an idea is my goal, and I am not comfortable feeling comfortable anymore. Maybe it’s my age, or my project ideas aren’t as exciting as before. Nevertheless, I will continue to do what I can to keep things interesting with these new projects.

I posted a tweet days ago mentioning that my goal this year is slowing down when traveling, eating, talking, etc.

This does not mean I want to be lazy. On the contrary, I want to focus on each task and give it all my time and dedication as I am doing it. This, of course, will include my new project. Cheers.