How to (efficiently) delete millions of records in a SQL table

A while ago, I blogged about how to efficiently update millions of records in a SQL table, today I’d like to show you a simple trick to do the same when deleting millions of records in a SQL table.

The idea is the same, you don’t want to do it all at once, instead, you want to delete a fixed number of records at a time. Why do we need to do this? For starters, when you delete data from a table, a lot of things happen behind the scenes, and because of this, you need to be careful when removing data, especially, large amounts of data.

Many activities are performed by SQL when removing data, and because of this, it is important to know that certain items like the ones listed below can affect the locking and performance behavior of SQL delete statements:

  • Number of indexes in the table
  • Poor indexed foreign keys
  • Lock escalation mode
  • Isolation levels
  • Batch sizes of the delete statement
  • Trigger(s)
  • Temporal tables

One thing you need to consider is also cascade deletes, when you delete data from one table, SQL might also attempt to remove data from a few or many more tables with references to the data you are deleting. This is a good things as it prevents orphan records, but it has also the potential to affect the performance of your delete statements.

The list above shows you some of the main areas to consider when removing large amounts of data, but in this post, I will assume you have considered having the correct type of indexes to make sure that the indexes are helpful when deleting data, and not the reason for table blocking or worst, deadlocks.

I’d like to show you how to do batch deletes to only delete an specific number of records at a time. Performing the delete in batches, helps avoiding or reducing locking on the tables where the data is being removed. Below are some examples on how to do this in code, as well as when deleting data directly via SQL code.

Batch deletes with C#

n C#, you can use the Dapper library to delete data in batches using the “DELETE” statement in conjunction with a “WHERE” clause and a loop to iterate through a set of records.

Here is an example of how to delete data in batches of 1000 records using Dapper:

using (IDbConnection connection = new SqlConnection(connectionString))
{
    connection.Open();
    int batchSize = 1000;
    int currentIndex = 0;
    while (true)
    {
        string deleteSql = "DELETE FROM myTable WHERE Id IN (SELECT Id FROM myTable ORDER BY Id OFFSET @index ROWS FETCH NEXT @batch ROWS ONLY)";
        var parameters = new { index = currentIndex, batch = batchSize };
        int rowsAffected = connection.Execute(deleteSql, parameters);
        if (rowsAffected == 0)
        {
            break;
        }
        currentIndex += rowsAffected;
    }
}

Please note that this is a simple example for demonstration purposes only and it is recommended that you add additional error handling and logging in your actual implementation.

Batch deletes with SQL

Here is an example of a DELETE statement that can be used to delete data in batches of 1000 records within Microsoft SQL.

DECLARE @batchSize INT = 1000;
DECLARE @currentIndex INT = 0;

WHILE 1 = 1
BEGIN
    DELETE FROM myTable
    WHERE Id IN (
        SELECT Id
        FROM myTable
        ORDER BY Id
        OFFSET @currentIndex ROWS
        FETCH NEXT @batchSize ROWS ONLY
    );

    SET @currentIndex = @currentIndex + @batchSize;

    IF @@ROWCOUNT = 0
        BREAK;
END;

This query uses a WHILE loop to repeatedly execute the DELETE statement until all rows have been deleted. The number of rows deleted in each iteration is determined by the @batchSize variable. The loop continues to execute as long as the @@ROWCOUNT variable is not 0.

It is worth noting that this kind of query can be resource-intensive if the table being deleted from is very large, so it’s important to test the performance on a test environment before running it on production. However, if you have the need to delete millions of records, this is still much better than attempting to delete all records at once.

It’s also worth noting that this query will work on SQL Server. If you are using other SQL databases such as MySQL, PostgreSQL or Oracle, the syntax may be slightly different.

The optimal batch size for deleting records in SQL will depend on various factors such as the size of the table, the amount of available memory, and the overall performance of the database. In general, larger batch sizes can be more efficient as they reduce the number of times the database needs to perform the delete operation, but they also require more memory to hold the set of records being deleted.

A batch size of 1000 records is a common starting point, but it may not be the best option for your specific use case. You may want to experiment with different batch sizes such as 5000 and observe the performance of your database to determine the optimal batch size for your needs.

It’s important to keep in mind that larger batch sizes can consume more memory, especially if your table is large and contains a lot of data. Also, If you have indexes on the table, then deleting large batch of records at once may lead to a more significant impact on the performance of the database.

In addition, if you are working with a production database, you should also consider the possible impact on other queries or transactions that might be executing concurrently. It’s always a good practice to test your code on a test environment before deploying to a production environment.

Lastly, if you have a table or tables where the data needs to be deleted often, you might want to create a scheduled SQL job that will run weekly or as often as you need to, and delete the data that isn’t needed anymore. This will be helpful to avoid deleting large amounts of data at once, and it will also keep your table(s) from increasing in size without need.

Happy coding!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.