How to (efficiently) delete millions of records in a SQL table

A while ago, I blogged about how to efficiently update millions of records in a SQL table, today I’d like to show you a simple trick to do the same when deleting millions of records in a SQL table.

The idea is the same: you don’t want to do it all at once; instead, you want to delete a fixed number of records at a time. Why do we need to do this? When you delete data from a table, a lot happens behind the scenes, so you need to be careful when removing data, especially large amounts of it.

SQL Server performs many activities when removing data, so it is important to know that factors like the ones listed below can affect the locking and performance behavior of DELETE statements:

  • Number of indexes in the table
  • Poorly indexed foreign keys
  • Lock escalation mode
  • Isolation levels
  • Batch sizes of the delete statement
  • Trigger(s)
  • Temporal tables

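For example, lock escalation can be checked and controlled per table. The snippet below is a sketch (myTable is a placeholder name): disabling escalation keeps batch deletes holding row or page locks instead of escalating to a full table lock.

```sql
-- Check the current lock escalation setting of a table
SELECT name, lock_escalation_desc
FROM sys.tables
WHERE name = 'myTable';

-- TABLE is the default; DISABLE prevents escalation to a table-level lock
ALTER TABLE myTable SET (LOCK_ESCALATION = DISABLE);
```
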
You also need to consider cascade deletes: when you delete data from one table, SQL Server might also attempt to remove data from other tables that reference the data you are deleting. This is a good thing because it prevents orphan records, but it also has the potential to affect the performance of your delete statements.
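
To illustrate (the table and column names here are made up), a foreign key declared with ON DELETE CASCADE removes child rows automatically, so every batch deleted from the parent table can fan out into its child tables:

```sql
-- Deleting a row from Orders will also delete its rows in OrderItems
ALTER TABLE OrderItems
ADD CONSTRAINT FK_OrderItems_Orders
    FOREIGN KEY (OrderId) REFERENCES Orders (Id)
    ON DELETE CASCADE;
```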

The list above shows some of the main areas to consider when removing large amounts of data, but in this post I will assume you have the correct type of indexes in place, so that the indexes are helpful when deleting data and not the reason for table blocking or, worse, deadlocks.

I’d like to show you how to do batch deletes, removing only a specific number of records at a time. Performing the delete in batches helps avoid or reduce locking on the tables where the data is being removed. Below are examples of how to do this from code, as well as directly in SQL.

Batch deletes with C#

In C#, you can use the Dapper library to delete data in batches by executing a “DELETE” statement inside a loop until no more rows are affected.

Here is an example of how to delete data in batches of 1000 records using Dapper:

using (IDbConnection connection = new SqlConnection(connectionString))
{
    connection.Open();
    int batchSize = 1000;
    while (true)
    {
        // TOP (@batch) limits each round trip to one batch. The remaining rows
        // shift down after every delete, so tracking an offset would skip rows;
        // deleting from the top each time is both simpler and correct.
        string deleteSql = "DELETE TOP (@batch) FROM myTable";
        int rowsAffected = connection.Execute(deleteSql, new { batch = batchSize });
        if (rowsAffected == 0)
        {
            break;
        }
    }
}

Please note that this is a simple example for demonstration purposes only; it is recommended that you add error handling and logging in your actual implementation.

Batch deletes with SQL

Here is an example of a DELETE statement that can be used to delete data in batches of 1000 records in Microsoft SQL Server.

DECLARE @batchSize INT = 1000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@batchSize)
    FROM myTable;

    -- Check @@ROWCOUNT immediately after the DELETE; any statement in between
    -- (even a SET assignment) would overwrite its value and the loop would
    -- never exit.
    IF @@ROWCOUNT = 0
        BREAK;
END;

This query uses a WHILE loop to repeatedly execute the DELETE statement until all rows have been deleted. The number of rows deleted in each iteration is determined by the @batchSize variable. The loop continues to execute as long as the @@ROWCOUNT variable is not 0.

It is worth noting that this kind of query can be resource-intensive if the table being deleted from is very large, so it is important to test the performance in a test environment before running it in production. However, if you need to delete millions of records, this is still much better than attempting to delete them all at once.

It’s also worth noting that this query will work on SQL Server. If you are using other SQL databases such as MySQL, PostgreSQL or Oracle, the syntax may be slightly different.
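
To give a rough idea of the differences (myTable is a placeholder, and you should verify against your own database version), MySQL allows a row limit directly on DELETE, while PostgreSQL needs a subquery to pick the batch:

```sql
-- MySQL: LIMIT is allowed directly on DELETE
DELETE FROM myTable LIMIT 1000;

-- PostgreSQL: select a batch of system row identifiers (ctid) instead
DELETE FROM myTable
WHERE ctid IN (SELECT ctid FROM myTable LIMIT 1000);
```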

The optimal batch size for deleting records in SQL will depend on various factors such as the size of the table, the amount of available memory, and the overall performance of the database. In general, larger batch sizes can be more efficient as they reduce the number of times the database needs to perform the delete operation, but they also require more memory to hold the set of records being deleted.

A batch size of 1000 records is a common starting point, but it may not be the best option for your specific use case. You may want to experiment with different batch sizes such as 5000 and observe the performance of your database to determine the optimal batch size for your needs.

It’s important to keep in mind that larger batch sizes can consume more memory, especially if your table is large and contains a lot of data. Also, if you have indexes on the table, deleting a large batch of records at once can have a more significant impact on the performance of the database.

In addition, if you are working with a production database, you should also consider the possible impact on other queries or transactions that might be executing concurrently. It’s always a good practice to test your code on a test environment before deploying to a production environment.
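
One simple way to ease the pressure on concurrent transactions is to pause briefly between batches. This is a sketch; the one-second delay is an arbitrary value you would tune for your environment:

```sql
DECLARE @batchSize INT = 1000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@batchSize)
    FROM myTable;

    IF @@ROWCOUNT = 0
        BREAK;

    -- Give other transactions a chance to acquire locks between batches
    WAITFOR DELAY '00:00:01';
END;
```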

Lastly, if you have tables where data needs to be deleted often, you might want to create a scheduled SQL job that runs weekly, or as often as you need, and deletes the data that is no longer needed. This helps you avoid deleting large amounts of data at once, and it also keeps your tables from growing in size without need.
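
As a sketch of what such a scheduled job step could run (the CreatedDate column and the 90-day retention window are assumptions for illustration):

```sql
DECLARE @batchSize INT = 1000;

WHILE 1 = 1
BEGIN
    -- Remove records older than the retention window, one batch at a time
    DELETE TOP (@batchSize)
    FROM myTable
    WHERE CreatedDate < DATEADD(DAY, -90, GETDATE());

    IF @@ROWCOUNT = 0
        BREAK;
END;
```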

Happy coding!

How to: Configure SQL Express to accept remote connections

This is a copy of a post that used to exist here. I received some complaints because people were still trying to reach it from an answer I wrote on StackOverflow a few years ago, and the page was no longer there. What follows is an exact replica of the original post; I hope it helps:

I recently installed SQL Express 2008 and wanted to use it for a test application that I have on a hosted server. I wanted this application to connect to my local SQL Express 2008 database, but I soon found out I needed to make some adjustments for this to work. This is what I did to make my local SQL Express 2008 database accept remote connections.

  1. Go to Start – All Programs – Microsoft SQL Server 2008 – Configuration Tools – SQL Server Configuration Manager
  2. Select and expand the SQL Server Network Configuration node, then select your SQL Express 2008 instance. In the window to the right, right-click on TCP/IP and click “Enable”.
  3. Once you have enabled the TCP/IP protocol, right-click on it and select Properties. Go to the tab labeled “IP Addresses” and make sure you clear any values under TCP Dynamic Ports (even if the value is 0, remove it), then enter a port number in each of the TCP Port properties. In my case I used port 14330.
    Click Apply and OK.
  4. You now need to restart SQL Express 2008. To do this, select the SQL Server Services node in the same SQL Server Configuration Manager, right-click on the name of your SQL Express 2008 instance, and select Restart. If you receive any errors trying to restart the server, go back to step 3 and make sure you did everything mentioned; if the error keeps coming up, use a different port number.
  5. Finally, you need to make sure a remote connection can be made to your SQL Server, so open the port you assigned in step 3 (in my case 14330) in your router, and make sure Windows Firewall and/or any other firewall accepts incoming connections on this port.
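
As a hedged example of the firewall part (the rule name is arbitrary, and 14330 matches the port chosen in step 3), Windows Firewall can be opened from an elevated command prompt:

```
netsh advfirewall firewall add rule name="SQL Express TCP 14330" dir=in action=allow protocol=TCP localport=14330
```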

That’s it! Your SQL Express 2008 server should now be able to accept remote connections. As always, take the appropriate steps to make sure your systems are secure.
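
A client would then connect by specifying the machine’s address and the chosen port, separated by a comma. For example (the server address, database name, and credentials below are placeholders):

```
Server=203.0.113.10,14330;Database=myDatabase;User Id=myUser;Password=myPassword;
```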

Good Luck!