Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CHECKPOINT; to batch deletes for SQL Server #702

Open
RudeySH opened this issue Oct 12, 2021 · 4 comments
Open

Add CHECKPOINT; to batch deletes for SQL Server #702

RudeySH opened this issue Oct 12, 2021 · 4 comments
Assignees

Comments

@RudeySH
Copy link

RudeySH commented Oct 12, 2021

I ran into an issue where a batch delete became extremely slow because I'm deleting millions of records. My transaction log grew very big, causing the slowdown. After reproducing the issue in SSMS, I realized that adding a CHECKPOINT statement in the WHILE loop improved performance tremendously.

I propose that a boolean option (e.g. EnableCheckpoint) should be added to batch deletes, making my suggestion opt-in. In some cases using CHECKPOINT is not desirable, because when using it and the batch delete fails, it will be partially committed already. I don't think the default behavior should be changed.

Are batch updates implemented similarly to batch deletes? (Does this library generate a WHILE loop in the SQL for batch updates?) If so, I propose a similar option should be added to batch updates as well.

You could take it one step further and add an additional option to force the batch delete to run under Simple Recovery Mode. This apparently can make it faster compared to Full Recovery Mode. See this for more information about that: https://stackoverflow.com/a/21100285/1185136. I'm not sure this approach is safe though, I wonder what happens when multiple batch deletes are running simultaneously and both of them are changing the recovery mode back and forth...

@JonathanMagnan JonathanMagnan self-assigned this Oct 12, 2021
@JonathanMagnan
Copy link
Member

Hello @RudeySH ,

I will ask my developer to look at it.

I'm not sure yet if that's something we really want to add or not in our library.

As for your other questions, if you enable a BatchSize, the Batch Update is also doing a WHILE loop.

Best Regards,

Jon


Sponsorship
Help us improve this library

Performance Libraries
context.BulkInsert(list, options => options.BatchSize = 1000);
Entity Framework ExtensionsBulk OperationsDapper Plus

Runtime Evaluation
Eval.Execute("x + y", new {x = 1, y = 2}); // return 3
C# Eval FunctionSQL Eval Function

@RudeySH
Copy link
Author

RudeySH commented Oct 12, 2021

Using CHECKPOINT is helpful in scenarios where batch deletes (or batch updates) are deterministic and can easily be retried. Using checkpoints means giving up the atomicity ("either all occurs, or nothing occurs") of the batch operation, but you gain better performance in return.

I'd love to be able to use checkpoints when I can easily retry the operation without fear of losing data or consistency. For example, I have a background job that runs daily and it deletes old records from a table. In the event the batch delete fails, I don't need it to rollback everything, I don't need it to leverage the transaction log to keep track of what I've been removing. I just want it to delete records as performant as possible. When the batch delete fails, that's OK, partially deleting old records is better then deleting no records in this scenario. When the batch delete doesn't fail, it was faster because of checkpoints keeping the transaction log small. Win-win!

I understand this library is not catered to SQL Server specifically. Perhaps the proposed EnableCheckpoint is a poor name. In a broader sense, all I'm asking for is control over the atomicity of the batch operation, which in the case of SQL Server can be implemented using checkpoints.

@JonathanMagnan
Copy link
Member

Hello @RudeySH ,

We checked this request and I don't think it should really belong to that feature. Otherwise, we will need to add it to every other method as well and that's currently impossible at this moment.

I believe the simplest solution should be on your side to create a method that opens a connection and then create a command to enable your CHECKPOINT, so you get full control over what you want to do. If the connection is already open in the context, our library will use it so you will be able to do manual CHECKPOINT.

@jzabroski
Copy link

Checkpoints are a global side effect to the transaction log, so it's unclear it's a good idea. What happens if many connections issue checkpoints at the same time due to high load scenarios where lots of processes delete something? What happens if someone uses your repository method thinking it only does bulk delete, and they parallelize the bulk delete at a higher level of abstraction? etc

Also, what happens if the checkpoint times out?

If such an approach were to be considered, it should be a DbContextExtension method called simply CreateManualTransactionLogCheckpoint(TimeSpan checkpointCreationTimeout), as it is issued at a connection level but affects the whole database (dbcontext). You don't need EFE do to that for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants