Efficiently bulk upsert unrelated rows

As part of a manual replication process between databases with different but related schemas, for each table, we identify new and modified records in the source database and feed them via a table valued parameter to a stored procedure in the SQL Server destination database. The source database is not SQL Server and is in a different data center.

For each row to be upserted, the stored procedure queries for the primary keys of the related rows already in the destination database.

Currently, we do a single SELECT to get the primary keys, followed by a single MERGE to perform the upsert. There are two aspects of this approach that I believe may not be as efficient as possible.

An implicit transaction unnecessarily wraps the MERGE. The database would remain consistent even with each row being upserted one at a time. If an row's upsert fails, we want the remaining rows to proceed.

MERGE interleaves inserts and sets as it goes through the rows, which is fine, but we don't need this. It would be acceptable to set all the modified rows and subsequently insert all the new rows.

Based on the flexibility we have, the MERGE performance tip to use UPDATE and INSERT seems to apply to our case:

When simply updating one table based on the rows of another table, improved performance and scalability can be achieved with basic INSERT, UPDATE, and DELETE statements.

Do I understand that right? Am I better off with a separate UPDATE and INSERT? And what about the implicit transaction? Is performing a single SELECT, UPDATE, and INSERT over a large batch of rows most efficient, or is it better to take advantage of the ability to do one row at time by using a FOR loop? Or something else?

In general, what is the most efficient way to upsert a large batch or rows to a SQL Server table in which the rows are not transactionally related and sets and inserts need not be interleaved?

edited Nov 1 '18 at 20:13

asked Nov 1 '18 at 19:27

Edward Brey

1115

bumped to the homepage by Community♦ 8 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

1

Aaron Bertrand wrote about some potential gotcha's with MERGE that you may want to be aware of. Another thing I'd change is the table variable to a temp table. There are more blogs about this out there. Regarding If an row's upsert fails, we want the remaining rows to proceed., this sounds like RBAR methodology. I'd expect most people, even using UPSERT, would want the entire transaction rolled back if either the batch update or insert failed

– scsimon
Nov 1 '18 at 19:34

add a comment |

For each row to be upserted, the stored procedure queries for the primary keys of the related rows already in the destination database.

Currently, we do a single SELECT to get the primary keys, followed by a single MERGE to perform the upsert. There are two aspects of this approach that I believe may not be as efficient as possible.

An implicit transaction unnecessarily wraps the MERGE. The database would remain consistent even with each row being upserted one at a time. If an row's upsert fails, we want the remaining rows to proceed.

MERGE interleaves inserts and sets as it goes through the rows, which is fine, but we don't need this. It would be acceptable to set all the modified rows and subsequently insert all the new rows.

Based on the flexibility we have, the MERGE performance tip to use UPDATE and INSERT seems to apply to our case:

When simply updating one table based on the rows of another table, improved performance and scalability can be achieved with basic INSERT, UPDATE, and DELETE statements.

In general, what is the most efficient way to upsert a large batch or rows to a SQL Server table in which the rows are not transactionally related and sets and inserts need not be interleaved?

edited Nov 1 '18 at 20:13

asked Nov 1 '18 at 19:27

Edward Brey

1115

bumped to the homepage by Community♦ 8 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

1

Aaron Bertrand wrote about some potential gotcha's with MERGE that you may want to be aware of. Another thing I'd change is the table variable to a temp table. There are more blogs about this out there. Regarding If an row's upsert fails, we want the remaining rows to proceed., this sounds like RBAR methodology. I'd expect most people, even using UPSERT, would want the entire transaction rolled back if either the batch update or insert failed

– scsimon
Nov 1 '18 at 19:34

add a comment |

For each row to be upserted, the stored procedure queries for the primary keys of the related rows already in the destination database.

Currently, we do a single SELECT to get the primary keys, followed by a single MERGE to perform the upsert. There are two aspects of this approach that I believe may not be as efficient as possible.

An implicit transaction unnecessarily wraps the MERGE. The database would remain consistent even with each row being upserted one at a time. If an row's upsert fails, we want the remaining rows to proceed.

MERGE interleaves inserts and sets as it goes through the rows, which is fine, but we don't need this. It would be acceptable to set all the modified rows and subsequently insert all the new rows.

Based on the flexibility we have, the MERGE performance tip to use UPDATE and INSERT seems to apply to our case:

When simply updating one table based on the rows of another table, improved performance and scalability can be achieved with basic INSERT, UPDATE, and DELETE statements.

In general, what is the most efficient way to upsert a large batch or rows to a SQL Server table in which the rows are not transactionally related and sets and inserts need not be interleaved?

edited Nov 1 '18 at 20:13

asked Nov 1 '18 at 19:27

Edward Brey

1115

For each row to be upserted, the stored procedure queries for the primary keys of the related rows already in the destination database.

Currently, we do a single SELECT to get the primary keys, followed by a single MERGE to perform the upsert. There are two aspects of this approach that I believe may not be as efficient as possible.

An implicit transaction unnecessarily wraps the MERGE. The database would remain consistent even with each row being upserted one at a time. If an row's upsert fails, we want the remaining rows to proceed.

MERGE interleaves inserts and sets as it goes through the rows, which is fine, but we don't need this. It would be acceptable to set all the modified rows and subsequently insert all the new rows.

Based on the flexibility we have, the MERGE performance tip to use UPDATE and INSERT seems to apply to our case:

When simply updating one table based on the rows of another table, improved performance and scalability can be achieved with basic INSERT, UPDATE, and DELETE statements.

In general, what is the most efficient way to upsert a large batch or rows to a SQL Server table in which the rows are not transactionally related and sets and inserts need not be interleaved?

sql-server transactional-replication etl upsert table-valued-parameters

edited Nov 1 '18 at 20:13

asked Nov 1 '18 at 19:27

Edward Brey

1115

edited Nov 1 '18 at 20:13

asked Nov 1 '18 at 19:27

Edward Brey

1115

edited Nov 1 '18 at 20:13

asked Nov 1 '18 at 19:27

Edward Brey

1115

asked Nov 1 '18 at 19:27

Edward Brey

1115

asked Nov 1 '18 at 19:27

Edward Brey

1115

bumped to the homepage by Community♦ 8 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

bumped to the homepage by Community♦ 8 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

1

Aaron Bertrand wrote about some potential gotcha's with MERGE that you may want to be aware of. Another thing I'd change is the table variable to a temp table. There are more blogs about this out there. Regarding If an row's upsert fails, we want the remaining rows to proceed., this sounds like RBAR methodology. I'd expect most people, even using UPSERT, would want the entire transaction rolled back if either the batch update or insert failed

– scsimon
Nov 1 '18 at 19:34

add a comment |

1

Aaron Bertrand wrote about some potential gotcha's with MERGE that you may want to be aware of. Another thing I'd change is the table variable to a temp table. There are more blogs about this out there. Regarding If an row's upsert fails, we want the remaining rows to proceed., this sounds like RBAR methodology. I'd expect most people, even using UPSERT, would want the entire transaction rolled back if either the batch update or insert failed

– scsimon
Nov 1 '18 at 19:34

Aaron Bertrand wrote about some potential gotcha's with MERGE that you may want to be aware of. Another thing I'd change is the table variable to a temp table. There are more blogs about this out there. Regarding If an row's upsert fails, we want the remaining rows to proceed., this sounds like RBAR methodology. I'd expect most people, even using UPSERT, would want the entire transaction rolled back if either the batch update or insert failed

– scsimon
Nov 1 '18 at 19:34

add a comment |

1 Answer
1

active

oldest

votes

To elaborate on my comment, I would use a set based approach for better performance with error handling, here is how I would do it. I mixed the two common methods, using JOIN and NOT EXISTS / EXISTS.

Note, I don't use a temp table or table variable here. I've seen performance gains when the databases are on different servers but pulling the data into a temp table before doing an upsert across a linked server or whatever, but I'm not sure if your databases are on the same instance or not.

set xact_abort on;

begin try

    begin tran



    update d

    set d.column = s.column

    from destinationTable d

    inner join sourceTable s on

    s.id = d.id



    insert into destinationTable (...column list...)

    select s.1, s.2, ...

    from sourceTable s

    where not exists(select id from destinationTable d where d.id = s.id)



    commit

end try

begin catch

    if @@trancount > 0 rollback tran

    declare @msg nvarchar(2048) = error_message()

    raiserror (@msg, 16, 1)

    return 55555

end catch

Some good reads

Erlan Sommarskog's Error and Transaction Handling

Aaron Bertrand's Use Caution w/ SQL Server MERGE

Martin Smith's break down of Table Variable vs Temp Tables and his suggestions on when you should use one over the other

Using LEFT JOIN or NOT EXISTS

In the case that the data is in on a different server a temp table could increase the performance of the insert (or may be the only option if it's not SQL Server). The above code would be the same, except the sourceTable would be replaced with #sourceTable.

If the table is large, then you may want to consider staging tables instead of #tempTables so that you don't beat up tempdb too bad. BULK INSERT into the staging tables (sourceTable) and then do your upsert from here.

edited Nov 1 '18 at 20:21

answered Nov 1 '18 at 20:07

scsimon

1,398414

I added a clarification to the question that the source database is not SQL Server and is in a different data center.

– Edward Brey
Nov 1 '18 at 20:13

unserstood @EdwardBrey. Not much would change for me, except probably bulk inserting into stagingTables and then doing the upsert. Naturally, these would have what ever PK, FK, and indexes you need to make your upserts perform well

– scsimon
Nov 1 '18 at 20:22

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f221569%2fefficiently-bulk-upsert-unrelated-rows%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

set xact_abort on;

begin try

    begin tran



    update d

    set d.column = s.column

    from destinationTable d

    inner join sourceTable s on

    s.id = d.id



    insert into destinationTable (...column list...)

    select s.1, s.2, ...

    from sourceTable s

    where not exists(select id from destinationTable d where d.id = s.id)



    commit

end try

begin catch

    if @@trancount > 0 rollback tran

    declare @msg nvarchar(2048) = error_message()

    raiserror (@msg, 16, 1)

    return 55555

end catch

Some good reads

Erlan Sommarskog's Error and Transaction Handling

Aaron Bertrand's Use Caution w/ SQL Server MERGE

Martin Smith's break down of Table Variable vs Temp Tables and his suggestions on when you should use one over the other

Using LEFT JOIN or NOT EXISTS

edited Nov 1 '18 at 20:21

answered Nov 1 '18 at 20:07

scsimon

1,398414

I added a clarification to the question that the source database is not SQL Server and is in a different data center.

– Edward Brey
Nov 1 '18 at 20:13

unserstood @EdwardBrey. Not much would change for me, except probably bulk inserting into stagingTables and then doing the upsert. Naturally, these would have what ever PK, FK, and indexes you need to make your upserts perform well

– scsimon
Nov 1 '18 at 20:22

add a comment |

set xact_abort on;

begin try

    begin tran



    update d

    set d.column = s.column

    from destinationTable d

    inner join sourceTable s on

    s.id = d.id



    insert into destinationTable (...column list...)

    select s.1, s.2, ...

    from sourceTable s

    where not exists(select id from destinationTable d where d.id = s.id)



    commit

end try

begin catch

    if @@trancount > 0 rollback tran

    declare @msg nvarchar(2048) = error_message()

    raiserror (@msg, 16, 1)

    return 55555

end catch

Some good reads

Erlan Sommarskog's Error and Transaction Handling

Aaron Bertrand's Use Caution w/ SQL Server MERGE

Martin Smith's break down of Table Variable vs Temp Tables and his suggestions on when you should use one over the other

Using LEFT JOIN or NOT EXISTS

edited Nov 1 '18 at 20:21

answered Nov 1 '18 at 20:07

scsimon

1,398414

I added a clarification to the question that the source database is not SQL Server and is in a different data center.

– Edward Brey
Nov 1 '18 at 20:13

unserstood @EdwardBrey. Not much would change for me, except probably bulk inserting into stagingTables and then doing the upsert. Naturally, these would have what ever PK, FK, and indexes you need to make your upserts perform well

– scsimon
Nov 1 '18 at 20:22

add a comment |

set xact_abort on;

begin try

    begin tran



    update d

    set d.column = s.column

    from destinationTable d

    inner join sourceTable s on

    s.id = d.id



    insert into destinationTable (...column list...)

    select s.1, s.2, ...

    from sourceTable s

    where not exists(select id from destinationTable d where d.id = s.id)



    commit

end try

begin catch

    if @@trancount > 0 rollback tran

    declare @msg nvarchar(2048) = error_message()

    raiserror (@msg, 16, 1)

    return 55555

end catch

Some good reads

Erlan Sommarskog's Error and Transaction Handling

Aaron Bertrand's Use Caution w/ SQL Server MERGE

Martin Smith's break down of Table Variable vs Temp Tables and his suggestions on when you should use one over the other

Using LEFT JOIN or NOT EXISTS

edited Nov 1 '18 at 20:21

answered Nov 1 '18 at 20:07

scsimon

1,398414

set xact_abort on;

begin try

    begin tran



    update d

    set d.column = s.column

    from destinationTable d

    inner join sourceTable s on

    s.id = d.id



    insert into destinationTable (...column list...)

    select s.1, s.2, ...

    from sourceTable s

    where not exists(select id from destinationTable d where d.id = s.id)



    commit

end try

begin catch

    if @@trancount > 0 rollback tran

    declare @msg nvarchar(2048) = error_message()

    raiserror (@msg, 16, 1)

    return 55555

end catch

Some good reads

Erlan Sommarskog's Error and Transaction Handling

Aaron Bertrand's Use Caution w/ SQL Server MERGE

Martin Smith's break down of Table Variable vs Temp Tables and his suggestions on when you should use one over the other

Using LEFT JOIN or NOT EXISTS

edited Nov 1 '18 at 20:21

answered Nov 1 '18 at 20:07

scsimon

1,398414

edited Nov 1 '18 at 20:21

answered Nov 1 '18 at 20:07

scsimon

1,398414

answered Nov 1 '18 at 20:07

scsimon

1,398414

answered Nov 1 '18 at 20:07

scsimon

1,398414

I added a clarification to the question that the source database is not SQL Server and is in a different data center.

– Edward Brey
Nov 1 '18 at 20:13

unserstood @EdwardBrey. Not much would change for me, except probably bulk inserting into stagingTables and then doing the upsert. Naturally, these would have what ever PK, FK, and indexes you need to make your upserts perform well

– scsimon
Nov 1 '18 at 20:22

add a comment |

I added a clarification to the question that the source database is not SQL Server and is in a different data center.

– Edward Brey
Nov 1 '18 at 20:13

unserstood @EdwardBrey. Not much would change for me, except probably bulk inserting into stagingTables and then doing the upsert. Naturally, these would have what ever PK, FK, and indexes you need to make your upserts perform well

– scsimon
Nov 1 '18 at 20:22

I added a clarification to the question that the source database is not SQL Server and is in a different data center.

– Edward Brey
Nov 1 '18 at 20:13

unserstood @EdwardBrey. Not much would change for me, except probably bulk inserting into stagingTables and then doing the upsert. Naturally, these would have what ever PK, FK, and indexes you need to make your upserts perform well

– scsimon
Nov 1 '18 at 20:22

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Database Administrators Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Vryjdfkk