SQL Server bulk transfer help needed
We've got this very badly designed logging table that we want to add functionality to. The problem is that it's already a scalability nightmare and we want to fix the design before adding to it, but we only have the nightly upgrade window to do it in.
I've seen a lot of articles about the various bulk copy options with SQL Server, claiming "We could move 80M rows in 10 minutes!" but so far my testing doesn't get anywhere near that, and I'd like suggestions on how to improve on what I'm seeing.
Before the upgrade, there's always a full backup. I'm only interested in the end result and don't want a huge transaction log. I also don't want it to take too long and I don't want to blow out the disk space with transaction logs or temp files.
The table's been out there a while, so in our bigger customer DBs it's already over 50 million rows. Each row is about 350-400 bytes. The columns are something like this:

IdentityColID int, [type] int, [subtype] int,
created datetime, author nvarchar(100), Message nvarchar(max)
The problems with the design are:

- The primary clustered key is (type, subtype, created, IdentityColID), so it's an insert nightmare, with page splits all over the place. Even doing a SELECT COUNT(*) takes like 8 minutes.
- There aren't good indexes to support the types of queries desired.
I wanted to make a new table where the primary clustered index is IdentityColID, add the indexes to support the necessary queries, and then copy the existing data over and drop the old table.
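A sketch of the intended new table; the table name, constraint names, and the nonclustered index key columns are assumptions, since the real ones would follow whatever queries need supporting:

```sql
-- Hypothetical replacement table: clustered PK on the identity column,
-- so inserts are append-only instead of splitting pages mid-table.
CREATE TABLE dbo.NewLogTable
(
    IdentityColID int           NOT NULL,
    [type]        int           NOT NULL,
    [subtype]     int           NOT NULL,
    created       datetime      NOT NULL,
    author        nvarchar(100) NULL,
    Message       nvarchar(max) NULL,
    CONSTRAINT PK_NewLogTable PRIMARY KEY CLUSTERED (IdentityColID)
);

-- Example supporting index; these key columns are a guess.
CREATE NONCLUSTERED INDEX IX_NewLogTable_type_subtype_created
    ON dbo.NewLogTable ([type], [subtype], created);
```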
So far, I tried using BCP to get the data out, and importing with:

- BCP
- BULK INSERT
- INSERT INTO ... FROM OPENROWSET

The bcp export took about 25 minutes, and the imports all took about 1.3-1.5 hours. With Recovery Model Simple, the transaction log didn't grow, but CPU consumption was in the 60-65% range most of the time.
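For reference, the export/import pair looks roughly like this; the server name, file path, and batch size are placeholders:

```sql
-- Export with bcp from a command prompt
-- (-n = native format, -T = trusted connection):
--   bcp MyDb.dbo.OldLogTable out D:\dump\log.dat -n -S MyServer -T

-- Import side: BULK INSERT with a table lock and batching, which can
-- qualify for minimal logging when the target table is empty and the
-- database is in the Simple recovery model.
BULK INSERT dbo.NewLogTable
FROM 'D:\dump\log.dat'
WITH (DATAFILETYPE = 'native', TABLOCK, BATCHSIZE = 1000000);
```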
I tried just using T-SQL INSERT INTO NewTable SELECT * FROM OldTable, but even with Recovery Model Simple, the transaction log gets to 100 GB.
I tried using SSIS data import packages with the from/to model, and the net time was about an hour and 20 minutes. With Recovery Model Simple, the transaction log stayed small.

Then I tried an SSIS Execute SQL Task package to effectively run the INSERT INTO NewTable... statement within SSIS. That got the execution time down to about 1 hour 15 minutes, but no matter what the recovery model, the transaction log ended up around 100 GB, though CPU consumption stayed modest.
I'd like the end result to be one new table, so the suggestion from some of the articles I've read to parallelize into multiple result tables doesn't seem a profitable path. But so far, I just can't seem to approach those stats from the articles I've read.
Anyone have any suggestions on how I can goose this a bit?
sql-server bulk-insert bulkcopy
If you are doing the insert in one batch, then it will be logged, and your log file will grow until the transaction is finished, even when the recovery model is Simple. The LOB data (the Message column) could be slowing your inserts down; if you could do anything about that, it would help.
– Randi Vertongen
Dec 21 '18 at 21:40

Have you tried to use INSERT .. SELECT in batches (of say 1-10K rows each)? It would probably help to add the indexes after the table has been populated.
– ypercubeᵀᴹ
Dec 21 '18 at 21:42
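The batched INSERT .. SELECT that this comment describes might look something like the following sketch; the batch size and the use of the identity column as the range key are assumptions:

```sql
-- Copy in keyed ranges so each batch commits separately and the log
-- space can be reused between batches under the Simple recovery model.
DECLARE @lo int = 0, @batch int = 10000, @max int;
SELECT @max = MAX(IdentityColID) FROM dbo.OldLogTable;

WHILE @lo <= @max
BEGIN
    INSERT INTO dbo.NewLogTable
        (IdentityColID, [type], [subtype], created, author, Message)
    SELECT IdentityColID, [type], [subtype], created, author, Message
    FROM dbo.OldLogTable
    WHERE IdentityColID > @lo AND IdentityColID <= @lo + @batch;

    SET @lo = @lo + @batch;
END;
```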
What version of SQL Server? The table is 20 GB, correct? If rows are only 400 bytes, then why is there an nvarchar(max) column? Do you want to do the fastest load possible while minimizing transaction log growth?
– Joe Obbish
Dec 22 '18 at 0:15

I don't understand why you are copying the data. You can drop and add indexes on the existing table.
– Sir Swears-a-lot
Dec 24 '18 at 9:44

The nvarchar(max) is to allow people to type in whatever they want for audit notes, but on average people don't tend to type all that much. Yeah, I'm looking for the fastest possible load while minimizing transaction log growth because of the maintenance window I have to work in.
– user1664043
Dec 26 '18 at 14:41
asked Dec 21 '18 at 21:29 by user1664043 · edited Dec 26 '18 at 9:37 by marc_s
2 Answers
Did you change any of the configurations when you tried SSIS? Using an Execute SQL Task with a SQL statement is equivalent to running the same SQL in SSMS. Below is an overview of doing this process in SSIS. Of course, you'll want to change and test various settings to see what works best in your environment.

- Add a Data Flow Task. In this, add an OLE DB Source component. I'd recommend using the "SQL Command" data access mode and writing a SELECT statement that contains only the columns you need.
- Add either an OLE DB Destination or a SQL Server Destination and connect the source to it. The SQL Server Destination tends to perform better; however, it is only available when importing to a database on a local server. If you use an OLE DB Destination, use the "Table or view - fast load" option. This will operate as a BULK INSERT internally. On the destination, un-check the Check Constraints option to bypass constraint checking.
- To avoid excessive transaction log use, change the Maximum Insert Commit Size (OLE DB Destination) to a value lower than the default. The equivalent to this on the SQL Server Destination is the MaxInsertCommitSize property, which you can view by right-clicking the component, selecting Show Advanced Editor, and going to the Component Properties page. The default of 0 means all records will be committed in a single transaction.
- On the Data Flow Task, go to the Properties window and set AutoAdjustBufferSize to true. This will set the size of the buffer based on the DefaultBufferMaxRows property, which indicates how many rows the buffers can hold.
- You can adjust parallel execution as well. On the package, the MaxConcurrentExecutables property sets the maximum number of tasks that can run concurrently. The default is -1, which means the number of processors on the machine where the package is running, plus two. The Data Flow Task has an EngineThreads property, which controls how many threads it can use.

– userfl89, answered Dec 26 '18 at 19:49
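As a rough mental model for the fast-load settings above, they map onto the hints of a T-SQL bulk load; the statement below is illustrative, not what SSIS literally executes:

```sql
-- "Table or view - fast load" behaves like a bulk-hinted insert:
-- the destination's "Table lock" checkbox corresponds to TABLOCK,
-- and Maximum Insert Commit Size plays the role that BATCHSIZE plays
-- in a BULK INSERT (commit every N rows instead of one huge transaction).
INSERT INTO dbo.NewLogTable WITH (TABLOCK)
SELECT IdentityColID, [type], [subtype], created, author, Message
FROM dbo.OldLogTable;
```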
I can't give the credit to Sir Swears-a-lot because his response was in a comment, but he was right. My assumption that dropping the primary key and creating a new one would be effectively the same as, or more expensive than, loading up an empty new table was incorrect.

Setting the recovery model to Simple, dropping the primary key, recreating it (forcing the reshuffle), and adding all my new indexes afterwards clocked in at 40 minutes. That beat any bcp/SSIS method I'd tried by more than half an hour, so that was a big improvement.

And the transaction log grew by 10 GB but didn't go nuts like some of my other experiments.

– user1664043, answered Jan 4 at 20:59
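For anyone following along, the drop-and-recreate sequence described here would look roughly like this; the database, table, constraint, and index names are made up:

```sql
ALTER DATABASE MyDb SET RECOVERY SIMPLE;

-- Drop the old composite clustered PK (leaving a heap), then
-- re-cluster on the identity column; creating the new clustered PK
-- rewrites the table once -- the "reshuffle".
ALTER TABLE dbo.LogTable DROP CONSTRAINT PK_LogTable;
ALTER TABLE dbo.LogTable
    ADD CONSTRAINT PK_LogTable PRIMARY KEY CLUSTERED (IdentityColID);

-- Then add the new supporting indexes (key columns are a guess).
CREATE NONCLUSTERED INDEX IX_LogTable_type_subtype_created
    ON dbo.LogTable ([type], [subtype], created);
```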
I can't give the credit to Sir Swears-A-lot because his response was in a comment, but he was right. My assumption that dropping the primary key and creating a new one would be effectively the same or more expensive than loading up an empty new table was incorrect.
Setting recovery mode simple, dropping the primary key, recreating it (forcing the reshuffle), and adding all my new indexes afterwards clocked in at 40 minutes. That beat any bcp/SSIS method I'd tried by more than a half hour, so that was a big improvement.
And the transaction log grew by 10 gig but didn't go nuts like some of my other experiments.
answered Jan 4 at 20:59
user1664043
If you are doing the insert in one batch, then it will be logged and your log file will grow until the transaction finishes, even under the simple recovery model. The LOB data (the Message column) could also be slowing your inserts down; if you can do anything about that, it would help.
– Randi Vertongen
Dec 21 '18 at 21:40
Have you tried to use
INSERT .. SELECT
in batches (of, say, 1-10K rows each)? It would probably help to add the indexes after the table has been populated.
– ypercubeᵀᴹ
Dec 21 '18 at 21:42
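That batching pattern could be sketched like this; the table names are hypothetical, and the column list comes from the question:

```sql
-- Copy rows in chunks keyed on the identity column, so each batch
-- commits separately and the log can be reused under SIMPLE recovery.
-- SET IDENTITY_INSERT dbo.AuditLogNew ON;  -- needed if the column is still an IDENTITY
DECLARE @batch int = 10000, @last int = 0, @rows int = 1;

WHILE @rows > 0
BEGIN
    INSERT INTO dbo.AuditLogNew
        (IdentityColId, [type], [subtype], created, author, Message)
    SELECT TOP (@batch)
        IdentityColId, [type], [subtype], created, author, Message
    FROM dbo.AuditLog
    WHERE IdentityColId > @last
    ORDER BY IdentityColId;

    SET @rows = @@ROWCOUNT;
    SELECT @last = MAX(IdentityColId) FROM dbo.AuditLogNew;

    CHECKPOINT;  -- lets simple recovery truncate the log between batches
END
```

Ordering by the identity column keeps each batch's reads and writes sequential against the new clustered key, and keeping the nonclustered indexes off until the copy finishes avoids maintaining them per row.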
What version of SQL Server? The table is 20 GB, correct? If rows are only 400 bytes then why is there an nvarchar(max) column? Do you want to do the fastest load possible while minimizing transaction log growth?
– Joe Obbish
Dec 22 '18 at 0:15
I don't understand why you are copying the data. You can drop and add indexes on the existing table.
– Sir Swears-a-lot
Dec 24 '18 at 9:44
The nvarchar(max) is to allow people to type in whatever they want for audit notes, but on average people don't tend to type all that much. Yeah, I'm looking for the fastest possible load while minimizing transaction log growth, because of the maintenance window I have to work in.
– user1664043
Dec 26 '18 at 14:41