Transaction terminated running SELECT on secondary AG group

Using SQL Server 2016, Always-On

Running a simple SELECT on a table within a secondary Availability Group.

select distinct some_column 

from some_table oo (nolock) 

inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column

SSMS eventually sends back this error:

Msg 3948, Level 16, State 2, Line 14061 The transaction was terminated
because of the availability replica config/state change or because
ghost records are being deleted on the primary and the secondary
availability replica that might be needed by queries running under
snapshot isolation. Retry the transaction.

The same SELECT with (NOLOCK) works fine on the primary AG.

With or without the NOLOCK it fails most of the time 9/10 and sometimes executes correctly but that's rare. No changes were made to the AG at all.

There are. no changes in the AlwaysOn_Health extended event session.

The low_water_mark_for_ghosts in sys.dm_hadr_database_replica_states is NULL for some and for others there is a value. For those with a value, this is also the primary.

Using the script provided by Nic:

;

WITH    PrimaryStats

          AS ( SELECT   DB_NAME(database_id) AS DatabaseName ,

                        low_water_mark_for_ghosts ,

                        ar.replica_server_name ,

                        ar.availability_mode_desc

               FROM     sys.dm_hadr_database_replica_states hdrs

                        JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id

               WHERE    ar.replica_server_name = @@SERVERNAME

             )

    SELECT  DB_NAME(database_id) AS DatabaseName ,

            hdrs.low_water_mark_for_ghosts AS LowWaterMarkSecondaryReplica,

            ps.low_water_mark_for_ghosts AS LowWaterMarkLocalReplica,

            ps.low_water_mark_for_ghosts - hdrs.low_water_mark_for_ghosts AS GhostWatermarkDiff,

            ar.replica_server_name AS ReplicaNode,

            DATEDIFF(SECOND, last_redone_time, GETDATE()) AS RedoDiffSec,

            last_redone_time

    FROM    sys.dm_hadr_database_replica_states hdrs

            JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id

            JOIN PrimaryStats ps ON ps.DatabaseName = DB_NAME(database_id)

    ORDER BY 

            DatabaseName ASC,'NODE ' + right(ar.replica_server_name ,1);

The output is:

enter image description here

When the LowWaterMarkSecondaryReplica is NULL, its because that's the instance I'm currently executing this query from.

edited 2 mins ago

Paul White♦

50.9k14278448

asked Nov 16 '16 at 13:58

Craig Efrein

7,21284080

add a comment |

Using SQL Server 2016, Always-On

Running a simple SELECT on a table within a secondary Availability Group.

select distinct some_column 

from some_table oo (nolock) 

inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column

SSMS eventually sends back this error:

Msg 3948, Level 16, State 2, Line 14061 The transaction was terminated
because of the availability replica config/state change or because
ghost records are being deleted on the primary and the secondary
availability replica that might be needed by queries running under
snapshot isolation. Retry the transaction.

The same SELECT with (NOLOCK) works fine on the primary AG.

With or without the NOLOCK it fails most of the time 9/10 and sometimes executes correctly but that's rare. No changes were made to the AG at all.

There are. no changes in the AlwaysOn_Health extended event session.

The low_water_mark_for_ghosts in sys.dm_hadr_database_replica_states is NULL for some and for others there is a value. For those with a value, this is also the primary.

Using the script provided by Nic:

;

WITH    PrimaryStats

          AS ( SELECT   DB_NAME(database_id) AS DatabaseName ,

                        low_water_mark_for_ghosts ,

                        ar.replica_server_name ,

                        ar.availability_mode_desc

               FROM     sys.dm_hadr_database_replica_states hdrs

                        JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id

               WHERE    ar.replica_server_name = @@SERVERNAME

             )

    SELECT  DB_NAME(database_id) AS DatabaseName ,

            hdrs.low_water_mark_for_ghosts AS LowWaterMarkSecondaryReplica,

            ps.low_water_mark_for_ghosts AS LowWaterMarkLocalReplica,

            ps.low_water_mark_for_ghosts - hdrs.low_water_mark_for_ghosts AS GhostWatermarkDiff,

            ar.replica_server_name AS ReplicaNode,

            DATEDIFF(SECOND, last_redone_time, GETDATE()) AS RedoDiffSec,

            last_redone_time

    FROM    sys.dm_hadr_database_replica_states hdrs

            JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id

            JOIN PrimaryStats ps ON ps.DatabaseName = DB_NAME(database_id)

    ORDER BY 

            DatabaseName ASC,'NODE ' + right(ar.replica_server_name ,1);

The output is:

enter image description here

When the LowWaterMarkSecondaryReplica is NULL, its because that's the instance I'm currently executing this query from.

edited 2 mins ago

Paul White♦

50.9k14278448

asked Nov 16 '16 at 13:58

Craig Efrein

7,21284080

add a comment |

Using SQL Server 2016, Always-On

Running a simple SELECT on a table within a secondary Availability Group.

select distinct some_column 

from some_table oo (nolock) 

inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column

SSMS eventually sends back this error:

Msg 3948, Level 16, State 2, Line 14061 The transaction was terminated
because of the availability replica config/state change or because
ghost records are being deleted on the primary and the secondary
availability replica that might be needed by queries running under
snapshot isolation. Retry the transaction.

The same SELECT with (NOLOCK) works fine on the primary AG.

With or without the NOLOCK it fails most of the time 9/10 and sometimes executes correctly but that's rare. No changes were made to the AG at all.

There are. no changes in the AlwaysOn_Health extended event session.

The low_water_mark_for_ghosts in sys.dm_hadr_database_replica_states is NULL for some and for others there is a value. For those with a value, this is also the primary.

Using the script provided by Nic:

;

WITH    PrimaryStats

          AS ( SELECT   DB_NAME(database_id) AS DatabaseName ,

                        low_water_mark_for_ghosts ,

                        ar.replica_server_name ,

                        ar.availability_mode_desc

               FROM     sys.dm_hadr_database_replica_states hdrs

                        JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id

               WHERE    ar.replica_server_name = @@SERVERNAME

             )

    SELECT  DB_NAME(database_id) AS DatabaseName ,

            hdrs.low_water_mark_for_ghosts AS LowWaterMarkSecondaryReplica,

            ps.low_water_mark_for_ghosts AS LowWaterMarkLocalReplica,

            ps.low_water_mark_for_ghosts - hdrs.low_water_mark_for_ghosts AS GhostWatermarkDiff,

            ar.replica_server_name AS ReplicaNode,

            DATEDIFF(SECOND, last_redone_time, GETDATE()) AS RedoDiffSec,

            last_redone_time

    FROM    sys.dm_hadr_database_replica_states hdrs

            JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id

            JOIN PrimaryStats ps ON ps.DatabaseName = DB_NAME(database_id)

    ORDER BY 

            DatabaseName ASC,'NODE ' + right(ar.replica_server_name ,1);

The output is:

enter image description here

When the LowWaterMarkSecondaryReplica is NULL, its because that's the instance I'm currently executing this query from.

edited 2 mins ago

Paul White♦

50.9k14278448

asked Nov 16 '16 at 13:58

Craig Efrein

7,21284080

Using SQL Server 2016, Always-On

Running a simple SELECT on a table within a secondary Availability Group.

select distinct some_column 

from some_table oo (nolock) 

inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column

SSMS eventually sends back this error:

Msg 3948, Level 16, State 2, Line 14061 The transaction was terminated
because of the availability replica config/state change or because
ghost records are being deleted on the primary and the secondary
availability replica that might be needed by queries running under
snapshot isolation. Retry the transaction.

The same SELECT with (NOLOCK) works fine on the primary AG.

With or without the NOLOCK it fails most of the time 9/10 and sometimes executes correctly but that's rare. No changes were made to the AG at all.

There are. no changes in the AlwaysOn_Health extended event session.

The low_water_mark_for_ghosts in sys.dm_hadr_database_replica_states is NULL for some and for others there is a value. For those with a value, this is also the primary.

Using the script provided by Nic:

;

WITH    PrimaryStats

          AS ( SELECT   DB_NAME(database_id) AS DatabaseName ,

                        low_water_mark_for_ghosts ,

                        ar.replica_server_name ,

                        ar.availability_mode_desc

               FROM     sys.dm_hadr_database_replica_states hdrs

                        JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id

               WHERE    ar.replica_server_name = @@SERVERNAME

             )

    SELECT  DB_NAME(database_id) AS DatabaseName ,

            hdrs.low_water_mark_for_ghosts AS LowWaterMarkSecondaryReplica,

            ps.low_water_mark_for_ghosts AS LowWaterMarkLocalReplica,

            ps.low_water_mark_for_ghosts - hdrs.low_water_mark_for_ghosts AS GhostWatermarkDiff,

            ar.replica_server_name AS ReplicaNode,

            DATEDIFF(SECOND, last_redone_time, GETDATE()) AS RedoDiffSec,

            last_redone_time

    FROM    sys.dm_hadr_database_replica_states hdrs

            JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id

            JOIN PrimaryStats ps ON ps.DatabaseName = DB_NAME(database_id)

    ORDER BY 

            DatabaseName ASC,'NODE ' + right(ar.replica_server_name ,1);

The output is:

enter image description here

When the LowWaterMarkSecondaryReplica is NULL, its because that's the instance I'm currently executing this query from.

sql-server sql-server-2016 availability-groups ghost-cleanup

edited 2 mins ago

Paul White♦

50.9k14278448

asked Nov 16 '16 at 13:58

Craig Efrein

7,21284080

edited 2 mins ago

Paul White♦

50.9k14278448

asked Nov 16 '16 at 13:58

Craig Efrein

7,21284080

edited 2 mins ago

Paul White♦

50.9k14278448

edited 2 mins ago

Paul White♦

50.9k14278448

edited 2 mins ago

Paul White♦

50.9k14278448

asked Nov 16 '16 at 13:58

Craig Efrein

7,21284080

asked Nov 16 '16 at 13:58

Craig Efrein

7,21284080

asked Nov 16 '16 at 13:58

Craig Efrein

7,21284080

add a comment |

1 Answer
1

active

oldest

votes

This is caused by one or two things, much like the error says:

A config change was made to the AG

The primary restarted

The ghost cleanup low watermark was set

The first two are fairly obvious as to why this would be an issue. The 3rd is not.

Since secondary replicas automagically remap read-committed isolation level to the snapshot isolation level a few things need to kick in. One of those things is row versioning, since after all SI is an optimistic concurrency.

Row versioning depends on the version store rows to be available on the secondary. If the low water for ghost cleanup is changed, then the secondary may need to cleanup records that are currently used for queries being satisfied by the version store which have a timestamp lower than the new ghost cleanup low watermark. When this is the case, those versions will be cleaned up and any SI datasets that required them will no longer be valid. That's when this error will surface for the session for that dataset.

select distinct some_column 

from some_table oo (nolock) 

inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column

If this query runs quickly, I don't see why it should error out like that as I doubt your low watermark is changing that rapidly. Thus, I'm going to assume this takes some time to run. Having the appropriate indexes, not using a long transaction, and up-to-date statistics will help with this query. Additionally, the NOLOCK hint is completely ignored on the readable secondary, you may already know that and it may just be code re-use which is fine but I wanted to point it out anyway.

We are considering using read-intent to resolve the issue, do you think that will resolve the problem?

ApplicationIntent is only used for two things in general with AOAGs:

read only routing (ROR)

to access AG Replicas that are set as secondary mode read-intent only.

This will not solve the underlying problem. IMHO you have two options, make the query run faster (if it isn't already) or open up a support ticket with MS. The reason I say open a ticket is this query should not constantly fail 24/7 and the issue should be looked into.

edited 8 mins ago

Paul White♦

50.9k14278448

answered Nov 17 '16 at 19:52

Sean Gallardy

15.8k22548

Hi, you are definitely right about the SI kicking in, so yeah the NOLOCK and ROWLOCK are totally ignored. Since the queries failing with Always On worked perfectly in Replication, its considered a regression so we have to find the answer in Always On and not by changing the query. We are considering using read-intent to resolve the issue, do you think that will resolve the problem?

– Craig Efrein
Nov 18 '16 at 12:47

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f155462%2ftransaction-terminated-running-select-on-secondary-ag-group%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

This is caused by one or two things, much like the error says:

A config change was made to the AG

The primary restarted

The ghost cleanup low watermark was set

The first two are fairly obvious as to why this would be an issue. The 3rd is not.

select distinct some_column 

from some_table oo (nolock) 

inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column

We are considering using read-intent to resolve the issue, do you think that will resolve the problem?

ApplicationIntent is only used for two things in general with AOAGs:

read only routing (ROR)

to access AG Replicas that are set as secondary mode read-intent only.

edited 8 mins ago

Paul White♦

50.9k14278448

answered Nov 17 '16 at 19:52

Sean Gallardy

15.8k22548

Hi, you are definitely right about the SI kicking in, so yeah the NOLOCK and ROWLOCK are totally ignored. Since the queries failing with Always On worked perfectly in Replication, its considered a regression so we have to find the answer in Always On and not by changing the query. We are considering using read-intent to resolve the issue, do you think that will resolve the problem?

– Craig Efrein
Nov 18 '16 at 12:47

add a comment |

This is caused by one or two things, much like the error says:

A config change was made to the AG

The primary restarted

The ghost cleanup low watermark was set

The first two are fairly obvious as to why this would be an issue. The 3rd is not.

select distinct some_column 

from some_table oo (nolock) 

inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column

We are considering using read-intent to resolve the issue, do you think that will resolve the problem?

ApplicationIntent is only used for two things in general with AOAGs:

read only routing (ROR)

to access AG Replicas that are set as secondary mode read-intent only.

edited 8 mins ago

Paul White♦

50.9k14278448

answered Nov 17 '16 at 19:52

Sean Gallardy

15.8k22548

Hi, you are definitely right about the SI kicking in, so yeah the NOLOCK and ROWLOCK are totally ignored. Since the queries failing with Always On worked perfectly in Replication, its considered a regression so we have to find the answer in Always On and not by changing the query. We are considering using read-intent to resolve the issue, do you think that will resolve the problem?

– Craig Efrein
Nov 18 '16 at 12:47

add a comment |

This is caused by one or two things, much like the error says:

A config change was made to the AG

The primary restarted

The ghost cleanup low watermark was set

The first two are fairly obvious as to why this would be an issue. The 3rd is not.

select distinct some_column 

from some_table oo (nolock) 

inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column

We are considering using read-intent to resolve the issue, do you think that will resolve the problem?

ApplicationIntent is only used for two things in general with AOAGs:

read only routing (ROR)

to access AG Replicas that are set as secondary mode read-intent only.

edited 8 mins ago

Paul White♦

50.9k14278448

answered Nov 17 '16 at 19:52

Sean Gallardy

15.8k22548

This is caused by one or two things, much like the error says:

A config change was made to the AG

The primary restarted

The ghost cleanup low watermark was set

The first two are fairly obvious as to why this would be an issue. The 3rd is not.

select distinct some_column 

from some_table oo (nolock) 

inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column

We are considering using read-intent to resolve the issue, do you think that will resolve the problem?

ApplicationIntent is only used for two things in general with AOAGs:

read only routing (ROR)

to access AG Replicas that are set as secondary mode read-intent only.

edited 8 mins ago

Paul White♦

50.9k14278448

answered Nov 17 '16 at 19:52

Sean Gallardy

15.8k22548

edited 8 mins ago

Paul White♦

50.9k14278448

edited 8 mins ago

Paul White♦

50.9k14278448

edited 8 mins ago

Paul White♦

50.9k14278448

answered Nov 17 '16 at 19:52

Sean Gallardy

15.8k22548

answered Nov 17 '16 at 19:52

Sean Gallardy

15.8k22548

answered Nov 17 '16 at 19:52

Sean Gallardy

15.8k22548

Hi, you are definitely right about the SI kicking in, so yeah the NOLOCK and ROWLOCK are totally ignored. Since the queries failing with Always On worked perfectly in Replication, its considered a regression so we have to find the answer in Always On and not by changing the query. We are considering using read-intent to resolve the issue, do you think that will resolve the problem?

– Craig Efrein
Nov 18 '16 at 12:47

add a comment |

Hi, you are definitely right about the SI kicking in, so yeah the NOLOCK and ROWLOCK are totally ignored. Since the queries failing with Always On worked perfectly in Replication, its considered a regression so we have to find the answer in Always On and not by changing the query. We are considering using read-intent to resolve the issue, do you think that will resolve the problem?

– Craig Efrein
Nov 18 '16 at 12:47

Hi, you are definitely right about the SI kicking in, so yeah the NOLOCK and ROWLOCK are totally ignored. Since the queries failing with Always On worked perfectly in Replication, its considered a regression so we have to find the answer in Always On and not by changing the query. We are considering using read-intent to resolve the issue, do you think that will resolve the problem?

– Craig Efrein
Nov 18 '16 at 12:47

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Database Administrators Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Vryjdfkk