Transaction terminated running SELECT on secondary AG group
Using SQL Server 2016, Always-On
Running a simple SELECT on a table within a secondary Availability Group.
select distinct some_column
from some_table oo (nolock)
inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column
SSMS eventually sends back this error:
Msg 3948, Level 16, State 2, Line 14061 The transaction was terminated
because of the availability replica config/state change or because
ghost records are being deleted on the primary and the secondary
availability replica that might be needed by queries running under
snapshot isolation. Retry the transaction.
The same SELECT
with (NOLOCK)
works fine on the primary AG.
With or without the NOLOCK
it fails most of the time 9/10 and sometimes executes correctly but that's rare. No changes were made to the AG at all.
There are. no changes in the AlwaysOn_Health extended event session.
The low_water_mark_for_ghosts
in sys.dm_hadr_database_replica_states
is NULL for some and for others there is a value. For those with a value, this is also the primary.
Using the script provided by Nic:
;
WITH PrimaryStats
AS ( SELECT DB_NAME(database_id) AS DatabaseName ,
low_water_mark_for_ghosts ,
ar.replica_server_name ,
ar.availability_mode_desc
FROM sys.dm_hadr_database_replica_states hdrs
JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id
WHERE ar.replica_server_name = @@SERVERNAME
)
SELECT DB_NAME(database_id) AS DatabaseName ,
hdrs.low_water_mark_for_ghosts AS LowWaterMarkSecondaryReplica,
ps.low_water_mark_for_ghosts AS LowWaterMarkLocalReplica,
ps.low_water_mark_for_ghosts - hdrs.low_water_mark_for_ghosts AS GhostWatermarkDiff,
ar.replica_server_name AS ReplicaNode,
DATEDIFF(SECOND, last_redone_time, GETDATE()) AS RedoDiffSec,
last_redone_time
FROM sys.dm_hadr_database_replica_states hdrs
JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id
JOIN PrimaryStats ps ON ps.DatabaseName = DB_NAME(database_id)
ORDER BY
DatabaseName ASC,'NODE ' + right(ar.replica_server_name ,1);
The output is:
When the LowWaterMarkSecondaryReplica
is NULL, its because that's the instance I'm currently executing this query from.
sql-server sql-server-2016 availability-groups ghost-cleanup
add a comment |
Using SQL Server 2016, Always-On
Running a simple SELECT on a table within a secondary Availability Group.
select distinct some_column
from some_table oo (nolock)
inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column
SSMS eventually sends back this error:
Msg 3948, Level 16, State 2, Line 14061 The transaction was terminated
because of the availability replica config/state change or because
ghost records are being deleted on the primary and the secondary
availability replica that might be needed by queries running under
snapshot isolation. Retry the transaction.
The same SELECT
with (NOLOCK)
works fine on the primary AG.
With or without the NOLOCK
it fails most of the time 9/10 and sometimes executes correctly but that's rare. No changes were made to the AG at all.
There are. no changes in the AlwaysOn_Health extended event session.
The low_water_mark_for_ghosts
in sys.dm_hadr_database_replica_states
is NULL for some and for others there is a value. For those with a value, this is also the primary.
Using the script provided by Nic:
;
WITH PrimaryStats
AS ( SELECT DB_NAME(database_id) AS DatabaseName ,
low_water_mark_for_ghosts ,
ar.replica_server_name ,
ar.availability_mode_desc
FROM sys.dm_hadr_database_replica_states hdrs
JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id
WHERE ar.replica_server_name = @@SERVERNAME
)
SELECT DB_NAME(database_id) AS DatabaseName ,
hdrs.low_water_mark_for_ghosts AS LowWaterMarkSecondaryReplica,
ps.low_water_mark_for_ghosts AS LowWaterMarkLocalReplica,
ps.low_water_mark_for_ghosts - hdrs.low_water_mark_for_ghosts AS GhostWatermarkDiff,
ar.replica_server_name AS ReplicaNode,
DATEDIFF(SECOND, last_redone_time, GETDATE()) AS RedoDiffSec,
last_redone_time
FROM sys.dm_hadr_database_replica_states hdrs
JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id
JOIN PrimaryStats ps ON ps.DatabaseName = DB_NAME(database_id)
ORDER BY
DatabaseName ASC,'NODE ' + right(ar.replica_server_name ,1);
The output is:
When the LowWaterMarkSecondaryReplica
is NULL, its because that's the instance I'm currently executing this query from.
sql-server sql-server-2016 availability-groups ghost-cleanup
add a comment |
Using SQL Server 2016, Always-On
Running a simple SELECT on a table within a secondary Availability Group.
select distinct some_column
from some_table oo (nolock)
inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column
SSMS eventually sends back this error:
Msg 3948, Level 16, State 2, Line 14061 The transaction was terminated
because of the availability replica config/state change or because
ghost records are being deleted on the primary and the secondary
availability replica that might be needed by queries running under
snapshot isolation. Retry the transaction.
The same SELECT
with (NOLOCK)
works fine on the primary AG.
With or without the NOLOCK
it fails most of the time 9/10 and sometimes executes correctly but that's rare. No changes were made to the AG at all.
There are. no changes in the AlwaysOn_Health extended event session.
The low_water_mark_for_ghosts
in sys.dm_hadr_database_replica_states
is NULL for some and for others there is a value. For those with a value, this is also the primary.
Using the script provided by Nic:
;
WITH PrimaryStats
AS ( SELECT DB_NAME(database_id) AS DatabaseName ,
low_water_mark_for_ghosts ,
ar.replica_server_name ,
ar.availability_mode_desc
FROM sys.dm_hadr_database_replica_states hdrs
JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id
WHERE ar.replica_server_name = @@SERVERNAME
)
SELECT DB_NAME(database_id) AS DatabaseName ,
hdrs.low_water_mark_for_ghosts AS LowWaterMarkSecondaryReplica,
ps.low_water_mark_for_ghosts AS LowWaterMarkLocalReplica,
ps.low_water_mark_for_ghosts - hdrs.low_water_mark_for_ghosts AS GhostWatermarkDiff,
ar.replica_server_name AS ReplicaNode,
DATEDIFF(SECOND, last_redone_time, GETDATE()) AS RedoDiffSec,
last_redone_time
FROM sys.dm_hadr_database_replica_states hdrs
JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id
JOIN PrimaryStats ps ON ps.DatabaseName = DB_NAME(database_id)
ORDER BY
DatabaseName ASC,'NODE ' + right(ar.replica_server_name ,1);
The output is:
When the LowWaterMarkSecondaryReplica
is NULL, its because that's the instance I'm currently executing this query from.
sql-server sql-server-2016 availability-groups ghost-cleanup
Using SQL Server 2016, Always-On
Running a simple SELECT on a table within a secondary Availability Group.
select distinct some_column
from some_table oo (nolock)
inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column
SSMS eventually sends back this error:
Msg 3948, Level 16, State 2, Line 14061 The transaction was terminated
because of the availability replica config/state change or because
ghost records are being deleted on the primary and the secondary
availability replica that might be needed by queries running under
snapshot isolation. Retry the transaction.
The same SELECT
with (NOLOCK)
works fine on the primary AG.
With or without the NOLOCK
it fails most of the time 9/10 and sometimes executes correctly but that's rare. No changes were made to the AG at all.
There are. no changes in the AlwaysOn_Health extended event session.
The low_water_mark_for_ghosts
in sys.dm_hadr_database_replica_states
is NULL for some and for others there is a value. For those with a value, this is also the primary.
Using the script provided by Nic:
;
WITH PrimaryStats
AS ( SELECT DB_NAME(database_id) AS DatabaseName ,
low_water_mark_for_ghosts ,
ar.replica_server_name ,
ar.availability_mode_desc
FROM sys.dm_hadr_database_replica_states hdrs
JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id
WHERE ar.replica_server_name = @@SERVERNAME
)
SELECT DB_NAME(database_id) AS DatabaseName ,
hdrs.low_water_mark_for_ghosts AS LowWaterMarkSecondaryReplica,
ps.low_water_mark_for_ghosts AS LowWaterMarkLocalReplica,
ps.low_water_mark_for_ghosts - hdrs.low_water_mark_for_ghosts AS GhostWatermarkDiff,
ar.replica_server_name AS ReplicaNode,
DATEDIFF(SECOND, last_redone_time, GETDATE()) AS RedoDiffSec,
last_redone_time
FROM sys.dm_hadr_database_replica_states hdrs
JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id
JOIN PrimaryStats ps ON ps.DatabaseName = DB_NAME(database_id)
ORDER BY
DatabaseName ASC,'NODE ' + right(ar.replica_server_name ,1);
The output is:
When the LowWaterMarkSecondaryReplica
is NULL, its because that's the instance I'm currently executing this query from.
sql-server sql-server-2016 availability-groups ghost-cleanup
sql-server sql-server-2016 availability-groups ghost-cleanup
edited 2 mins ago
Paul White♦
50.9k14278448
50.9k14278448
asked Nov 16 '16 at 13:58
Craig EfreinCraig Efrein
7,21284080
7,21284080
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
This is caused by one or two things, much like the error says:
- A config change was made to the AG
- The primary restarted
- The ghost cleanup low watermark was set
The first two are fairly obvious as to why this would be an issue. The 3rd is not.
Since secondary replicas automagically remap read-committed isolation level to the snapshot isolation level a few things need to kick in. One of those things is row versioning, since after all SI is an optimistic concurrency.
Row versioning depends on the version store rows to be available on the secondary. If the low water for ghost cleanup is changed, then the secondary may need to cleanup records that are currently used for queries being satisfied by the version store which have a timestamp lower than the new ghost cleanup low watermark. When this is the case, those versions will be cleaned up and any SI datasets that required them will no longer be valid. That's when this error will surface for the session for that dataset.
select distinct some_column
from some_table oo (nolock)
inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column
If this query runs quickly, I don't see why it should error out like that as I doubt your low watermark is changing that rapidly. Thus, I'm going to assume this takes some time to run. Having the appropriate indexes, not using a long transaction, and up-to-date statistics will help with this query. Additionally, the NOLOCK
hint is completely ignored on the readable secondary, you may already know that and it may just be code re-use which is fine but I wanted to point it out anyway.
We are considering using read-intent to resolve the issue, do you think that will resolve the problem?
ApplicationIntent
is only used for two things in general with AOAGs:
- read only routing (ROR)
- to access AG Replicas that are set as secondary mode read-intent only.
This will not solve the underlying problem. IMHO you have two options, make the query run faster (if it isn't already) or open up a support ticket with MS. The reason I say open a ticket is this query should not constantly fail 24/7 and the issue should be looked into.
Hi, you are definitely right about the SI kicking in, so yeah the NOLOCK and ROWLOCK are totally ignored. Since the queries failing with Always On worked perfectly in Replication, its considered a regression so we have to find the answer in Always On and not by changing the query. We are considering using read-intent to resolve the issue, do you think that will resolve the problem?
– Craig Efrein
Nov 18 '16 at 12:47
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f155462%2ftransaction-terminated-running-select-on-secondary-ag-group%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
This is caused by one or two things, much like the error says:
- A config change was made to the AG
- The primary restarted
- The ghost cleanup low watermark was set
The first two are fairly obvious as to why this would be an issue. The 3rd is not.
Since secondary replicas automagically remap read-committed isolation level to the snapshot isolation level a few things need to kick in. One of those things is row versioning, since after all SI is an optimistic concurrency.
Row versioning depends on the version store rows to be available on the secondary. If the low water for ghost cleanup is changed, then the secondary may need to cleanup records that are currently used for queries being satisfied by the version store which have a timestamp lower than the new ghost cleanup low watermark. When this is the case, those versions will be cleaned up and any SI datasets that required them will no longer be valid. That's when this error will surface for the session for that dataset.
select distinct some_column
from some_table oo (nolock)
inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column
If this query runs quickly, I don't see why it should error out like that as I doubt your low watermark is changing that rapidly. Thus, I'm going to assume this takes some time to run. Having the appropriate indexes, not using a long transaction, and up-to-date statistics will help with this query. Additionally, the NOLOCK
hint is completely ignored on the readable secondary, you may already know that and it may just be code re-use which is fine but I wanted to point it out anyway.
We are considering using read-intent to resolve the issue, do you think that will resolve the problem?
ApplicationIntent
is only used for two things in general with AOAGs:
- read only routing (ROR)
- to access AG Replicas that are set as secondary mode read-intent only.
This will not solve the underlying problem. IMHO you have two options, make the query run faster (if it isn't already) or open up a support ticket with MS. The reason I say open a ticket is this query should not constantly fail 24/7 and the issue should be looked into.
Hi, you are definitely right about the SI kicking in, so yeah the NOLOCK and ROWLOCK are totally ignored. Since the queries failing with Always On worked perfectly in Replication, its considered a regression so we have to find the answer in Always On and not by changing the query. We are considering using read-intent to resolve the issue, do you think that will resolve the problem?
– Craig Efrein
Nov 18 '16 at 12:47
add a comment |
This is caused by one or two things, much like the error says:
- A config change was made to the AG
- The primary restarted
- The ghost cleanup low watermark was set
The first two are fairly obvious as to why this would be an issue. The 3rd is not.
Since secondary replicas automagically remap read-committed isolation level to the snapshot isolation level a few things need to kick in. One of those things is row versioning, since after all SI is an optimistic concurrency.
Row versioning depends on the version store rows to be available on the secondary. If the low water for ghost cleanup is changed, then the secondary may need to cleanup records that are currently used for queries being satisfied by the version store which have a timestamp lower than the new ghost cleanup low watermark. When this is the case, those versions will be cleaned up and any SI datasets that required them will no longer be valid. That's when this error will surface for the session for that dataset.
select distinct some_column
from some_table oo (nolock)
inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column
If this query runs quickly, I don't see why it should error out like that as I doubt your low watermark is changing that rapidly. Thus, I'm going to assume this takes some time to run. Having the appropriate indexes, not using a long transaction, and up-to-date statistics will help with this query. Additionally, the NOLOCK
hint is completely ignored on the readable secondary, you may already know that and it may just be code re-use which is fine but I wanted to point it out anyway.
We are considering using read-intent to resolve the issue, do you think that will resolve the problem?
ApplicationIntent
is only used for two things in general with AOAGs:
- read only routing (ROR)
- to access AG Replicas that are set as secondary mode read-intent only.
This will not solve the underlying problem. IMHO you have two options, make the query run faster (if it isn't already) or open up a support ticket with MS. The reason I say open a ticket is this query should not constantly fail 24/7 and the issue should be looked into.
Hi, you are definitely right about the SI kicking in, so yeah the NOLOCK and ROWLOCK are totally ignored. Since the queries failing with Always On worked perfectly in Replication, its considered a regression so we have to find the answer in Always On and not by changing the query. We are considering using read-intent to resolve the issue, do you think that will resolve the problem?
– Craig Efrein
Nov 18 '16 at 12:47
add a comment |
This is caused by one or two things, much like the error says:
- A config change was made to the AG
- The primary restarted
- The ghost cleanup low watermark was set
The first two are fairly obvious as to why this would be an issue. The 3rd is not.
Since secondary replicas automagically remap read-committed isolation level to the snapshot isolation level a few things need to kick in. One of those things is row versioning, since after all SI is an optimistic concurrency.
Row versioning depends on the version store rows to be available on the secondary. If the low water for ghost cleanup is changed, then the secondary may need to cleanup records that are currently used for queries being satisfied by the version store which have a timestamp lower than the new ghost cleanup low watermark. When this is the case, those versions will be cleaned up and any SI datasets that required them will no longer be valid. That's when this error will surface for the session for that dataset.
select distinct some_column
from some_table oo (nolock)
inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column
If this query runs quickly, I don't see why it should error out like that as I doubt your low watermark is changing that rapidly. Thus, I'm going to assume this takes some time to run. Having the appropriate indexes, not using a long transaction, and up-to-date statistics will help with this query. Additionally, the NOLOCK
hint is completely ignored on the readable secondary, you may already know that and it may just be code re-use which is fine but I wanted to point it out anyway.
We are considering using read-intent to resolve the issue, do you think that will resolve the problem?
ApplicationIntent
is only used for two things in general with AOAGs:
- read only routing (ROR)
- to access AG Replicas that are set as secondary mode read-intent only.
This will not solve the underlying problem. IMHO you have two options, make the query run faster (if it isn't already) or open up a support ticket with MS. The reason I say open a ticket is this query should not constantly fail 24/7 and the issue should be looked into.
This is caused by one or two things, much like the error says:
- A config change was made to the AG
- The primary restarted
- The ghost cleanup low watermark was set
The first two are fairly obvious as to why this would be an issue. The 3rd is not.
Since secondary replicas automagically remap read-committed isolation level to the snapshot isolation level a few things need to kick in. One of those things is row versioning, since after all SI is an optimistic concurrency.
Row versioning depends on the version store rows to be available on the secondary. If the low water for ghost cleanup is changed, then the secondary may need to cleanup records that are currently used for queries being satisfied by the version store which have a timestamp lower than the new ghost cleanup low watermark. When this is the case, those versions will be cleaned up and any SI datasets that required them will no longer be valid. That's when this error will surface for the session for that dataset.
select distinct some_column
from some_table oo (nolock)
inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column
If this query runs quickly, I don't see why it should error out like that as I doubt your low watermark is changing that rapidly. Thus, I'm going to assume this takes some time to run. Having the appropriate indexes, not using a long transaction, and up-to-date statistics will help with this query. Additionally, the NOLOCK
hint is completely ignored on the readable secondary, you may already know that and it may just be code re-use which is fine but I wanted to point it out anyway.
We are considering using read-intent to resolve the issue, do you think that will resolve the problem?
ApplicationIntent
is only used for two things in general with AOAGs:
- read only routing (ROR)
- to access AG Replicas that are set as secondary mode read-intent only.
This will not solve the underlying problem. IMHO you have two options, make the query run faster (if it isn't already) or open up a support ticket with MS. The reason I say open a ticket is this query should not constantly fail 24/7 and the issue should be looked into.
edited 8 mins ago
Paul White♦
50.9k14278448
50.9k14278448
answered Nov 17 '16 at 19:52
Sean GallardySean Gallardy
15.8k22548
15.8k22548
Hi, you are definitely right about the SI kicking in, so yeah the NOLOCK and ROWLOCK are totally ignored. Since the queries failing with Always On worked perfectly in Replication, its considered a regression so we have to find the answer in Always On and not by changing the query. We are considering using read-intent to resolve the issue, do you think that will resolve the problem?
– Craig Efrein
Nov 18 '16 at 12:47
add a comment |
Hi, you are definitely right about the SI kicking in, so yeah the NOLOCK and ROWLOCK are totally ignored. Since the queries failing with Always On worked perfectly in Replication, its considered a regression so we have to find the answer in Always On and not by changing the query. We are considering using read-intent to resolve the issue, do you think that will resolve the problem?
– Craig Efrein
Nov 18 '16 at 12:47
Hi, you are definitely right about the SI kicking in, so yeah the NOLOCK and ROWLOCK are totally ignored. Since the queries failing with Always On worked perfectly in Replication, its considered a regression so we have to find the answer in Always On and not by changing the query. We are considering using read-intent to resolve the issue, do you think that will resolve the problem?
– Craig Efrein
Nov 18 '16 at 12:47
Hi, you are definitely right about the SI kicking in, so yeah the NOLOCK and ROWLOCK are totally ignored. Since the queries failing with Always On worked perfectly in Replication, its considered a regression so we have to find the answer in Always On and not by changing the query. We are considering using read-intent to resolve the issue, do you think that will resolve the problem?
– Craig Efrein
Nov 18 '16 at 12:47
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f155462%2ftransaction-terminated-running-select-on-secondary-ag-group%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown