Galera Cluster Setup - Primary and Secondary Site Scenario



























I'm very new to Galera Cluster and am exploring a potential setup with reasonable resiliency to node failure and network failure. Looking at the very bottom part of this documentation, the Weighted Quorum for a Primary and Secondary Site Scenario looks quite promising. For ease of reading, I've extracted the setup from the document as follows:




When configuring quorum weights for primary and secondary sites, use
the following pattern:



Primary Site:
node1: pc.weight = 2
node2: pc.weight = 2

Secondary Site:
node3: pc.weight = 1
node4: pc.weight = 1


Under this pattern, some nodes are located at the primary site while
others are at the secondary site. In the event that the secondary site
goes down or if network connectivity is lost between the sites, the
nodes at the primary site remain the Primary Component. Additionally,
either node1 or node2 can crash without the rest of the nodes becoming
non-primary components.
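For reference, pc.weight is set through the Galera provider options. A minimal sketch (host roles follow the pattern above; exact file locations vary by distribution):

```
# my.cnf fragment on a primary-site node (node1 or node2):
[mysqld]
wsrep_provider_options="pc.weight=2"

# pc.weight is dynamic, so it can also be changed at runtime:
#   SET GLOBAL wsrep_provider_options='pc.weight=2';
```

Note that wsrep_provider_options can hold several semicolon-separated options, so an existing value may need to be extended rather than replaced.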




But there seem to be two drawbacks:




  1. If two nodes fail and one of them happens to be on the primary site, the surviving weight is <= 50% of the total and the remaining two nodes become non-primary components.

  2. Although pc.weight is a dynamic option that can be changed while the server is running, failing over between the primary site and the secondary site requires modifying the configuration on all nodes, which is a bit troublesome.
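To make drawback 1 concrete, here is a small sketch of the (simplified) quorum rule: a surviving partition keeps Primary Component status only if its weight is strictly greater than half of the total weight of the previous membership. Node names and weights follow the documented pattern above.

```python
def has_quorum(surviving_weights, all_weights):
    """Simplified Galera quorum check: a partition stays primary only if
    its weight is STRICTLY more than half the previous total (ties lose)."""
    return sum(surviving_weights) > sum(all_weights) / 2

# Weights from the documented pattern: node1=2, node2=2, node3=1, node4=1.
all_nodes = {"node1": 2, "node2": 2, "node3": 1, "node4": 1}

# Drawback 1: node1 (primary site) and node3 (secondary site) both fail.
survivors = {"node2": 2, "node4": 1}
print(has_quorum(survivors.values(), all_nodes.values()))  # 3 of 6 -> False
```

With 3 of 6 weight units remaining, the survivors are not strictly above half, so they drop to a non-primary component.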


So I've come up with another idea: leave the weight at 1 for all nodes, and add a Galera Arbitrator (garbd) at the primary site. In this case:




  • The primary site remains the Primary Component on a network failure,
    just like the original setup.

  • The cluster still functions even if two nodes fail.

  • Failing over between the primary and secondary sites just requires moving the Galera Arbitrator.
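The arbitrator idea can be sketched as follows. garbd joins the cluster as a voting member that stores no data; a hypothetical configuration for a host at the primary site (the cluster name and node addresses are placeholders):

```
# Hypothetical garbd settings for a host at the primary site.
# group must match wsrep_cluster_name; address lists the cluster members.
group   = my_galera_cluster
address = gcomm://node1,node2,node3,node4

# Roughly equivalent command line:
#   garbd --group my_galera_cluster --address gcomm://node1,node2,node3,node4 --daemon
```

With the arbitrator, the total weight becomes 5 (four nodes plus garbd), so on a site split the primary site holds 3 of 5 votes, and any two node failures still leave 3 of 5 votes somewhere in the cluster.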


Is there anything wrong with my idea, or are there any practical difficulties? I'd appreciate it if you could share your thoughts.










      mysql mariadb high-availability galera multi-master






      asked Oct 6 '17 at 15:51









CLDev




          1 Answer
          "Weighting" was added late in the game, when they realized that a 2-datacenter setup was too vulnerable. (3 datacenters is resilient, and can use garbd in one of them.) The example you quote is resilient to any single server, datacenter, or network outage.



          As I read the last sentence of the quote, node1 or node2 died but the other three nodes are alive and talking to each other. That is, there is a Quorum, and the system is still reliable.



However, I agree that the sentence is ambiguous -- it can also be read as: after the network died, node1 or node2 died. This leaves three clumps: (node1), (node2), (node3,node4), each with a weight of 2. None should be considered "primary" because none has a quorum.
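That reading can be checked against the simplified quorum rule (a component needs strictly more than half of the previous total weight):

```python
# After the inter-site link fails AND node1 (or node2) also gets cut off,
# the last membership had total weight 6, and the surviving clumps are:
clumps = {"node1": 2, "node2": 2, "node3+node4": 1 + 1}
total = sum(clumps.values())  # 6

# A clump keeps Primary Component status only if its weight > total / 2.
for name, weight in clumps.items():
    print(name, "primary" if weight > total / 2 else "non-primary")
```

Every clump holds exactly 2 of 6 weight units, never strictly more than half, so all of them end up non-primary.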



You bring up garbd, yet it is not in the documented example? And where exactly would you put it?



          You should not be changing the configuration while the system is hobbled -- you should be fixing the broken components.



The main goal is to survive a single point of failure -- a single node, the network, or a data center. It would take a really large and complex system to survive two simultaneous failures. For example, I think it would require 5 datacenters to survive 2 network failures.



          So, focus on a single point of failure.











































                answered Oct 8 '17 at 16:25









Rick James



