Galera Cluster Setup - Primary and Secondary Site Scenario
I'm very new to Galera Cluster and am exploring a potential setup with reasonable resiliency to node failures and network failures. Looking at the very bottom of this documentation, the Weighted Quorum for a Primary and Secondary Site Scenario looks quite promising. For ease of reading, I've extracted the setup from the document as follows:
When configuring quorum weights for primary and secondary sites, use
the following pattern:
Primary Site:
node1: pc.weight = 2
node2: pc.weight = 2
Secondary Site:
node3: pc.weight = 1
node4: pc.weight = 1
Under this pattern, some nodes are located at the primary site while
others are at the secondary site. In the event that the secondary site
goes down or if network connectivity is lost between the sites, the
nodes at the primary site remain the Primary Component. Additionally,
either node1 or node2 can crash without the rest of the nodes becoming
non-primary components.
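For context, pc.weight is set through the wsrep_provider_options variable. A minimal sketch of the relevant my.cnf fragment for node1 under this pattern (host names are placeholders, and the rest of the wsrep setup is omitted):

    # my.cnf on node1 (primary site) -- this node carries two votes
    [mysqld]
    wsrep_cluster_name     = my_cluster
    wsrep_cluster_address  = gcomm://node1,node2,node3,node4
    wsrep_provider_options = "pc.weight=2"

node2 would use the same fragment, while node3 and node4 would use pc.weight=1.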
But there seem to be two drawbacks:
- If two nodes fail and one of them happens to be on the primary site, the remaining weight will be <= 50% of the total and the remaining two nodes will become non-primary components.
- Although pc.weight is a dynamic option that can be changed while the server is running, flipping between the primary and secondary site requires modifying all nodes (as sketched below), which is a bit troublesome.
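To illustrate the second drawback: because pc.weight lives inside wsrep_provider_options, a site flip means running something like the following on every node (the statement is standard Galera usage; the values just swap the two sites' roles in this example):

    -- on node1 and node2: demote the old primary site
    SET GLOBAL wsrep_provider_options = 'pc.weight=1';
    -- on node3 and node4: promote the old secondary site
    SET GLOBAL wsrep_provider_options = 'pc.weight=2';

Each node's my.cnf would also need updating so the change survives a restart.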
So I've come up with another idea: leave the weight at 1 for all nodes, and add a Galera Arbitrator at the primary site. In this case:
- The primary site will remain the Primary Component on a network failure, just like in the original setup.
- The cluster still functions even if two nodes fail.
- Flipping between the primary and secondary site just requires moving the Galera Arbitrator (see the sketch after this list).
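As a sketch of that arrangement, the arbitrator would run on a separate host at the primary site (the cluster name and addresses are placeholders; --group, --address and --daemon are garbd's standard options):

    # on an extra host at the primary site; garbd joins as a vote-only
    # member with the default weight of 1 and stores no data
    garbd --group=my_cluster \
          --address="gcomm://node1:4567,node2:4567,node3:4567,node4:4567" \
          --daemon

The quorum arithmetic then works out as claimed: the total weight is 5 (four nodes plus the arbitrator), a site split leaves the primary site with 2 + 1 = 3 > 2.5 votes, and any two failures still leave 3 of 5 votes.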
May I know if there's anything wrong with my idea, or if there would be any practical difficulties? I'd appreciate it if you could share your thoughts.
mysql mariadb high-availability galera multi-master
asked Oct 6 '17 at 15:51 by CLDev
1 Answer
"Weighting" was added late in the game, when they realized that a 2-datacenter setup was too vulnerable. (3 datacenters is resilient, and can use garbd in one of them.) The example you quote is resilient to any single server, datacenter, or network outage.
As I read the last sentence of the quote, node1 or node2 died but the other three nodes are alive and talking to each other. That is, there is a Quorum, and the system is still reliable.
However, I agree that the sentence is ambiguous -- it can be read as: after the network died, node1 or node2 died. This leaves three clumps: (node1), (node2), (node3,node4), each with a weight of 2. None should be considered "primary" because none has Quorum.
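To make the arithmetic explicit, using the simplified rule that a component keeps quorum only when its weight is strictly greater than half of the total:

    total weight = 2 + 2 + 1 + 1 = 6   ->  quorum needs weight > 3
    (node1)          weight 2  ->  no quorum
    (node2)          weight 2  ->  no quorum
    (node3, node4)   weight 2  ->  no quorum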
You bring up garbd, yet it is not in the example. And where would you put it?
You should not be changing the configuration while the system is hobbled -- you should be fixing the broken components.
The main goal is to tolerate a single point of failure -- a single node, the network, or a data center. It would take a really large and complex system to survive two failures. For example, I think it would require 5 datacenters to survive 2 network failures.
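A quick sanity check of that count, assuming one equally weighted vote per datacenter: with 5 datacenters, any 3 survivors hold 3/5 of the votes, which is more than half, so the cluster tolerates 2 site or network failures; with 4 datacenters, 2 survivors hold exactly half, which is not quorum.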
So, focus on a single point of failure.
answered Oct 8 '17 at 16:25 by Rick James