High Availability Configuration with Pacemaker and Corosync
Work In Progress...
=Description=
<br><br>
=Requirements=
<br><br>
=Install=
All examples assume that there are two nodes with hostnames '''node01''' and '''node02''', and that they are reachable by their hostnames and IP addresses (see the example /etc/hosts entries below):
* node01 - 192.168.0.152
* node02 - 192.168.0.200
192.168.0.205 is the Virtual IP address.
Also, in all of the following command line examples the convention is this:
* '''[root@node01 ~]#''' denotes a command which should be run on ONE server in the cluster.
* '''[root@ALL ~]#''' denotes a command which should be run on ALL servers (node01 and node02) in the cluster.
'''You should replace the hostnames and IP addresses to match your setup.'''
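If the hostnames are not already resolvable (for example via DNS), a minimal sketch of the /etc/hosts entries for the addresses used in these examples; adjust them to your own network before running this on both nodes:
 [root@ALL ~]# cat >> /etc/hosts << 'EOF'
 192.168.0.152   node01
 192.168.0.200   node02
 EOF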
<br><br>
=Configuration=
All configuration below assumes that Corosync, Pacemaker and pcsd have been installed, that hostnames and /etc/hosts are set up correctly, and that the required ports are open, as explained above.
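The exact commands for opening the ports depend on your firewall; as a sketch, on a stock CentOS/RHEL 7 system running firewalld, the predefined high-availability service opens the ports used by pcsd, Corosync and Pacemaker:
 [root@ALL ~]# firewall-cmd --permanent --add-service=high-availability
 [root@ALL ~]# firewall-cmd --reload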
The installed packages create a '''hacluster''' user with a disabled password. In order to configure the cluster, we need to set a password for this user on all nodes. The commands below change the password for the hacluster user and save it in /root/hacluster_password in case you forget it; afterwards we authenticate the nodes. Change CHANGEME to a strong password and remember it, as you will need it later:
 [root@ALL ~]# echo CHANGEME | passwd --stdin hacluster
 Changing password for user hacluster.
 passwd: all authentication tokens updated successfully.
 [root@ALL ~]# echo CHANGEME > /root/hacluster_password; chmod 600 /root/hacluster_password
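Authentication in the next step goes through the pcsd daemon on each node, so pcsd must be running everywhere; if the install steps above did not already take care of this, it can be started and enabled like so:
 [root@ALL ~]# systemctl start pcsd
 [root@ALL ~]# systemctl enable pcsd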
Now we can authenticate the cluster nodes:
 [root@node01 ~]# pcs cluster auth node01 node02 -u hacluster -p CHANGEME
 node02: Authorized
 node01: Authorized
If you get any other output, something went wrong and you should not proceed until it is fixed. If everything is OK, we can set up the cluster:
 [root@node01 ~]# pcs cluster setup --name cluster_asterisk node01 node02
 Destroying cluster on nodes: node01, node02...
 node01: Stopping Cluster (pacemaker)...
 node02: Stopping Cluster (pacemaker)...
 node02: Successfully destroyed cluster
 node01: Successfully destroyed cluster
 Sending 'pacemaker_remote authkey' to 'node01', 'node02'
 node01: successful distribution of the file 'pacemaker_remote authkey'
 node02: successful distribution of the file 'pacemaker_remote authkey'
 Sending cluster config files to the nodes...
 node01: Succeeded
 node02: Succeeded
 Synchronizing pcsd certificates on nodes node01, node02...
 node02: Success
 node01: Success
 Restarting pcsd on the nodes in order to reload the certificates...
 node02: Success
 node01: Success
 [root@node01 ~]#
If everything went OK, there should be no errors in the output. If that is the case, let's start the cluster:
 [root@node01 ~]# pcs cluster start --all
 node02: Starting Cluster...
 node01: Starting Cluster...
 [root@node01 ~]#
This will automatically start the Corosync and Pacemaker services on both nodes. Now let's check that Corosync is happy and there are no errors (run this command on both nodes separately):
 [root@node01 ~]# corosync-cfgtool -s
 Printing ring status.
 Local node ID 1
 RING ID 0
         id      = 192.168.0.152
         status  = ring 0 active with no faults
 [root@node02 ~]# corosync-cfgtool -s
 Printing ring status.
 Local node ID 2
 RING ID 0
         id      = 192.168.0.200
         status  = ring 0 active with no faults
If you see different output, you should investigate before proceeding. Now let's check the membership and quorum APIs; you should see both nodes with status joined:
 [root@node01 ~]# corosync-cmapctl | grep members
 runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
 runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.0.152)
 runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
 runtime.totem.pg.mrp.srp.members.1.status (str) = joined
 runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
 runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.0.200)
 runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
 runtime.totem.pg.mrp.srp.members.2.status (str) = joined
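Optionally, the quorum state itself can be inspected with corosync-quorumtool (the exact output depends on your versions and votequorum settings):
 [root@node01 ~]# corosync-quorumtool -s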
Now disable STONITH and set the no-quorum policy to ignore. With only two nodes, the surviving node can never have quorum after the other node fails, so quorum loss is ignored; STONITH is disabled here because no fencing device is configured (for production clusters, proper fencing is strongly recommended):
 [root@node01 ~]# pcs property set stonith-enabled=false
 [root@node01 ~]# pcs property set no-quorum-policy=ignore
 [root@node01 ~]# pcs property list
 Cluster Properties:
  cluster-infrastructure: corosync
  cluster-name: cluster_asterisk
  dc-version: 1.1.18-11.el7_5.3-2b07d5c5a9
  have-watchdog: false
  no-quorum-policy: ignore
  stonith-enabled: false
Finally, let's check the cluster status:
 [root@node01 ~]# pcs status
 Cluster name: cluster_asterisk
 Stack: corosync
 Current DC: node02 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
 Last updated: Wed Oct 17 07:34:20 2018
 Last change: Wed Oct 17 07:32:39 2018 by root via cibadmin on node01
 2 nodes configured
 0 resources configured
 Online: [ node01 node02 ]
 No resources
 Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled
We can see that both nodes are online and all daemons (corosync, pacemaker, pcsd) are active (started) and enabled.
<br><br>
==Configuring Asterisk HA solution with Virtual IP==
Now that the cluster is ready, we can add resources (Virtual IP, Asterisk, httpd, opensips, etc.). In this section we will show how to add the Virtual IP and Asterisk resources.
First, let's add the Virtual IP resource. Do not forget to replace the IP values and the NIC name with the values from your setup.
 [root@node01 ~]# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.205 cidr_netmask=32 nic=enp0s3 op monitor interval=30s
 [root@node01 ~]# pcs status
 Cluster name: cluster_asterisk
 Stack: corosync
 Current DC: node02 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
 Last updated: Wed Oct 17 07:49:56 2018
 Last change: Wed Oct 17 07:49:51 2018 by root via cibadmin on node01
 2 nodes configured
 1 resource configured
 Online: [ node01 node02 ]
 Full list of resources:
  VirtualIP (ocf::heartbeat:IPaddr2): Started node01
 Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled
The '''ip''' command should also confirm that the Virtual IP has been assigned to the interface:
 [root@node01 ~]# ip addr show
 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
     inet6 ::1/128 scope host
        valid_lft forever preferred_lft forever
 2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     link/ether 08:00:27:90:37:9c brd ff:ff:ff:ff:ff:ff
     inet 192.168.0.152/24 brd 192.168.0.255 scope global noprefixroute dynamic enp0s3
        valid_lft 564sec preferred_lft 564sec
     inet 192.168.0.205/32 brd 192.168.0.255 scope global enp0s3
        valid_lft forever preferred_lft forever
     inet6 fe80::eb74:dc5d:cdd:df23/64 scope link noprefixroute
        valid_lft forever preferred_lft forever
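Optionally, you can verify that the address really fails over by putting node01 into standby and checking that the Virtual IP shows up on node02; a sketch using the standby commands available in the pcs version shown above (newer pcs releases use pcs node standby/unstandby instead):
 [root@node01 ~]# pcs cluster standby node01
 [root@node02 ~]# ip addr show enp0s3
 [root@node01 ~]# pcs cluster unstandby node01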
Once the Virtual IP resource is set up correctly, it is time to add the Asterisk resource. The install script will place the Asterisk resource agent in the directory /usr/lib/ocf/resource.d/heartbeat.
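Before creating the resource, you can confirm that the agent is actually present at that path:
 [root@ALL ~]# ls -l /usr/lib/ocf/resource.d/heartbeat/asterisk
With the agent in place, create the resource: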
 [root@node01 ~]# pcs resource create asterisk ocf:heartbeat:asterisk op monitor timeout="30"
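As a sketch of a typical next step (assuming Asterisk should always run on the node that holds the Virtual IP, and should start only after the address is up), colocation and ordering constraints can be added:
 [root@node01 ~]# pcs constraint colocation add asterisk with VirtualIP INFINITY
 [root@node01 ~]# pcs constraint order VirtualIP then asterisk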