High Availability Configuration with Pacemaker and Corosync
 
Work In Progress...
=Description=
<br><br>
=Requirements=
<br><br>
=Install=
All examples assume that there are two nodes with hostnames '''node01''' and '''node02''', and that they are reachable by their hostnames and IP addresses:
* node01 - 192.168.0.152
* node02 - 192.168.0.200
 
192.168.0.205 is the Virtual IP address.
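
The nodes must be able to resolve each other by hostname. A minimal sketch of the relevant /etc/hosts entries, matching the example addresses above (add these lines on both nodes):
192.168.0.152   node01
192.168.0.200   node02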
 
Also, in all command line examples that follow, the convention is this:
* '''[root@node01 ~]#'''  denotes a command which should be run on 'ONE' server in the cluster.
* '''[root@ALL ~]#''' denotes a command which should be run on 'ALL' servers (node01 and node02) in the cluster.
 
'''You should replace hostnames and IP addresses to match your setup.'''
<br><br>
=Configuration=
All configuration below assumes that Corosync/Pacemaker/pcsd have been installed, that hostnames and /etc/hosts are set up correctly, and that the required ports are open.
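If firewalld is in use, its predefined high-availability service opens the ports that pcsd and Corosync need (a sketch; adjust to your own firewall setup):
[root@ALL ~]# firewall-cmd --permanent --add-service=high-availability
[root@ALL ~]# firewall-cmd --reload
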
The installed packages will create a '''hacluster''' user with a disabled password. In order to configure the cluster, we need to set a password for this user on all nodes. The commands below will change the password for the hacluster user and save it in /root/hacluster_password in case you forget it; afterwards we will use it to authenticate the nodes. Change CHANGEME to a strong password and remember it, as you will need it later:
[root@ALL ~]# echo CHANGEME | passwd --stdin hacluster
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
[root@ALL ~]# echo CHANGEME > /root/hacluster_password; chmod 600 /root/hacluster_password
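
Node authentication in the next step is handled by the pcsd daemon, so it must be running on both nodes (it is assumed installed above); if it is not already running:
[root@ALL ~]# systemctl start pcsd
[root@ALL ~]# systemctl enable pcsd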
 
Now we can authenticate the cluster nodes:
[root@node01 ~]#  pcs cluster auth node01 node02 -u hacluster -p CHANGEME
node02: Authorized
node01: Authorized
 
If you get any other output, it means something went wrong and you should not proceed until it is fixed. If everything is OK, then we can set up the cluster:
[root@node01 ~]# pcs cluster setup --name cluster_asterisk node01 node02
Destroying cluster on nodes: node01, node02...
node01: Stopping Cluster (pacemaker)...
node02: Stopping Cluster (pacemaker)...
node02: Successfully destroyed cluster
node01: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'node01', 'node02'
node01: successful distribution of the file 'pacemaker_remote authkey'
node02: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node01: Succeeded
node02: Succeeded
Synchronizing pcsd certificates on nodes node01, node02...
node02: Success
node01: Success
Restarting pcsd on the nodes in order to reload the certificates...
node02: Success
node01: Success
[root@node01 ~]#
 
If everything went OK, there should be no errors in the output. If this is the case, let's start the cluster:
[root@node01 ~]# pcs cluster start --all
node02: Starting Cluster...
node01: Starting Cluster...
[root@node01 ~]#
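
The status output further below shows the daemons as enabled; if you also want the cluster to start automatically at boot, this can be done now (optional):
[root@node01 ~]# pcs cluster enable --all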
 
This will automatically start the Corosync and Pacemaker services on both nodes. Now let's check that Corosync is happy and there are no errors (issue this command on both nodes separately):
 
[root@node01 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id = 192.168.0.152
        status = ring 0 active with no faults
 
[root@node02 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
      id = 192.168.0.200
      status = ring 0 active with no faults
 
If you see different output, you should investigate before proceeding. Now let's check the membership and quorum APIs; you should see both nodes with status '''joined''':
 
[root@node01 ~]# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.0.152)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.0.200)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
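
Optionally, corosync-quorumtool gives a compact summary of the quorum state (output omitted here, as it varies by version):
[root@ALL ~]# corosync-quorumtool -s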
 
 
Now disable STONITH and set the no-quorum policy to ignore (in a two-node cluster, quorum is lost as soon as one node fails, so the default policy would stop all resources on the surviving node):
[root@node01 ~]# pcs property set stonith-enabled=false
[root@node01 ~]# pcs property set no-quorum-policy=ignore
[root@node01 ~]# pcs property list
Cluster Properties:
  cluster-infrastructure: corosync
  cluster-name: cluster_asterisk
  dc-version: 1.1.18-11.el7_5.3-2b07d5c5a9
  have-watchdog: false
  no-quorum-policy: ignore
  stonith-enabled: false
 
Finally, let's check the cluster status:
[root@node01 ~]# pcs status
Cluster name: cluster_asterisk
Stack: corosync
Current DC: node02 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Wed Oct 17 07:34:20 2018
Last change: Wed Oct 17 07:32:39 2018 by root via cibadmin on node01
2 nodes configured
0 resources configured
Online: [ node01 node02 ]
No resources
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
 
 
We can see that both nodes are online, all daemons (corosync, pacemaker, pcsd) are active (started) and enabled.
<br><br>
==Configuring Asterisk HA solution with Virtual IP==
Now that the cluster is ready, we can add resources (Virtual IP, Asterisk, httpd, OpenSIPS, etc.). In this section we will show how to add the Virtual IP and Asterisk resources.
 
First, let's add the Virtual IP resource. Do not forget to replace the IP values and NIC name with values from your setup.
 
[root@node01 ~]# pcs resource create VirtualIP  ocf:heartbeat:IPaddr2 ip=192.168.0.205 cidr_netmask=32 nic=enp0s3 op monitor interval=30s
[root@node01 ~]# pcs status
Cluster name: cluster_asterisk
Stack: corosync
Current DC: node02 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Wed Oct 17 07:49:56 2018
Last change: Wed Oct 17 07:49:51 2018 by root via cibadmin on node01
2 nodes configured
1 resource configured
Online: [ node01 node02 ]
Full list of resources:
  VirtualIP (ocf::heartbeat:IPaddr2): Started node01
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
 
 
The '''ip''' command should also confirm that the Virtual IP has been assigned to the interface:
[root@node01 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
        valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:90:37:9c brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.152/24 brd 192.168.0.255 scope global noprefixroute dynamic enp0s3
        valid_lft 564sec preferred_lft 564sec
    inet 192.168.0.205/32 brd 192.168.0.255 scope global enp0s3
        valid_lft forever preferred_lft forever
    inet6 fe80::eb74:dc5d:cdd:df23/64 scope link noprefixroute
        valid_lft forever preferred_lft forever
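
At this point you can test failover if you wish (an optional sketch, using the pcs 0.9 syntax shipped on CentOS 7): put node01 in standby, confirm with pcs status that VirtualIP has moved to node02, then bring node01 back:
[root@node01 ~]# pcs cluster standby node01
[root@node01 ~]# pcs status
[root@node01 ~]# pcs cluster unstandby node01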
 
Once the Virtual IP resource is set up correctly, it is time to add the Asterisk resource. The install script will add the Asterisk resource agent to the directory /usr/lib/ocf/resource.d/heartbeat:
 
[root@node01 ~]# pcs resource create asterisk ocf:heartbeat:asterisk op monitor timeout="30"
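
Typically you will also want Asterisk to run on the same node as the Virtual IP and to start only after the IP is up. This is not part of the steps above, but as a sketch, the corresponding colocation and ordering constraints would look like:
[root@node01 ~]# pcs constraint colocation add asterisk with VirtualIP INFINITY
[root@node01 ~]# pcs constraint order VirtualIP then asterisk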
