Heartbeat configuration

From Kolmisoft Wiki
Revision as of 07:20, 25 November 2014 by Nerijuss

What is High availability?

High availability is a system design approach, together with its implementation, that ensures a certain degree of operational continuity during a given measurement period.

Let's say you have two servers, 'A' and 'B', with MOR installed and MySQL replication running between them. By default all traffic goes to server 'A', and server 'B' monitors server 'A'. When server 'A' fails, server 'B' takes over its role after a configured time.

So almost no data is lost, and your users will be happy with your services.

Hearbeat example.jpg


What is Heartbeat?

Heartbeat is software that implements these monitoring and availability features for your servers. It must be carefully installed, configured and tested on both servers to ensure that the services are delivered correctly.

What you need to know before starting

Please review the scheme above once more. It represents a typical Linux Heartbeat configuration. Before starting, you have to be aware of 4 main points:

1. All public IP addresses have to be on the same subnet.
2. The virtual IP has to be free (not assigned to any device on the network), and its last octet (.4 in this example) has to be the highest in the configuration. Servers A and B have to use "lower" IPs, in this case .2 and .3 (the last octet).
3. Never use the servers while they are offline, as this will break the MySQL replication.
4. After the Heartbeat configuration has been made, do not change the addressing scheme or the hostnames of the servers.
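The first two rules can be checked mechanically before installing anything. The sketch below uses a hypothetical helper, check_addressing (not part of the MOR scripts), and assumes IPv4 addresses on a /24 subnet. It does not test whether the virtual IP is already in use on the network.

```bash
#!/usr/bin/env bash
# Sketch: check rules 1 and 2 above for server A, server B and the virtual IP.
# Assumption: IPv4 on a /24 subnet, so "same subnet" means the first three
# octets match. Whether the virtual IP is free must still be checked by hand.

# check_addressing SERVER_A_IP SERVER_B_IP VIRTUAL_IP   (hypothetical helper)
check_addressing() {
    local a=$1 b=$2 vip=$3

    # Rule 1: same subnet (compare everything before the last dot).
    if [ "${a%.*}" != "${b%.*}" ] || [ "${a%.*}" != "${vip%.*}" ]; then
        echo "FAIL: not on the same /24 subnet"
        return 1
    fi

    # Rule 2: the virtual IP's last octet must be the highest.
    local last_a=${a##*.} last_b=${b##*.} last_vip=${vip##*.}
    if [ "$last_vip" -le "$last_a" ] || [ "$last_vip" -le "$last_b" ]; then
        echo "FAIL: virtual IP last octet is not the highest"
        return 1
    fi

    echo "OK"
}
```

For the addresses in the scheme, check_addressing 203.0.113.2 203.0.113.3 203.0.113.4 prints OK.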


Installation

Heartbeat 2.99 has been fully tested on CentOS 5.2 only; there is no guarantee that it will work on older versions or other distros.

Download the MOR install scripts from SVN on both servers.

Run /usr/src/mor/sh_scripts/install_heartbeat.sh on both servers.

The script will:

* download a special file, then yum automatically installs the correct Heartbeat packages for your system.
* install Pacemaker (for future use only).
* configure /etc/ha.d/authkeys, so you don't need to change anything there.
* configure /etc/ha.d/ha.cf, but you still need to adjust it by hand (how is explained later on this page).
* configure /etc/ha.d/haresources, but you still need to change a few bits there (also explained later on this page).
* add 2 lines to /etc/hosts, for testing purposes only, so you will have to change the IPs there.
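After the script has run, a quick check of the files it produced can save debugging later. This is a sketch with a hypothetical helper name; the permission check reflects Heartbeat's requirement that authkeys be readable only by root (mode 600).

```bash
#!/usr/bin/env bash
# Sketch: verify that the Heartbeat config files exist and that authkeys
# has mode 600. The directory defaults to /etc/ha.d.

# check_ha_files [DIR]   (hypothetical helper)
check_ha_files() {
    local dir=${1:-/etc/ha.d} f missing=0
    for f in authkeys ha.cf haresources; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] || return 1

    # Heartbeat refuses keyfiles that group/others can read.
    local mode
    mode=$(stat -c '%a' "$dir/authkeys")   # GNU stat, as on CentOS
    if [ "$mode" != "600" ]; then
        echo "authkeys mode is $mode, expected 600"
        return 1
    fi
    echo "OK"
}
```

Run check_ha_files on both servers; if it reports a wrong mode, chmod 600 /etc/ha.d/authkeys fixes it.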

ATTENTION: if you installed Heartbeat manually, make sure that you have the same version of Heartbeat on both servers. You will probably need the same CentOS version on both servers to do that.



Configuration


Before going further, you need to set up the hostnames of both servers. The master must be named node01 and the slave node02; uname -n must return the correct name on each.

Make sure to remove all services that will be managed by Heartbeat from the runlevels. To list services and their runlevels:

chkconfig --list

To remove a service (for example, asterisk):

chkconfig --del asterisk
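With several services it is easy to miss one. The sketch below (hypothetical helper services_still_enabled) filters chkconfig --list output for the services you name and prints those still "on" in any runlevel; asterisk and httpd are only the examples used on this page.

```bash
#!/usr/bin/env bash
# Sketch: read `chkconfig --list` output on stdin and print which of the
# named services are still enabled in at least one runlevel.
#   chkconfig --list | services_still_enabled asterisk httpd

# services_still_enabled SERVICE [SERVICE ...]   (hypothetical helper)
services_still_enabled() {
    # Pad the wanted names with spaces so index() matches whole words only.
    awk -v wanted=" $* " '
        # Lines look like: asterisk  0:off 1:off 2:on 3:on 4:on 5:on 6:off
        index(wanted, " " $1 " ") && /[0-6]:on/ { print $1 }
    '
}
```

Each name it prints can then be removed with chkconfig --del as shown above.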

Edit /etc/sysconfig/network to change your hostname. Then reboot your machine.
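The hostname change can be scripted as follows. This sketch assumes the usual HOSTNAME= line of CentOS's /etc/sysconfig/network; both helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: set HOSTNAME= in /etc/sysconfig/network, then (after reboot)
# confirm that uname -n returns the expected name.

# set_sysconfig_hostname NAME [FILE]   (hypothetical helper)
set_sysconfig_hostname() {
    local name=$1 file=${2:-/etc/sysconfig/network}
    if grep -q '^HOSTNAME=' "$file"; then
        # Replace the existing entry in place.
        sed -i "s/^HOSTNAME=.*/HOSTNAME=$name/" "$file"
    else
        # No entry yet: append one.
        echo "HOSTNAME=$name" >> "$file"
    fi
}

# check_uname NAME   (hypothetical helper)
check_uname() {
    if [ "$(uname -n)" = "$1" ]; then
        echo "OK"
    else
        echo "FAIL: uname -n is $(uname -n)"
        return 1
    fi
}
```

On the master run set_sysconfig_hostname node01 (node02 on the slave), reboot, and check_uname node01 should then print OK.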


First, configure the master (node01):

Open /etc/hosts and you will see something like this:

203.0.113.2 node01 #change to correct IP
203.0.113.3 node02 #change to correct IP here as well

Set the node01 entry to the IP of the master machine, which accepts all incoming traffic by default.

Set the node02 entry to the IP of the slave, which accepts all traffic if the master dies.
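If you prefer to edit /etc/hosts non-interactively, a sketch like the following (hypothetical helper set_host_entry) replaces any existing mapping for a name with a single new line. Aliases on replaced lines are lost, so review the file afterwards.

```bash
#!/usr/bin/env bash
# Sketch: make FILE map NAME to IP exactly once. Non-comment lines that
# already mention NAME are dropped, then "IP NAME" is appended.

# set_host_entry FILE IP NAME   (hypothetical helper)
set_host_entry() {
    local file=$1 ip=$2 name=$3 tmp
    tmp=$(mktemp)
    awk -v n="$name" '
        $1 ~ /^#/ { print; next }            # keep comment lines untouched
        {
            keep = 1
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # stop at a trailing comment
                if ($i == n) keep = 0
            }
            if (keep) print
        }
    ' "$file" > "$tmp"
    printf '%s %s\n' "$ip" "$name" >> "$tmp"
    cat "$tmp" > "$file"                     # rewrite in place, keeping permissions
    rm -f "$tmp"
}
```

For example, set_host_entry /etc/hosts 203.0.113.2 node01 and set_host_entry /etc/hosts 203.0.113.3 node02, using your real addresses instead of these placeholders.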


Open /etc/ha.d/ha.cf and change:


deadtime: set it higher or lower as needed. deadtime is the number of seconds without heartbeats from the master that must pass before the slave takes over its job.

Remember, it should be lower on "lightly" loaded machines and higher on "highly" loaded machines.

deadtime 10 is more than enough (the default is 5).

Make sure you configure 2 network interfaces for Heartbeat communication.


The ucast directive configures Heartbeat to communicate over a UDP unicast communications link.

For example 'ucast eth0 10.10.10.133'.

If you want to use more than one interface for communication between nodes, add one ucast line per interface.

The udpport directive sets the port used for these unicast communications. It only takes effect if it appears before the ucast directive; otherwise the default port (694) is used.
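Putting these directives together, a minimal ha.cf for one node could be generated as shown here. It is a sketch: the values come from the examples on this page (deadtime 10, keepalive 2, initdead 20, default port 694), the helper name write_ha_cf is hypothetical, and the interface:peer pairs must be your own.

```bash
#!/usr/bin/env bash
# Sketch: write a minimal ha.cf. udpport is written before the ucast
# lines so that it takes effect.

# write_ha_cf FILE IFACE:PEER_IP [IFACE:PEER_IP ...]   (hypothetical helper)
write_ha_cf() {
    local file=$1; shift
    {
        echo "logfile /var/log/heartbeat.log"
        echo "logfacility local0"
        echo "keepalive 2"
        echo "deadtime 10"
        echo "initdead 20"
        echo "udpport 694"
        local pair
        for pair in "$@"; do
            # One ucast line per interface: "ucast <iface> <peer IP>"
            echo "ucast ${pair%%:*} ${pair#*:}"
        done
        echo "auto_failback on"
        echo "node node01"
        echo "node node02"
    } > "$file"
}
```

On node01 this might be called as write_ha_cf /etc/ha.d/ha.cf eth0:203.0.113.3 eth1:10.10.10.133, where 10.10.10.133 stands for node02's address on a second, dedicated link.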


Now open /etc/ha.d/haresources (make sure it contains only one line, in one of the forms given below):

node01 203.0.113.4 asterisk # just for testing, remember this ip can't be used in your network!!!

Or

node01 IPaddr::192.168.0.142/24/eth0:0 httpd # just for testing, remember this ip can't be used in your network!!! # if eth0:0 does not appear

Assuming node01 is the master and node02 is the slave, by default all traffic goes to node01.

You need to choose an IP from your network for the virtual IP and never use it for anything else; otherwise this will lead to unexpected results.

node01 203.0.113.4 asterisk means: "if the master is dead, the slave (node02) will restart Asterisk and start to accept traffic coming to 203.0.113.4".
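The haresources line can also be written by a script. This sketch (hypothetical helper write_haresources) uses the explicit IPaddr::IP/PREFIX/IFACE form shown above, which spells out the subnet mask and interface. All argument values in the usage example are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: write a one-line haresources file:
#   <preferred node> IPaddr::<vip>/<prefix>/<iface> <services...>

# write_haresources FILE NODE VIP PREFIX IFACE SERVICE [SERVICE ...]   (hypothetical helper)
write_haresources() {
    local file=$1 node=$2 vip=$3 prefix=$4 iface=$5
    shift 5
    # Overwrite the file so that exactly one resource line remains.
    echo "$node IPaddr::$vip/$prefix/$iface $*" > "$file"
}
```

For example, write_haresources /etc/ha.d/haresources node01 203.0.113.4 24 eth0 asterisk writes node01 IPaddr::203.0.113.4/24/eth0 asterisk.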


Now copy the whole configuration to the slave (node02):

shell@node01:/$ scp -r /etc/ha.d/ root@node02:/etc/

VERY IMPORTANT: after copying the configs to the other node, open /etc/ha.d/ha.cf on both nodes and change the IP in the ucast directive.

node01 should have node02's IP there; node02 should have node01's IP there.
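The IP swap can be done with sed. The sketch below (hypothetical helper set_ucast_peer, relying on GNU sed's -i and \+) rewrites the address on the ucast line for one interface and leaves the other lines untouched.

```bash
#!/usr/bin/env bash
# Sketch: point the ucast line for IFACE at PEER_IP (the *other* node).

# set_ucast_peer FILE IFACE PEER_IP   (hypothetical helper)
set_ucast_peer() {
    local file=$1 iface=$2 peer=$3
    # "ucast <iface> <old ip>" -> "ucast <iface> <peer>"
    sed -i "s/^\([[:space:]]*ucast[[:space:]]\+$iface[[:space:]]\+\)[^[:space:]]\+/\1$peer/" "$file"
}
```

With the example addresses, run set_ucast_peer /etc/ha.d/ha.cf eth0 203.0.113.3 on node01 and set_ucast_peer /etc/ha.d/ha.cf eth0 203.0.113.2 on node02.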


Start Heartbeat on both servers by running /etc/init.d/heartbeat start. If everything is OK, you will see something like this:

Starting High-Availability services:
2008/12/18_18:13:22 INFO:  Resource is stopped
[  OK  ]

Do not forget to remove Asterisk from system startup, because it needs to be started by Heartbeat:

chkconfig asterisk off

on both servers.

Testing

Now run iptraf on both machines and, from another machine, start pinging your bound IP address (in this example, 203.0.113.4).

Check the master's iptraf window: you will see incoming ICMP data, while the slave has to stay quiet.
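Besides watching iptraf, you can check directly which node currently holds the virtual IP. This is an extra check, not part of the original procedure: a sketch (hypothetical helper has_vip) that parses ip -o -4 addr show output. Exactly one node should report yes at any time.

```bash
#!/usr/bin/env bash
# Sketch: print "yes" if the given address is configured on this node.
# Reads `ip -o -4 addr show` output on stdin:
#   ip -o -4 addr show | has_vip 203.0.113.4

# has_vip VIP   (hypothetical helper)
has_vip() {
    awk -v vip="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "inet") {
                    split($(i + 1), addr, "/")   # "203.0.113.4/24" -> "203.0.113.4"
                    if (addr[1] == vip) found = 1
                }
        }
        END { print (found ? "yes" : "no"); exit !found }
    '
}
```

Running it on both nodes before and after taking the master down shows the virtual IP moving from node01 to node02.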

Heartbeat1.png


Now kill the master (for example, with ifconfig eth0 down or by issuing a reboot). After a short period of time (5 seconds in this example) you will see incoming ICMP packets on the slave.

To bring the master back into service, first configure the network on the master and start Heartbeat. After about 20 seconds the master should resume its job.


Now go to the GUI and change your default Asterisk server IP (Settings -> Servers) from 127.0.0.1 to the virtual IP address. If the GUI is unable to connect to it, make sure both Asterisk servers allow connections from your GUI server (file: /etc/asterisk/manager.conf).

sip.conf

In /etc/asterisk/sip.conf, set bindaddr to the correct IP, i.e. the IP to which your devices register (the virtual IP).

Make the same change to bindaddr in iax.conf, h323.conf and manager.conf.

Restart Asterisk.

Do this on both servers.
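Editing four files by hand on two servers invites typos. The sketch below (hypothetical helper set_bindaddr) rewrites every uncommented "bindaddr = ..." line in the files you pass. It assumes the usual key = value form; a trailing ;comment on those lines is dropped, and missing files are skipped.

```bash
#!/usr/bin/env bash
# Sketch: set bindaddr to IP in each given Asterisk config file.

# set_bindaddr IP FILE [FILE ...]   (hypothetical helper)
set_bindaddr() {
    local ip=$1; shift
    local f
    for f in "$@"; do
        [ -f "$f" ] || { echo "skipping missing $f"; continue; }
        # Keep "bindaddr =" (with its spacing) and replace the value.
        sed -i "s/^\([[:space:]]*bindaddr[[:space:]]*=[[:space:]]*\).*/\1$ip/" "$f"
    done
}
```

For example: cd /etc/asterisk && set_bindaddr 203.0.113.4 sip.conf iax.conf h323.conf manager.conf, then restart Asterisk.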


mor.conf

On the slave server, open /etc/asterisk/mor.conf and set server_id = 2 (the default is 1).
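The same change as a one-step command: a sketch (hypothetical helper set_server_id) that assumes mor.conf contains a line of the form server_id = N. If no such line exists it reports that instead of guessing where to add one.

```bash
#!/usr/bin/env bash
# Sketch: set server_id in mor.conf, keeping the existing spacing.

# set_server_id ID [FILE]   (hypothetical helper)
set_server_id() {
    local id=$1 file=${2:-/etc/asterisk/mor.conf}
    if grep -q '^[[:space:]]*server_id[[:space:]]*=' "$file"; then
        sed -i "s/^\([[:space:]]*server_id[[:space:]]*=[[:space:]]*\).*/\1$id/" "$file"
    else
        echo "server_id not found in $file"
        return 1
    fi
}
```

On the slave, set_server_id 2 updates /etc/asterisk/mor.conf.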

Possible problems

Look at the picture:

Heartbeat broadcast.jpg

In this example, the ucast directive on both servers has only one interface configured: eth1.

This can lead to problems if the eth1 <----> eth1 link between the servers fails (broken cable, switch, NIC, etc.).

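You must specify at least 2 interfaces, with one ucast line per interface.

Each ucast line takes an interface and the IP address of the other node on that link. An example of ucast directives:

ucast eth0 203.0.113.3
ucast eth1 10.10.10.133

(The bcast directive, by contrast, configures which interfaces Heartbeat sends UDP broadcast traffic on and accepts several interfaces on one line, e.g. bcast eth0 eth1.)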
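To catch the single-link setup shown in the picture, a sketch like this (hypothetical helper check_ucast_interfaces) counts the distinct interfaces used by ucast lines in ha.cf.

```bash
#!/usr/bin/env bash
# Sketch: warn if fewer than 2 distinct ucast interfaces are configured.

# check_ucast_interfaces [FILE]   (hypothetical helper)
check_ucast_interfaces() {
    local file=${1:-/etc/ha.d/ha.cf} count
    # Field 2 of each "ucast <iface> <ip>" line is the interface.
    count=$(awk '$1 == "ucast" && !seen[$2]++ { n++ } END { print n + 0 }' "$file")
    if [ "$count" -lt 2 ]; then
        echo "WARNING: only $count ucast interface(s) configured"
        return 1
    fi
    echo "OK: $count ucast interfaces"
}
```

Run it on both nodes after every change to ha.cf.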



Virtual interface does not appear

Check /var/log/heartbeat.log

If you can see something like:

...
ResourceManager(default)[6106]: 2012/12/07_11:33:51 info: Running /etc/ha.d/resource.d/IPaddr 123.123.123.123 start
IPaddr(IPaddr_123.123.123.123)[6197]: 2012/12/07_11:33:51 ERROR: /usr/lib64/heartbeat/findif failed [rc=1].
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_123.123.123.123)[6183]: 2012/12/07_11:33:51 ERROR:  Generic error
...

Please add the subnet mask and interface to /etc/ha.d/haresources, like this:

node01 123.123.123.123/27/em1 asterisk
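haresources needs the prefix length (27 in the example above), while ifconfig shows a dotted netmask. A small conversion sketch (hypothetical helper mask_to_prefix, assuming a valid contiguous IPv4 netmask) counts the set bits:

```bash
#!/usr/bin/env bash
# Sketch: convert a dotted IPv4 netmask to a prefix length,
# e.g. 255.255.255.224 -> 27.

# mask_to_prefix NETMASK   (hypothetical helper)
mask_to_prefix() {
    local IFS=. bits=0 octet
    for octet in $1; do
        # Count the set bits in each octet.
        while [ "$octet" -gt 0 ]; do
            bits=$(( bits + (octet & 1) ))
            octet=$(( octet >> 1 ))
        done
    done
    echo "$bits"
}
```

For example, mask_to_prefix 255.255.255.224 prints 27, giving node01 123.123.123.123/27/em1 asterisk.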



Virtual interface appears on both nodes

Check that the servers can send packets to each other on UDP port 694 (in both directions). Firewalls may block that port.

Check that /etc/ha.d/haresources has "node01" (or the relevant hostname) properly defined on both servers.
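If iptables is what blocks the port, a rule accepting Heartbeat traffic from the other node can be added as sketched here. The helper allow_heartbeat_peer and its DRY_RUN variable are hypothetical; inserting the rule at the top of INPUT is an assumption, so adapt it to your firewall policy.

```bash
#!/usr/bin/env bash
# Sketch: accept Heartbeat UDP traffic from PEER_IP (default port 694).
# Set DRY_RUN=1 to print the iptables command instead of running it.

# allow_heartbeat_peer PEER_IP [PORT]   (hypothetical helper)
allow_heartbeat_peer() {
    local peer=$1 port=${2:-694}
    set -- iptables -I INPUT -p udp -s "$peer" --dport "$port" -j ACCEPT
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Run it on each node with the other node's IP (and your custom udpport, if you changed it); on CentOS, service iptables save keeps the rule across reboots.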



There is more than one HA cluster in the same subnet

You can suspect that another HA cluster is running if you see output like this in /var/log/heartbeat.log:

Nov 06 13:25:28 node01 heartbeat: [9200]: WARN: Invalid authentication type [3] in message!
Nov 06 13:25:28 node01 heartbeat: [9200]: WARN: string2msg_ll: node [cpc2] failed authentication

In that case, change the UDP port in /etc/ha.d/ha.cf, like this:

logfile /var/log/heartbeat.log
logfacility local0
keepalive 2
deadtime 5 # depends on load, so be careful!
initdead 20 # start time
udpport 496 # UDPPORT SHOULD COME IN FRONT OF UCAST LINE OTHERWISE PORT WON'T BE CHANGED
ucast  eth0 1.1.1.1
auto_failback on
node node01
node node02
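Because a misplaced udpport line is silently ignored, a check like the following sketch (hypothetical helper check_udpport_order) confirms that udpport comes before the first ucast line.

```bash
#!/usr/bin/env bash
# Sketch: verify that udpport precedes the first ucast line in ha.cf.

# check_udpport_order [FILE]   (hypothetical helper)
check_udpport_order() {
    local file=${1:-/etc/ha.d/ha.cf}
    awk '
        $1 == "udpport" && !port_line  { port_line = NR }
        $1 == "ucast"   && !ucast_line { ucast_line = NR }
        END {
            if (!port_line) { print "no udpport line (default port 694 is used)"; exit 0 }
            if (ucast_line && ucast_line < port_line) { print "FAIL: udpport comes after ucast"; exit 1 }
            print "OK"
        }
    ' "$file"
}
```

Run it on both nodes after changing the port; both must use the same udpport.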

Heartbeat service does not start

There was a bug in the resource-agents-3.9.2-21.el6_4.8.x86_64 package.

Make sure that the heartbeat and resource-agents packages are up to date.

Command to update:

yum install heartbeat resource-agents
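To see whether a server still runs the affected build, compare the installed package name with the one mentioned above. This is a sketch; check_resource_agents is a hypothetical helper that only matches that exact build string.

```bash
#!/usr/bin/env bash
# Sketch: flag the resource-agents build with the known bug.
# Pass it the output of rpm -q, e.g.:
#   check_resource_agents "$(rpm -q resource-agents)"

# check_resource_agents PACKAGE_NAME   (hypothetical helper)
check_resource_agents() {
    case "$1" in
        resource-agents-3.9.2-21.el6_4.8.x86_64)
            echo "affected build installed: run yum install heartbeat resource-agents"
            return 1 ;;
        *)
            echo "OK: $1" ;;
    esac
}
```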

More details: http://bugs.centos.org/view.php?id=6727

See also

High availability (Heartbeat clustering)

http://www.linux-ha.org/wiki/Haresources