Now you have to make some implementation decisions. Your "cluster" is established via a "heartbeat" between the two computers (nodes), generated by the software package of the same name. However, this heartbeat needs one or more media paths (serial via a null modem cable, Ethernet via a crossover cable, etc.) between the nodes.
At this point, you're actually ready to begin hardware-wise. Of course, since you're looking into HA, you'll most likely want to avoid having only one point of failure. In this case, that would be your null modem cable/serial port or network interface card (NIC)/crossover cable. So, you need to decide whether you wish to add a second serial/null modem connection or a second NIC/crossover connection to each node. See Appendix A for instructions on how to build a Cat-5 crossover cable. My heartbeat path setup uses one serial port and one extra NIC because I only had one null modem cable, had an extra NIC on hand, and thought it was good to have two medium types for the heartbeats.
Once your hardware is in order, you must install your OS and configure your networking (I used Red Hat). Assuming you have 2 NICs, one should be configured for your "normal" network and the other as a private network between your clustered nodes (via the crossover cable). For an example, we will assume that our cluster will have the following addresses:
Node 1 (linuxha1): 192.168.85.1 (normal 192x net)
                   10.0.0.1     (private 10x net for heartbeat)
Node 2 (linuxha2): 192.168.85.2 (192x)
                   10.0.0.2     (10x)
Note: None of these addresses should be your "cluster address", the address handled by heartbeat and failed over between nodes!
Most *nix distributions make this easy during installation. However, if you are having any problems, refer to either the Ethernet HOWTO or the documentation for your distribution. To check your configuration, type:
ifconfig
This will show your network interfaces and their configuration. You can obtain your network routing information from "netstat -nr".
If it looks good, make sure you can ping between both nodes on all interfaces.
Next, if you're using one, you'll need to test your serial connection. On one node, which will be the receiver, type:
cat < /dev/ttyS0
On the other node, type:
echo hello > /dev/ttyS0
You should see the text on the receiver node. If it works, change their roles and try again. If it doesn't, it may be as simple as having the wrong device file. Volker's HA Hardware Guide and the Serial HOWTO are two good resources for troubleshooting your serial connection.
There are binary RPMs at the website, or you can build heartbeat from source. Grab the tarball (or install the source RPM) and untar it into your favorite source directory. From the top of the source tree, type "./ConfigureMe configure", followed by "make" and "make install". If you have problems installing the RPMs found at the website and want a way to make your own, there may be help in the FAQ.
For our example, we'll assume the high availability services are Apache and Samba. The IP address for the cluster is mandatory, and you must not configure the cluster IP outside of the haresources file! The haresources file will need one line:
linuxha1.linux-ha.org 192.168.85.3 httpd smb
This line dictates that, on startup, linuxha1 serves the IP 192.168.85.3 and starts Apache and Samba as well.
Note: httpd and smb are the names of the startup scripts for Apache and Samba, respectively. Heartbeat will look for startup scripts of the same name in the following paths:
/etc/ha.d/resource.d
/etc/rc.d/init.d
These scripts must start services via "scriptname start" and stop them via "scriptname stop".
So you can use any service, as long as its script conforms to the above standard.
Should you need to pass arguments to a custom script, the format would be:
scriptname::argument
So, if we added a service "maid" which needed the argument "vacuum", our haresources line would change to the following:
linuxha1 192.168.85.3 httpd smb maid::vacuum
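To make the "scriptname::argument" notation a little more concrete, here is a small sketch of how such an entry maps onto a "scriptname argument start" style call. The function name resource_command is made up for illustration and is not part of heartbeat; treating any further "::" as extra argument separators is also an assumption on my part.

```shell
#!/bin/sh
# Hypothetical helper (not part of heartbeat): print the command line
# implied by one haresources resource entry, e.g. "maid::vacuum".
resource_command() {
    entry=$1      # e.g. "httpd" or "maid::vacuum"
    action=$2     # "start" or "stop"
    case $entry in
        *::*)
            script=${entry%%::*}    # text before the first "::"
            # Assumption: any further "::" separate additional arguments.
            args=$(printf '%s' "${entry#*::}" | sed 's/::/ /g')
            printf '%s %s %s\n' "$script" "$args" "$action"
            ;;
        *)
            # No arguments: just "scriptname start" or "scriptname stop".
            printf '%s %s\n' "$entry" "$action"
            ;;
    esac
}
```

For example, "resource_command maid::vacuum start" prints "maid vacuum start", and "resource_command httpd stop" prints "httpd stop".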
This brings us to some added flexibility with the service IP address. We are actually using a shorthand notation above. The actual line could have read (we've canned the maid):
linuxha1 IPaddr::192.168.85.3 httpd smb
Here IPaddr is the name of our service script, taking the argument 192.168.85.3. Sure enough, if you look in the directory /etc/ha.d/resource.d, you will find a script called IPaddr. This script will also allow you to manipulate the netmask, broadcast address and base interface of this IP service. To specify a subnet with 32 addresses, you could define the service as (leaving off the IPaddr because we can!):
linuxha1 192.168.85.3/27 httpd smb
This sets the IP service address to 192.168.85.3 and the netmask to 255.255.255.224; the broadcast address defaults to 192.168.85.31 (which is the highest address on the subnet). The last parameter you can set is the broadcast address. To override the default and set it to 192.168.85.16, your entry would read:
linuxha1 192.168.85.3/27/192.168.85.16 httpd smb
You may be wondering whether any of the above is necessary for you. It depends. If you've properly established a net route (independent of heartbeat) for the service's IP address, with the correct netmask and broadcast address, then no, it's not necessary for you. However, this case won't fit everybody, and that's why the option's there! In addition, you may have more than one possible interface that could be used for the service IP. Read on to see how heartbeat treats this...
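If you want to double-check the netmask and default broadcast address that go with a given prefix length, the arithmetic can be sketched in plain sh. The function names below are made up for this example; this only reproduces the numbers quoted above and is not how the IPaddr script itself is written.

```shell
#!/bin/sh
# Illustrative helpers; the names are invented for this sketch.

# Print the dotted-quad netmask for a prefix length (0-32).
prefix_to_netmask() {
    bits=$1
    mask=""
    for i in 1 2 3 4; do
        if [ "$bits" -ge 8 ]; then
            octet=255
            bits=$((bits - 8))
        else
            # Set the top $bits bits of this octet, clear the rest.
            octet=$((256 - (1 << (8 - bits))))
            bits=0
        fi
        mask="${mask}${mask:+.}${octet}"
    done
    echo "$mask"
}

# Print the highest address in the subnet for an "address/prefix" pair.
default_broadcast() {
    ip=${1%/*}
    prefix=${1#*/}
    mask=$(prefix_to_netmask "$prefix")
    old_ifs=$IFS
    IFS=.
    set -- $ip $mask      # $1-$4: address octets, $5-$8: netmask octets
    IFS=$old_ifs
    # Turn on every host bit (the bits that are zero in the netmask).
    echo "$(($1 | (255 - $5))).$(($2 | (255 - $6))).$(($3 | (255 - $7))).$(($4 | (255 - $8)))"
}
```

With the example entry, "prefix_to_netmask 27" gives 255.255.255.224 and "default_broadcast 192.168.85.3/27" gives 192.168.85.31.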
Once you straighten out your haresources file, copy ha.cf and haresources to /etc/ha.d and you're ready to start!
auto_failback on
or
auto_failback off
respawn hacluster /usr/lib/heartbeat/ipfail
ping pnode1 pnode2 pnodeN
Simply replace pnode1, pnode2, ... pnodeN with the IP addresses of your ping nodes.
NOTE: You will want to check on the availability of the ping nodes prior to using them. If you cannot ping them from both of the HA nodes, they are useless.
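One way to do that check is a small loop around ping, run once on each HA node. This is only a sketch: check_ping_nodes is a made-up name, and it assumes your ping accepts "-c 1" to send a single request (as the usual Linux ping does).

```shell
#!/bin/sh
# Hypothetical helper: report which ping nodes answer from this machine.
# Returns non-zero if any of them is unreachable.
check_ping_nodes() {
    failed=0
    for node in "$@"; do
        # -c 1: send one echo request; we only care about the exit status.
        if ping -c 1 "$node" >/dev/null 2>&1; then
            echo "ok:   $node"
        else
            echo "FAIL: $node"
            failed=1
        fi
    done
    return "$failed"
}
```

Call it with the same addresses you put on the ping line, on linuxha1 and then on linuxha2. Any address reported as FAIL on either node is not a useful ping node.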
Heartbeat decides which interface will be used by looking at the routing table. It tries to select the lowest cost route to the IP address to be taken over. In the case of a tie, it chooses the first route found. For most configurations this means the default route will be least preferred.
If you don't specify a netmask for the IP address in the haresources file, the netmask associated with the selected route will be used. Similarly, if an interface is not specified, then the virtual IP address will be added to the interface associated with the selected route. If the broadcast address is omitted, then the highest address in the subnet is used.
Configuring Authkeys
The third file to configure determines your authentication keys. There are three types of authentication methods available: crc, md5, and sha1. "Well, which should I use?", you ask. Since this document is called "Getting Started", we'll keep it simple...
If your heartbeat runs over a secure network, such as the crossover cable in our example, you'll want to use crc. This is the cheapest method from a resources perspective. If the network is insecure, but you're either not very paranoid or concerned about minimizing CPU resources, use md5. Finally, if you want the best authentication without regard for CPU resources, use sha1. It's the hardest to crack.
The format of the file is as follows:
auth <number>
<number> <authmethod> [<authkey>]
So, for sha1, a sample /etc/ha.d/authkeys could be:
auth 1
1 sha1 key-for-sha1-any-text-you-want
For md5, you could use the same as the above, but replace "sha1" with "md5".
Finally, for crc, a sample might be:
auth 2
2 crc
Whatever index you put after the keyword auth must be found below in the keys listed in the file. If you put "auth 4", then there must be a "4 signaturetype" line in the list below.
Make sure the file's permissions are safe, like 600. And "any text you want" is not quite right: there's a limit to the number of characters you can use.
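Putting the format, the index rule and the permissions together, a sketch of a helper that writes such a file might look like this. Both function names are made up, the key is simply 16 random bytes printed as 32 hex characters (which I assume is comfortably within the length limit), and the file path is whatever you pass in.

```shell
#!/bin/sh
# Hypothetical helper: write an authkeys file with owner-only permissions.
write_authkeys() {
    file=$1       # e.g. /etc/ha.d/authkeys
    method=$2     # crc, md5 or sha1
    (
        umask 077   # a newly created file is readable by its owner only
        if [ "$method" = crc ]; then
            # crc takes no key
            printf 'auth 1\n1 crc\n' > "$file"
        else
            # 16 random bytes as 32 hex characters
            key=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
            printf 'auth 1\n1 %s %s\n' "$method" "$key" > "$file"
        fi
    )
    chmod 600 "$file"   # also tighten a file that already existed
}

# Succeed only if the index after "auth" has a matching key line.
authkeys_index_ok() {
    awk '$1 == "auth" { want = $2 }
         $1 != "auth" && NF >= 2 { seen[$1] = 1 }
         END { exit !(want != "" && (want in seen)) }' "$1"
}
```

For instance, "write_authkeys /etc/ha.d/authkeys sha1" followed by "authkeys_index_ok /etc/ha.d/authkeys" should succeed. Remember that both nodes need the same file, so generate it once and copy it over rather than running the helper on each node.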
That's it!
If you want heartbeat to run on startup, what to do will differ depending on your distribution. You may need to place links to the startup script in the appropriate init level directories, but the RPM versions will do this for you. I have heartbeat start at its default sequential priority (75, which means it starts after services 74 and lower and before services with priority 76-99), end at its default sequential priority (05), and only care about the 0 (halt), 6 (reboot), 3 (text-only) and 5 (X) run levels.
So, if I had to do it by hand, I'd need to type in the following (as root, of course):
cd /etc/rc.d/rc0.d ; ln -s ../init.d/heartbeat K05heartbeat
cd /etc/rc.d/rc3.d ; ln -s ../init.d/heartbeat S75heartbeat
cd /etc/rc.d/rc5.d ; ln -s ../init.d/heartbeat S75heartbeat
cd /etc/rc.d/rc6.d ; ln -s ../init.d/heartbeat K05heartbeat
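The same four links can be created in a loop, which is handy if you want to try it against a scratch directory first. This is only a sketch; link_heartbeat is a made-up name and the rc directory is passed in as an argument (on the system described here it would be /etc/rc.d).

```shell
#!/bin/sh
# Sketch: create the K05/S75 heartbeat links under a given rc directory.
link_heartbeat() {
    rcdir=$1
    # runlevel:link-prefix pairs, matching the four commands above
    for entry in 0:K05 3:S75 5:S75 6:K05; do
        level=${entry%%:*}     # run level number
        prefix=${entry#*:}     # K (kill) or S (start) plus the priority
        ln -sf ../init.d/heartbeat "$rcdir/rc$level.d/${prefix}heartbeat"
    done
}
```

Running "link_heartbeat /etc/rc.d" as root should leave you with the same links as typing the commands by hand.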
The last time I ran Slackware, there was no /etc/rc.d/init.d directory (this may have changed by now), and to do the same thing, I would have placed the following in /etc/rc.d/rc.local:
/etc/ha.d/heartbeat start
This assumes you copy the file ha.rc to /etc/ha.d/heartbeat. If you can't find /etc/rc.d/init.d with your distribution and you're unsure of how processes start, you can use the rc.local method. But you're on your own for shutdown; I just don't remember...
Note: If you use the watchdog function, you'll need to load its module at bootup as well. You can put the following command at the bottom of the /etc/rc.d/rc.sysinit file:
/sbin/insmod softdog
For the rc.local method, just put the same line right above where you start heartbeat.
Once you've started heartbeat, take a peek at your log file (default is /var/log/ha-log) before testing it. If all is peachy, the service owner's log (linuxha1 in our example) should look something like this:
heartbeat: 2003/02/10_13:52:22 info: Neither logfile nor logfacility found.
heartbeat: 2003/02/10_13:52:22 info: Logging defaulting to /var/log/ha-log
heartbeat: 2003/02/10_13:52:22 info: **************************
heartbeat: 2003/02/10_13:52:22 info: Configuration validated. Starting heartbeat 0.4.9f
heartbeat: 2003/02/10_13:52:22 info: nice_failback is in effect.
heartbeat: 2003/02/10_13:52:22 info: heartbeat: version 0.4.9f
heartbeat: 2003/02/10_13:52:22 info: Heartbeat generation: 17
heartbeat: 2003/02/10_13:52:22 info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
heartbeat: 2003/02/10_13:52:22 info: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat: 2003/02/10_13:52:23 info: pid 28140 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: pid 28137 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: pid 28139 locked in memory.
heartbeat: 2003/02/10_13:52:23 notice: Using watchdog device: /dev/watchdog
heartbeat: 2003/02/10_13:52:23 info: pid 28141 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: Local status now set to: 'up'
heartbeat: 2003/02/10_13:52:23 info: pid 28138 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: pid 28134 locked in memory.
heartbeat: 2003/02/10_13:52:25 info: Link linuxha1.linux-ha.org:eth1 up.
heartbeat: 2003/02/10_13:53:23 WARN: node linuxha2.linux-ha.org: is dead
heartbeat: 2003/02/10_13:53:23 info: Dead node linuxha2.linux-ha.org held no resources.
heartbeat: 2003/02/10_13:53:23 info: Resources being acquired from linuxha2.linux-ha.org.
heartbeat: 2003/02/10_13:53:23 info: Local status now set to: 'active'
heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2003/02/10_13:53:23 info: /usr/lib/heartbeat/mach_down: nice_failback: acquiring foreign resources
heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete.
heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete for node linuxha2.linux-ha.org.
heartbeat: 2003/02/10_13:53:23 info: Acquiring resource group: linuxha1.linux-ha.org 192.168.85.3 datadisk::drbd0 datadisk::drbd1 mirror
heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/resource.d/IPaddr 192.168.85.3 start
heartbeat: 2003/02/10_13:53:23 info: /sbin/ifconfig eth0:0 192.168.85.3 netmask 255.255.255.0 broadcast 192.168.85.255
heartbeat: 2003/02/10_13:53:23 info: Sending Gratuitous Arp for 192.168.85.3 on eth0:0 [eth0]
heartbeat: 2003/02/10_13:53:23 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:24 info: Running /etc/ha.d/resource.d/datadisk drbd0 start
heartbeat: 2003/02/10_13:53:24 info: Running /etc/ha.d/resource.d/datadisk drbd1 start
heartbeat: 2003/02/10_13:53:25 info: Running /etc/ha.d/resource.d/mirror start
heartbeat: 2003/02/10_13:53:25 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:26 info: Resource acquisition completed.
heartbeat: 2003/02/10_13:53:28 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:30 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:32 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:33 info: Local Resource acquisition completed. (none)
heartbeat: 2003/02/10_13:53:33 info: local resource transition completed.
heartbeat: 2003/02/10_13:56:30 info: Link linuxha2.linux-ha.org:eth1 up.
heartbeat: 2003/02/10_13:56:30 info: Status update for node linuxha2.linux-ha.org: status up
heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2003/02/10_13:56:30 info: Status update for node linuxha2.linux-ha.org: status active
heartbeat: 2003/02/10_13:56:30 info: remote resource transition completed.
heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2003/02/10_13:56:31 info: Link linuxha2.linux-ha.org:/dev/ttyS0 up.
NOTE: Your log may differ depending on when you started heartbeat on linuxha2! I started heartbeat on linuxha2 at 13:56:30...
OK, now try to ping your cluster's IP (192.168.85.3 in the example). If this works, ssh to it and verify you're on linuxha1.
Next, make sure your services are tied to the .3 address. Bring up Netscape and type in 192.168.85.3 for the URL. For Samba, try to map the drive "\\192.168.85.3\test", assuming you set up a share called "test". See the Samba docs to get that going. As an aside, however, you'll want to use the "netbios name" parameter to have your Samba share listed under the cluster name and not the hostname of your cluster member!
NOTE: If you can't bring up the service IP address and you get ha-log entries similar to this:
SIOCSIFADDR: No such device
SIOCSIFFLAGS: No such device
SIOCSIFNETMASK: No such device
SIOCSIFBRDADDR: No such device
SIOCSIFFLAGS: No such device
SIOCADDRT: No such device
It may mean that you need to enable IP aliasing in your kernel build. Check /usr/src/linux/.config for "CONFIG_IP_ALIAS=y"; if you don't have it, you'll have the line "CONFIG_IP_ALIAS is not set". Rebuild your kernel with IP aliasing enabled.
If this all works, you've got availability. Now let's see if we have High Availability :-)
Take down linuxha1. Kill power, kill heartbeat, whatever you have the stomach for, but don't just yank both the serial and eth1 heartbeat cables. If you do that, you'll have services running on both nodes, and when you re-connect the heartbeat, a bit of chaos...
Now ping the cluster IP. Approximately 5-10 seconds later it should start responding again. Ssh to it again and verify you're on linuxha2. If it happens but takes more like 30 seconds, something is wrong.
If you get this far, it's probably working, but you should check all your heartbeats, too.
First, check your serial heartbeat. Unplug the crossover cable from the eth1 NIC that you're using for your bcast heartbeat. Wait about 10 seconds.
Now, look at /var/log/ha-log on linuxha2 and make sure there's no line like this:
1999/08/16_12:40:58 node linuxha1.linux-ha.org: is dead
If you get that, your serial heartbeat isn't working and your second node is taking over. To avoid any problems, shut down heartbeat on the first node, then test your null modem cable. Run the above serial tests again.
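Rather than eyeballing the log, you can grep for that line. The helper below is just a sketch with a made-up name (no_dead_node). It takes the log file and the node name as arguments and succeeds only if no "is dead" line for that node is present.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the log has no "is dead" line
# for the given node; print the offending lines otherwise.
no_dead_node() {
    log=$1      # e.g. /var/log/ha-log
    node=$2     # e.g. linuxha1.linux-ha.org
    # -F: match the node name literally (its dots are not wildcards)
    if grep -F "node $node" "$log" | grep -q "is dead"; then
        grep -F "node $node" "$log" | grep "is dead"
        return 1
    fi
    return 0
}
```

On linuxha2 you would run something like "no_dead_node /var/log/ha-log linuxha1.linux-ha.org". Keep in mind that it looks at the whole file, so "is dead" lines left over from earlier tests (such as taking linuxha1 down on purpose) will show up too; check the timestamps.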
If your log is clean, great. Re-connect the crossover cable. Once that's done, disconnect the serial cable, wait 10 seconds and check the linuxha2 log again.
If it's clean, congrats! If not, you can check /var/log/ha-log and /var/log/ha-debug for more clues.
Appendix A - Ethernet Crossover Cable Construction
Your cable diagram should be as follows:
Connector A | Connector B
Pin #       | Pin #
1           | 3
2           | 6
3           | 1
6           | 2
4           | 7
5           | 8
7           | 4
8           | 5
Rev 1.2.0
(c) 2003 Rudy Pawul
rpawul@iso-ne.com