Getting Started with Linux-HA (heartbeat)
Intro
Let me preface this document by saying most of this is _not_ original work. My purpose in writing this document is to contribute in some way, and possibly to help those who REALLY get things done. The "work" I am contributing is mostly compiling bits and pieces from other HA documents (such as Volker Wiegand's Hardware Installation Guide) into a document that can help novices get started on HA without pestering Alan (like I did!) and cut down on repeat questions on the mailing list.
Getting Started
The first thing you'll need is two computers. You need not have identical hardware in both machines (or amount of memory, etc.), but if you did, it would make your life that much easier when a component fails.
Now you have to decide on some of your implementation. Your "cluster" is established via a "heartbeat" between the two computers (nodes), generated by the software package of the same name. However, this heartbeat needs one or more media paths (serial via a null modem cable, ethernet via a crossover cable, etc.) between the nodes.
At this point, you're actually ready to begin hardware-wise. Of course, since you're looking into HA, you'll most likely want to avoid a single point of failure. In this case, that would be your null modem cable/serial port or network interface card (NIC)/crossover cable. So, you need to decide whether you wish to add a second serial/null modem connection or a second NIC/crossover connection to each node. See Appendix A for instructions on how to build a Cat-5 crossover cable. My heartbeat path setup uses one serial port and one extra NIC because I only had one null modem cable, had an extra NIC on hand, and thought it was good to have two medium types for the heartbeats.
Once your hardware is in order, you must install your OS and configure your networking (I used Red Hat). Assuming you have 2 NICs, one should be configured for your "normal" network and the other as a private network between your clustered nodes (via the crossover cable). For example, we will assume that our cluster has the following addresses:
Node 1 (linuxha1): 192.168.85.1 (normal 192x net)
10.0.0.1 (private 10x net for heartbeat)
Node 2 (linuxha2): 192.168.85.2 (192x)
10.0.0.2 (10x)
Note: None of these addresses should be your "cluster address" - the address handled by heartbeat and failed over between nodes!
Most *nix distributions make this easy during installation; however, if you are having any problems, refer to either the Ethernet HOWTO or the documentation for your distribution. To check your configuration, type:
ifconfig
This will show your network interfaces and their configuration. You can obtain your network routing information from "netstat -nr".
If it looks good, make sure you can ping between both nodes on all interfaces.
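The ping check is easy to forget on one of the interfaces, so here is a minimal sketch that walks a list of peer addresses. The function name check_peers and the PROBE variable are my own inventions for this example, and the addresses are the ones from the table above; adjust to taste.

```shell
#!/bin/sh
# Command used to probe one address; override PROBE to try something else.
PROBE=${PROBE:-"ping -c 1"}

# check_peers ADDR... : probe every address, report each result, and
# return non-zero if at least one address did not answer.
check_peers() {
    failed=0
    for addr in "$@"; do
        if $PROBE "$addr" >/dev/null 2>&1; then
            echo "ok   $addr"
        else
            echo "FAIL $addr"
            failed=1
        fi
    done
    return $failed
}

# On linuxha1 you would run:  check_peers 192.168.85.2 10.0.0.2
# On linuxha2 you would run:  check_peers 192.168.85.1 10.0.0.1
```

Run it from both nodes; a FAIL on the 10x address usually points at the crossover cable or the second NIC's configuration.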
Next, if you're using one, you'll need to test your serial connection. On one node, which will be the receiver, type:
cat < /dev/ttyS0
On the other node, type:
echo hello > /dev/ttyS0
You should see the text on the receiver node. If it works, change their roles and try again. If it doesn't, it may be as simple as having the wrong device file. Volker's HA Hardware Guide and the Serial HOWTO are two good resources for troubleshooting your serial connection.
Installing Heartbeat
You can now install the heartbeat package. If you're reading this, you already have it, but in any case it's available from the Linux-HA website. There are binary RPMs at the website, or you can build heartbeat from source. Grab the tarball (or install the source RPM). Untar it into your favorite source directory. From the top of the source tree, type "./ConfigureMe configure", followed by "make" and "make install". If you have problems installing the RPMs found at the website and want a way to make your own, there may be help in the FAQ.
Configuring Heartbeat
Configuring ha.cf
There are three files you will need to configure before starting up heartbeat. The first is ha.cf. This will be placed in the /etc/ha.d directory that is created after installation. It tells heartbeat what types of media paths to use and how to configure them. The ha.cf in the source directory contains all the various options you can use; I'll go through it line by line...
- serial /dev/ttyS0
- Use a serial heartbeat - if you don't use a serial heartbeat, you must use another medium, such as a bcast (ethernet) heartbeat. Replace /dev/ttyS0 with the appropriate device file for your serial heartbeat.
- watchdog /dev/watchdog
- Optional. The watchdog function provides a way to have a system that is still minimally functioning, but not providing a heartbeat, reboot itself after a minute of being sick. This could help to avoid a scenario where the machine recovers its heartbeat after being pronounced dead. If that happened and a disk mount had failed over, you could have two nodes mounting a disk simultaneously. If you wish to use this feature, then in addition to this line, you will need to load the "softdog" kernel module and create the actual device file. To do this, first type "insmod softdog" to load the module. Then, type "grep misc /proc/devices" and note the number it reports (should be 10). Next, type "cat /proc/misc | grep watchdog" and note that number (should be 130). Now you can create the device file with that info by typing "mknod /dev/watchdog c 10 130".
- bcast eth1
- Specifies to use a broadcast heartbeat over the eth1 interface (replace with eth0, eth2, or whatever you use).
- keepalive 2
- Sets the time between heartbeats to 2 seconds.
- warntime 10
- Time in seconds before issuing a "late heartbeat" warning in the logs.
- deadtime 30
- Node is pronounced dead after 30 seconds.
- initdead 120
- With some configurations, the network takes some time to start working after a reboot. This is a separate "deadtime" to handle that case. It should be at least twice the normal deadtime.
- hopfudge 1
- Optional. For ring topologies, number of hops allowed in addition to the number of nodes in the cluster.
- baud 19200
- Speed at which to run the serial line (bps).
- udpport 694
- Use port number 694 for bcast or ucast communication. This is the default, and the official IANA registered port number.
- auto_failback on
- Required. For those familiar with Tru64 Unix, heartbeat acts as if in "favored member" mode. The master listed in the haresources file holds all the resources until a failover, at which time the slave takes over. When auto_failback is set to on, once the master comes back online, it will take everything back from the slave. When set to off, this option will prevent the master node from re-acquiring cluster resources after a failover. This option is similar to the obsolete nice_failback option. If you want to upgrade from a cluster which had nice_failback set off to this or later versions, special considerations apply in order to avoid requiring a flash cut. Please see the FAQ for details on how to deal with this situation.
- node linuxha1.linux-ha.org
- Mandatory. Hostname of machine in cluster as described by `uname -n`.
- node linuxha2.linux-ha.org
- Mandatory. Hostname of machine in cluster as described by `uname -n`.
- respawn userid cmd
- Optional: Lists a command to be spawned and monitored. E.g., to spawn the ccm daemon, the following line has to be added:
respawn hacluster /usr/lib/heartbeat/ccm
This tells heartbeat to spawn the command with the credentials of userid (hacluster, in this example) and to monitor the health of the process, respawning it if it dies. For ipfail, the line would be:
respawn hacluster /usr/lib/heartbeat/ipfail
NOTE: If the process dies with exit code 100, the process is not respawned.
- ping ping1.linux-ha.org ping2.linux-ha.org ....
- Optional: Specify ping nodes. These nodes are not considered cluster nodes. They are used to check network connectivity for modules like ipfail.
- ping_group name ping1.linux-ha.org ping2.linux-ha.org ....
- Optional: Specify a group of ping nodes. These are similar to ping nodes, but if any node in a group is available then the group is considered available. The group name can be any string and is used to uniquely identify the group. Each group must appear on a separate line. Like ping nodes, the group is not considered to be a cluster node. Ping groups are used the same way as ping nodes, to check network connectivity for modules like ipfail.
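To see the pieces together, here is a sketch that writes the directives from the walk-through, with the same values and in the same order, into a scratch file, and then checks the "initdead should be at least twice deadtime" rule. The output path and the OUT variable are assumptions made for this example; nothing here touches /etc/ha.d.

```shell
#!/bin/sh
# Scratch location for the sample file (an assumption of this sketch).
OUT=${OUT:-${TMPDIR:-/tmp}/ha.cf.sample}

# Directives and values are the ones discussed in the walk-through above.
cat > "$OUT" <<'EOF'
serial /dev/ttyS0
watchdog /dev/watchdog
bcast eth1
keepalive 2
warntime 10
deadtime 30
initdead 120
baud 19200
udpport 694
auto_failback on
node linuxha1.linux-ha.org
node linuxha2.linux-ha.org
EOF

# Pull the two timing values back out and compare them.
deadtime=$(awk '$1 == "deadtime" { print $2 }' "$OUT")
initdead=$(awk '$1 == "initdead" { print $2 }' "$OUT")

if [ "$initdead" -ge $((deadtime * 2)) ]; then
    echo "timing ok: initdead=$initdead deadtime=$deadtime"
else
    echo "warning: initdead=$initdead is less than twice deadtime=$deadtime"
fi
```

Review the result, drop the watchdog or serial lines if you don't use them, and only then copy it to /etc/ha.d/ha.cf on both nodes.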
Once you've got your ha.cf set up, you need to configure haresources. This file specifies the services for the cluster and who the default owner is.
Note: This file must be the same on both nodes!
For our example, we'll assume the high availability services are Apache and Samba. The IP for the cluster is mandatory, and you must not configure the cluster IP outside of the haresources file! The haresources file will need one line:
linuxha1.linux-ha.org 192.168.85.3 httpd smb
So, this line dictates that on startup, linuxha1 serves the IP 192.168.85.3 and starts Apache and Samba as well.
On shutdown, heartbeat will first stop smb, then Apache, then give up the IP. This assumes that the command "uname -n" spits out "linuxha1.linux-ha.org" - yours may well produce "linuxha1" and if it does, use that instead!
Note: httpd and smb are the names of the startup scripts for Apache and Samba, respectively. Heartbeat will look for startup scripts of the same name in the following paths:
/etc/ha.d/resource.d
/etc/rc.d/init.d
These scripts must start services via "scriptname start" and stop them via "scriptname stop".
So you can use any services as long as they conform to the above standard.
Should you need to pass arguments to a custom script, the format would be:
scriptname::argument
So, if we added a service "maid" which needed the argument "vacuum", our haresources line would change to the following:
linuxha1 192.168.85.3 httpd smb maid::vacuum
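For illustration, here is roughly what such a script could look like. The "maid" service is of course hypothetical, and the calling convention (argument first, then start or stop) is an assumption I'm making from the sample log later in this document, where heartbeat runs "datadisk drbd0 start" and "IPaddr 192.168.85.3 start".

```shell
#!/bin/sh
# Hypothetical resource script body for "maid".
# Assumed invocation for the entry maid::vacuum:
#   maid vacuum start      and      maid vacuum stop
maid() {
    task=$1      # the argument given after "::" in haresources
    action=$2    # start or stop
    case "$action" in
        start) echo "maid: starting $task" ;;
        stop)  echo "maid: stopping $task" ;;
        *)     echo "usage: maid <task> {start|stop}" >&2
               return 1 ;;
    esac
}

# Installed as a standalone script, the file would end with:  maid "$@"
```

Replace the echo lines with whatever actually starts and stops your service; the only contract heartbeat cares about is the start/stop handling.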
This brings us to some added flexibility with the service IP address. We are actually using a shorthand notation above. The actual line could have read (we've canned the maid):
linuxha1 IPaddr::192.168.85.3 httpd smb
Where IPaddr is the name of our service script, taking the argument 192.168.85.3. Sure enough, if you look in the directory /etc/ha.d/resource.d, you will find a script called IPaddr. This script will also allow you to manipulate the netmask, broadcast address and base interface of this IP service. To specify a subnet with 32 addresses, you could define the service as (leaving off the IPaddr because we can!):
linuxha1 192.168.85.3/27 httpd smb
This sets the IP service address to 192.168.85.3 and the netmask to 255.255.255.224, and the broadcast address would default to 192.168.85.31 (which is the highest address on the subnet). The last parameter you can set is the broadcast address. To override the default and set it to 192.168.85.16, your entry would read:
linuxha1 192.168.85.3/27/192.168.85.16 httpd smb
You may be wondering whether any of the above is necessary for you. It depends. If you've properly established a net route (independent of heartbeat) for the service's IP address, with the correct netmask and broadcast address, then no, it's not necessary for you. However, this case won't fit everybody and that's why the option's there! In addition, you may have more than one possible interface that could be used for the service IP. Read on to see how heartbeat treats this...
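If the /27 arithmetic isn't obvious, the following sketch reproduces it with plain shell integer math. The helper names are made up for this example, and it assumes a shell with 64-bit arithmetic (true of the usual sh and bash on current systems).

```shell
#!/bin/sh
# Convert dotted-quad A.B.C.D to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Convert an integer back to dotted-quad notation.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# netmask_for PREFIX : e.g. 27 gives 255.255.255.224 (4294967295 is 2^32 - 1)
netmask_for() {
    int_to_ip $(( (4294967295 << (32 - $1)) & 4294967295 ))
}

# broadcast_for IP PREFIX : highest address in the subnet containing IP
broadcast_for() {
    ip=$(ip_to_int "$1")
    host_bits=$(( (1 << (32 - $2)) - 1 ))
    int_to_ip $(( ip | host_bits ))
}

echo "netmask:   $(netmask_for 27)"                  # 255.255.255.224
echo "broadcast: $(broadcast_for 192.168.85.3 27)"   # 192.168.85.31
```

A /27 leaves 5 host bits, hence the 32 addresses (192.168.85.0 through 192.168.85.31) mentioned above.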
Once you straighten out your haresources file, copy ha.cf and haresources to /etc/ha.d and you're ready to start!
The ipfail plugin attempts to detect network failures and then react intelligently, directing the cluster to fail over resources as necessary. In order to accomplish this goal, it uses ping nodes or ping groups which work as "dumb" third parties in the cluster. Provided both HA nodes can communicate with each other, ipfail can reliably detect when one of their network links has become unusable, and compensate.
To configure ipfail, the following steps must be performed.
- Select good ping node candidates.
It is essential that good strategic ping nodes be selected. The better your choices, the stronger your HA cluster becomes. Choosing solid network devices like switches and routers is a good idea. Do not choose either of the members of the HA cluster. Nor should you select someone's workstation. It is also important to select ping nodes that reflect the connectivity of your HA nodes. If you wish to monitor the connectivity of two interfaces, it is wise to select a ping node for each interface that is reachable exclusively from said interface. Consult ipfail-diagram.pdf for a graphical representation of this idea.
- Set auto_failback to on or off.
ipfail will only operate if heartbeat's auto_failback option has been set to something other than legacy. In ha.cf, set the auto_failback option to "on" or "off" like so:
auto_failback on
or
auto_failback off
- Configure your ha.cf to start ipfail.
Add a line like the following to ha.cf (assuming your compile PREFIX is /usr):
respawn hacluster /usr/lib/heartbeat/ipfail
- Add the ping nodes to ha.cf.
The ping nodes can be added to the cluster by using a line like the following:
ping pnode1 pnode2 pnodeN
Simply replace pnode1, pnode2, ... pnodeN with the IP addresses of your ping nodes.
NOTE: You will want to check on the availability of the ping nodes prior to using them. If you cannot ping them from both of the HA nodes, they are useless.
Selecting an Interface
One important aspect of configuring the haresources file for a machine which has multiple ethernet interfaces is to know how heartbeat selects which interface will wind up supporting the service addresses that are configured in haresources. After all, no interface was specified in the haresources file.
Heartbeat decides which interface will be used by looking at the routing table. It tries to select the lowest cost route to the IP address to be taken over. In the case of a tie, it chooses the first route found. For most configurations this means the default route will be least preferred.
If you don't specify a netmask for the IP address in the haresources file, the netmask associated with the selected route will be used. Similarly, if an interface is not specified, then the virtual IP address will be added to the interface associated with the selected route. If the broadcast address is omitted, then the highest address in the subnet is used.
Configuring Authkeys
The third file to configure determines your authentication keys. There are three types of authentication methods available: crc, md5, and sha1. "Well, which should I use?", you ask. Since this document is called "Getting Started", we'll keep it simple...
If your heartbeat runs over a secure network, such as the crossover cable in our example, you'll want to use crc. This is the cheapest method from a resources perspective. If the network is insecure, but you're either not very paranoid or concerned about minimizing CPU resources, use md5. Finally, if you want the best authentication without regard for CPU resources, use sha1. It's the hardest to crack.
The format of the file is as follows:
auth <number>
<number> <authmethod> [<authkey>]
So, for sha1, a sample /etc/ha.d/authkeys could be:
auth 1
1 sha1 key-for-sha1-any-text-you-want
For md5, you could use the same as the above, but replace "sha1" with "md5".
Finally, for crc, a sample might be:
auth 2
2 crc
Whatever index you put after the keyword auth must be found below in the keys listed in the file. If you put "auth 4", then there must be a "4 signaturetype" line in the list below.
Make sure the file's permissions are safe, like 600. And "any text you want" is not quite right: there's a limit to the number of characters you can use.
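Here is one way to build such a file without having to invent a key by hand. This is only a sketch: the scratch path, the AUTHKEYS variable and the 32-character key length are my own choices (the key is kept short because of the length limit mentioned above, whose exact value I don't state here).

```shell
#!/bin/sh
# Scratch location; copy the result to /etc/ha.d/authkeys on both nodes.
AUTHKEYS=${AUTHKEYS:-${TMPDIR:-/tmp}/authkeys.sample}

# 16 random bytes rendered as 32 hexadecimal characters.
key=$(dd if=/dev/urandom bs=16 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')

umask 077                        # new files get no group/other access
printf 'auth 1\n1 sha1 %s\n' "$key" > "$AUTHKEYS"
chmod 600 "$AUTHKEYS"            # explicit, in case the file already existed

ls -l "$AUTHKEYS"
```

Remember that both nodes need the same key, so generate it once and copy the file over a channel you trust.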
That's it!
Starting and testing heartbeat
On Red Hat, or other distributions which use /etc/init.d startup files, simply type /etc/init.d/heartbeat start on both nodes. I would recommend starting on the system master (in our example linuxha1) first.
If you want heartbeat to run on startup, what to do will differ by distribution. You may need to place links to the startup script in the appropriate init level directories, but the RPM versions will do this for you. I have heartbeat start at its default sequential priority (75, which means it starts after services 74 and lower and before services with priority 76-99), end at its default sequential priority (05), and only care about the 0 (halt), 6 (reboot), 3 (text-only), and 5 (X) run levels.
So, if I had to do it by hand, I'd need to type in the following (as root, of course):
cd /etc/rc.d/rc0.d ; ln -s ../init.d/heartbeat K05heartbeat
cd /etc/rc.d/rc3.d ; ln -s ../init.d/heartbeat S75heartbeat
cd /etc/rc.d/rc5.d ; ln -s ../init.d/heartbeat S75heartbeat
cd /etc/rc.d/rc6.d ; ln -s ../init.d/heartbeat K05heartbeat
The last time I ran Slackware, there was no /etc/rc.d/init.d directory (this may have changed by now) and to do the same thing, I would have placed in /etc/rc.d/rc.local:
/etc/ha.d/heartbeat start
This assumes you copy the file ha.rc to /etc/ha.d/heartbeat. If you can't find /etc/rc.d/init.d with your distribution and you're unsure of how processes start, you can use the rc.local method. But you're on your own for shutdown, I just don't remember...
Note: If you use the watchdog function, you'll need to load its module at bootup as well. You can put the following command at the bottom of the /etc/rc.d/rc.sysinit file:
/sbin/insmod softdog
For the rc.local method, just put the same line right above where you start heartbeat.
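Loading the module is only half of the watchdog setup; the device file described in the ha.cf walk-through has to exist too. Instead of reading the two numbers off the screen, a small helper can pull them out of the /proc tables. The helper name proc_number is hypothetical, and the mknod line is left as a comment because it needs root and a live system.

```shell
#!/bin/sh
# proc_number NAME FILE : print the number listed next to NAME in a
# /proc-style table whose lines look like "<number> <name>".
proc_number() {
    awk -v name="$1" '$2 == name { print $1; exit }' "$2"
}

# On a live system, as root, after "insmod softdog":
#   major=$(proc_number misc /proc/devices)       # should be 10
#   minor=$(proc_number watchdog /proc/misc)      # should be 130
#   [ -e /dev/watchdog ] || mknod /dev/watchdog c "$major" "$minor"
```

If either lookup prints nothing, the softdog module is probably not loaded yet.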
Once you've started heartbeat, take a peek at your log file (default is /var/log/ha-log) before testing it. If all is peachy, the service owner's log (linuxha1 in our example) should look something like this:
heartbeat: 2003/02/10_13:52:22 info: Neither logfile nor logfacility found.
heartbeat: 2003/02/10_13:52:22 info: Logging defaulting to /var/log/ha-log
heartbeat: 2003/02/10_13:52:22 info: **************************
heartbeat: 2003/02/10_13:52:22 info: Configuration validated. Starting heartbeat 0.4.9f
heartbeat: 2003/02/10_13:52:22 info: nice_failback is in effect.
heartbeat: 2003/02/10_13:52:22 info: heartbeat: version 0.4.9f
heartbeat: 2003/02/10_13:52:22 info: Heartbeat generation: 17
heartbeat: 2003/02/10_13:52:22 info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
heartbeat: 2003/02/10_13:52:22 info: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat: 2003/02/10_13:52:23 info: pid 28140 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: pid 28137 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: pid 28139 locked in memory.
heartbeat: 2003/02/10_13:52:23 notice: Using watchdog device: /dev/watchdog
heartbeat: 2003/02/10_13:52:23 info: pid 28141 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: Local status now set to: 'up'
heartbeat: 2003/02/10_13:52:23 info: pid 28138 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: pid 28134 locked in memory.
heartbeat: 2003/02/10_13:52:25 info: Link linuxha1.linux-ha.org:eth1 up.
heartbeat: 2003/02/10_13:53:23 WARN: node linuxha2.linux-ha.org: is dead
heartbeat: 2003/02/10_13:53:23 info: Dead node linuxha2.linux-ha.org held no resources.
heartbeat: 2003/02/10_13:53:23 info: Resources being acquired from linuxha2.linux-ha.org.
heartbeat: 2003/02/10_13:53:23 info: Local status now set to: 'active'
heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2003/02/10_13:53:23 info: /usr/lib/heartbeat/mach_down: nice_failback: acquiring foreign resources
heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete.
heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete for node linuxha2.linux-ha.org.
heartbeat: 2003/02/10_13:53:23 info: Acquiring resource group: linuxha1.linux-ha.org 192.168.85.3 datadisk::drbd0 datadisk::drbd1 mirror
heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/resource.d/IPaddr 192.168.85.3 start
heartbeat: 2003/02/10_13:53:23 info: /sbin/ifconfig eth0:0 192.168.85.3 netmask 255.255.255.0 broadcast 192.168.85.255
heartbeat: 2003/02/10_13:53:23 info: Sending Gratuitous Arp for 192.168.85.3 on eth0:0 [eth0]
heartbeat: 2003/02/10_13:53:23 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:24 info: Running /etc/ha.d/resource.d/datadisk drbd0 start
heartbeat: 2003/02/10_13:53:24 info: Running /etc/ha.d/resource.d/datadisk drbd1 start
heartbeat: 2003/02/10_13:53:25 info: Running /etc/ha.d/resource.d/mirror start
heartbeat: 2003/02/10_13:53:25 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:26 info: Resource acquisition completed.
heartbeat: 2003/02/10_13:53:28 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:30 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:32 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:33 info: Local Resource acquisition completed. (none)
heartbeat: 2003/02/10_13:53:33 info: local resource transition completed.
heartbeat: 2003/02/10_13:56:30 info: Link linuxha2.linux-ha.org:eth1 up.
heartbeat: 2003/02/10_13:56:30 info: Status update for node linuxha2.linux-ha.org: status up
heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2003/02/10_13:56:30 info: Status update for node linuxha2.linux-ha.org: status active
heartbeat: 2003/02/10_13:56:30 info: remote resource transition completed.
heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2003/02/10_13:56:31 info: Link linuxha2.linux-ha.org:/dev/ttyS0 up.
NOTE: Your log may differ depending on when you started heartbeat on linuxha2!!! I started heartbeat on linuxha2 at 13:56:30...
OK, now try to ping your cluster's IP (192.168.85.3 in the example). If this works, ssh to it and verify you're on linuxha1.
Next, make sure your services are tied to the .3 address. Bring up netscape and type in 192.168.85.3 for the URL. For Samba, try to map the drive "\\192.168.85.3\test", assuming you set up a share called "test". See the Samba docs to get that going. As an aside, however, you'll want to use the "netbios name" parameter to have your Samba share listed under the cluster name and not the hostname of your cluster member!
NOTE: If you can't bring up the service IP address and you get ha-log entries similar to this:
SIOCSIFADDR: No such device
SIOCSIFFLAGS: No such device
SIOCSIFNETMASK: No such device
SIOCSIFBRDADDR: No such device
SIOCSIFFLAGS: No such device
SIOCADDRT: No such device
it may mean that you need to enable IP aliasing in your kernel build. Check /usr/src/linux/.config for "CONFIG_IP_ALIAS=y"; if you don't have it, you'll have the line "CONFIG_IP_ALIAS is not set". Rebuild your kernel with IP aliasing enabled.
If this all works, you've got availability. Now let's see if we have High Availability :-)
Take down linuxha1. Kill power, kill heartbeat, whatever you have the stomach for, but don't just yank both the serial and eth1 heartbeat cables. If you do that, you'll have services running on both nodes and, when you re-connect the heartbeat, a bit of chaos...
Now ping the cluster IP. Approximately 5-10 seconds later it should start responding again. Ssh in again and verify you're on linuxha2. If it happens but takes more like 30 seconds, something is wrong.
If you get this far, it's probably working, but you should probably check all your heartbeats, too.
First, check your serial heartbeat. Unplug the crossover cable from the eth1 NIC that you're using for your bcast heartbeat. Wait about 10 seconds.
Now, look at /var/log/ha-log on linuxha2 and make sure there's no line like this:
1999/08/16_12:40:58 node linuxha1.linux-ha.org: is dead
If you get that, your serial heartbeat isn't working and your second node is taking over. To avoid any problems, shut down heartbeat on the first node, then test your null modem cable. Run the above serial tests again.
If your log is clean, great. Re-connect the crossover cable. Once that's done, disconnect the serial cable, wait 10 seconds and check the linuxha2 log again.
If it's clean, congrats! If not, you can check /var/log/ha-log and /var/log/ha-debug for more clues.
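Rather than eyeballing the log after each cable pull, you can have the shell list the nodes it declares dead. This is a sketch that assumes the "node NAME: is dead" wording seen in the sample log lines above; the function name dead_nodes is made up for this example.

```shell
#!/bin/sh
# dead_nodes LOGFILE : print each node the log reports as dead, once.
dead_nodes() {
    sed -n 's/.*node \([^ :]*\): *is dead.*/\1/p' "$1" | sort -u
}

# On linuxha2, after unplugging one heartbeat cable and waiting:
#   dead_nodes /var/log/ha-log
```

No output means no node was pronounced dead. Keep in mind the log also holds entries from earlier tests, so compare timestamps or start from a fresh log before drawing conclusions.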
Appendix A - Ethernet Crossover Cable Construction
Your cable diagram should be as follows:
Connector A | Connector B
Pin #       | Pin #
1           | 3
2           | 6
3           | 1
6           | 2
4           | 7
5           | 8
7           | 4
8           | 5
Rev 1.2.0
(c) 2003 Rudy Pawul
rpawul@iso-ne.com