Now you have to make some implementation decisions. Your "cluster" is established via a "heartbeat" between the two computers (nodes), generated by the software package of the same name. However, this heartbeat needs one or more media paths (serial via a null modem cable, Ethernet via a crossover cable, etc.) between the nodes.
At this point, you're actually ready to begin hardware-wise. Of course, since you're looking into HA, you'll most likely want to avoid having a single point of failure. In this case, that would be your null modem cable/serial port or your network interface card (NIC)/crossover cable. So, you need to decide whether you wish to add a second serial/null modem connection or a second NIC/crossover connection to each node. See Appendix A for instructions on how to build a Cat-5 crossover cable. My heartbeat path setup uses one serial port and one extra NIC because I only had one null modem cable, had an extra NIC on hand, and thought it was good to have two media types for the heartbeats.
Once your hardware is in order, you must install your OS and configure your networking (I used Red Hat). Assuming you have two NICs, one should be configured for your "normal" network and the other as a private network between your clustered nodes (via the crossover cable). For example, we will assume that our cluster will have the following addresses:
Node 1 (linuxha1): 192.168.85.1 (normal 192.168.85.x network)
                   10.0.0.1     (private 10.x network for heartbeat)
Node 2 (linuxha2): 192.168.85.2 (normal 192.168.85.x network)
                   10.0.0.2     (private 10.x network for heartbeat)
Note: None of these addresses should be your "cluster address" (the address handled by heartbeat and failed over between nodes)!
Most *nix distributions make this easy during installation; however, if you are having any problems, refer to either the Ethernet HOWTO or the documentation for your distribution. To check your configuration, type:
ifconfig
This will show your network interfaces and their configuration. You can obtain your network routing information from "netstat -nr".
If it looks good, make sure you can ping between both nodes on all interfaces.
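If you'd rather script the ping check, here is a minimal Python sketch. It assumes the example addresses above and a Linux iputils `ping` that accepts `-c` (count) and `-W` (timeout in seconds). The names `NODE_ADDRESSES`, `ping_once` and `unreachable` are hypothetical and are not part of heartbeat.

```python
import subprocess

# The example addresses from above; adjust them for your own cluster.
NODE_ADDRESSES = {
    "linuxha1": ["192.168.85.1", "10.0.0.1"],
    "linuxha2": ["192.168.85.2", "10.0.0.2"],
}


def ping_once(address, timeout=2):
    """Send one ICMP echo (iputils-style -c/-W flags); True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable(addresses, probe=ping_once):
    """Return the addresses that did not answer; probe can be swapped out for testing."""
    return [address for address in addresses if not probe(address)]
```

On linuxha1, `unreachable(NODE_ADDRESSES["linuxha2"])` should return an empty list. Run the same check the other way round on linuxha2.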
Next, if you're using one, you'll need to test your serial connection. On one node, which will be the receiver, type:
cat </dev/ttyS0
On the other node, type:
echo hello >/dev/ttyS0
You should see the text on the receiver node. If it works, change their roles and try again. If it doesn't, it may be as simple as having the wrong device file. Volker's HA Hardware Guide and the Serial HOWTO are two good resources for troubleshooting your serial connection.
There are binary RPMs at the website, or you can build heartbeat from source. Grab the tarball (or install the source RPM). Untar it into your favorite source directory. From the top of the source tree, type "./ConfigureMe configure", followed by "make" and "make install". If you have problems installing the RPMs found at the website and want a way to make your own, there may be help in the FAQ.
For our example, we'll assume the high availability services are Apache and Samba. The cluster IP address is mandatory, and don't configure the cluster IP anywhere outside of the haresources file! The haresources file needs only one line:
linuxha1.linux-ha.org 192.168.85.3 httpd smb
So, this line dictates that on startup, linuxha1 serves the IP 192.168.85.3 and starts Apache and Samba as well.
Note: httpd and smb are the names of the startup scripts for Apache and Samba, respectively. Heartbeat will look for startup scripts of the same name in the following paths:
/etc/ha.d/resource.d
/etc/rc.d/init.d
These scripts must start services via "scriptname start" and stop them via "scriptname stop". So you can use any service as long as its script conforms to the above standard.
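The lookup in this note can be pictured with a short Python sketch. It assumes the two directories are searched in the order listed, and that a usable script must be an executable regular file. This mirrors the description above, not heartbeat's own source. `SEARCH_PATHS` and `find_resource_script` are hypothetical names.

```python
import os

# Assumption: searched in the order listed above, resource.d first.
SEARCH_PATHS = ["/etc/ha.d/resource.d", "/etc/rc.d/init.d"]


def find_resource_script(name, search_paths=SEARCH_PATHS):
    """Return the path of the first executable file called name, or None."""
    for directory in search_paths:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

For example, `find_resource_script("httpd")` would return `/etc/rc.d/init.d/httpd` on a Red Hat box with Apache installed, unless a script of the same name sits in /etc/ha.d/resource.d.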
Should you need to pass arguments to a custom script, the format would be:
scriptname::argument
So, if we added a service "maid" which needed the argument "vacuum", our haresources line would change to the following:
linuxha1 192.168.85.3 httpd smb maid::vacuum
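To make the notation concrete, here is a minimal Python sketch that splits such a line into the node name and a list of (script, arguments) pairs. It assumes each further "::"-separated field is another argument. It also assumes a token that parses as an IP address is the IPaddr shorthand explained next. The argument order in `start_command` follows the sample ha-log later in this document ("IPaddr 192.168.85.3 start", "datadisk drbd0 start"). Both function names are hypothetical.

```python
import ipaddress


def _is_ip_shorthand(token):
    """True if token looks like a bare service IP, optionally followed by /suffixes."""
    try:
        ipaddress.ip_address(token.split("/")[0])
        return True
    except ValueError:
        return False


def parse_haresources_line(line):
    """Return (node, [(script, [args...]), ...]) for one haresources line."""
    fields = line.split()
    if not fields:
        raise ValueError("empty haresources line")
    node, resources = fields[0], []
    for token in fields[1:]:
        if _is_ip_shorthand(token):
            # Assumed shorthand: a bare address means IPaddr::address.
            resources.append(("IPaddr", [token]))
        else:
            script, *args = token.split("::")
            resources.append((script, args))
    return node, resources


def start_command(script, args):
    """Script first, then its arguments, then the action, as in the sample log."""
    return [script, *args, "start"]
```

Parsing the line above yields `("linuxha1", [("IPaddr", ["192.168.85.3"]), ("httpd", []), ("smb", []), ("maid", ["vacuum"])])`. For the maid, `start_command("maid", ["vacuum"])` gives `["maid", "vacuum", "start"]`.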
This brings us to some added flexibility with the service IP address. We are actually using a shorthand notation above. The actual line could have read (we've canned the maid):
linuxha1 IPaddr::192.168.85.3 httpd smb
Here, IPaddr is the name of our service script, taking the argument 192.168.85.3. Sure enough, if you look in the directory /etc/ha.d/resource.d, you will find a script called IPaddr. This script will also allow you to manipulate the netmask, broadcast address and base interface of this IP service. To specify a subnet with 32 addresses, you could define the service as (leaving off the IPaddr because we can!):
linuxha1 192.168.85.3/27 httpd smb
This sets the IP service address to 192.168.85.3 and the netmask to 255.255.255.224, and the broadcast address defaults to 192.168.85.31 (the highest address on the subnet). The last parameter you can set is the broadcast address. To override the default and set it to 192.168.85.16, your entry would read:
linuxha1 192.168.85.3/27/192.168.85.16 httpd smb
You may be wondering whether any of the above is necessary for you. It depends. If you've properly established a network route (independent of heartbeat) for the service's IP address, with the correct netmask and broadcast address, then no, it's not necessary. However, this case won't fit everybody, and that's why the option is there! In addition, you may have more than one possible interface that could be used for the service IP. Read on to see how heartbeat treats this...
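You can check the /27 arithmetic with Python's standard `ipaddress` module. A /27 leaves 5 host bits, so the subnet holds 2^5 = 32 addresses (192.168.85.0 to 192.168.85.31). The sketch below interprets only the ip[/bits[/broadcast]] forms shown above and leaves out the interface field. `ip_service_params` is a hypothetical helper, not the IPaddr script's own logic.

```python
import ipaddress


def ip_service_params(spec):
    """Interpret the ip[/bits[/broadcast]] notation from the haresources examples.

    Returns None for netmask/broadcast when they are left for heartbeat to derive.
    """
    parts = spec.split("/")
    if len(parts) > 3:
        raise ValueError(f"unexpected service IP spec: {spec!r}")
    address = ipaddress.IPv4Address(parts[0])
    if len(parts) == 1:
        # Bare address: netmask and broadcast come from the selected route.
        return {"address": str(address), "netmask": None, "broadcast": None}
    # strict=False lets a host address such as .3 define its enclosing network.
    network = ipaddress.IPv4Network(f"{address}/{parts[1]}", strict=False)
    if len(parts) == 3:
        broadcast = str(ipaddress.IPv4Address(parts[2]))  # explicit override
    else:
        broadcast = str(network.broadcast_address)  # highest address in the subnet
    return {"address": str(address), "netmask": str(network.netmask), "broadcast": broadcast}
```

Here `ip_service_params("192.168.85.3/27")` gives netmask 255.255.255.224 and broadcast 192.168.85.31. Appending `/192.168.85.16` replaces only the broadcast.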
Once you straighten out your haresources file, copy ha.cf and haresources to /etc/ha.d and you're ready to start!
In ha.cf, set either:
auto_failback on
or
auto_failback off
To use ipfail, add:
respawn hacluster /usr/lib/heartbeat/ipfail
ping pnode1 pnode2 pnodeN
Simply replace pnode1, pnode2, ... pnodeN with the IP addresses of your ping nodes.
NOTE: You will want to check on the availability of the ping nodes prior to using them. If you cannot ping them from both of the HA nodes, they are useless.
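Before starting heartbeat, a small Python sketch can sanity-check these ha.cf lines. It assumes '#' starts a comment. It only knows the two auto_failback values shown above, and it assumes ipfail is only useful when at least one ping node is listed. `check_hacf` is a hypothetical name, and it does not replace the availability check in the note above.

```python
import ipaddress


def check_hacf(text):
    """Return problems found in the auto_failback, respawn ipfail and ping lines."""
    problems, ping_nodes, uses_ipfail = [], [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        key, *values = line.split()
        if key == "auto_failback":
            value = values[0] if values else None
            if value not in ("on", "off"):
                problems.append(f"auto_failback should be on or off, got {value!r}")
        elif key == "respawn" and any(v.endswith("/ipfail") for v in values):
            uses_ipfail = True
        elif key == "ping":
            ping_nodes.extend(values)
    for node in ping_nodes:
        try:
            ipaddress.ip_address(node)  # the text asks for IP addresses here
        except ValueError:
            problems.append(f"ping node {node!r} is not an IP address")
    if uses_ipfail and not ping_nodes:
        problems.append("ipfail is respawned but no ping nodes are listed")
    return problems
```

A leftover placeholder such as `ping pnode1` is reported as `ping node 'pnode1' is not an IP address`.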
Heartbeat decides which interface will be used by looking at the routing table. It tries to select the lowest cost route to the IP address to be taken over. In the case of a tie, it chooses the first route found. For most configurations this means the default route will be least preferred.
If you don't specify a netmask for the IP address in the haresources file, the netmask associated with the selected route will be used. Similarly, if an interface is not specified, then the virtual IP address will be added to the interface associated with the selected route. If the broadcast address is omitted, then the highest address in the subnet is used.
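The selection rules above can be sketched in Python. The sketch models "cost" as a route metric and the routing table as a hand-written list in table order. In practice you would read the table from netstat -nr. The `Route` type and both function names are hypothetical. With the example table used in the check, it reproduces the ifconfig parameters in the sample log further on (eth0, netmask 255.255.255.0, broadcast 192.168.85.255).

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    destination: str  # network in CIDR form, e.g. "192.168.85.0/24"
    interface: str
    metric: int       # assumption: "cost" is the route metric


def choose_route(service_ip, routes):
    """Lowest-metric route covering service_ip; ties go to the earliest entry."""
    ip = ipaddress.IPv4Address(service_ip)
    best = None
    for route in routes:
        covers = ip in ipaddress.IPv4Network(route.destination)
        # Strict "<" keeps the first route found when metrics tie.
        if covers and (best is None or route.metric < best.metric):
            best = route
    return best


def default_params(service_ip, routes):
    """Interface, netmask and broadcast taken from the selected route."""
    route = choose_route(service_ip, routes)
    if route is None:
        raise LookupError(f"no route covers {service_ip}")
    network = ipaddress.IPv4Network(route.destination)
    return route.interface, str(network.netmask), str(network.broadcast_address)
```

Because the default route (0.0.0.0/0) usually comes last in the table, it loses every tie. That is why it ends up least preferred.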
Configuring Authkeys
The third file to configure determines your authentication keys. There are three types of authentication methods available: crc, md5, and sha1. "Well, which should I use?", you ask. Since this document is called "Getting Started", we'll keep it simple...
If your heartbeat runs over a secure network, such as the crossover cable in our example, you'll want to use crc. This is the cheapest method from a resources perspective. If the network is insecure, but you're either not very paranoid or concerned about minimizing CPU resources, use md5. Finally, if you want the best authentication without regard for CPU resources, use sha1. It's the hardest to crack.
The format of the file is as follows:
auth <number>
<number> <authmethod> [<authkey>]
So, for sha1, a sample /etc/ha.d/authkeys could be:
auth 1
1 sha1 key-for-sha1-any-text-you-want
For md5, you could use the same as the above, but replace "sha1" with "md5".
Finally, for crc, a sample might be:
auth 2
2 crc
Whatever index you put after the keyword auth must be found among the keys listed below it in the file. If you put "auth 4", then there must be a "4 signaturetype" line in the list below.
Make sure the file's permissions are safe, such as 600. Also, "any text you want" is not quite right: there's a limit to the number of characters you can use.
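Here is a minimal Python sketch that checks an authkeys file against the rules above. It checks that the auth index has a matching key line and that the method is crc, md5 or sha1. As in the examples, it assumes md5 and sha1 lines carry a key while crc lines need none, and that lines starting with '#' are comments. It does not know the key length limit. It also checks that group and others have no access to the file (e.g. mode 600). `check_authkeys` and `permissions_ok` are hypothetical names.

```python
import os
import stat

VALID_METHODS = {"crc", "md5", "sha1"}


def check_authkeys(text):
    """Return problems found in authkeys text (format: auth N / N method [key])."""
    problems, methods, active = [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 2)  # index, method, rest of the line as the key
        if fields[0] == "auth":
            active = fields[1] if len(fields) > 1 else None
            continue
        if len(fields) < 2:
            problems.append(f"malformed line: {line!r}")
            continue
        index, method, has_key = fields[0], fields[1], len(fields) == 3
        if method not in VALID_METHODS:
            problems.append(f"key {index}: unknown method {method!r}")
        elif method != "crc" and not has_key:  # assumption: md5/sha1 need a key
            problems.append(f"key {index}: {method} needs a key")
        methods[index] = method
    if active is None:
        problems.append("no 'auth <number>' line")
    elif active not in methods:
        problems.append(f"auth {active} has no matching '{active} <method>' line")
    return problems


def permissions_ok(path="/etc/ha.d/authkeys"):
    """True if group and others have no access at all (for example, mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

If `permissions_ok()` returns False, `chmod 600 /etc/ha.d/authkeys` (as root) fixes it.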
That's it!
If you want heartbeat to run on startup, what to do will depend on your distribution. You may need to place links to the startup script in the appropriate init level directories, but the RPM versions will do this for you. I have heartbeat start at its default sequential priority (75, which means it starts after services with priority 74 and lower and before services with priority 76-99), stop at its default sequential priority (05), and only care about the 0 (halt), 6 (reboot), 3 (text-only), and 5 (X) run levels.
So, if I had to do it by hand, I'd need to type in the following (as root, of course):
cd /etc/rc.d/rc0.d ; ln -s ../init.d/heartbeat K05heartbeat
cd /etc/rc.d/rc3.d ; ln -s ../init.d/heartbeat S75heartbeat
cd /etc/rc.d/rc5.d ; ln -s ../init.d/heartbeat S75heartbeat
cd /etc/rc.d/rc6.d ; ln -s ../init.d/heartbeat K05heartbeat
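Those four commands follow a simple rule. Run levels 3 and 5 get an S (start) link with priority 75, and 0 and 6 get a K (kill) link with priority 05. The Python sketch below generates the same links. It assumes the Red Hat /etc/rc.d layout used above, and `link_plan`/`apply_plan` are hypothetical helpers, not part of heartbeat or its RPM.

```python
import os

START_PRIORITY = 75  # starts after services with priority 74 and lower
STOP_PRIORITY = 5    # K05: low number, so it is stopped early among kill scripts
START_LEVELS = (3, 5)
STOP_LEVELS = (0, 6)


def link_plan(rc_root="/etc/rc.d"):
    """Return (link_path, target) pairs matching the ln -s commands above."""
    plan = []
    for level in sorted(START_LEVELS + STOP_LEVELS):
        if level in START_LEVELS:
            prefix = f"S{START_PRIORITY:02d}"
        else:
            prefix = f"K{STOP_PRIORITY:02d}"
        link = os.path.join(rc_root, f"rc{level}.d", prefix + "heartbeat")
        plan.append((link, "../init.d/heartbeat"))
    return plan


def apply_plan(plan):
    """Create each symlink unless something with that name already exists."""
    for link, target in plan:
        if not os.path.lexists(link):
            os.symlink(target, link)
```

Running `apply_plan(link_plan())` as root is the scripted equivalent of the four commands. Skipping existing links makes it safe to run twice.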
The last time I ran Slackware, there was no /etc/rc.d/init.d directory (this may have changed by now), and to do the same thing, I would have placed the following in /etc/rc.d/rc.local:
/etc/ha.d/heartbeat start
***This assumes you copy the file ha.rc to /etc/ha.d/heartbeat.
If you can't find /etc/rc.d/init.d in your distribution and you're unsure of how processes start, you can use the rc.local method. But you're on your own for shutdown; I just don't remember...
Note: If you use the watchdog function, you'll need to load its module at bootup as well. You can put the following command at the bottom of the /etc/rc.d/rc.sysinit file:
/sbin/insmod softdog
For the rc.local method, just put the same line right above where you start heartbeat.
Once you've started heartbeat, take a peek at your log file (default is /var/log/ha-log) before testing it. If all is peachy, the service owner's log (linuxha1 in our example) should look something like this:
heartbeat: 2003/02/10_13:52:22 info: Neither logfile nor logfacility found.
heartbeat: 2003/02/10_13:52:22 info: Logging defaulting to /var/log/ha-log
heartbeat: 2003/02/10_13:52:22 info: **************************
heartbeat: 2003/02/10_13:52:22 info: Configuration validated. Starting heartbeat 0.4.9f
heartbeat: 2003/02/10_13:52:22 info: nice_failback is in effect.
heartbeat: 2003/02/10_13:52:22 info: heartbeat: version 0.4.9f
heartbeat: 2003/02/10_13:52:22 info: Heartbeat generation: 17
heartbeat: 2003/02/10_13:52:22 info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
heartbeat: 2003/02/10_13:52:22 info: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat: 2003/02/10_13:52:23 info: pid 28140 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: pid 28137 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: pid 28139 locked in memory.
heartbeat: 2003/02/10_13:52:23 notice: Using watchdog device: /dev/watchdog
heartbeat: 2003/02/10_13:52:23 info: pid 28141 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: Local status now set to: 'up'
heartbeat: 2003/02/10_13:52:23 info: pid 28138 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: pid 28134 locked in memory.
heartbeat: 2003/02/10_13:52:25 info: Link linuxha1.linux-ha.org:eth1 up.
heartbeat: 2003/02/10_13:53:23 WARN: node linuxha2.linux-ha.org: is dead
heartbeat: 2003/02/10_13:53:23 info: Dead node linuxha2.linux-ha.org held no resources.
heartbeat: 2003/02/10_13:53:23 info: Resources being acquired from linuxha2.linux-ha.org.
heartbeat: 2003/02/10_13:53:23 info: Local status now set to: 'active'
heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2003/02/10_13:53:23 info: /usr/lib/heartbeat/mach_down: nice_failback: acquiring foreign resources
heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete.
heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete for node linuxha2.linux-ha.org.
heartbeat: 2003/02/10_13:53:23 info: Acquiring resource group: linuxha1.linux-ha.org 192.168.85.3 datadisk::drbd0 datadisk::drbd1 mirror
heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/resource.d/IPaddr 192.168.85.3 start
heartbeat: 2003/02/10_13:53:23 info: /sbin/ifconfig eth0:0 192.168.85.3 netmask 255.255.255.0 broadcast 192.168.85.255
heartbeat: 2003/02/10_13:53:23 info: Sending Gratuitous Arp for 192.168.85.3 on eth0:0 [eth0]
heartbeat: 2003/02/10_13:53:23 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:24 info: Running /etc/ha.d/resource.d/datadisk drbd0 start
heartbeat: 2003/02/10_13:53:24 info: Running /etc/ha.d/resource.d/datadisk drbd1 start
heartbeat: 2003/02/10_13:53:25 info: Running /etc/ha.d/resource.d/mirror start
heartbeat: 2003/02/10_13:53:25 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:26 info: Resource acquisition completed.
heartbeat: 2003/02/10_13:53:28 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:30 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:32 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:33 info: Local Resource acquisition completed. (none)
heartbeat: 2003/02/10_13:53:33 info: local resource transition completed.
heartbeat: 2003/02/10_13:56:30 info: Link linuxha2.linux-ha.org:eth1 up.
heartbeat: 2003/02/10_13:56:30 info: Status update for node linuxha2.linux-ha.org: status up
heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2003/02/10_13:56:30 info: Status update for node linuxha2.linux-ha.org: status active
heartbeat: 2003/02/10_13:56:30 info: remote resource transition completed.
heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2003/02/10_13:56:31 info: Link linuxha2.linux-ha.org:/dev/ttyS0 up.
NOTE: Your log may differ depending on when you started heartbeat on linuxha2! I started heartbeat on linuxha2 at 13:56:30...
OK, now try to ping your cluster's IP (192.168.85.3 in the example). If this works, ssh to it and verify you're on linuxha1. Next, make sure your services are tied to the .3 address. Bring up Netscape and type in 192.168.85.3 for the URL. For Samba, try to map the drive "\\192.168.85.3\test", assuming you set up a share called "test" (see the Samba docs to get that going). As an aside, however, you'll want to use the "netbios name" parameter to have your Samba share listed under the cluster name and not the hostname of your cluster member!
NOTE: If you can't bring up the service IP address and you get ha-log entries similar to this:
SIOCSIFADDR: No such device
SIOCSIFFLAGS: No such device
SIOCSIFNETMASK: No such device
SIOCSIFBRDADDR: No such device
SIOCSIFFLAGS: No such device
SIOCADDRT: No such device
then it may mean that you need to enable IP aliasing in your kernel build. Check /usr/src/linux/.config for "CONFIG_IP_ALIAS=y"; if you don't have it, you'll have the line "# CONFIG_IP_ALIAS is not set" instead. Rebuild your kernel with IP aliasing enabled.
If this all works, you've got availability. Now let's see if we have High Availability :-)
Take down linuxha1. Kill power, kill heartbeat, whatever you have the stomach for, but don't just yank both the serial and eth1 heartbeat cables. If you do that, you'll have services running on both nodes, and when you reconnect the heartbeat cables, you'll get a bit of chaos...
Now ping the cluster IP. Approximately 5-10 seconds later, it should start responding again. Log in again and verify you're on linuxha2. If the takeover happens but takes more like 30 seconds, something is wrong.
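To put a number on the takeover time, you could probe the cluster IP from a third machine while you take linuxha1 down, then look for the longest gap between replies. This Python sketch assumes an iputils-style `ping` with `-c` and `-W`. `probe`, `collect_samples` and `longest_gap` are hypothetical names. A gap in the 5-10 second range matches the expectation above, while something near 30 seconds points to a problem.

```python
import subprocess
import time


def probe(address, timeout=1):
    """One ping with iputils-style -c/-W options; True if a reply came back."""
    return subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ).returncode == 0


def collect_samples(address, duration, interval=1.0):
    """Probe address for about duration seconds; return (monotonic_time, replied) pairs."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), probe(address)))
        time.sleep(interval)
    return samples


def longest_gap(samples):
    """Longest time between two consecutive successful replies (0.0 if fewer than two)."""
    replies = [t for t, ok in samples if ok]
    return max((b - a for a, b in zip(replies, replies[1:])), default=0.0)
```

For example, start `samples = collect_samples("192.168.85.3", 120)`, take linuxha1 down during that window, and print `longest_gap(samples)` afterwards.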
If you get this far, it's probably working, but you should check all your heartbeats, too.
First, check your serial heartbeat. Unplug the crossover cable from the eth1 NIC that you're using for your bcast heartbeat. Wait about 10 seconds.
Now, look at /var/log/ha-log on linuxha2 and make sure there's no line like this:
1999/08/16_12:40:58 node linuxha1.linux-ha.org: is dead
If you get that, your serial heartbeat isn't working and your second node is taking over. To avoid any problems, shut down heartbeat on the first node, then test your null modem cable by running the serial tests above again.
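Rather than eyeballing the log, you can scan it for "is dead" entries. The Python sketch below assumes one entry per line, in the two formats shown in this document: the full "heartbeat: ... WARN: node X: is dead" line and the shorter "1999/08/16_12:40:58 node X: is dead" example. `dead_node_events` is a hypothetical name.

```python
import re

# Timestamp, then (anywhere later) "node <name>: is dead".
DEAD_RE = re.compile(r"(\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}).*node (\S+?):? is dead")


def dead_node_events(log_text, node=None):
    """Return (timestamp, node) pairs for 'is dead' entries, optionally for one node."""
    events = []
    for line in log_text.splitlines():
        match = DEAD_RE.search(line)
        if match and (node is None or match.group(2) == node):
            events.append((match.group(1), match.group(2)))
    return events
```

On linuxha2, `dead_node_events(open("/var/log/ha-log").read(), node="linuxha1.linux-ha.org")` should stay empty while you unplug one heartbeat path at a time.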
If your log is clean, great. Reconnect the crossover cable. Once that's done, disconnect the serial cable, wait 10 seconds, and check the linuxha2 log again.
If it's clean, congrats! If not, you can check /var/log/ha-log and /var/log/ha-debug for more clues.
Appendix A - Ethernet Crossover Cable Construction
Your cable diagram should be as follows:
Connector A | Connector B
Pin #       | Pin #
1           | 3
2           | 6
3           | 1
6           | 2
4           | 7
5           | 8
7           | 4
8           | 5
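A quick way to double-check the table before crimping is to confirm that it is made of pairwise swaps (1 with 3, 2 with 6, 4 with 7, 5 with 8). If it is, the same cable works whichever end you call connector A. This Python sketch only checks the mapping transcribed from the table; `PINOUT`, `is_reversible` and `swapped_pairs` are hypothetical names.

```python
# Pin mapping from the table above: connector A pin -> connector B pin.
PINOUT = {1: 3, 2: 6, 3: 1, 6: 2, 4: 7, 5: 8, 7: 4, 8: 5}


def is_reversible(pinout):
    """True if following each wire from B back to A returns the starting pin."""
    return all(pinout[pinout[pin]] == pin for pin in pinout)


def swapped_pairs(pinout):
    """The distinct pin pairs that the cable swaps, smallest pin first."""
    return sorted({tuple(sorted(item)) for item in pinout.items()})
```

Here `swapped_pairs(PINOUT)` returns `[(1, 3), (2, 6), (4, 7), (5, 8)]`, and all eight pins appear exactly once on each side.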
Rev 1.2.0
(c) 2003 Rudy Pawul
rpawul@iso-ne.com