= Setting up active-active load-sharing hash-based stateful firewall =
by Pablo Neira Ayuso in 2010

If you want to know more about this configuration and other firewall
architectures, please read:

* Demystifying cluster-based fault-tolerant firewalls.
  IEEE Internet Computing, 13(6):31-38, December 2009.
  Available at:
  https://perso.ens-lyon.fr/laurent.lefevre/pdf/IC2009_Neira_Gasca_Lefevre.pdf

== 0x0 intro ==

Under this directory you can find a script that allows you to set up a simple
active-active hash-based load-sharing firewall cluster based on the iptables
cluster match.

== 0x1 testbed ==

My testbed looks like the following:

                  ----------   eth1         eth2    ----------
 client A   ------|        |---   firewall 1    ----|        |
 (192.168.0.2)    | switch |  (.0.5)       (.1.5)   | switch |--- server
                  |        |                        |        |    (192.168.1.2)
 client B   ------|        |---   firewall 2    ----|        |
 (192.168.0.11)   ----------  (.0.5)       (.1.5)   ----------
                               eth1         eth2

The firewalls perform SNAT to masquerade clients. Note that both cluster
firewalls have the same IP addresses. For administrative purposes, it is a
good idea to give each firewall its own IP address so that you can SSH into
it; make sure you add the appropriate rule to skip the cluster match rule-set!

More comments: although the picture shows two switches, I'm actually using
one, and I separated the clients and the server into two different VLANs.

The script also sets a multicast MAC address that is the same for both
firewalls so that the switch floods the same packets to both firewalls.
Using a multicast MAC address is an RFC violation [1], since a network node
must not include a multicast MAC address in ARP replies, but:

a) it is the only way I have found so far to obtain this behaviour from my
HP ProCurve switches.

b) the VRRP MAC address range is not supported appropriately by switch
vendors, at least not by my HP ProCurve switches. If switch vendors supported
this MAC address range appropriately, they would handle these addresses as
multicast MAC addresses.
As of 2011, I did not find any switch handling the VRRP MAC address range as
multicast (they still handle these addresses as normal unicast MAC addresses,
therefore my solution does not work with two nodes that share the same VRRP
MAC address).

The cluster match relies upon the Connection Tracking System (conntrack).
Thus, traffic coming in the reply direction which does not belong to this
node is labeled as INVALID for the TCP and ICMP protocols. The scripts add a
rule to drop this traffic to avoid possible packet duplication. For UDP
traffic, you will have to add a rule to drop NEW traffic in the reply
direction because conntrack considers it valid. If you don't do this, both
nodes may accept reply traffic and thus send duplicated packets to the
client, which is not what you want.

During my last experiments, I was using Linux kernel 2.6.37 on the firewalls
and the server. Everything you need to set up this configuration is available
in stock Linux kernels. No external patches with new features are required.

== 0x2 running scripts ==

Copy the script to each node, then adjust the script variables to your
configuration.

On firewall 1:

    firewall1# ./clusterip-node1.sh start

On firewall 2:

    firewall2# ./clusterip-node2.sh start

== 0x3 trouble-shooting ==

Some troubleshooting may help to understand how this setup works. Check the
following if you experience problems:

1) Check that the multicast MAC addresses are assigned to the NICs:

    firewall1$ ip maddr
    [...]
    2:  eth1
        [...]
        link  01:00:5e:00:01:01 static
    3:  eth2
        [...]
        link  01:00:5e:00:01:02 static

The scripts add the multicast MAC addresses to the NICs; if this is not done,
the traffic will be discarded by the firewalls' networking stack.

2) ICMP ping the server from one of the clients:

    client$ ping -c 1 192.168.1.2
    PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
    64 bytes from 192.168.1.2: icmp_seq=1 ttl=63 time=0.220 ms

    --- 192.168.1.2 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.220/0.220/0.220/0.000 ms

If this does not work, make sure the firewalls are including the multicast
MAC address in their ARP replies. You can check this by looking at the
neighbour cache:

    client$ ip neighbour
    [...]
    192.168.0.5 dev eth1 lladdr 01:00:5e:00:01:01 REACHABLE

    server$ ip neighbour
    [...]
    192.168.1.5 dev eth1 lladdr 01:00:5e:00:01:02 REACHABLE

    firewall$ ip neighbour
    [...]
    192.168.0.5 dev eth1 lladdr 01:00:5e:00:01:01 REACHABLE
    192.168.1.5 dev eth2 lladdr 01:00:5e:00:01:02 REACHABLE

3) Test TCP connections: you can use netcat to start simple connections
between the client and the server. You can also use intensive HTTP traffic
generation to test performance, for example with injectX.c and httpterm from
Willy Tarreau:

http://1wt.eu/tools/inject/
http://1wt.eu/tools/httpterm/

    clientA:~/http-client-benchmark# ./client -t 60 -u 200 -G 192.168.1.2:8000
    #   hits hits/s   ^h/s   ^bytes  kB/s  errs   rst  tout  mhtime
      266926  26692  26766  3881270  3779     0     0     0   0.237
      294067  26733  27141  3935621  3785     0     0     0   0.176

    clientB:~/http-client-benchmark# ./client -t 30 -u 40 -G 192.168.1.2:8020
    #   hits hits/s   ^h/s   ^bytes  kB/s  errs   rst  tout  mhtime
       53250  17750  17368  2518448  2513     0     0     0   0.240
       70766  17691  17516  2539907  2505     0     0     0   0.297

^h/s is the current number of HTTP requests per second. Adding up both
clients, this means that you get ~45000 HTTP requests per second. In my
setup, with only one firewall active I get ~27000 HTTP requests per second.
We obtain extra performance of ~66%, not that bad 8-).

I have configured httpterm to send objects of 0 bytes over HTTP to obtain the
maximum number of HTTP flows. This is the worst-case scenario for firewall
load.

I forgot to mention that I set CPU affinity for the NIC IRQs. I've got two
cores, one for each firewall NIC.
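
The IRQ affinity setting mentioned above can be sketched as follows. This is
only a minimal sketch and is not taken from the scripts in this directory: the
interface names eth1 and eth2 come from the testbed, while the helper names
(cpu_mask, irqs_for, pin_nic), the choice of core 0 for eth1 and core 1 for
eth2, and the DRY_RUN and INTERRUPTS variables are assumptions made for
illustration. It uses the standard /proc/interrupts and
/proc/irq/<n>/smp_affinity interfaces, and by default it only prints the
commands it would run, since writing the affinity mask requires root.

```shell
#!/bin/sh
# Sketch: pin the IRQs of each firewall NIC to its own CPU core.
# Assumptions (not from the original scripts): eth1 -> core 0, eth2 -> core 1.

DRY_RUN="${DRY_RUN:-1}"                       # 1 = only print the commands
INTERRUPTS="${INTERRUPTS:-/proc/interrupts}"  # overridable, e.g. for testing

# Print the hexadecimal smp_affinity bitmask that selects a single CPU.
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Print the IRQ numbers whose /proc/interrupts line mentions the interface.
# Note: this is a plain pattern match, so "eth1" would also match "eth10".
irqs_for() {
    [ -r "$INTERRUPTS" ] || return 0
    awk -v nic="$1" '$0 ~ nic { sub(":", "", $1); print $1 }' "$INTERRUPTS"
}

# Bind every IRQ of interface $1 to CPU $2.
pin_nic() {
    mask=$(cpu_mask "$2")
    for irq in $(irqs_for "$1"); do
        if [ "$DRY_RUN" = "1" ]; then
            echo "echo $mask > /proc/irq/$irq/smp_affinity"
        else
            echo "$mask" > "/proc/irq/$irq/smp_affinity"
        fi
    done
}

pin_nic eth1 0
pin_nic eth2 1
```

Run it once as-is to review the printed commands, then again as root with
DRY_RUN=0 to apply them. If a daemon such as irqbalance is running on the
firewall, it may rewrite these masks later, so you probably want to stop it
first.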
== 0x4 report successful setups ==

My testbed is composed of low-cost, basic, five-year-old HP ProLiant systems,
so you can see that the numbers are not great. I like knowing about numbers;
I'd appreciate it if you dropped me a line to tell me the numbers that you get
and your experience.

== 0x5 conclusions and future works ==

The cluster match allows you to set up load-sharing hash-based stateful
firewalls, which is a way to avoid having a spare backup firewall, as happens
in classical Primary-Backup setups. Still, there is some pending work to
fully integrate conntrackd and HA managers with it (in case you want high
availability, of course).

-o-

[1] More specifically, it's an RFC 1812 (section 3.3.2) violation. It's been
reported that this is a problem for Cisco routers:

http://marc.info/?l=netfilter&m=128810399113170&w=2

Michele Codutti: "The problem is the multicast MAC address that these routers
doesn't "like". They discard any incoming packet with MAC multicast address to
be compliant with RFC1812. The only documented (by Cisco) workaround is to put
a fixed arp entry with the multicast address that maps the clustered IP in the
router."

If you keep reading the mailing list thread, the reported problem affected the
Cisco 7200 VXR.

--02/02/2010