I'm working on a project to take two FortiGate 60C routers, two Dell 5424 switches, and two Dell R410 servers (each with two GE interfaces) and lash them together such that a hardware failure in any one component leaves the service running. This is conceptually pretty simple, but in practice various constraints mean it is not as trivial as it seems. There are several approaches I could take:

  • create a left and a right vlan, plumb one leg of each server into a different vlan on each switch, and run a routing protocol like OSPF or BGP over both legs.

This would work, but it adds complexity on each server, which could limit the server OS choices (we'd need support for whatever protocol is chosen).

  • create a left and a right vlan, and use the load balancer on the fortigates to monitor the servers and decide which way to send traffic.

This will work OK for inbound traffic, but creates some complexity for outbound traffic - how does each server determine which vlan to use for return traffic?

  • run RSTP on the servers, and let layer two deal with the problem, use the fortigates to load balance between the servers in one vlan.

This has charm, in that it makes the IP build of the network much simpler. OTOH it pushes some complexity further down the stack, and it places similar restrictions on server OS choice, in that whatever we run needs to support RSTP. In practice, we mainly want Debian running OpenVZ, so this shouldn't be a problem.

The FortiGates don't support STP, other than basic passing or blocking of STP frames, but if we're running them as an active/standby HA pair only one will be active, so it should all work.

The Dell switches support original IEEE STP, RSTP and MST (but not Cisco PVST+ and PVRST+), which means that they should interoperate with a Linux bridge running STP/RSTP. The devil, of course, is in the detail - the RSTP support in Linux is not well documented.

To start with, I tested standard STP: install Debian squeeze plus the bridge-utils and vlan packages. I made a bridge in /etc/network/interfaces:

iface br1 inet dhcp
 bridge_ports eth0 eth1
 bridge_stp on

I then plumbed eth0 into one switch and eth1 into the other. The switch ports were configured as access ports in a vlan with a DHCP server.

Turn on RSTP on the switches - it should be backwards compatible with standard STP. Make sure you know where the root of the spanning tree is - it is important that the root is on one of the switches, so the Linux boxes must lose any STP elections. On the primary switch, I did:

spanning-tree mode rstp 
spanning-tree priority 0

and on the secondary:

spanning-tree mode rstp 
spanning-tree priority 4096

(you can also use the web GUI). Run

# ifup br1

wait 30 seconds, and

# brctl showstp br1 

should show one port forwarding, one blocking, and you should have connectivity (DHCP may time out, in which case setting a static IP address on the bridge would be simpler). You can randomly pull cables from the server, or power down the switches, and you should retain connectivity, albeit with 30-60 second gaps while STP reconverges.
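
To put numbers on those gaps, something like the following can run on another machine in the same vlan while you pull cables. It is only a sketch: the function names, the one-probe-per-second cadence and the placeholder address in the usage comment are assumptions of mine, and it assumes the iputils ping, where -W is the reply timeout in seconds. The reported figure is the time between the last good probe before an outage and the first good probe after it, so it is an upper bound on the real gap.

```shell
#!/bin/sh
# Sketch: measure connectivity gaps while STP reconverges.
# Function names and probe cadence are illustrative assumptions.

# probe_loop HOST COUNT
# Print one line per probe: "<epoch seconds> <1|0>", 1 meaning a reply arrived.
probe_loop() {
    host=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        if ping -c 1 -W 1 "$host" >/dev/null 2>&1; then
            echo "$(date +%s) 1"
        else
            echo "$(date +%s) 0"
        fi
        i=$((i + 1))
        sleep 1
    done
}

# longest_outage
# Read probe_loop output on stdin and print the longest gap, in seconds,
# between two successful probes that had at least one failure between them.
# Failures before the first success, or after the last one, are not counted.
longest_outage() {
    awk '
        $2 == 1 {
            if (down && $1 - last_ok > max) max = $1 - last_ok
            last_ok = $1; down = 0; seen = 1
        }
        $2 == 0 { if (seen) down = 1 }
        END { print max + 0 }
    '
}

# Usage (192.0.2.1 is a placeholder for something beyond the bridge):
#   probe_loop 192.0.2.1 300 | tee probes.log | longest_outage
```

Keeping the raw probe log around is handy, because you can line the failures up against the time you pulled each cable.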

So far so good; on to RSTP. I gleaned most of this from the Linux bridge mailing list, particularly the thread starting at https://lists.linux-foundation.org/pipermail/bridge/2008-March/005765.html

First up, do the install dance:

# aptitude install build-essential
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/rstp.git
# cd rstp
# make clean ; make ; make install

Included in the git repo is a bridge-stp script that attempts to start the rstpd daemon and enable RSTP for you. This script needs to be at /sbin/bridge-stp - it's called every time you try to enable STP on a bridge, and needs to return zero for RSTP to be enabled (there are more details in the thread linked above).

This is the most crucial point - you must have a /sbin/bridge-stp, and to enable RSTP it must return 0!
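
To make that contract concrete, here is a minimal sketch of a helper that says yes only for chosen bridges. It assumes the kernel invokes the helper as `/sbin/bridge-stp <bridge> start` when STP is turned on and `/sbin/bridge-stp <bridge> stop` when it is turned off, and that a non-zero exit from `start` makes the kernel fall back to its own STP. The RSTP_BRIDGES variable and the function name are hypothetical, and this is not the script shipped in the rstp repo.

```shell
#!/bin/sh
# Sketch of a selective bridge-stp helper (not the script from the rstp repo).
# RSTP_BRIDGES is a hypothetical setting: bridges to hand over to rstpd.
RSTP_BRIDGES="br1"

# bridge_stp_helper BRIDGE ACTION
# Return 0 (user-space STP wanted) only for listed bridges and known actions.
bridge_stp_helper() {
    bridge=$1
    action=$2
    for b in $RSTP_BRIDGES; do
        if [ "$b" = "$bridge" ]; then
            case "$action" in
                start|stop) return 0 ;;
                *) return 2 ;;          # unknown action: refuse
            esac
        fi
    done
    return 1                            # not listed: leave it to kernel STP
}

# Installed as /sbin/bridge-stp, the script would end with:
#   bridge_stp_helper "$1" "$2"
#   exit $?
```

The point of listing bridges explicitly is that any other bridge on the box with bridge_stp on keeps using the kernel's STP rather than waiting on a daemon that knows nothing about it.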

You need to read the bridge-stp script from the repo and decide if it's going to do what you want - there are various fixes discussed in the thread if you do decide to use it. I decided to start rstpd from init.d, and so made a null bridge-stp script:

# cat > /sbin/bridge-stp
#!/bin/sh
exit 0
^D

and a simple init script for rstpd by copying and modifying /etc/init.d/skeleton:

# diff skeleton rstpd
 3,10c3,10
 < # Provides:          skeleton
 < # Required-Start:    $remote_fs $syslog
 < # Required-Stop:     $remote_fs $syslog
 < # Default-Start:     2 3 4 5
 < # Default-Stop:      0 1 6
 < # Short-Description: Example initscript
 < # Description:       This file should be used to construct scripts to be
 < #                    placed in /etc/init.d.
 ---
 > # Provides:          rstpd
 > # Required-Start:    mountkernfs $local_fs
 > # Required-Stop:     $local_fs
 > # Should-Start:      ifupdown
 > # Should-Stop:       ifupdown
 > # Default-Start:     S
 > # Default-Stop:      0 6
 > # Short-Description: Start the Rapid STP Daemon
 22,25c22,25
 < DESC="Description of the service"
 < NAME=daemonexecutablename
 < DAEMON=/usr/sbin/$NAME
 < DAEMON_ARGS="--options args"
 ---
 > DESC="Rapid STP Daemon"
 > NAME=rstpd
 > DAEMON=/sbin/$NAME
 > #DAEMON_ARGS="--options args"

Make sure you set insserv dependencies correctly - you want rstpd up and running before the bridges are brought up, or turning on RSTP won't work. I ended up changing the dependencies in /etc/init.d/networking to require rstpd first.

Once rstpd is running from boot, we need to add a line to turn on RSTP in /etc/network/interfaces:

iface br1 inet dhcp
 bridge_ports eth0 eth1
 bridge_stp on
 up rstpctl rstp br1 on

Bring it up; if RSTP works, you should get:

# ifup br1
# cat /sys/class/net/br1/bridge/stp_state
2

# rstpctl showportdetail br1 eth0
Stp Port eth0: PortId: 8001 in Bridge 'br1':
Priority:          128
State:             Discarding             Uptime: 723      
PortPathCost:      admin: Auto            oper: 20000    
Point2Point:       admin: Auto            oper: Yes      
Edge:              admin: N               oper: N        
Partner:                                  oper: Rapid    
PathCost:          20000

The rstpctl output should show you what operating mode the port is in, and what RSTP state it is in. Now you can unplug cables and power down switches, and you should see outages of at most a couple of seconds. All good.
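
If you want to check this from a script rather than by eye, the two outputs above are easy to pick apart. The following is a sketch under a couple of assumptions: the stp_state values are taken to mean 0 = STP off, 1 = kernel STP, 2 = user-space STP (which is why 2 is the answer we want), and the rstpctl layout is assumed to match the sample output shown above. The function names are mine.

```shell
#!/bin/sh
# Sketch: interpret the bridge STP mode and the rstpctl port details.

# stp_mode_name VALUE
# Translate the contents of /sys/class/net/<bridge>/bridge/stp_state.
stp_mode_name() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "kernel STP" ;;
        2) echo "user-space STP" ;;
        *) echo "unknown" ;;
    esac
}

# port_state
# Read `rstpctl showportdetail` output on stdin and print the port state
# (e.g. Discarding or Forwarding).
port_state() {
    awk '$1 == "State:" { print $2; exit }'
}

# partner_mode
# Read the same output and print the partner's operating mode (e.g. Rapid).
partner_mode() {
    awk '$1 == "Partner:" { print $3; exit }'
}

# Usage:
#   stp_mode_name "$(cat /sys/class/net/br1/bridge/stp_state)"
#   rstpctl showportdetail br1 eth0 | port_state
#   rstpctl showportdetail br1 eth0 | partner_mode
```

A partner mode other than Rapid would suggest the switch port has fallen back to plain STP, which is worth knowing before you start timing failovers.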

Dealing with VLANs.

Standard 802.1w RSTP is agnostic on the subject of vlans - it describes how you create a single spanning tree for your entire Ethernet, regardless of what you might be doing with vlans (as opposed to PVST+ and PVRST+, the Cisco proprietary STP variants that run a spanning tree per vlan). The simplicity of 802.1w has a couple of implications:

  • you can't do the vlan load balancing tricks that are common in the Cisco world - rooting different vlans on different switches so that you make some use of your backup links. In practice, that means that you only get 1Gb out of each server, not 2Gb - if maximal bandwidth out of your servers is the desired outcome, RSTP isn't going to get you there.

  • it doesn't really make sense to bolt vlan interfaces into different bridges on a Linux box if you're going to run STP - each bridge will try to run STP itself, and madness will ensue.

So the way that I ended up doing vlans is mildly counterintuitive - by making vlan subinterfaces of the first bridge and then adding them as ports to another bridge that exists only within the server, and doesn't run STP. So, for example, to make vlan 101 on the switch network available to some VEs on the box, I converted the switchports facing the Linux boxes into trunks, with vlan 101 tagged. I then added this to /etc/network/interfaces:

auto br1.101
iface br1.101 inet manual
 vlan_raw_device br1

auto vlan101
iface vlan101 inet manual
 bridge_ports br1.101
 bridge_stp off
 bridge_fd 0
 bridge_maxwait 0

This gives you a bridge called vlan101 that you can then insert your OpenVZ/KVM/LXC virtual ethernet devices into. You can repeat ad nauseam for other vlans.

There's an implicit race in the above config, in that br1.101 needs to be up before vlan101 can add it as a port. At the loss of some readability, a more robust approach is probably:

auto vlan101
iface vlan101 inet manual
 bridge_ports br1.101
 bridge_stp off
 bridge_fd 0
 bridge_maxwait 0
 pre-up vconfig set_name_type DEV_PLUS_VID_NO_PAD
 pre-up vconfig add br1 101
 post-down vconfig rem br1.101
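
Since the stanza is identical apart from the vlan number, repeating it for other vlans is easy to script. This is just a sketch: the function name is made up, and vlans 102 and 103 in the usage comment are hypothetical. It prints the more robust form of the stanza to stdout so you can review it before pasting it into /etc/network/interfaces.

```shell
#!/bin/sh
# Sketch: print an /etc/network/interfaces stanza for one vlan bridge.

# vlan_bridge_stanza BASE VID
# BASE is the STP-running bridge (br1 here), VID the vlan number.
vlan_bridge_stanza() {
    base=$1
    vid=$2
    cat <<EOF
auto vlan${vid}
iface vlan${vid} inet manual
 bridge_ports ${base}.${vid}
 bridge_stp off
 bridge_fd 0
 bridge_maxwait 0
 pre-up vconfig set_name_type DEV_PLUS_VID_NO_PAD
 pre-up vconfig add ${base} ${vid}
 post-down vconfig rem ${base}.${vid}

EOF
}

# Usage (102 and 103 are hypothetical extra vlans):
#   for vid in 101 102 103; do vlan_bridge_stanza br1 "$vid"; done
```

Remember that each vlan also has to be tagged on the switch trunk ports facing the servers, or the new bridge will come up but carry no traffic.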

Last observations

  • the native vlan under Linux can be a bit of a lottery - if you're making vlan subinterfaces of any interface (hardware interfaces, or bridges, or other vlans), then you should not be surprised if the base device stops working (i.e. mixing eth0 and eth0.101 and eth0.1234 can lead to unpredictable behavior, and eth0 and eth0.101 and eth0.101.2345 even more so).

  • it is a good idea to enable root-guard on the switch ports facing the servers, if the switch supports it - that will stop a server from taking over as root by setting its bridge priority to zero.
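
To see why a server at priority zero is a problem, recall that the root is simply the bridge with the numerically lowest bridge ID, which is the priority followed by the MAC address (brctl prints it as four hex digits, a dot, then the MAC). The sketch below mimics that comparison. The function names and all MAC addresses are invented for illustration, and it assumes every ID is written in the same fixed-width hex form so that a plain byte-wise sort gives the right order.

```shell
#!/bin/sh
# Sketch: which bridge wins the root election? Lowest bridge ID wins.
# All MAC addresses used with these functions are made up.

# bridge_id PRIORITY MAC
# Format a decimal priority and a 12-hex-digit MAC the way brctl shows them.
bridge_id() {
    printf '%04x.%s\n' "$1" "$2"
}

# elect_root
# Read bridge IDs on stdin, one per line, and print the winner.
elect_root() {
    tr 'A-F' 'a-f' | LC_ALL=C sort | head -n 1
}

# Usage: two switches at priority 0 and 4096, a server at the default 32768:
#   { bridge_id 0 aaaaaaaaaaaa
#     bridge_id 4096 bbbbbbbbbbbb
#     bridge_id 32768 001122334455; } | elect_root
```

With the priorities used earlier the primary switch wins. Rerun it with the server at priority 0 and the server's lower MAC address takes the root, which is exactly the situation root-guard is there to prevent.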