Load balancing your FTP service

Load balancing a network protocol is quite common nowadays. There are plenty of ways to do it for HTTP, for instance, and generally speaking any “single flow” protocol can be load-balanced quite easily.
However, some protocols are not as simple as HTTP and require several connections. FTP is exactly such a protocol.

Reminder: FTP modes

Let's take a deeper look at the FTP protocol in order to better understand how we can load-balance it.
In order for an FTP client to work properly, two connections must be opened between the client and the server:

  • A control connection
  • A data connection

The control connection is initiated by the FTP client to TCP port 21 on the server. The data connection, on the other hand, can be created in different ways.

The first way is through an “active” FTP session. In this mode the client opens a random local port and sends a “PORT” command instructing the server to connect to it, using port 20 as the source port.
This mode is usually discouraged, and some server configurations even forbid it, for security reasons (the server initiates the data connection to the client).
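To make the active-mode exchange concrete, here is a minimal bash sketch (the function name port_command is hypothetical) that builds the command a client would send. RFC 959 encodes the client IP address as four comma-separated bytes, followed by the listening port split into a high byte and a low byte (port = p1 * 256 + p2).

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the PORT command sent by a client in active mode.
# Format: PORT h1,h2,h3,h4,p1,p2 where port = p1 * 256 + p2.
port_command() {
  local ip=$1 port=$2
  # Replace the dots of the IP with commas, then append the two port bytes.
  echo "PORT ${ip//./,},$(( port / 256 )),$(( port % 256 ))"
}

port_command 10.0.0.5 50123
# prints: PORT 10,0,0,5,195,203
```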

The second FTP mode is the “passive” mode. In passive mode the client sends a “PASV” command to the server. In response, the server opens a TCP port and sends its IP address and that port number in the PASV reply, so the client knows which socket to connect to. Modern FTP clients usually try this mode first if the server supports it.
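The same two-byte port encoding appears in the PASV reply. A minimal bash sketch to decode it (parse_pasv is a hypothetical helper) could look like this; it assumes the common form where the six numbers are wrapped in parentheses, although the exact wording of the 227 reply varies between servers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode the "h1,h2,h3,h4,p1,p2" tuple of a PASV reply
# into "IP PORT". Assumes the numbers are enclosed in parentheses.
parse_pasv() {
  local reply=$1 inner h1 h2 h3 h4 p1 p2
  inner=${reply#*(}    # keep what follows the opening parenthesis
  inner=${inner%%)*}   # drop the closing parenthesis and anything after it
  IFS=, read -r h1 h2 h3 h4 p1 p2 <<< "$inner"
  # The data port is sent as two bytes: port = p1 * 256 + p2.
  echo "$h1.$h2.$h3.$h4 $(( p1 * 256 + p2 ))"
}

parse_pasv '227 Entering Passive Mode (192,168,0,39,78,37).'
# prints: 192.168.0.39 20005
```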

There is a third mode, the “extended passive” mode. It is very similar to the “passive” mode, but the client sends an “EPSV” command (instead of “PASV”) and the server responds with only the number of the TCP port chosen for the data connection (without the IP address).
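For comparison, here is a sketch of how a client could interpret an EPSV reply (epsv_target is a hypothetical helper). RFC 2428 wraps the port between delimiters, usually “|”, as in “(|||20005|)”. Since no address is sent, the client pairs the port with the IP address it already used for the control connection.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an EPSV reply into "IP PORT", where IP is the
# address the client used for the control connection (passed as $2).
epsv_target() {
  local reply=$1 control_ip=$2 inner d port
  inner=${reply#*(}            # keep what follows the opening parenthesis
  inner=${inner%%)*}           # drop the closing parenthesis and the rest
  d=${inner:0:1}               # delimiter chosen by the server, usually "|"
  port=${inner#"$d$d$d"}       # strip the three leading delimiters
  port=${port%"$d"}            # strip the trailing delimiter
  echo "$control_ip $port"
}

epsv_target '229 Entering Extended Passive Mode (|||20005|)' 192.168.0.39
# prints: 192.168.0.39 20005
```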

Load balancing concepts & deployment

So now that we know how FTP works, we also know that load-balancing FTP requires balancing both the control connections and the data connections. The load balancer must also make sure that data connections are sent to the right backend server: the one that replied to the client's command.

Load balancing with LVS/Keepalived

Keepalived is a Linux-based load-balancing system. It wraps the IPVS (also called LVS) software stack from the Linux Virtual Server project and offers additional features such as backend monitoring and VRRP redundancy.
The diagram below shows how Keepalived handles FTP load balancing. It balances the control connections on port 21 and handles the data connections dynamically through a Linux kernel module called “ip_vs_ftp”, which inspects the control connection to learn which port will be used to open the data connection.

The configuration steps are quite simple. Below is an example for a Debian-like system:

First install the software:
sudo apt-get install keepalived

Then create a configuration file from the sample:
sudo cp /usr/share/doc/keepalived/samples/keepalived.conf.sample /etc/keepalived/keepalived.conf

Edit the newly created file in order to add a new virtual server and the associated backend servers:
virtual_server 192.168.0.39 21 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT
    protocol TCP
    real_server 10.1.2.101 21 {
        weight 1
        TCP_CHECK {
            connect_port 21
            connect_timeout 3
        }
    }
    real_server 10.1.2.102 21 {
        weight 1
        TCP_CHECK {
            connect_port 21
            connect_timeout 3
        }
    }
}

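The example above defines a virtual server that listens on 192.168.0.39:21. Connections sent to this socket are distributed to the backend servers with the round-robin algorithm, using NAT (what LVS calls “masquerading”): the load balancer rewrites the destination address, so the backend servers must route their replies back through it.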
Additionally, we need to load the FTP helper in order to track FTP data connections, both right away and at boot time:
sudo modprobe ip_vs_ftp
echo 'ip_vs_ftp' | sudo tee -a /etc/modules

It is important to note that this setup relies on the FTP kernel helper, which reads the content of the FTP control connection. This means that it doesn't work when FTP is secured with SSL/TLS.

Dealing with secure FTP

When the FTP session is encrypted (end-to-end encryption, of course), a proxy or load balancer cannot inspect the session to anticipate which ports will be opened for the data transfer channel.
It is therefore preferable to reserve a distinct port range for each backend server. This setup is more painful to maintain, but it works around the problems that come with encryption.

Basically, we will divide the load balancing into several parts:

  • control connection load balancing
  • data transfer connection load balancing, with one load balancing pool per backend server

So if we have 2 backend servers, as shown in the diagram below, we will create 3 load balancing pools (let's call them that for now).
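As an illustration of these three pools, the bash sketch below maps a destination port to the pool that should handle it. The backend addresses (10.1.2.101, 10.1.2.102) and passive ranges (20000-20009, 20010-20019) are example values matching the configurations used in this article, and pool_for_port is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical mapping of destination ports to the three pools:
# port 21 is shared by all backends, each passive range belongs to one backend.
pool_for_port() {
  local port=$1
  if (( port == 21 )); then
    echo "control pool: 10.1.2.101 10.1.2.102 (round-robin)"
  elif (( port >= 20000 && port <= 20009 )); then
    echo "data pool 1: 10.1.2.101 only"
  elif (( port >= 20010 && port <= 20019 )); then
    echo "data pool 2: 10.1.2.102 only"
  else
    echo "no pool"
    return 1
  fi
}

for p in 21 20005 20015; do
  printf '%s -> %s\n' "$p" "$(pool_for_port "$p")"
done
```

The point is that only the control pool is shared: once a backend has announced a port from its own range, the data connection can only land on that backend.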

Prerequisites
  • The FTP server must be able to specify the ports to be used for the passive mode.
    For example, ProFTPd uses the configuration settings below:

    PassivePorts 20010 20019

    Obviously we would have to set different values on the other servers, making sure port ranges do not overlap.

  • It is important to note that, if you are using a NAT system in front of your FTP server, the server must allow you to specify the external IP address. For example, ProFTPd uses the configuration parameter below:

    MasqueradeAddress 192.168.0.39
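Since every backend needs its own passive range, it can be worth checking the ranges before deploying them. Here is a small bash sketch (ranges_disjoint is a hypothetical helper) that takes one inclusive “low high” pair per backend, as in a PassivePorts directive, and reports the first overlap it finds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the inclusive "low high" ranges given as
# arguments (one per backend) are pairwise disjoint; otherwise print the
# first conflicting pair and return 1.
ranges_disjoint() {
  local -a lows=() highs=()
  local r low high i j
  for r in "$@"; do
    read -r low high <<< "$r"
    lows+=("$low")
    highs+=("$high")
  done
  for (( i = 0; i < ${#lows[@]}; i++ )); do
    for (( j = i + 1; j < ${#lows[@]}; j++ )); do
      # Two inclusive ranges overlap when each starts before the other ends.
      if (( lows[i] <= highs[j] && lows[j] <= highs[i] )); then
        echo "overlap: ${lows[i]}-${highs[i]} and ${lows[j]}-${highs[j]}"
        return 1
      fi
    done
  done
  return 0
}

ranges_disjoint "20000 20009" "20010 20019" && echo "ranges OK"
ranges_disjoint "20000 20010" "20010 20020" || echo "fix the PassivePorts ranges"
```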

If you have NAT in place but your FTP server cannot set the IP address returned in a PASV reply, most FTP clients won't work (hence the requirement)! You can still try to force your FTP client to use EPSV instead of PASV even over IPv4, but most clients won't let you do that.

Indeed, the response to a PASV command contains both a port and an IP address. If your FTP server does not know the IP address the client uses to reach it, and the load balancer cannot rewrite this address on the fly (because the stream is encrypted), the data connection is likely to fail. A common symptom is that you can log in to the server but you can't browse the folder content.
The EPSV reply, on the other hand, contains only a port number, and the client reuses the IP address it used to establish the control connection, whatever the actual IP address of the server.
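To see why the announced address matters, the bash sketch below (check_pasv_address is a hypothetical helper) compares the address found in a PASV reply with the address the client used for the control connection, here the example virtual IP 192.168.0.39. A backend that is not configured with the external address would typically announce its private address instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the address announced in a PASV reply ($1)
# matches the address the client used for the control connection ($2).
check_pasv_address() {
  local reply=$1 control_ip=$2 inner h1 h2 h3 h4 p1 p2 announced
  inner=${reply#*(}
  inner=${inner%%)*}
  IFS=, read -r h1 h2 h3 h4 p1 p2 <<< "$inner"
  announced="$h1.$h2.$h3.$h4"
  if [[ $announced == "$control_ip" ]]; then
    echo "ok: data connection to $announced:$(( p1 * 256 + p2 ))"
  else
    echo "mismatch: server announced $announced, client reached $control_ip"
    return 1
  fi
}

check_pasv_address '227 Entering Passive Mode (192,168,0,39,78,37).' 192.168.0.39
check_pasv_address '227 Entering Passive Mode (10,1,2,101,78,37).' 192.168.0.39 \
  || echo "the client would try to reach a private backend address"
```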

FTP/TLS load balancing with Keepalived/LVS

Provided the prerequisites stated above are met, this method should work with both FTPS (SSL wrapper) and FTP over TLS.

It is not possible with LVS to specify a port range in the configuration (short of creating a “virtual_server” for each port, which would be tedious and inefficient). However, we can use a funky iptables feature: packet mangling. Here we ask the Linux kernel to apply a mark to the packets of the FTP/TLS data connections (based on their port numbers). We can then reference this mark (fwmark) in the Keepalived configuration as a virtual_server, instead of referencing a socket.

First, we need to create the iptables rules. In our example we would then have:

sudo iptables -t mangle -A PREROUTING -i eth1 -p tcp --sport 1024: --dport 20000:20009 -j MARK --set-mark 0x1
sudo iptables -t mangle -A PREROUTING -i eth1 -p tcp --sport 1024: --dport 20010:20019 -j MARK --set-mark 0x2
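With more backends, writing these rules by hand gets repetitive. The hypothetical bash helper below prints one MARK rule per backend, giving each one a consecutive block of ports. The defaults (interface eth1, first port 20000, 10 ports per backend) are the values from this example and can be overridden through the equally hypothetical IFACE, BASE_PORT and BLOCK_SIZE variables. It only prints the commands, so you can review them before running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one iptables MARK rule per backend ($1 = number
# of backends). Backend n gets ports BASE_PORT+(n-1)*BLOCK_SIZE onwards and
# mark n. Nothing is executed; the commands are only printed.
gen_mark_rules() {
  local iface=${IFACE:-eth1} base=${BASE_PORT:-20000} size=${BLOCK_SIZE:-10}
  local n=$1 i low high
  for (( i = 1; i <= n; i++ )); do
    low=$(( base + (i - 1) * size ))
    high=$(( low + size - 1 ))
    printf 'sudo iptables -t mangle -A PREROUTING -i %s -p tcp --sport 1024: --dport %d:%d -j MARK --set-mark 0x%x\n' \
      "$iface" "$low" "$high" "$i"
  done
}

gen_mark_rules 2
```

With the defaults, this prints the two rules above; mark n is what the fwmark n virtual server refers to.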

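Then we just add the configuration below to the /etc/keepalived/keepalived.conf file. The control connections on port 21 still need their own virtual_server, like the one shown in the previous section (the ip_vs_ftp helper is of no use here, since the stream is encrypted):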


virtual_server fwmark 1 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT
    protocol TCP

    real_server 10.1.2.101 {
        TCP_CHECK {
            connect_port 21
            connect_timeout 3
        }
    }
}

virtual_server fwmark 2 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT
    protocol TCP

    real_server 10.1.2.102 {
        TCP_CHECK {
            connect_port 21
            connect_timeout 3
        }
    }
}

Load balancing with HAProxy

Provided the prerequisites stated above are met, this method should work with both FTPS (SSL wrapper) and FTP over TLS.

HAProxy is a modern and widely used load balancer. It provides features similar to Keepalived's, and much more. Nevertheless, HAProxy is not able to track data connections as part of the overall FTP session. For this reason we have to trick the FTP protocol in order to keep connections consistent within the session.

First install the software:
sudo apt-get install haproxy

HAProxy has the notion of “frontends” and “backends”. A frontend defines the sockets (or set of sockets) HAProxy listens on, and each frontend can be linked to its own backend.

So we can use the configuration below:


# TCP mode everywhere: HAProxy only relays the (possibly encrypted) streams
frontend alfControlChannel
    mode tcp
    bind *:21
    default_backend alfPool

frontend alf1DataChannel
    mode tcp
    bind *:20000-20009
    default_backend alf1

frontend alf2DataChannel
    mode tcp
    bind *:20010-20019
    default_backend alf2

# No port on the server lines: HAProxy connects to the port the client reached
backend alfPool
    mode tcp
    server alf1 10.1.2.101 check port 21 inter 20s
    server alf2 10.1.2.102 check port 21 inter 20s

backend alf1
    mode tcp
    server alf1 10.1.2.101 check port 21 inter 20s

backend alf2
    mode tcp
    server alf2 10.1.2.102 check port 21 inter 20s

So in this case the frontend that handles control connection load balancing (alfControlChannel) distributes connections alternately across all backend servers (alfPool). Each server (alf1 & alf2) negotiates a data transfer socket on a different frontend (alf1DataChannel & alf2DataChannel). Each of these frontends only forwards data connections to the corresponding backend (alf1 or alf2), which makes the load balancing sticky.
And… job done!

