Wednesday, March 25, 2020

Reply UDP with correct source address on a multihomed Linux server

This article is also available in Chinese: https://blog.swineson.me/________ (not yet published)

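Multihoming means connecting a machine to multiple computer networks. A multihomed server has multiple IP addresses on one or more network interfaces.
One particular problem on Linux is that a UDP server whose socket is bound to the wildcard address (0.0.0.0) sends its replies from whatever source address the routing table selects, usually the primary IP address of the outgoing interface, even when the request was sent to a secondary IP address. A client using a connected UDP socket, for example, will never receive such a reply, because it only accepts datagrams from the address it sent to.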
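To see which source address the kernel would pick for a reply, you can ask the routing code directly with ip route get. The bash function below is a small sketch (route_get_field is a hypothetical name) that prints the word following a given keyword, such as src or dev, in that output; it assumes the usual single-line ip -4 route get format.

```shell
# route_get_field KEY: read "ip route get" output on stdin and print the
# word that follows KEY (for example "src" or "dev"), then stop.
# Hypothetical helper; assumes the usual single-line output format.
route_get_field() {
    awk -v key="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == key) { print $(i + 1); exit }
        }'
}

# Usage (203.0.113.1 stands for a client address):
#   ip -4 route get 203.0.113.1 | route_get_field src
# With the routing tables in the examples below, this is expected to print
# the primary address 192.0.2.2 rather than the secondary one.
```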

Fix it in userland, if possible

First, check whether your application supports multihoming. For example, OpenVPN has a --multihome option for UDP servers that makes it send replies from the address the client is talking to.
Alternatively, if the application can bind to a specific IP address, you can run one instance per IP address.
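If you go the one-instance-per-address route, you first need the list of addresses. A minimal bash sketch, assuming the output format of ip -o -4 addr show; local_ipv4s is a hypothetical helper, and my-udp-server with its --bind option in the usage comment is only a placeholder for your own program:

```shell
# local_ipv4s: read "ip -o -4 addr show" output on stdin and print each
# IPv4 address without its prefix length, one address per line.
# In that format field 3 is "inet" and field 4 is "ADDRESS/PREFIX".
local_ipv4s() {
    awk '$3 == "inet" { sub(/\/.*/, "", $4); print $4 }'
}

# Usage (my-udp-server and --bind are placeholders):
#   for ip in $(ip -o -4 addr show dev eth0 | local_ipv4s); do
#       my-udp-server --bind "$ip" &
#   done
```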

Otherwise, fix it on the kernel side

A: If all IP addresses are assigned to the same network interface

For example:
eth0:
    inet 192.0.2.2/24 brd 192.0.2.255 scope global eth0
        valid_lft forever preferred_lft forever
    inet 192.0.2.3/24 brd 192.0.2.255 scope global eth0
        valid_lft forever preferred_lft forever
The main routing table:
default via 192.0.2.1 dev eth0 onlink
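We use iptables to DNAT incoming UDP packets addressed to any local address to the primary IP address. The REDIRECT target does exactly that: it rewrites the destination to the primary address of the incoming interface. Afterwards, flush the conntrack table, because NAT decisions are made on the first packet of a connection and existing entries would keep their old mapping: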
sudo iptables -t nat -A PREROUTING -i eth0 -m addrtype --dst-type LOCAL -p udp -j REDIRECT
sudo conntrack -F
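Running the iptables command above twice appends a duplicate rule. If you put this setup in a script, a small bash helper like the hypothetical add_rule_once below appends a rule only when iptables -C reports that no identical rule exists yet:

```shell
# add_rule_once TABLE CHAIN RULE...: append an iptables rule only if an
# identical rule is not already in the chain. "iptables -C" exits with a
# non-zero status when no matching rule exists. Must be run as root.
add_rule_once() {
    local table="$1" chain="$2"
    shift 2
    if ! iptables -t "$table" -C "$chain" "$@" 2>/dev/null; then
        iptables -t "$table" -A "$chain" "$@"
    fi
}

# Example (case A):
#   add_rule_once nat PREROUTING -i eth0 -m addrtype --dst-type LOCAL -p udp -j REDIRECT
#   conntrack -F
```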
If you prefer the newer nftables over the older iptables, here are the equivalent nft commands:
sudo nft add table nat
sudo nft add chain nat prerouting { type nat hook prerouting priority dstnat \; }
sudo nft add rule ip nat prerouting iifname "eth0" meta l4proto udp fib daddr type local counter redirect
sudo conntrack -F
That's it, we're done.
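The nft commands above add the table, chain and rule one at a time. If you would rather keep the case A ruleset in a file and load it in a single nft transaction, a bash function like the hypothetical multihome_nft_a below can print it in nft -f syntax. Loading it a second time appends the rule again, so run nft flush chain ip nat prerouting first when reloading.

```shell
# multihome_nft_a IFACE: print the case A ruleset in "nft -f" syntax.
# Same table, chain and rule as the nft commands above; IFACE is the
# interface that carries all the addresses (eth0 in the example).
multihome_nft_a() {
    cat <<EOF
table ip nat {
    chain prerouting {
        type nat hook prerouting priority dstnat; policy accept;
        iifname "$1" meta l4proto udp fib daddr type local counter redirect
    }
}
EOF
}

# Example:
#   multihome_nft_a eth0 | sudo nft -f /dev/stdin && sudo conntrack -F
```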

B: If IP addresses are assigned to different network interfaces

For example:
eth0:
    inet 192.0.2.2/24 brd 192.0.2.255 scope global eth0
        valid_lft forever preferred_lft forever
eth1:
    inet 198.51.100.2/24 brd 198.51.100.255 scope global eth1
        valid_lft forever preferred_lft forever
The main routing table:
default via 192.0.2.1 dev eth0 onlink
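First, we disable the reverse path filter on eth1. With strict reverse path filtering, the kernel would drop incoming packets on eth1 whose source address, according to the main routing table, is reached via eth0: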
sudo sysctl net.ipv4.conf.eth1.rp_filter=0
You may want to put this setting in a file under /etc/sysctl.d/ (as the line net.ipv4.conf.eth1.rp_filter = 0) so that it is applied automatically at boot. If you are worried that disabling rp_filter lets spoofed packets in, you can add your own firewall rules on eth1 to drop unwanted traffic instead.
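One detail to check: the kernel applies the larger of net.ipv4.conf.all.rp_filter and the per-interface value, so if the all value is 1 on your system, setting eth1 to 0 alone changes nothing. The bash sketch below prints the effective mode; effective_rp_filter is a hypothetical name, and the SYSCTL_ROOT variable is an assumption added only so the function can be tried against a copy of /proc/sys.

```shell
# effective_rp_filter IFACE: print the rp_filter mode the kernel actually
# applies on IFACE, i.e. max(conf/all/rp_filter, conf/IFACE/rp_filter)
# (0 = off, 1 = strict, 2 = loose).
# SYSCTL_ROOT (default /proc/sys) is a hypothetical override for testing.
effective_rp_filter() {
    local root="${SYSCTL_ROOT:-/proc/sys}/net/ipv4/conf"
    local all iface
    all=$(cat "$root/all/rp_filter") || return 1
    iface=$(cat "$root/$1/rp_filter") || return 1
    if [ "$all" -gt "$iface" ]; then echo "$all"; else echo "$iface"; fi
}

# Example: effective_rp_filter eth1
# prints 0 only if net.ipv4.conf.all.rp_filter is 0 as well.
```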
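Next, we set up policy routing: packets carrying firewall mark 0x42 are looked up in routing table 42, whose default route goes out through eth1: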
sudo ip rule add fwmark 0x42 pref 42 table 42
sudo ip route add default table 42 via 198.51.100.1 dev eth1
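Depending on the kernel version, running ip rule add twice may leave a duplicate rule behind, so a setup script may want to check first. The sketch below (has_fwmark_rule is a hypothetical name) scans ip rule show output for the rule; it assumes table 42 has no name in /etc/iproute2/rt_tables, so ip prints it as lookup 42. ip route get with the mark option then shows which way a marked reply would be routed.

```shell
# has_fwmark_rule MARK TABLE: read "ip rule show" output on stdin and
# succeed (exit 0) if some rule has "fwmark MARK" and "lookup TABLE".
has_fwmark_rule() {
    awk -v mark="$1" -v table="$2" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "fwmark" && $(i + 1) == mark) m = 1
                if ($i == "lookup" && $(i + 1) == table) t = 1
            }
            if (m && t) { found = 1; exit }
            m = t = 0
        }
        END { exit !found }'
}

# Example:
#   ip rule show | has_fwmark_rule 0x42 42 || sudo ip rule add fwmark 0x42 pref 42 table 42
#   ip route get 203.0.113.1 mark 0x42   # should now go out via eth1
```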
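Then we set up connection tracking and DNAT for the UDP packets. Packets arriving on eth1 get mark 0x42, which is saved to their conntrack entry; outgoing packets restore the mark from their connection, so replies are routed via table 42 and leave through eth1. Finally, incoming UDP packets on eth1 addressed to any local address are DNATed to the primary IP address: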
sudo iptables -t mangle -A INPUT -i eth1 -j MARK --set-mark 0x42
sudo iptables -t mangle -A INPUT -i eth1 -j CONNMARK --save-mark
sudo iptables -t mangle -A OUTPUT -j CONNMARK --restore-mark
sudo iptables -t nat -A PREROUTING -i eth1 -m addrtype --dst-type LOCAL -p udp -j DNAT --to-destination 192.0.2.2
sudo conntrack -F
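Because these four rules have to be added, and later removed, as a set, it can help to keep them in one place. The bash function below is a sketch under the example's assumptions (eth1, mark 0x42, primary address 192.0.2.2, all overridable); multihome_rules is a hypothetical name. The same list is used with -A to apply and -D to delete, so setup and teardown cannot drift apart.

```shell
# multihome_rules OP [IFACE] [MARK] [PRIMARY]: apply (OP = -A) or delete
# (OP = -D) the case B iptables rules. Defaults follow the example above.
# Must be run as root.
multihome_rules() {
    local op="$1" iface="${2:-eth1}" mark="${3:-0x42}" primary="${4:-192.0.2.2}"
    # Mark packets arriving on the secondary interface and save the mark
    # to the connection.
    iptables -t mangle "$op" INPUT -i "$iface" -j MARK --set-mark "$mark"
    iptables -t mangle "$op" INPUT -i "$iface" -j CONNMARK --save-mark
    # Restore the connection mark on outgoing packets (replies).
    iptables -t mangle "$op" OUTPUT -j CONNMARK --restore-mark
    # Rewrite the destination of incoming UDP to the primary address.
    iptables -t nat "$op" PREROUTING -i "$iface" -m addrtype --dst-type LOCAL \
        -p udp -j DNAT --to-destination "$primary"
}

# Setup:    multihome_rules -A && conntrack -F
# Teardown: multihome_rules -D && conntrack -F
```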
The nftables equivalents are:
sudo nft add table mangle
sudo nft add table nat

sudo nft add chain ip mangle input { type filter hook input priority mangle \; }
sudo nft add chain ip mangle output { type route hook output priority mangle \; }
sudo nft add chain nat prerouting { type nat hook prerouting priority dstnat \; }

sudo nft add rule ip mangle input iifname "eth1" counter meta mark set 0x42
sudo nft add rule ip mangle input iifname "eth1" counter ct mark set mark
sudo nft add rule ip mangle output counter meta mark set ct mark
sudo nft add rule ip nat prerouting iifname "eth1" meta l4proto udp fib daddr type local counter dnat 192.0.2.2

sudo conntrack -F
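You can also keep the whole case B ruleset in one file and load it atomically with nft -f. The hypothetical multihome_nft_b function below prints the same rules as the commands above, with the interface, mark and primary address as parameters; it spells the NAT statement as dnat to, the form used in the nftables documentation.

```shell
# multihome_nft_b IFACE MARK PRIMARY: print the case B ruleset in
# "nft -f" syntax (mangle input/output chains plus the nat prerouting DNAT).
multihome_nft_b() {
    cat <<EOF
table ip mangle {
    chain input {
        type filter hook input priority mangle; policy accept;
        iifname "$1" counter meta mark set $2
        iifname "$1" counter ct mark set meta mark
    }
    chain output {
        type route hook output priority mangle; policy accept;
        counter meta mark set ct mark
    }
}
table ip nat {
    chain prerouting {
        type nat hook prerouting priority dstnat; policy accept;
        iifname "$1" meta l4proto udp fib daddr type local counter dnat to $3
    }
}
EOF
}

# Example:
#   multihome_nft_b eth1 0x42 192.0.2.2 | sudo nft -f /dev/stdin && sudo conntrack -F
```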

Why SNAT may not work

You might have tried SNAT on outgoing packets and found that it does not work. That is because with SNAT, conntrack treats the request packet and the response packet as two separate connections.
For example, say your server listens on 0.0.0.0:53, and a request is sent from 203.0.113.1:1024 to 198.51.100.2:53. Conntrack tracks a connection from 203.0.113.1:1024 to 198.51.100.2:53.
The server accepts the request, but replies from 192.0.2.2:53. Conntrack then creates a second connection from 192.0.2.2:53 to 203.0.113.1:1024 instead of matching the existing one.
When the reply travels through the firewall, SNAT only applies to the second connection. To make things worse, the source port may be changed too: 198.51.100.2:53 to 203.0.113.1:1024 already belongs to the first connection, so the kernel considers that tuple taken and picks a different port!

If we use the DNAT method instead, the conntrack entry records the original direction as 203.0.113.1:1024 to 198.51.100.2:53 and the expected reply as coming from 192.0.2.2:53, while the server sees the request as addressed to 192.0.2.2:53.
Then, when the server replies from 192.0.2.2:53, the reply packet directly matches that conntrack entry, and its source address is rewritten back to 198.51.100.2:53.
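You can watch this in the conntrack table. In each conntrack -L entry, the first src=/dst= pair is the original direction and the second pair is the expected reply, so a DNATed entry is one whose reply source differs from its original destination. The bash function below (show_dnat, a hypothetical name) is a sketch that prints such entries, assuming the default conntrack -L output format:

```shell
# show_dnat: read "conntrack -L" output on stdin and, for each entry whose
# reply source differs from its original destination (i.e. DNATed), print
# "CLIENT -> ORIGINAL_DST (reply from REPLY_SRC)".
show_dnat() {
    awk '
        {
            osrc = odst = rsrc = ""; nsrc = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^src=/) {
                    nsrc++
                    if (nsrc == 1) osrc = substr($i, 5)       # original source
                    else if (nsrc == 2) rsrc = substr($i, 5)  # reply source
                } else if ($i ~ /^dst=/ && odst == "") {
                    odst = substr($i, 5)                      # original destination
                }
            }
            if (rsrc != "" && rsrc != odst)
                print osrc " -> " odst " (reply from " rsrc ")"
        }'
}

# Example: sudo conntrack -L -p udp | show_dnat
```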

How to match and filter NATed packets

Since the filter table's INPUT chain comes after the nat PREROUTING chain, the destination address has already been rewritten by the time your filter rules see the packet, so the iptables -d switch cannot match the address the client actually used.
Instead, use -m conntrack --ctorigdst. You can also use -m conntrack --ctstate DNAT to check whether a packet has been DNATed (DNAT is a virtual state of --ctstate, not a --ctstatus value).
For nftables users, ct original daddr and ct status dnat serve the same purpose.
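As an example, the following sketch accepts new UDP flows by the address the client originally used rather than the rewritten one. allow_udp_to is a hypothetical helper name; since DNAT here only changes the address, the destination port can still be matched with --dport. The nftables line in the comment is the corresponding rule for a filter input chain in an ip-family table.

```shell
# allow_udp_to ORIG_DST PORT: accept new UDP flows that the client sent to
# ORIG_DST:PORT, even though after DNAT the destination seen by the filter
# table is the primary address. Must be run as root.
allow_udp_to() {
    iptables -A INPUT -p udp --dport "$2" \
        -m conntrack --ctstate NEW --ctorigdst "$1" -j ACCEPT
}

# Example: accept DNS queries that clients sent to the eth1 address.
#   allow_udp_to 198.51.100.2 53
# nftables equivalent (ip-family table, filter input chain):
#   udp dport 53 ct state new ct original daddr 198.51.100.2 accept
```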