Performance issues when adding subnets to firewall instead of interface #1153

Description

@saiarcot895

Not sure if this should be considered a netavark firewall issue, a kernel issue, or just a performance limitation.

I have a server (Ryzen 5600X, one 1-gigabit Ethernet port with 4 VLANs on top of it) that acts as my home router and also runs a bunch of containers. I have 47 containers running across 11 networks (resulting in 11 network interfaces being created). Each of those networks is assigned a private IPv4 subnet, a public IPv6 subnet, and a ULA (private) IPv6 subnet. I'm using firewalld as my backend (which itself uses nftables as its backend), with other non-container-related rules configured for general connectivity and firewalling. I also have CAKE SQM set up in both the ingress and egress directions, with the bandwidth set to 1Gb/s. Ingress SQM is done by redirecting incoming packets to an IFB interface via a tc filter rule.
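
For reference, this style of ingress shaping is usually wired up along the following lines (interface names are illustrative, not my exact config):

# Egress: shape directly on the uplink.
tc qdisc replace dev eth0 root cake bandwidth 1Gbit

# Ingress: redirect incoming packets to an IFB device and shape there.
ip link add ifb0 type ifb
ip link set ifb0 up
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol all matchall action mirred egress redirect dev ifb0
tc qdisc replace dev ifb0 root cake bandwidth 1Gbit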

Recently, I did a test with iperf3 between this server and two devices connected via Ethernet. On both devices, TCP traffic from the server to the device runs at around 920-970Mb/s (basically full line rate), but TCP traffic from the device to the server tops out at ~600Mb/s, with occasional drops to 200Mb/s. During this time, I can see ksoftirqd running at 100% on the server, suggesting a CPU bottleneck. (When traffic is sent from the server to the device, ksoftirqd is not at 100% on either side.)
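
(For anyone reproducing this, the softirq load is easy to confirm; a core pinned near 100% in the %soft column, with the NET_RX counter climbing on that core, points at packet processing rather than iperf3 itself.)

# Per-CPU utilization breakdown; look at the %soft column.
mpstat -P ALL 1

# Raw softirq counters; watch NET_RX grow on the saturated core.
watch -d cat /proc/softirqs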

Here, 192.168.3.1 is the server, which acts as the router and runs all of the containers.

$ iperf3 -c 192.168.3.1 -t 60
Connecting to host 192.168.3.1, port 5201
[  5] local 192.168.3.21 port 35770 connected to 192.168.3.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  73.8 MBytes   618 Mbits/sec    0    296 KBytes
[  5]   1.00-2.00   sec  73.5 MBytes   617 Mbits/sec    0    250 KBytes
[  5]   2.00-3.00   sec  73.0 MBytes   612 Mbits/sec    0    227 KBytes
[  5]   3.00-4.00   sec  72.6 MBytes   609 Mbits/sec    0    168 KBytes
[  5]   4.00-5.00   sec  69.5 MBytes   583 Mbits/sec    0    285 KBytes
[  5]   5.00-6.00   sec  72.6 MBytes   609 Mbits/sec    0    250 KBytes
[  5]   6.00-7.00   sec  72.1 MBytes   605 Mbits/sec    0    227 KBytes
[  5]   7.00-8.00   sec  70.4 MBytes   590 Mbits/sec    0    168 KBytes
[  5]   8.00-9.00   sec  72.1 MBytes   605 Mbits/sec    0    174 KBytes
[  5]   9.00-10.00  sec  71.2 MBytes   598 Mbits/sec    0    238 KBytes
[  5]  10.00-11.00  sec  72.6 MBytes   609 Mbits/sec    0    122 KBytes
[  5]  11.00-12.00  sec  72.1 MBytes   605 Mbits/sec    0    221 KBytes
[  5]  12.00-13.00  sec  72.6 MBytes   609 Mbits/sec    0    168 KBytes
[  5]  13.00-14.00  sec  71.8 MBytes   602 Mbits/sec    0    180 KBytes
[  5]  14.00-15.00  sec  68.4 MBytes   574 Mbits/sec    0    378 KBytes
[  5]  15.00-16.00  sec  71.2 MBytes   598 Mbits/sec    0    116 KBytes
[  5]  16.00-17.00  sec  71.2 MBytes   598 Mbits/sec    0    250 KBytes
[  5]  17.00-18.00  sec  72.6 MBytes   609 Mbits/sec    0    168 KBytes
[  5]  18.00-19.00  sec  72.1 MBytes   605 Mbits/sec    0    331 KBytes
[  5]  19.00-20.00  sec  73.0 MBytes   612 Mbits/sec    0    261 KBytes
[  5]  20.00-21.00  sec  72.6 MBytes   609 Mbits/sec    0    180 KBytes
[  5]  21.00-22.00  sec  70.4 MBytes   590 Mbits/sec    0    267 KBytes
[  5]  22.00-23.00  sec  72.1 MBytes   605 Mbits/sec    0    174 KBytes
[  5]  23.00-24.00  sec  72.2 MBytes   606 Mbits/sec    0    221 KBytes
[  5]  24.00-25.00  sec  69.6 MBytes   584 Mbits/sec    0    349 KBytes
[  5]  25.00-26.00  sec  71.2 MBytes   598 Mbits/sec    0    180 KBytes
[  5]  26.00-27.00  sec  71.2 MBytes   598 Mbits/sec    0    238 KBytes
...
$ iperf3 -c 192.168.3.1 --bidir
Connecting to host 192.168.3.1, port 5201
[  5] local 192.168.3.21 port 37654 connected to 192.168.3.1 port 5201
[  7] local 192.168.3.21 port 37670 connected to 192.168.3.1 port 5201
[ ID][Role] Interval           Transfer     Bitrate         Retr  Cwnd
[  5][TX-C]   0.00-1.00   sec  75.5 MBytes   633 Mbits/sec    0    459 KBytes
[  7][RX-C]   0.00-1.00   sec  90.5 MBytes   759 Mbits/sec
[  5][TX-C]   1.00-2.00   sec  76.5 MBytes   642 Mbits/sec    0    279 KBytes
[  7][RX-C]   1.00-2.00   sec  89.9 MBytes   754 Mbits/sec
[  5][TX-C]   2.00-3.00   sec  78.1 MBytes   655 Mbits/sec    0    378 KBytes
[  7][RX-C]   2.00-3.00   sec   106 MBytes   891 Mbits/sec
[  5][TX-C]   3.00-4.00   sec  78.4 MBytes   657 Mbits/sec    0    349 KBytes
[  7][RX-C]   3.00-4.00   sec  89.0 MBytes   747 Mbits/sec
[  5][TX-C]   4.00-5.00   sec  77.2 MBytes   648 Mbits/sec    0    192 KBytes
[  7][RX-C]   4.00-5.00   sec   108 MBytes   909 Mbits/sec
[  5][TX-C]   5.00-6.00   sec  79.4 MBytes   666 Mbits/sec    0    279 KBytes
[  7][RX-C]   5.00-6.00   sec  80.2 MBytes   673 Mbits/sec
[  5][TX-C]   6.00-7.00   sec  74.9 MBytes   628 Mbits/sec    0    325 KBytes
[  7][RX-C]   6.00-7.00   sec  97.5 MBytes   818 Mbits/sec
[  5][TX-C]   7.00-8.00   sec  65.1 MBytes   546 Mbits/sec    0    296 KBytes
[  7][RX-C]   7.00-8.00   sec  94.4 MBytes   792 Mbits/sec
[  5][TX-C]   8.00-9.00   sec  72.2 MBytes   606 Mbits/sec    0    267 KBytes
[  7][RX-C]   8.00-9.00   sec  99.2 MBytes   833 Mbits/sec
[  5][TX-C]   9.00-10.00  sec  78.1 MBytes   655 Mbits/sec    0    465 KBytes
[  7][RX-C]   9.00-10.00  sec   101 MBytes   846 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-10.00  sec   756 MBytes   634 Mbits/sec    0             sender
[  5][TX-C]   0.00-10.00  sec   754 MBytes   633 Mbits/sec                  receiver
[  7][RX-C]   0.00-10.00  sec   957 MBytes   803 Mbits/sec    0             sender
[  7][RX-C]   0.00-10.00  sec   956 MBytes   802 Mbits/sec                  receiver
$ iperf3 -c 192.168.3.1 --reverse -t 60
Connecting to host 192.168.3.1, port 5201
Reverse mode, remote host 192.168.3.1 is sending
[  5] local 192.168.3.21 port 38648 connected to 192.168.3.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   117 MBytes   982 Mbits/sec
[  5]   1.00-2.00   sec   117 MBytes   984 Mbits/sec
[  5]   2.00-3.00   sec   117 MBytes   985 Mbits/sec
[  5]   3.00-4.00   sec   116 MBytes   976 Mbits/sec
[  5]   4.00-5.00   sec   117 MBytes   985 Mbits/sec
[  5]   5.00-6.00   sec   117 MBytes   984 Mbits/sec
[  5]   6.00-7.00   sec   117 MBytes   983 Mbits/sec
^C[  5]   7.00-7.33   sec  38.2 MBytes   983 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-7.33   sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-7.33   sec   858 MBytes   982 Mbits/sec                  receiver
iperf3: interrupt - the client has terminated

I started by disabling CAKE SQM, which restored full line rate with a barely noticeable increase in CPU usage (not counting the CPU usage of iperf3 itself). That initially led me to think something in the SQM setup was slowing down the kernel (maybe the redirection of packets to the IFB interface that gets created?), but I couldn't find any reports of this online.
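
(Disabling the ingress side for this test just means removing the ingress qdisc and the IFB root qdisc, roughly:)

tc qdisc del dev eth0 ingress
tc qdisc del dev ifb0 root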

I then set up the same CAKE SQM on the device (192.168.3.21) to see if it was reproducible there, but iperf3 was able to send and receive at full line rate, so whatever the cause was, it was specific to the server.

I used perf top to see if anything stood out, and the top consumer was nftables. Comparing against perf top on the device, nftables was nowhere near the top of the list there, which suggests the problem is firewall-related. On a hunch, I stopped all of the pods and containers, and iperf3 was then able to achieve line rate in both directions.
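
(The profiling was nothing more elaborate than the following; when nftables rule evaluation dominates, it shows up as kernel symbols such as nft_do_chain near the top of the profile.)

# Sample kernel and userspace symbols system-wide.
sudo perf top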

I looked at the generated rules: netavark_zone has each subnet of each network added as a source, rather than each network's interface being added to the zone. Since each network has three subnets (private IPv4, public IPv6, and ULA IPv6), this results in 3x as many entries as matching on interfaces would. Additionally, because I have some firewall policies between netavark_zone and the host (and other zones), some additional rules are generated on top of that.
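
This is easy to see with firewall-cmd (zone name as netavark creates it):

# Sources attached to the zone: three subnets per network.
firewall-cmd --zone=netavark_zone --list-sources

# Interfaces attached to the zone: empty with the current netavark behavior.
firewall-cmd --zone=netavark_zone --list-interfaces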

I modified the zone definition to contain the interfaces instead of the subnets, and after making this change, iperf3 achieves line rate with no noticeable CPU increase.
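
Concretely, the change amounts to something like this per network (subnet and bridge names are illustrative, not my real values; add --permanent plus a reload to make it stick):

# Drop the per-subnet sources...
firewall-cmd --zone=netavark_zone --remove-source=10.89.0.0/24
firewall-cmd --zone=netavark_zone --remove-source=2001:db8:aaaa::/64
firewall-cmd --zone=netavark_zone --remove-source=fd12:3456:789a::/64

# ...and attach the network's bridge interface instead.
firewall-cmd --zone=netavark_zone --add-interface=podman1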

This suggests one or more of the following:

  1. nftables is for some reason inefficient when matching IPv4 or IPv6 addresses.
  2. firewalld is generating nftables rules inefficiently and is not taking advantage of built-in features (such as sets).
  3. The number of rules that have to be generated when matching IPv4/IPv6 subnets instead of interfaces is so much greater that it slows down packet processing.

I'm not sure about 1, but I'd like to think this is fairly well optimized in the kernel. For 2, while writing this up, I checked firewalld's issue tracker and found firewalld/firewalld#1399, so this is somewhat of a known issue there. For 3, if there's no specific reason for using subnets rather than interfaces in the firewall rules, netavark could filter by interface instead, which would avoid the rule explosion when a network has both IPv4 and IPv6 subnets.
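
To make 2 and 3 concrete, here is a minimal nft sketch (throwaway table, made-up subnets and interface name) of the three shapes the match can take. The linear form costs one rule evaluation per subnet per packet; the set form and the interface form stay at one rule per lookup (or per network) no matter how many subnets there are:

nft add table inet demo
nft add chain inet demo input '{ type filter hook input priority 0; }'

# Linear form: one rule per subnet, evaluated in sequence.
nft add rule inet demo input ip saddr 10.89.0.0/24 accept
nft add rule inet demo input ip saddr 10.89.1.0/24 accept

# Set form: one rule, a single set lookup regardless of subnet count.
nft add set inet demo ctr_nets '{ type ipv4_addr; flags interval; }'
nft add element inet demo ctr_nets '{ 10.89.0.0/24, 10.89.1.0/24 }'
nft add rule inet demo input ip saddr @ctr_nets accept

# Interface form: one rule per network, covering both address families.
nft add rule inet demo input iifname "podman1" accept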

Can suggestion 3 be looked into?
