Hi,
I have an issue with my network performance. :'(
Data:
OS: OPNsense 21.7.1-amd64
CPU: AMD Ryzen 9 5900X 12-Core Processor (12 cores)
Memory: 2 x DDR4-3200 ECC 32GB
NIC: Intel E810XXVDA2 (driver: ice-0.29.4)
SFP: 2 x SFP28 25 Gbit/s
LACP: true
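The test below was a plain client run, roughly like this (reconstructed from the output; the exact flags were not recorded):

iperf3 -c 172.23.0.1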
Connecting to host 172.23.0.1, port 5201
[ 5] local 172.23.23.23 port 49090 connected to 172.23.0.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 383 MBytes 3.21 Gbits/sec 0 1.60 MBytes
[ 5] 1.00-2.00 sec 654 MBytes 5.48 Gbits/sec 0 1.60 MBytes
[ 5] 2.00-3.00 sec 651 MBytes 5.46 Gbits/sec 0 1.60 MBytes
[ 5] 3.00-4.00 sec 651 MBytes 5.46 Gbits/sec 0 1.61 MBytes
[ 5] 4.00-5.00 sec 498 MBytes 4.17 Gbits/sec 0 1.70 MBytes
[ 5] 5.00-6.00 sec 650 MBytes 5.45 Gbits/sec 0 1.70 MBytes
[ 5] 6.00-7.00 sec 449 MBytes 3.76 Gbits/sec 0 1.70 MBytes
[ 5] 7.00-8.00 sec 192 MBytes 1.61 Gbits/sec 0 1.61 MBytes
[ 5] 8.00-9.00 sec 539 MBytes 4.52 Gbits/sec 0 1.62 MBytes
[ 5] 9.00-10.00 sec 298 MBytes 2.50 Gbits/sec 0 1.61 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 4.85 GBytes 4.16 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 4.85 GBytes 4.16 Gbits/sec receiver
ice0: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=e10438<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,LRO,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
ether ***
media: Ethernet autoselect (25G-AUI <full-duplex>)
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ice1: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=e10438<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,LRO,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
ether ***
hwaddr ***
media: Ethernet autoselect (25G-AUI <full-duplex>)
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lagg0: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=e10438<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,LRO,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
ether ***
inet6 fe80::b696:91ff:fea6:ab18%lagg0 prefixlen 64 scopeid 0x9
inet ***** netmask 0xffffff00 broadcast ****
laggproto lacp lagghash l2,l3,l4
laggport: ice0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: ice1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
groups: lagg
media: Ethernet autoselect
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
What could be the problem?
Hi,
This looks like single-thread performance. What does your per-CPU utilization look like? (e.g. run 'top -P' during an iperf3 test)
Also check the output from 'netstat -Q'.
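For example, while a test is running (a sketch using the addresses from your output; the 30-second duration is just an example):

# client side:
iperf3 -c 172.23.0.1 -t 30

# on the firewall, in a second session:
top -P
netstat -Q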
Cheers,
Stephan
How many parallel streams did you use?
I just used one iperf3 session, and the CPU is not really stressed. :o
Connecting to host 172.23.0.1, port 5201
[ 5] local 172.23.23.23 port 47672 connected to 172.23.0.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 541 MBytes 4.54 Gbits/sec 0 1.28 MBytes
[ 5] 1.00-2.00 sec 550 MBytes 4.61 Gbits/sec 0 1.58 MBytes
[ 5] 2.00-3.00 sec 656 MBytes 5.51 Gbits/sec 0 1.58 MBytes
[ 5] 3.00-4.00 sec 656 MBytes 5.50 Gbits/sec 0 1.62 MBytes
[ 5] 4.00-5.00 sec 656 MBytes 5.51 Gbits/sec 0 1.62 MBytes
[ 5] 5.00-6.00 sec 659 MBytes 5.53 Gbits/sec 0 1.63 MBytes
[ 5] 6.00-7.00 sec 658 MBytes 5.52 Gbits/sec 0 1.63 MBytes
[ 5] 7.00-8.00 sec 584 MBytes 4.90 Gbits/sec 0 1.63 MBytes
[ 5] 8.00-9.00 sec 645 MBytes 5.41 Gbits/sec 0 1.63 MBytes
[ 5] 9.00-10.00 sec 632 MBytes 5.31 Gbits/sec 0 1.63 MBytes
[ 5] 10.00-11.00 sec 542 MBytes 4.55 Gbits/sec 0 1.63 MBytes
[ 5] 11.00-12.00 sec 630 MBytes 5.28 Gbits/sec 0 1.63 MBytes
[ 5] 12.00-13.00 sec 266 MBytes 2.23 Gbits/sec 0 1.65 MBytes
[ 5] 13.00-14.00 sec 649 MBytes 5.44 Gbits/sec 0 1.65 MBytes
[ 5] 14.00-15.00 sec 654 MBytes 5.48 Gbits/sec 0 1.65 MBytes
[ 5] 15.00-16.00 sec 656 MBytes 5.51 Gbits/sec 0 1.65 MBytes
[ 5] 16.00-17.00 sec 655 MBytes 5.49 Gbits/sec 0 1.65 MBytes
[ 5] 17.00-18.00 sec 626 MBytes 5.26 Gbits/sec 0 1.65 MBytes
[ 5] 18.00-19.00 sec 521 MBytes 4.37 Gbits/sec 0 1.65 MBytes
[ 5] 19.00-20.00 sec 650 MBytes 5.45 Gbits/sec 0 1.65 MBytes
[ 5] 20.00-21.00 sec 641 MBytes 5.38 Gbits/sec 0 1.65 MBytes
[ 5] 21.00-22.00 sec 640 MBytes 5.37 Gbits/sec 0 1.65 MBytes
[ 5] 22.00-23.00 sec 636 MBytes 5.34 Gbits/sec 0 1.65 MBytes
[ 5] 23.00-24.00 sec 516 MBytes 4.33 Gbits/sec 0 1.65 MBytes
[ 5] 24.00-25.00 sec 516 MBytes 4.33 Gbits/sec 0 1.65 MBytes
[ 5] 25.00-26.00 sec 388 MBytes 3.25 Gbits/sec 0 1.65 MBytes
[ 5] 26.00-27.00 sec 531 MBytes 4.46 Gbits/sec 0 1.65 MBytes
[ 5] 27.00-28.00 sec 519 MBytes 4.35 Gbits/sec 0 1.65 MBytes
[ 5] 28.00-29.00 sec 515 MBytes 4.32 Gbits/sec 0 1.65 MBytes
[ 5] 29.00-30.00 sec 478 MBytes 4.01 Gbits/sec 0 1.65 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-30.00 sec 17.1 GBytes 4.88 Gbits/sec 0 sender
[ 5] 0.00-30.01 sec 17.1 GBytes 4.88 Gbits/sec receiver
top -P
last pid: 43540; load averages: 1.54, 0.66, 0.44 up 1+01:41:04 13:49:43
78 processes: 3 running, 75 sleeping
CPU 0: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 1: 20.2% user, 0.0% nice, 1.0% system, 0.0% interrupt, 78.8% idle
CPU 2: 0.0% user, 0.0% nice, 28.6% system, 0.0% interrupt, 71.4% idle
CPU 3: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 4: 0.0% user, 0.0% nice, 0.0% system, 1.0% interrupt, 99.0% idle
CPU 5: 0.0% user, 0.0% nice, 0.0% system, 100% interrupt, 0.0% idle
CPU 6: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 7: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 8: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 9: 1.0% user, 0.0% nice, 99.0% system, 0.0% interrupt, 0.0% idle
CPU 10: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 11: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 3595M Active, 7487M Inact, 2007M Wired, 1133M Buf, 50G Free
Swap: 4096M Total, 4096M Free
root@fwn01:~ # netstat -Q
Configuration:
Setting Current Limit
Thread count 12 12
Default queue limit 256 10240
Dispatch policy deferred n/a
Threads bound to CPUs disabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 3000 flow default ---
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 source direct ---
ip6 6 256 flow default ---
Workstreams:
WSID CPU Name Len WMark Disp'd HDisp'd QDrops Queued Handled
0 0 ip 803 1406 0 0 0 30184701 30183126
0 0 igmp 0 1 0 0 0 2 2
0 0 rtsock 0 0 0 0 0 0 0
0 0 arp 0 0 0 0 0 0 0
0 0 ether 0 0 0 0 0 0 0
0 0 ip6 0 256 0 0 1086 8305058 8305058
1 1 ip 0 2 0 0 0 31716 31716
1 1 igmp 0 1 0 0 0 2 2
1 1 rtsock 0 0 0 0 0 0 0
1 1 arp 0 1 0 0 0 51 51
1 1 ether 0 0 0 0 0 0 0
1 1 ip6 0 1 0 0 0 239 239
2 2 ip 0 640 0 0 0 2187948 2187948
2 2 igmp 0 1 0 0 0 2 2
2 2 rtsock 0 0 0 0 0 0 0
2 2 arp 0 1 0 0 0 592 592
2 2 ether 0 0 129116770 0 0 0 129116770
2 2 ip6 0 256 0 0 7336 25508639 25508639
3 3 ip 0 6 0 0 0 213615 213615
3 3 igmp 0 1 0 0 0 6 6
3 3 rtsock 0 0 0 0 0 0 0
3 3 arp 0 2 0 0 0 51160 51160
3 3 ether 0 0 26129849 0 0 0 26129849
3 3 ip6 0 1 0 0 0 234 234
4 4 ip 0 5 0 0 0 219790 219790
4 4 igmp 0 1 0 0 0 2 2
4 4 rtsock 0 0 0 0 0 0 0
4 4 arp 0 1 0 0 0 3266 3266
4 4 ether 0 0 0 0 0 0 0
4 4 ip6 0 1 0 0 0 238 238
5 5 ip 0 67 0 0 0 905710 905710
5 5 igmp 0 0 0 0 0 0 0
5 5 rtsock 0 256 0 0 48930 372583 372583
5 5 arp 0 0 0 0 0 0 0
5 5 ether 0 0 0 0 0 0 0
5 5 ip6 0 0 0 0 0 0 0
6 6 ip 0 23 0 0 0 592388 592388
6 6 igmp 0 0 0 0 0 0 0
6 6 rtsock 0 0 0 0 0 0 0
6 6 arp 0 0 0 0 0 0 0
6 6 ether 0 0 0 0 0 0 0
6 6 ip6 0 73 0 0 0 238728 238728
7 7 ip 0 9 0 0 0 220322 220322
7 7 igmp 0 0 0 0 0 0 0
7 7 rtsock 0 0 0 0 0 0 0
7 7 arp 0 0 0 0 0 0 0
7 7 ether 0 0 0 0 0 0 0
7 7 ip6 0 1 0 0 0 6 6
8 8 ip 0 1 0 0 0 226 226
8 8 igmp 0 0 0 0 0 0 0
8 8 rtsock 0 0 0 0 0 0 0
8 8 arp 0 0 0 0 0 0 0
8 8 ether 0 0 0 0 0 0 0
8 8 ip6 0 0 0 0 0 0 0
9 9 ip 0 5 0 0 0 115408 115408
9 9 igmp 0 1 0 0 0 2 2
9 9 rtsock 0 0 0 0 0 0 0
9 9 arp 0 2 0 0 0 45357 45357
9 9 ether 0 0 0 0 0 0 0
9 9 ip6 0 0 0 0 0 0 0
10 10 ip 0 3000 0 0 509 7527121 7527121
10 10 igmp 0 1 0 0 0 2 2
10 10 rtsock 0 0 0 0 0 0 0
10 10 arp 0 3 0 0 0 10928 10928
10 10 ether 0 0 0 0 0 0 0
10 10 ip6 0 256 0 0 306 3586362 3586362
11 11 ip 0 9 0 0 0 34905 34905
11 11 igmp 0 0 0 0 0 0 0
11 11 rtsock 0 0 0 0 0 0 0
11 11 arp 0 0 0 0 0 0 0
11 11 ether 0 0 0 0 0 0 0
11 11 ip6 0 0 0 0 0 0 0
Try using
iperf3 -c <IP> -P <number of parallel threads>
first to see if it influences the results. There is a plethora of options enabled by default on a NIC, and a single stream will most likely cause all packets to arrive on a single core, since the L2+L3 packet headers all share the same source/destination and therefore hash to the same receive queue. The fact that one core sits at 100% interrupt confirms this. Given that your system has 12 cores, I'd start with 12 parallel streams.
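For example, against the server from your earlier tests (duration is just an example):

iperf3 -c 172.23.0.1 -P 12 -t 30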
EDIT: The 100% interrupt rate is from your netisr policy being 'deferred'. Refer to https://forum.opnsense.org/index.php?topic=23986.0 for more information.
I have now enabled net.isr.bindthreads and it's a little better, but still not what a 25 Gbit/s link should deliver. :/
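For reference, roughly what I set (net.isr.* are boot-time tunables, so in OPNsense this goes under System > Settings > Tunables and needs a reboot; the file path is just how it would look on plain FreeBSD):

# /boot/loader.conf.local
net.isr.bindthreads="1"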
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-15.00 sec 9.44 GBytes 5.41 Gbits/sec 0 sender
[ 5] 0.00-15.00 sec 9.44 GBytes 5.40 Gbits/sec receiver
I also tested it with parallel streams, and that actually made it worse.
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-15.00 sec 4.66 GBytes 2.67 Gbits/sec 0 sender
[ 5] 0.00-15.00 sec 4.65 GBytes 2.66 Gbits/sec receiver
[ 7] 0.00-15.00 sec 4.20 GBytes 2.41 Gbits/sec 0 sender
[ 7] 0.00-15.00 sec 4.20 GBytes 2.41 Gbits/sec receiver
[SUM] 0.00-15.00 sec 8.86 GBytes 5.07 Gbits/sec 0 sender
[SUM] 0.00-15.00 sec 8.86 GBytes 5.07 Gbits/sec receiver
I found the issue with the network performance. It was related to the Intel network card, and the performance is good now.
I changed three things:
* updated the NIC firmware
* updated the NIC driver
* most importantly, loaded the DDP package (ice_ddp_load="YES"; see the sketch below)
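A sketch of the loader setting (the file path and the dmesg check are my assumptions, not something OPNsense prescribes):

# /boot/loader.conf.local
ice_ddp_load="YES"   # let ice(4) load the Dynamic Device Personalization package at boot

# after a reboot, check that the package was picked up:
dmesg | grep -i ddp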
Yay, glad to hear. Do you mind sharing your current performance numbers?
Thanks,
Franco