Bandwidth cut in half when traversing system but direct bandwidth test is fine

Started by SimonGuy, April 01, 2024, 12:27:00 AM

Preamble
Hi there :-),
I have a problem with my opnsense setup that is strange to me. Maybe someone has an idea on where to poke further.

The Situation/Problem

Given is a opnsense box on decent hardware (see hardware) that is supposed to route traffic from interface1 to interface2 or vice versa at nearly link speed (1G).

Launching iperf from the opnsense machine itself, the connection to each test system (the Ubuntu and the Windows machine) reaches reasonably high speeds:

From      To       iperf result
opnsense  ubuntu   ~833 Mbits/sec
opnsense  windows  ~653 Mbits/sec
windows   ubuntu   ~302 Mbits/sec

Observation
It seems that when traffic is routed through opnsense, my bandwidth is cut in half.


ix0, ix1
media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)                                             
                                                                                                                       
                                                               ┌──────────────────────────────────────────────────────┐
                                                               │                       opnsense                       │
                                                               │                                                      │
┌──────────────────────┐                                       │                                                      │
│An Ubuntu test machine│                                       ├────────────┐                                         │
│192.168.77.47         │                                       │            ├────────────────┐     ┌─────────────┐    │
│                      │    Tagged    ┌──────────┐   Tagged    ├───┐        │ lagg0_vlan335  │◄───►│interface1   │    │
│CPU use < 25%         │◄────────────►│          │◄───────────►│ix0│        ├────────────────┘     │192.168.77.51│    │
└──────────────────────┘      1G      │ Multiple │    10G      ├───┘        │                      └─────────────┘    │
                                      │ Mikrotik │             │      lagg0 │                                         │
┌──────────────────────┐    Tagged    │ Switches │   Tagged    ├───┐        │                      ┌─────────────┐    │
│A Windows test machine│◄────────────►│          │◄───────────►│ix1│        ├────────────────┐     │interface2   │    │
│10.10.10.76           │      1G      └──────────┘    10G      ├───┘        │ lagg0_vlan1111 │◄───►│10.10.10.1   │    │
│                      │               Less then               │            ├────────────────┘     └─────────────┘    │
│CPU use < 25%         │               5% CPU use              ├────────────┘                                         │
└──────────────────────┘                                       │                                                      │
                                                               │          All CPUs are idle during transmission       │
                                                               └──────────────────────────────────────────────────────┘




Opnsense System



CPU type                 Intel(R) Xeon(R) CPU D-1518 @ 2.20GHz (4 cores, 8 threads)
Memory usage             11 % (918/8044 MB), ARC size 229 MB
Version                  OPNsense 24.1.4-amd64
Network card (onboard)   Intel(R) X552 (SFP+)
Mainboard                Supermicro X10SDV-TP8F


MTU related

Opnsense uses the following MTUs:



Interface        MTU
ix0              1470
ix1              1470
lagg0            1470
lagg0_vlan335    1300
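Given the reduced MTUs above, one quick way to rule path MTU in or out (my suggestion, not something from the thread; IPs taken from the diagram) is a don't-fragment ping across the firewall:

```shell
# Largest ICMP payload that fits the 1300-byte VLAN MTU without
# fragmentation: MTU minus 20 bytes IPv4 header minus 8 bytes ICMP header.
mtu=1300
payload=$((mtu - 20 - 8))
echo "largest DF ping payload: $payload bytes"

# With that payload and the don't-fragment bit set (IPs from the diagram):
#   FreeBSD/OPNsense shell:  ping -D -s 1272 192.168.77.47
#   Linux test machine:      ping -M do -s 1272 10.10.10.76
#   Windows test machine:    ping -f -l 1272 10.10.10.76
```

If 1272-byte probes pass but 1273-byte probes fail with a "message too long" style error, the MTU chain behaves exactly as configured, and MTU can reasonably be ruled out.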

What has been done so far



  • Messed with the MTU (this should not be the problem, as the opnsense can communicate with the Ubuntu machine just fine?)
  • Checked and unchecked "Disable reply-to"
  • Used "pfctl -d" temporarily to disable the firewall
  • Unchecked all three of "Hardware CRC", "Hardware TSO" and "Hardware LRO"


I did not mess with the checkbox "VLAN Hardware Filtering" yet.

Special notes



  • The system is in HA mode with an identical second node
  • The system has many (20+) interfaces on VLANs, all on lagg0
  • An IPSEC VDI tunnel is slower still, even over a local Ethernet-only connection (just a side note; it's what started this investigation...)
  • Checked CPU load on all related switches: all are below 10%, and bandwidth does not look like an issue either
  • The system has only a few firewall rules
  • iperf options used: -w 64KB; the tests added to this post were run with -t 2 only to shorten the output, longer tests show similar results
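One observation on the -w 64KB option (my own remark, not something established in the thread): a fixed 64 KB window caps TCP throughput at roughly window size divided by round-trip time, so even a small amount of extra latency on the routed path lowers the achievable rate. A back-of-the-envelope check, with purely illustrative RTT values:

```shell
# Throughput ceiling imposed by a fixed TCP window:
#   cap (Mbit/s) = window_bytes * 8 / RTT_seconds / 1e6
# The RTT values below are illustrative assumptions, not measurements.
window=65536   # 64 KB, as used in the iperf tests
for rtt_ms in 0.5 1.0 2.0; do
    awk -v w="$window" -v rtt="$rtt_ms" \
        'BEGIN { printf "RTT %s ms -> cap %.0f Mbit/s\n", rtt, w*8/(rtt/1000)/1e6 }'
done
```

At 1 ms RTT the ceiling is already around 524 Mbit/s, so repeating the routed test with a larger -w (or iperf3's default auto-tuned window) would help separate a window limit from a real forwarding problem.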



Some results
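For reference, the runs below would have been produced by an invocation along these lines (reconstructed from the options mentioned above and the port numbers in the output; the target runs the matching server):

```shell
# Server side, on the target machine:
#   iperf3 -s -p 6666
# Client side; -w 64K fixes the window, -t 2 shortens the run as noted above:
iperf3 -c 10.10.10.76 -p 6666 -w 64K -t 2
```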

From opnsense to windows test machine

Connecting to host 10.10.10.76, port 6666
[  5] local 10.10.10.2 port 6661 connected to 10.10.10.76 port 6666
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.03   sec   105 MBytes   853 Mbits/sec    0    209 KBytes
[  5]   1.03-2.00   sec  98.6 MBytes   856 Mbits/sec    0    209 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-2.00   sec   204 MBytes   855 Mbits/sec    0             sender
[  5]   0.00-2.00   sec   204 MBytes   855 Mbits/sec                  receiver



From opnsense to ubuntu test machine

Connecting to host 192.168.77.47, port 6666
[  5] local 192.168.77.52 port 7437 connected to 192.168.77.47 port 6666
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   109 MBytes   914 Mbits/sec    0   3.00 MBytes
[  5]   1.00-2.00   sec   108 MBytes   903 Mbits/sec    0   3.00 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-2.00   sec   217 MBytes   909 Mbits/sec    0             sender
[  5]   0.00-2.00   sec   217 MBytes   909 Mbits/sec                  receiver


From windows test machine to ubuntu test machine

Connecting to host 192.168.77.47, port 6666
[  4] local 10.10.10.76 port 57609 connected to 192.168.77.47 port 6666
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  47.6 MBytes   399 Mbits/sec
[  4]   1.00-2.00   sec  49.0 MBytes   411 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-2.00   sec  96.6 MBytes   405 Mbits/sec                  sender
[  4]   0.00-2.00   sec  96.6 MBytes   405 Mbits/sec                  receiver


My questions



  • Why can the system talk to the test machines so fast, while traffic that traverses the opnsense is cut in half?
  • Can I rule out MTU settings, given that the opnsense system communicates fine when talking to the hosts directly? (It should use the same ports with the same VLANs and so on... so it must be fine, right?)
  • Is there any reason to doubt the network card or the driver when, again, direct communication works fine? I suppose not(?)


Update 2024-04-05


  • Fiddling with the flow control sysctl settings did not help with the problem:
    dev.ix.0.fc = 0
    dev.ix.1.fc = 0
  • Changing to different DAC cables (Cisco, HP, Huawei; recommended by our supplier for this card/board) did not help either


I would be grateful for any advice. After days spent on this problem, I am losing my mind.

Kind regards and thanks in advance
SimonGuy

I've been struggling with the same problem and have been trying all kinds of tuneables to get things back to normal. Nothing was working. I just added the following to disable flow control:

dev.ix.0.fc = 0   
dev.ix.1.fc = 0

My interfaces are ix0 and ix1. As soon as I added these, my throughput doubled. I'm also using an Intel card, so maybe this will work for you.
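For anyone trying this: the setting can be tested at runtime from the OPNsense shell before persisting it under System > Settings > Tunables (standard FreeBSD sysctl usage; 0 disables flow control for the ix driver):

```shell
# Apply at runtime (lost on reboot); persist via the Tunables GUI once verified.
sysctl dev.ix.0.fc=0
sysctl dev.ix.1.fc=0

# Confirm the values took effect:
sysctl dev.ix.0.fc dev.ix.1.fc
```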

Hi pholt,

thank you very much for taking the time to write a response. I will try this as soon as possible. Unfortunately, I will have to wait until tonight to test it. I just wanted to get a quick reply out, as I am very grateful for your tip.

I will report back tonight.

 

Hi pholt,

sadly this did not help. I spent my night trying different old 10G network cards, but had no luck. Some weren't recognized at all, others didn't find a carrier. My test setup was a bit clunky, as I had to pull the unit out of the rack, which limited my DAC cable options.

At the end of the night, I ordered some Mellanox CX312A ConnectX-3 cards. I have had no problems with them on other FreeBSD systems in the past.

It is sad, as this system (Supermicro X10SDV-TP8F) was bought specifically because it has dual SFP+ 10G ports. I now have four systems of this type, and on all of them the 10G ports perform even worse than the 1G ports. Our supplier Thomas Krenn states that this hardware is OPNsense compatible, though I have not contacted their support, as I assume this does not qualify as a hardware issue.

Thanks for your help anyway.
Regards from here to there :-)

For those who seek answers...

We got it up to around 10G by fiddling around with some tunables: https://binaryimpulse.com/2022/11/opnsense-performance-tuning-for-multi-gigabit-internet/

Also, some faulty network switch links were replaced; these turned out to be responsible for the bandwidth halving. Everything related to speeds above 1G seems to come down to the tunables.
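For completeness, the article linked above centers on FreeBSD's netisr and RSS settings. A representative set is below, but treat it as a starting point only: these are my recollection of the article's general recipe, the right values depend on your CPU, and you should verify against the article itself before applying anything.

```shell
# Candidate boot-time tunables (add under System > Settings > Tunables).
# Standard FreeBSD knobs; values must be sized to your hardware.
net.isr.maxthreads="-1"      # one netisr worker per CPU core
net.isr.bindthreads="1"      # pin netisr workers to cores
net.isr.dispatch="deferred"  # defer packet processing to the netisr threads
net.inet.rss.enabled="1"     # receive-side scaling (needs RSS kernel support)
net.inet.rss.bits="2"        # 2^bits buckets; match to core count
```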

Thanks to all for taking the time to read and help with this.

Kind regards

Hi, I am in the same situation and have the same hardware as you. My connection is not fast, just 200M down / 200M up, but behind the firewall I only get 100M down and 112M up. Which parameters helped you?