Messages - rungekutta

#31
Ok. I don't really know what else to suggest then, other than making sure the BIOS and software are on the latest versions, and maybe verifying that you've got a genuine Intel NIC. That's mainstream enterprise hardware, so it should run 1Gb/s easily and should also be well supported.

Maybe someone else can add.
#32
1Gb/s should be easily achievable on quite modest hardware let alone enterprise gear like yours. What Intel NICs have you tried? And what are your other settings - are you using IPS, traffic shaping, anything else different from default?
#33
Hardware and Performance / Re: 10GB LAN Performance
December 18, 2021, 10:36:54 AM
Quote from: johnoatwork on December 17, 2021, 12:37:39 AM
Actually I've run up both TNSR and VyOS as a VMs with SR-IOV passthrough VFs from the T520-CRs. I don't have the numbers yet for VyOS, but with TNSR and a basic set of ACLs performance right out of the box is double what I was getting with FreeBSD packet filters.
That's to say you got double 1Gb/s i.e. still only ~2Gb/s forwarding performance? Still sounds very low. Would be interesting to hear your equivalent VyOS performance.

Quote from: johnoatwork on December 17, 2021, 12:37:39 AM
Anyway, I've ordered an X710-DA2 for testing (the Intel card I previously tested with was x520).  Still hoping I can do this with OPNsense as it really is a great product.  But if I can't pinpoint the throughput issues I'll have to run with a Linux based distro.
After all my woes (https://forum.opnsense.org/index.php?topic=25263.15) I managed to get forwarding performance up to ~5 Gb/s through the Chelsio T520-SO-CR and Ryzen hardware, so it's a bit weird that your performance is so low after having followed similar steps. It will be interesting to hear your results on the Intel X710, and, as mentioned, on Linux as well. NB that's a side project for me too: setting up a minimal Debian 11 with routing and firewalling through nftables, plus unbound, a DHCP server etc. I haven't got as far as live-testing it yet, but I'm curious how it will perform in comparison.
#34
Hardware and Performance / Re: 10GB LAN Performance
December 06, 2021, 01:07:28 PM
That's native mode. If in emulated mode the log will say something like

Quote
generic_netmap_register   Emulated adapter for cxgbe1 activated

Also make sure that your interface assignments are against these VFs (vcxl) and not cxgbe.

Last but not least are you running Suricata? With that enabled I was never able to top 1Gb/s through the fw even without any rules active. So there's clearly some kind of bottleneck there too.
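
If you'd rather not eyeball the log, something along these lines should pick out the interfaces that ended up on the emulated adapter. It's only a sketch: `emulated_netmap_ifaces` is a name I made up, it reads log text on stdin, and it only knows the one message format quoted above.

```sh
# Sketch: list interfaces that netmap reports as running on its generic
# (emulated) adapter. Hypothetical helper; reads log text on stdin and
# matches only the "Emulated adapter for <if> activated" message.
emulated_netmap_ifaces() {
    sed -n 's/.*generic_netmap_register[[:space:]]*Emulated adapter for \([^[:space:]]*\) activated.*/\1/p' | sort -u
}
```

On the firewall you would pipe the kernel log into it, e.g. `dmesg | emulated_netmap_ifaces` (assuming the message lands in dmesg on your install). No output means no interface was reported as emulated in that log.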
#35
Hardware and Performance / Re: 10GB LAN Performance
December 03, 2021, 01:50:53 PM
Check that OPNsense is using netmap in native mode. I just went through quite a struggle to get decent performance on the same NICs (coming from T420-CR to begin with). It was not at all trivial, and none of it is properly documented anywhere. See https://forum.opnsense.org/index.php?topic=25263.0
#36
Quote from: testo_cz on November 20, 2021, 05:25:59 PM
Was the earlier  firmware in the NIC something like too old or it was perhaps customized ?
Because as people often reuse HW / NICs , it might not have a genuin firmware. For example customized by e.g. server vendor.

The firmware was pretty old, 1.12 something which according to release notes is from 2014. Current version is 1.26 and from 2021. So Proxmox updated this for me, but not the boot rom, which was also from 2014. So when I had managed to update that as well (with Chelsio tools in DOS, booting from a USB stick) I could pass through the card ok to the OpnSense VM.

Quote from: testo_cz on November 20, 2021, 05:25:59 PM
I'm only getting familiar with Suricata.... Is it like utilizing 100% CPU if enabled ?
When I had rules enabled it pegged something like 3 or 4 cores (out of 8 available) but never got above 1Gb/s. Without any rules it used less CPU (less than 1 core) but still limited throughput to roughly 1Gb/s. I haven't looked much into tuning it, but there aren't many exposed options either via the GUI.
#37
Hardware and Performance / Re: WAN performance issue
November 17, 2021, 06:02:09 PM
Quote from: testo_cz on November 14, 2021, 09:36:15 AM
I don't see bxe driver or the Broadcom cards to be natively supported by netmap (4).
@rungekutta might soon report the difference in throughput between natively supported Chelsio NIC and NIC supported only in emulated netmapmode.
The difference was from approx. 1Gb/s to approx. 6Gb/s, so quite a big difference. Netmap in emulated mode (i.e. without native support in the NIC driver) seems to struggle to break 1Gb/s no matter the hardware. At least on HardenedBSD/FreeBSD.
#38
To add 2 more points.

First, adding


hw.cxgbe.nrxq_vi=8
hw.cxgbe.ntxq_vi=8
hw.cxgbe.nnmtxq_vi=8
hw.cxgbe.nnmrxq_vi=8


creates 8 rx/tx queues also for these virtual ports:


vcxl0: netmap queues/slots: TX 8/1023, RX 8/1024
vcxl0: 8 txq, 8 rxq (NIC); 8 txq, 8 rxq (netmap)
vcxl1: netmap queues/slots: TX 8/1023, RX 8/1024
vcxl1: 8 txq, 8 rxq (NIC); 8 txq, 8 rxq (netmap)
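
The four tunables differ only in name, so a tiny helper can print them for whatever queue count you want to try. A sketch only: `cxgbe_vi_tunables` is a made-up name, and 8 is simply the value used here (it matches the 8 kernel RSS buckets mentioned in the earlier dmesg warning in this thread).

```sh
# Sketch: print the four per-VI queue tunables from this post for a given
# queue count (default 8). Hypothetical helper, not part of OPNsense/FreeBSD.
cxgbe_vi_tunables() {
    queues=${1:-8}
    for name in nrxq_vi ntxq_vi nnmtxq_vi nnmrxq_vi; do
        printf 'hw.cxgbe.%s=%s\n' "$name" "$queues"
    done
}
```

Something like `cxgbe_vi_tunables 8 >> /boot/loader.conf.local` would append them, assuming that is the file you already keep hw.cxgbe.num_vis in. Loader tunables only take effect after a reboot.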


Second, unfortunately, Suricata now destroys performance, even without any rules active!

Here's a WAN speed test from another machine (through OPNsense)


root@xxx:~/tmp # ./fast
-> 984.61 Mbps
root@xxx:~/tmp # ./fast
-> 5.55 Gbps


First is with Suricata enabled and in IPS mode (but no rules), the second is with Suricata disabled. Disappointing. But maybe this can be tuned. For now, I'm disabling Suricata.
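
To put a number on it, the two results just need to be brought to the same unit before dividing. A throwaway sketch (helper names are mine; decimal units assumed, i.e. 1 Gbps = 1000 Mbps):

```sh
# to_mbps VALUE UNIT -> prints VALUE converted to Mbit/s (decimal units).
# Only the units Gbps, Mbps and Kbps are handled; hypothetical helper.
to_mbps() {
    awk -v v="$1" -v u="$2" 'BEGIN {
        if (u == "Gbps") v *= 1000
        else if (u == "Kbps") v /= 1000
        printf "%.2f\n", v
    }'
}

# speedup SLOW_MBPS FAST_MBPS -> prints FAST/SLOW with one decimal.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", fast / slow }'
}
```

With the figures above, `speedup 984.61 "$(to_mbps 5.55 Gbps)"` prints 5.6, so having Suricata enabled in IPS mode costs a bit more than a factor of five here.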
#39
Ok, I'm back with some results.

It took a while to get the card up and running. Unlike FreeBSD, OPNsense does not include firmware for this card (Chelsio T520-SO-CR) (why?!). Fortunately Proxmox updated the card firmware for me automatically (although with some alarming error messages; it seems to have gone ok though...). The next problem was that passthrough didn't work: the VM never got past SeaBIOS initialisation. I managed to resolve that by booting FreeDOS and flashing the boot ROM on the card with tools downloaded from Chelsio.

Once I got the card up and running in OpnSense, netmap unfortunately still ran in emulated mode, and with the same underwhelming results as before.

More Googling showed that netmap only works with virtual functions on this card. So I had to add

hw.cxgbe.num_vis=2

in /boot/loader.conf.local. Then after boot, I have vcxl0 as well as cxl0 mapped to the same physical port on the card. BUT vcxl0 looks more promising:


root@xxx:~ # dmesg | grep vcxl
vcxl0: <port 0 vi 1> on cxl0
vcxl0: Ethernet address: 00:07:43:36:a3:a1
vcxl0: netmap queues/slots: TX 2/1023, RX 2/1024
vcxl0: 1 txq, 1 rxq (NIC); 2 txq, 2 rxq (netmap)
vcxl1: <port 1 vi 1> on cxl1
vcxl1: Ethernet address: 00:07:43:36:a3:a9
vcxl1: netmap queues/slots: TX 2/1023, RX 2/1024
vcxl1: 1 txq, 1 rxq (NIC); 2 txq, 2 rxq (netmap)
vcxl1: link state changed to UP
vcxl0: link state changed to UP
vcxl1: tso4 disabled due to -txcsum.
vcxl1: tso6 disabled due to -txcsum6.
vcxl1: nrxq (1) != kernel RSS buckets (8);performance will be impacted.
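
The lines worth watching are the queue counts. As a rough sketch, this pulls them out per interface; `cxgbe_queue_summary` is a hypothetical helper that only understands the exact "N txq, N rxq (NIC); N txq, N rxq (netmap)" line format shown above.

```sh
# Sketch: condense the cxgbe queue lines from dmesg into one line per
# interface. Reads log text on stdin; ignores every other line.
cxgbe_queue_summary() {
    sed -n 's/^\([a-z]*[0-9]*\): \([0-9]*\) txq, \([0-9]*\) rxq (NIC); \([0-9]*\) txq, \([0-9]*\) rxq (netmap).*/\1 nic_txq=\2 nic_rxq=\3 netmap_txq=\4 netmap_rxq=\5/p'
}
```

Fed with `dmesg | cxgbe_queue_summary`, the output above would come out as `vcxl0 nic_txq=1 nic_rxq=1 netmap_txq=2 netmap_rxq=2` plus the same for vcxl1, which makes the single NIC queue behind the RSS warning easy to spot.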


So I reassigned LAN and WAN to the virtual functions instead and re-ran iperf3. Better!


root@xxx:~ # iperf3 -c 192.168.200.1 -R
Connecting to host 192.168.200.1, port 5201
Reverse mode, remote host 192.168.200.1 is sending
[  5] local 192.168.200.10 port 58912 connected to 192.168.200.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   819 MBytes  6.87 Gbits/sec
[  5]   1.00-2.00   sec   843 MBytes  7.07 Gbits/sec
[  5]   2.00-3.00   sec   830 MBytes  6.96 Gbits/sec
[  5]   3.00-4.00   sec   827 MBytes  6.94 Gbits/sec
[  5]   4.00-5.00   sec   835 MBytes  7.00 Gbits/sec
[  5]   5.00-6.00   sec   856 MBytes  7.18 Gbits/sec
[  5]   6.00-7.00   sec   831 MBytes  6.97 Gbits/sec
[  5]   7.00-8.00   sec   870 MBytes  7.30 Gbits/sec
[  5]   8.00-9.00   sec   823 MBytes  6.90 Gbits/sec
[  5]   9.00-10.00  sec   825 MBytes  6.92 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.17  sec  8.17 GBytes  6.89 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  8.16 GBytes  7.01 Gbits/sec                  receiver

iperf Done.
root@xxx:~ # iperf3 -c 192.168.200.1
Connecting to host 192.168.200.1, port 5201
[  5] local 192.168.200.10 port 62693 connected to 192.168.200.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   662 MBytes  5.55 Gbits/sec    0   1.03 MBytes
[  5]   1.00-2.00   sec   657 MBytes  5.51 Gbits/sec    5   1021 KBytes
[  5]   2.00-3.00   sec   660 MBytes  5.53 Gbits/sec    0   1.09 MBytes
[  5]   3.00-4.00   sec   661 MBytes  5.55 Gbits/sec    0   1.09 MBytes
[  5]   4.00-5.00   sec   654 MBytes  5.48 Gbits/sec    0   1.20 MBytes
[  5]   5.00-6.00   sec   657 MBytes  5.51 Gbits/sec    0   1.32 MBytes
[  5]   6.00-7.00   sec   656 MBytes  5.50 Gbits/sec    0   1.32 MBytes
[  5]   7.00-8.00   sec   653 MBytes  5.48 Gbits/sec    0   1.32 MBytes
[  5]   8.00-9.00   sec   658 MBytes  5.52 Gbits/sec    0   1.32 MBytes
[  5]   9.00-10.00  sec   653 MBytes  5.48 Gbits/sec    0   1.32 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  6.42 GBytes  5.51 Gbits/sec    5             sender
[  5]   0.00-10.00  sec  6.42 GBytes  5.51 Gbits/sec                  receiver

iperf Done.


So about 5-7 times faster, and getting closer to line speed now... Not quite there yet, but big improvement.

And this took a lot of trial and error!
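
When repeating runs like these after each change, it helps to keep only the summary figure. A minimal sketch, assuming the plain-text iperf3 output format shown above (`iperf3_bitrate` is my own helper name, not an iperf3 feature):

```sh
# Sketch: print the bitrate from iperf3 summary lines that end in the
# given role ("receiver" by default, or "sender"). Reads iperf3 text
# output on stdin; per-second interval lines are skipped.
iperf3_bitrate() {
    role=${1:-receiver}
    awk -v role="$role" '$NF == role {
        for (i = 2; i <= NF; i++)
            if ($i ~ /bits\/sec$/) { print $(i - 1), $i; break }
    }'
}
```

For example `iperf3 -c 192.168.200.1 -R | iperf3_bitrate receiver` would reduce the first run above to `7.01 Gbits/sec`. For anything more serious, iperf3's own `-J` (JSON output) option is probably a more robust thing to parse than the text format.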
#40
I've got good experience with the Unifi stuff from Ubiquiti. You can go both more and less high grade from there. Depends on how much area you need to cover, how many active clients, etc.
#41
Thanks for the info. NB I've ordered a Chelsio T520-SO-CR on eBay. These are the (10Gb) cards that Netgate themselves sell for pfSense. And the FreeBSD crowd seems to love them. So can't really get more "supported" than that.

Then there's the question of CPU and chipset of course. The FreeBSD hardware list that you linked frankly leaves a lot to be desired. The latest "supported" AMD CPU on that list is from 2003. Let's hope I shouldn't read that literally...

Also, with regards to FreeBSD performance overall, I saw this: https://www.phoronix.com/scan.php?page=article&item=freebsd-13-beta1&num=1

... which also surprised me a bit. FreeBSD 13 is apparently now "closer to parity with Linux performance on the same hardware" (note "closer") and if you look at the results, 13 in turn is sometimes twice as fast, sometimes much more than that, compared to FreeBSD 12.

So in summary: netmap runs in emulated mode, possibly due to lack of native support in the NIC driver. That comes on top of possible question marks over FreeBSD 12 performance overall, possibly exacerbated further by HardenedBSD, and, to round it off, question marks around how FreeBSD interacts with modern AMD CPUs...

I'll report back when I have been able to try the T520.
#42
Hardware and Performance / Re: WAN performance issue
November 09, 2021, 08:05:09 AM
Interesting. Your case is also different to mine in that you've had better performance with pfSense in the past as comparison. So I guess you could then compare the differences (as they are siblings, so to speak). Could the difference lie in FreeBSD vs HardenedBSD? That difference will disappear when OpnSense moves back to vanilla FreeBSD (Q1 2022?). Have you looked at Spectre mitigations and such? I think OpnSense applies stricter settings per default than pfSense. See https://docs.opnsense.org/troubleshooting/hardening.html

I like OPNsense too and would like to stay on it. I'll try a more recent Chelsio card and will probably see through the migration off HardenedBSD, but if neither of those helps I'm moving to a Linux-based fw, either VyOS or rolling my own from a minimal Debian. I don't want to go there though.
#43
Hardware and Performance / Re: WAN performance issue
November 08, 2021, 07:44:15 PM
Not just you. As per the other current thread I am also struggling to get more than 1-1.5Gb out of my 10Gb card on OpnSense. A minimal Linux install in the same environment for comparison easily saturates 10Gb. https://forum.opnsense.org/index.php?topic=25263.0

What card have you got?
#44
Quote from: testo_cz on November 07, 2021, 10:43:25 PM
and this w.r.t netmap doesnt look much convincing:
556.373055 [ 320] generic_netmap_register   Emulated adapter for cxgbe1 activated
Thank you - maybe this is the smoking gun I've been looking for? I've been Googling a bit on the topic and it's surprising how hard it is to find good information. The Netmap documentation (https://github.com/luigirizzo/netmap/blob/master/README.md) claims native support for Intel, Realtek and Nvidia. And adds a sentence "FreeBSD has also native netmap support in the Chelsio 10/40G cards.". But it doesn't mention which cards. Meanwhile, Chelsio has published a whitepaper (https://www.chelsio.com/wp-content/uploads/resources/FreeBSD-T5-Netmap.pdf) where they brag about Netmap performance, adding "Chelsio recently released its support for T5-based adapters into the FreeBSD kernel.". So I've got a T4 (10Gb) adapter... maybe it's not natively supported then? A bit unclear.

It's also not clear how much of a performance hit I should expect from an emulated adapter as opposed to native support in the driver.

In any case I'll probably try to find a T520 on eBay.

Quote from: testo_cz on November 07, 2021, 10:43:25 PM
An idea: Your Ryzen-based motherboard might not be fully supported by FreeBSD12.1 kernel/drivers set which are the base for OPNsense 21.x. IMHO poor performance could be the result.
Have you double-checked this ?

Where/how would I check this...?

Quote from: testo_cz on November 07, 2021, 10:43:25 PM
BTW: I see you're using /boot/loader.conf.local . I thought its not taken into account anymore in OPNsense, being told we should use System Settings->Tunables in the GUI.
Does sysctl -a | grep 'hw.cxgbe' confirms your settings are being applied ?

Yes, that is working, and needs to be there for things to work. Otherwise the driver isn't loaded at the right time, and the card doesn't even get recognised during boot.

Thanks for the other suggestions on tunables, I'll look into them as well. Although I've been fiddling around with quite a few including "net.inet.ip.random_id" and others but only seen very marginal differences, not the factor 5 or 10 that I'm looking for here...
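
On the sysctl question: the check can be done mechanically by comparing the name=value lines in the loader file against the "name: value" lines that sysctl prints. A sketch under those assumptions (`check_tunables` is a hypothetical helper, and it assumes the tunables are visible through sysctl on your version):

```sh
# check_tunables FILE
# FILE holds name=value lines (loader.conf style). Live values are read
# from stdin in sysctl output format ("name: value"). Prints one status
# line per tunable found in FILE: ok, MISMATCH or missing.
check_tunables() {
    awk '
        FILENAME == ARGV[1] {
            eq = index($0, "=")
            if ($0 ~ /^[ \t]*#/ || eq == 0) next
            name = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            gsub(/[ \t"]/, "", name)
            gsub(/[ \t"]/, "", value)
            want[name] = value
            next
        }
        {
            sep = index($0, ": ")
            if (sep > 0) have[substr($0, 1, sep - 1)] = substr($0, sep + 2)
        }
        END {
            for (name in want) {
                if (!(name in have))
                    printf "%s want=%s have=- missing\n", name, want[name]
                else if (have[name] == want[name])
                    printf "%s want=%s have=%s ok\n", name, want[name], have[name]
                else
                    printf "%s want=%s have=%s MISMATCH\n", name, want[name], have[name]
            }
        }
    ' "$1" - | sort
}
```

Usage would be along the lines of `sysctl -a | check_tunables /boot/loader.conf.local`. Anything reported as missing either isn't exposed via sysctl or belongs to a driver that hasn't been loaded.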
#45
Quote from: blblblb on November 02, 2021, 10:04:06 PM
Might want to look at this:
https://forum.opnsense.org/index.php?topic=25410.msg122060#msg122060

I'm not yet sure what the culprit is. Could you use some of my commands with UDP/-u mode and -Z -N whenever possible?
Also -P n where n is half your cores*2 count.  (just to avoid competing for resources elsewhere, leave some cores "free"). You can use all of them, though, but I suggest trying -P 2 first.

TL;DR run iperf3 with -u mode, it will show you packet loss. It's also relevant.

See below. The CPU is a Ryzen 3700X, so it definitely shouldn't be the bottleneck. Under Linux, it still almost idles while pushing through 10Gb in iperf3.


root@XXXX:~ # iperf3 -c 192.168.200.10 -u -b 0 -N -Z -P4
Connecting to host 192.168.200.10, port 5201
[  5] local 192.168.200.1 port 35346 connected to 192.168.200.10 port 5201
[  7] local 192.168.200.1 port 35715 connected to 192.168.200.10 port 5201
[  9] local 192.168.200.1 port 50565 connected to 192.168.200.10 port 5201
[ 11] local 192.168.200.1 port 42027 connected to 192.168.200.10 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec  34.3 MBytes   288 Mbits/sec  24660
[  7]   0.00-1.00   sec  34.3 MBytes   288 Mbits/sec  24660
[  9]   0.00-1.00   sec  34.3 MBytes   288 Mbits/sec  24660
[ 11]   0.00-1.00   sec  34.3 MBytes   288 Mbits/sec  24660
[SUM]   0.00-1.00   sec   137 MBytes  1.15 Gbits/sec  98640
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  33.2 MBytes   278 Mbits/sec  23820
[  7]   1.00-2.00   sec  33.2 MBytes   278 Mbits/sec  23820
[  9]   1.00-2.00   sec  33.2 MBytes   278 Mbits/sec  23820
[ 11]   1.00-2.00   sec  33.2 MBytes   278 Mbits/sec  23820
[SUM]   1.00-2.00   sec   133 MBytes  1.11 Gbits/sec  95280
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  34.5 MBytes   290 Mbits/sec  24800
[  7]   2.00-3.00   sec  34.5 MBytes   290 Mbits/sec  24800
[  9]   2.00-3.00   sec  34.5 MBytes   290 Mbits/sec  24800
[ 11]   2.00-3.00   sec  34.5 MBytes   290 Mbits/sec  24800
[SUM]   2.00-3.00   sec   138 MBytes  1.16 Gbits/sec  99200
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  33.0 MBytes   277 Mbits/sec  23700
[  7]   3.00-4.00   sec  33.0 MBytes   277 Mbits/sec  23700
[  9]   3.00-4.00   sec  33.0 MBytes   277 Mbits/sec  23700
[ 11]   3.00-4.00   sec  33.0 MBytes   277 Mbits/sec  23700
[SUM]   3.00-4.00   sec   132 MBytes  1.11 Gbits/sec  94800
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec  35.5 MBytes   298 Mbits/sec  25500
[  7]   4.00-5.00   sec  35.5 MBytes   298 Mbits/sec  25500
[  9]   4.00-5.00   sec  35.5 MBytes   298 Mbits/sec  25500
[ 11]   4.00-5.00   sec  35.5 MBytes   298 Mbits/sec  25500
[SUM]   4.00-5.00   sec   142 MBytes  1.19 Gbits/sec  102000
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec  35.6 MBytes   298 Mbits/sec  25540
[  7]   5.00-6.00   sec  35.6 MBytes   298 Mbits/sec  25540
[  9]   5.00-6.00   sec  35.6 MBytes   298 Mbits/sec  25540
[ 11]   5.00-6.00   sec  35.6 MBytes   298 Mbits/sec  25540
[SUM]   5.00-6.00   sec   142 MBytes  1.19 Gbits/sec  102160
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec  34.9 MBytes   293 Mbits/sec  25100
[  7]   6.00-7.00   sec  34.9 MBytes   293 Mbits/sec  25100
[  9]   6.00-7.00   sec  34.9 MBytes   293 Mbits/sec  25100
[ 11]   6.00-7.00   sec  34.9 MBytes   293 Mbits/sec  25100
[SUM]   6.00-7.00   sec   140 MBytes  1.17 Gbits/sec  100400
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec  34.7 MBytes   291 Mbits/sec  24890
[  7]   7.00-8.00   sec  34.7 MBytes   291 Mbits/sec  24890
[  9]   7.00-8.00   sec  34.7 MBytes   291 Mbits/sec  24890
[ 11]   7.00-8.00   sec  34.7 MBytes   291 Mbits/sec  24890
[SUM]   7.00-8.00   sec   139 MBytes  1.16 Gbits/sec  99560
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec  34.2 MBytes   287 Mbits/sec  24580
[  7]   8.00-9.00   sec  34.2 MBytes   287 Mbits/sec  24580
[  9]   8.00-9.00   sec  34.2 MBytes   287 Mbits/sec  24580
[ 11]   8.00-9.00   sec  34.2 MBytes   287 Mbits/sec  24580
[SUM]   8.00-9.00   sec   137 MBytes  1.15 Gbits/sec  98320
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec  34.8 MBytes   292 Mbits/sec  24980
[  7]   9.00-10.00  sec  34.8 MBytes   292 Mbits/sec  24980
[  9]   9.00-10.00  sec  34.8 MBytes   292 Mbits/sec  24980
[ 11]   9.00-10.00  sec  34.8 MBytes   292 Mbits/sec  24980
[SUM]   9.00-10.00  sec   139 MBytes  1.17 Gbits/sec  99920
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.000 ms  0/247570 (0%)  sender
[  5]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.007 ms  40/247570 (0.016%)  receiver
[  7]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.000 ms  0/247570 (0%)  sender
[  7]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.009 ms  39/247570 (0.016%)  receiver
[  9]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.000 ms  0/247570 (0%)  sender
[  9]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.006 ms  38/247570 (0.015%)  receiver
[ 11]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.000 ms  0/247570 (0%)  sender
[ 11]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.012 ms  40/247570 (0.016%)  receiver
[SUM]   0.00-10.00  sec  1.35 GBytes  1.16 Gbits/sec  0.000 ms  0/990280 (0%)  sender
[SUM]   0.00-10.00  sec  1.35 GBytes  1.16 Gbits/sec  0.008 ms  157/990280 (0.016%)  receiver
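
For reading the summary: the loss percentage and the datagram rate both follow directly from the Lost/Total column. A small sketch of the arithmetic (helper names are mine, nothing iperf3 provides):

```sh
# udp_loss_percent LOST TOTAL -> loss as a percentage, three decimals.
udp_loss_percent() {
    awk -v lost="$1" -v total="$2" 'BEGIN {
        if (total <= 0) { print "n/a"; exit }
        printf "%.3f\n", 100 * lost / total
    }'
}

# datagrams_per_sec TOTAL SECONDS -> average datagram rate, rounded.
datagrams_per_sec() {
    awk -v total="$1" -v secs="$2" 'BEGIN {
        if (secs <= 0) { print "n/a"; exit }
        printf "%.0f\n", total / secs
    }'
}
```

For the [SUM] receiver line, `udp_loss_percent 157 990280` gives 0.016 and `datagrams_per_sec 990280 10` gives 99028, so loss is negligible while all four streams together stay at roughly 99,000 datagrams per second.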