Messages - testo_cz

#1
Just finished my re-tests of OPNsense running on Proxmox:
1. Set up the OPNsense VM network adapters with multiqueue, see https://pve.proxmox.com/pve-docs/chapter-qm.html#qm_network_device (e.g. set 4 if vCPU == 4, as in my case; example below).

2. Set up the net.isr bind-/max-threads tunables via System > Settings > Tunables, see https://docs.opnsense.org/troubleshooting/performance.html#kernel-support (example below).
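
For reference, a minimal sketch of both steps. The MAC addresses and bridge names in the Proxmox config are placeholders; the tunable values are the ones I mention in my other posts here:

# Proxmox VM config (/etc/pve/qemu-server/<vmid>.conf): 4 queues for a 4-vCPU guest
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,queues=4
net1: virtio=AA:BB:CC:DD:EE:00,bridge=vmbr1,queues=4

# OPNsense tunables (System > Settings > Tunables)
net.isr.maxthreads = -1
net.isr.bindthreads = 1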

I use two additional VMs (one on WAN, one on LAN) and run the iperf3 server/client on them respectively; the OPNsense VM does default NAT between them. Proxmox connects everything via two Linux bridges (one vmbr for WAN, one for LAN). Another option would be to run the iperf3 server or client on the Proxmox host itself.
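
A minimal sketch of such a run (the address is a placeholder; traffic from the LAN-side VM is NATed through OPNsense):

# @WAN-side VM
iperf3 -s
# @LAN-side VM
iperf3 -c <wan-vm-ip> -P2 -t60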

My result is 2 Gbit/s in either direction for "iperf3 -P2 -t60" with a 4-vCPU OPNsense VM.

Let us know your LAN client performance result, please.
T.
#2
Have you tried iperf3 to double-check the link between the LAN client and the LAN interface of the OPNsense VM?

# @OPNsense, pkg install iperf3
iperf3 -s -B <lan-ip-opnsense>
# @LAN-client
iperf3 -c <lan-ip-opnsense> -P2
iperf3 -c <lan-ip-opnsense> -P2 -R
This should saturate the link at almost a full gigabit.
#3
Nice @gcorre !
Can you confirm that bsnmpd runs fine on the current OPNsense, please?

Personally, I searched the forums and got the impression that bsnmpd is not preferred in OPNsense (due to some past troubles), although it is the default in FreeBSD.
I took this Zabbix template (https://www.zabbix.com/integrations/opnsense), disabled all the BEGEMOT-MIB items, and basically lost PF monitoring. Instead of the BEGEMOT-MIB items for PF, I've tried the collectd and Telegraf features for PF monitoring. Both output Prometheus format and are therefore pluggable into Zabbix.

Well, for now, I use Telegraf's PF metrics via its Prometheus output to Zabbix. The metrics cover the PF state table only, which is less than BEGEMOT-MIB provides.
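
A minimal sketch of that Telegraf setup, assuming the stock pf input and prometheus_client output plugins (the listen port is just the plugin's default; Zabbix can then scrape the endpoint with an HTTP agent item plus Prometheus preprocessing):

# /usr/local/etc/telegraf.conf (excerpt)
[[inputs.pf]]

[[outputs.prometheus_client]]
  listen = ":9273"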

Cheers
#4
It looks like bsnmp has been superseded since 19.1:

https://forum.opnsense.org/index.php?topic=11398.msg51514#msg51514


This has been discussed already:

https://forum.opnsense.org/index.php?topic=19753.0


I'm dealing with Zabbix monitoring of OPNsense via SNMP these days too.

T.
#5
Have you tried to install a switch between your OPNsense and the ISP's equipment? Namely a switch with controllable EEE, so you could additionally control EEE on the counterpart to your i211.

It seems to me that disabling EEE on your i211 NICs is necessary regardless of how much the FreeBSD driver gets updated and bugfixed. So I would rule the driver out of the list of possible causes.

#6
Hi

I wonder how your Ethernet network card will deal with the SFP module. Could you share some results later?

Because, IMHO, the card supports Ethernet standards only (IEEE 802.3*), for example:

        Supported link modes:   1000baseT/Full
                                10000baseT/Full
                                1000baseX/Full
                                10000baseSR/Full
                                10000baseLR/Full

and your GPON ONT SFP thus might have a matching physical/electrical SFP interface but not a matching link protocol towards the Ethernet NIC.

Then there is the question of which HW part will manage the WDM/TDM for GPON ... interesting.

Happy hacking -- thumbs up !

#7
Hardware and Performance / Re: 10GB LAN Performance
January 01, 2022, 06:42:24 PM
Quote from: johnoatwork on January 01, 2022, 03:59:38 AM
Bit of an update on this. After swapping the Chelsio cards for Intel X710-DA2s and getting more or less the same result I've figured out at least the iperf issue. iperf3 is single threaded, even if you run it with the -P option it still only hits one CPU core. If you want multithreaded operation you have to use iperf2.

I'd been checking CPU utilisation on the firewall dashboard while iperf3 was running and not seeing any significant numbers, but when I checked with top directly from the console the single CPU core being hit by iperf was running at close to 100%.

So I installed iperf2 and ran it multithreaded and boom! Near wire speed with 20+ concurrent threads!!

Running iperf continuously makes it easier to monitor top. For those who are interested, this runs the iperf2 client continuously with 50 threads:

iperf -c hostname -tinf -P 50

Then run top on the firewall like this:

top -PCH

But here's what I don't get. If I run iperf2 *through* the firewall to a server on the same 10Gbps network segment as the WAN, I get around 5Gbps with a single thread and 7-8Gbps multithreaded. But the same client running speedtest cli peaks at around 1Gbps. Looking at top on the firewall while speedtest is running doesn't show any significant CPU utilisation and anyway, if the firewall is only running a single thread for speedtest realistically it should be capable of way better than 1Gbps (half of that with WIN10!).

The obvious culprit is the ISP network but I'm still getting up to 8Gbps running speedtest directly from the firewall. I've also tested with mtr (no data loss and super low latency) and tracepath (no mtu issues all the way through to 1.1.1.1).

In summary, here is what I have found:

  • There is not much difference I can tell in performance between the Intel X710-DA2 and the Chelsio T520-CRs
  • The internal 10Gbps network and attached clients are healthy and can transfer data at close to wire speed
  • The overhead from packet filtering on the firewall (passing iperf traffic) is 2-3Gbps which is bearable. Faster CPUs might reduce this, but with 10 cores engaged utilisation is only about 25-30%
  • The ISP upstream network is healthy
So I'm not sure why there is such a big difference in firewall throughput between speedtest and iperf. I'm guessing speedtest uses tcp/443 and iperf defaults tcp/5001 (5201 for iperf3).

Unless the firewall is doing additional processing for tcp/443? I don't have any special rules set up for https and there is no IDS running at the moment. I'm going to have a close look at the proxy setup see if that leads anywhere.

Nice finding about iperf2 vs iperf3. Thanks.

I think "rungekutta" reported about similar forwarding performance as yours in iperf3 testing.
My ESX based testbed (only 10G capable I've got) runs also something over 5Gbps  with iperf3 but I haven't tweaked it much.

When you mention "10 cores engaged, utilisation is only about 25-30%", does that mean that each of the ten CPU cores is utilized at 25-30%?

A few things I would check:
whether power management allows the CPU to scale its frequency up:

sysctl -a | grep cpu | grep freq


whether these network tunables for multicore CPUs are on:

net.isr.maxthreads = "-1"
net.isr.bindthreads = "1"


whether flow control is off per network interface:

dev.ixl.#.fc = "0"


Further, I'd say one may try to increase the number of RX/TX queues and descriptors.
If not via ixl(4) directly, then the iflib(4) based tunables might let you do so. Check the sysctl values of 'nrxqs', 'ntxqs', 'nrxds' and 'ntxds' and see whether you can override them to make them larger (sketch below). Overrides require a reboot, I guess.
Docs e.g. here:
https://www.freebsd.org/cgi/man.cgi?query=iflib&sektion=4&apropos=0&manpath=FreeBSD+12.2-RELEASE+and+Ports
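
A sketch of such overrides for an ixl NIC (the values are only illustrative, not recommendations; set them as tunables and reboot):

dev.ixl.0.iflib.override_nrxqs="8"
dev.ixl.0.iflib.override_ntxqs="8"
dev.ixl.0.iflib.override_nrxds="4096"
dev.ixl.0.iflib.override_ntxds="4096"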


This approach boosted forwarding performance on my ESX setup with vmx interfaces.


With regard to speedtest-cli to the Internet, I'd say try tcpdump/wireshark on both sides of the firewall to see whether the packets flow nicely as expected, or whether there are resends, rubbish or something strange going on.
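
A minimal capture sketch (interface names and the client address are placeholders; port 443 because speedtest runs over HTTPS):

# WAN-side capture on the firewall
tcpdump -ni ixl0 -w wan-speedtest.pcap host <client-ip> and port 443
# LAN-side capture on the firewall
tcpdump -ni ixl1 -w lan-speedtest.pcap host <client-ip> and port 443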



#8
So instead of 770 Mbps single-stream you are now getting 850 Mbps. IMHO that's nice, not perfect but nice.

Networking tasks should now be spread over more CPU cores, while the load stays low. Last time you posted 33% sys CPU load on one core, IMHO a healthy system. This time it will be spread over two or more cores.

I don't have benchmarks for a single stream, I'm afraid. I've always been using -P2, with results around 950 Mbps (either upload or download). Maybe I'll try later, but I can't promise.

Maybe 850 Mbps single-stream is just fine now, but I'm not sure. There is always a performance decrease because of the intentionally disabled NIC offloading and the netmap usage within OPNsense; however, the decrease is almost invisible with multiple streams/sessions on 1GbE. That is 950 Mbps and more with -P2, which is very nice considering MTUs, pps, and other limitations of the source and destination NICs.

Reminds me, the OPNsense docs say to keep the TCP, UDP and LRO offloads at the default = OFF.

And this might be useful too -- an example of "healthy" initialization of a powerful Intel 1GbE NIC (dmesg | grep igb0):

igb0: <Intel(R) PRO/1000 PCI-Express Network Driver> port 0xc020-0xc03f mem 0xfe8a0000-0xfe8bffff,0xfe880000-0xfe89ffff,0xfe8c4000-0xfe8c7fff irq 40 at device 0.0 on pci3
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: Using 2 RX queues 2 TX queues
igb0: Using MSI-X interrupts with 3 vectors
igb0: Ethernet address: 00:25:90:00:00:00
igb0: netmap queues/slots: TX 2/1024, RX 2/1024

Important here: MSI-X is on, there is more than one HW queue, and more than one netmap queue is mapped in non-emulated mode.

Is single-stream TCP performance somehow crucial for you?

The TCP stack itself has some tunables too -- both on the iperf3 side and via kernel sysctls.
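
A sketch of what I mean (the values are only examples; these are the stock FreeBSD socket/TCP buffer knobs):

# larger TCP window on the iperf3 side
iperf3 -c <host> -t60 -w 1M

# FreeBSD socket/TCP buffer tunables (sysctl or System > Settings > Tunables)
kern.ipc.maxsockbuf=4194304
net.inet.tcp.sendbuf_max=4194304
net.inet.tcp.recvbuf_max=4194304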

T.

#9
I'd say, for a start, use these tunables

net.isr.maxthreads = "-1"
net.isr.bindthreads = "1"

to have multiple queues;
and disable flow control on the Intel NICs, e.g.

dev.igb.0.fc = "0"
dev.igb.1.fc = "0"


Most 1GbE setups I've seen happily do NAT at close to 1 Gbps.

Is the CPU frequency scaling up/down ?

sysctl -a | grep cpu | grep freq



#10
You mainly run speedtest-cli from the LAN to a speedtest server on the Internet, don't you?

What is the type/technology of your connection to the ISP?

T.
#11
Hardware and Performance / Re: 10GB LAN Performance
December 17, 2021, 08:59:58 AM
I'd say: try connecting another PC with 10GbE to your switch stack,
to see clearly whether the bottleneck is the Proxmox/Windows side or the OPNsense server.

Throughput around 1 Gbps on this 20G setup seems crazy low to me, unless you have left some IPS or shaping settings enabled on OPNsense -- what is the CPU load on the OPNsense server when you test throughput?
#12
Performance of what, precisely?

IMHO your setup is non-standard and I don't see any diagnostic information in your post.

T.
#13
Hardware and Performance / Re: Speed negotiation fail
November 20, 2021, 05:33:40 PM
Sounds broken...

Have you booted some other operating system to rule out OPNsense?
#14
@rungekutta
Very nice info about your NIC setup and throughput results.

Was the earlier firmware in the NIC too old, or was it perhaps customized?
Because as people often reuse HW / NICs, it might not carry genuine firmware -- for example, it could have been customized by a server vendor.

I'm only getting familiar with Suricata... Does it utilize 100% CPU when enabled?

#15
Hardware and Performance / Re: vmxnet tuning advice
November 14, 2021, 10:03:11 AM
I don't have such a complex setup as you, but this post helped me:
https://forum.opnsense.org/index.php?topic=18754.msg90576#msg90576

Update:
Here, for example, is a topic w.r.t. VMXNET tuning that confirms it works: https://forum.opnsense.org/index.php?topic=25076.0

And my test VM shows 10 Gbps with 8 vCPUs (Xeon @ 2.4 GHz), 'iperf3 -P 5 -t60' and the following tunables:

net.isr.maxthreads = -1
net.isr.bindthreads = 1
hw.pci.honor_msi_blacklist = 0
hw.ibrs_disable = 1
dev.vmx.0.iflib.override_ntxds="0,4096"
dev.vmx.0.iflib.override_nrxds="0,2048,0"
dev.vmx.1.iflib.override_ntxds="0,4096"
dev.vmx.1.iflib.override_nrxds="0,2048,0"
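
After the reboot you can check whether the overrides took effect, e.g. by inspecting the driver messages (the exact wording of the lines may differ):

dmesg | grep vmx

and looking at the reported TX/RX descriptor and queue counts.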


T.