Messages - pfop

#1
Quote from: ks on June 21, 2024, 06:44:41 PM

Has anyone else experienced these kinds of issues?

Thanks in advance

I have this issue too from time to time (sometimes once a month, sometimes multiple times a week) with a 5700G and an ASUS PRIME B550M-A board. RAM is from Kingston, default BIOS settings loaded, no overclocking.
I didn't find the cause, so I ordered some Intel hardware to migrate to...
#2
Quote from: netnut on June 18, 2024, 08:29:58 PM
Last three months I experienced major issues with a SuperMicro AM5 board and a 25Gb Intel 810 ethernet adapter, after endless debugging, two interim BIOS fixes and a new firmware update for the Intel 810 it looks like things are now mostly fixed, at least the "forced" SR-IOV mode I experienced and some other stuff that both machine and NIC BIOS/Firmware should have done but didn't.

Did you set ice_ddp_load="YES" in /boot/loader.conf.local? Otherwise OPNsense will use only one queue on the NIC, which limits packet processing to one core; in most cases that is not enough for 25GBit throughput.
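For reference, the exact setting (file and variable name as mentioned above; the file is read at boot, so a reboot is required afterwards):

# /boot/loader.conf.local
# load the DDP package so the ice(4) driver can enable multiple queues
ice_ddp_load="YES"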
#3
Quote from: Rkpaxam on June 26, 2024, 09:49:17 AM
I'm getting the following in the logs, so I'm not sure if this has anything to do with it:

2024-06-26T08:37:40   Notice   kernel   <6>igc0: link state changed to UP
2024-06-26T08:37:32   Notice   kernel   <6>igc0: link state changed to DOWN   
2024-06-26T08:37:31   Notice   kernel   <6>igc0: link state changed to UP   
2024-06-26T08:37:28   Notice   kernel   <6>igc0: link state changed to DOWN   
2024-06-26T08:37:26   Notice   kernel   <6>igc0: link state changed to UP   
2024-06-26T08:37:23   Notice   kernel   <6>igc0: link state changed to DOWN   
2024-06-26T08:37:22   Notice   kernel   <6>igc0: link state changed to UP

This should not merely impact performance but kill the whole connection: it means that your interface igc0 is flapping, for example due to a hardware or cable issue.
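A quick way to check how often the link has flapped since boot (a minimal sketch; the same messages also appear in the OPNsense GUI log viewer):

# count igc0 link state changes in the kernel message buffer
dmesg | grep -c "igc0: link state changed"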
#4
Quote from: kevindd992002 on June 18, 2024, 06:46:22 PM
@pfop did you have the same results?

Hello, currently I have no OPNsense box to test with, sorry.
#5
Quote from: Magician1981 on March 31, 2024, 01:12:16 PM
My question: does QAT in the 8505 work on OPNsense to accelerate WireGuard VPN traffic?
Thank you.

Hello, no, it will not.
As of today, only pfSense+ utilizes crypto offloading for WireGuard.

BR
#6
Quote from: meyergru on March 18, 2024, 04:47:01 PM
I do not really know if it might be compiled statically, but if the speed results of @pfop are correct (i.e. OpnSense 24.1.3_1 = 100%, pfSense CE 2.7.2 = 200% and pfSense+ = 400%), it suggests Netgate really limits use of the integration to pfSense+.
CPU crypto offloading/acceleration with iimb.ko is only available in pfSense+, not in pfSense CE.
#7
Quote from: franco on March 11, 2024, 10:00:56 AM
Can you try to locate it on the plus install?
# find / -name "iimb.ko"


[23.09.1-RELEASE][root@pfSense.home.arpa]/root: find / -name "iimb.ko"
/boot/kernel/iimb.ko

Quote from: franco on March 11, 2024, 10:00:56 AM
And then try to see if it belongs to a package or if it is part of the non-free plus sources?
# pkg which /path/to/iimb.ko


[23.09.1-RELEASE][root@pfSense.home.arpa]/root: pkg which /boot/kernel/iimb.ko
/boot/kernel/iimb.ko was installed by package pfSense-kernel-pfSense-23.09.1


To summarize:
pfSense CE on bare metal C3758R, 1300MBit WireGuard throughput
OPNsense on bare metal C3758R, 630MBit WireGuard throughput (-51%)

pfSense+ on bare metal Ryzen 5700G, 6000MBit WireGuard throughput (with IIMB!) at only ~25% CPU load
OPNsense on bare metal Ryzen 5700G, 1800MBit (-70%)

FreeBSD 13.2 on vSphere VM, 1 Core i7-7700K, 990MBit throughput
FreeBSD 13.3 on vSphere VM, 1 Core i7-7700K, 1020MBit throughput
FreeBSD 14.0 on vSphere VM, 1 Core i7-7700K, 980MBit throughput

Unfortunately I was not able to do WireGuard tests on the Ryzen 5700G with pfSense CE, as the network driver supplied there supports only one queue, which could lead to measurement errors.

pfSense+ is clearly the leader, thanks to its hardware acceleration of WireGuard.
But I still can't figure out why OPNsense throughput is so much lower compared to pfSense CE or plain FreeBSD. There is clearly something unoptimized in OPNsense when using WireGuard.
#8
Quote from: Monviech on March 11, 2024, 08:24:37 AM
It seems like there was another thread where tunables have been described:

https://forum.opnsense.org/index.php?topic=37808.0

Thank you for your reply. I had already added those tunables, without any change in WG performance.
#9
Quote from: Monviech on March 10, 2024, 07:22:35 PM
So, this feature does all the magic in pfSense Plus? Intel CPU and IPsec Multi-Buffer (IPsec-MB, IIMB) cryptographic acceleration for ChaCha20-Poly1305?

https://docs.netgate.com/pfsense/en/latest/hardware/cryptographic-accelerators.html

To be honest, I tried to find this out, but unfortunately I'm not a 'low level' expert.
What I can say is that installing intel-ipsec-mb-1.5_1 and loading cryptodev.ko didn't make a difference on OPNsense.
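For completeness, a sketch of what was tried (package name as given above; cryptodev.ko ships with the FreeBSD base system):

# install the Intel IPsec Multi-Buffer library from packages
pkg install intel-ipsec-mb
# load the /dev/crypto interface module
kldload cryptodev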

Here are the loaded modules on pfSense+ and OPNsense.
The iimb.ko on pfSense+ looks like the one we're missing...

pfSense+
[23.09.1-RELEASE][root@pfSense.home.arpa]/root: kldstat
Id Refs Address                Size Name
1   38 0xffffffff80200000  339f830 kernel
2    1 0xffffffff835a0000    abd98 ice_ddp.ko
3    1 0xffffffff8364c000     76f8 cryptodev.ko
4    1 0xffffffff83655000    1e2b0 opensolaris.ko
5    1 0xffffffff83674000   5d7790 zfs.ko
6    1 0xffffffff84710000     2220 cpuctl.ko
7    1 0xffffffff84713000     3210 intpm.ko
8    1 0xffffffff84717000     2178 smbus.ko
9    1 0xffffffff8471a000     9288 aesni.ko
10    1 0xffffffff84800000   666a08 iimb.ko
12    1 0xffffffff84753000     3158 amdtemp.ko
13    1 0xffffffff84757000     2130 amdsmn.ko
14    1 0xffffffff84724000    2e560 if_wg.ko


OPNsense
root@OPNsense:~ # kldstat
Id Refs Address                Size Name
1   86 0xffffffff80200000  216c2e0 kernel
2    1 0xffffffff8236d000     ab48 opensolaris.ko
3    1 0xffffffff82378000     4b58 if_enc.ko
4    3 0xffffffff8237d000    78aa0 pf.ko
5    1 0xffffffff823f6000     a458 cryptodev.ko
6    1 0xffffffff82401000    abc98 ice_ddp.ko
7    1 0xffffffff824ad000     f4c8 pfsync.ko
8    1 0xffffffff824bd000   59dfe0 zfs.ko
9    1 0xffffffff82a5b000     3b18 pflog.ko
10    1 0xffffffff82a5f000     f858 carp.ko
11    1 0xffffffff82a70000     aa70 if_gre.ko
12    1 0xffffffff82a7b000    16148 if_lagg.ko
13    2 0xffffffff82a92000     3538 if_infiniband.ko
14    1 0xffffffff82a96000     e8f8 if_bridge.ko
15    2 0xffffffff82aa5000     8958 bridgestp.ko
16    1 0xffffffff83010000     3378 acpi_wmi.ko
17    1 0xffffffff83014000     3218 intpm.ko
18    1 0xffffffff83018000     2180 smbus.ko
19    1 0xffffffff8301b000     3340 uhid.ko
20    1 0xffffffff8301f000     3380 usbhid.ko
21    1 0xffffffff83023000     31f8 hidbus.ko
22    1 0xffffffff83027000     3320 wmt.ko
23    1 0xffffffff8302b000     72a8 hifn.ko
24    1 0xffffffff83033000     2270 padlock.ko
25    1 0xffffffff83036000    15308 qat.ko
26    1 0xffffffff8304c000     43b0 safe.ko
27    1 0xffffffff83051000     3160 amdtemp.ko
28    1 0xffffffff83055000     2138 amdsmn.ko
29    1 0xffffffff83058000    2f560 if_wg.ko
30    1 0xffffffff83088000     4700 nullfs.ko
#10
It has been a few weeks; I got some new hardware and also did some additional tests that might be interesting to the community.

The logical setup stays the same, but now I used a Ryzen 5700G with an Intel E810 Quad Port SFP28 network card.

OPNsense WG performance
OPNsense 24.1.1 WireGuard performance: 1800MBit
--> So the Ryzen 5700G is about 3x faster than the C3758R, although the CPU itself is 4.8x faster

pfSense+ WG performance
pfSense+ 23.09.1 WireGuard performance: 6000MBit
--> Out of interest, I did some tests with pfSense+, which uses hardware acceleration for ChaCha20-Poly1305: it shows an impressive 6000MBit throughput while the CPU is still 75% idle

Is the difference in post 1 caused by the different FreeBSD versions used?
I doubt that. I did some tests with WireGuard on FreeBSD VMs with identical configuration, and the results are really close to each other. For the FreeBSD tests I used an old i7-7700K; each FreeBSD VM got 1 vCPU assigned.

FreeBSD 13.2: 990MBit
FreeBSD 13.3: 1020MBit
FreeBSD 14.0: 980MBit

Conclusion
One core of the i7-7700K has about 10% of the processing power of a Ryzen 5700G, yet it achieves about 50% of the throughput the Ryzen 5700G reaches with OPNsense.
So for me it is clearly an issue related to OPNsense, and not to the FreeBSD/kernel version in general.
#11
Quote from: franco on February 20, 2024, 09:57:22 PM
It's probably a rule for scrubbing or something else being configured suboptimally vs. pfSense.

So I disabled scrubbing on all interfaces, applied the config, rebooted OPNsense and retested. Still the same result.

Quote from: lewald on February 21, 2024, 10:45:33 AM
You can play with Normalization.

http://x.x.x.x/firewall_scrub.php

Add a rule:
On interface: WireGuard group
Max MSS: 1300
I added the rule, applied it, rebooted OPNsense and retested. Still the same result.

Quote from: mimugmail on February 21, 2024, 12:37:33 PM
Or in instance tick advanced and set MTU to the same value on all devices.
As I don't really know where to look for the issue, I also tried this change (which in general doesn't make sense, as smaller packets load the CPU more than bigger ones, and I'm testing on a 1500-MTU-only network).
Before, the MTU of the WG interfaces on both sides was the default, 1420:
OPNsense: wg1: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
ServerB: 5: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000

After setting MTU 1300 on both sides:
OPNsense: wg1: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1300
ServerB: 6: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1300 qdisc noqueue state UNKNOWN group default qlen 1000

And as expected, the speed went down by about 10MBit because of more overhead with smaller packets.
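For reference, how such a temporary MTU change can be made from the shell (interface names as above; this does not persist across reboots, so the permanent value belongs in the WireGuard instance settings):

# FreeBSD/OPNsense side
ifconfig wg1 mtu 1300
# Linux side (ServerB)
ip link set dev wg0 mtu 1300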


Quote from: meyergru on February 21, 2024, 01:07:15 PM
- use more threads (-P4 or -P8 are equal, only -P1 is a little slower).
I agree with this: on both OPNsense and pfSense, one thread already gets close to the maximum throughput; with more streams you gain only very little additional speed.

Quote from: meyergru on February 21, 2024, 01:07:15 PM
My CPU should be more or less comparable to the C3758R, even somewhat fast in single-thread application.
....
But when I look at "top" with threads and system processes enabled, I can see the kernel at ~300% (the rest is interrupts and user processes), so all 4 threads seem to get utilized.
I see 750-780% on the kernel process:
    0 root        142 -16    -     0B  2272K swapin   3  16:30 784.82% kernel
Most likely my system has somewhat better hardware processing, as my interrupts stay very close to 0%; that is most likely why you see a bit less performance than my setup, even though your CPU is some 5% faster than mine.
#12
Hello everyone

Since 24.1, OPNsense uses the kernel-based WG implementation; you can also see this in the release notes:
wireguard: installed by default using the bundled FreeBSD 13.2 kernel module
Source: https://forum.opnsense.org/index.php?topic=38427.0

Mine is a fresh 24.1.1 install, not an upgrade where there could be some 'leftovers' from wireguard-go.
The current OPNsense 24.1.1 is running FreeBSD 13.2-RELEASE-p9:

root@OPNsense:~ # uname -r
13.2-RELEASE-p9

pfSense using WG kernel module:
[2.7.2-RELEASE][root@pfSense]/root: kldstat | grep wg
9    1 0xffffffff83e4f000    2e560 if_wg.ko

OPNsense using WG kernel module:
root@OPNsense:~ # kldstat | grep wg
22    1 0xffffffff82df2000    2f560 if_wg.ko

BR
#13
Hello meyergru

Thank you for taking the time to reply!
I have now disabled the Spectre/Meltdown mitigations and also the IPv4 random IDs, applied the settings, and rebooted the firewall:
net.inet.ip.random_id   0
hw.ibrs_disable 1
vm.pmap.pti 0
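For reference, a sketch of where each tunable lives on a stock FreeBSD system (on OPNsense they are normally set via System > Settings > Tunables; vm.pmap.pti is a boot-time tunable and read-only at runtime, which is why the reboot was needed):

# /boot/loader.conf.local -- applied at boot
vm.pmap.pti=0
# runtime sysctls -- persist via /etc/sysctl.conf or the Tunables GUI
sysctl hw.ibrs_disable=1
sysctl net.inet.ip.random_id=0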

Redoing the test shows maybe a slight increase to 650MBit, but nowhere close to the 1300MBit from pfSense.
Microcode Update is installed - yes.

It's correct that WireGuard is not using AES. There are some Intel QuickAssist implementations which can help, but this system's QuickAssist is too old AFAIK; anyway, it is the same for pfSense.

BR
#14
Hello colleagues

Intro

I'm a long-time (over 15 years) pfSense user, now moving to OPNsense once my new fiber connection is ready, as OPNsense offers better NAT performance in my tests.
So far I have used pfSense on ALIX and APU devices from PC Engines, as well as virtually in VMs.

New hardware
For my fiber connection, which will be 10GBit symmetrical, I got a passive Qotom device powered by an 8-core Intel Atom C3758R CPU, 32GB DDR4-2400 ECC RAM (2x 16GB) and two NVMe SSDs in a ZFS mirror.
The device provides 4x SFP+ X553 ports and 5x RJ45 2.5G Intel I225-V ports.

Issue with WireGuard performance
What currently bugs me is the WireGuard performance on OPNsense compared to pfSense.
On the C3758R with pfSense 2.7.2 and the 'WireGuard' package version 0.2.1, I get 1300MBit of WireGuard throughput.
On the C3758R with OPNsense 24.1.1, I get 630MBit of WireGuard throughput.

Setup
The setup for both tests is exactly the same; the same physical box was used for all tests.

ServerA is wired directly to SFP+ port1 (ix1) on OPNsense with a 10G LR SM optic.
ServerB is wired directly to SFP+ port2 (ix2) on OPNsense with a 10G LR SM optic.



ix1 = OPNsense LAN, MTU 1500
ix2 = OPNsense WAN, outbound NAT active, MTU 1500
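Schematically (a sketch reconstructed from the description above; FW01 is the firewall box):

ServerA ---10G LR--- ix1 (LAN) [FW01: C3758R] ix2 (WAN, NAT) ---10G LR--- ServerB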

Testing
Doing iperf3 tests between ServerA and ServerB, I can reach up to 3.5GBit with 1 stream; with more streams I can saturate the 10GBit interfaces.

When establishing a WireGuard VPN between FW01 and ServerB and doing iperf3 tests from ServerA to ServerB's WG IP, I can reach about 630MBit with 1 stream, and the CPU utilization is at 100%.
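For reference, the kind of commands used (a minimal sketch; 10.0.0.2 is a placeholder for ServerB's WG IP, which is not given in the post):

# on ServerB: run the iperf3 server
iperf3 -s
# on ServerA: single stream for 30 seconds; add -P 4 for parallel streams
iperf3 -c 10.0.0.2 -t 30 -P 1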

pfSense WireGuard performance
Doing exactly the same with pfSense on the same physical firewall, I can reach 1300MBit through WireGuard with the exact same setup.

Question
Does anyone have an idea why OPNsense is 50% slower in regards to WireGuard throughput? Are there any hidden options that can be modified to get closer to the 1300MBit possible on pfSense?

I look forward to a constructive discussion!

Best regards