Hi All,
I've been searching through the threads regarding slow WireGuard performance on OPNsense, and I'm hoping someone can provide some clarity as to what is causing my WireGuard connection to max out at about 383 Mbits/sec.
Here is my layout:
I'm testing between two locations that each have a 1 Gbit/s fibre-optic connection over PPPoE.
When I run iperf3 between both locations using the WAN IP, I get near line speed. However, when I test against the internal IP of a machine behind the OPNsense router, I get a maximum of about 383 Mbits/sec, even with parallel connections.
I also tested OPNsense both as a VM in Proxmox and installed directly on the hardware without a hypervisor (identical hardware), and the speeds did not change much.
This is the summary output of iperf3 using the WAN IP:
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 81.5 MBytes 684 Mbits/sec 91 394 KBytes
[ 5] 1.00-2.00 sec 94.2 MBytes 790 Mbits/sec 28 429 KBytes
[ 5] 2.00-3.00 sec 92.1 MBytes 772 Mbits/sec 2 465 KBytes
[ 5] 3.00-4.00 sec 90.0 MBytes 755 Mbits/sec 47 346 KBytes
[ 5] 4.00-5.00 sec 91.1 MBytes 764 Mbits/sec 19 378 KBytes
[ 5] 5.00-6.00 sec 92.5 MBytes 776 Mbits/sec 20 405 KBytes
[ 5] 6.00-7.00 sec 92.1 MBytes 773 Mbits/sec 21 433 KBytes
[ 5] 7.00-8.00 sec 91.8 MBytes 770 Mbits/sec 4 465 KBytes
[ 5] 8.00-9.00 sec 89.8 MBytes 753 Mbits/sec 37 360 KBytes
[ 5] 9.00-10.00 sec 89.5 MBytes 751 Mbits/sec 9 402 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 905 MBytes 759 Mbits/sec 278 sender
[ 5] 0.00-10.04 sec 903 MBytes 754 Mbits/sec receiver
These are the speeds using the LAN IP:
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.01 sec 46.1 MBytes 385 Mbits/sec 107 529 KBytes
[ 5] 1.01-2.00 sec 45.0 MBytes 379 Mbits/sec 0 574 KBytes
[ 5] 2.00-3.00 sec 47.5 MBytes 398 Mbits/sec 0 620 KBytes
[ 5] 3.00-4.00 sec 50.0 MBytes 419 Mbits/sec 0 663 KBytes
[ 5] 4.00-5.00 sec 76.2 MBytes 640 Mbits/sec 22 546 KBytes
[ 5] 5.00-6.00 sec 45.0 MBytes 377 Mbits/sec 0 597 KBytes
[ 5] 6.00-7.00 sec 41.2 MBytes 345 Mbits/sec 0 640 KBytes
[ 5] 7.00-8.00 sec 62.5 MBytes 526 Mbits/sec 0 693 KBytes
[ 5] 8.00-9.00 sec 72.5 MBytes 608 Mbits/sec 18 581 KBytes
[ 5] 9.00-10.00 sec 77.5 MBytes 650 Mbits/sec 0 671 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 564 MBytes 473 Mbits/sec 147 sender
[ 5] 0.00-10.05 sec 561 MBytes 468 Mbits/sec receiver
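To compare runs like the two above without eyeballing them, here is a minimal Python sketch that pulls the per-interval bitrates out of pasted iperf3 text output and reports mean, min and max. It assumes iperf3's default MBytes / Mbits/sec units; the regex and function names are my own, not part of iperf3. The data is copied from the two runs in this post.

```python
import re
from statistics import mean

# Per-interval line, e.g. "[ 5] 0.00-1.00 sec 81.5 MBytes 684 Mbits/sec 91 394 KBytes".
# Assumes iperf3's default MBytes / Mbits/sec units; other units are not matched.
INTERVAL_RE = re.compile(
    r"^\[\s*\d+\]\s+[\d.]+-[\d.]+\s+sec\s+[\d.]+\s+MBytes\s+([\d.]+)\s+Mbits/sec"
)


def interval_bitrates(text: str) -> list[float]:
    """Per-interval bitrates in Mbit/s, skipping the sender/receiver summary lines."""
    rates = []
    for line in text.splitlines():
        if line.rstrip().endswith(("sender", "receiver")):
            continue
        match = INTERVAL_RE.match(line.strip())
        if match:
            rates.append(float(match.group(1)))
    return rates


def summarize(text: str) -> dict[str, float]:
    rates = interval_bitrates(text)
    return {"mean": mean(rates), "min": min(rates), "max": max(rates)}


WAN_RUN = """\
[ 5] 0.00-1.00 sec 81.5 MBytes 684 Mbits/sec 91 394 KBytes
[ 5] 1.00-2.00 sec 94.2 MBytes 790 Mbits/sec 28 429 KBytes
[ 5] 2.00-3.00 sec 92.1 MBytes 772 Mbits/sec 2 465 KBytes
[ 5] 3.00-4.00 sec 90.0 MBytes 755 Mbits/sec 47 346 KBytes
[ 5] 4.00-5.00 sec 91.1 MBytes 764 Mbits/sec 19 378 KBytes
[ 5] 5.00-6.00 sec 92.5 MBytes 776 Mbits/sec 20 405 KBytes
[ 5] 6.00-7.00 sec 92.1 MBytes 773 Mbits/sec 21 433 KBytes
[ 5] 7.00-8.00 sec 91.8 MBytes 770 Mbits/sec 4 465 KBytes
[ 5] 8.00-9.00 sec 89.8 MBytes 753 Mbits/sec 37 360 KBytes
[ 5] 9.00-10.00 sec 89.5 MBytes 751 Mbits/sec 9 402 KBytes
[ 5] 0.00-10.00 sec 905 MBytes 759 Mbits/sec 278 sender
"""

LAN_IP_RUN = """\
[ 5] 0.00-1.01 sec 46.1 MBytes 385 Mbits/sec 107 529 KBytes
[ 5] 1.01-2.00 sec 45.0 MBytes 379 Mbits/sec 0 574 KBytes
[ 5] 2.00-3.00 sec 47.5 MBytes 398 Mbits/sec 0 620 KBytes
[ 5] 3.00-4.00 sec 50.0 MBytes 419 Mbits/sec 0 663 KBytes
[ 5] 4.00-5.00 sec 76.2 MBytes 640 Mbits/sec 22 546 KBytes
[ 5] 5.00-6.00 sec 45.0 MBytes 377 Mbits/sec 0 597 KBytes
[ 5] 6.00-7.00 sec 41.2 MBytes 345 Mbits/sec 0 640 KBytes
[ 5] 7.00-8.00 sec 62.5 MBytes 526 Mbits/sec 0 693 KBytes
[ 5] 8.00-9.00 sec 72.5 MBytes 608 Mbits/sec 18 581 KBytes
[ 5] 9.00-10.00 sec 77.5 MBytes 650 Mbits/sec 0 671 KBytes
[ 5] 0.00-10.05 sec 561 MBytes 468 Mbits/sec receiver
"""

if __name__ == "__main__":
    for name, run in (("WAN IP", WAN_RUN), ("LAN IP", LAN_IP_RUN)):
        s = summarize(run)
        print(f"{name}: mean={s['mean']:.1f} min={s['min']:.0f} max={s['max']:.0f}")
    # WAN IP: mean=758.8 min=684 max=790
    # LAN IP: mean=472.7 min=345 max=650
```

On these two runs the WAN test stays in a narrow band, while the LAN-IP test swings between roughly 345 and 650 Mbit/s from one second to the next.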
The hardware specs are:
Intel(R) Celeron(R) N5105 @ 2.00GHz
8GB of Memory
128 GB SSD
I am using the kernel package for WireGuard. Any help would be appreciated.
I get similar results with the same CPU, even somewhat lower, but I think that is because I use CrowdSec and NetFlow. The CPU maxes out at 100%, whereas the counterpart, an AMD V1500B, is only at 40%.
WireGuard uses all CPU threads, and the N5105 has no Hyper-Threading, so it only has 4 threads. AFAIK, AVX extensions are leveraged for ChaCha20 when available. The N5105 does not have these extensions, as you can see with 'x86info -a':
Feature flags:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflsh ds acpi mmx fxsr sse sse2 ss ht tm pbe sse3 pclmuldq dtes64 monitor ds-cpl vmx est tm2 ssse3 sdbg cx16 xTPR pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc-deadline aes xsave osxsave rdrnd
With the V1500B this reads:
Feature flags:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflsh mmx fxsr sse sse2 ht sse3 pclmulqdq mwait ssse3 fma cmpxchg16b sse4_1 sse4_2 [1:ecx:22] popcnt aes xsave osxsave avx f16c [1:ecx:30]
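As a quick way to compare the two flag lists above, a minimal Python sketch that checks which SIMD extensions appear in each. The set in `SIMD_FLAGS` is an assumption (extensions ChaCha20 code paths commonly dispatch on), not something confirmed for the OPNsense WireGuard module. Also, the quoted `x86info` lines appear to list only the basic CPUID feature flags, so a missing `avx2` there is not conclusive on its own.

```python
# SIMD extensions that ChaCha20 implementations commonly dispatch on.
# Assumption: the exact set used by the OPNsense/FreeBSD WireGuard module may differ.
SIMD_FLAGS = ("ssse3", "avx", "avx2", "avx512f")

# Flag lists copied from the 'x86info -a' output quoted above.
N5105 = """fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflsh ds acpi mmx fxsr sse sse2 ss ht tm pbe sse3 pclmuldq dtes64 monitor ds-cpl
vmx est tm2 ssse3 sdbg cx16 xTPR pdcm sse4_1 sse4_2 x2apic movbe popcnt
tsc-deadline aes xsave osxsave rdrnd"""

V1500B = """fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflsh mmx fxsr sse sse2 ht sse3 pclmulqdq mwait ssse3 fma cmpxchg16b sse4_1
sse4_2 [1:ecx:22] popcnt aes xsave osxsave avx f16c [1:ecx:30]"""


def simd_support(flag_text: str) -> dict[str, bool]:
    """Map each SIMD flag of interest to whether it appears in a whitespace-separated list."""
    flags = set(flag_text.lower().split())
    return {flag: flag in flags for flag in SIMD_FLAGS}


if __name__ == "__main__":
    for name, text in (("N5105", N5105), ("V1500B", V1500B)):
        present = [flag for flag, ok in simd_support(text).items() if ok]
        print(name, present)
    # N5105 ['ssse3']
    # V1500B ['ssse3', 'avx']
```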
If you want faster cryptography, you need something like an N100 or better.
Okay, so I ran another test.
This time I set up a port forward to a machine behind the OPNsense, installed WireGuard on that machine, and ran an iperf3 test to it.
When I run the iperf3 test against the WAN IP, I get near line speed:
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 85.3 MBytes 715 Mbits/sec 26 506 KBytes
[ 5] 1.00-2.00 sec 91.7 MBytes 770 Mbits/sec 42 359 KBytes
[ 5] 2.00-3.00 sec 91.6 MBytes 768 Mbits/sec 20 366 KBytes
[ 5] 3.00-4.00 sec 91.8 MBytes 770 Mbits/sec 21 373 KBytes
[ 5] 4.00-5.00 sec 92.0 MBytes 772 Mbits/sec 6 392 KBytes
[ 5] 5.00-6.00 sec 95.1 MBytes 798 Mbits/sec 4 411 KBytes
[ 5] 6.00-7.00 sec 93.3 MBytes 782 Mbits/sec 21 420 KBytes
[ 5] 7.00-8.00 sec 94.1 MBytes 790 Mbits/sec 21 430 KBytes
[ 5] 8.00-9.00 sec 93.8 MBytes 787 Mbits/sec 22 443 KBytes
[ 5] 9.00-10.00 sec 91.4 MBytes 767 Mbits/sec 43 449 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 920 MBytes 772 Mbits/sec 226 sender
[ 5] 0.00-10.04 sec 918 MBytes 767 Mbits/sec receiver
When I run iperf3 against the machine behind the firewall, I get this:
[ 5] local 192.168.7.5 port 48580 connected to 192.168.7.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 52.6 MBytes 441 Mbits/sec 74 477 KBytes
[ 5] 1.00-2.00 sec 44.7 MBytes 375 Mbits/sec 29 386 KBytes
[ 5] 2.00-3.00 sec 63.2 MBytes 528 Mbits/sec 0 486 KBytes
[ 5] 3.00-4.00 sec 55.1 MBytes 463 Mbits/sec 42 411 KBytes
[ 5] 4.00-5.00 sec 58.6 MBytes 492 Mbits/sec 0 498 KBytes
[ 5] 5.00-6.00 sec 64.8 MBytes 543 Mbits/sec 22 448 KBytes
[ 5] 6.00-7.00 sec 56.8 MBytes 477 Mbits/sec 41 377 KBytes
[ 5] 7.00-8.00 sec 47.4 MBytes 396 Mbits/sec 0 452 KBytes
[ 5] 8.00-9.00 sec 58.3 MBytes 489 Mbits/sec 22 375 KBytes
[ 5] 9.00-10.00 sec 30.1 MBytes 253 Mbits/sec 0 428 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 531 MBytes 446 Mbits/sec 230 sender
[ 5] 0.00-10.04 sec 530 MBytes 442 Mbits/sec receiver
This machine is much more powerful, and its processor has AVX:
Specs:
AMD Ryzen 7 PRO 3700U w/ Radeon Vega Mobile Gfx
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es
It has 30 GB of memory, so this machine is not starved for resources, yet the speeds are not much better.
a. How would it help to make one side of a WireGuard connection faster? And what is on the other side?
b. By "AVX", I meant the whole family of AVX extensions, including AVX2 and AVX512. I do not know exactly which one is needed / used.
c. Once you pass the firewall, there may be other inspections done, like CrowdSec, Zenarmor, Intrusion Detection or NetFlow, that put stress on your OPNsense, limiting the attainable speed.
d. Also: Did you set an MTU smaller than 1420, especially if you go over IPv6 and/or PPPoE and/or VLAN?
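The MTU question comes down to simple arithmetic: each WireGuard data packet adds a 16-byte header and a 16-byte Poly1305 tag, plus the outer UDP (8 bytes) and IP header (20 bytes for IPv4, 40 for IPv6), and PPPoE takes another 8 bytes off a 1500-byte Ethernet link. A minimal Python sketch of that calculation; it does not model VLAN tagging, so if your link ends up with a smaller MTU, pass it as `link_mtu`.

```python
# Overhead a WireGuard data packet adds on top of the inner IP packet.
WG_HEADER = 16      # message type, reserved bytes, receiver index, counter
POLY1305_TAG = 16   # authentication tag appended to the ciphertext
UDP_HEADER = 8
OUTER_IP_HEADER = {4: 20, 6: 40}
PPPOE_OVERHEAD = 8  # 6-byte PPPoE header + 2-byte PPP protocol field (1500 -> 1492)


def wg_tunnel_mtu(link_mtu: int = 1500, outer_ip: int = 6, pppoe: bool = False) -> int:
    """Largest inner packet that fits without fragmenting the outer UDP datagram."""
    usable = link_mtu - (PPPOE_OVERHEAD if pppoe else 0)
    return usable - OUTER_IP_HEADER[outer_ip] - UDP_HEADER - WG_HEADER - POLY1305_TAG


if __name__ == "__main__":
    print(wg_tunnel_mtu())                        # 1420: the usual default (IPv6 worst case)
    print(wg_tunnel_mtu(pppoe=True))              # 1412: IPv6 outer header over PPPoE
    print(wg_tunnel_mtu(outer_ip=4, pppoe=True))  # 1432: IPv4 outer header over PPPoE
```

So over PPPoE with IPv4 as the outer transport, 1420 still fits, but with IPv6 as the outer transport it is 8 bytes too large, which would lead to fragmentation or dropped packets.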
Quote from: meyergru on November 29, 2023, 05:03:46 PM
To answer your questions,
A: The WAN connection on both ends stays the same. I removed WireGuard from the OPNsense instance so that it only has to handle routing rather than routing plus VPN. The speeds on both ends are the same, using the same ISP.
C: This is a newly deployed instance. I haven't turned on anything that is not on by default, except WireGuard, and there are about 4 WireGuard rules I'm using at the moment.
The part that puzzles me is that when I run iperf3 via the WAN IP, I get near line speed routing back to the machine behind OPNsense; however, when I use WireGuard, whether it's managed by OPNsense or by the machine behind the OPNsense instance, I see similarly slow speeds.
Is there anything else I could look at?
Your help is greatly appreciated.
I obviously missed how you are measuring. One side is an OPNsense with an N5105 CPU, but what is the other one?
I assumed this is a WireGuard site-to-site VPN between two OPNsenses.
The speed you get between two VPN endpoints is limited by the minimum of both (and by the speed between both sides when you do not use encryption). Also, if the encryption is done on the router itself, everything else done on the router adds to the CPU load (i.e. routing, NAT, firewalling, packet inspection, logging)...
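The "minimum of both" argument can be written down as a tiny model. This is a simplification assumed purely for illustration: each endpoint gets a hypothetical maximum encryption throughput, and whatever CPU share goes to routing, NAT, firewalling or logging is subtracted linearly. All numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    crypto_mbps: float       # hypothetical: throughput if the CPU did nothing but WireGuard crypto
    other_load: float = 0.0  # hypothetical share of CPU spent on routing, NAT, firewalling, logging

    def available_mbps(self) -> float:
        # Simplifying assumption: crypto throughput scales linearly with the CPU share left over.
        return self.crypto_mbps * (1.0 - self.other_load)


def tunnel_ceiling(link_mbps: float, *endpoints: Endpoint) -> tuple[float, str]:
    """Return the throughput ceiling and what imposes it (the link or one endpoint)."""
    limit, limited_by = link_mbps, "link"
    for ep in endpoints:
        if ep.available_mbps() < limit:
            limit, limited_by = ep.available_mbps(), ep.name
    return limit, limited_by


# Made-up numbers, for illustration only.
router = Endpoint("opnsense", crypto_mbps=600, other_load=0.25)
client = Endpoint("ubuntu", crypto_mbps=2000)
fast_router = Endpoint("opnsense", crypto_mbps=10_000, other_load=0.25)
slow_client = Endpoint("ubuntu", crypto_mbps=300)

print(tunnel_ceiling(950, router, client))            # (450.0, 'opnsense')
print(tunnel_ceiling(950, fast_router, slow_client))  # (300.0, 'ubuntu')
```

The second case shows why a much faster router does not help when the other endpoint is the one that cannot encrypt any faster.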
I'm also surprised how slow WireGuard is on generic x86 machines; I guess it's the lack of hardware offloading and acceleration. ChaCha also profits least from CPU acceleration. No surprise OpenVPN could be faster on multi-gigabit connections. Perhaps WG is meant only for Android / ARM CPUs?
First, note that I didn't see any impact from OPNsense functions except Zenarmor in active mode. There was no difference even with the firewall off.
I saw slow performance on the V1500B too. It's like a curse: slow everywhere except on low-end devices.
I'm glad I took an N305-type PC instead of an N100 for the firewall. I can now reach 1600 Mbps, but that's still low considering it's an 8505 vPro CPU with double the IPC, all possible extensions and even QAT. The CPU is up to 200x faster than an Armada 385 in crypto benchmarks and stronger than my 16-core desktop PC, yet an old router with an Armada 385 and no CPU extensions can do a decent 800 Mbps, half as much. I don't get it. How?
Performance of the CPUs mentioned:
https://www.cpubenchmark.net/compare/4412vs4304vs5157vs3426vs4775/Intel-Celeron-N5105-vs-AMD-Ryzen-Embedded-V1500B-vs-Intel-N100-vs-AMD-Ryzen-7-3700U-vs-Intel-Pentium-Gold-8505
Quote from: meyergru on November 29, 2023, 07:10:56 PM
To add a little more colour: the machine on the other end is just a generic Ubuntu 22.04 server acting as a client. When it runs iperf3 against the WAN IP, I get near line speed. When it connects to WireGuard hosted by OPNsense, or to the WireGuard service on the generic Ubuntu 22.04 server behind OPNsense, I get the reduced performance.
I'm not an expert, but I don't believe OPNsense would be doing any cryptography when it's simply forwarding packets that match a NAT rule, so that doesn't explain it. Again, I appreciate everyone's input. Either I've missed something big, or perhaps I should see how pfSense handles this workload.
What I am trying to tell you is that if that counterpart Ubuntu machine is, say, able to handle wire speed at 1 GBit/s, but VPN speed at 300 MBit/s for the same reasons I think your OpnSense is slow, then maybe it is not your OpnSense that is the culprit.
In that case, you could use a 14900K for an OpnSense and nothing would change in your VPN speed measurements, because both sides have to handle the encryption.
Getting into that situation is very easy: for example, if that "generic ubuntu 22.04 server" is running on the wrong 5.15 or 6.2 kernel, or as a VM under Proxmox, AVX2 extensions could be disabled by accident (https://github.com/openzfs/zfs/issues/15223), even if your CPU has them.
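To check whether the Ubuntu side actually sees AVX2 (for example inside a Proxmox VM, where the configured CPU type decides which flags the guest gets), a minimal Python sketch that reads the `flags` line of `/proc/cpuinfo` on Linux. The function names are made up for this example; the parsing assumes the usual `flags : ...` line format.

```python
from pathlib import Path

WANTED = ("avx", "avx2", "avx512f")


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the 'flags' field of the first CPU entry in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Exact key match, so the separate "vmx flags" line on newer kernels is ignored.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def avx_report(flags: set[str]) -> dict[str, bool]:
    return {name: name in flags for name in WANTED}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for name, present in avx_report(cpu_flags(cpuinfo.read_text())).items():
            print(f"{name:8} {'yes' if present else 'missing'}")
    else:
        print("no /proc/cpuinfo here; run this on the Linux (Ubuntu) endpoint")
```

If `avx2` shows up on the Proxmox host but is missing inside the guest, the VM's CPU type is worth a look before blaming either WireGuard endpoint.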
Details matter. There is a saying in German: "Wer misst, misst Mist." (Who measures, measures manure.) The test conditions are vital for assessing whether the results are valid.