OPNsense Forum

Archive => 20.7 Legacy Series => Topic started by: randomwalk on September 09, 2020, 06:45:09 am

Title: High Packet Loss When Using VPN in OPNsense Virtualized in vSphere 7.0
Post by: randomwalk on September 09, 2020, 06:45:09 am
I am having a weird problem that I cannot figure out despite many hours of work. I hope someone here could give some thoughts as to why this is happening.

Background: I am running vSphere ESXi 7.0. I have a VM that is running OPNsense 20.1.9_1 (although I have also tried this with OPNsense 20.7.2 and get the same result). To be very conservative, I assigned 4 CPUs and 6 GB of RAM to this VM, and OPNsense reports that it is nowhere near using that many resources. I have two vswitches, one for the LAN and one for the WAN. On the particular LAN and WAN ports associated with the OPNsense VM, I disabled all security (accept promiscuous mode, MAC address changes, and forged transmits) and I set VLAN trunking (0-4094). I do run a few VLANs, so the VLAN trunking is needed. I don't think the reduced security is needed, but I just set everything to "accept" in case it was causing this problem. I have gigabit fiber on the WAN physical uplink, connected to the ISP gateway. Inside OPNsense Interfaces > Settings, I disabled all hardware offloading (CRC, TSO, LRO, and VLAN filtering).

Problem: Inside OPNsense, I have two gateways: one for the WAN and one through VPN (I use OpenVPN, with UDP protocol). There is no problem on the WAN gateway. I've tested large downloads that are tens of GBs from the internet and am able to get sustained full gigabit speed (around 90 MB/s) with 0% packet loss (as reported by DPINGER inside System > Gateways). So this indicates to me that there is no problem with the VM, the vswitches, or anything.

However, when I use firewall rules to direct traffic through the VPN gateway, I have problems.  If I force the download speed in the download client to be lowish, around 10 MB/s, then there is only a small amount of packet loss and the download proceeds to completion.  However, when I allow the speed to be higher, anything more than 15-20 MB/s, packet loss climbs very quickly (15 to 20% within 30 seconds, and continues to climb) and within 1-2 minutes, the VPN gateway just stops responding. The packet loss will go back down to 0% and VPN will work again if I stop the download. To be clear, the VPN connection is capable of much faster than 20 MB/s -- when I run OPNsense bare metal, I can easily get 50 MB/s on the VPN gateway with 0% packet loss.

Solutions?  So here is what is confusing me. When I run OPNsense in the VM, everything on the WAN works perfectly, at full gigabit speed with 0% packet loss. But when I direct traffic to VPN, I get huge packet loss that shuts down the gateway.

However, if I run OPNsense bare metal, I don't get any packet loss on WAN and VPN gateway. This indicates to me that the problem is not the VPN. There seems to be some weird interaction between using the VPN inside a VM that is causing the problem. I've tried everything, so what could it be?

Someone suggested that this looks like a fragmentation issue, and recommended that I play around with the MTU and MSS settings.  I've played around, but am not sure how to properly set the MTU and MSS.  There are at least three places where you can set these parameters in OPNsense that I can see:

(1)  Inside Interfaces > [VPN Interface].  Do I need to set this for WAN interface too?
(2)  Inside Firewall > Settings > Normalization.  And this section is confusing to me, and I am not sure how to properly set it.
(3)  Inside VPN > OpenVPN > Clients, where I can try to set MTU and MSS directly in the VPN connection settings.

Which of the three places should I set MTU and MSS?  Do I need to reboot every time I change these settings for them to take effect?

My MTU test: Based on the ping test described at the link below, the largest ping payload that does not fragment is 1472 bytes, which suggests my MTU is 1500. This is using a Windows computer whose traffic is directed through the VPN. https://kb.netgear.com/19863/Ping-Test-to-determine-Optimal-MTU-Size-on-Router
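
The arithmetic behind the ping test can be sketched in a few lines of Python. This is a minimal illustration only, assuming IPv4 with a 20-byte IP header (no options) and the 8-byte ICMP echo header; the function name is made up for this example.

```python
IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header

def mtu_from_ping_payload(payload_bytes: int) -> int:
    """Return the MTU implied by the largest ping payload that passes unfragmented."""
    return payload_bytes + IPV4_HEADER + ICMP_HEADER

print(mtu_from_ping_payload(1472))  # 1500
```

On Windows the payload size is the value given to `ping -f -l <size> <host>`, where `-f` sets the Don't Fragment flag.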
Title: Re: High Packet Loss When Using VPN in OPNsense Virtualized in vSphere 7.0
Post by: Fright on September 09, 2020, 07:42:11 am
what's in the openvpn log?
have you tried tcp?
Quote
(1)  Inside Interfaces > [VPN Interface].  Do I need to set this for WAN interface too?
(2)  Inside Firewall > Settings > Normalization.  And this section is confusing to me, and I am not sure how to properly set it.
(3)  Inside VPN > OpenVPN > Clients, where I can try to set MTU and MSS directly in the VPN connection settings.
(3) I think
Title: Re: High Packet Loss When Using VPN in OPNsense Virtualized in vSphere 7.0
Post by: randomwalk on September 09, 2020, 08:02:21 am
I have tried using TCP protocol for OpenVPN.  That does not solve the packet loss issue, and makes the maximum download speed (before the gateway shuts down) slower than UDP.

OpenVPN log level is set to 4.  Here is a snippet of what I see if I filter to the word "warning":

2020-09-08T22:51:34   openvpn[80340]: 128.14.134.170:35260 WARNING: Bad encapsulated packet length from peer (5635), which must be > 0 and <= 1626 -- please ensure that --tun-mtu or --link-mtu is equal on both peers -- this condition could also indicate a possible active attack on the TCP link -- [Attempting restart...]
2020-09-08T22:37:25   openvpn[51358]: WARNING: 'auth' is used inconsistently, local='auth [null-digest]', remote='auth SHA1'
2020-09-08T22:37:25   openvpn[51358]: WARNING: 'cipher' is used inconsistently, local='cipher AES-256-GCM', remote='cipher AES-256-CBC'
2020-09-08T22:37:25   openvpn[51358]: WARNING: 'link-mtu' is used inconsistently, local='link-mtu 1550', remote='link-mtu 1558'
2020-09-08T21:37:25   openvpn[51358]: WARNING: 'auth' is used inconsistently, local='auth [null-digest]', remote='auth SHA1'
2020-09-08T21:37:25   openvpn[51358]: WARNING: 'cipher' is used inconsistently, local='cipher AES-256-GCM', remote='cipher AES-256-CBC'
2020-09-08T21:37:25   openvpn[51358]: WARNING: 'link-mtu' is used inconsistently, local='link-mtu 1550', remote='link-mtu 1558'
2020-09-08T21:23:58   openvpn[80340]: 74.82.47.2:56208 WARNING: Bad encapsulated packet length from peer (5635), which must be > 0 and <= 1626 -- please ensure that --tun-mtu or --link-mtu is equal on both peers -- this condition could also indicate a possible active attack on the TCP link -- [Attempting restart...]
2020-09-08T20:37:25   openvpn[51358]: WARNING: 'auth' is used inconsistently, local='auth [null-digest]', remote='auth SHA1'
2020-09-08T20:37:25   openvpn[51358]: WARNING: 'cipher' is used inconsistently, local='cipher AES-256-GCM', remote='cipher AES-256-CBC'
2020-09-08T20:37:25   openvpn[51358]: WARNING: 'link-mtu' is used inconsistently, local='link-mtu 1550', remote='link-mtu 1558'
2020-09-08T19:38:50   openvpn[80340]: 162.142.125.35:49570 WARNING: Bad encapsulated packet length from peer (18245), which must be > 0 and <= 1626 -- please ensure that --tun-mtu or --link-mtu is equal on both peers -- this condition could also indicate a possible active attack on the TCP link -- [Attempting restart...]
2020-09-08T19:38:49   openvpn[80340]: 162.142.125.35:47916 WARNING: Bad encapsulated packet length from peer (5635), which must be > 0 and <= 1626 -- please ensure that --tun-mtu or --link-mtu is equal on both peers -- this condition could also indicate a possible active attack on the TCP link -- [Attempting restart...]
2020-09-08T19:37:25   openvpn[51358]: WARNING: 'auth' is used inconsistently, local='auth [null-digest]', remote='auth SHA1'
2020-09-08T19:37:25   openvpn[51358]: WARNING: 'cipher' is used inconsistently, local='cipher AES-256-GCM', remote='cipher AES-256-CBC'
2020-09-08T19:37:25   openvpn[51358]: WARNING: 'link-mtu' is used inconsistently, local='link-mtu 1550', remote='link-mtu 1558'
2020-09-08T19:31:40   openvpn[80340]: 193.34.131.57:39276 WARNING: Bad encapsulated packet length from peer (5635), which must be > 0 and <= 1626 -- please ensure that --tun-mtu or --link-mtu is equal on both peers -- this condition could also indicate a possible active attack on the TCP link -- [Attempting restart...]

Here is what I see if I filter to the word "error":

2020-09-08T19:38:50   openvpn[80340]: 162.142.125.35:33464 SIGUSR1[soft,tls-error] received, client-instance restarting
2020-09-08T19:38:50   openvpn[80340]: 162.142.125.35:33464 Fatal TLS error (check_tls_errors_co), restarting
2020-09-08T19:38:50   openvpn[80340]: 162.142.125.35:33464 TLS Error: tls-crypt unwrapping failed from [AF_INET]162.142.125.35:33464
2020-09-08T19:38:50   openvpn[80340]: 162.142.125.35:33464 tls-crypt unwrap error: packet too short
2020-09-08T19:38:50   openvpn[80340]: 162.142.125.35:32788 SIGUSR1[soft,tls-error] received, client-instance restarting
2020-09-08T19:38:50   openvpn[80340]: 162.142.125.35:32788 Fatal TLS error (check_tls_errors_co), restarting
2020-09-08T19:38:50   openvpn[80340]: 162.142.125.35:32788 TLS ERROR: initial packet local/remote key_method mismatch, local key_method=2, op=P_CONTROL_HARD_RESET_CLIENT_V1
2020-09-08T16:51:10   openvpn[80340]: 192.35.168.193:58604 SIGUSR1[soft,tls-error] received, client-instance restarting
2020-09-08T16:51:10   openvpn[80340]: 192.35.168.193:58604 Fatal TLS error (check_tls_errors_co), restarting
2020-09-08T16:51:10   openvpn[80340]: 192.35.168.193:58604 TLS Error: tls-crypt unwrapping failed from [AF_INET]192.35.168.193:58604
2020-09-08T16:51:10   openvpn[80340]: 192.35.168.193:58604 tls-crypt unwrap error: packet too short
2020-09-08T16:51:10   openvpn[80340]: 192.35.168.193:58230 SIGUSR1[soft,tls-error] received, client-instance restarting
2020-09-08T16:51:10   openvpn[80340]: 192.35.168.193:58230 Fatal TLS error (check_tls_errors_co), restarting
2020-09-08T16:51:10   openvpn[80340]: 192.35.168.193:58230 TLS ERROR: initial packet local/remote key_method mismatch, local key_method=2, op=P_CONTROL_HARD_RESET_CLIENT_V1

Title: Re: High Packet Loss When Using VPN in OPNsense Virtualized in vSphere 7.0
Post by: Fright on September 09, 2020, 08:13:02 am
verbose 4 is really huge )
are there messages like "AEAD Decrypt error: bad packet ID.."?

Title: Re: High Packet Loss When Using VPN in OPNsense Virtualized in vSphere 7.0
Post by: randomwalk on September 09, 2020, 09:29:31 am
No, there is nothing like that in the log, at least not at log level 4.

Title: Re: High Packet Loss When Using VPN in OPNsense Virtualized in vSphere 7.0
Post by: Fright on September 09, 2020, 09:38:58 am
oops. sorry. it's probably on server side log only.
do you have access to server logs?
and have you already tried --mssfix?

Title: Re: High Packet Loss When Using VPN in OPNsense Virtualized in vSphere 7.0
Post by: randomwalk on September 10, 2020, 10:25:43 am
No, I don't have access to the server logs since I don't own the server side.

I have tried adding "mssfix 1300" or "mssfix 1000" in the "Advanced" box inside the OpenVPN client config.  I then rebooted to make sure everything is fresh.  Unfortunately, that does not get rid of the packet loss issue.

Without manually adding the mssfix option, the default mssfix is 1450 according to the OpenVPN log.

I have also tried to set MSS to 1300 or 1000 inside Interfaces > [VPN Interface], then rebooted to make sure it takes effect.  Unfortunately, this also did not resolve the packet loss issue.

Based on this, I'm not sure MSS is the fix for this problem.  Any other ideas?
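
For reference, the relationship between an MTU and the largest TCP MSS that fits inside it is simple arithmetic. The sketch below assumes IPv4 or IPv6 headers without options and a 20-byte TCP header; it does not model OpenVPN's own encapsulation overhead, which depends on the cipher and transport in use. The helper name is hypothetical.

```python
TCP_HEADER = 20              # bytes, TCP header without options
IP_HEADER = {4: 20, 6: 40}   # bytes, IPv4 / IPv6 header without options

def max_tcp_mss(mtu: int, ip_version: int = 4) -> int:
    """Return the largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IP_HEADER[ip_version] - TCP_HEADER

print(max_tcp_mss(1500))                 # 1460
print(max_tcp_mss(1500, ip_version=6))   # 1440
```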

Title: Re: High Packet Loss When Using VPN in OPNsense Virtualized in vSphere 7.0
Post by: Fright on September 10, 2020, 04:02:40 pm
Quote
Based on this, I'm not sure MSS is the fix for this problem.  Any other ideas?
agree.
but something is strange in these lines:
162.142.125.35:47916 WARNING: Bad encapsulated packet length from peer (5635)
192.35.168.193:58604 TLS Error: tls-crypt unwrapping
it's port-scanning bots knocking on your ports
what is your OpenVPN config? is a server enabled? which port does it use?
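
One way to check this reading of the log is to group lines by OpenVPN process ID and see which processes log a remote peer address. The sketch below is illustrative only: it assumes the line layout seen in the excerpts earlier in the thread (timestamp, `openvpn[PID]:`, optional `IP:port` prefix), the sample lines are shortened copies of those excerpts, and the function and variable names are invented for this example.

```python
import re

# Shortened sample lines copied from the log excerpts earlier in the thread.
LOG_LINES = [
    "2020-09-08T22:37:25   openvpn[51358]: WARNING: 'link-mtu' is used inconsistently, local='link-mtu 1550', remote='link-mtu 1558'",
    "2020-09-08T19:38:49   openvpn[80340]: 162.142.125.35:47916 WARNING: Bad encapsulated packet length from peer (5635)",
    "2020-09-08T16:51:10   openvpn[80340]: 192.35.168.193:58604 TLS Error: tls-crypt unwrapping failed from [AF_INET]192.35.168.193:58604",
]

# timestamp, process ID, optional "IP:port" prefix, then the message itself
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+openvpn\[(?P<pid>\d+)\]:\s+"
    r"(?:(?P<peer>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+)\s+)?"
    r"(?P<msg>.*)$"
)

def peers_by_pid(lines):
    """Map each openvpn PID to the set of remote peer IPs that prefix its log lines."""
    peers = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        seen = peers.setdefault(match.group("pid"), set())
        if match.group("peer"):
            seen.add(match.group("peer"))
    return peers

for pid, ips in sorted(peers_by_pid(LOG_LINES).items()):
    print(pid, sorted(ips) if ips else "no per-peer prefix")
```

In the excerpts posted above, every line with an `IP:port` prefix comes from PID 80340, while the "used inconsistently" warnings come from PID 51358. That would be consistent with 80340 being a server instance receiving unsolicited connections and 51358 being the VPN client, though the logs alone do not prove it.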
Title: Re: High Packet Loss When Using VPN in OPNsense Virtualized in vSphere 7.0
Post by: randomwalk on September 11, 2020, 08:57:22 am
Yes, I have a couple of instances of OpenVPN server running for remote access to my network, one on TCP and one on UDP.  I connect using certificates, so it's pretty secure.

Title: Re: High Packet Loss When Using VPN in OPNsense Virtualized in vSphere 7.0
Post by: Fright on September 11, 2020, 12:26:35 pm
Quote
have a couple of instances of OpenVPN server running for remote access to my network, one on TCP and one on UDP.
maybe the tcp instance uses port 443? however, imho it's not the source of the packet loss issue. just curious why port probing shows up in the logs so often

imo it starts to look like an "openvpn through vmware" issue.
someone reported that 'downgrading' from vmxnet3 to e1000 helped.
someone else reported that shaping traffic on the ovpn fw rule, to tighten the channel a little, helped.
i would try adding a "mtu-test" directive and look in the logs for the results.
and then try to use mssfix together with the "fragment" directive
(something like:
tun-mtu 1500
fragment 1300
mssfix)