[SOLVED] slow IPsec performance

Started by fraenki, August 29, 2017, 03:45:36 PM

August 29, 2017, 03:45:36 PM Last Edit: November 17, 2017, 12:42:45 AM by fraenki
Hi,

I have deployed a new OPNsense cluster that shows abysmal IPsec performance:

- traffic over IPsec: ~1-2 Mbps
- traffic without IPsec: full speed

SSH file transfers will start at ~25 Mbps, but will immediately drop to 3 Mbps and drop even further within a few seconds.
HTTPS file transfers may even stall completely (this being our main issue). Other connections with "large" data transfers will also abort/stall.

There's no significant load shown in "top" when utilizing the IPsec tunnel.
Tunnel config is pretty old-fashioned: AES256/SHA256/DH Group 2 (same for Phase 1+2)

Enabling/disabling hardware offloading does not make any difference.
Hardware is an Intel x5-Z8350 SoC with a Realtek NIC (UP board).

I've seen a lot of similar reports for pfSense:
https://superuser.com/questions/570049/pfsense-firewall-blocking-some-outbound-web-packets-large-http-downloads-just
https://forum.pfsense.org/index.php?topic=74159.msg405436
https://forum.pfsense.org/index.php?topic=123823.msg683776
(Just google for "pfsense ipsec speed"...)

We have some other OPNsense clusters that don't show this issue.

FWIW, this is the only location with a PPPoE router. I've tested the same PPPoE router (Zyxel VMG1312-B30A) at another location with no issues. So I don't think it's the router that causes this issue.

Any ideas?


Thanks
- Frank

FWIW, I've already tested with MTU 1300 and MSS 1300 on the WAN interface, but this didn't change anything.

If you have 25 Mbps and a throttle to 1-2 Mbps, it's mostly packet loss (line, NIC, driver, etc.) and a suboptimal window size. Packet loss would also slow down IPsec, so I'd look for problems on the line or the NIC.
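A rough way to see why packet loss alone can collapse a TCP transfer is the Mathis et al. approximation for loss-limited TCP throughput, rate ≈ (MSS / RTT) * C / sqrt(p) with C ≈ 1.22. The sketch below assumes classic Reno-style congestion control; the MSS, RTT and loss rates are hypothetical values for illustration, not measurements from this thread.

```python
import math

# Mathis et al. (1997) approximation of steady-state TCP throughput under random loss:
#   rate ~= (MSS / RTT) * C / sqrt(p), with C = sqrt(3/2) for Reno-style congestion control.
# Only an order-of-magnitude estimate; modern stacks (e.g. CUBIC) behave differently.
C = math.sqrt(1.5)


def loss_limited_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on TCP throughput in bits per second."""
    if loss_rate <= 0:
        # No loss: this model gives no bound; the receive window limit applies instead.
        return math.inf
    return (mss_bytes * 8 / rtt_s) * C / math.sqrt(loss_rate)


# Hypothetical path: MSS 1360 bytes, 30 ms RTT; compare a few loss rates.
for p in (0.0001, 0.001, 0.01, 0.05):
    mbps = loss_limited_bps(1360, 0.030, p) / 1e6
    print(f"loss {p:.2%}: ~{mbps:.1f} Mbps")
```

With these assumed numbers, going from 0.1% to 1% loss drops the estimate from roughly 14 Mbps to roughly 4.4 Mbps, so a modest loss rate is enough to explain a large throughput drop.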

Quote from: mimugmail on August 29, 2017, 06:05:56 PM
If you have 25 Mbps and a throttle to 1-2 Mbps, it's mostly packet loss (line, NIC, driver, etc.) and a suboptimal window size. Packet loss would also slow down IPsec, so I'd look for problems on the line or the NIC.

Please note that the throttle only occurs for traffic that goes through the IPsec tunnel. When sending traffic to the same host without IPsec it easily reaches full speed.

Oh, ok, now I've read the complete thread :)
Can you try setting the MSS to 1000 on the IPsec or LAN interface?

I only use Zyxel routers/modems in bridge mode; perhaps they have some sort of IPsec replay detection enabled?

August 29, 2017, 10:18:28 PM #5 Last Edit: August 29, 2017, 10:25:00 PM by fraenki
Quote from: mimugmail on August 29, 2017, 07:07:30 PM
Can you try setting the MSS to 1000 on the IPsec or LAN interface?

I've set both MTU and MSS to 1000. Doesn't make a difference. :(

Quote from: mimugmail on August 29, 2017, 07:07:30 PM
I only use Zyxel routers/modems in bridge mode; perhaps they have some sort of IPsec replay detection enabled?

I'm pretty sure it's not the router, I've tested the Zyxel router at home before sending it to the remote location. I was able to use IPsec at full speed with this router and another OPNsense firewall. (BTW the Zyxel router replaced a LANCOM router which showed the same IPsec performance issue.)

I *guess* it's an OPNsense configuration issue, or a general networking issue. I've read so many similar reports regarding pfSense, but haven't been able to find a solution yet. :(

I've captured a TCP dump (on OPNsense) while copying a large file over SSH. I think it doesn't look too bad, right? (see attachment)


- Frank

MTU should be higher than MSS. Try an MTU of 1200 and an MSS of 1000. Also post a complete capture (the first 5 seconds, including the 3-way handshake) from inside the tunnel.
Do you use any QoS outside of the tunnel to guarantee bandwidth for IPsec? If so, the reordering could also throttle the tunnel.

August 30, 2017, 02:33:27 PM #7 Last Edit: August 30, 2017, 03:57:08 PM by fraenki
Quote from: mimugmail on August 30, 2017, 07:36:07 AM
MTU should be higher than MSS. Try an MTU of 1200 and an MSS of 1000.

OPNsense already reduces the MSS by 40. If I configure an MSS of 1000, OPNsense will set it to 960. Would this be sufficient, or should I try MTU 1200/MSS 1000 nonetheless?
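The 40 bytes are the IPv4 header plus the TCP header (20 + 20 bytes, without options), so an MSS is simply the MTU minus 40. A minimal sketch of that arithmetic, assuming IPv4 and no IP or TCP options:

```python
# Header sizes in bytes, assuming IPv4 and no IP/TCP options.
IPV4_HEADER = 20
TCP_HEADER = 20


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits into a single IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


for mtu in (1500, 1200, 1000):
    print(f"MTU {mtu} -> MSS {mss_for_mtu(mtu)}")
```

MTU 1500 gives the familiar 1460, MTU 1200 gives 1160, and 1000 gives the 960 mentioned above.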

Quote from: mimugmail on August 30, 2017, 07:36:07 AM
Also post a complete capture (the first 5 seconds, including the 3-way handshake) from inside the tunnel.

I've captured the HTTPS connection, because it stalls/breaks very quickly and the dump is rather small:
(sorry, can't paste it in this forum due to post size limits)

TCP capture on the "bad" OPNsense at the remote location (behind a PPPoE router):
http://paste.debian.net/plainh/565326dd

TCP capture on the "good" OPNsense at the other location:
http://paste.debian.net/plainh/b8df834c

And more than a minute after the HTTPS connection has died, I'm seeing the following log entries on the "good" firewall:

00:02:42.558779 rule 1..16777216/0(match): block out on enc0: HTTP_SERVER.443 > HTTP_CLIENT.58101: Flags [P.], seq 3004363110:3004363875, ack 423027980, win 1264, length 765
00:00:00.000056 rule 1..16777216/0(match): block out on enc0: HTTP_SERVER.443 > HTTP_CLIENT.58101: Flags [F.], seq 765, ack 1, win 1264, length 0
00:00:00.093263 rule 1..16777216/0(match): block out on enc0: HTTP_SERVER.443 > HTTP_CLIENT.58101: Flags [F.], seq 765, ack 1, win 1264, length 0
00:00:00.247810 rule 1..16777216/0(match): block out on enc0: HTTP_SERVER.443 > HTTP_CLIENT.58101: Flags [FP.], seq 0:765, ack 1, win 1264, length 765
00:00:00.248096 rule 1..16777216/0(match): block out on enc0: HTTP_SERVER.443 > HTTP_CLIENT.58101: Flags [FP.], seq 0:765, ack 1, win 1264, length 765
00:00:00.247992 rule 1..16777216/0(match): block out on enc0: HTTP_SERVER.443 > HTTP_CLIENT.58101: Flags [FP.], seq 0:765, ack 1, win 1264, length 765
00:00:00.247945 rule 1..16777216/0(match): block out on enc0: HTTP_SERVER.443 > HTTP_CLIENT.58101: Flags [FP.], seq 0:765, ack 1, win 1264, length 765
00:00:00.248026 rule 1..16777216/0(match): block out on enc0: HTTP_SERVER.443 > HTTP_CLIENT.58101: Flags [FP.], seq 0:765, ack 1, win 1264, length 765
00:00:00.247961 rule 1..16777216/0(match): block out on enc0: HTTP_SERVER.443 > HTTP_CLIENT.58101: Flags [FP.], seq 0:765, ack 1, win 1264, length 765
00:00:00.248023 rule 1..16777216/0(match): block out on enc0: HTTP_SERVER.443 > HTTP_CLIENT.58101: Flags [FP.], seq 0:765, ack 1, win 1264, length 765

(this is from another test, so the local port and seq numbers will not match anything)
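To make a longer pflog excerpt like this easier to read, the lines can be grouped by identical segment, which shows how often the same segment was blocked. A minimal Python sketch; the regular expression is written only against the lines quoted above (an assumed format, not a documented one), and the first column, which appears to be the time since the previous entry, is not used:

```python
import re
from collections import Counter

# Matches the pflog lines quoted above (format inferred from those lines only).
PFLOG_RE = re.compile(
    r"^(?P<delta>[\d:.]+) rule \S+: (?P<action>block|pass) (?P<dir>in|out) on (?P<iface>\S+): "
    r"(?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]*)\], seq (?P<seq>[\d:]+)"
    r".*?length (?P<length>\d+)"
)


def count_segments(lines):
    """Count how often each (action, interface, flags, seq, length) combination occurs."""
    counts = Counter()
    for line in lines:
        m = PFLOG_RE.match(line.strip())
        if m:
            counts[(m["action"], m["iface"], m["flags"], m["seq"], int(m["length"]))] += 1
    return counts


sample = [
    "00:02:42.558779 rule 1..16777216/0(match): block out on enc0: HTTP_SERVER.443 > HTTP_CLIENT.58101: Flags [P.], seq 3004363110:3004363875, ack 423027980, win 1264, length 765",
    "00:00:00.000056 rule 1..16777216/0(match): block out on enc0: HTTP_SERVER.443 > HTTP_CLIENT.58101: Flags [F.], seq 765, ack 1, win 1264, length 0",
    "00:00:00.247810 rule 1..16777216/0(match): block out on enc0: HTTP_SERVER.443 > HTTP_CLIENT.58101: Flags [FP.], seq 0:765, ack 1, win 1264, length 765",
    "00:00:00.248096 rule 1..16777216/0(match): block out on enc0: HTTP_SERVER.443 > HTTP_CLIENT.58101: Flags [FP.], seq 0:765, ack 1, win 1264, length 765",
]

for (action, iface, flags, seq, length), n in count_segments(sample).most_common():
    print(f"{n}x {action} on {iface}: flags [{flags}] seq {seq} length {length}")
```

Applied to the full excerpt above, it would report the same FP. segment (seq 0:765, length 765) blocked on enc0 seven times, which is easy to miss when scrolling through raw log lines.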

Does this tell us why the HTTPS connection suddenly stalls/breaks?

Quote from: mimugmail on August 30, 2017, 07:36:07 AM
Do you use any QoS outside of the tunnel to guarantee bandwidth for IPsec? If so, the reordering could also throttle the tunnel.

No QoS, it's a (supposedly) simple setup: PPPoE router <-> VLAN switch <-> OPNsense.
The VLAN switch is simple and stupidly cheap, have it at another location too.

Thanks
- Frank

In the 3-way handshake the MSS is still 1460, so your setting doesn't take effect.
Also, the window size of 256 is way too small; it should grow to 64 KB when there's no loss.
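Note that tcpdump prints the raw 16-bit window field; if the window scale option was negotiated in the SYN/SYN-ACK, the real receive window is that value shifted left by the scale factor (RFC 7323). Either way, a sender can have at most one window of unacknowledged data in flight per round trip. A small sketch of both calculations; the shift values and the 30 ms RTT are hypothetical:

```python
def effective_window(raw_window: int, wscale_shift: int = 0) -> int:
    """Receive window in bytes: raw header field << negotiated window-scale shift (RFC 7323)."""
    return raw_window << wscale_shift


def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Throughput ceiling when at most one window is in flight per round trip."""
    return window_bytes * 8 / rtt_s


RTT = 0.030  # hypothetical round-trip time in seconds
for raw, shift in ((256, 0), (256, 7), (65535, 0)):
    win = effective_window(raw, shift)
    mbps = window_limited_bps(win, RTT) / 1e6
    print(f"win {raw} (shift {shift}) = {win} bytes -> max ~{mbps:.2f} Mbps")
```

With an unscaled window of 256 bytes the ceiling would be under 0.1 Mbps at this RTT, so checking the window scale option in the captured handshake shows whether the small value is real.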

A pcap file (you know my private email address) would be easier to trace.

Quote from: mimugmail on August 30, 2017, 04:44:29 PM
In the 3-way handshake the MSS is still 1460, so your setting doesn't take effect.
Also, the window size of 256 is way too small; it should grow to 64 KB when there's no loss.

That's because the TCP dump was taken from the virtual IPsec interface "enc0". On OPNsense the MTU for this interface cannot be changed. However, I've set MTU 1200/MSS 1000 on the WAN interface, but it didn't change anything.

Quote from: mimugmail on August 30, 2017, 04:44:29 PM
A pcap file (you know my private email address) would be easier to trace.

Will do, thanks for your help :)


- Frank

The problem is that ESP is not TCP, so on the WAN side the MSS won't be reduced, since the WAN interface doesn't see any TCP packets.
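Because the tunnel wraps each inner packet in its own headers, the inner packet size (and therefore the TCP MSS to clamp to) has to leave room for that overhead. A rough sketch of the calculation, assuming IPv4, tunnel mode, NAT traversal (ESP in UDP 4500, as in this setup), AES-CBC with a 16-byte IV and HMAC-SHA-256 truncated to a 16-byte ICV; other ciphers or modes change the numbers:

```python
# Per-packet overhead in bytes, assuming IPv4 tunnel mode with AES-CBC + HMAC-SHA-256-128.
OUTER_IPV4 = 20   # new outer IP header
NAT_T_UDP = 8     # UDP header when ESP is encapsulated in UDP/4500 (NAT traversal)
ESP_HEADER = 8    # SPI (4) + sequence number (4)
AES_IV = 16       # AES-CBC initialization vector
ESP_TRAILER = 2   # pad length (1) + next header (1)
ICV = 16          # HMAC-SHA-256 truncated to 128 bits
AES_BLOCK = 16    # payload + padding + trailer must be a multiple of the cipher block


def esp_packet_size(inner_len: int, nat_t: bool = True) -> int:
    """Size of one inner IP packet after ESP tunnel-mode encapsulation."""
    body = inner_len + ESP_TRAILER
    body += (-body) % AES_BLOCK  # pad up to the next block boundary
    return OUTER_IPV4 + (NAT_T_UDP if nat_t else 0) + ESP_HEADER + AES_IV + body + ICV


def max_inner_mtu(outer_mtu: int, nat_t: bool = True) -> int:
    """Largest inner packet whose encapsulated form still fits into outer_mtu."""
    inner = outer_mtu
    while inner > 0 and esp_packet_size(inner, nat_t) > outer_mtu:
        inner -= 1
    return inner


for wan_mtu in (1500, 1492):  # 1492 is the usual MTU on a PPPoE link
    inner = max_inner_mtu(wan_mtu)
    print(f"WAN MTU {wan_mtu}: inner MTU {inner}, TCP MSS {inner - 40}")
```

Under these assumptions both 1500 and 1492 leave room for an inner packet of 1422 bytes (an MSS of 1382), because the padding to a 16-byte block absorbs the 8-byte difference.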

But I'm quite sure it's something with the line because the window size is really too small.

Quote from: mimugmail on August 31, 2017, 07:02:29 AM
The problem is that ESP is not TCP, so on the WAN side the MSS won't be reduced, since the WAN interface doesn't see any TCP packets.

That's true, but in this case the OPNsense firewall is behind NAT and thus it's all UDP 4500. On the WAN interface the packets seem to be really small...

10:42:25.923159 IP GOOD_OPNSENSE.4500 > BAD_OPNSENSE.4500: UDP-encap: ESP(spi=0xc63f6a01,seq=0x6ea), length 104
10:42:25.923444 IP BAD_OPNSENSE.4500 > GOOD_OPNSENSE.4500: UDP-encap: ESP(spi=0xce9f75ee,seq=0xe95), length 248
10:42:25.923812 IP BAD_OPNSENSE.4500 > GOOD_OPNSENSE.4500: UDP-encap: ESP(spi=0xce9f75ee,seq=0xe96), length 376
10:42:25.924155 IP BAD_OPNSENSE.4500 > GOOD_OPNSENSE.4500: UDP-encap: ESP(spi=0xce9f75ee,seq=0xe97), length 376
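ESP sequence numbers increase by one per packet within each SA (RFC 4303), so in a longer capture of this kind, gaps in the sequence numbers per SPI would hint at packets lost before the capture point (or dropped by the capture itself), which is one way to test the packet-loss theory. A sketch that groups the UDP-encapsulated ESP lines by direction and SPI; the regular expression only covers the tcpdump output format shown above:

```python
import re
from collections import defaultdict

# Matches tcpdump lines like the ESP-in-UDP (NAT-T) ones quoted above.
ESP_RE = re.compile(
    r"^\S+ IP (?P<src>\S+) > (?P<dst>\S+): UDP-encap: "
    r"ESP\(spi=(?P<spi>0x[0-9a-f]+),seq=(?P<seq>0x[0-9a-f]+)\), length (?P<length>\d+)"
)


def summarize_esp(lines):
    """Per (src, dst, spi): packet count, min/max length and number of missing ESP sequence numbers."""
    flows = defaultdict(list)
    for line in lines:
        m = ESP_RE.match(line.strip())
        if m:
            flows[(m["src"], m["dst"], m["spi"])].append((int(m["seq"], 16), int(m["length"])))
    report = {}
    for key, pkts in flows.items():
        seqs = sorted({seq for seq, _ in pkts})
        lengths = [length for _, length in pkts]
        missing = sum(b - a - 1 for a, b in zip(seqs, seqs[1:]))
        report[key] = {"packets": len(pkts), "min_len": min(lengths),
                       "max_len": max(lengths), "missing_seq": missing}
    return report


sample = [
    "10:42:25.923159 IP GOOD_OPNSENSE.4500 > BAD_OPNSENSE.4500: UDP-encap: ESP(spi=0xc63f6a01,seq=0x6ea), length 104",
    "10:42:25.923444 IP BAD_OPNSENSE.4500 > GOOD_OPNSENSE.4500: UDP-encap: ESP(spi=0xce9f75ee,seq=0xe95), length 248",
    "10:42:25.923812 IP BAD_OPNSENSE.4500 > GOOD_OPNSENSE.4500: UDP-encap: ESP(spi=0xce9f75ee,seq=0xe96), length 376",
    "10:42:25.924155 IP BAD_OPNSENSE.4500 > GOOD_OPNSENSE.4500: UDP-encap: ESP(spi=0xce9f75ee,seq=0xe97), length 376",
]

for (src, dst, spi), stats in summarize_esp(sample).items():
    print(f"{src} > {dst} spi={spi}: {stats}")
```

The min/max lengths per direction would also put numbers on the "really small" packets observation when run over a longer capture.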


Quote from: mimugmail on August 31, 2017, 07:02:29 AM
But I'm quite sure it's something with the line because the window size is really too small.

Hmm, interesting... I'm just wondering: How would a line issue only affect IPsec and no other connections?


Thanks
- Frank

A quick update...

* tested with IKEv2 instead of IKEv1
* tested various MTU/MSS combinations (on WAN and all other interfaces, except enc0)
* in Firewall->Settings->Advanced tested the option "Disable reply-to"
* double-checked that no feature on the switch causes this

Still no luck. Any idea?


Thanks
- Frank

I am currently testing IPsec performance (release 17.7). I am using the AES-NI driver, and the achievable throughput is around 450 Mbps. This roughly matches what is stated in the appliance shop:
https://www.applianceshop.eu/opnsense-quad-core-gen3-10gb-ssd.html#product-attribute-specs-table

Has anyone achieved higher performance? What is the limiting factor for the poor performance? I would assume that the more cores a CPU has, the higher the throughput. The CPU load on the dashboard shows only 15%.

Cheers
Peter


Hi Peter,

welcome to the forums!

Quote from: Cerbera on October 10, 2017, 12:12:33 PM
Has anyone achieved higher performance? What is the limiting factor for the poor performance? I would assume that the more cores a CPU has, the higher the throughput. The CPU load on the dashboard shows only 15%.

I think this is off-topic... This topic is about solving a very specific IPsec performance issue, not about comparing IPsec performance in general.


Regards
- Frank