MSS Clamping Strategy Question - VXLAN Environment with OPNsense

Started by systeme, February 19, 2025, 01:58:27 PM

Hello,

We are experiencing TCP fragmentation issues on our network infrastructure, which have been resolved by implementing MSS Clamping (https://docs.opnsense.org/manual/firewall_scrub.html). Here are the details of our environment and situation:

Environment:

  • Proxmox Virtual Environment (PVE) with multi-site Software Defined Networking (SDN) using VXLAN zones
  • MTU set to 1450 on all VMs due to VXLAN encapsulation requiring 50 additional bytes (Proxmox documentation: https://pve.proxmox.com/pve-docs/chapter-pvesdn.html)
  • OPNsense virtualized on these PVE hosts


Current Configuration:

  • MSS Clamping enabled via "Firewall: Settings: Normalization"
  • Specific rules created for interfaces experiencing issues:
    • WireGuard VPN group interface (configured on OPNsense)
    • 2 LAN interfaces (one with VMs using WireGuard+OpenVPN, and another where we limited the source to the VM requiring long curl requests)
    • Max MSS value set to 1250
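For reference, the arithmetic behind these numbers can be sketched as follows. The overhead constants are assumptions (exact values depend on IP version and tunnel framing), but they show why 1250 is a comfortably conservative choice:

```python
# Rough MSS arithmetic for this environment. Header sizes are assumptions
# for IPv4 without options; treat the results as estimates.

IPV4_HEADER = 20
TCP_HEADER = 20
VXLAN_OVERHEAD = 50      # outer Ethernet/IP/UDP/VXLAN headers (per Proxmox docs)
WIREGUARD_OVERHEAD = 60  # assumed: outer IPv4 + UDP + WireGuard framing

def max_mss(link_mtu: int, *tunnel_overheads: int) -> int:
    """Largest MSS that fits in link_mtu after tunnel overheads and inner headers."""
    inner_mtu = link_mtu - sum(tunnel_overheads)
    return inner_mtu - IPV4_HEADER - TCP_HEADER

# The VM MTU is already reduced to 1450 for VXLAN:
print(max_mss(1450))                      # plain TCP over the 1450 link -> 1410
print(max_mss(1450, WIREGUARD_OVERHEAD))  # TCP inside WireGuard on that link -> 1350
```

So a clamp of 1250 sits safely below both figures, at the cost of some extra per-packet overhead.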

Symptoms observed before correction (non-exhaustive list):

  • Timeouts on long requests (curl)
  • Access issues to Proxmox consoles
  • SSH connection difficulties

Note:
  • Currently, no negative impact observed on IPSec tunnels configured on OPNsense or other LAN interfaces

Our question concerns the optimal strategy for MSS Clamping implementation:

  • Should we apply it globally across the entire network?
  • Or is it better to maintain our current targeted approach, applying it only to interfaces experiencing issues?
  • Would there be a knock-on effect if we activated this "normalization" everywhere?

Thank you in advance,

Best regards,

MSS clamping is mostly needed when the firewall is very strictly configured regarding ICMP types. You can have automatic MSS clamping with PMTUD if you allow ICMP to do its thing in your environment.

https://en.wikipedia.org/wiki/Path_MTU_Discovery
Hardware:
DEC740

Thank you for your answer.

How do I get the "path MTU discovery" (PMTUD) function to work properly? Is there a way to do this other than with MSS clamping?

Similar problem: https://community.spiceworks.com/t/network-mtu-problems/1112518/2


February 19, 2025, 02:49:57 PM #3 Last Edit: February 19, 2025, 02:51:59 PM by Monviech (Cedrik)
PMTUD is a standard; each device on the layer 3 network path must abide by it.

It only fails if some ICMP types are filtered along the way from source to destination.

If it works correctly, the client will receive an ICMP message carrying the correct next-hop MTU; it will cache that path MTU and TCP will derive its MSS from it.
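The mechanism can be sketched like this (names and structures here are illustrative, not a real kernel API): on each ICMPv4 "fragmentation needed" message (type 3, code 4), the host records the reported next-hop MTU per destination, and TCP sizes its segments accordingly:

```python
# Sketch of a host-side PMTUD cache. An ICMP "fragmentation needed"
# message carries the next-hop MTU; the host caches the smallest value
# seen per destination, and TCP derives the MSS from it.

path_mtu_cache: dict[str, int] = {}

def on_frag_needed(dest: str, next_hop_mtu: int) -> None:
    # Keep the smallest MTU reported for this destination so far.
    cached = path_mtu_cache.get(dest, 1500)
    path_mtu_cache[dest] = min(cached, next_hop_mtu)

def effective_mss(dest: str, default_mtu: int = 1500) -> int:
    # MSS = path MTU minus IPv4 (20) and TCP (20) headers.
    return path_mtu_cache.get(dest, default_mtu) - 40

on_frag_needed("203.0.113.10", 1450)  # e.g. a VXLAN hop reports 1450
on_frag_needed("203.0.113.10", 1390)  # e.g. a WireGuard hop reports 1390
print(effective_mss("203.0.113.10"))  # TCP now uses the smaller path MTU
```

If those ICMP messages are filtered anywhere on the path, the cache never shrinks and the sender keeps emitting segments that are too large, which is exactly the failure mode static MSS clamping papers over.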

Though in your case, since you tunnel layer 2 over layer 3, static MSS clamping could be mandatory: clients do not expect their layer 2 network to reduce the MTU with something like VXLAN. Though unsure here.
Hardware:
DEC740

As far as I know, WireGuard, for example, will not send "fragmentation needed but DF set" messages.

So it can always happen that some intermediate smaller MTU link is not discoverable by end systems.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

ICMP is filtered on the public IP on the Proxmox VE side and on the additional public IP of the server used by the OPNsense WAN interface.
Do you think that allowing it would change anything? Since PMTUD is a standard, is it allowed natively?

Edit: I've tried allowing it, but it doesn't change anything.

Do you think MSS Clamping should be applied across the entire network?
Would there be any problems if we activated this "normalization" everywhere?

Reducing the MSS can theoretically increase the number of transmitted packets. E.g., if there is routed SMB or NFS traffic using jumbo frames, squashing an MSS of over 8000 down to 1420 would really cripple performance.

I would never apply it organisation-wide without knowing what kind of traffic there is.
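The packet-count impact is easy to quantify. A rough sketch (assuming ideal conditions, 40 bytes of IPv4+TCP headers, and no loss), comparing a jumbo-frame MSS against a clamped one for a 1 GiB transfer:

```python
# Segment count for a bulk transfer at two MSS values. Header sizes and
# MTUs are illustrative assumptions (9000-byte jumbo MTU -> 8960 MSS).

def segments(total_bytes: int, mss: int) -> int:
    # Ceiling division: number of full or partial TCP segments needed.
    return -(-total_bytes // mss)

gib = 1 << 30
jumbo = segments(gib, 8960)     # jumbo frames: 9000 MTU minus 40 bytes of headers
clamped = segments(gib, 1420)   # clamped MSS from the example above

print(jumbo, clamped, clamped / jumbo)  # roughly a 6x increase in packet count
```

Every extra segment also carries its own 40 bytes of headers and costs per-packet processing, which is where the performance penalty comes from.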
Hardware:
DEC740

Thanks for your feedback.

It confirms what we thought, which is why we only applied it where it caused problems.

We don't have NFS or SMB etc. at the moment, though the infrastructure may evolve. In any case, it's preferable not to create other problems.