Messages - velaar

#1
Solved!  8)

So it was the source host sending malformed packets. I'm not 100% certain why, but I blame Windows. It is possible that the Windows drivers were somehow affecting the NV (non-volatile) settings on the NICs, so the problem persisted across OSs.

Reflashing the NIC and resetting Windows drivers solved it.

I was only able to confirm that it was happening by doing a LOT more wiresharking on both ends of the network with promiscuous mode enabled across all interfaces.
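The post doesn't say in what way the packets were malformed. Purely as an illustration, assuming the damage shows up as a bad IPv4 header checksum (an assumption, not something confirmed above), here is a minimal Python sketch of the check Wireshark performs on each header. The function name is made up for this example.

```python
import struct

def ipv4_header_checksum_ok(header: bytes) -> bool:
    """Return True if the IPv4 header checksum verifies (hypothetical helper)."""
    if len(header) < 20 or header[0] >> 4 != 4:
        return False  # too short, or not IPv4
    ihl = (header[0] & 0x0F) * 4  # header length in bytes
    if ihl < 20 or len(header) < ihl:
        return False
    # One's-complement sum of all 16-bit words, checksum field included.
    total = sum(struct.unpack(f"!{ihl // 2}H", header[:ihl]))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    # A valid header sums to 0xFFFF.
    return total == 0xFFFF
```

Keep in mind that with checksum offload enabled, a capture taken on the sending host itself often shows "bad" checksums that the NIC fixes later, so this kind of check is only meaningful on the receiving end or on a mirror port.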
#2
Alright. I give up. I spent 4 nights on it and have no clue as to what is going on.

[Problem statement]
In a 10 gigabit 2-tier home network, 1 host mystically causes 100% CPU load on OPN and has speeds waaaay below link speed. Looks like no other hosts are affected.

Let's dig in.

[Network info, diagnostic steps]
- full-sized image attached

Apologies for a very quick drawing. So this is the part of the network in question. All links are 10Gbps on different media (twinax, MMF, CAT6A).

ICX6610 serves as the main inter-vlan router and right now has a permissive ACL. Clients use ICX6610 as a default gateway and it forwards internet-bound traffic to OPN and through it to the internet via VLAN128.

VLANs: VLAN20 and VLAN33 are client VLANs. VLAN128 is for the ICX6610 <> OPN connection only, and VLAN64 is for OPN <> modem.


Method of testing is iperf3 running directly on OPN.

  • Client 1 on VLAN20: speeds to OPN are below 1Gbps and cause 100% CPU load on OPN.
  • Client 2 is physically on the same host as OPN but uses dedicated hardware. It achieves link speed (9.90Gbps) at ~50% CPU load on OPN.
  • Client 3 is a separate non-virtualized Linux machine that also works at link speed (~9.1Gbps) at about 50% CPU.

  • Client 1 <> Client 2 is link speed
  • Client 1 <> Client 3 is also link speed

Tried moving client 1 into VLAN128 (so the ICX6610 doesn't do any routing): same result.
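For anyone repeating this test matrix, the comparison is easier with iperf3's JSON output (`-J`). A rough Python sketch follows. The function names and the 90% threshold are made up for this example, and it assumes a TCP test where the report carries `end.sum_received.bits_per_second`.

```python
import json

def iperf3_gbps(report_json: str) -> float:
    """Extract receiver-side throughput in Gbps from an `iperf3 -J` report."""
    report = json.loads(report_json)
    # Assumed layout for TCP tests: end -> sum_received -> bits_per_second
    return report["end"]["sum_received"]["bits_per_second"] / 1e9

def below_link_speed(gbps: float, link_gbps: float = 10.0, ratio: float = 0.9) -> bool:
    """Flag a run that falls under an arbitrary fraction of link speed."""
    return gbps < link_gbps * ratio
```

Run something like `iperf3 -c <server> -J > client1.json` once per client pair, then feed each file's contents to `iperf3_gbps` to get numbers that can be compared side by side.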

Notes:

  • Client 1 runs Windows, but it was also booted into Linux and had exactly the same speed.
  • The NIC was replaced with a different model to test both: no notable difference.
  • Wireshark would occasionally show "TCP window full" and "ACKed unseen segment".
  • pfSense shows the same behaviour.
  • The last known working configuration was on pfSense 2.5.2, so a while ago. I have no intention of going back there.
  • Disabling the firewall didn't help, while booting a Linux live CD on the same machine produced link speeds as expected.


[Configuration]
OPNsense is qemu/libvirt virtualized with passthrough of an Intel IX (Intel Corporation 82599ES 10-Gigabit SFI/SFP+) NIC.
Client 2 on the same machine has a different NIC passed through to it.

OPNsense has 12 cores of an Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz and 32GB of RAM allocated to it.

The issue presents itself on a clean installation as well as on a tuned OPNsense.

top -aSCHIP shows that the 100% CPU load is generated by interrupts: intr{swi1: netisr XX} and kernel{if_io_tqg_X}.

Any suggestions are welcome as I still can't diagnose the issue. Happy to provide any additional details.