Regular LAN Traffic hits Default deny / state violation rule since 24.7

Started by TheDJ, September 04, 2024, 10:39:00 PM

Quote from: meyergru on September 09, 2024, 10:40:53 AM
With the normal 24.7.3 kernel, I can confirm the "pf: ICMP error message too short (ip6)" messages - which go away with the no-sa kernel.

I can also confirm the "pf: loose state match" notices with both kernels.

I just went back to OPNsense 24.1 (importing the config from 24.7) and, with debug logging turned on... ta-da... I also see the same 'pf: loose state match' notices.
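
For comparing releases or kernels, it can help to count how often each notice shows up in a saved log. A minimal sketch in Python: it matches only the two message fragments quoted in this thread, assumes nothing about the rest of the log line, and the sample lines in the demo are made up for illustration.

```python
from collections import Counter

# Message fragments quoted in this thread. The rest of the log line
# (timestamp, host, process) is deliberately not assumed.
PF_MESSAGES = (
    "pf: loose state match",
    "pf: ICMP error message too short (ip6)",
)


def count_pf_messages(lines):
    """Count how many lines contain each known pf debug message."""
    counts = Counter({msg: 0 for msg in PF_MESSAGES})
    for line in lines:
        for msg in PF_MESSAGES:
            if msg in line:
                counts[msg] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, not real log output.
    sample = [
        "pf: loose state match",
        "some unrelated kernel message",
        "pf: ICMP error message too short (ip6)",
        "pf: loose state match",
    ]
    for msg, n in count_pf_messages(sample).items():
        print(f"{n:4d}  {msg}")
```

Running the same count on logs from 24.1 and 24.7 would show whether the notices are equally frequent on both, or merely present on both.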
Best regard / besten Gruß
Reza

---
"Es irrt der Mensch solang er strebt" (Goethe)

Quote from: rkube on September 09, 2024, 04:16:09 PM
I just went back to OPNsense 24.1 (importing the config from 24.7) and, with debug logging turned on... ta-da... I also see the same 'pf: loose state match' notices.

Thanks, good to know. Maybe it's a different (but somehow related) issue that did not surface in the same way until now.

Do you also see the performance degradation/FW hits?

Hi!

Quote from: TheDJ on September 10, 2024, 07:02:06 PM
Thanks, good to know.
I was a little bit disappointed at that moment.

Quote from: TheDJ on September 10, 2024, 07:02:06 PM
Maybe it's a different (but somehow related) issue that did not surface in the same way until now.
I'm going to take a step back and look at the whole thing with some distance from the possibly blinding "FreeBSD 14 / IPv6 issue".

Quote from: TheDJ on September 10, 2024, 07:02:06 PM
Do you also see the performance degradation/FW hits?
Unfortunately, yes.
But I changed something just days before upgrading to OPNsense 24.7: previously, I had done the bonding (LACP) of the interfaces and the VLAN tagging on the host (Proxmox) and passed the resulting bond0.[vlan id] interfaces to the OPNsense VM as separate virtio network cards. I changed that because I was unhappy with creating an interface on the host for each new (or changed) VLAN and having to guess in which order it would be assigned to the OPNsense interfaces (this was the behavior under VirtualBox).

So, I'm going back to 24.7 (no_SA kernel), but will again assemble the bond and VLAN interfaces on the host, assuming that Linux will (probably?) have the better driver and working hardware offloading *fingers crossed* ;-)

Br
Reza

P.S.: Sorry, I'm a little bit sick at the moment and am spending less time in front of my homelab ...

Hi,

sorry, I missed your post for days ...

Quote from: meyergru on September 08, 2024, 10:47:18 PM
Quote from: rkube on September 08, 2024, 07:49:31 PM
The MTU I have set is (unfortunately) not the problem, as I am only testing between two local VLANs (MTU==1500) that are routed/filtered via OPNsense. I could try jumbo frames, maybe it can get even worse ;-)

Yet you show results from an iperf3 test run against an internet IP?

Please don't beat me, but 198.18.0.0/15 is not publicly routable address space. (pssst: "bogon" IPs ;-] )
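
For anyone who wants to double-check an address: 198.18.0.0/15 is the range reserved for network benchmarking (RFC 2544). A minimal sketch using Python's standard `ipaddress` module; the helper name `describe` and the sample addresses are only illustrative.

```python
import ipaddress

# Range reserved for network benchmarking (RFC 2544); not routed on the internet.
BENCHMARK_NET = ipaddress.ip_network("198.18.0.0/15")


def describe(addr: str) -> str:
    """Classify an address as benchmark range, globally routable, or neither."""
    ip = ipaddress.ip_address(addr)
    if ip in BENCHMARK_NET:
        return "benchmark range, not publicly routable"
    return "globally routable" if ip.is_global else "not globally routable"


if __name__ == "__main__":
    # Illustrative addresses only.
    for addr in ("198.18.0.1", "198.19.255.254", "192.168.1.10", "8.8.8.8"):
        print(f"{addr:16s} {describe(addr)}")
```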

Quote from: meyergru on September 08, 2024, 10:47:18 PM
So there are also VLANs and LAGGs in the mix? Maybe netmap and Suricata as well? ???
I think of LAGGs and VLANs as very basic FW/router interface types. Besides PPPoE, I have nothing else in the mix. So... no netmap or Suricata here.

Br
Reza

@meyergru and @rkube: what are your ISPs (I assume both of you are in Germany)?

As mentioned, I am on a Telekom Dual Stack with 250/40. Maybe it is a routing/peering issue that coincidentally appeared at the same time. Then the TCP packets might arrive just a little too late (running out of the TTL) and the state is already closed? This would also explain why it is not perfectly consistent and now even hits 24.1?

Just for the record: 24.7.4 did not change/improve the behavior.

I didn't expect it to, because I did not see anything in the changelog that would indicate better behavior for v4, but I just wanted to note it here.

Is there anything else that could be done? I am very open to suggestions.

Quote from: TheDJ on September 12, 2024, 08:11:25 PM
Is there anything else that could be done? I am very open to suggestions.

My next shot in the dark: we are reloading OPNsense, or just pf, a lot at the moment. But, for example, our laptops and other network devices, which have a lot of established TCP connections, stay "online" during this time.

So when OPNsense (pf) loses its knowledge of all states (because of a reboot, a config reload, ...), the laptop still knows about its already established connections.

When the laptop sends a TCP packet to another station over an already established TCP connection, it won't send a new SYN packet; it will just send ACKs (or maybe PSH/ACKs) with sequence numbers.

OPNsense, seeing this traffic, does not know about this already established state and will log a debug message.
The more we test at the moment, the more of these debug messages we'll get.

And the packets will be blocked at the "last rule" because of the state violation. "Works as designed" ;-)
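
The sequence described above can be sketched as a toy model in Python. This is not how pf is implemented; it only assumes that a bare SYN may create a state, that any packet matching an existing state passes, and that everything else falls through to a default deny. All names and addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    src: str            # "ip:port" of the sender
    dst: str            # "ip:port" of the receiver
    flags: frozenset    # TCP flags, e.g. frozenset({"SYN"}) or frozenset({"ACK"})


@dataclass
class ToyStatefulFilter:
    """Toy stateful filter: assumption-laden illustration, not pf's algorithm."""
    states: set = field(default_factory=set)

    @staticmethod
    def _key(pkt: Packet) -> frozenset:
        # Direction-independent key so replies match the same state.
        return frozenset({pkt.src, pkt.dst})

    def handle(self, pkt: Packet) -> str:
        key = self._key(pkt)
        if key in self.states:
            return "pass (existing state)"
        if pkt.flags == frozenset({"SYN"}):
            self.states.add(key)
            return "pass (new state)"
        # No state and not a connection opener: falls through to the last rule.
        return "block (default deny / state violation)"

    def reload(self) -> None:
        # Models a reboot or reload that drops all state knowledge.
        self.states.clear()


if __name__ == "__main__":
    fw = ToyStatefulFilter()
    laptop, server = "10.0.10.5:50000", "10.0.20.9:443"  # made-up endpoints
    syn = Packet(laptop, server, frozenset({"SYN"}))
    ack = Packet(laptop, server, frozenset({"ACK"}))

    print("SYN before reload:", fw.handle(syn))
    print("ACK before reload:", fw.handle(ack))
    fw.reload()  # firewall forgets the state; the laptop does not
    print("ACK after reload: ", fw.handle(ack))
    print("SYN after reload: ", fw.handle(syn))
```

In this model the laptop's ACK is dropped after the reload until the connection is reopened with a fresh SYN, which matches the "works as designed" reading above.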

Br
Reza

This is true for very fresh traffic after a reboot/reconnect, but it should stabilize after a few minutes. For me, the behavior is ongoing even after a few days.

After weeks of hunting down this behavior and literally exchanging every hardware component, I found the problem: it (presumably) was a firmware upgrade on a Wi-Fi access point that worked as a wireless backhaul.

I deployed the new v7.XX branch for Zyxel NWA220AX-6E roughly at the same time. I performed multiple firmware upgrades on that device and even got it swapped via an RMA afterward, so I didn't think this was related.
Today I DOWNgraded it to a 6.XX firmware that I still had - 'poof' - all issues seem to be gone. I will continue to monitor the situation, but I believe that the firmware for that device is borked: it leads to packet loss and, in turn, to TCP states being closed.

Quote from: TheDJ on October 21, 2024, 05:34:31 PM
Today I DOWNgraded it to a 6.XX firmware that I still had - 'poof' - all issues seem to be gone. I will continue to

Fingers crossed ;-)
Best regard / besten Gruß
Reza

---
"Es irrt der Mensch solang er strebt" (Goethe)