I have been an OPNsense user for multiple years and so far it has been mostly stable.
Since the upgrade to 24.7, I started to experience very long loading times and sometimes even timeouts for normal web traffic on websites and smartphone apps.
When checking the logs, I started to see FW rule hits for the Default deny / state violation rule for 443 traffic. This seems to be traffic from (prematurely) closed states (TCP A or PA flag). The FW optimization is set to conservative.
I am on a (German) Telekom DS connection and running OPNsense 24.7.3_1. I know about the current kernel problems for IPv6, but to me this seems like different behavior: I don't have any ping/traceroute problems, I tried the full revert experimental kernel from github and the rule hits stem from IPv4 connections.
The only thing I could imagine would be that the devices in my networks contact the servers via their v6 addresses, ICMP fails, and they retry the packet over v4 without an existing state. But I don't know if that is realistic behavior.
Any advice on how to investigate?
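For example, I guess I could watch the drops live on the console with something like this (just a sketch, assuming the default pflog0 logging interface):
# show logged (blocked) packets on port 443 together with the rule info from the pflog header
tcpdump -nei pflog0 port 443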
Thanks and regards!
Hello,
I can also observe this behaviour
Quote: hits Default deny / state violation rule
with actually allowed traffic (iperf3).
Until just now I thought that the errors were not related, but we also share the TCP symptoms from the topic "[Solved] Log flooded with "pf: ICMP error message too short (ip6)"" (https://forum.opnsense.org/index.php?topic=42632.0).
Quote: But I don't know if that is realistic behavior.
Currently, I have to say, I'm sceptical about `pf`, but I don't think it's directly related to the ICMPv6 issue.
Quote: Any advice on how to investigate?
Next, I'm going to give this a try:
opnsense-update -zkr 24.7.3-no_sa
Many thx to @meyergru
I will also keep a close eye on this topic and report here if I find out anything.
Br
Reza
P.S.: @TheDJ: I'm a native German speaker...
Just a shot in the dark: is the "IGMP Proxy" service active on some or all interfaces?
I will have to test that next, even before I apply the 24.7.3-no_sa patch.
But if I remember correctly, only the interfaces on which the IGMP proxy is running are affected.
Of course, this could just be a coincidence, but I think the other interfaces have a "clean" packet rate that is sufficiently close to the native bit rate of the interface, while the interfaces involved in the IGMP proxy have a severely restricted bit rate.
I don't run the IGMP Proxy service (actually not even using IGMP at all in my network).
So, I would assume that this is not related.
As the other thread is currently more active, I posted my assumption based on the current testing there just a few moments ago.
But I am very open to switching to this one, if we feel like the "ICMP error message too short (ip6)" logs can be targeted with the full-revert kernel (and are therefore manageable), while the state losses don't seem to be affected by the -no_sa kernel.
P.S. I am also a native German speaker, but I think we should keep it international-friendly in this thread :)
Quote from: TheDJ on September 06, 2024, 11:02:15 PM
I don't run the IGMP Proxy service (actually not even using IGMP at all in my network).
So, I would assume that this is not related.
Unfortunately, I could not confirm my suspicion about the IGMP proxy. It would have been too nice (easy). ;)
Quote from: TheDJ on September 06, 2024, 11:02:15 PM
As the other thread is currently more active, I posted my assumption based on the current testing there just a few moments ago.
But I am very open to switching to this one, if we feel like the "ICMP error message too short (ip6)" logs can be targeted with the full-revert kernel (and are therefore manageable), while the state losses don't seem to be affected by the -no_sa kernel.
It seems to me that TCP connections can be drastically slower than UDP connections on the same route.
Reproducibly, some routes, e.g. over my slow pppoe interface, are only about 25% slower. I suspect that with faster routes (1 Gbit/s) it is 80% to almost 100% packet loss.
On the connections that don't show 100% loss anyway, however, there are noticeable dropouts lasting a few seconds every few minutes, during which 100% packet loss occurs.
That's why everything to do with TCP hardware offloading came into question for me. In the meantime, however, I have tried every combination of turning hardware offloading off without finding any significant differences.
In the firewall log you can find individual messages, such as the one in the topic title, which indicate that traffic that is actually permitted is being dropped by the last rule.
Together with the observed "debug" message:
'pf: loose state match...', however, it seems clear what is happening:
TCP packets are discarded because pf no longer recognises the TCP states. Each discarded packet slows down the TCP connection - each additional discarded packet slows it down even more.
I think that the underlying problem also explains why TCP connections have different speeds depending on the direction. And I don't just mean a small difference:
❯ iperf3 -c 198.18.50.136 -t 3 --bidir
Connecting to host 198.18.50.136, port 5201
[ 5] local 198.18.178.160 port 42184 connected to 198.18.50.136 port 5201
[ 7] local 198.18.178.160 port 42188 connected to 198.18.50.136 port 5201
[ ID][Role] Interval Transfer Bitrate Retr Cwnd
[ 5][TX-C] 0.00-1.00 sec 201 KBytes 1.64 Mbits/sec 2 1.41 KBytes
[ 7][RX-C] 0.00-1.00 sec 111 MBytes 931 Mbits/sec
[ 5][TX-C] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
[ 7][RX-C] 1.00-2.00 sec 111 MBytes 932 Mbits/sec
[ 5][TX-C] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
[ 7][RX-C] 2.00-3.00 sec 111 MBytes 932 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
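A quick sanity check along those lines (just a sketch) would be to look on the OPNsense console whether the state for the iperf connection above still exists, and whether pf's state-mismatch counter climbs while the transfer stalls:
# is there still a state entry for the iperf endpoint?
pfctl -ss | grep 198.18.50.136
# this counter should increase when pf drops packets that no longer match a state
pfctl -si | grep state-mismatch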
Br
Reza
I do not see that:
#iperf3 -c iperf.online.net 198.18.50.136 -t 3 --bidir
Connecting to host iperf.online.net, port 5201
[ 5] local 192.168.10.3 port 48222 connected to 51.158.1.21 port 5201
[ 7] local 192.168.10.3 port 48226 connected to 51.158.1.21 port 5201
[ ID][Role] Interval Transfer Bitrate Retr Cwnd
[ 5][TX-C] 0.00-1.00 sec 60.2 MBytes 505 Mbits/sec 16 5.20 MBytes
[ 7][RX-C] 0.00-1.00 sec 96.6 MBytes 810 Mbits/sec
[ 5][TX-C] 1.00-2.00 sec 54.5 MBytes 457 Mbits/sec 64 3.24 MBytes
[ 7][RX-C] 1.00-2.00 sec 132 MBytes 1.11 Gbits/sec
[ 5][TX-C] 2.00-3.00 sec 53.2 MBytes 447 Mbits/sec 114 3.11 MBytes
[ 7][RX-C] 2.00-3.00 sec 132 MBytes 1.11 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval Transfer Bitrate Retr
[ 5][TX-C] 0.00-3.00 sec 168 MBytes 470 Mbits/sec 194 sender
[ 5][TX-C] 0.00-3.03 sec 146 MBytes 405 Mbits/sec receiver
[ 7][RX-C] 0.00-3.00 sec 417 MBytes 1.16 Gbits/sec 3203 sender
[ 7][RX-C] 0.00-3.03 sec 361 MBytes 1000 Mbits/sec
These numbers correspond to my expected performance.
You could try with "-u" to see if UDP is faster. If it is, I would guess that your MTU is misconfigured. Can you try this (https://www.baeldung.com/linux/maximum-transmission-unit-mtu-ip) to find your real maximum MTU?
Probably, setting "-M 1400" would show if this is the problem.
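For example, on a Linux client (1472 bytes of ICMP payload plus 28 bytes of headers make a 1500-byte packet; lower -s until the ping goes through):
# fails with "message too long" if the path MTU is below 1500
ping -c 3 -M do -s 1472 iperf.online.net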
Quote from: meyergru on September 08, 2024, 06:30:38 PM
You could try with "-u" to see if UDP is faster. If it is, I would guess that your MTU is misconfigured. Can you try this (https://www.baeldung.com/linux/maximum-transmission-unit-mtu-ip) to find your real maximum MTU?
Probably, setting "-M 1400" would show if this is the problem.
Many thanks for the answer.
The MTU I have set is (unfortunately) not the problem, as I am only testing between two local VLANs (MTU==1500) that are routed/filtered via opnsense. I could try jumbo frames, maybe it can get even worse ;-)
The reference to pppoe only referred to the fact that this is my slowest connection (100/40); however, the problem is not as obvious there as with my gigabit/VLAN/LAGG connections.
I would like to point out that this topic is about allowed traffic inexplicably hitting the "last rule", a problem with pf and TCP states is suspected, and that this is interfering with TCP connections.
Br
Reza
Another shot in the dark from my side (I would not be able to explain it, but maybe someone else can): could it be a VLAN routing problem? The (LAN) interfaces that have this problem on my device are all VLANs. Not LAGGs, but VLANs just like your setup.
At the same time, I have a road warrior WireGuard VPN setup on the same box, leaving via the same WAN, which (at least from very shallow use) did not encounter any problem in this regard.
...and wireguard is UDP only.
Quote from: rkube on September 08, 2024, 07:49:31 PM
The MTU I have set is (unfortunately) not the problem, as I am only testing between two local VLANs (MTU==1500) that are routed/filtered via opnsense. I could try jumbo frames, maybe it can get even worse ;-)
Yet you show results from an iperf3 test run against an internet IP?
Quote from: rkube on September 08, 2024, 07:49:31 PM
The reference to pppoe only referred to the fact that this is my slowest connection (100/40); however, the problem is not as obvious there as with my gigabit/VLAN/LAGG connections.
I would like to point out that this topic is about allowed traffic inexplicably hitting the "last rule", a problem with pf and TCP states is suspected, and that this is interfering with TCP connections.
So there are also VLANs and LAGGs in the mix? Maybe netmap and suricata as well? ???
Quote from: meyergru on September 08, 2024, 10:47:18 PM
So there are also VLANs and LAGGs in the mix? Maybe netmap and suricata as well? ???
For me, only VLANs. No LAGGs, netmap, or suricata (I had Suricata in IDS mode before, but turned it off without any difference). Also, these VLANs have been stable before (for months).
I also had a traffic shaper running beforehand, but it does not make a difference whether the shaper is enabled or not (although the iperf results show far more Retr packets with the shaper enabled).
With the shaper:
# iperf3 -c iperf.online.net 198.18.50.136 --bidir
Connecting to host iperf.online.net, port 5201
[ 5] local 10.200.10.2 port 44254 connected to 51.158.1.21 port 5201
[ 7] local 10.200.10.2 port 44266 connected to 51.158.1.21 port 5201
[ ID][Role] Interval Transfer Bitrate Retr Cwnd
[ 5][TX-C] 0.00-1.00 sec 6.12 MBytes 51.4 Mbits/sec 15 222 KBytes
[ 7][RX-C] 0.00-1.00 sec 23.2 MBytes 195 Mbits/sec
[ 5][TX-C] 1.00-2.00 sec 4.88 MBytes 40.9 Mbits/sec 59 222 KBytes
[ 7][RX-C] 1.00-2.00 sec 26.7 MBytes 224 Mbits/sec
[ 5][TX-C] 2.00-3.00 sec 4.75 MBytes 39.8 Mbits/sec 92 228 KBytes
[ 7][RX-C] 2.00-3.00 sec 26.7 MBytes 224 Mbits/sec
[ 5][TX-C] 3.00-4.00 sec 3.57 MBytes 29.9 Mbits/sec 77 222 KBytes
[ 7][RX-C] 3.00-4.00 sec 27.0 MBytes 227 Mbits/sec
[ 5][TX-C] 4.00-5.00 sec 4.76 MBytes 39.9 Mbits/sec 136 166 KBytes
[ 7][RX-C] 4.00-5.00 sec 27.1 MBytes 227 Mbits/sec
[ 5][TX-C] 5.00-6.00 sec 3.52 MBytes 29.5 Mbits/sec 145 225 KBytes
[ 7][RX-C] 5.00-6.00 sec 26.9 MBytes 225 Mbits/sec
[ 5][TX-C] 6.00-7.00 sec 4.76 MBytes 39.9 Mbits/sec 90 219 KBytes
[ 7][RX-C] 6.00-7.00 sec 27.0 MBytes 227 Mbits/sec
[ 5][TX-C] 7.00-8.00 sec 4.70 MBytes 39.4 Mbits/sec 84 148 KBytes
[ 7][RX-C] 7.00-8.00 sec 26.3 MBytes 221 Mbits/sec
[ 5][TX-C] 8.00-9.00 sec 3.52 MBytes 29.6 Mbits/sec 85 222 KBytes
[ 7][RX-C] 8.00-9.00 sec 27.7 MBytes 232 Mbits/sec
[ 5][TX-C] 9.00-10.00 sec 4.80 MBytes 40.3 Mbits/sec 123 152 KBytes
[ 7][RX-C] 9.00-10.00 sec 26.9 MBytes 226 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval Transfer Bitrate Retr
[ 5][TX-C] 0.00-10.00 sec 45.4 MBytes 38.1 Mbits/sec 906 sender
[ 5][TX-C] 0.00-10.02 sec 42.3 MBytes 35.4 Mbits/sec receiver
[ 7][RX-C] 0.00-10.00 sec 277 MBytes 233 Mbits/sec 2261 sender
[ 7][RX-C] 0.00-10.02 sec 266 MBytes 222 Mbits/sec receiver
iperf Done.
Without the shaper:
# iperf3 -c iperf.online.net 198.18.50.136 --bidir
Connecting to host iperf.online.net, port 5201
[ 5] local 10.200.10.2 port 52252 connected to 51.158.1.21 port 5201
[ 7] local 10.200.10.2 port 52266 connected to 51.158.1.21 port 5201
[ ID][Role] Interval Transfer Bitrate Retr Cwnd
[ 5][TX-C] 0.00-1.00 sec 7.46 MBytes 62.6 Mbits/sec 91 382 KBytes
[ 7][RX-C] 0.00-1.00 sec 23.7 MBytes 199 Mbits/sec
[ 5][TX-C] 1.00-2.00 sec 4.66 MBytes 39.1 Mbits/sec 33 294 KBytes
[ 7][RX-C] 1.00-2.00 sec 29.1 MBytes 244 Mbits/sec
[ 5][TX-C] 2.00-3.00 sec 4.73 MBytes 39.7 Mbits/sec 12 259 KBytes
[ 7][RX-C] 2.00-3.00 sec 30.2 MBytes 253 Mbits/sec
[ 5][TX-C] 3.00-4.00 sec 4.70 MBytes 39.4 Mbits/sec 0 276 KBytes
[ 7][RX-C] 3.00-4.00 sec 31.9 MBytes 267 Mbits/sec
[ 5][TX-C] 4.00-5.00 sec 4.70 MBytes 39.4 Mbits/sec 0 253 KBytes
[ 7][RX-C] 4.00-5.00 sec 30.7 MBytes 257 Mbits/sec
[ 5][TX-C] 5.00-6.00 sec 4.63 MBytes 38.8 Mbits/sec 0 264 KBytes
[ 7][RX-C] 5.00-6.00 sec 29.4 MBytes 247 Mbits/sec
[ 5][TX-C] 6.00-7.00 sec 4.70 MBytes 39.5 Mbits/sec 0 273 KBytes
[ 7][RX-C] 6.00-7.00 sec 33.6 MBytes 282 Mbits/sec
[ 5][TX-C] 7.00-8.00 sec 4.67 MBytes 39.2 Mbits/sec 0 270 KBytes
[ 7][RX-C] 7.00-8.00 sec 31.9 MBytes 267 Mbits/sec
[ 5][TX-C] 8.00-9.00 sec 4.66 MBytes 39.1 Mbits/sec 0 262 KBytes
[ 7][RX-C] 8.00-9.00 sec 31.5 MBytes 265 Mbits/sec
[ 5][TX-C] 9.00-10.00 sec 4.70 MBytes 39.4 Mbits/sec 0 5.62 KBytes
[ 7][RX-C] 9.00-10.00 sec 31.4 MBytes 264 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval Transfer Bitrate Retr
[ 5][TX-C] 0.00-10.00 sec 49.6 MBytes 41.6 Mbits/sec 136 sender
[ 5][TX-C] 0.00-10.02 sec 46.7 MBytes 39.1 Mbits/sec receiver
[ 7][RX-C] 0.00-10.00 sec 320 MBytes 268 Mbits/sec 1054 sender
[ 7][RX-C] 0.00-10.02 sec 303 MBytes 254 Mbits/sec receiver
iperf Done.
Both iperf runs are in line with what I expect for my line. But the TCP state losses and FW hits still happen.
What still seems strange to me: all of the TCP state losses and FW hits happen on v4 addresses, although the devices have SLAAC GUAs available. Of course, the public servers, for which those connection dropouts happen, might only have v4 addresses, so I'm not sure if that is any specific symptom.
The TCP dropouts also happen for some apps (e.g. German ZDF Mediathek) more often than for others.
For my internal networks, I have never experienced a state loss - only to the Internet.
What else could be done to diagnose this? I am close to downgrading to 24.1. The timeouts are really annoying during regular use.
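One more thing I could probably try (just a sketch): raising pf's debug level on the console, so that any state handling problems show up in the kernel log:
# more verbose pf diagnostics; set it back afterwards, e.g. with "pfctl -x urgent"
pfctl -x loud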
With the normal 24.7.3 kernel, I can confirm the "pf: ICMP error message too short (ip6)" messages - which go away with the no-sa kernel.
I can also confirm the "pf: loose state match" notices with both kernels.
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP out wire: 79.222.100.212:488 99.88.77.66:64307 stack: 79.222.100.212:488 10.0.1.7:56542 [lo=2251669520 high=2251734519 win=502 modulator=0 wscale=7] [lo=232729624 high=232785857 win=510 modulator=0 wscale=7] 10:10 R seq=232729624 (232721729) ack=2251669520 len=0 ackskew=0 pkts=12:16 dir=in,rev
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP in wire: 10.0.1.7:56542 79.222.100.212:488 stack: - [lo=2251669520 high=2251734519 win=502 modulator=0 wscale=7] [lo=232729624 high=232785857 win=510 modulator=0 wscale=7] 10:10 R seq=232729624 (232721729) ack=2251669520 len=0 ackskew=0 pkts=12:15 dir=out,rev
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP out wire: 79.222.100.212:488 99.88.77.66:64307 stack: 79.222.100.212:488 10.0.1.7:56542 [lo=2251669520 high=2251734519 win=502 modulator=0 wscale=7] [lo=232729624 high=232785857 win=510 modulator=0 wscale=7] 10:10 R seq=232729624 (232721729) ack=2251669520 len=0 ackskew=0 pkts=12:15 dir=in,rev
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP out wire: 79.222.100.212:488 99.88.77.66:64307 stack: 79.222.100.212:488 10.0.1.7:56542 [lo=2251669520 high=2251734519 win=502 modulator=0 wscale=7] [lo=232723169 high=232785857 win=510 modulator=0 wscale=7] 9:4 R seq=2251669520 (2251669495) ack=232723169 len=0 ackskew=0 pkts=11:9 dir=out,fwd
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP in wire: 10.0.1.7:56542 79.222.100.212:488 stack: - [lo=2251669520 high=2251734519 win=502 modulator=0 wscale=7] [lo=232723169 high=232785857 win=510 modulator=0 wscale=7] 9:4 R seq=2251669520 (2251669495) ack=232723169 len=0 ackskew=0 pkts=11:9 dir=in,fwd
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP out wire: 10.0.2.36:9443 10.0.1.7:48268 stack: - [lo=1094662497 high=1094727369 win=502 modulator=0 wscale=7] [lo=1009336067 high=1009389476 win=510 modulator=0 wscale=7] 10:10 R seq=1094662497 (1094662473) ack=1009336067 len=0 ackskew=0 pkts=14:14 dir=out,fwd
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP in wire: 10.0.1.7:48268 10.0.2.36:9443 stack: - [lo=1094662497 high=1094727369 win=502 modulator=0 wscale=7] [lo=1009336067 high=1009389476 win=510 modulator=0 wscale=7] 10:10 R seq=1094662497 (1094662473) ack=1009336067 len=0 ackskew=0 pkts=14:14 dir=in,fwd
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP out wire: 10.0.2.36:9443 10.0.1.7:48268 stack: - [lo=1094662497 high=1094727369 win=502 modulator=0 wscale=7] [lo=1009336067 high=1009389476 win=510 modulator=0 wscale=7] 10:10 R seq=1094662497 (1094662473) ack=1009336067 len=0 ackskew=0 pkts=13:14 dir=out,fwd
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP in wire: 10.0.1.7:48268 10.0.2.36:9443 stack: - [lo=1094662497 high=1094727369 win=502 modulator=0 wscale=7] [lo=1009336067 high=1009389476 win=510 modulator=0 wscale=7] 10:10 R seq=1094662497 (1094662473) ack=1009336067 len=0 ackskew=0 pkts=13:14 dir=in,fwd
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP out wire: 10.0.2.36:9443 10.0.1.7:48268 stack: - [lo=1094662497 high=1094727369 win=502 modulator=0 wscale=7] [lo=1009336067 high=1009389476 win=510 modulator=0 wscale=7] 10:10 R seq=1094662497 (1094662473) ack=1009336067 len=0 ackskew=0 pkts=12:14 dir=out,fwd
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP in wire: 10.0.1.7:48268 10.0.2.36:9443 stack: - [lo=1094662497 high=1094727369 win=502 modulator=0 wscale=7] [lo=1009336067 high=1009389476 win=510 modulator=0 wscale=7] 10:10 R seq=1094662497 (1094662473) ack=1009336067 len=0 ackskew=0 pkts=12:14 dir=in,fwd
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP out wire: 10.0.2.36:9443 10.0.1.7:48268 stack: - [lo=1094662497 high=1094727369 win=502 modulator=0 wscale=7] [lo=1009336067 high=1009389476 win=510 modulator=0 wscale=7] 10:10 R seq=1094662497 (1094662473) ack=1009336067 len=0 ackskew=0 pkts=11:14 dir=out,fwd
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP in wire: 10.0.1.7:48268 10.0.2.36:9443 stack: - [lo=1094662497 high=1094727369 win=502 modulator=0 wscale=7] [lo=1009336067 high=1009389476 win=510 modulator=0 wscale=7] 10:10 R seq=1094662497 (1094662473) ack=1009336067 len=0 ackskew=0 pkts=11:14 dir=in,fwd
2024-09-09T10:30:02 Notice kernel TCP out wire: 10.0.2.36:9443 10.0.1.7:48268 stack: - [lo=1094662497 high=1094727369 win=502 modulator=0 wscale=7] [lo=1009336067 high=1009389476 win=510 modulator=0 wscale=7] 10:10 R seq=1094662497 (1094662473) ack=1009336067 len=0 ackskew=0 pkts=10:14 dir=out,fwd
2024-09-09T10:30:02 Notice kernel pf: loose state match: pf: ICMP error message too short (ip6)
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP in wire: 10.0.1.7:48268 10.0.2.36:9443 stack: - [lo=1094662497 high=1094727369 win=502 modulator=0 wscale=7] [lo=1009336067 high=1009389476 win=510 modulator=0 wscale=7] 10:10 R seq=1094662497 (1094662473) ack=1009336067 len=0 ackskew=0 pkts=10:14 dir=in,fwd
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP out wire: 10.0.2.36:9443 10.0.1.7:48268 stack: - [lo=1094662497 high=1094727369 win=502 modulator=0 wscale=7] [lo=1009332316 high=1009389476 win=510 modulator=0 wscale=7] 10:10 R seq=1094662497 (1094662473) ack=1009332316 len=0 ackskew=0 pkts=9:10 dir=out,fwd
2024-09-09T10:30:02 Notice kernel pf: loose state match: TCP in wire: 10.0.1.7:48268 10.0.2.36:9443 stack: - [lo=1094662497 high=1094727369 win=502 modulator=0 wscale=7] [lo=1009331028 high=1009389476 win=510 modulator=0 wscale=7] 10:10 R seq=1094662497 (1094662473) ack=1009331028 len=0 ackskew=0 pkts=9:9 dir=in,fwd
2024-09-09T10:30:01 Notice kernel pf: ICMP error message too short (ip6)
2024-09-09T10:30:01 Notice kernel pf: ICMP error message too short (ip6)
2024-09-09T10:30:01 Notice kernel pf: ICMP error message too short (ip6)
2024-09-09T10:30:00 Notice kernel pf: loose state match: TCP out wire: 157.240.252.35:443 99.88.77.66:18542 stack: 157.240.252.35:443 10.0.1.7:57294 [lo=3539577390 high=3539646254 win=502 modulator=0 wscale=7] [lo=4072763929 high=4072811232 win=269 modulator=0 wscale=8] 10:10 R seq=3539577390 (3539577366) ack=4072763929 len=0 ackskew=0 pkts=24:28 dir=out,fwd
However, I do not see any massive performance degradation because of this.
Do you notice any FW hits on the default deny for this traffic?
For me, these loose state matches correspond quite well with the state losses, as far as I can tell. Right now, I wouldn't know any other reason why incoming 443 traffic would be blocked (especially with the A and PA flags).
I think I'm running into an issue which has the same root cause as reported here.
After updating to 24.7.x I cannot connect my web server from the public internet anymore.
Before the update everything was running fine and I did not touch the configuration.
OPNsense has a port forwarding and allow rules etc. which were working fine to forward public internet traffic towards my internal web server.
But after the update, each attempt to connect to the web server is rejected by the floating "default deny / state violation" rule. Even incoming traffic with tcpflags S is caught by the "default deny" rule.
Are there any changes in OPNsense 24.7 that could explain why this happens, or any recommendations to overcome the problem?
I'm currently running 24.7.3_1. Any hints are welcome.
Quote from: struppie on September 09, 2024, 03:45:15 PM
I think I'm running into an issue which has the same root cause as reported here.
....
Found the issue - I'm checking the source IP with GeoIPWhitelisting. It seems that this no longer works as expected; I need to analyse it in detail.
But using "any" as the source (instead of GeoIPWhitelisting) in the forwarding rule fixes everything (meaning the custom forwarding + rule matches again and the traffic no longer runs into the default deny).
Quote from: meyergru on September 09, 2024, 10:40:53 AM
With the normal 24.7.3 kernel, I can confirm the "pf: ICMP error message too short (ip6)" messages - which go away with the no-sa kernel.
I can also confirm the "pf: loose state match" notices with both kernels.
I just went back to OPNsense 24.1 (imported config from 24.7) and, with debug logging turned on... taddahhh... I also see the same 'pf: loose state match' notices.
Quote from: rkube on September 09, 2024, 04:16:09 PM
I just went back to OPNsense 24.1 (imported config from 24.7) and, with debug logging turned on... taddahhh... I also see the same 'pf: loose state match' notices.
Thanks, good to know. Maybe it's a different (but somehow related) issue that did not surface in the same way until now.
Do you also see the performance degradation/FW hits?
Hi!
Quote from: TheDJ on September 10, 2024, 07:02:06 PM
Thanks, good to know.
I was a little bit disappointed at that moment.
Quote from: TheDJ on September 10, 2024, 07:02:06 PM
Maybe it's a different (but somehow related) issue that did not surface in the same way until now.
I'm going to take a step back and look at the whole thing from some distance, without being blinded by the possible "FreeBSD 14 / IPv6 issue".
Quote from: TheDJ on September 10, 2024, 07:02:06 PM
Do you also see the performance degradation/FW hits?
Unfortunately, yes.
But I did something just days before upgrading to OPNsense 24.7: I had previously done the bonding (LACP) of the interfaces and the VLAN tagging on the host (Proxmox) and passed the resulting bond0.[vlan id] interfaces as separate virtio network cards to the OPNsense VM. I recently changed that because I was unhappy with creating an interface for each new (or changed) VLAN on the host and having to guess in which order they would be assigned to the OPNsense interfaces (this was the behavior under VirtualBox).
So, I'm going back to 24.7 (no_sa kernel), but will again assemble the bond and VLAN interfaces on the host, assuming that Linux (probably?) has the better driver and working hardware offloading *fingers crossed* ;-)
Br
Reza
P.S.: Sorry, I'm a little bit sick and spending less time in front of my homelab at the moment...
Hi,
sorry, I missed your post for days ...
Quote from: meyergru on September 08, 2024, 10:47:18 PM
Quote from: rkube on September 08, 2024, 07:49:31 PM
The MTU I have set is (unfortunately) not the problem, as I am only testing between two local VLANs (MTU==1500) that are routed/filtered via opnsense. I could try jumbo frames, maybe it can get even worse ;-)
Yet you show results from an iperf3 test run against an internet IP?
Please don't beat me, but 198.18.0.0/15 is not publicly routable IP space. (pssst: "bogus IPs" ;-] )
Quote from: meyergru on September 08, 2024, 10:47:18 PM
So there are also VLANs and LAGGs in the mix? Maybe netmap and suricata as well? ???
I think of LAGGs and VLANs as very basic FW/router interface types. Besides pppoe, I have nothing more in the mix. So... no netmap or suricata here.
Br
Reza
@meyergru and @rkube: what are your ISPs (I assume both of you are in Germany)?
As mentioned, I am on a Telekom Dual Stack with 250/40. Maybe it is a routing/peering issue that coincidentally appeared at the same time. Then, the TCP packets might be just a little too late (running out of the TTL) and the state is closed? This would also explain why it is not perfectly consistent and now even hits 24.1?
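If it really is peering, a longer mtr run from a client towards one of the affected destinations should show where the latency/loss starts (hypothetical target, any affected server would do):
# 100 IPv4 probes, summarized as a report
mtr -4 -c 100 --report zdf.de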
Just for the record: 24.7.4 did not change/improve the behavior.
I didn't expect it to, because I did not see anything in the changelog that would indicate better behavior for v4, but I just wanted to note it here.
Is there anything else that could be done? I am very open to suggestions.
Quote from: TheDJ on September 12, 2024, 08:11:25 PM
Is there anything else that could be done? I am very open to suggestions.
The next of my shots into the dark (or rather, the light): We are reloading OPNsense, or just pf, a lot at the moment. But e.g. our laptops and other network devices, which have a lot of established TCP connections, stay "online" during this time.
So when OPNsense (pf) loses its knowledge of all states (because of a reboot or config reload, ...), the laptop still has the knowledge of its already established connections.
When the laptop sends a TCP packet to another station with an already established TCP state, it won't send a new SYN packet - it will just send ACKs (or maybe PUSH/ACKs) with sequence numbers.
OPNsense, seeing this traffic, does not know about this already established state and will log a debug message.
The more we test at the moment, the more of these debug messages we'll get.
And the packets will be blocked at "last rule", because of the state violation. "Works as designed" ;-)
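If that is what's happening, the state table size should visibly drop across a reload while the clients keep their connections open. A rough sketch (assuming the usual OPNsense configctl action for reloading the filter):
# note the number of states, reload the ruleset, then compare
pfctl -si | grep 'current entries'
configctl filter reload
pfctl -si | grep 'current entries'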
Br
Reza
That is true for very fresh traffic right after a reboot/reconnect, but it should stabilize after a few minutes. For me, the behavior is ongoing even after a few days.
After weeks of hunting down this behavior and literally exchanging every hardware component, I found the problem: it (presumably) was a firmware upgrade on a WiFi access point acting as a wireless backhaul.
I deployed the new v7.XX branch for Zyxel NWA220AX-6E roughly at the same time. I performed multiple firmware upgrades on that device and even got it swapped via an RMA afterward, so I didn't think this was related.
Today I DOWNgraded it to a 6.XX firmware that I still had - 'poof' - all issues seem to be gone. I will continue to monitor the situation, but I believe that firmware for that device is borked. This leads to packet loss and in turn a closing of TCP states.
Quote from: TheDJ on October 21, 2024, 05:34:31 PM
Today I DOWNgraded it to a 6.XX firmware that I still had - 'poof' - all issues seem to be gone. I will continue to
Fingers crossed ;-)