[solved] Log flooded with "pf: ICMP error message too short (ip6)"

Started by cloudz, September 03, 2024, 04:36:12 PM

At this moment, my logs are flooded with the "pf: ICMP error message too short (ip6)" message.
Grepping and counting in latest.log gives me 53k entries, and they keep arriving at a rate of 10/s.

Does anyone know/understand where this comes from and what I need to do to stop it?

Going back a few days gives me numbers of up to 350k/day.
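
For anyone who wants to reproduce these counts, here is a minimal sketch. The function names are made up for illustration, and the per-day count assumes the raw syslog layout (`<13>1 2024-09-06T15:34:20+02:00 host kernel ...`) seen in the log excerpts further down this thread; adjust the pattern if your file looks different.

```shell
#!/bin/sh
# Message text exactly as quoted in this thread.
MSG='pf: ICMP error message too short (ip6)'

# Total number of matching lines in the log file given as $1.
count_total() {
    grep -cF "$MSG" "$1"
}

# Matching lines per day, printed as "<date> <count>".
# Assumes each line starts with "<PRI>VERSION YYYY-MM-DDThh:mm:ss...".
count_per_day() {
    grep -F "$MSG" "$1" |
        sed -n 's/^<[0-9]*>[0-9]* \([0-9][0-9-]*\)T.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage would be something like `count_total latest.log` and `count_per_day latest.log`, run from the directory that holds the log file.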

You should look at the details and see which MAC those originate from. Seems like a misbehaving device that sends invalid ICMPv6 messages.
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 770 up, Bufferbloat A

I've done a tcpdump on the internal interface (LAN) and it's a 100% match with the ND process NS/NA on that interface.
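
A capture like this can be sketched roughly as follows. The helper names and the interface name are placeholders, not something from my setup. The filter looks at `ip6[40]`, the first byte after the fixed 40-byte IPv6 header, so it assumes no extension headers sit in front of the ICMPv6 header. Types 135 and 136 are Neighbor Solicitation and Neighbor Advertisement. The per-MAC count assumes the usual `tcpdump -e` text output, where the source MAC is the second field.

```shell
#!/bin/sh
# BPF filter for ICMPv6 Neighbor Solicitation (135) and
# Neighbor Advertisement (136); assumes no IPv6 extension headers.
nd_filter() {
    printf '%s\n' 'icmp6 and (ip6[40] == 135 or ip6[40] == 136)'
}

# Capture up to $2 (default 1000) ND packets on interface $1.
# -e prints the link-level header (source MAC), -n skips name lookups.
capture_nd() {
    tcpdump -n -e -c "${2:-1000}" -i "$1" "$(nd_filter)"
}

# Read tcpdump -e text output on stdin and count ND packets per
# source MAC, busiest sender first. Output: "<count> <mac>".
count_by_mac() {
    awk '/ICMP6, neighbor (solicitation|advertisement)/ { c[$2]++ }
         END { for (m in c) print c[m], m }' | sort -rn
}
```

For example, `capture_nd igc0 500 | count_by_mac` (with `igc0` replaced by the LAN interface) would show which MAC addresses send most of the NS/NA traffic, which also answers the question of where the packets originate.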

Firewall > Settings > Advanced : Debug - Generate debug messages for various errors

This setting was the culprit. It still means that something is wrong in the interaction between ND and pf. Might this be worth looking at, @Franco?

I see the same errors (and many of them), 'pf: ICMP error message too short (ip6)', in my logs, but even more of these:

2024-09-05T18:59:15 Notice kernel pf: loose state match: TCP out wire: 198.18.24.0:443 198.18.178.160:53556 stack: - [lo=1349263766 high=1349262665 win=63 modulator=0] [lo=0 high=63 win=1 modulator=0] 2:0 A seq=1349263766 (1349263766) ack=0 len=0 ackskew=0 pkts=10:0 dir=out,fwd

2024-09-05T18:59:15 Notice kernel pf: loose state match: TCP in wire: 198.18.178.160:53556 198.18.24.0:443 stack: - [lo=1349263766 high=1349262665 win=63 modulator=0] [lo=0 high=63 win=1 modulator=0] 2:0 A seq=1349263766 (1349263766) ack=0 len=0 ackskew=0 pkts=10:0 dir=in,fwd


198.18.178.160 and 198.18.24.0 are in different, directly connected internal VLANs on the same LAGG (LACP, igc0+igc1). The same errors also show up on the PPPoE interface, which is on the LAGG as well.
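
To see which connections produce these messages, the log lines can be boiled down to a count per direction and endpoint pair. This is only a sketch: the function name is invented, and the pattern is written against the two "loose state match" lines quoted above, so other variants of the message may not match.

```shell
#!/bin/sh
# Read log lines on stdin and summarize "pf: loose state match" entries
# as "<count> <direction> <first endpoint> <second endpoint>".
loose_state_summary() {
    sed -n 's/.*pf: loose state match: TCP \([a-z]*\) wire: \([^ ]*\) \([^ ]*\) .*/\1 \2 \3/p' |
        sort | uniq -c | awk '{ print $1, $2, $3, $4 }'
}
```

Feeding it the log file, e.g. `loose_state_summary < latest.log`, gives one line per direction and address pair, which makes it easier to tell whether only a few flows or all of them are affected.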

I have problems with TCP connections, with lots of retransmissions. Sometimes (once every 5 minutes) I lose all packets on an existing TCP connection (iperf3) for a few (2-3) seconds.

It is a virtualized OPNsense (24.7.3_1) running on Proxmox; the network interfaces (2x Intel I226-V 2.5, N100) are passed through to OPNsense as raw PCI(e) devices. I have already disabled all hardware offloads, but not flow control yet.

I have read through https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280701, but in case of this error ... hmm, for me it's not only the ICMPv6 handling, because I also lose IPv4 TCP connections.
@Franco: I'm not afraid of one icmp echo going out ;-)

br
Reza
Best regards
Reza

---
"Man errs as long as he strives" (Goethe)

Quote from: rkube on September 05, 2024, 07:36:49 PM

I have problems with TCP connections, with lots of retransmissions. Sometimes (once every 5 minutes) I lose all packets on an existing TCP connection (iperf3) for a few (2-3) seconds.

[...]

I have read through https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280701, but in case of this error ... hmm, for me it's not only the ICMPv6 handling, because I also lose IPv4 TCP connections.
@Franco: I'm not afraid of one icmp echo going out ;-)

br
Reza

I will check the ICMP status once more later today. But the TCP symptoms completely match my observations: https://forum.opnsense.org/index.php?topic=42657.0
I don't know what the underlying issue is.

EDIT: My log also constantly shows thousands of both types of log messages.

You can easily check if the SA is the culprit by trying the kernel with the SA completely removed via

opnsense-update -zkr 24.7.3-no_sa

and reboot, see this.

I tried the experimental kernel beforehand and it did not seem to have an effect (however, I did not check the logs, I only noticed that the TCP was not getting better).

If I have the time, I will do some more detailed testing.

Hello cloudz,

I know I've just posted on your topic.
But as it concerns the same symptom, I don't want to start another topic of my own.

But I think the cause is not solved. Would you mind removing the [Solved] from the title again?

Br
Reza

The underlying cause is indeed not solved -- but changing the logging level stopped it from creating the entries in my log files. Removed the [solved] tag for now.

Lowering the logging level also stopped my latency spikes.

Quote from: meyergru on September 06, 2024, 10:13:04 AM
You can easily check if the SA is the culprit by trying the kernel with the SA completely removed via

opnsense-update -zkr 24.7.3-no_sa

and reboot, see this.

With that kernel and the logging set to "various errors", the issue is gone. I do get a lot of these, though:


<13>1 2024-09-06T15:34:20+02:00 opn.x100.be kernel - - [meta sequenceId="68"] pf: dropping packet with ip options
<13>1 2024-09-06T15:34:21+02:00 opn.x100.be kernel - - [meta sequenceId="69"] pf: dropping packet with ip options

September 06, 2024, 03:45:23 PM #11 Last Edit: September 06, 2024, 03:55:03 PM by doktornotor
Quote from: cloudz on September 06, 2024, 03:35:35 PM
Quote from: meyergru on September 06, 2024, 10:13:04 AM
You can easily check if the SA is the culprit by trying the kernel with the SA completely removed via

opnsense-update -zkr 24.7.3-no_sa

and reboot, see this.

With that kernel and the logging set to various errors, the issue is gone. I do get a lot of

Well... no comment. Pretty sure it's a downstream issue @franco  ::) ::) ::)

I regularly get this:
"pf: dropping packet with ip options"
(hundreds per 5 minutes), also with the "downstream-vanilla" 24.7.3_1. I have not yet applied 24.7.3-no_sa as cloudz already did.

Maybe it is just a "normal" message for packets with IP options (MagentaTV?)...


Quote from: rkube on September 06, 2024, 03:58:22 PM
I regularly get this
"pf: dropping packet with ip options"
(hundreds per 5 Minutes) also with "downstream-vanilla" 24.7.3_1. Not yet applied 24.7.3-no_sa as cloudz already did.

Maybe a "normal" message with IP-options (MagentaTV?)...

You can get rid of that - see the "allow options" hint here if needed (for IGMP / IPTV etc.)


Quote from: doktornotor on September 06, 2024, 04:02:54 PM
You can get rid of that - see the "allow options" hint here if needed (for IGMP / IPTV etc.)
Thanks for the hint, but I have already activated "Allow options" on the interfaces/rules involved in multicast for IPTV.

Unfortunately, as far as I know, the debug log doesn't say on which interface the dropped packet was received. So I can't tell whether it arrived with IP options on an interface that doesn't have a matching allow rule for IP options. In that case, the "dropping packet with ip options" entry could just be a normal and desired debug message. "Works as designed", maybe ;-)
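
One way to narrow this down without help from the debug log might be to look for such packets per interface with tcpdump. A rough sketch: the helper names are invented and the interface names in the usage line are placeholders. The filter is the one from the pcap-filter documentation for catching IPv4 packets with options, so it covers IPv4 only.

```shell
#!/bin/sh
# BPF filter for IPv4 packets that carry options: the low nibble of the
# first header byte is the header length in 32-bit words, and any value
# other than 5 means options are present.
ip_options_filter() {
    printf '%s\n' 'ip[0] & 0xf != 5'
}

# Print up to $2 (default 20) such packets seen on interface $1.
# -e shows the source MAC, -n skips name lookups.
show_ip_options() {
    tcpdump -n -e -c "${2:-20}" -i "$1" "$(ip_options_filter)"
}
```

Running `show_ip_options igc0 5` for each interface in turn would show where packets with IP options arrive, and that can be compared against the rules that have "allow options" set.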