Pings through WAN interface not working (broken in 24.7.1-.3)

Started by motoridersd, August 30, 2024, 12:51:49 AM

I don't often use IPv4 pings to diagnose internet issues, so I don't know exactly when this broke, but I know for sure this used to work.

After the update to 24.7 I noticed that pings weren't working. I can't ping remote hosts (like 8.8.8.8 or 1.1.1.1) from my LAN, nor can I do it from the OPNsense console using the WAN interface as the source.

tcpdump captures on the console don't show the packets going out on the WAN interface, but I see the requests going out from the LAN interface.

I thought the issue with the FreeBSD kernel changes could be breaking this, so I updated to the test icmp2 kernel while on 24.7.1, with no change. I have since installed 24.7.2 and 24.7.3 and still see no change.

My firewall rules allow this traffic. I added specific ICMP rules to track this, and I can see the allow rule on the LAN side showing up on the Firewall Log, as well as the WAN rule.

Despite the allow rule log showing up, I don't see the request going out on the WAN interface when doing a tcpdump filtered by ICMP.
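
For anyone comparing captures the same way, here is a minimal Python sketch that lists echo requests seen on the LAN side but absent on the WAN side. It assumes tcpdump's default one-line text output (for example from `tcpdump -n -i <interface> icmp`, saved to a file per interface). The function names are made up for illustration. Since outbound NAT rewrites the source address and may rewrite the ICMP id, the sketch matches on destination and sequence number only, which is an assumption and not a guarantee.

```python
import re

# Matches tcpdump's one-line text output for ICMP echo traffic, e.g.
# "12:00:01.000000 IP 192.168.1.10 > 8.8.8.8: ICMP echo request, id 7, seq 1, length 64"
ECHO_RE = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>[^:\s]+): ICMP echo (?P<kind>request|reply), "
    r"id (?P<id>\d+), seq (?P<seq>\d+)"
)


def echo_requests(lines):
    """Return the set of (destination, seq) pairs for echo requests in a capture."""
    seen = set()
    for line in lines:
        m = ECHO_RE.search(line)
        if m and m.group("kind") == "request":
            seen.add((m.group("dst"), int(m.group("seq"))))
    return seen


def missing_on_wan(lan_lines, wan_lines):
    """Echo requests captured on the LAN side that never show up on the WAN side."""
    return sorted(echo_requests(lan_lines) - echo_requests(wan_lines))
```

An empty result means every LAN-side request also left via WAN. Any entries returned are requests that were dropped or diverted somewhere between the two interfaces.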

I'm using a Hybrid NAT with the default NAT all to the WAN interface. All other traffic works fine. Traceroutes work without an issue.

I am also unable to ping the WAN IP of the firewall (my ISP assigns a public IPv4 address via DHCP). I can see the requests arriving in a tcpdump, and I see the allow entry in the log, but a reply never goes out.
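
The "request arrives, reply never leaves" symptom can be checked mechanically on a single capture. This is only a sketch under the same assumption about tcpdump's one-line text output; `unanswered_requests` is a hypothetical helper name. It pairs each echo request with an echo reply that carries the same id and sequence number and is addressed back to the requester.

```python
import re

# Matches lines such as
# "12:00:01.000000 IP 198.51.100.7 > 203.0.113.5: ICMP echo request, id 1, seq 1, length 64"
ECHO_RE = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>[^:\s]+): ICMP echo (?P<kind>request|reply), "
    r"id (?P<id>\d+), seq (?P<seq>\d+)"
)


def unanswered_requests(lines):
    """Return (requester, id, seq) for echo requests with no matching echo reply."""
    pending = set()
    for line in lines:
        m = ECHO_RE.search(line)
        if not m:
            continue
        ident, seq = int(m.group("id")), int(m.group("seq"))
        if m.group("kind") == "request":
            # Key on the requester: a reply must be addressed back to it.
            pending.add((m.group("src"), ident, seq))
        else:
            # A reply sent to that requester with the same id/seq answers it.
            pending.discard((m.group("dst"), ident, seq))
    return sorted(pending)
```

If every request is listed as unanswered on the WAN capture, the firewall is receiving the pings but not emitting replies on that interface, which matches the description above.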

I don't know what else I could be missing. It just seems that the firewall is not actually sending the ICMP requests to the internet, or receiving the incoming requests so that it can send a reply.

Thanks for your report. Since the fix for CVE-2024-6640 in 24.7.1 we have had at least two separate issues with it. The general theory is that this patch, which changes over 500 lines of code and dates mostly from 2009 via OpenBSD, still has more issues now in 2024 than anyone involved imagined.

I'll make a note in the other ticket. It looks a bit like the traceroute amendments were not 100% correct since the neighbour discovery side seems to have no effect on your end (24.7.3 specifically).


Cheers,
Franco

Just to be sure: the 24.7 works for you in this regard?

Made the note here: https://github.com/opnsense/src/issues/218#issuecomment-2320210439

Might be worth inspecting your setup a bit closer. Do you use any explicit rules for that ping to pass? And it comes from where and goes to a public Internet server?


Cheers,
Franco

Ugh, not again...  ::) Also, "nice" radio silence on the upstream bug. Seems even the actionable item is not actionable any more.

I'd suggest an ultimate solution for the entire "security improvement" if the SA is indeed involved here. I don't expect things to move any further until this mess is dumped on + users in the project run by the involved upstream actors - and then their user base starts complaining about downstream issues.  ::)

But the OP talks about IPv4, not IPv6?

And for the record: I do not see IPv4 pings unanswered from any 24.7.x - my monitoring would have alarmed me if it were otherwise. I would guess a bug like that would not have gone unnoticed and think this is a configuration problem.

Same here,

I am currently on 24.7.1, and pinging (ICMP echo) a specific private or public IPv4 destination from the LAN works.


Quote
ping 8.8.8.8

Pinging 8.8.8.8 with 32 bytes of data:
Reply from 8.8.8.8: bytes=32 time=15ms TTL=114
Reply from 8.8.8.8: bytes=32 time=17ms TTL=114
Reply from 8.8.8.8: bytes=32 time=22ms TTL=114
Reply from 8.8.8.8: bytes=32 time=13ms TTL=114

Ping statistics for 8.8.8.8:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 13ms, Maximum = 22ms, Average = 16ms

ping 1.1.1.1

Pinging 1.1.1.1 with 32 bytes of data:
Reply from 1.1.1.1: bytes=32 time=14ms TTL=55
Reply from 1.1.1.1: bytes=32 time=15ms TTL=55
Reply from 1.1.1.1: bytes=32 time=13ms TTL=55
Reply from 1.1.1.1: bytes=32 time=12ms TTL=55

Ping statistics for 1.1.1.1:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 12ms, Maximum = 15ms, Average = 13ms

Regards,
S.

The patches refactor ICMP and ICMPv6. I wouldn't rule out more issues.


Cheers,
Franco

Quote from: meyergru on August 30, 2024, 09:46:54 AM
But the OP talks about IPv4, not IPv6?

And for the record: I do not see IPv4 pings unanswered from any 24.7.x - my monitoring would have alarmed me if it were otherwise. I would guess a bug like that would not have gone unnoticed and think this is a configuration problem.

...same idea here. I had a look at WAN quality, since a non-working ping would mean 100% packet loss -> WAN down...
kind regards
chemlud

Quote from: meyergru on August 30, 2024, 09:46:54 AM
But the OP talks about IPv4, not IPv6?

And for the record: I do not see IPv4 pings unanswered from any 24.7.x - my monitoring would have alarmed me if it were otherwise. I would guess a bug like that would not have gone unnoticed and think this is a configuration problem.

+1

I run PingInfoView from NirSoft on IPv4 towards all 4 GWs with policy routing, so I'll know instantly if any VPN or ISP GW is down.

I'm 110% confident we haven't had a kernel-related issue with ICMP this year, and I've been on all the test kernels as well (apart from a couple that dealt with specific HW issues like the ixl or Sierra ones).


What can happen sometimes is that the IP you're pinging may stop responding, temporarily or permanently - so I'd try changing that first.
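
As a rough illustration of that reasoning, here is a small Python sketch that classifies the outcome of pinging several targets. `classify` and its return strings are made up for illustration, and the input is simply a mapping of target address to whether it answered.

```python
def classify(results):
    """Classify ping outcomes; results maps target address -> True if it answered."""
    if not results:
        return "no targets checked"
    answered = [target for target, ok in results.items() if ok]
    if len(answered) == len(results):
        return "all targets answer"
    if not answered:
        # Every target silent at once points at the local path, not at one host.
        return "no target answers: suspect the local path or WAN"
    # A mix of answers and silence points at the silent targets themselves.
    return "some targets silent: suspect those targets"
```

For example, `classify({"8.8.8.8": False, "1.1.1.1": True})` points at the silent target, while all targets failing at once points back at the firewall or the WAN link.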

No issues here with this version, and IPv4 / IPv6 is monitored constantly on 2 WAN lines.


Quote from: franco on August 30, 2024, 08:29:53 AM
Just to be sure: the 24.7 works for you in this regard?

Made the note here: https://github.com/opnsense/src/issues/218#issuecomment-2320210439

Might be worth inspecting your setup a bit closer. Do you use any explicit rules for that ping to pass? And it comes from where and goes to a public Internet server?


Cheers,
Franco

I never tested on the initial 24.7 release. I upgraded from 24.1.10 to the 24.7 series, and I waited until there was a .1 release before doing so.

I'm also not sure whether this was working on 24.1.10 before I upgraded, because I don't often send ICMP pings over IPv4 to the internet.

I added the specific rules so I could check the firewall logs and confirm the pings were being allowed. Before I started troubleshooting I had no specific ICMP rules configured, and the pings weren't working then either.


I think this problem may be related to NAT/port forwarding. I have port forwarding set up for SIP/RTP (on IPv4 only), and the last few 24.7.x updates and the hotfix stopped the phones from dialing out. A packet capture shows SIP INVITEs from LAN and WAN going out to the SIP provider's gateway, but no responses coming back. Eventually the phone times out with a "network failure". There is no problem (re)authenticating with the SIP gateway, and incoming calls work fine; only outgoing calls fail to receive packets.

Reverting to 24.7 restores the dialout functionality.

Quote from: doktornotor on August 30, 2024, 04:43:27 PM
Try with this kernel as well.

https://github.com/opnsense/src/issues/218#issuecomment-2321096627

Just tried it and there's no change.

I use Hybrid Outbound NAT on my deployment, so I'm not sure whether that differs from the setups of those who see no issue.

In this case it's probably a subtle change in the pf rules, and you need to find out why your ruleset no longer works as before. We've had two or three similar reports where rules stopped working as before but were fixed by changing rule parameters. There have been some changes in pf from 13 to 14, for better or worse; some may lack test cases on the FreeBSD end, or the previous behaviour was deemed buggy and changed.


Cheers,
Franco