"No route to host": need to reload WAN every ~24 hour

Started by teapot9, February 01, 2023, 09:42:31 PM

February 01, 2023, 09:42:31 PM Last Edit: February 01, 2023, 09:45:07 PM by teapot9
Configuration:

WAN is connected to the internet through VLAN 832 on igb2.
Addresses are obtained via DHCPv4 and DHCPv6 (with /56 prefix delegation).

WAN:

igb2_vlan832: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: WAN (wan)
        options=4000000<NOMAP>
        ether xx:xx:xx:xx:xx:xx
        inet6 fe80::3eec:efff:fe22:3ec4%igb2_vlan832 prefixlen 64 scopeid 0xa
        inet6 xxxx:xxxx:xxxx:xx00::1 prefixlen 64
        inet x.x.x.x netmask 0xfffffc00 broadcast x.x.x.x
        groups: vlan
        vlan: 832 vlanproto: 802.1q vlanpcp: 0 parent interface: igb2
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>


LAN has a /24 IPv4 subnet and a /64 IPv6 subnet (taken from the delegated prefix).

LAN:

lagg0_vlan50: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: ADMIN (lan)
        options=4000000<NOMAP>
        ether xx:xx:xx:xx:xx:xx
        inet 172.16.72.254 netmask 0xffffff00 broadcast 172.16.72.255
        inet6 fe80::8261:5fff:fe08:642%lagg0_vlan50 prefixlen 64 scopeid 0xe
        inet6 xxxx:xxxx:xxxx:xx48:xxxx:xxxx:xxxx:xxxx prefixlen 64
        groups: vlan ADMIN_GROUP
        vlan: 50 vlanproto: 802.1q vlanpcp: 0 parent interface: lagg0
        media: Ethernet autoselect
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>


I also have several interfaces besides LAN (DMZ, users), but they are configured very similarly to LAN.

Issue

After a little more than 24 hours of working connectivity, I lose all internet access.


  • Internal networking between subnets works fine
  • WAN interface still has its IPv4 and IPv6 addresses
  • Some WireGuard tunnels keep working, even though the clients are on the internet
  • Pinging 1.1.1.1 from the firewall returns "ping: sendto: No route to host"
  • Pinging from LAN returns a similar error
  • Console shows IPv6 errors: "cannot forward src fe80:x:x:x:x:x:x:x, dst [some internet IPv6], nxt 17, rcvif lagg0_vlan50, outif igb2_vlan832"
  • Web UI: clicking "Reload" for the WAN interface DHCP (in the "overview" tab) fixes the issue instantly for ~24 hours

A workaround would be to set up a cron job that reloads the interface every 24 hours, but it wouldn't fix the underlying issue.

The issue only started after I updated to 23.1.

Thank you

I had the same issue on one of two boxes running in HA, except that I'm IPv4-only. I ended up reverting to 22 to resolve it.

Two pieces of info to hopefully help the devs:
1. I run BGP between my OPNsense boxes and the LAN instead of using CARP. The box that stopped passing traffic to the WAN kept advertising the default route via BGP and still had 0.0.0.0/0 installed in the RIB as a zebra (K) route. Like teapot9, pinging an internet address from the OPNsense box returned "no route to host" even though the default route was in the RIB. Rebooting fixed the issue for a while; 24 hours sounds about right, but I didn't track it that closely.

2. I upgraded the day 23.1 came out and applied the follow-up updates as soon as they were available. I plan to wait a couple of weeks before upgrading again, since I wonder whether the problem lies in the original 23.1.0 upgrade path.

February 02, 2023, 08:36:26 PM #2 Last Edit: February 02, 2023, 08:42:43 PM by teapot9
The problem occurred again today, so I ran some experiments:


  • The firewall is still reachable from the internet: I can access my webserver through HAProxy
  • Restarting the "routing" service also solves the issue
  • Unbound still resolves DNS correctly (tested with random queries to bypass the cache; Unbound is configured with DNS over TLS)

For a short time after restarting the "routing" service, I could ping any IP except 8.8.8.8, which returned "ping: sendto: No buffer space available".

It occurred again; nothing new except:


  • IPv6 still works; only IPv4 is affected by the problem
  • I restarted the "routing" service pre-emptively 4 hours earlier, and the issue still occurred

Fixed after restarting "routing" service.
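If it happens again, it might help the developers to capture the kernel's routing state before reloading anything, since one report in this thread says the default route was still in the FRR RIB while ping failed with "No route to host". Below is a minimal sketch, not an OPNsense tool: the script name, output path and command list are my own assumptions. It uses standard FreeBSD commands: `netstat -rn -f inet` and `route -n get default` for the kernel's IPv4 routes, and `netstat -m` for mbuf usage, given the "No buffer space available" error.

```shell
#!/bin/sh
# Hypothetical helper (e.g. saved as /root/route-snapshot.sh): records the
# kernel routing state while the "No route to host" failure is happening.
# Usage: sh route-snapshot.sh /root/route-snapshot.txt

# capture FILE CMD...: runs each command string with sh -c and appends a
# "### CMD" header plus its output to FILE. A failing command is recorded
# as "(exit N)" instead of aborting the snapshot.
capture() {
    out="$1"
    shift
    for cmd in "$@"; do
        {
            echo "### $cmd"
            sh -c "$cmd" 2>&1 || echo "(exit $?)"
        } >>"$out"
    done
}

# Only take a snapshot when an output file is given, so the function above
# can be reused or tested without running the FreeBSD-specific commands.
if [ "$#" -gt 0 ]; then
    capture "$1" \
        "date" \
        "netstat -rn -f inet" \
        "route -n get default" \
        "netstat -m"
fi
```

Run it as root from a shell while the connection is down, before clicking "Reload" or restarting "routing", and attach the resulting file to the bug report.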

Hello,

I got the same issue after upgrading to 23.1 from 22.7.11.

I first did the upgrade on Feb. 1st but downgraded to 22.7 when I got this error.

I tried the upgrade again yesterday (Feb. 15th), got the same error, and downgraded to 22.7.11 again.

I noticed this post is about the same issue: https://forum.opnsense.org/index.php?topic=32454.msg156919#msg156919

I am running OPNsense on a Protectli FW2B with the Unbound blocklist enabled.

I guess I will try again in a month or so.

Regards
Lars


Problem still occurs as of v23.1.1.

I have configured a workaround: a cron job in the UI that runs "Periodic interface reset" with parameter "wan" every 12 hours.
This seems to prevent the issue from occurring.

I also have another issue that was already occurring before upgrading to v23.1: radvd stops sending router advertisements after some time. Restarting the radvd daemon fixes it. Same workaround: a cron job every 12 hours running the custom command "pluginctl -s radvd restart". I mention this because it could be related.

I also measured when the issue occurs: between 22 and 26 hours after a WAN reload or reboot, which is why I run the cron job every 12 hours.

I'm getting the same issue now. How are you doing the cron job? Is that defined in the WebUI or through cli?

Quote from: BurningSky on April 11, 2023, 12:13:54 PM
I'm getting the same issue now. How are you doing the cron job? Is that defined in the WebUI or through cli?

I have the following crontab in /usr/local/etc/cron.d/custom.cron:

# Reload WAN interface
30      */12    *       *       *       root    sleep 10; /usr/local/etc/rc.configure_interface wan


The sleep keeps the reload from interfering with Monit's ping monitoring.

If you don't need the sleep, you should be able to achieve the same with a cronjob in the web UI by selecting "Periodic interface reset".
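Instead of reloading WAN on a fixed schedule, a cron job could reload it only when connectivity is actually lost. The following is a minimal sketch under several assumptions of my own: the probe target (1.1.1.1, the address that fails in the first post), the ping count and timeout, and the script name are arbitrary choices; `/usr/local/etc/rc.configure_interface` is the same script used in the crontab above. Note that on FreeBSD `ping -t` sets an overall timeout in seconds (on Linux `-t` means TTL), and ping exits 0 only if at least one reply was received.

```shell
#!/bin/sh
# Hypothetical WAN watchdog (e.g. /root/wan-watchdog.sh): reloads an
# interface only if an upstream host stops answering pings.
# Usage: sh wan-watchdog.sh wan

# Assumed probe target; override with PROBE_HOST=... if needed.
PROBE_HOST="${PROBE_HOST:-1.1.1.1}"

# Succeeds if PROBE_HOST answers at least one of 3 pings.
# FreeBSD ping: -t 5 makes ping give up after 5 seconds overall.
wan_is_up() {
    ping -q -c 3 -t 5 "$PROBE_HOST" >/dev/null 2>&1
}

# reload_if_down CHECK CMD...: runs CHECK; if it fails, runs CMD with its
# arguments. Status messages are written to stdout.
reload_if_down() {
    check="$1"
    shift
    if "$check"; then
        echo "WAN OK"
    else
        echo "WAN down, reloading"
        "$@"
    fi
}

# Only act when an interface name is given, so the functions can be
# tested without touching the real interface.
if [ "$#" -gt 0 ]; then
    reload_if_down wan_is_up /usr/local/etc/rc.configure_interface "$1"
fi
```

It could be scheduled from the same custom.cron file, for example every 15 minutes (an arbitrary interval): `*/15 * * * * root /bin/sh /root/wan-watchdog.sh wan`. The trade-off of probing a single external host is that a brief outage on that host's side would also trigger a reload.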

I have a similar issue, but I couldn't find the WAN reload option the OP described.

I am also trying to downgrade, since this started when I updated from 23.1.3_4 to 23.1.10_1, but I can't figure out how. Can someone share how they downgraded their OPNsense?

Another way of "restarting" it that I found is to re-assign the interfaces and change the WAN from static to DHCP.