OPNsense Forum

Archive => 20.7 Legacy Series => Topic started by: Dunky13 on September 07, 2020, 10:51:25 am

Title: Every N-hours no more routing
Post by: Dunky13 on September 07, 2020, 10:51:25 am
Hi there,
I'm new to this forum, so I hope to structure this post properly. If I make mistakes, or unintentionally omitted information, please let me know, I'll try to supply it as soon as possible!

I recently (September 3rd) upgraded from 20.1 to 20.7.2. All seems to be running fine, except that every, approximately, 26 hours I lose internet connection in my home network.

Now my setup is as follows:
Internet is Fiber 1GBs UP/DOWN, it goes through a fiber to ethernet converter supplied by the ISP.
The router is a i5-7200k nuc-like system with 6 ethernet ports. The core OS is Proxmox, so I can run Opnsense as a VM and take snapshots with every upgrade, just in case. I had bare-bone Opnsense crash on me :/
Out of the 6 ethernet ports 3 are plugged in as follows:
1: WAN
2: LAN
3: Control for Proxmox

1 & 2 are passed through to Opnsense.

I also run a Pi-Hole VM on the same machine for my DNS Ad-blocking.

Now to the issue:
Yesterday  my system wasn't passing internet along anymore. So when I tested in terminal with `ping google.com` and `ping 1.1.1.1` neither worked.
Then I SSH-ed into the Opnsense VM, and from terminal I pinged Google and 1.1.1.1 as well, and those worked.
So Opnsense still had internet, but everything behind it didn't anymore.

I went to System->Log Files->Backend and found at that time the following log:
Code: [Select]
2020-09-06T23:37:15 configd.py[72382] message d509d1d6-69e3-4137-b444-566e22c7de81 [filter.refresh_aliases] returned {"status": "ok"}
2020-09-06T23:37:15 configd.py[72382] [d509d1d6-69e3-4137-b444-566e22c7de81] refresh url table aliases
2020-09-06T23:37:15 configd.py[72382] OPNsense/Filter generated //usr/local/etc/filter_geoip.conf
2020-09-06T23:37:15 configd.py[72382] OPNsense/Filter generated //usr/local/etc/filter_tables.conf
2020-09-06T23:37:15 configd.py[72382] generate template container OPNsense/Filter
2020-09-06T23:37:15 configd.py[72382] [a742472d-72e1-497a-9db6-d56129efbbe9] generate template OPNsense/Filter
2020-09-06T23:35:00 configd.py[72382] [b73dd5bd-81e2-4946-94ff-d9ddadc64f87] updating dyndns
2020-09-06T23:30:00 configd.py[72382] [dbbf3c18-a059-475a-9da4-39cb50a03da0] updating dyndns
I have set DynDNS as the only cron job every 5 minutes. ( I know it's often, but can't hurt right?)
I can't find anything on the OPNsense filters and stuff, I have the feeling the firewall starts blocking everything on (V)LANs

Do you have any suggestion?

Thanks!
Dunky
Title: Re: Every N-hours no more routing
Post by: Fright on September 07, 2020, 05:01:01 pm
Hi!
what's in the firewall log?
are all daemons running?
what's in the General log?
Title: Re: Every N-hours no more routing
Post by: Dunky13 on September 09, 2020, 10:36:25 am
Hi Fright,

Sorry to reply so late.
The firewall log I don't know, as it gets filled to quickly and couldn't backtrace it.
The same goes for the general log.

And yes, all daemons were running, and my OPNsense box still had internet access, just everything behind it (which are all my devices) didn't. Really weird.

It happened again yesterday, but was out of the house, so I restarted the router (I can VPN into the machine as it still has internet), and all worked again.

If it happens today (which I expect) I'll try to grab all the logs. This started happening since the upgrade to 20.7.2

Thanks for your help!
Title: Re: Every N-hours no more routing
Post by: Fright on September 09, 2020, 11:14:04 am
ok.
since we want to understand what is blocking traffic, check that logging is enabled in blocking fw rules
Title: Re: Every N-hours no more routing
Post by: Dunky13 on September 09, 2020, 05:49:20 pm
This is the general log, right at 17.20:17 hell broke loose again... I was in the middle of a meeting when I got kicket out, and thus was able to press reboot fairly quickly.

I redacted some information but the gist is:
WAN comes in over fiber with VLAN300 with IP y.y.y.y
We have had some outages so have a 4g failover (Netgear thing) in a separate LAN port with IP x.x.x.x

I set up a cron job for dyndns to update both IPs mobile.myhome.net to x.x.x.x and {www,vpn}.myhome.net to y.y.y.y

Not sure why, and from where the resolv.conf is updated. Maybe because I have "Allow DNS server list to be overridden by DHCP/PPP on WAN" enabled?

General log:
Code: [Select]
2020-09-09T17:20:25 shutdown[80691] reboot by root:
2020-09-09T17:20:23 mdns-repeater[73461] exit.
2020-09-09T17:20:23 mdns-repeater[73461] shutting down...
2020-09-09T17:20:23 kernel wg0: link state changed to DOWN
2020-09-09T17:20:23 kernel wg0: deletion failed: 3
2020-09-09T17:20:17 dhclient[43930] bound to y.y.y.y -- renewal in 450 seconds.
2020-09-09T17:20:17 dhclient[35087] Creating resolv.conf
2020-09-09T17:20:17 dhclient[43930] DHCPACK from y.y.y.1
2020-09-09T17:20:17 dhclient[43930] DHCPREQUEST on vtnet0_vlan300 to y.y.y.1 port 67
2020-09-09T17:20:12 opnsense[40947] /usr/local/etc/rc.dyndns: Dynamic DNS (mobile.myhome.net via Cloudflare): _update() starting.
2020-09-09T17:20:12 opnsense[40947] /usr/local/etc/rc.dyndns: Dynamic DNS (mobile.myhome.net): running dyndns_failover_interface for opt8. found vtnet3
2020-09-09T17:20:12 opnsense[40947] /usr/local/etc/rc.dyndns: Dynamic DNS (mobile.myhome.net): x.x.x.x extracted
2020-09-09T17:20:12 opnsense[40947] /usr/local/etc/rc.dyndns: Dynamic DNS: updatedns() starting
2020-09-09T17:20:11 opnsense[40947] /usr/local/etc/rc.dyndns: Dynamic DNS: (Success) vpn.myhome.net updated to y.y.y.y
2020-09-09T17:20:11 opnsense[40947] /usr/local/etc/rc.dyndns: Dynamic DNS: updating cache file /var/cache/dyndns_wan_vpn.myhome.net_2.cache: y.y.y.y
2020-09-09T17:20:07 opnsense[40947] /usr/local/etc/rc.dyndns: Dynamic DNS: (Success) www.myhome.net updated to y.y.y.y
2020-09-09T17:20:07 opnsense[40947] /usr/local/etc/rc.dyndns: Dynamic DNS: updating cache file /var/cache/dyndns_wan_www.myhome.net_1.cache: y.y.y.y
2020-09-09T17:20:03 opnsense[40947] /usr/local/etc/rc.dyndns: Dynamic DNS: (Success) myhome.net updated to y.y.y.y
2020-09-09T17:20:03 opnsense[40947] /usr/local/etc/rc.dyndns: Dynamic DNS: updating cache file /var/cache/dyndns_wan_myhome.net_0.cache: y.y.y.y
2020-09-09T17:19:48 opnsense[32851] /status_services.php: The command '/usr/local/sbin/ntpd -g -c '/var/etc/ntpd.conf' -p '/var/run/ntpd.pid'' returned exit code '1', the output was ''
2020-09-09T17:19:13 dhclient[61237] bound to x.x.x.x -- renewal in 54000 seconds.
2020-09-09T17:19:13 opnsense[1205] plugins_configure newwanip (execute task : webgui_configure_do(,opt8))
2020-09-09T17:19:13 opnsense[1205] plugins_configure newwanip (execute task : vxlan_configure_interface())
2020-09-09T17:19:13 opnsense[1205] plugins_configure newwanip (execute task : unbound_configure_do(,opt8))
2020-09-09T17:19:13 opnsense[1205] plugins_configure newwanip (execute task : openssh_configure_do(,opt8))
2020-09-09T17:19:13 opnsense[1205] plugins_configure newwanip (execute task : opendns_configure_do())
2020-09-09T17:19:13 opnsense[1205] plugins_configure newwanip (execute task : ntpd_configure_defer())
2020-09-09T17:19:12 opnsense[1205] /usr/local/etc/rc.newwanip: Curl error occurred: Operation timed out after 15008 milliseconds with 0 out of 0 bytes received
2020-09-09T17:19:12 opnsense[1205] /usr/local/etc/rc.newwanip: Dynamic DNS (mobile.myhome.net): Current Service: cloudflare
2020-09-09T17:19:12 opnsense[1205] /usr/local/etc/rc.newwanip: Dynamic DNS (mobile.myhome.net): _checkStatus() starting.
2020-09-09T17:18:42 opnsense[1205] /usr/local/etc/rc.newwanip: Dynamic DNS (mobile.myhome.net via Cloudflare): _update() starting.
2020-09-09T17:18:42 opnsense[1205] /usr/local/etc/rc.newwanip: Dynamic DNS (mobile.myhome.net): running dyndns_failover_interface for opt8. found vtnet3
2020-09-09T17:18:42 opnsense[1205] /usr/local/etc/rc.newwanip: Dynamic DNS (mobile.myhome.net): x.x.x.x extracted
2020-09-09T17:18:42 opnsense[1205] /usr/local/etc/rc.newwanip: Dynamic DNS: updatedns() starting
2020-09-09T17:18:42 opnsense[1205] plugins_configure newwanip (execute task : dyndns_configure_do(,opt8))
2020-09-09T17:18:42 opnsense[1205] plugins_configure newwanip (,opt8)
2020-09-09T17:18:42 opnsense[1205] /usr/local/etc/rc.newwanip: Resyncing OpenVPN instances for interface MOBILE.
2020-09-09T17:18:42 opnsense[1205] plugins_configure vpn (execute task : openvpn_configure_do(,opt8))
2020-09-09T17:18:42 opnsense[1205] plugins_configure vpn (execute task : ipsec_configure_do(,opt8))
2020-09-09T17:18:42 opnsense[1205] plugins_configure vpn (,opt8)
2020-09-09T17:18:42 kernel pflog0: promiscuous mode enabled
2020-09-09T17:18:42 kernel pflog0: promiscuous mode disabled
Title: Re: Every N-hours no more routing
Post by: Fright on September 10, 2020, 10:16:01 am
hmm. looks like that connection lost before "right at 17.20: 17"
right after your 4G failover LAN change IP and firewal been reloaded:
"Curl error occurred: Operation timed out after 15008 milliseconds with 0 out of 0 bytes received"
have you set "Dynamic state reset" to "on"?
have you tried to wait some time for connection reestabilsh?
--

Quote
Not sure why, and from where the resolv.conf is updated.
its just part of dhclient-script
it write this to log before ALLOWOVERRIDE check
Title: Re: Every N-hours no more routing
Post by: Dunky13 on September 10, 2020, 09:12:46 pm
I'm not sure where to find the Dynamic State reset?

This started happening after the 20.7.2 update.
It seems to happen whenever my 4g gets a new public IP allocated.

Any clue what I should/could do?
Title: Re: Every N-hours no more routing
Post by: Dunky13 on September 10, 2020, 09:17:06 pm
Code: [Select]
2020-09-10T20:09:23 opnsense[72519] plugins_configure newwanip (execute task : webgui_configure_do(,opt8))
2020-09-10T20:09:23 opnsense[72519] plugins_configure newwanip (execute task : vxlan_configure_interface())
2020-09-10T20:09:23 opnsense[72519] plugins_configure newwanip (execute task : unbound_configure_do(,opt8))
2020-09-10T20:09:23 opnsense[72519] plugins_configure newwanip (execute task : openssh_configure_do(,opt8))
2020-09-10T20:09:23 opnsense[72519] plugins_configure newwanip (execute task : opendns_configure_do())
2020-09-10T20:09:23 opnsense[72519] plugins_configure newwanip (execute task : ntpd_configure_defer())
2020-09-10T20:09:22 opnsense[6980] /usr/local/etc/rc.dyndns: Dynamic DNS (mobile.myhome.net via Cloudflare): _update() starting.
2020-09-10T20:09:22 opnsense[6980] /usr/local/etc/rc.dyndns: Dynamic DNS (mobile.myhome.net): running dyndns_failover_interface for opt8. found vtnet3
2020-09-10T20:09:22 opnsense[6980] /usr/local/etc/rc.dyndns: Dynamic DNS (mobile.myhome.net): new.ip.4g.failover extracted
2020-09-10T20:09:22 opnsense[72519] /usr/local/etc/rc.newwanip: Curl error occurred: Operation timed out after 15028 milliseconds with 0 out of 0 bytes received
2020-09-10T20:09:22 opnsense[72519] /usr/local/etc/rc.newwanip: Dynamic DNS (mobile.myhome.net): Current Service: cloudflare
2020-09-10T20:09:22 opnsense[72519] /usr/local/etc/rc.newwanip: Dynamic DNS (mobile.myhome.net): _checkStatus() starting.
2020-09-10T20:08:54 opnsense[6980] /usr/local/etc/rc.dyndns: Dynamic DNS: updatedns() starting
2020-09-10T20:08:54 kernel pflog0: promiscuous mode enabled
2020-09-10T20:08:54 kernel pflog0: promiscuous mode disabled
2020-09-10T20:08:52 opnsense[72519] /usr/local/etc/rc.newwanip: Dynamic DNS (mobile.myhome.net via Cloudflare): _update() starting.
2020-09-10T20:08:52 opnsense[72519] /usr/local/etc/rc.newwanip: Dynamic DNS (mobile.myhome.net): running dyndns_failover_interface for opt8. found vtnet3
2020-09-10T20:08:52 opnsense[72519] /usr/local/etc/rc.newwanip: Dynamic DNS (mobile.myhome.net): new.ip.4g.failover extracted
2020-09-10T20:08:52 opnsense[72519] /usr/local/etc/rc.newwanip: Dynamic DNS: updatedns() starting
2020-09-10T20:08:52 opnsense[72519] plugins_configure newwanip (execute task : dyndns_configure_do(,opt8))
2020-09-10T20:08:52 opnsense[72519] plugins_configure newwanip (,opt8)
2020-09-10T20:08:52 opnsense[72519] /usr/local/etc/rc.newwanip: Resyncing OpenVPN instances for interface MOBILE.
2020-09-10T20:08:52 opnsense[72519] plugins_configure vpn (execute task : openvpn_configure_do(,opt8))
2020-09-10T20:08:52 opnsense[72519] plugins_configure vpn (execute task : ipsec_configure_do(,opt8))
2020-09-10T20:08:52 kernel pflog0: promiscuous mode enabled
2020-09-10T20:08:52 kernel pflog0: promiscuous mode disabled
2020-09-10T20:08:52 opnsense[72519] plugins_configure vpn (,opt8)
2020-09-10T20:08:52 opnsense[72519] /usr/local/etc/rc.newwanip: Adding static route for monitor 1.0.0.1 via new.ip.4g.1
2020-09-10T20:08:51 opnsense[72519] /usr/local/etc/rc.newwanip: Removing static route for monitor 1.0.0.1 via new.ip.4g.1
2020-09-10T20:08:51 opnsense[72519] /usr/local/etc/rc.newwanip: The WAN_DHCP6 monitor address is empty, skipping.
2020-09-10T20:08:51 opnsense[72519] /usr/local/etc/rc.newwanip: Adding static route for monitor 1.1.1.1 via normal.ip.wan.1
2020-09-10T20:08:51 opnsense[72519] /usr/local/etc/rc.newwanip: Removing static route for monitor 1.1.1.1 via normal.ip.wan.1
2020-09-10T20:08:51 opnsense[72519] plugins_configure monitor (execute task : dpinger_configure_do())
2020-09-10T20:08:51 opnsense[72519] plugins_configure monitor ()
2020-09-10T20:08:51 opnsense[72519] /usr/local/etc/rc.newwanip: ROUTING: skipping IPv6 default route
2020-09-10T20:08:51 opnsense[72519] /usr/local/etc/rc.newwanip: ROUTING: IPv6 default gateway set to wan
2020-09-10T20:08:51 opnsense[72519] /usr/local/etc/rc.newwanip: ROUTING: skipping IPv4 default route
2020-09-10T20:08:51 opnsense[72519] /usr/local/etc/rc.newwanip: ROUTING: IPv4 default gateway set to wan
2020-09-10T20:08:51 opnsense[72519] /usr/local/etc/rc.newwanip: ROUTING: entering configure using 'opt8'
2020-09-10T20:08:51 opnsense[72519] plugins_configure hosts (execute task : unbound_hosts_generate())
2020-09-10T20:08:51 opnsense[72519] plugins_configure hosts (execute task : dnsmasq_hosts_generate())
2020-09-10T20:08:51 opnsense[72519] plugins_configure hosts ()
2020-09-10T20:08:51 opnsense[72519] /usr/local/etc/rc.newwanip: On (IP address: new.ip.4g.failover) (interface: MOBILE[opt8]) (real interface: vtnet3).
2020-09-10T20:08:51 opnsense[72519] /usr/local/etc/rc.newwanip: IPv4 renewal is starting on 'vtnet3'
2020-09-10T20:08:51 dhclient[38635] Creating resolv.conf
2020-09-10T20:08:51 dhclient[5808] New Static Routes (vtnet3): 192.168.5.1 new.ip.4g.1
2020-09-10T20:08:51 dhclient[86343] route add default new.ip.4g.1
2020-09-10T20:08:51 dhclient[81154] New Routers (vtnet3): new.ip.4g.1
2020-09-10T20:08:51 dhclient[76408] New Broadcast Address (vtnet3): 95.255.255.255
2020-09-10T20:08:51 dhclient[58347] New Subnet Mask (vtnet3): 255.255.255.0
2020-09-10T20:08:51 dhclient[42879] New IP Address (vtnet3): new.ip.4g.failover
2020-09-10T20:08:51 dhclient[87628] Removing states from old IP 'old.ip.4g.failover' (new IP 'new.ip.4g.failover')
2020-09-10T20:08:51 dhclient[69498] Comparing IPs: Old: old.ip.4g.failover New: new.ip.4g.failover
2020-09-10T20:08:51 dhclient[50422] Starting delete_old_states()
2020-09-10T20:08:51 dhclient[49360] DHCPACK from new.ip.4g.1
2020-09-10T20:08:51 dhclient[49360] DHCPREQUEST on vtnet3 to 255.255.255.255 port 67
2020-09-10T20:08:49 dhclient[49360] DHCPOFFER from new.ip.4g.1
2020-09-10T20:08:49 dhclient[49360] DHCPOFFER from new.ip.4g.1
2020-09-10T20:08:48 dhclient[49360] DHCPDISCOVER on vtnet3 to 255.255.255.255 port 67 interval 4
2020-09-10T20:08:46 dhclient[49360] DHCPDISCOVER on vtnet3 to 255.255.255.255 port 67 interval 2
2020-09-10T20:08:46 dhclient[49360] DHCPNAK from new.ip.4g.1
2020-09-10T20:08:46 dhclient[49360] DHCPREQUEST on vtnet3 to 255.255.255.255 port 67

Todays log.

I have 2 gateways setup. WAN (fiber internet) with priority 1, WAN2 (4g failover) with priority 255. Both are enabled
Title: Re: Every N-hours no more routing
Post by: Supermule on September 10, 2020, 09:36:55 pm
Hi.

Pfsense has this option under system -> advanced -> networking

Reset all states if WAN IP changes. It should solve your problem.

I dont have a OPNsense mackine at hand here to see if the setting is there as well
Title: Re: Every N-hours no more routing
Post by: Dunky13 on September 10, 2020, 09:55:48 pm
I found this option (after going through https://github.com/opnsense/core/commit/c5b9f672ab9914977a3a0c50a9b63145f77bcb7d (https://github.com/opnsense/core/commit/c5b9f672ab9914977a3a0c50a9b63145f77bcb7d)) under:

Code: [Select]
Firewall -> Settings -> Advanced -> Dynamic state reset
I'll test this feature, I think within the next 28 hours, I'll know whether or not it gets killed again, or not. :fingerscrossed:
Title: Re: Every N-hours no more routing
Post by: Fright on September 11, 2020, 12:32:30 pm
and please try to not reboot opnsense right away  ;)
look at route table, gateway status, fw logs etc..
it would be good to understand what exactly stops working as expected after ip change
Title: Re: Every N-hours no more routing
Post by: sfu420 on September 12, 2020, 09:51:00 pm
This is my first post, so Hi Everyone :)

I'm using OPNsense since 2 years now without any issue, but after uprading from 20.1.9 to 20.7.2 i have exactly the same issue. Unfortunately i don't have a monitor to plug in to see what happens and the UI is unreachable. Everytime it occurs i need to push reset on the device.
I'm running it bare metal on a Zotac CI323 Nano with Intel Celeron N3150.

My IP address is not fix IP, but it's changing only 1 time a year.

I will test out this "Dynamic state reset" idea, as well. Will see...

Code: [Select]
2020-09-12T18:38:24   kernel   FreeBSD is a registered trademark of The FreeBSD Foundation.
2020-09-12T18:38:24   kernel   The Regents of the University of California. All rights reserved.
2020-09-12T18:38:24   kernel   Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
2020-09-12T18:38:24   kernel   Copyright (c) 1992-2019 The FreeBSD Project.
2020-09-12T18:38:24   kernel   Copyright (c) 2013-2019 The HardenedBSD Project.
2020-09-12T18:38:24   kernel   ---<<BOOT>>---
2020-09-12T18:38:24   syslogd   kernel boot file is /boot/kernel/kernel
2020-09-12T17:59:10   dhclient[43863]   bound to x.x.x.22 -- renewal in 43200 seconds.
2020-09-12T17:59:10   dhclient[99656]   Creating resolv.conf
2020-09-12T17:59:10   dhclient[43863]   DHCPACK from x.x.x.254
2020-09-12T17:59:10   dhclient[43863]   DHCPREQUEST on re1 to x.x.x.254 port 67
2020-09-12T05:59:10   dhclient[43863]   bound to x.x.x.22 -- renewal in 43200 seconds.
2020-09-12T05:59:10   dhclient[76364]   Creating resolv.conf
2020-09-12T05:59:10   dhclient[43863]   DHCPACK from x.x.x.254
2020-09-12T05:59:10   dhclient[43863]   DHCPREQUEST on re1 to x.x.x.254 port 67
2020-09-11T17:59:11   dhclient[43863]   bound to x.x.x.22 -- renewal in 43199 seconds.
2020-09-11T17:59:11   dhclient[88042]   Creating resolv.conf
2020-09-11T17:59:11   dhclient[43863]   DHCPACK from x.x.x.254
2020-09-11T17:59:11   dhclient[43863]   DHCPREQUEST on re1 to x.x.x.254 port 67
2020-09-11T17:59:09   dhclient[43863]   DHCPREQUEST on re1 to x.x.x.254 port 67
Title: Re: Every N-hours no more routing
Post by: Dunky13 on September 12, 2020, 09:56:34 pm
So far, the dynamic state reset, seems to have helped, and I haven't had any outages. But I'll keep you updated to see if that stays for longer periods.
Title: Re: Every N-hours no more routing
Post by: sfu420 on September 13, 2020, 11:04:46 am
Unfortunately it didn't help. It just happened again :(

Any idea?
Title: Re: Every N-hours no more routing
Post by: Fright on September 13, 2020, 05:22:19 pm
@sfu420
i am not sure if your problem is the same.
your dhcp client renews lease every 12hr. and in logs ip not changed, just renewed lease.
and you say that opnsense stops responding completely
I think you'd better open a new topic