OPNsense Forum

Archive => 23.7 Legacy Series => Topic started by: lar.hed on November 04, 2023, 09:56:57 AM

Title: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 04, 2023, 09:56:57 AM
So after my Unbound DNS issue/challenge, I have stumbled onto something completely different:

After a reboot and everything working last night, I started my PC (directly connected to the firewall hardware - I am running OPNsense on bare metal here) and yet again had NO connection to the outside. I can log into my OPNsense web front end (IP address) and Home Assistant (IP address) - but there is NO connection beyond the firewall. Since I can access things on my intranet I know some communication is working - but nothing on the outside responds, not even 1.1.1.1.

I know about the default gateway challenge some people have - I have not touched that. The reason is: I have not changed anything on my dual WAN (fiber and an LTE connection over a Netgear M5 mobile router that is connected through an Ethernet cable), and this setup has been working very well for at least a year since I set it up. I cannot find any reason why the default gateway should be a problem. Also note that ALL other connections actually DO WORK. Only this PC connection fails to reach the outside, and this is after link down late last evening and link up now. Here is the relevant part of the log (newest entries first):
2023-11-04T09:17:54 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
2023-11-04T09:17:54 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : dnsmasq_configure_do())
2023-11-04T09:17:54 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns ()
2023-11-04T09:17:53 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (execute task : dhcpd_dhcp_configure())
2023-11-04T09:17:53 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp ()
2023-11-04T09:17:53 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure ipsec (execute task : ipsec_configure_do(,opt2))
2023-11-04T09:17:53 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure ipsec (,opt2)
2023-11-04T09:17:53 Notice opnsense /usr/local/etc/rc.linkup: ROUTING: entering configure using 'opt2'
2023-11-04T09:17:53 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for opt2(igb2)
2023-11-04T09:17:53 Notice kernel <6>igb2: link state changed to UP


2023-11-03T21:08:46 Notice kernel <6>igb2: link state changed to DOWN
2023-11-03T21:08:46 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for opt2(igb2)


igb2 is the port for the PC that I am writing this on. The only way I have found to get my PC and that port (igb2) working again is to reboot the OPNsense box - something I never had to do before the upgrade.

So the workaround seems to be: do not turn the PC off - that way the link is up all the time, but that is just a band-aid on the real problem....

I would love to know why a link up does not work after this upgrade - what is missing on the link up that the reboot fixes? Any ideas?
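
The two kernel lines in the log above are the ones that bracket the outage. A minimal sketch for pulling just those events out of a saved copy of the log, assuming the lines have been copied into a plain text file (the function name and the file path in the usage note are hypothetical, not part of OPNsense):

```shell
# Print link state changes for one interface, oldest first.
# $1 = interface name (e.g. igb2)
# $2 = text file holding log lines in the format shown above
# Both arguments are assumptions made for this illustration.
link_events() {
    grep -F "$1: link state changed to" "$2" | sort
}
```

Called as `link_events igb2 /root/general.log`, this would list the DOWN and UP events for igb2 in chronological order. Plain `sort` is enough here because every line starts with an ISO 8601 timestamp.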
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 04, 2023, 10:29:52 AM
I am also a bit confused about why Unbound DNS is being configured after link up (first line in the log above) - Unbound DNS is disabled (it was enabled once after the upgrade, so something may still think it is enabled, but it is not) ???
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: adk20 on November 06, 2023, 08:07:44 PM
Hi forum,

I can completely confirm this issue.

After switching on a PC that is directly connected to one of the OPN ports, it takes approx. 10 minutes for the external network connection to become available. Connection to the OPN interface works, as does e.g. DNS. So it's not an issue on the "pc side" of the network.

All other physical OPN ports are not affected and continue to function as normal.

The only log entries that appear around the time the network starts to work are those:

SYSTEM/LOG/GENERAL
Notice   root   reload filter for configured schedules   
Notice   kernel   <6>igb1: promiscuous mode disabled   
Notice   kernel   <6>igb1: promiscuous mode enabled

This has only started after upgrading to 23.7.7_3.

Any ideas are much appreciated.
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: franco on November 06, 2023, 08:14:20 PM
Could it be this one? https://github.com/opnsense/core/commit/b0830803

# opnsense-patch b0830803


Cheers,
Franco
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: adk20 on November 06, 2023, 08:20:41 PM
Thanks, franco, for the hint. I will check at my earliest convenience.
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 07, 2023, 10:04:45 AM
franco, I cannot say, since my port never comes back up for internet access on its own. To get the port working I have to do one of two things:
1) Reboot OPNsense
2) Disable/Enable interface

Of course I nowadays prefer option 2 - it is by far the best option...

I am still confused, though, about why Unbound seems to be alive even though I have disabled it. And I get some really confusing log messages about Unbound... How do I stop Unbound from running since it is disabled (note that it was enabled earlier, but since Unbound does not seem to work after the latest patch I use Dnsmasq instead)? Remove it from the config XML file and hope for the best?
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: cookiemonster on November 07, 2023, 10:50:44 AM
At a guess, lar.hed (and franco will surely correct me), unbound_configure_do() is just a task in the plugins_configure dns () function. It doesn't mean it starts up Unbound.
If you give it time to start up all services, do you see it running? sudo ps -aux | grep -i unbound should do it
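
One caveat with the ps | grep approach: the grep process carries the word "unbound" in its own arguments, so it can show up as a match. A small sketch that compares exact command names instead, assuming a ps that accepts `-axo comm=` (FreeBSD and most Linux systems do); the function name is hypothetical:

```shell
# Succeeds if a process whose command name is exactly $1 exists.
# "comm=" prints only the command name, without a header line.
# grep -x matches whole lines, -F treats the name as a fixed string.
is_running() {
    ps -axo comm= | grep -xF "$1" >/dev/null
}
```

For example, `is_running unbound && echo "unbound is running" || echo "unbound is not running"`.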
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 07, 2023, 11:22:44 AM
Well, no, I do not see Unbound running under services (I show all services in the Lobby overview). I also SSHed into my OPNsense installation and ran your suggested ps command:

ps -aux | grep -i unbound
root   17569   0.0  0.0   12720   2264  0  S+   11:17       0:00.00 grep -i unbound


So no, Unbound is not running, and this was after 3 days of uptime - as mentioned above, I now restart the interface by disabling/enabling it...

So I think you are (of course) correct: it has to be part of some sort of startup sequence for Unbound and everything related to it (for example, since I had Unbound enabled earlier and used block lists, I guess it might download all those at any restart or so).

Thanks for your help!
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: franco on November 07, 2023, 12:12:58 PM
I'd appreciate trying the patch and see if it works. It's 100% harmless.


Cheers,
Franco
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: franco on November 07, 2023, 12:13:24 PM
Quote from: cookiemonster on November 07, 2023, 10:50:44 AM
At a guess, lar.hed (and franco will surely correct me), unbound_configure_do() is just a task in the plugins_configure dns () function. It doesn't mean it starts up Unbound.

Correct!  8)
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 07, 2023, 12:30:46 PM
Quote from: franco on November 06, 2023, 08:14:20 PM
Could it be this one? https://github.com/opnsense/core/commit/b0830803

# opnsense-patch b0830803


Cheers,
Franco

Just did a fast & fugly test of this (I edited the file in question, since 2 rows of "filter_configure(false, false);" were way too easy to enter...).

So I say, with a bit of a reservation, that this solved my problem.
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: franco on November 07, 2023, 01:06:17 PM
Easier than running "opnsense-patch"? ;)

So ok, I can bring that back but it's a bit odd, because that wasn't the purpose of why the lines were there.


Cheers,
Franco
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 07, 2023, 01:08:10 PM
Sorry, but when you link me to a git page where the code is, and since I used to do a lot of software development, I got curious.... So yes, it was easier for me. Maybe not for everyone else.
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: franco on November 07, 2023, 01:09:18 PM
As a policy we post the link and opnsense-patch because if we don't the commit hash could be anything. This way people can double-check that they actually want the patch.

(not an issue, only want to explain)


Cheers,
Franco
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 07, 2023, 01:11:09 PM
Next time I will behave :o
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: franco on November 07, 2023, 01:14:33 PM
Nah, all good.

So we talked about this just now and would like to know what changes in /tmp/rules.debug when this happens... It sounds like something is going on in the file between having the filter reload lines (as it was on < 23.7.7) and how it is now (since 23.7.7).

Could you make a copy in both cases and diff them?


Thanks,
Franco
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: franco on November 07, 2023, 01:17:47 PM
Maybe I should describe the process:

Comment out filter_configure(false, false); lines. Provoke error case.

Copy /tmp/rules.debug to e.g. /root/rules.bad

run

# /usr/local/etc/rc.filter_configure

(problem should be fixed)

Copy /tmp/rules.debug to e.g. /root/rules.good

And then let us know what this returns:

# diff -u /root/rules.bad /root/rules.good
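
The steps above can be sketched as one small shell function. This is only an illustration of the procedure as described, assuming it is run as root on the firewall while the error case is active; the function and variable names are made up for the example, while the default paths are the ones given in the steps.

```shell
# Paths from the steps above; they can be overridden to try this elsewhere.
RULES="${RULES:-/tmp/rules.debug}"
RELOAD="${RELOAD:-/usr/local/etc/rc.filter_configure}"
OUT="${OUT:-/root}"

capture_and_diff() {
    cp "$RULES" "$OUT/rules.bad" || return 2    # ruleset while broken
    "$RELOAD" || return 2                       # reload the filter
    cp "$RULES" "$OUT/rules.good" || return 2   # ruleset after the reload
    # diff exits 0 if the files are identical and 1 if they differ
    diff -u "$OUT/rules.bad" "$OUT/rules.good"
}
```

Running `capture_and_diff` prints the same unified diff as the last command in the steps and leaves both copies in place for later inspection.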
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 07, 2023, 01:32:40 PM
And here is the result:

I have removed the result, since I seem to have had a WAN down in the middle of it all. Yikes.
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 07, 2023, 01:34:19 PM
NO don't trust that - this is my WAN-LTE failover working - in the middle of all, my WAN connection was dropped. I like this..... NOT!
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 07, 2023, 01:51:19 PM
Okay, now I have kind of an inverted problem: I cannot recreate the problem.

And to be very clear: The filter lines are commented, so they are not executed, and YES I have rebooted my OPNsense Bare metal firewall hardware. And now it works, and yes, WAN is back up. ???
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 07, 2023, 02:51:39 PM
Nope, I have no way of triggering the problem anymore. :'(

I partly like to have this problem gone, but I also like to know why/what and so on. So even if I am partly okay with everything is back to normal, I would very much like to know what and why.

What I have tried is reboot, cold restart, all cables out, and some more. There is nothing I can do to trigger this.
Except maybe reinstall everything from 23.7 and then upgrade, restore config - maybe that might re-trigger this. I might have to look into that, just need some more time....
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 08, 2023, 09:32:42 AM
Okay, so this morning OPNsense was back in order - meaning I had the same problem as before. I am also running the UNmodified file, with the lines completely removed, just as it was last night when I rebooted. So why does this extra time between link down and link up (some 12 hours or so) lead to no connection to the outside world on this particular directly connected PC (1.1.1.1 works, so raw IP traffic works perfectly)?

So I directly brought up my MobaXterm, logged into OPNsense and copied the file. Then I ran the suggested command "/usr/local/etc/rc.filter_configure" - and the Internet connection was restored. I then copied the file again, and here is the result - it looks a bit like the one before (no, there is no WAN or LTE down - all traffic goes over WAN):

diff -u /root/rules.bad /root/rules.good
--- /root/rules.bad     2023-11-08 09:14:25.069074000 +0100
+++ /root/rules.good    2023-11-08 09:15:13.266804000 +0100
@@ -68,6 +68,7 @@
no nat proto carp all
no rdr proto carp all
# [prio: 200]
+nat on igb7 inet from (igb2:network) to any port 500 -> (igb7:0) static-port # Automatic outbound rule
nat on igb7 inet from (vlan01:network) to any port 500 -> (igb7:0) static-port # Automatic outbound rule
nat on igb7 inet from (igb0:network) to any port 500 -> (igb7:0) static-port # Automatic outbound rule
nat on igb7 inet from (igb5:network) to any port 500 -> (igb7:0) static-port # Automatic outbound rule
@@ -76,6 +77,7 @@
nat on igb7 inet from (igb4:network) to any port 500 -> (igb7:0) static-port # Automatic outbound rule
nat on igb7 inet from (igb6:network) to any port 500 -> (igb7:0) static-port # Automatic outbound rule
nat on igb7 inet from 127.0.0.0/8 to any port 500 -> (igb7:0) static-port # Automatic outbound rule
+nat on igb7 inet from (igb2:network) to any -> (igb7:0) port 1024:65535 # Automatic outbound rule
nat on igb7 inet from (vlan01:network) to any -> (igb7:0) port 1024:65535 # Automatic outbound rule
nat on igb7 inet from (igb0:network) to any -> (igb7:0) port 1024:65535 # Automatic outbound rule
nat on igb7 inet from (igb5:network) to any -> (igb7:0) port 1024:65535 # Automatic outbound rule
@@ -84,6 +86,7 @@
nat on igb7 inet from (igb4:network) to any -> (igb7:0) port 1024:65535 # Automatic outbound rule
nat on igb7 inet from (igb6:network) to any -> (igb7:0) port 1024:65535 # Automatic outbound rule
nat on igb7 inet from 127.0.0.0/8 to any -> (igb7:0) port 1024:65535 # Automatic outbound rule
+nat on igb1 inet from (igb2:network) to any port 500 -> (igb1:0) static-port # Automatic outbound rule
nat on igb1 inet from (vlan01:network) to any port 500 -> (igb1:0) static-port # Automatic outbound rule
nat on igb1 inet from (igb0:network) to any port 500 -> (igb1:0) static-port # Automatic outbound rule
nat on igb1 inet from (igb5:network) to any port 500 -> (igb1:0) static-port # Automatic outbound rule
@@ -92,6 +95,7 @@
nat on igb1 inet from (igb4:network) to any port 500 -> (igb1:0) static-port # Automatic outbound rule
nat on igb1 inet from (igb6:network) to any port 500 -> (igb1:0) static-port # Automatic outbound rule
nat on igb1 inet from 127.0.0.0/8 to any port 500 -> (igb1:0) static-port # Automatic outbound rule
+nat on igb1 inet from (igb2:network) to any -> (igb1:0) port 1024:65535 # Automatic outbound rule
nat on igb1 inet from (vlan01:network) to any -> (igb1:0) port 1024:65535 # Automatic outbound rule
nat on igb1 inet from (igb0:network) to any -> (igb1:0) port 1024:65535 # Automatic outbound rule
nat on igb1 inet from (igb5:network) to any -> (igb1:0) port 1024:65535 # Automatic outbound rule


Some interface info:
igb1 = WAN (Primary)
igb7 = LTE (failover for WAN that is)

igb2 = PC that has this link-down/link-up problem

igb0 = Home Assistant server
igb5 = Laser printer with built in scanner
igb4 = Extra server interface, currently not connected at all
igb6 / vlan01 = Unifi AP, where vlan01 is IoT
igb3 = Media with things like Kef speakers, Chromecast and projector

I find some things in the above strange, for example all of the "Automatic outbound rule" lines. Why do they appear when the WAN link is stable? Note that the box has been rebooted after the WAN problem, and the WAN has been up since then...

Anyways, the thing to accept is that the command:
/usr/local/etc/rc.filter_configure

Solves my problem with link-down/<a large amount of time it seems>/link-up and no internet connection (which looks a lot like a DNS problem, but since all other interfaces have DNS resolution it is more likely to be something not DNS related - like the filter....)
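
To see at a glance which rules the reload brought back for one interface, the added lines of the diff can be filtered by that interface's network token. A rough sketch, assuming the two copies were saved as described earlier in the thread; the function name is hypothetical:

```shell
# Print lines that exist only in the "good" ruleset and that reference
# the network of interface $1.
# $1 = interface name, $2 = bad copy, $3 = good copy
added_rules_for() {
    # '^+[^+]' keeps added lines but skips the "+++" file header
    diff -u "$2" "$3" | grep '^+[^+]' | grep -F "($1:network)"
}
```

With the files from this post, `added_rules_for igb2 /root/rules.bad /root/rules.good` should list only the four added NAT lines for igb2 shown in the diff above.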
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 08, 2023, 09:34:52 AM
Oh and now I have behaved so I have also reapplied the patch (not edited the file) in a correct manner....
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: franco on November 08, 2023, 01:45:23 PM
Thanks for the debugging. Highly appreciated. igb2 is static IPv4, right?

> Oh and now I have behaved so I have also reapplied the patch (not edited the file) in a correct manner....

Hehe, that made me happy <3


Cheers,
Franco
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 08, 2023, 05:06:07 PM
Quote from: franco on November 08, 2023, 01:45:23 PM
Thanks for the debugging. Highly appreciated. igb2 is static IPv4, right?

igb2 is DHCP.

igb2 is actually my work PC (Microsoft Surface Book 2, connected over USB-C<->Thunderbolt to my Dell 4021Q screen, which has an Ethernet port connected to the igb2 interface). The igb2 interface has DHCP because from time to time I use a D-Link switch when I need more connections at my work desk. So it needs to be DHCP for those very limited and few occasions.
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 08, 2023, 05:08:37 PM
Quote from: franco on November 08, 2023, 01:45:23 PM
> Oh and now I have behaved so I have also reapplied the patch (not edited the file) in a correct manner....

Hehe, that made me happy <3

Just for the record: I just returned home, and the link has been down for at least 5 hours. No problem after reapplying the patch - it works like it always has.
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 09, 2023, 09:26:28 AM
For what it is worth: Still working after that patch.

I have also done a few more diffs on rules.debug - the one last night returned nothing, while the one this morning returned a lot more rows, but some of those lines are not interesting (state, block country and such). Let me know if anyone needs them, but I would say they do not bring anything new to the table.
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: franco on November 09, 2023, 09:35:00 AM
> igb2 is DHCP

Are you sure? In the interface settings IPv4 mode is set to "DHCP"? We were pondering over it but would make an educated guess that you mean it runs a DHCP server (which also requires a static IPv4 address) since you plug in clients...


Cheers,
Franco
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on November 09, 2023, 10:17:57 AM
Okay, if you phrase it like that I need to change my answer:

Yes, the interface is static (10.168.2.1/24 - Upstream GW = Auto detect) - however, the client connected to that port, aka "Surface Book 2 PC with Windows 10", is DHCP (10.168.2.20). So yes, the interface is static - I just assumed (assumption is the mother of all f*ckups and all that) you referenced my PC and not the interface port on my h/w running OPNsense. This is clearly my mistake, sorry for the confusion.
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: franco on November 09, 2023, 10:35:33 AM
Ok, thanks for clarifying. So we know what the problem is but the fix is really really tricky to pull off.

The idea is simple: leave the static addresses on the interface when rc.linkup pulls it down.

The reality is overly complex: this pertains to virtual IPs as well, CARPs are already an exception and multiple code points calling the offending interface_bring_down() either do too much or too little in the scope of what is happening. interface_bring_down() is a convoluted piece of code that does historic things for historic reasons but without a real plan of action. I'll try to unwind this in the coming days.

The good news is that this is an edge case that has nothing to do with why the filter reload was removed from the file and that decision stands. It actually would bring a lot more stability to the system if we manage to unwind interface_bring_down() behaviour and fix all callers.

But that also means when 23.7.8 hits today and you eventually install it you need to reapply the patch for now to keep it from breaking on your end.

So far I'm only aware of your report further indicating that we are going in the right direction. Thanks for all your help so far!


Cheers,
Franco
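
Since the NAT lines missing from rules.bad all reference (igb2:network), one way to observe the state described here is to check whether the interface still carries an IPv4 address while the link is down. A minimal sketch that reads ifconfig output on standard input; the function name is hypothetical, and the assumption is that IPv4 addresses appear on indented lines starting with the word "inet", as in FreeBSD's ifconfig output:

```shell
# Succeeds if the ifconfig output on stdin contains an IPv4 address.
# "inet6" lines do not match because of the space after "inet".
has_inet4() {
    grep -E '^[[:space:]]+inet ' >/dev/null
}
```

For example, `ifconfig igb2 | has_inet4 && echo "IPv4 address present" || echo "no IPv4 address"`.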
Title: Re: After upgrade to 23.7.7_3 - link down/up - and after that NO connection outside
Post by: lar.hed on January 16, 2024, 10:03:54 AM
Quote from: franco on November 06, 2023, 08:14:20 PM
Could it be this one? https://github.com/opnsense/core/commit/b0830803

# opnsense-patch b0830803


Cheers,
Franco

Franco, is there any plan for if/when this patch might be integrated into the installer files?
(I just re-installed OPNsense, moved from UFS to ZFS, and got this challenge back again).

Oh, and while I am at it: after the install from scratch, updating everything and restoring the config file, I did NOT apply the patch. The rc.linkup did NOT have the two filter_configure(false, false) rows. Since I did not know why the PC stopped working, I did the old trick: stop/start the interface = up and running again. So I have now re-applied the patch, which added the two lines of filter_configure(false, false). I have not yet tested whether this solves my challenge, so to speak; I will later today, I guess....
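
A quick way to check whether the two lines are present without opening the file, sketched under the assumption that a patched rc.linkup contains them in the form quoted earlier in the thread; the function name is hypothetical:

```shell
# Count the lines containing the filter reload call in file $1
# (default: the rc.linkup path seen in the logs at the top of the thread).
# Note that grep -c prints 0 but exits with status 1 when nothing matches.
count_filter_reloads() {
    grep -cF 'filter_configure(false, false)' "${1:-/usr/local/etc/rc.linkup}"
}
```

Going by the posts above, `count_filter_reloads` should print 2 with the patch applied and 0 on a fresh 23.7.7 or later install without it.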