[SOLVED] Multi-WAN

Started by MenschAergereDichNicht, January 17, 2022, 04:49:34 PM

January 17, 2022, 04:49:34 PM Last Edit: January 20, 2022, 12:13:52 AM by MenschAergereDichNicht
Hi,

I am not sure whether my problem is specific to the release candidate or whether this is expected behaviour, as this is the first time I have tried a multi-WAN configuration (using RC1, obviously).

That said, I'll try to describe the situation.

I have a multi-WAN setup with one WAN interface connected via fibre. This WAN interface uses a static IPv4 address and DHCP for IPv6.
A second WAN interface is intended to work with a nano-router that connects via Wi-Fi to the LTE connection of my mobile phone. This interface gets a dynamic IPv4 address only.

If everything is connected, it seems to work fine. But if I power off the nano-router and WAN2 loses its Ethernet link to it, WAN2_GW becomes "defunct" and all references to WAN2_GW are removed, e.g. from the DNS server settings under "System-Settings-General", where I had specified the gateway for each entry.
I guess the gateway is re-activated when I power the nano-router back on. But what about the DNS gateway settings? I exported (backed up) the configuration and looked into the .xml. It seems that the entries are completely wiped out.

Do I have to make those changes (assign WAN2_GW to the DNS server entries) every time I activate the nano-router?

Update:

I tried to reproduce the behaviour, but I am currently not able to. Because I was trying lots of things, I am not sure about the exact workflow in which the problem occurred.
If it happens again, I will report back.

I think this might be related to https://forum.opnsense.org/index.php?topic=26341.0

Can you add a system: settings: tunable "net.route.multipath" with value "0"? Best to reboot to avoid the situation when you already have two default routes stuck in the system.
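A minimal sketch for checking after the reboot that the tunable took effect. It assumes the value is read with `sysctl net.route.multipath` on the firewall itself and that the output has the usual `name: value` form; the helper names are made up for illustration.

```python
import subprocess


def parse_sysctl_line(line):
    """Split a 'name: value' line as printed by sysctl into (name, value)."""
    name, _, value = line.partition(":")
    return name.strip(), value.strip()


def multipath_disabled(sysctl_output):
    """Return True if the output shows net.route.multipath set to 0."""
    for line in sysctl_output.splitlines():
        name, value = parse_sysctl_line(line)
        if name == "net.route.multipath":
            return value == "0"
    # Tunable not present in the output: treat as "not confirmed".
    return False


def read_multipath_setting():
    """Run sysctl on the local machine (only meaningful on the firewall)."""
    result = subprocess.run(
        ["sysctl", "net.route.multipath"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the firewall, `multipath_disabled(read_multipath_setting())` should return True once the tunable is active.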


Cheers,
Franco

Thanks for the feedback. I will try the tunable when I can reboot the router without getting my wife mad at me.

> without getting my wife mad at me

If you manage to find a way please do tell.  :)


Cheers,
Franco

Quote from: franco on January 17, 2022, 07:49:28 PM
> without getting my wife mad at me

If you manage to find a way please do tell.  :)


Cheers,
Franco

Reboot time 03:00 a.m. is frequently a good choice in this situation...
kind regards
chemlud
____
"The price of reliability is the pursuit of the utmost simplicity."
C.A.R. Hoare

Felix Eichhorn's premium cat food with the extra portion of energy

A router is not a switch - A router is not a switch - A router is not a switch - A rou....

Well... Actually I was able to reboot a little bit earlier than that :-)

Ok. I did the following:


  • Set the tunable mentioned above
  • Rebooted
  • Cleared the Unbound and System/General logs
  • Started a speedtest (bufferbloat test page)

After that the load went up, with Unbound at the top of the CPU usage.
So the tunable does not seem to help.

The Unbound log file is still empty afterwards, but the general system log contains the following entries (newest first):

2022-01-17T19:48:46 Error opnsense /usr/local/etc/rc.newwanipv6: Choose to bind WAN2_DHCP on 0.0.0.0 since we could not find a proper match.
2022-01-17T19:48:46 Error opnsense /usr/local/etc/rc.newwanipv6: Adding static route for monitor 1.1.1.1 via 192.168.69.1
2022-01-17T19:48:46 Error opnsense /usr/local/etc/rc.newwanipv6: Removing static route for monitor 1.1.1.1 via 192.168.69.1
2022-01-17T19:48:46 Error opnsense /usr/local/etc/rc.newwanipv6: Adding static route for monitor 2606:4700:4700::1111 via fe80::eadf:70ff:fe7a:23da%igb3
2022-01-17T19:48:46 Error opnsense /usr/local/etc/rc.newwanipv6: Removing static route for monitor 2606:4700:4700::1111 via fe80::eadf:70ff:fe7a:23da%igb3
2022-01-17T19:48:46 Error opnsense /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway 'fe80::eadf:70ff:fe7a:23da%igb3'
2022-01-17T19:48:46 Error opnsense /usr/local/etc/rc.newwanipv6: ROUTING: setting IPv6 default route to fe80::eadf:70ff:fe7a:23da
2022-01-17T19:48:46 Error opnsense /usr/local/etc/rc.newwanipv6: ROUTING: IPv6 default gateway set to wan
2022-01-17T19:48:46 Error opnsense /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway '192.168.69.1'
2022-01-17T19:48:46 Error opnsense /usr/local/etc/rc.newwanipv6: ROUTING: setting IPv4 default route to 192.168.69.1
2022-01-17T19:48:46 Error opnsense /usr/local/etc/rc.newwanipv6: ROUTING: IPv4 default gateway set to wan
2022-01-17T19:48:46 Error opnsense /usr/local/etc/rc.newwanipv6: ROUTING: entering configure using 'wan'
2022-01-17T19:48:46 Error opnsense /usr/local/etc/rc.newwanipv6: The command '/sbin/route add -host -'inet6' '2606:4700:4700::1111' 'fe80::eadf:70ff:fe7a:23da%'' returned exit code '71', the output was 'route: fe80::eadf:70ff:fe7a:23da%: Name does not resolve'
2022-01-17T19:47:59 Error opnsense /usr/local/etc/rc.newwanipv6: The command '/sbin/route add -host -'inet6' '2a05:fc84::42' 'fe80::eadf:70ff:fe7a:23da%'' returned exit code '71', the output was 'route: fe80::eadf:70ff:fe7a:23da%: Name does not resolve'
2022-01-17T19:47:59 Error opnsense /usr/local/etc/rc.newwanipv6: On (IP address: <IPv6-Address>) (interface: WAN[wan]) (real interface: igb3).
2022-01-17T19:47:59 Error opnsense /usr/local/etc/rc.newwanipv6: IPv6 renewal is starting on 'igb3'
2022-01-17T19:47:56 Error opnsense /usr/local/etc/rc.linkup: Warning! dhcpd_radvd_configure(auto) found no suitable IPv6 address on igb1_vlan13
2022-01-17T19:47:55 Error opnsense /usr/local/etc/rc.linkup: ROUTING: skipping IPv6 default route
2022-01-17T19:47:55 Error opnsense /usr/local/etc/rc.linkup: ROUTING: IPv6 default gateway set to wan
2022-01-17T19:47:55 Error opnsense /usr/local/etc/rc.linkup: ROUTING: creating /tmp/igb3_defaultgw using '192.168.69.1'
2022-01-17T19:47:55 Error opnsense /usr/local/etc/rc.linkup: ROUTING: creating /tmp/igb3_defaultgw using '192.168.69.1'
2022-01-17T19:47:55 Error opnsense /usr/local/etc/rc.linkup: ROUTING: removing /tmp/igb3_defaultgw
2022-01-17T19:47:55 Error opnsense /usr/local/etc/rc.linkup: ROUTING: setting IPv4 default route to 192.168.69.1
2022-01-17T19:47:55 Error opnsense /usr/local/etc/rc.linkup: ROUTING: IPv4 default gateway set to wan
2022-01-17T19:47:55 Error opnsense /usr/local/etc/rc.linkup: ROUTING: entering configure using 'wan'
2022-01-17T19:47:55 Error opnsense /usr/local/etc/rc.linkup: Accept router advertisements on interface igb3
2022-01-17T19:47:54 Error opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for dynamic wan(igb3)
2022-01-17T19:47:51 Error dhcp6c transmit failed: Network is down
2022-01-17T19:47:49 Error dhcp6c transmit failed: Network is down
2022-01-17T19:47:49 Error dhcp6c transmit failed: Network is down
2022-01-17T19:47:48 Error opnsense /usr/local/etc/rc.linkup: Clearing states for stale wan route on igb3
2022-01-17T19:47:48 Error dhcp6c transmit failed: Network is down
2022-01-17T19:47:48 Error dhcp6c transmit failed: Network is down
2022-01-17T19:47:48 Error opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for dynamic wan(igb3)
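One detail in the log above: the two failing `route add` commands pass the gateway as `fe80::eadf:70ff:fe7a:23da%` with nothing after the percent sign, while the lines that go through use `%igb3`. A minimal sketch of a check for that pattern follows; the function names are hypothetical and this is not how OPNsense validates gateways.

```python
import ipaddress


def split_scoped(addr):
    """Split 'address%zone' into (address, zone); zone is None without a '%'."""
    host, sep, zone = addr.partition("%")
    return host, (zone if sep else None)


def needs_zone(addr):
    """True for an IPv6 link-local address without a usable interface suffix."""
    host, zone = split_scoped(addr)
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP address at all, so the zone question does not apply.
        return False
    # An empty string after '%' counts as missing, just like no '%' at all.
    return ip.version == 6 and ip.is_link_local and not zone
```

A link-local gateway is only unambiguous together with its interface, which would explain why `route` tries to resolve the incomplete string as a host name and fails.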



Do you have Unbound set to listen on a specific interface? Do you have RSS turned on manually?

The tunable was mainly for the concern about multi-WAN not working, since FreeBSD 13 by default now lets you create multiple default routes in the same routing table, but obviously will only use the latest one.


Cheers,
Franco

> Choose to bind WAN2_DHCP on 0.0.0.0 since we could not find a proper match.

FWIW, this looks like the router or modem doesn't want to give you a lease. Does it want a different MAC address?


Cheers,
Franco

January 17, 2022, 08:21:01 PM #9 Last Edit: January 17, 2022, 08:31:09 PM by MenschAergereDichNicht
Unbound is set to listen on all interfaces (default).

Regarding RSS, I guess the answer is no. The only other tunables I changed are "vm.pmap.pti" and "hw.ibrs_disable", because the APU does not have any spare resources and I don't think it matters that much on a router.

Regarding the lease: WAN2 is not available all the time. It is a nano-router that I would only power on in case the WAN interface goes down. Therefore it is "normal" that there is currently no IP address available. But it should be configured automatically when I plug in the power cable.

For multi-WAN I created a gateway group that consists of WAN_GW (Tier 1) and WAN2_DHCP (Tier 2). I created an IPv4-only multi-WAN setup because the nano-router (or my Android hotspot) does not hand out IPv6.

In the above situation the Tier 2 gateway is deactivated because the nano-router has no power.
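As a rough mental model of how such a tiered group is expected to behave (a sketch only, not OPNsense's implementation; `pick_gateways` and the dict layout are invented for illustration): the lowest-numbered tier that still has an online gateway wins.

```python
def pick_gateways(group, online):
    """Return the gateways of the most preferred tier that has an online member.

    group  -- dict mapping gateway name to tier number (1 = most preferred)
    online -- set of gateway names currently considered up
    """
    candidates = [(tier, name) for name, tier in group.items() if name in online]
    if not candidates:
        # Every gateway in the group is down.
        return []
    best = min(tier for tier, _ in candidates)
    return sorted(name for tier, name in candidates if tier == best)


group = {"WAN_GW": 1, "WAN2_DHCP": 2}
print(pick_gateways(group, {"WAN_GW"}))     # nano-router powered off
print(pick_gateways(group, {"WAN2_DHCP"}))  # fibre link down, LTE up
```

With the nano-router powered off, only WAN_GW is a candidate, so the situation described above is the normal Tier 1 case.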

January 17, 2022, 08:42:45 PM #10 Last Edit: January 18, 2022, 12:17:38 AM by MenschAergereDichNicht
I also have the following entry in the General system log:

2022-01-18T00:04:55 Error opnsense /usr/local/etc/rc.linkup: The command '/usr/local/opnsense/scripts/dns/unbound_dhcpd.py --domain 'localdomain'' returned exit code '1', the output was 'Unable to lock on the pidfile. Traceback (most recent call last): File "/usr/local/opnsense/site-python/daemonize.py", line 91, in start fcntl.flock(lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB) BlockingIOError: [Errno 35] Resource temporarily unavailable During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/local/opnsense/scripts/dns/unbound_dhcpd.py", line 237, in <module> daemon.start() File "/usr/local/opnsense/site-python/daemonize.py", line 96, in start pidfile.write(old_pid) UnboundLocalError: local variable 'old_pid' referenced before assignment'


Maybe a concurrency problem?
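Reading the traceback, `fcntl.flock` raised `BlockingIOError`, which suggests another instance already held the pidfile lock, and the error handler then used `old_pid` before it had been assigned. A minimal sketch of a non-blocking pidfile lock that handles this case is shown below; it is not the code from `daemonize.py`, and `acquire_pidfile` is a made-up name.

```python
import fcntl
import os


def acquire_pidfile(path):
    """Try to take an exclusive, non-blocking lock on a pidfile.

    Returns the open file object on success (keep it open to hold the lock),
    or None if another holder already has the lock.
    """
    lockfile = open(path, "a+")
    try:
        fcntl.flock(lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds the lock: leave the file untouched and report it.
        lockfile.close()
        return None
    # We own the lock now, so it is safe to replace the stored PID.
    lockfile.seek(0)
    lockfile.truncate()
    lockfile.write(str(os.getpid()))
    lockfile.flush()
    return lockfile
```

A caller would exit quietly when `None` comes back instead of raising a second exception.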

January 17, 2022, 11:39:16 PM #11 Last Edit: January 18, 2022, 12:17:51 AM by MenschAergereDichNicht
After an attempt to start the bufferbloat browser test, the WAN connection loses packets and I see the following entries in the General log:


...
2022-01-17T23:31:15 Error opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for dynamic wan(igb3)
2022-01-17T23:31:10 Error dhcp6c transmit failed: Network is down
2022-01-17T23:31:09 Error dhcp6c transmit failed: Network is down
2022-01-17T23:31:09 Error dhcp6c transmit failed: Network is down
2022-01-17T23:31:08 Error opnsense /usr/local/etc/rc.linkup: Clearing states for stale wan route on igb3
2022-01-17T23:31:08 Error dhcp6c transmit failed: Network is down
2022-01-17T23:31:07 Error opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for dynamic wan(igb3)


It looks like putting load on the link somehow triggers a detach followed by an attach.
Regular traffic (browsing the forum, youtube,...) can work perfectly fine for some time.
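To see how long each of these link drops lasts, the detach/attach pairs can be pulled out of an exported log. The following is a sketch that assumes the line format shown above (ISO timestamp first, then the message); `link_flaps` is a hypothetical helper.

```python
from datetime import datetime


def link_flaps(log_lines):
    """Pair each 'Ethernet detached' event with the next 'Ethernet attached' event.

    Returns a list of (detached_at, attached_at, seconds_down) tuples.
    Lines may be newest-first or oldest-first; they are sorted by timestamp.
    """
    events = []
    for line in log_lines:
        stamp, _, rest = line.strip().partition(" ")
        if "DEVD: Ethernet detached event" in rest:
            events.append((datetime.fromisoformat(stamp), "down"))
        elif "DEVD: Ethernet attached event" in rest:
            events.append((datetime.fromisoformat(stamp), "up"))
    events.sort(key=lambda e: e[0])

    flaps, down_since = [], None
    for when, kind in events:
        if kind == "down":
            down_since = when
        elif down_since is not None:
            flaps.append((down_since, when, (when - down_since).total_seconds()))
            down_since = None
    return flaps
```

For the excerpt above (detached at 23:31:07, attached at 23:31:15) this reports a single drop of 8 seconds.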


I found the following inside the dmesg log. Maybe someone knows if this is important/critical or not:

igb0: link state changed to UP
debugnet_any_ifnet_update: Bad dn_init result from igb0 (ifp 0xfffff8000367a800), ignoring.
igb1: link state changed to UP
debugnet_any_ifnet_update: Bad dn_init result from igb1 (ifp 0xfffff800034ba000), ignoring.
igb3: link state changed to UP
debugnet_any_ifnet_update: Bad dn_init result from igb3 (ifp 0xfffff800038ef800), ignoring.

arpresolve: can't allocate llinfo for 192.168.69.1 on igb3

What helps is removing all DNS blocklist entries. Afterwards I still get those strange "rc.linkup: DEVD: " detach messages when running the bufferbloat test, but the system recovers faster and regular WAN access is disrupted less.

January 19, 2022, 02:40:59 PM #14 Last Edit: January 19, 2022, 03:05:02 PM by MenschAergereDichNicht
Finally I found something that seems to really improve my stability problems.

Before installing RC1 I also updated the BIOS to the latest version (4.15.0.2 at this time).

Because the "dn_init" messages above made me suspect a hardware/BIOS-related problem, I tried an older BIOS version.

Since installing 4.15.0.1 the WAN connection has been stable so far.
There are still those strange "DEVD: Ethernet detached event" messages in the log, but they do not interrupt the system (at least not to the extent they did before the BIOS downgrade).

Some things that are still "strange" are:


  • The "DEVD: Ethernet detached event" messages
  • Running the bufferbloat test inside the browser does not work anymore. With the old firewall the upload and download tests worked; currently only the first latency measurement completes. I re-wrote the rules manually, so maybe there is a difference that I am not seeing yet.
  • There are still some packet losses and reconnects when running the speedtests. But they no longer cause massive CPU load, and the new connection is available very quickly.

Takeaway:
For an APU4D4, BIOS version 4.15.0.1 is better than version 4.15.0.2.
And a classic (epic) failure of changing too many things at once.

Todo: Try an even older BIOS version and check whether it works even better.