OPNsense Forum

Archive => 24.1, 24.4 Legacy Series => Topic started by: apunkt on May 21, 2024, 06:03:06 PM

Title: MultiWAN LoadBalancing w/ Starlink stopped working
Post by: apunkt on May 21, 2024, 06:03:06 PM
Hi, I have MultiWAN configured with one of the WANs being Starlink.

The setup worked for several months with an unchanged configuration.
All of a sudden, on Friday morning, the Starlink WAN stopped being used by policy-based routing.
The gateway group has both WANs on Tier 1, weighted.

Starlink WAN -> OPT1 -> em0

Since then I see this in the logs:

2024-05-21T17:49:23 Notice dhclient dhclient-script: Reason ARPSEND on em0 executing
2024-05-21T17:49:23 Notice dhclient dhclient-script: Reason PREINIT on em0 executing
2024-05-21T17:49:23 Notice dhclient dhclient-script: Reason EXPIRE on em0 executing
2024-05-21T17:44:22 Notice dhclient dhclient-script: Creating resolv.conf
2024-05-21T17:44:22 Notice dhclient dhclient-script: Reason RENEW on em0 executing
2024-05-21T17:39:58 Notice dhclient dhclient-script: Creating resolv.conf
2024-05-21T17:39:58 Notice dhclient dhclient-script: Reason RENEW on em0 executing


This repeats roughly every 5 minutes.

Also this, every now and then:
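To check the "roughly every 5 minutes" observation against the timestamps, a short Python sketch can compute the gap between consecutive RENEW/EXPIRE events. It only parses the lines quoted above; the function names lease_events and gaps_seconds are illustrative helpers, not part of OPNsense or dhclient.

```python
from datetime import datetime

# dhclient-script lines from the 2024-05-21 excerpt (newest first, as the log view lists them).
LOG = """\
2024-05-21T17:49:23 Notice dhclient dhclient-script: Reason ARPSEND on em0 executing
2024-05-21T17:49:23 Notice dhclient dhclient-script: Reason PREINIT on em0 executing
2024-05-21T17:49:23 Notice dhclient dhclient-script: Reason EXPIRE on em0 executing
2024-05-21T17:44:22 Notice dhclient dhclient-script: Creating resolv.conf
2024-05-21T17:44:22 Notice dhclient dhclient-script: Reason RENEW on em0 executing
2024-05-21T17:39:58 Notice dhclient dhclient-script: Creating resolv.conf
2024-05-21T17:39:58 Notice dhclient dhclient-script: Reason RENEW on em0 executing
"""


def lease_events(log, reasons=("RENEW", "EXPIRE")):
    """Return (timestamp, reason) pairs for the selected dhclient-script reasons, oldest first."""
    events = []
    for line in log.splitlines():
        for reason in reasons:
            if f"Reason {reason} on" in line:
                # The first whitespace-separated token is the ISO timestamp.
                events.append((datetime.fromisoformat(line.split()[0]), reason))
    return sorted(events)


def gaps_seconds(events):
    """Seconds elapsed between each pair of consecutive events."""
    return [(later[0] - earlier[0]).total_seconds() for earlier, later in zip(events, events[1:])]


events = lease_events(LOG)
for (stamp, reason), gap in zip(events[1:], gaps_seconds(events)):
    print(f"{stamp:%H:%M:%S} {reason:<6} {gap:.0f} s after previous")
```

On this excerpt the gaps come out at 264 s (RENEW to RENEW) and 301 s (RENEW to EXPIRE), i.e. about 4.5 to 5 minutes, which matches the cadence described.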
2024-05-21T17:49:26 Notice opnsense /usr/local/etc/rc.newwanip: ROUTING: configuring inet default gateway on wan
2024-05-21T17:49:26 Notice opnsense /usr/local/etc/rc.newwanip: ROUTING: entering configure using 'opt1'
2024-05-21T17:49:26 Notice opnsense /usr/local/etc/rc.newwanip: IP renewal starting (new: 100.113.107.143, old: 100.113.107.143, interface: opt1, device: em0, force: yes)
2024-05-21T17:49:26 Notice dhclient dhclient-script: Creating resolv.conf
2024-05-21T17:49:26 Notice dhclient dhclient-script: New Classless Static Routes (em0): 192.168.100.1/32 0.0.0.0 34.120.255.244/32 0.0.0.0 default 100.64.0.1
2024-05-21T17:49:25 Notice dhclient dhclient-script: New Routers (em0): 100.64.0.1
2024-05-21T17:49:25 Notice dhclient dhclient-script: New Broadcast Address (em0): 100.127.255.255
2024-05-21T17:49:25 Notice dhclient dhclient-script: New Subnet Mask (em0): 255.192.0.0
2024-05-21T17:49:25 Notice dhclient dhclient-script: New IP Address (em0): 100.113.107.143
2024-05-21T17:49:25 Notice dhclient dhclient-script: Reason BOUND on em0 executing
2024-05-21T17:49:25 Notice dhclient dhclient-script: Reason ARPCHECK on em0 executing
2024-05-21T17:49:25 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
2024-05-21T17:49:24 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
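The BOUND messages above contain enough to check the addressing. The sketch below uses Python's standard ipaddress module to derive the leased network from the IP and subnet mask and to classify each classless static route. It assumes the "New Classless Static Routes" log line alternates destination and gateway tokens, with "default" meaning 0.0.0.0/0 and a 0.0.0.0 gateway meaning on-link; parse_classless_routes is a hypothetical helper written for this illustration.

```python
import ipaddress

# Values copied from the dhclient BOUND messages in the 2024-05-21 excerpt.
LEASE_IP = "100.113.107.143"
LEASE_MASK = "255.192.0.0"
CLASSLESS_ROUTES = "192.168.100.1/32 0.0.0.0 34.120.255.244/32 0.0.0.0 default 100.64.0.1"


def parse_classless_routes(text):
    """Split the dhclient-script summary into (destination network, gateway) pairs.

    Assumption: tokens alternate destination/gateway, 'default' means 0.0.0.0/0.
    """
    tokens = text.split()
    routes = []
    for dest, gw in zip(tokens[0::2], tokens[1::2]):
        network = ipaddress.ip_network("0.0.0.0/0" if dest == "default" else dest)
        routes.append((network, ipaddress.ip_address(gw)))
    return routes


# ip_interface accepts the dotted-quad netmask form, e.g. "a.b.c.d/255.192.0.0".
iface = ipaddress.ip_interface(f"{LEASE_IP}/{LEASE_MASK}")
print(f"lease network: {iface.network}, broadcast: {iface.network.broadcast_address}")

for network, gw in parse_classless_routes(CLASSLESS_ROUTES):
    if gw.is_unspecified:
        status = "on-link (direct)"
    elif gw in iface.network:
        status = f"via {gw}, inside {iface.network}"
    else:
        status = f"via {gw}, NOT inside {iface.network}"
    print(f"{network}: {status}")
```

The lease works out to 100.64.0.0/10 (the RFC 6598 shared address space used for carrier-grade NAT), with broadcast 100.127.255.255 as logged, and the default gateway 100.64.0.1 falls inside it. One plausible reading, not a proven cause: between EXPIRE and the next BOUND, em0 has no address covering 100.64.0.1, which would be consistent with the kernel's "can't allocate llinfo for 100.64.0.1" messages appearing in exactly that window.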


The gateway is correctly marked as UP (so it's not a dpinger issue).
Unfortunately, nothing is being routed through it.

I tried arp -S 100.64.0.1 MACADDR, which makes it work as usual again for roughly 5 to 50 minutes, but then it stops again.
I also tried a static ARP entry with os-wol, which doesn't work.
The WAN gets its IP via DHCP from the Starlink router in bypass mode.
It was working for months on 23.7 (now upgraded to 24.1), then suddenly stopped. This is very likely due to one of the frequent Starlink updates, but I am running out of options to make it work again.



Any suggestions or ideas on where to look?
Highly appreciated.
Title: Re: MultiWAN LoadBalancing w/ Starlink stopped working
Post by: apunkt on May 22, 2024, 10:56:07 AM
So, it seems only the load balancing in MultiWAN somehow stopped working, because:

I can mark the other WAN gateway as down -> traffic fails over to StarlinkWAN -> works as expected.
I can add a firewall rule to route a specific MAC via StarlinkWAN -> works as expected, but this also stops working after a couple of minutes.

Title: Re: MultiWAN LoadBalancing w/ Starlink stopped working
Post by: apunkt on May 23, 2024, 06:50:34 PM
I just noticed that since this problem started, there is also another connectivity problem that began at the same time.

I regularly ping the non-Starlink WAN from a LAN host. Since this problem started, I sometimes temporarily cannot ping the DSL WAN from that host. The gateway is always up and connectivity is OK.

apunkt@relion1801:~$ ping 192.168.2.2
PING 192.168.2.2 (192.168.2.2) 56(84) bytes of data.
^C
--- 192.168.2.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1003ms

apunkt@relion1801:~$ traceroute 192.168.2.2
traceroute to 192.168.2.2 (192.168.2.2), 30 hops max, 60 byte packets
1  fritz.box (192.168.2.2)  0.659 ms  0.997 ms  1.530 ms



After a couple of minutes I can ping again.
::)



Title: Re: MultiWAN LoadBalancing w/ Starlink stopped working
Post by: apunkt on May 24, 2024, 09:20:00 AM
I did more analysis, looking at the exact timing...

It's breaking when this happens:
2024-05-24T08:30:48 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
2024-05-24T08:30:48 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
2024-05-24T08:30:48 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
2024-05-24T08:30:47 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
2024-05-24T08:30:47 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
2024-05-24T08:30:47 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
2024-05-24T08:30:47 Notice dhclient dhclient-script: Reason ARPSEND on em0 executing
2024-05-24T08:30:47 Notice dhclient dhclient-script: Reason PREINIT on em0 executing
2024-05-24T08:30:47 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
2024-05-24T08:30:46 Notice dhclient dhclient-script: Reason EXPIRE on em0 executing


This makes me think that it is somehow related to:
https://github.com/opnsense/core/issues/7191
https://github.com/opnsense/core/issues/7224
even though both issues are closed already.
:-\
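The timing argument can be made explicit with a small Python sketch: find the EXPIRE event in the 2024-05-24 excerpt and collect the arpresolve failures for the gateway within a short window after it. The 10-second window and the function name failures_after_expire are assumptions chosen for illustration; the log lines are the ones quoted above.

```python
from datetime import datetime, timedelta

# Excerpt from the 2024-05-24 post (newest first, as the log view lists them).
LOG = """\
2024-05-24T08:30:48 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
2024-05-24T08:30:48 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
2024-05-24T08:30:48 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
2024-05-24T08:30:47 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
2024-05-24T08:30:47 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
2024-05-24T08:30:47 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
2024-05-24T08:30:47 Notice dhclient dhclient-script: Reason ARPSEND on em0 executing
2024-05-24T08:30:47 Notice dhclient dhclient-script: Reason PREINIT on em0 executing
2024-05-24T08:30:47 Notice kernel <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0
2024-05-24T08:30:46 Notice dhclient dhclient-script: Reason EXPIRE on em0 executing
"""


def parse(log):
    """Yield (timestamp, message) for each non-empty log line."""
    for line in log.splitlines():
        stamp, _, rest = line.partition(" ")
        yield datetime.fromisoformat(stamp), rest


def failures_after_expire(log, gateway="100.64.0.1", window=timedelta(seconds=10)):
    """Return the EXPIRE time and sorted arpresolve failure times within `window` after it."""
    entries = list(parse(log))
    expire = min(t for t, msg in entries if "Reason EXPIRE" in msg)
    needle = f"can't allocate llinfo for {gateway}"
    hits = sorted(t for t, msg in entries if needle in msg and expire <= t <= expire + window)
    return expire, hits


expire, hits = failures_after_expire(LOG)
delay = (hits[0] - expire).total_seconds()
print(f"EXPIRE at {expire:%H:%M:%S}, {len(hits)} arpresolve failures, first after {delay:.0f} s")
```

On this excerpt all seven arpresolve failures fall within two seconds of the EXPIRE, the first one second after it. That is a correlation in a single excerpt, not proof of cause.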
Title: Re: MultiWAN LoadBalancing w/ Starlink stopped working
Post by: apunkt on May 26, 2024, 10:06:00 AM
Although my error messages are a little different from those in
https://forum.opnsense.org/index.php?topic=40664.0
https://forum.opnsense.org/index.php?topic=38603.msg199209
it indeed IS a dpinger problem in combination with Starlink. dpinger on the DSL WAN works as expected.

Workaround: disable gateway monitoring for the SL (Starlink) gateway.
Title: Re: MultiWAN LoadBalancing w/ Starlink stopped working
Post by: franco on May 26, 2024, 12:54:00 PM
> 2024-05-24T08:30:47   Notice   kernel   <7>arpresolve: can't allocate llinfo for 100.64.0.1 on em0

If you want my technical assessment here: if you haven't seen this error with SL before, they changed how their addressing works and your configuration is now simply incompatible.


Cheers,
Franco
Title: Re: MultiWAN LoadBalancing w/ Starlink stopped working
Post by: apunkt on May 26, 2024, 04:55:29 PM
Thanks for your reply, franco!
Highly appreciated.

Re SL: this was my concern, too. Unfortunately, they give no information about their frequent changes. I tried to work around it on my end with arp -S, but that wasn't successful. For now, the only way to get it working is to disable gateway monitoring; no other setting works around it.
I don't like SL that much, but where I live there is no alternative if you need more bandwidth.
Title: Re: MultiWAN LoadBalancing w/ Starlink stopped working
Post by: apunkt on June 06, 2024, 09:39:10 AM
Resolved:

The latest Starlink update fixed the situation. Everything went back to normal, confirming that the issue was on the SL side.