Messages - RedVortex

#1
Quote from: dwasifar on August 09, 2024, 04:39:45 PM
I have two networks defined in the UniFi controller, one for the main subnet and another for a VLAN subnet (to isolate IOT devices).

After the 24.7.1 upgrade, nothing on either wi-fi network can reach the internet.  Wired connections are fine.

I can't spare the network downtime to troubleshoot it right now, so I reverted to 24.7 and reloaded the same configuration, and everything works again.  If anyone has any thoughts, it'd be welcome for when I can look at it.

Not sure if this is related, but I had a similar issue caused by Unbound no longer being able to start. It was triggered by my Google Home devices temporarily generating an IPv6 network while opnsense rebooted. Once opnsense had rebooted, I saw a ULA IPv6 address assigned to opnsense on my Google Home IoT network (Interfaces / Overview). This happens even though the IPv6 configuration of this interface is "None". It looks related to SLAAC, which seems impossible to disable.

For some reason, that prevented Unbound from starting (I'm binding Unbound to specific interfaces, not to all of them as recommended). When that happens, there are a few things I can do:

- Manually remove the ULA IPv6 address from the interface where my Google Homes are, using the command line (it usually doesn't come back once they have internet access; I suppose they do this to talk to each other temporarily during an outage).
- Enable DHCPv6 on the interface, save/apply, then disable IPv6 again (set it back to None) and save/apply. This makes the IPv6 ULA go away and Unbound is then able to start.
- Remove the specific interface binding from Unbound so it binds to everything. For some reason this lets Unbound start even with this problem.

This is reproducible every time I reboot opnsense and only happens on my Google Home interface (which is linked to Unifi Access points which have their own SSID for my Google Homes).

Next time you upgrade or reinstall, run ifconfig on the command line or check Interfaces / Overview to see whether an interface has an IPv6 address that shouldn't be there, and check whether Unbound is running. You should still have IP access to everything even without DNS running (to reach the opnsense UI or command line, or even to ping 8.8.8.8).
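
If it helps, here is a minimal Python sketch of that ifconfig check. It only scans text you paste or pipe into it for inet6 lines flagged autoconf. The interface name igb1 and the surrounding sample lines are made up for illustration; only the fd9c line comes from my own output.

```python
import re

def find_autoconf_inet6(ifconfig_output):
    """Return (interface, address) pairs for inet6 lines flagged 'autoconf'."""
    results = []
    current_if = None
    for line in ifconfig_output.splitlines():
        # Interface header lines start in column 0, e.g. "igb1: flags=..."
        header = re.match(r"^(\S+): flags=", line)
        if header:
            current_if = header.group(1)
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "inet6" and "autoconf" in fields[2:]:
            results.append((current_if, fields[1]))
    return results

# Hypothetical sample: igb1 and the first three lines are illustrative only.
SAMPLE = """\
igb1: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
\tinet 192.168.22.1 netmask 0xffffff00 broadcast 192.168.22.255
\tinet6 fe80::92e2:baff:feb0:efeb%igb1 prefixlen 64 scopeid 0x2
\tinet6 fd9c:85da:835d:8696:92e2:baff:feb0:efeb prefixlen 64 detached autoconf pltime 1800 vltime 1800
"""

print(find_autoconf_inet6(SAMPLE))
```

Any pair it reports on an interface whose IPv6 configuration is "None" is worth a closer look.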

Like I said, this may or may not be related to your issue, but it has been my situation for the last few updates, so I thought I would share it in case it helps.
#2
Quote from: franco on July 27, 2024, 10:06:27 AM
https://github.com/opnsense/core/commit/287c13beb

# opnsense-patch 287c13beb

That seems to have helped for Starlink; the problem remains for the Hurricane Electric GIF tunnel.

I patched and rebooted twice, and both times dpinger for Starlink was up and monitoring. I removed the patch and rebooted a third time: dpinger started on startup, then stopped and did not restart by itself. I then started it manually and it remained up.

I wonder if this could also be related to a problem I started having on 24.1 with the very latest updates. After a reboot I see a weird IPv6 address (ULA fd9c:xxxxxxxx) assigned to the interface where I have my Google Homes, as if the interface got itself an IPv6 address from somewhere. The IPv6 configuration on the interface is "None", so I would not expect the interface to ever end up with an IPv6 address. I wonder if something like SLAAC is now enabled at all times even though IPv6 is not enabled on the interface.

I need to manually remove the ULA IPv6 address from the interface, or put the interface in DHCPv6, save/apply, then go back to None and save/apply, to get rid of the IPv6 address on the interface. This situation also prevents Unbound from starting; it remains off until I get rid of this IPv6 address or remove the interface from Unbound's list of bound interfaces.

I know this sounds off-topic for this thread, but since it affects IPv6 only and we're talking about weird SLAAC issues, I prefer to mention it in case it is related or it helps.

EDIT1: Added info to specify that the weird IPv6 address I'm getting on an interface is FD9C:xxx, which is a ULA. Probably some device created its own IPv6 network and is advertising it. Not sure why OpnSense uses it though, since I have IPv6 set to None on the interface. But this definitely prevents Unbound from starting.

EDIT2: It really seems like it is an RA coming from my Google Home devices, or maybe the Unifi AP, and opnsense picks it up through autoconf. Something must have changed recently (likely in the kernel) so that IPv6 is now autoconfigured even when disabled.
inet6 fd9c:85da:835d:8696:92e2:baff:feb0:efeb prefixlen 64 detached autoconf pltime 1800 vltime 1800
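
For anyone who wants to check an address like this, a small sketch using Python's standard ipaddress module. It confirms the address is a ULA (inside fc00::/7) and, assuming the interface ID was built with modified EUI-64 (recognisable by ff:fe in the middle), recovers the MAC it was derived from. The helper names are mine, not anything from opnsense.

```python
import ipaddress

ULA_RANGE = ipaddress.ip_network("fc00::/7")

def is_ula(address):
    """True if the IPv6 address is a unique local address."""
    return ipaddress.ip_address(address) in ULA_RANGE

def eui64_to_mac(address):
    """Return the MAC embedded in a modified EUI-64 interface ID, or None."""
    iid = ipaddress.IPv6Address(address).packed[8:]
    if iid[3:5] != b"\xff\xfe":
        return None  # interface ID was not built from a MAC this way
    # Undo the universal/local bit flip applied to the first MAC byte.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{byte:02x}" for byte in mac)

ADDRESS = "fd9c:85da:835d:8696:92e2:baff:feb0:efeb"
print(is_ula(ADDRESS), eui64_to_mac(ADDRESS))
```

Comparing the recovered MAC with the ether line of the same interface in ifconfig tells you whether the address was self-generated from a received prefix, which would fit the autoconf flag.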
#3
I use 2 providers for IPv6

Starlink (DHCPv6) and Hurricane Electric (GIF tunnel).

After a reboot, I have an IPv6 address on both interfaces, but gateway monitoring (dpinger) is not running for either of the IPv6 gateways, so both are marked as down. All IPv4 gateways are OK and dpinger is running for them; only IPv6 is affected.

I can manually start dpinger on both gateways; they then get marked as up and dpinger keeps running.

If I reboot, the same situation happens again. It also happens if I go to the Starlink interface and click save to refresh the IPs. It seems dpinger gets stopped while the interface flaps and is never started again automatically unless I start it manually.

This is new behaviour since I upgraded to 24.7; it was working fine in 24.1.
#4
This is not a bug or problem, this is basically just a FYI...

Careful: I made the mistake of thinking that this 24.1.4 release note referred to DHCP option 121, which sends static routes to DHCP clients (I need this option before moving to kea), but it does not.

o kea-dhcp: add domain-search, time-servers and static-routes client options to subnet configuration


It's DHCP option 33, which maps a single destination IP to a router IP. You cannot use it to route 192.168.30.0/24 via 192.168.31.1, for instance. You can only route one IP via a router, such as 192.168.30.12 via 192.168.31.1.

This is basic and can still serve a purpose for some people, but most of us use DHCP option 121 in ISC, which is totally different from the less used option 33. Option 33 effectively enforces a /32 on each static route because you cannot define a subnet/CIDR on the destination you pass.

In other words, what we likely want in OPNSense is support for the KEA feature for DHCP option 121, which supersedes option 33 when both are present (per RFC 3442, a client must then ignore option 33). Option 121 also lets you do exactly what option 33 does, since you can specify a /32 destination, but the important part is that it lets you route whole subnets, not only individual IP addresses, via a router.
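
To make the difference concrete, here is a rough Python sketch of how option 121 packs a route according to RFC 3442: one byte for the prefix length, then only the significant octets of the destination, then the 4-byte router. The function name is mine and this is not KEA or OPNSense code, just the wire format applied to the two examples above.

```python
import ipaddress

def encode_option_121(routes):
    """Encode (destination, router) pairs as an RFC 3442 option 121 payload."""
    payload = bytearray()
    for destination, router in routes:
        network = ipaddress.ip_network(destination)
        # Only the octets covered by the prefix are sent (3 for a /24, 4 for a /32).
        significant = (network.prefixlen + 7) // 8
        payload.append(network.prefixlen)
        payload += network.network_address.packed[:significant]
        payload += ipaddress.ip_address(router).packed
    return bytes(payload)

subnet_route = encode_option_121([("192.168.30.0/24", "192.168.31.1")])
host_route = encode_option_121([("192.168.30.12/32", "192.168.31.1")])
print(subnet_route.hex(" "))  # subnet route, impossible with option 33
print(host_route.hex(" "))    # same single-host route option 33 can express
```

The subnet route comes out as 18 c0 a8 1e c0 a8 1f 01 and the host route as 20 c0 a8 1e 0c c0 a8 1f 01, so the prefix length travels with every route, which is exactly what option 33 lacks.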

https://gitlab.isc.org/isc-projects/kea/-/merge_requests/2135/diffs

Since it has been implemented this way, adding option 121 later will likely be a breaking change, unless the upgrade process converts what was specified as ip,router to ip/32,router. That way the same field could be reused for the better DHCP option 121 without breaking the kea config of people who have already started using the option 33 format. The alternative is to support both options in the UI and configs, which would give two separate static route fields in the kea UI: one for single IPs (DHCP option 33) and another for networks (DHCP option 121).

I'm very glad that a lot of work is being done on KEA, and I'm almost at the point of being able to move away from ISC. Once option 121 is implemented in OPNSense and KEA also registers the hosts from its DHCP leases in Unbound, I'll be good to migrate.
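
The ip,router to ip/32,router conversion mentioned above could look roughly like this. It is only a sketch: I am assuming the stored value is a flat comma-separated list of destination,router pairs, which may not match how OPNSense actually keeps the field, and the function name is hypothetical.

```python
import ipaddress

def option33_to_option121(value):
    """Convert 'ip,router[,ip,router...]' into 'ip/32,router[,...]'."""
    parts = [part.strip() for part in value.split(",") if part.strip()]
    if len(parts) % 2:
        raise ValueError("expected destination,router pairs")
    converted = []
    for destination, router in zip(parts[0::2], parts[1::2]):
        # Both halves must be plain IPv4 addresses in the option 33 format;
        # IPv4Address raises a ValueError subclass for anything else.
        ipaddress.IPv4Address(destination)
        ipaddress.IPv4Address(router)
        converted.append(f"{destination}/32,{router}")
    return ",".join(converted)

print(option33_to_option121("192.168.30.12,192.168.31.1"))
```

A value that already carries a prefix (for example 192.168.30.0/24) is rejected rather than rewritten, so running the conversion twice cannot silently produce something like ip/32/32.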

Thanks everyone !
#5
Problem is still present in 24.1.2

Bad state

No ALTQ support in kernel
ALTQ related functions disabled
all icmp 100.79.101.92:33064 -> 1.1.1.1:33064       0:0
   age 08:41:39, expires in 00:00:10, 30734:0 pkts, 891286:0 bytes, rule 104
   id: d928da6500000003 creatorid: d7e1a47d gateway: 192.168.100.1
   origif: igb0


Killing it

root@opnsense:~ # pfctl -k id -k d928da6500000003
killed 1 states


The state is now back to what it should be and the gateway is recovering

root@opnsense:~ # pfctl -ss -vvv | grep "1\.1\.1\.1" -A 3
No ALTQ support in kernel
ALTQ related functions disabled
all icmp 100.79.101.92:33064 -> 1.1.1.1:33064       0:0
   age 00:00:05, expires in 00:00:09, 5:5 pkts, 145:145 bytes, rule 104
   id: 7698db6500000002 creatorid: d7e1a47d gateway: 100.64.0.1
   origif: igb0
#6
24.1, 24.4 Legacy Series / Re: KEA DHCP
February 22, 2024, 05:41:20 PM
Quote from: cprsn on February 22, 2024, 04:45:02 PM

It seems to me this is still an unresolved issue.  I have disabled ISC on all but one interface and migrated the rest to Kea.  For this to work, I found I had to stop ISC entirely, restart Kea, then restart ISC.  Otherwise, the Kea log reports "Address already in use - is another DHCP server running?" errors.  If I then have to reboot opnsense (e.g. after firmware updates), it seems ISC will start before Kea and I will not have DHCP servers active on any of the interfaces except the one that I still have on ISC (Kea will report "address already in use" for the other interfaces).

Is the intent for now to support running ISC on some interfaces and Kea on others or are users expected to migrate all interfaces to Kea?

This is my experience too: it is impossible to run both. franco also confirmed this earlier. ISC grabs all interfaces and prevents KEA from binding to them, as you saw.

In my case, kea is missing too many features that I rely on heavily (DHCP custom options for additional routes, and Unbound DNS registration), so I cannot migrate everything. Since the two cannot run side by side, I also cannot migrate the subnets that could move right away and keep the others on ISC.

For now, it's unfortunately all or nothing, not because of kea, but because ISC binds to all IPs, as the sockstat output shows. From what I read about ISC, this seems to be by design.

Kea worked well in my case when I tested it, but it is still missing too many things for me to migrate.
#7
Quote from: axsdenied on February 14, 2024, 06:44:42 PM
I don't have Starlink so I don't have firsthand experience, but out of curiosity, when the Starlink network is up and everything is working with a Starlink network IP, can you still access the dish via the 192.168.100.x network?

Yes, the dish still keeps this IP, but the address it hands you over DHCP is no longer in this range once it gets a proper SL IP.

opnsense usually handles this properly because SL still sends this range in the DHCP options (Classless-Static-Route, option 121), which list the other networks that can be reached through it. It includes this range (and some other public IPs too, I guess for their services in AWS).

Here's a DHCP reply when the SL dish is connected to the SL network.

You can see in the DHCP reply that the default gateway is 100.64.0.1 (which is the case when SL is up). The SL dish still uses 192.168.100.1 and in fact, when you use the SL app to manage the antenna, it connects to this IP.

13:08:23.480240 xx:xx:xx:xx:xx:xx > xx:xx:xx:xx:xx:xx, ethertype IPv4 (0x0800), length 350: (tos 0x0, ttl 64, id 49857, offset 0, flags [DF], proto UDP (17), length 336)
    100.64.0.1.67 > 100.79.101.92.68: [no cksum] BOOTP/DHCP, Reply, length 308, xid 0x12b7a4ac, Flags [none] (0x0000)
  Your-IP 100.79.101.92
  Server-IP 10.10.10.10
  Gateway-IP 192.168.100.100
  Client-Ethernet-Address xx:xx:xx:xx:xx:xx
  Vendor-rfc1048 Extensions
    Magic Cookie 0x63825363
    DHCP-Message Option 53, length 1: ACK
    Subnet-Mask Option 1, length 4: 255.192.0.0
    Server-ID Option 54, length 4: 100.64.0.1
    Default-Gateway Option 3, length 4: 100.64.0.1
    Lease-Time Option 51, length 4: 300
    Domain-Name-Server Option 6, length 8: 1.1.1.1,8.8.8.8
    Classless-Static-Route Option 121, length 23: (192.168.100.1/32:0.0.0.0),(34.120.255.244/32:0.0.0.0),(default:100.64.0.1)
    MTU Option 26, length 2: 1500
    END Option 255, length 0
    PAD Option 0, length 0
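
As a sanity check on the option 121 line in that reply, the three routes should add up to the reported length of 23 bytes under the RFC 3442 layout (1 prefix-length byte, the significant destination octets, 4 router bytes). A small Python sketch, with a helper name of my own:

```python
import ipaddress

def option_121_length(routes):
    """Return the payload length in bytes of an RFC 3442 route list."""
    total = 0
    for destination, _router in routes:
        if destination == "default":
            prefixlen = 0  # default route: no destination octets at all
        else:
            prefixlen = ipaddress.ip_network(destination).prefixlen
        total += 1 + (prefixlen + 7) // 8 + 4
    return total

# Routes exactly as printed by tcpdump in the reply above.
ROUTES = [
    ("192.168.100.1/32", "0.0.0.0"),
    ("34.120.255.244/32", "0.0.0.0"),
    ("default", "100.64.0.1"),
]
print(option_121_length(ROUTES))
```

Two /32 routes take 9 bytes each and the default route takes 5, which gives the 23 that tcpdump shows.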


If I put the SL dish in stow mode (flipped down so it doesn't talk to satellites), or when SL is down for maintenance or whatever reason, the DHCP reply becomes this. The gateway is .1 and it gives me .100 in the 192.168.100.0/24 range.

13:11:42.957591 xx:xx:xx:xx:xx:xx > xx:xx:xx:xx:xx:xx, ethertype IPv4 (0x0800), length 320: (tos 0x0, ttl 255, id 0, offset 0, flags [none], proto UDP (17), length 306)
    192.168.100.1.67 > 192.168.100.100.68: [no cksum] BOOTP/DHCP, Reply, length 278, xid 0xae69f181, Flags [none] (0x0000)
  Your-IP 192.168.100.100
  Client-Ethernet-Address xx:xx:xx:xx:xx:xx
  Vendor-rfc1048 Extensions
    Magic Cookie 0x63825363
    DHCP-Message Option 53, length 1: ACK
    Subnet-Mask Option 1, length 4: 255.255.255.0
    Server-ID Option 54, length 4: 192.168.100.1
    Default-Gateway Option 3, length 4: 192.168.100.1
    Lease-Time Option 51, length 4: 5
    Domain-Name-Server Option 6, length 4: 192.168.100.1
    MTU Option 26, length 2: 1500
    END Option 255, length 0
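
The two replies can be told apart from the leased address alone. Here is a hedged Python sketch that classifies a Your-IP value. The labels and the function name are mine; the ranges are the dish's 192.168.100.0/24 from this post and 100.64.0.0/10, which is what the 255.192.0.0 mask in the first reply corresponds to.

```python
import ipaddress

DISH_FALLBACK = ipaddress.ip_network("192.168.100.0/24")
# 255.192.0.0 is a /10; 100.64.0.0/10 is the shared (CGNAT) address space.
CGNAT = ipaddress.ip_network("100.64.0.0/255.192.0.0")

def classify_lease(your_ip):
    """Label a DHCP lease as coming from the dish itself or from the SL network."""
    address = ipaddress.ip_address(your_ip)
    if address in DISH_FALLBACK:
        return "dish-fallback"
    if address in CGNAT:
        return "starlink"
    return "other"

print(classify_lease("100.79.101.92"))    # lease from the first reply
print(classify_lease("192.168.100.100"))  # lease from the stowed dish
```

A check like this could be a cheap way for a script to know which of the two gateways should currently be in use.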


And now I have the same problem: the gateway is marked as down even though SL is back up.

It's weird because for a few seconds after SL comes back up, I see 2 states. One of them would be the right one, but it ends up disappearing and the bad state remains.

root@opnsense:~ # pfctl -ss -vvv | grep "1\.1\.1\.1" -A 3
No ALTQ support in kernel
ALTQ related functions disabled
all icmp 100.79.101.92:59388 -> 1.1.1.1:59388       0:0
   age 00:02:54, expires in 00:00:09, 171:0 pkts, 4959:0 bytes, rule 104
   id: 9512dc6500000002 creatorid: 5f0e2da3 gateway: 192.168.100.1
   origif: igb0
--
all icmp 100.79.101.92:63965 (192.168.22.14:14148) -> 1.1.1.1:63965       0:0
   age 00:00:11, expires in 00:00:00, 2:2 pkts, 168:168 bytes, rule 104
   id: e113dc6500000002 creatorid: 5f0e2da3 gateway: 100.64.0.1
   origif: igb0


And after a few seconds... The bad one remains and the gateway remains marked as down

root@opnsense:~ # pfctl -ss -vvv | grep "1\.1\.1\.1" -A 3
No ALTQ support in kernel
ALTQ related functions disabled
all icmp 100.79.101.92:59388 -> 1.1.1.1:59388       0:0
   age 00:05:58, expires in 00:00:09, 353:0 pkts, 10237:0 bytes, rule 104
   id: 9512dc6500000002 creatorid: 5f0e2da3 gateway: 192.168.100.1
   origif: igb0


dpinger is configured to use the right gateway (100.64.0.1), but it doesn't work, likely because of the bad state.

root@opnsense:~ # pluginctl -r host_routes
{
    "core": {
        "8.8.8.8": null,
        "8.8.4.4": null
    },
    "dpinger": {
        "8.8.4.4": "10.50.45.70",
        "1.1.1.1": "100.64.0.1",
        "2001:4860:4860::8844": "fe80::200:xxxx:xxxx:xxx%igb0",
        "149.112.112.112": "192.168.2.1",
        "192.168.170.2": "192.168.170.2",
        "192.168.171.2": "192.168.171.2",
        "2620:fe::9": "2001:470:xx:x::x"
    }
}


While SL was down, dpinger updated itself to use the dish IP properly, so it seems dpinger is doing its job but something else with the states is not working well.

Here's how it looks when SL is down

root@opnsense:~ # pluginctl -r host_routes
{
    "core": {
        "8.8.8.8": null,
        "8.8.4.4": null
    },
    "dpinger": {
        "8.8.4.4": "10.50.45.70",
        "1.1.1.1": "192.168.100.1",
        "2001:4860:4860::8844": "fe80::200:xxxx:xxxx:xxx%igb0",
        "149.112.112.112": "192.168.2.1",
        "192.168.170.2": "192.168.170.2",
        "192.168.171.2": "192.168.171.2",
        "2620:fe::9": "2001:470:xx:x::x"
    }
}
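
To spot what changes between the two pluginctl outputs without eyeballing them, a short Python sketch that diffs the dpinger section. The samples are trimmed copies of the outputs above (only three monitors kept), and the function name is just for illustration.

```python
import json

def changed_monitors(up_json, down_json):
    """Return {monitor: (route_when_up, route_when_down)} for entries that differ."""
    up = json.loads(up_json)["dpinger"]
    down = json.loads(down_json)["dpinger"]
    return {
        monitor: (up.get(monitor), down.get(monitor))
        for monitor in sorted(set(up) | set(down))
        if up.get(monitor) != down.get(monitor)
    }

# Trimmed samples of `pluginctl -r host_routes` with SL up and SL down.
SL_UP = """
{"core": {"8.8.8.8": null, "8.8.4.4": null},
 "dpinger": {"8.8.4.4": "10.50.45.70", "1.1.1.1": "100.64.0.1",
             "149.112.112.112": "192.168.2.1"}}
"""
SL_DOWN = """
{"core": {"8.8.8.8": null, "8.8.4.4": null},
 "dpinger": {"8.8.4.4": "10.50.45.70", "1.1.1.1": "192.168.100.1",
             "149.112.112.112": "192.168.2.1"}}
"""

print(changed_monitors(SL_UP, SL_DOWN))
```

Only the 1.1.1.1 monitor moves, from 100.64.0.1 to 192.168.100.1, so the host route follows the DHCP change while the pf state does not.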
#8
Quote from: axsdenied on February 13, 2024, 04:37:16 PM
After your reply I re-read your post.  I actually block DHCP leases from 192.168.100.1 on the WAN interface so that the modem can't temporarily assign an address from that block to opnsense.

If you don't you could also have issues like what you're describing as it's technically a valid network config, it just can't route anywhere and sometimes when the real network is available it doesn't swap.

I do block it on interfaces other than Starlink. The reason I keep it enabled on Starlink is to be able to access the dish in case there is an issue such as snow on the dish or firmware going bad, or when the antenna is stowed. In all those cases, the dish falls back to its 192.168.100.1 IP and that's the only way to access it. As soon as it comes back up, it issues an IP in the Starlink network again. When that happens, I expect the state to be cleared and/or dpinger to be reloaded/restarted, which should also clear the state.

But yes, as a workaround I could block those, or even set up a cron job that flushes the state every now and then when it finds it stuck on 192.168.100.1 or something... In theory, though, the DHCP, interface and gateway scripts should all handle this automatically. It was working fine in 23.x once it was fixed (it was buggy at some point in 22.x or early 23.x; I can't remember exactly when it started, but it was around the time the devs were working on the scripts that handle gateways, interfaces, etc.).
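
A rough idea of what such a cron job could do, as a Python sketch. It only parses `pfctl -ss -vvv` text and builds the `pfctl -k id -k <id>` commands already used in this thread; actually fetching the output and running the commands (for example with subprocess, as root) is left out on purpose. The sample is copied from my earlier post, and the function names and the hard-coded stale gateway are assumptions.

```python
import re
import shlex

STALE_GATEWAY = "192.168.100.1"  # dish fallback gateway, per this thread
ID_LINE = re.compile(r"id:\s+(\S+)\s+creatorid:\s+\S+\s+gateway:\s+(\S+)")

def stale_state_ids(pfctl_output, monitor_ip, stale_gateway=STALE_GATEWAY):
    """Return ids of states toward monitor_ip still pinned to stale_gateway."""
    ids = []
    header = None
    for line in pfctl_output.splitlines():
        if line and not line[0].isspace():
            header = line  # unindented line: start of a new state (or noise)
            continue
        match = ID_LINE.search(line)
        if (match and header and f"-> {monitor_ip}:" in header
                and match.group(2) == stale_gateway):
            ids.append(match.group(1))
    return ids

def kill_commands(state_ids):
    """Build one pfctl argument list per state id."""
    return [["pfctl", "-k", "id", "-k", state_id] for state_id in state_ids]

SAMPLE = """\
all icmp 100.79.101.92:59388 -> 1.1.1.1:59388       0:0
   age 00:02:54, expires in 00:00:09, 171:0 pkts, 4959:0 bytes, rule 104
   id: 9512dc6500000002 creatorid: 5f0e2da3 gateway: 192.168.100.1
   origif: igb0
--
all icmp 100.79.101.92:63965 (192.168.22.14:14148) -> 1.1.1.1:63965       0:0
   age 00:00:11, expires in 00:00:00, 2:2 pkts, 168:168 bytes, rule 104
   id: e113dc6500000002 creatorid: 5f0e2da3 gateway: 100.64.0.1
   origif: igb0
"""

for command in kill_commands(stale_state_ids(SAMPLE, "1.1.1.1")):
    print(shlex.join(command))
```

Killing by state id rather than flushing everything keeps other connections on the interface untouched.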

Thanks for the idea, I may give it a try if no bugfix is made soon. I did not open a new issue since this is a regression, but maybe I should...  :-\

Most of this was discussed and patched in this other thread: https://forum.opnsense.org/index.php?topic=33831.0
#9
Quote from: axsdenied on February 06, 2024, 08:36:11 PM
I used to have this issue in the past as well but hasn't been a problem in a bit.  Currently still on 23.7.12_5 as I was waiting for a few patches before upgrading.  However, if this issue is now back in 24.x I'll be waiting a bit longer :)

Yeah... This is definitely a regression. Almost every day I need to reset the state or the gateway so that the state goes back to the right gateway instead of being stuck on the temporary Starlink IP/gateway it uses when it reboots or updates itself. The temporary gateway on which it gets stuck is 192.168.100.1, but the gateway once it is really up is 100.64.0.1.

Killing the state resets it properly.

It is likely something that happens (or doesn't happen in this case) during the interface flap and/or when Starlink issues the DHCP address to opnsense, so the states never reset to the new gateway...

Bad state (my gateway monitoring is configured to ping 1.1.1.1 on Starlink)

root@opnsense:~ # pfctl -ss -vvv | grep "1\.1\.1\.1" -A 3
No ALTQ support in kernel
ALTQ related functions disabled
all icmp 100.79.101.92:28961 -> 1.1.1.1:28961       0:0
   age 08:33:39, expires in 00:00:10, 30306:0 pkts, 878874:0 bytes, rule 102
   id: ec7de16500000001 creatorid: 5f0e2da3 gateway: 192.168.100.1
   origif: igb0


Killing the bad state

root@opnsense:~ # pfctl -k id -k ec7de16500000001
killed 1 states


The right state after killing the bad one. Gateway is now marked as up.


root@opnsense:~ # pfctl -ss -vvv | grep "1\.1\.1\.1" -A 3
No ALTQ support in kernel
ALTQ related functions disabled
all icmp 100.79.101.92:28961 -> 1.1.1.1:28961       0:0
   age 00:00:02, expires in 00:00:10, 3:3 pkts, 87:87 bytes, rule 104
   id: 3564d96500000002 creatorid: 5f0e2da3 gateway: 100.64.0.1
   origif: igb0
#10
Same situation this morning

root@opnsense:~ # pluginctl -r host_routes
{
    "core": {
        "8.8.8.8": null,
        "8.8.4.4": null
    },
    "dpinger": {
        "8.8.4.4": "10.50.45.70",
        "1.1.1.1": "100.64.0.1",
        "2001:4860:4860::8844": "fe80::200:xxxx:xxxx:xxx%igb0",
        "149.112.112.112": "192.168.2.1",
        "192.168.170.2": "192.168.170.2",
        "192.168.171.2": "192.168.171.2",
        "2620:fe::9": "2001:470:xx:x::x"
    }
}


root@opnsense:~ # pfctl -ss -vvv | grep "1\.1\.1\.1" -A 3
No ALTQ support in kernel
ALTQ related functions disabled
all icmp 100.79.101.92:47956 -> 1.1.1.1:47956       0:0
   age 08:02:44, expires in 00:00:09, 28494:0 pkts, 826326:0 bytes, rule 102
   id: ba64cd6500000000 creatorid: 5f0e2da3 gateway: 192.168.100.1
   origif: igb0


After killing the state, dpinger now sees the gateway as up (I did not restart/reload dpinger, I only cleared the state)

root@opnsense:~ # pfctl -k id -k ba64cd6500000000
killed 1 states

root@opnsense:~ # pfctl -ss -vvv | grep "1\.1\.1\.1" -A 3
No ALTQ support in kernel
ALTQ related functions disabled
all icmp 100.79.101.92:47956 -> 1.1.1.1:47956       0:0
   age 00:00:17, expires in 00:00:09, 17:17 pkts, 493:493 bytes, rule 104
   id: 7168ce6500000000 creatorid: 5f0e2da3 gateway: 100.64.0.1
   origif: igb0


<165>1 2024-02-06T01:36:14-05:00 opnsense dpinger 53072 - [meta sequenceId="75"] ALERT: STARLINK_DHCP (Addr: 1.1.1.1 Alarm: loss -> down RTT: 0.0 ms RTTd: 0.0 ms Loss: 100.0 %)
<12>1 2024-02-06T01:37:03-05:00 opnsense dpinger 4447 - [meta sequenceId="76"] exiting on signal 15
<12>1 2024-02-06T01:37:03-05:00 opnsense dpinger 47956 - [meta sequenceId="77"] send_interval 1000ms  loss_interval 4000ms  time_period 60000ms  report_interval 0ms  data_len 1  alert_interval 1000ms  latency_alarm 0ms  loss_alarm 0%  alarm_hold 10000ms  dest_addr 1.1.1.1  bind_addr 100.79.101.92  identifier "STARLINK_DHCP "
<165>1 2024-02-06T01:37:03-05:00 opnsense dpinger 53072 - [meta sequenceId="78"] Reloaded gateway watcher configuration on SIGHUP
<165>1 2024-02-06T01:37:21-05:00 opnsense dpinger 53072 - [meta sequenceId="79"] Reloaded gateway watcher configuration on SIGHUP
<12>1 2024-02-06T01:38:19-05:00 opnsense dpinger 35161 - [meta sequenceId="80"] send_interval 1000ms  loss_interval 4000ms  time_period 60000ms  report_interval 0ms  data_len 1  alert_interval 1000ms  latency_alarm 0ms  loss_alarm 0%  alarm_hold 10000ms  dest_addr 2001:4860:4860::8844  bind_addr 2605:59c8:2300:98f9:xxxx:xxxx:xxxx:xxxx  identifier "STARLINK_DHCP6 "
<165>1 2024-02-06T01:38:19-05:00 opnsense dpinger 53072 - [meta sequenceId="81"] Reloaded gateway watcher configuration on SIGHUP
<165>1 2024-02-06T01:38:20-05:00 opnsense dpinger 53072 - [meta sequenceId="82"] ALERT: STARLINK_DHCP6 (Addr: 2001:4860:4860::8844 Alarm: down -> none RTT: 51.3 ms RTTd: 3.9 ms Loss: 0.0 %)
<165>1 2024-02-06T09:41:27-05:00 opnsense dpinger 53072 - [meta sequenceId="1"] ALERT: STARLINK_DHCP (Addr: 1.1.1.1 Alarm: down -> loss RTT: 30.7 ms RTTd: 3.7 ms Loss: 75.0 %)
<165>1 2024-02-06T09:41:57-05:00 opnsense dpinger 53072 - [meta sequenceId="2"] ALERT: STARLINK_DHCP (Addr: 1.1.1.1 Alarm: loss -> none RTT: 30.6 ms RTTd: 5.4 ms Loss: 25.0 %)
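
To put a number on how long the gateway stayed down in that log, here is a Python sketch that parses the dpinger ALERT lines and measures the time between a transition into down and the next transition out of it. The regular expression is fitted to the lines above and may need adjusting for other log formats; the function names are mine.

```python
import re
from datetime import datetime

ALERT = re.compile(
    r"^<\d+>1 (?P<ts>\S+) \S+ dpinger \d+ - \[[^\]]*\] "
    r"ALERT: (?P<gw>\S+) \(Addr: (?P<addr>\S+) "
    r"Alarm: (?P<old>\w+) -> (?P<new>\w+) .*Loss: (?P<loss>[\d.]+) %\)"
)

def parse_alerts(log_text):
    """Extract timestamp, gateway and alarm transition from ALERT lines."""
    alerts = []
    for line in log_text.splitlines():
        match = ALERT.match(line.strip())
        if match:
            alerts.append({
                "time": datetime.fromisoformat(match["ts"]),
                "gateway": match["gw"],
                "old": match["old"],
                "new": match["new"],
                "loss": float(match["loss"]),
            })
    return alerts

def down_periods(alerts, gateway):
    """Return a timedelta for each completed down period of one gateway."""
    periods, went_down = [], None
    for alert in alerts:
        if alert["gateway"] != gateway:
            continue
        if alert["new"] == "down":
            went_down = alert["time"]
        elif alert["old"] == "down" and went_down is not None:
            periods.append(alert["time"] - went_down)
            went_down = None
    return periods

SAMPLE = """\
<165>1 2024-02-06T01:36:14-05:00 opnsense dpinger 53072 - [meta sequenceId="75"] ALERT: STARLINK_DHCP (Addr: 1.1.1.1 Alarm: loss -> down RTT: 0.0 ms RTTd: 0.0 ms Loss: 100.0 %)
<165>1 2024-02-06T01:38:20-05:00 opnsense dpinger 53072 - [meta sequenceId="82"] ALERT: STARLINK_DHCP6 (Addr: 2001:4860:4860::8844 Alarm: down -> none RTT: 51.3 ms RTTd: 3.9 ms Loss: 0.0 %)
<165>1 2024-02-06T09:41:27-05:00 opnsense dpinger 53072 - [meta sequenceId="1"] ALERT: STARLINK_DHCP (Addr: 1.1.1.1 Alarm: down -> loss RTT: 30.7 ms RTTd: 3.7 ms Loss: 75.0 %)
"""

for period in down_periods(parse_alerts(SAMPLE), "STARLINK_DHCP"):
    print(period)
```

For the lines above this gives 8:05:13, which lines up with the age of roughly 8 hours on the stale state.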

#11
Quote from: newsense on February 02, 2024, 07:30:12 PM
Quote from: franco on February 01, 2024, 05:19:56 PM
Quote from: bimbar on February 01, 2024, 10:37:37 AM
DHCPd opens a raw interface on all network interfaces. I don't think it is possible (at least with ISC DHCPd) to use two different DHCP daemons on one host simultaneously.

Correct for ISC-DHCP.

As previously stated, ISC-DHCP and KEA can run in parallel on different interfaces. I've done the transition on production systems with no downtime - as follows:


1) Create Subnet and Reservations for VLAN X in Kea

2) Go to ISC DHCP and disable it on VLAN X -- leaving it running on the other VLANs

3) Go to Kea and enable VLAN X in Settings

4) Validate and continue with the next VLAN in scope where Kea can run without missing any ISC functionality


QED :)

Unfortunately this isn't true. You were simply lucky that your DHCP leases continued to work while you transitioned.

KEA and ISC cannot coexist. ISC can only bind to *:67. While it does, either you're unable to start KEA (it will show as green but will not actually run) or, if you are able to start both (you need to start KEA first and then ISC), they will conflict and you will not be able to reload/restart KEA after ISC has started anyway.

Here's what you'll get if you are able to run both at the same time

root@opnsense:~ # sockstat -4l -p 67
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
dhcpd    dhcpd      61078 13 udp4   *:67                  *:*
root     kea-dhcp4  964   14 udp4   192.168.22.1:67       *:*
root     kea-dhcp4  964   16 udp4   192.168.42.1:67       *:*
root     kea-dhcp4  964   18 udp4   192.168.62.1:67       *:*
root     kea-dhcp4  964   20 udp4   192.168.63.1:67       *:*


This will prevent both from working properly.

And if you look into your KEA logs, even if the process shows as green, in reality it is not working and you'll see this, for each interface you are trying to start in KEA, even if you disabled it first in ISC.

WARN [kea-dhcp4.dhcpsrv.0x833712000] DHCPSRV_OPEN_SOCKET_FAIL failed to open socket: Failed to open socket on interface ix1_vlan630, reason: failed to bind fallback socket to address 192.168.63.1, port 67, reason: Address already in use - is another DHCP server running?
#12
24.1, 24.4 Legacy Series / Re: KEA DHCP
February 05, 2024, 03:46:25 AM
Here's the situation you have even if you are able to start both services

root@opnsense:~ # sockstat -4l -p 67
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
dhcpd    dhcpd      61078 13 udp4   *:67                  *:*
root     kea-dhcp4  964   14 udp4   192.168.22.1:67       *:*
root     kea-dhcp4  964   16 udp4   192.168.42.1:67       *:*
root     kea-dhcp4  964   18 udp4   192.168.62.1:67       *:*
root     kea-dhcp4  964   20 udp4   192.168.63.1:67       *:*


As you can see, this will not work. ISC always binds to *:67 whatever you do. They cannot coexist.
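
If you want to check your own box for the same overlap, here is a small Python sketch that reads `sockstat -4l -p 67` output and reports every case where one command holds the wildcard *:67 while another holds a specific address on the same port. It only parses text; the function name is mine and the sample is the output above.

```python
def wildcard_conflicts(sockstat_output, port=67):
    """Return (wildcard_command, other_command, address) overlaps on a port."""
    wildcard, specific = [], []
    for line in sockstat_output.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[0] == "USER":
            continue  # skip the header and anything malformed
        command, local = fields[1], fields[5]
        address, _, local_port = local.rpartition(":")
        if local_port != str(port):
            continue
        if address == "*":
            wildcard.append(command)
        else:
            specific.append((command, address))
    return [
        (wild_command, command, address)
        for wild_command in wildcard
        for command, address in specific
        if command != wild_command
    ]

SAMPLE = """\
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
dhcpd    dhcpd      61078 13 udp4   *:67                  *:*
root     kea-dhcp4  964   14 udp4   192.168.22.1:67       *:*
root     kea-dhcp4  964   16 udp4   192.168.42.1:67       *:*
root     kea-dhcp4  964   18 udp4   192.168.62.1:67       *:*
root     kea-dhcp4  964   20 udp4   192.168.63.1:67       *:*
"""

for conflict in wildcard_conflicts(SAMPLE):
    print(conflict)
```

An empty result means only one daemon is listening on the port, which is the only setup that worked for me.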
#13
24.1, 24.4 Legacy Series / Re: KEA DHCP
February 05, 2024, 03:34:12 AM
Quote from: newsense on February 05, 2024, 12:12:31 AM
!!! One server per interface, there's no way around it !!!

If you want to run KEA on an interface you need to disable ISC DHCP first on that interface

FWIW, I tried running KEA and ISC side by side, each on its own interface, and I wasn't able to.

ISC binds to 0.0.0.0:67 whatever you do, and that prevents KEA from starting. If you start KEA first, ISC will not complain and will start anyway, but it will take precedence: KEA will stop working and cannot be restarted once ISC is running.

From my tests, either you switch everything or you don't; I wasn't able to run them both properly. Even if you manage to start both, they will conflict because ISC listens on 0.0.0.0:67.
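
The "Address already in use" part is ordinary socket behaviour and can be reproduced without either DHCP server. The Python sketch below is a generic illustration, not ISC or KEA code: it binds one UDP socket to the wildcard address on a free high port, then tries to bind a second one to a specific address (loopback here) on the same port, with no SO_REUSEADDR or SO_REUSEPORT set on either socket.

```python
import errno
import socket

def try_specific_bind_after_wildcard():
    """Bind 0.0.0.0:<port>, then 127.0.0.1:<port>; report what happens."""
    wildcard = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    specific = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        wildcard.bind(("0.0.0.0", 0))  # port 0: let the OS pick a free port
        port = wildcard.getsockname()[1]
        try:
            specific.bind(("127.0.0.1", port))
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
        return "bound"
    finally:
        wildcard.close()
        specific.close()

print(try_specific_bind_after_wildcard())
```

On the systems I know of this prints EADDRINUSE, which is the same error KEA logs when ISC already holds the wildcard on port 67.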

https://forum.opnsense.org/index.php?topic=38215.msg187335#msg187335

https://forum.opnsense.org/index.php?topic=38215.msg188537#msg188537
#14
Quote from: Jetro on November 14, 2023, 11:01:43 PM
Same problem on 23.7.8.
I have 4 Gateways (FTTH, FTTC, FWA, SAT) and Starlink is the only one presenting this problem.

For me, the problem had not happened on Starlink since the last 23.1.x patches, but it started to appear again in 24.1-rc1 and is still ongoing in 24.1 final.

Here's the link to the issue in the 24.1 forum if you feel like troubleshooting it with us: https://forum.opnsense.org/index.php?topic=38603.0
#15
@franco FYI, since we worked on this together last time, and also @xaxero, who I suspect might be affected as well since he was also using multi-WAN and Starlink like me.

This seems to be a regression of this old issue, which had stayed fixed until 24.1:

https://forum.opnsense.org/index.php?topic=33831.msg163808#msg163808

After Starlink updates itself during the night, the gateway sometimes gets flagged as down and never comes back up by itself, even though it is up and working.

When Starlink goes down, it temporarily assigns itself 192.168.100.1 in the 192.168.100.0/24 range (and gives opnsense .100). When it comes back up, it returns to its normal 100.64.x.x addresses.

For some reason, dpinger has a hard time adjusting when that happens. I'm not sure whether it is being restarted properly on the gateway change or the temporary network flap. It seems to remain stuck on the 192.168.100.1 gateway.

You can also see that the state of the dpinger process is stuck on the 192.168.100.1 IP. If I manually clear the state, it then changes to the right gateway, but that is not enough to bring the gateway up; I also need to reload dpinger.

root@opnsense:~ # pfctl -ss -vvv | grep "1\.1\.1\.1" -A 3
No ALTQ support in kernel
ALTQ related functions disabled
all icmp 100.79.101.92:924 -> 1.1.1.1:924       0:0
   age 14:00:37, expires in 00:00:09, 49623:0 pkts, 1439067:0 bytes, rule 102
   id: adb0c36500000002 creatorid: 5f0e2da3 gateway: 192.168.100.1
   origif: igb0


root@opnsense:~ # pfctl -k id -k adb0c36500000002
killed 1 states



root@opnsense:~ # pfctl -ss -vvv | grep "1\.1\.1\.1" -A 3
No ALTQ support in kernel
ALTQ related functions disabled
all icmp 100.79.101.92:924 -> 1.1.1.1:924       0:0
   age 00:00:06, expires in 00:00:09, 6:6 pkts, 174:174 bytes, rule 104
   id: 7518c56500000002 creatorid: 5f0e2da3 gateway: 100.64.0.1
   origif: igb0


igb0 is my Starlink interface. You can see that before the state is cleared, packets are going out but not coming back.

root@opnsense:~ # tcpdump -i igb0 icmp and host 1.1.1.1 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on igb0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:45:36.973765 IP 100.79.101.92 > 1.1.1.1: ICMP echo request, id 924, seq 49523, length 9
15:45:37.974749 IP 100.79.101.92 > 1.1.1.1: ICMP echo request, id 924, seq 49524, length 9
15:45:38.994204 IP 100.79.101.92 > 1.1.1.1: ICMP echo request, id 924, seq 49525, length 9
15:45:40.033544 IP 100.79.101.92 > 1.1.1.1: ICMP echo request, id 924, seq 49526, length 9


After I clear the state, the replies come back normally.

root@opnsense:~ # tcpdump -i igb0 icmp and host 1.1.1.1 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on igb0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:55:44.088583 IP 100.79.101.92 > 1.1.1.1: ICMP echo request, id 924, seq 50121, length 9
15:55:44.125365 IP 1.1.1.1 > 100.79.101.92: ICMP echo reply, id 924, seq 50121, length 9
15:55:45.092572 IP 100.79.101.92 > 1.1.1.1: ICMP echo request, id 924, seq 50122, length 9
15:55:45.146659 IP 1.1.1.1 > 100.79.101.92: ICMP echo reply, id 924, seq 50122, length 9
15:55:46.139495 IP 100.79.101.92 > 1.1.1.1: ICMP echo request, id 924, seq 50123, length 9
15:55:46.191996 IP 1.1.1.1 > 100.79.101.92: ICMP echo reply, id 924, seq 50123, length 9
15:55:47.157407 IP 100.79.101.92 > 1.1.1.1: ICMP echo request, id 924, seq 50124, length 9
15:55:47.210673 IP 1.1.1.1 > 100.79.101.92: ICMP echo reply, id 924, seq 50124, length 9
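
The same comparison can be done mechanically on a longer capture. This Python sketch pairs ICMP echo requests with replies by id and seq in tcpdump text and lists the requests that never got an answer. The regular expression is fitted to the output format above; the function name is mine.

```python
import re

ECHO = re.compile(r"IP \S+ > \S+: ICMP echo (request|reply), id (\d+), seq (\d+)")

def unanswered_requests(capture):
    """Return (id, seq) of echo requests that have no matching reply."""
    requests, replies = [], set()
    for line in capture.splitlines():
        match = ECHO.search(line)
        if not match:
            continue
        key = (int(match.group(2)), int(match.group(3)))
        if match.group(1) == "request":
            requests.append(key)
        else:
            replies.add(key)
    return [key for key in requests if key not in replies]

BEFORE = """\
15:45:36.973765 IP 100.79.101.92 > 1.1.1.1: ICMP echo request, id 924, seq 49523, length 9
15:45:37.974749 IP 100.79.101.92 > 1.1.1.1: ICMP echo request, id 924, seq 49524, length 9
"""
AFTER = """\
15:55:44.088583 IP 100.79.101.92 > 1.1.1.1: ICMP echo request, id 924, seq 50121, length 9
15:55:44.125365 IP 1.1.1.1 > 100.79.101.92: ICMP echo reply, id 924, seq 50121, length 9
"""

print(unanswered_requests(BEFORE))
print(unanswered_requests(AFTER))
```

On a live capture the very last request may simply not have been answered yet, so a single trailing entry is not a problem by itself.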


Logs of gateway/dpinger

root@opnsense:~ # tail -100 /var/log/gateways/latest.log | grep -v DHCP6
<12>1 2024-02-04T01:45:05-05:00 opnsense dpinger 57453 - [meta sequenceId="1"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:06-05:00 opnsense dpinger 57453 - [meta sequenceId="3"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:07-05:00 opnsense dpinger 57453 - [meta sequenceId="5"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:08-05:00 opnsense dpinger 57453 - [meta sequenceId="7"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:09-05:00 opnsense dpinger 57453 - [meta sequenceId="9"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:10-05:00 opnsense dpinger 57453 - [meta sequenceId="11"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:11-05:00 opnsense dpinger 57453 - [meta sequenceId="13"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:12-05:00 opnsense dpinger 57453 - [meta sequenceId="15"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:13-05:00 opnsense dpinger 57453 - [meta sequenceId="17"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:14-05:00 opnsense dpinger 57453 - [meta sequenceId="19"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:15-05:00 opnsense dpinger 57453 - [meta sequenceId="21"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:16-05:00 opnsense dpinger 57453 - [meta sequenceId="24"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:17-05:00 opnsense dpinger 57453 - [meta sequenceId="26"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:18-05:00 opnsense dpinger 57453 - [meta sequenceId="28"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:19-05:00 opnsense dpinger 57453 - [meta sequenceId="30"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:20-05:00 opnsense dpinger 57453 - [meta sequenceId="32"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:21-05:00 opnsense dpinger 57453 - [meta sequenceId="35"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:22-05:00 opnsense dpinger 57453 - [meta sequenceId="37"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:23-05:00 opnsense dpinger 57453 - [meta sequenceId="39"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:24-05:00 opnsense dpinger 57453 - [meta sequenceId="41"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:25-05:00 opnsense dpinger 57453 - [meta sequenceId="43"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:26-05:00 opnsense dpinger 57453 - [meta sequenceId="45"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:27-05:00 opnsense dpinger 57453 - [meta sequenceId="47"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:28-05:00 opnsense dpinger 57453 - [meta sequenceId="48"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:29-05:00 opnsense dpinger 57453 - [meta sequenceId="49"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:30-05:00 opnsense dpinger 57453 - [meta sequenceId="50"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:31-05:00 opnsense dpinger 57453 - [meta sequenceId="51"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<165>1 2024-02-04T01:45:31-05:00 opnsense dpinger 53072 - [meta sequenceId="52"] ALERT: STARLINK_DHCP (Addr: 1.1.1.1 Alarm: none -> loss RTT: 43.7 ms RTTd: 9.1 ms Loss: 42.0 %)
<12>1 2024-02-04T01:45:32-05:00 opnsense dpinger 57453 - [meta sequenceId="53"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:33-05:00 opnsense dpinger 57453 - [meta sequenceId="54"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:34-05:00 opnsense dpinger 57453 - [meta sequenceId="55"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:35-05:00 opnsense dpinger 57453 - [meta sequenceId="56"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:36-05:00 opnsense dpinger 57453 - [meta sequenceId="57"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:37-05:00 opnsense dpinger 57453 - [meta sequenceId="58"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:38-05:00 opnsense dpinger 57453 - [meta sequenceId="59"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:38-05:00 opnsense dpinger 57453 - [meta sequenceId="60"] exiting on signal 15
<12>1 2024-02-04T01:45:38-05:00 opnsense dpinger 1844 - [meta sequenceId="61"] send_interval 1000ms  loss_interval 4000ms  time_period 60000ms  report_interval 0ms  data_len 1  alert_interval 1000ms  latency_alarm 0ms  loss_alarm 0%  alarm_hold 10000ms  dest_addr 1.1.1.1  bind_addr 192.168.100.100  identifier "STARLINK_DHCP "
<12>1 2024-02-04T01:45:38-05:00 opnsense dpinger 59623 - [meta sequenceId="62"] exiting on signal 15
<165>1 2024-02-04T01:45:38-05:00 opnsense dpinger 53072 - [meta sequenceId="63"] Reloaded gateway watcher configuration on SIGHUP
<12>1 2024-02-04T01:45:38-05:00 opnsense dpinger 1844 - [meta sequenceId="64"] exiting on signal 15
<165>1 2024-02-04T01:45:38-05:00 opnsense dpinger 53072 - [meta sequenceId="65"] Reloaded gateway watcher configuration on SIGHUP
<12>1 2024-02-04T01:45:38-05:00 opnsense dpinger 11941 - [meta sequenceId="66"] send_interval 1000ms  loss_interval 4000ms  time_period 60000ms  report_interval 0ms  data_len 1  alert_interval 1000ms  latency_alarm 0ms  loss_alarm 0%  alarm_hold 10000ms  dest_addr 1.1.1.1  bind_addr 192.168.100.100  identifier "STARLINK_DHCP "
<165>1 2024-02-04T01:45:39-05:00 opnsense dpinger 53072 - [meta sequenceId="67"] Reloaded gateway watcher configuration on SIGHUP
<165>1 2024-02-04T01:45:43-05:00 opnsense dpinger 53072 - [meta sequenceId="68"] ALERT: STARLINK_DHCP (Addr: 1.1.1.1 Alarm: loss -> down RTT: 0.0 ms RTTd: 0.0 ms Loss: 100.0 %)
<12>1 2024-02-04T01:46:41-05:00 opnsense dpinger 11941 - [meta sequenceId="69"] exiting on signal 15
<12>1 2024-02-04T01:46:41-05:00 opnsense dpinger 66460 - [meta sequenceId="70"] send_interval 1000ms  loss_interval 4000ms  time_period 60000ms  report_interval 0ms  data_len 1  alert_interval 1000ms  latency_alarm 0ms  loss_alarm 0%  alarm_hold 10000ms  dest_addr 1.1.1.1  bind_addr 100.79.101.92  identifier "STARLINK_DHCP "
<165>1 2024-02-04T01:46:41-05:00 opnsense dpinger 53072 - [meta sequenceId="71"] Reloaded gateway watcher configuration on SIGHUP
<165>1 2024-02-04T01:46:44-05:00 opnsense dpinger 53072 - [meta sequenceId="73"] Reloaded gateway watcher configuration on SIGHUP
<12>1 2024-02-04T15:38:41-05:00 opnsense dpinger 36972 - [meta sequenceId="1"] send_interval 1000ms  loss_interval 4000ms  time_period 60000ms  report_interval 0ms  data_len 1  alert_interval 1000ms  latency_alarm 0ms  loss_alarm 0%  alarm_hold 10000ms  dest_addr 1.1.1.1  bind_addr 100.79.101.92  identifier "STARLINK_DHCP "
<165>1 2024-02-04T15:38:43-05:00 opnsense dpinger 53072 - [meta sequenceId="2"] ALERT: STARLINK_DHCP (Addr: 1.1.1.1 Alarm: down -> none RTT: 52.6 ms RTTd: 8.4 ms Loss: 0.0 %)
<12>1 2024-02-04T15:38:44-05:00 opnsense dpinger 36972 - [meta sequenceId="3"] exiting on signal 2


The dpinger process before I reload it:

root    66460   0.0  0.0  13340  2508  -  Is   01:46       0:02.54 /usr/local/bin/dpinger -f -S -r 0 -i STARLINK_DHCP -B 100.79.101.92 -p /var/run/dpinger_STARLINK_DHCP.pid -u /var/run/dpinger_STARLINK_DHCP.sock -s 1s -l 4s -t 60s -d 1 1.1.1.1

And the process after I reload it, at which point the gateway is marked as up:

root@opnsense:~ # ps aux | grep LINK_DHCP
root    91462   0.0  0.0  13340  2512  -  Is   16:07       0:00.03 /usr/local/bin/dpinger -f -S -r 0 -i STARLINK_DHCP -B 100.79.101.92 -p /var/run/dpinger_STARLINK_DHCP.pid -u /var/run/dpinger_STARLINK_DHCP.sock -s 1s -l 4s -t 60s -d 1 1.1.1.1


The command lines are identical; only the PID and start time differ.

The state after the dpinger reload is still normal, just like the one I got after manually killing the state to reset it:

root@opnsense:~ # pfctl -ss -vvv | grep "1\.1\.1\.1" -A 3
No ALTQ support in kernel
ALTQ related functions disabled
all icmp 100.79.101.92:25926 -> 1.1.1.1:25926       0:0
   age 00:01:53, expires in 00:00:10, 112:112 pkts, 3248:3248 bytes, rule 104
   id: ba21c56500000002 creatorid: 5f0e2da3 gateway: 100.64.0.1
   origif: igb0


It seems the state isn't cleared properly and/or dpinger isn't reset properly after an interface flag or gateway change.
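
To make the bind address change easier to spot in a long gateway log, here is a rough Python sketch that lists each dpinger start-up with its bind_addr and counts the "sendto error" lines per PID. The sample lines are abbreviated copies of the log above, and the function name summarize and the regular expressions are my own assumptions about the log layout, not anything shipped with OPNsense.

```python
import re
from collections import Counter

# Abbreviated sample lines taken from the gateway log above (illustration only).
SAMPLE_LOG = """\
<12>1 2024-02-04T01:45:37-05:00 opnsense dpinger 57453 - [meta sequenceId="58"] STARLINK_DHCP 1.1.1.1: sendto error: 22
<12>1 2024-02-04T01:45:38-05:00 opnsense dpinger 1844 - [meta sequenceId="61"] send_interval 1000ms  dest_addr 1.1.1.1  bind_addr 192.168.100.100  identifier "STARLINK_DHCP "
<12>1 2024-02-04T01:46:41-05:00 opnsense dpinger 66460 - [meta sequenceId="70"] send_interval 1000ms  dest_addr 1.1.1.1  bind_addr 100.79.101.92  identifier "STARLINK_DHCP "
"""

# Assumed layout: <pri>1 timestamp host dpinger pid - [meta ...] message
LINE_RE = re.compile(
    r"^<\d+>1 (?P<ts>\S+) \S+ dpinger (?P<pid>\d+) - \[meta [^\]]*\] (?P<msg>.*)$"
)
BIND_RE = re.compile(r"bind_addr (\S+)")


def summarize(log_text):
    """Return (start-ups as (timestamp, pid, bind_addr), sendto error count per pid)."""
    binds = []
    errors = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not follow the assumed layout
        msg = match.group("msg")
        pid = int(match.group("pid"))
        bind = BIND_RE.search(msg)
        if bind:
            binds.append((match.group("ts"), pid, bind.group(1)))
        elif "sendto error" in msg:
            errors[pid] += 1
    return binds, errors


if __name__ == "__main__":
    binds, errors = summarize(SAMPLE_LOG)
    for ts, pid, addr in binds:
        print(ts, pid, addr)
    print(dict(errors))
```

Fed the full /var/log/gateways/latest.log, this should show at a glance when dpinger was restarted with the old 192.168.100.100 address and when it picked up the 100.79.101.92 one.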

I forgot to capture the output of the following command during the outage:

pluginctl -r host_routes

Here it is now that everything is working again. At the next outage I'll capture it before fixing anything.

root@opnsense:~ # pluginctl -r host_routes
{
    "core": {
        "8.8.8.8": null,
        "8.8.4.4": null
    },
    "dpinger": {
        "8.8.4.4": "10.50.45.70",
        "1.1.1.1": "100.64.0.1",
        "2001:4860:4860::8844": "fe80::200:xxxx:xxxx:xxx%igb0",
        "149.112.112.112": "192.168.2.1",
        "192.168.170.2": "192.168.170.2",
        "192.168.171.2": "192.168.171.2",
        "2620:fe::9": "2001:470:xx:x::x"
    }
}
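
For the next outage, a quick way to check whether the pf state still points at the gateway that host_routes reports is to compare the two outputs. Below is a minimal Python sketch of that comparison, using a trimmed copy of the outputs above as sample data. The helper names state_gateway and route_matches_state are hypothetical, and the parsing assumes the pfctl -ss -vvv layout shown earlier.

```python
import json
import re

# Trimmed copy of the "pluginctl -r host_routes" output above (illustration only).
HOST_ROUTES = json.loads(
    '{"dpinger": {"8.8.4.4": "10.50.45.70", "1.1.1.1": "100.64.0.1"}}'
)

# State entry copied from the pfctl output above.
PF_STATE = """\
all icmp 100.79.101.92:25926 -> 1.1.1.1:25926       0:0
   age 00:01:53, expires in 00:00:10, 112:112 pkts, 3248:3248 bytes, rule 104
   id: ba21c56500000002 creatorid: 5f0e2da3 gateway: 100.64.0.1
   origif: igb0
"""


def state_gateway(state_text):
    """Extract the gateway recorded in a verbose pf state entry, or None."""
    match = re.search(r"gateway: (\S+)", state_text)
    return match.group(1) if match else None


def route_matches_state(host_routes, monitor, state_text):
    """True if the dpinger host route for `monitor` uses the state's gateway."""
    expected = host_routes.get("dpinger", {}).get(monitor)
    return expected is not None and expected == state_gateway(state_text)


if __name__ == "__main__":
    print(state_gateway(PF_STATE))
    print(route_matches_state(HOST_ROUTES, "1.1.1.1", PF_STATE))
```

If the two disagree during an outage, that would point towards a stale state rather than a wrong host route.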