Dear Forum members,
I have been facing DHCP relay issues since updating to 24.1.6 - in earlier versions everything worked as expected and I observed no issues.
Currently it seems that the DHCP relays suddenly stop working and the clients are no longer able to receive DHCP addresses or renew their leases. Until lease expiry the clients work as expected, but once the lease has expired they are disconnected.
Also I am observing "BAD_ADDRESS" entries in the DHCP server, which I didn't notice before.
In some cases a restart of the DHCP relays solves the issue for some time, but not in every case.
Is it possible to log the activity of the relays? I have not found a corresponding log yet.
I am using three DHCP relays to forward the DHCP requests of three subnets to one central DHCP server.
OPNsense is virtualized on an ESXi hypervisor - each OPNsense subnet uses its own virtual NIC provided by ESXi.
Any hints on how to narrow down the issue are highly appreciated.
Many thanks in advance!
Best regards,
WM54
As a follow-up, I am also experiencing issues with DHCP relay since the upgrade to 24.1.6. Previously, DHCP relay was working.
This is the current status:
- I am not using Hyper-V
- Everything is on physical devices.
- DHCP guard is not enabled.
- As a workaround, I have enabled the DHCPv4 server on OPNsense.
Thank you.
For some vague reason it sounds like a state issue to me, but that's easier said than proven.
How long does the instance keep working on average?
Cheers,
Franco
I am seeing the same issue. DHCP relay worked fine on previous versions but is not working in 24.1.6. I use the relay to forward all DHCP requests to a single Technitium DNS/DHCP server instance. The logs show that the server receives the DHCP requests and offers valid IPs, but the clients keep asking again every few seconds.
This seems to only be a problem for DHCP renewals, as the OP mentioned.
[2024-05-02 14:20:44 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.79] to [40-3F-8C-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:44 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.11] to [54-AF-97-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:45 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.35] to [E4-C3-2A-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:45 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.38] to [A4-2B-B0-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:45 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.6] to [40-3F-8C-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:45 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.28] to [14-EB-B6-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:46 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.36] to [E4-C3-2A-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:46 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.78] to [74-40-BE-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:46 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.11] to [54-AF-97-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:46 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.79] to [40-3F-8C-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:47 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.19] to [74-40-BE-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:47 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.35] to [E4-C3-2A-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:47 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.38] to [A4-2B-B0-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:47 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.34] to [14-EB-B6-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:48 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.36] to [E4-C3-2A-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:48 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.79] to [40-3F-8C-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:49 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.35] to [E4-C3-2A-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:49 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.35] to [E4-C3-2A-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:49 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.31] to [14-EB-B6-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:49 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.33] to [54-AF-97-XX-XX-XX] for scope: IOT
[2024-05-02 14:20:49 Local] [10.99.200.1:67] DHCP Server offered IP address [10.99.200.6] to [40-3F-8C-XX-XX-XX] for scope: IOT
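To make the churn easier to see, here is a quick shell sketch that counts offers per client MAC - a healthy client should show up once per lease cycle, not several times within a few seconds. The filename dhcp.log is a placeholder, and the "to [MAC]" field position is assumed from the excerpt above:

```shell
# Count DHCPOFFERs per client MAC, most frequent first.
# "dhcp.log" is a placeholder filename.
grep 'offered IP address' dhcp.log \
  | sed -E 's/.* to \[([^]]+)\].*/\1/' \
  | sort | uniq -c | sort -rn
```

A MAC with a high count is a client stuck re-requesting instead of renewing.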
Similar issue here as well. I use dhcrelay to forward to a FreeRADIUS box acting as the DHCP server. For the last few days (probably since the 24.1.6 update) dhcrelay has been broken. Here is a snapshot of the running dhcrelay processes:
# ps aux | grep dhcrelay
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
_dhcp 33776 100.0 0.0 12644 788 - R 21:33 526:09.32 /usr/local/sbin/dhcrelay -d -i vlan01 192.168.20.1
root 31008 0.0 0.0 12724 588 - Is 21:33 0:00.00 daemon: /usr/local/sbin/dhcrelay[31060] (daemon)
_dhcp 31060 0.0 0.0 12644 788 - I 21:33 0:00.00 /usr/local/sbin/dhcrelay -d -i vlan04 192.168.20.1
root 31789 0.0 0.0 12724 644 - Is 21:33 0:00.01 daemon: /usr/local/sbin/dhcrelay[32464] (daemon)
_dhcp 32464 0.0 0.1 12644 1076 - I 21:33 0:00.07 /usr/local/sbin/dhcrelay -d -i vlan03 192.168.20.1
root 32873 0.0 0.0 12724 592 - Is 21:33 0:00.01 daemon: /usr/local/sbin/dhcrelay[33776] (daemon)
root 93171 0.0 0.0 12724 652 - Is 21:35 0:00.00 daemon: /usr/local/sbin/dhcrelay[93896] (daemon)
_dhcp 93896 0.0 0.1 12644 1096 - I 21:35 0:00.00 /usr/local/sbin/dhcrelay -d -i vlan02 192.168.20.1
root 28592 0.0 0.0 436 256 0 R+ 08:20 0:00.00 grep dhcrelay
After I restarted the vlan01 dhcrelay service, things started working again. The process list now looks like this:
# ps aux | grep dhcrelay
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 4195 0.0 0.1 12724 2016 - Is 08:47 0:00.00 daemon: /usr/local/sbin/dhcrelay[4460] (daemon)
_dhcp 4460 0.0 0.1 12644 2144 - I 08:47 0:00.00 /usr/local/sbin/dhcrelay -d -i vlan01 192.168.20.1
root 31008 0.0 0.0 12724 588 - Is 21:33 0:00.00 daemon: /usr/local/sbin/dhcrelay[31060] (daemon)
_dhcp 31060 0.0 0.0 12644 788 - I 21:33 0:00.00 /usr/local/sbin/dhcrelay -d -i vlan04 192.168.20.1
root 31789 0.0 0.0 12724 644 - Is 21:33 0:00.02 daemon: /usr/local/sbin/dhcrelay[32464] (daemon)
_dhcp 32464 0.0 0.0 12644 1028 - I 21:33 0:00.07 /usr/local/sbin/dhcrelay -d -i vlan03 192.168.20.1
root 93171 0.0 0.0 12724 652 - Is 21:35 0:00.00 daemon: /usr/local/sbin/dhcrelay[93896] (daemon)
_dhcp 93896 0.0 0.0 12644 1000 - I 21:35 0:00.00 /usr/local/sbin/dhcrelay -d -i vlan02 192.168.20.1
root 53374 0.0 0.1 12720 2196 0 S+ 08:47 0:00.00 grep dhcrelay
Note: I suspect vlan02, vlan03 and vlan04 are not working all the time either, but I cannot verify this because my APs are on vlan01 and most of my clients are wireless.
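In case it helps others spot the same state, this one-liner flags any dhcrelay process pegged above 90% CPU, like PID 33776 in my first snapshot (a sketch; column 3 is %CPU in `ps aux` output):

```shell
# Print PID, %CPU and last argument of any dhcrelay process
# using more than 90% CPU; !/awk/ excludes this pipeline itself.
ps aux | awk '/dhcrelay/ && !/awk/ && $3+0 > 90 {print $2, $3, $NF}'
```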
Just want to share an observation from my side:
I tried to revert to 24.1.5, but this did not work, so I decided to do a fresh install of 24.1, upgrade it to 24.1.5 and restore my config afterwards. But after installing 24.1 I could not install the needed plugins - I received the message that my installation was outdated and that upgrading to 24.1.6 was required. So I was forced to upgrade to 24.1.6 again :-( I did the upgrade and restored my configuration.
And surprisingly, since the fresh install I have had no issues so far. Fingers crossed that it stays this way! :-)
If the current status remains and no further issues occur, I would assume that the problem was caused by a malfunction during the upgrade of my previous installation to 24.1.6.
Let's see what happens in the next few days! :-)
Seeing the same thing on Business 24.4
We have two relays, one for VLAN 10 and one for VLAN 50. VLAN 50 continues to work, while VLAN 10 drops randomly.
A quick disable/enable fixes it until it fails again
Finally, the issue occurred again. :'(
Interestingly, only one relay seemed to have these issues, but in the end I stopped investigating and gave up. I migrated the DHCP duty to the OPNsense itself, hoping that my OPNsense becomes as stable and reliable as it was before.
Cheers,
WM54
Been looking into this. Do we get any detached events here?
# opnsense-log | grep rc.linkup
or
# dmesg | grep link.state
Maybe the new dhcrelay client cannot cope with an address rewrite on the interface and loses the ability to forward from a "nonexistent" address. The same address being set twice appears as a new address to the kernel, causing the mismatch.
Cheers,
Franco
Are there further commands I can issue to troubleshoot? The ones below don't seem to show much.
In my case, igc0 = WAN, igc1 = LAN (trunk)
Thanks.
root@OPNsense:~ # opnsense-log | grep rc.linkup
root@OPNsense:~ #
root@OPNsense:~ # dmesg | grep link.state
igc0: link state changed to UP
igc1: link state changed to UP
lo0: link state changed to UP
igc1: link state changed to DOWN
igc0: link state changed to DOWN
igc1: link state changed to UP
igc1_vlan50: link state changed to UP
igc1_vlan20: link state changed to UP
igc1_vlan40: link state changed to UP
igc0: link state changed to UP
igc0: link state changed to DOWN
igc0: link state changed to UP
I am facing the same issue since 24.1.6
Well, so far what I did was move the DHCP server to a different VLAN, and I didn't enable the relay on that VLAN.
The config has been working for the last 15 days. My suspicion is that packets were looping for some reason while the server and the relay were on the same VLAN (based on someone's advice I read on the forum). But the catch is, I didn't have to do this on earlier versions, and it worked with no issues.
Hopefully it will not break in the near future.
Hello,
I have had the same issue since 24.1.6 (I just upgraded to 24.1.7).
Everything works for a while, then I get 100% CPU usage from one of the dhcp_relay processes, which blocks the whole DHCP service.
My setup:
Edge sites (x2):
- ESXi 8
- OPNsense VM as main gateway
- OPNsense VM as "helper" with DHCP relay for multiple VLANs
- Multiple VLANs
- Unifi switches
- Windows Server VM with DHCP server (as standby)
Central site:
- ESXi 8
- OPNsense VM as main gateway
- OPNsense VM as "helper" with DHCP relay for multiple VLANs
- Multiple VLANs
- Unifi switches
- Windows Server VM with DHCP server (as standby)
Site-to-site Wireguard VPN
No DHCP guarding whatsoever on Unifi side.
The OPNsense VMs (router and helper) all have an interface in each VLAN.
Target DHCP servers on edge sites are both the local and the central Windows DHCP server.
This setup worked flawlessly for months (if not years) before 24.1.6.
My Windows DHCP servers also serve the very VLAN where my OPNsense VMs and Windows servers have their management interfaces.
I tried deactivating the DHCP relay for this management VLAN as per https://forum.opnsense.org/index.php?topic=40126.0, thinking it would solve the problem (I could live with that workaround, even if it is not ideal and degraded compared to before 24.1.6).
But the issue still occurs now and then, and I have to restart the DHCP relay for some other VLANs to bring CPU usage back to normal.
My latest workaround is a daily reboot of the helper VM. Definitely not a bulletproof approach.
1°/ Would you know if the developer team is aware of the situation and working on it?
2°/ Would you know where I could find useful logs for the new DHCP relay service?
I've been very happy with the tremendous work around OPNsense.
It's the first time in years that I encounter such a blocking issue after an upgrade.
Thank you in advance.
PS: I submitted a bug report: https://github.com/opnsense/core/issues/7471
https://github.com/opnsense/core/issues/7471#issuecomment-2133085251
feedback on the latest binary is welcome...
I have just noticed that a device from the manufacturer EQ3 (Homematic IP Access Point [HmIP-HAP]) no longer works with dhcrelay. I'm not sure when this changed, but a power failure must have caused the device to request a new lease.
Thanks for the patch. Unfortunately, DHCP relay is still not working for me.
I can see the DHCP responses from the server, but they are not appearing on the correct interfaces.
root@OPNsense:/var/log/system # tcpdump -i igc1_vlan20 udp port 67 or port 68
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on igc1_vlan20, link-type EN10MB (Ethernet), capture size 262144 bytes
16:23:01.335025 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b6:b7:dc:11:41:9a (oui Unknown), length 300
16:23:09.951942 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b6:b7:dc:11:41:9a (oui Unknown), length 300
16:23:18.588958 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b6:b7:dc:11:41:9a (oui Unknown), length 300
16:23:26.376540 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b6:b7:dc:11:41:9a (oui Unknown), length 300
16:23:34.825601 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b6:b7:dc:11:41:9a (oui Unknown), length 300
16:23:44.910245 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b6:b7:dc:11:41:9a (oui Unknown), length 300
Quote from: franco on May 27, 2024, 01:16:37 PM
https://github.com/opnsense/core/issues/7471#issuecomment-2133085251
feedback on the latest binary is welcome...
The patch fixes an endless loop in the packet capture. It's not a "functional" fix, so the remaining breakage could also mean your setup is incorrect.
Cheers,
Franco
Thanks. I have put a SNAT workaround on my DHCP servers so that DHCPOFFER messages appear to come from the DHCP relay address. The offers were being dropped because their source address didn't match the configured relay address.
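For anyone interested, the rule looks roughly like this on a Linux DHCP server. This is a sketch, not my exact rule: 192.0.2.1 stands in for the relay address the relay config expects, and eth0 for the server's outbound interface.

```shell
# Rewrite the source address of DHCP replies (UDP source port 67)
# leaving this server so they appear to come from the relay address.
# 192.0.2.1 and eth0 are placeholders for your own values.
iptables -t nat -A POSTROUTING -o eth0 -p udp --sport 67 \
  -j SNAT --to-source 192.0.2.1
```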
Quote from: franco on May 29, 2024, 08:42:27 AM
The patch fixes an endless loop in the packet capture. It's not a "functional" fix or it could also mean your setup is incorrect.
You were the one on GitHub with the iptables workaround?
In any case, I'm not sure whether the older client handled this differently, or which behaviour is the correct one.
We will probably chase down one or two more problems in the mid-term, so any data point is appreciated.
Cheers,
Franco