DHCP issues and significant data load

Started by geden, January 05, 2024, 02:38:19 PM

I've been running OPNsense on a small Celeron box for about a year now and have a persistent issue I cannot figure out. This is my first experience with proper networking since token ring was a thing, so I'm still learning.

The box is running OPNsense 23.7.10 with the AdGuardHome, Unbound, and UniFi controller plugins from mimugmail's repository. The network consists of two clients on Ethernet and a number of wireless devices behind an AP. The connection to the ISP is through an IPv6 VLAN connection mapped to the WAN port. The only specific setting here is VLAN 101; all other values are default. (For AdGuard I used this approach: https://forum.opnsense.org/index.php?topic=22162.120)

The key issue appears whenever a significant load is placed on the connection by the wired clients; wireless clients do not cause issues. When this happens, the internet connection drops for the whole network and comes back relatively quickly afterwards. It doesn't take much: a graphics driver download or similar is usually enough.
I have tried with both Unbound and AdGuard enabled, and with AdGuard disabled and just using Google's DNS. The issue persists in both cases, but with different log entries.


With Unbound and AdGuardHome these are the errors:

2024-01-05T14:20:28 Error opnsense /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '1572''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 1572: No such process'
2024-01-05T14:20:18 Error opnsense /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '78123''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 78123: No such process'
2024-01-05T14:19:47 Error opnsense /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '34221''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 34221: No such process'



Without Unbound, using Google's DNS:

2024-01-05T14:04:50 Critical dhclient exiting.
2024-01-05T14:04:50 Error dhclient connection closed
2024-01-05T14:04:39 Error dhclient send_packet: No buffer space available
2024-01-05T13:59:27 Error opnsense /usr/local/etc/rc.newwanipv6: The command '/bin/kill -'TERM' '47346''(pid:/var/dhcpd/var/run/dhcpdv6.pid) returned exit code '1', the output was 'kill: 47346: No such process'
2024-01-05T13:59:24 Error opnsense /usr/local/etc/rc.newwanip: The command '/bin/kill -'TERM' '76719''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 76719: No such process'
2024-01-05T13:59:22 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '47346''(pid:/var/dhcpd/var/run/dhcpdv6.pid) returned exit code '1', the output was 'kill: 47346: No such process'
2024-01-05T13:59:19 Error dhcp6c transmit failed: No buffer space available
2024-01-05T13:59:18 Error dhcp6c transmit failed: No buffer space available
2024-01-05T13:59:17 Critical dhclient exiting.
2024-01-05T13:59:17 Error dhclient connection closed
2024-01-05T13:59:17 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '47346''(pid:/var/dhcpd/var/run/dhcpdv6.pid) returned exit code '1', the output was 'kill: 47346: No such process'


I'm a bit at a loss here. With Google's DNS it seems that the firewall has trouble getting an IP from the ISP's DHCP server, while with Unbound it seems as if Unbound is causing the issues. I am probably overlooking something very basic, but unfortunately I haven't been able to find a potential solution by searching.
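
For anyone comparing the two log excerpts above, a small script can tally which programs report `No buffer space available`. This is only a sketch in Python: it assumes each log line has the shape `timestamp severity program message` as pasted above, and the function names and the shortened sample are made up for illustration. In the pasted logs both `dhclient` and `dhcp6c` report the error, which suggests the failure sits in the send path below DNS rather than in Unbound or AdGuardHome.

```python
from collections import Counter
from datetime import datetime

# Shortened sample copied from the log excerpt in this post.
SAMPLE_LOG = """\
2024-01-05T14:04:50 Critical dhclient exiting.
2024-01-05T14:04:50 Error dhclient connection closed
2024-01-05T14:04:39 Error dhclient send_packet: No buffer space available
2024-01-05T13:59:19 Error dhcp6c transmit failed: No buffer space available
2024-01-05T13:59:18 Error dhcp6c transmit failed: No buffer space available
2024-01-05T13:59:17 Critical dhclient exiting.
"""


def parse_line(line):
    """Split one log line into (timestamp, severity, program, message)."""
    stamp, severity, program, message = line.split(" ", 3)
    return datetime.fromisoformat(stamp), severity, program, message


def buffer_errors_by_program(log_text):
    """Count 'No buffer space available' messages per reporting program."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between pasted excerpts
        _, _, program, message = parse_line(line)
        if "No buffer space available" in message:
            counts[program] += 1
    return counts


if __name__ == "__main__":
    for program, count in buffer_errors_by_program(SAMPLE_LOG).items():
        print(program, count)
```

Pasting a longer excerpt into `SAMPLE_LOG` works the same way, as long as every line keeps the four-field layout.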

Hi,

First question: is your WAN IPv6 only?
I doubt it is related to a DNS issue. If you do not have IPv4, my guess is that it tries to use IPv4, fails, and therefore struggles. But that is just a wild guess.

If you have IPv6 only, make sure to enter all values as IPv6 and do not use IPv4 at all.

If you have IPv6 on WAN only in addition to IPv4, it might be different.

/KNEBB

January 05, 2024, 03:05:27 PM #2 Last Edit: January 05, 2024, 03:10:45 PM by geden
Hi, and thank you.

Currently the WAN interface only shows an IPv4 address, though in the past it would display IPv6 as well. I recently received a global dynamic IPv4 address from my ISP. Prior to this I would only run IPv6 on the WAN interface.

Unfortunately, support from the ISP is non-existent and information is relatively sparse. I hope you will bear with me.

Edit: to clarify, on WAN there is both a DHCP and a DHCP6 gateway. I am receiving both IPv4 and IPv6 addresses from my ISP.

OK, in this case I suggest checking with `ping`.

See if ping gets through to the DNS server you received and to 1.1.1.1.
Do the same with `ping6` for IPv6, using IPv6 addresses instead, obviously.

If ping runs fine we can check further.
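
To repeat that check a few times and compare the loss figures, something like the following Python sketch could be used. It assumes `ping` and `ping6` are on the PATH and accept `-c` for the packet count, and that the summary line contains the text `% packet loss`. The helper names are made up for illustration.

```python
import re
import subprocess

# Matches the loss figure in ping's summary line, e.g. "0.0% packet loss".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_packet_loss(ping_output):
    """Return the packet-loss percentage from ping output, or None if absent."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def ping_loss(target, count=5, ipv6=False):
    """Run ping (or ping6) against target and return the reported loss in percent."""
    command = ["ping6" if ipv6 else "ping", "-c", str(count), target]
    result = subprocess.run(command, capture_output=True, text=True)
    return parse_packet_loss(result.stdout)


# Example call (not run on import):
#   ping_loss("1.1.1.1")
#   ping_loss("<IPv6 address of the DNS server>", ipv6=True)
```

A result of `None` means no summary line was found, for example because the target could not be resolved.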

/KNEBB

Both the IPv4 and IPv6 DNS servers (using Cloudflare servers) are reachable with ping. Running each several times, IPv6 had slight packet loss even with limited traffic on the network.
This is consistent with the gateway monitor reporting a few percent packet loss on the IPv6 connection. It is only sporadic though, not persistent.

I've tried to replicate and isolate the current errors, and this is a good example of these:
2024-01-05T22:57:33 Error opnsense /usr/local/etc/rc.newwanipv6: The command '/usr/sbin/daemon -f -p '/var/run/dhcpleases6.pid' '/usr/local/opnsense/scripts/dhcp/prefixes.sh'' returned exit code '3', the output was 'daemon: process already running, pid: 61299'
2024-01-05T22:57:33 Error opnsense /usr/local/etc/rc.newwanipv6: The command '/usr/local/sbin/dhcpd -6 -user dhcpd -group dhcpd -chroot /var/dhcpd -cf /etc/dhcpdv6.conf -pf /var/run/dhcpdv6.pid re1' returned exit code '1', the output was 'Internet Systems Consortium DHCP Server 4.4.3-P1 Copyright 2004-2022 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Config file: /etc/dhcpdv6.conf Database file: /var/db/dhcpd6.leases PID file: /var/run/dhcpdv6.pid There's already a DHCP server running. If you think you have received this message due to a bug rather than a configuration issue please read the section on submitting bugs on either our web page at www.isc.org or in the README file before submitting a bug. These pages explain the proper process and the information we find helpful for debugging. exiting.'
2024-01-05T22:57:18 Error opnsense /usr/local/etc/rc.newwanipv6: The command '/bin/kill -'TERM' '65153''(pid:/var/dhcpd/var/run/dhcpdv6.pid) returned exit code '1', the output was 'kill: 65153: No such process'
2024-01-05T22:57:13 Critical dhclient exiting.
2024-01-05T22:57:13 Error dhclient connection closed
2024-01-05T22:56:51 Error opnsense /usr/local/etc/rc.newwanipv6: The command '/bin/kill -'TERM' '17408''(pid:/var/dhcpd/var/run/dhcpdv6.pid) returned exit code '1', the output was 'kill: 17408: No such process'
2024-01-05T22:56:49 Error opnsense /usr/local/etc/rc.newwanip: The command '/bin/kill -'TERM' '93905''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 93905: No such process'
2024-01-05T22:56:47 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '17408''(pid:/var/dhcpd/var/run/dhcpdv6.pid) returned exit code '1', the output was 'kill: 17408: No such process'
2024-01-05T22:56:44 Error dhcp6c transmit failed: No buffer space available
2024-01-05T22:56:43 Error dhcp6c transmit failed: No buffer space available
2024-01-05T22:56:42 Critical dhclient exiting.
2024-01-05T22:56:42 Error dhclient connection closed
2024-01-05T22:56:42 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '17408''(pid:/var/dhcpd/var/run/dhcpdv6.pid) returned exit code '1', the output was 'kill: 17408: No such process'
2024-01-05T22:55:20 Critical dhclient exiting.
2024-01-05T22:55:20 Error dhclient connection closed
2024-01-05T22:55:06 Error dhclient send_packet: No buffer space available

OK, try to `ping` the gateways (my fault, I initially meant the gateways, not the DNS servers).

And use a packet capture to check whether the ICMP requests leave on the right WAN interface.

January 06, 2024, 05:31:08 PM #6 Last Edit: January 06, 2024, 05:34:13 PM by geden
Please, I'm just happy for the feedback!

Both the IPv6 and IPv4 gateways are responding with relatively normal latency. Of course, when the fault is occurring (e.g. during a large Steam download), the gateways are not responsive.

Edit: and by ensuring the correct WAN interface, I assume you mean this for IPv6:
PING6(56=40+8+8 bytes) myIPv6Address%vlan01 --> GatewayIPAddressFromISP

In case anyone stumbles upon this, it turned out to be a hardware issue. The Realtek NICs apparently could not handle the high throughput of a fibre connection. I switched to a different box with dual Intel 82574L NICs and the issue went away.
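
For anyone wanting to check which NIC driver they are on: FreeBSD names interfaces as driver plus unit number, so the `re1` in the `dhcpd` command in the logs above points at the Realtek `re` driver, while the Intel 82574L is handled by `em`. A rough Python sketch of that lookup follows; the table is a small illustrative selection and the function names are made up.

```python
import re

# Small, non-exhaustive table of FreeBSD NIC driver prefixes (illustrative only).
KNOWN_DRIVERS = {
    "re": "Realtek",
    "em": "Intel PRO/1000 family, including the 82574L",
    "igb": "Intel gigabit",
}


def split_interface(name):
    """Split a name like 're1' into ('re', 1); return None if it does not fit."""
    match = re.fullmatch(r"([a-z]+)(\d+)", name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def describe(name):
    """Return a one-line description of the driver behind an interface name."""
    parsed = split_interface(name)
    if parsed is None:
        return f"{name}: not a driver+unit name"
    driver, unit = parsed
    vendor = KNOWN_DRIVERS.get(driver, "driver not in this table")
    return f"{name}: unit {unit} of {driver} ({vendor})"


if __name__ == "__main__":
    for interface in ("re1", "em0"):
        print(describe(interface))
```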