Unbound TCP drops every 7.5min due to DHCP RENEW calling add_new_resolv_conf

Started by Multiplex, Today at 08:58:52 AM

System Admin here, looking for a permanent fix for the following issue (without manual system file edits). Since networking isn't my primary focus, I've used AI to help document the technical details, which I've pasted below.

## Environment

- **OPNsense version:** 26.1.6_2-amd64
- **FreeBSD:** 14.3-RELEASE-p10
- **OpenSSL:** 3.0.20
- **ISP:** Optus NBN Australia
- **WAN:** DHCP with 900 second lease (renews every 7.5 minutes)
- **Unbound:** DNS over TLS (DoT) to 1.1.1.1:853 and 9.9.9.9:853
- **Downstream:** Two Pi-hole instances forwarding to Unbound at 192.168.9.254:53
  - Pi-hole Core v6.4.1
  - FTL v6.6
  - Web Interface v6.5

---

## The Problem

Both Pi-hole instances intermittently log the following warning:

```
WARNING Connection error (192.168.9.254#53): TCP connection failed while
receiving payload length from upstream (Connection prematurely closed by
remote server)
```

The errors recur about every 7.5 minutes, tracking the Optus DHCP lease renewal interval.
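
The 7.5-minute figure is consistent with the default DHCP renewal timer (T1), which RFC 2131 sets to half the lease time unless the server overrides it. A quick check of that arithmetic, assuming Optus does not send a custom T1:

```sh
#!/bin/sh
# Default DHCP renewal time (T1) is 0.5 * lease time per RFC 2131.
# Assumption: the ISP does not override T1 with its own value.
lease=900                      # option dhcp-lease-time, in seconds
t1=$((lease / 2))
renew_interval="${t1}s ($((t1 / 60))m $((t1 % 60))s)"
echo "renew every $renew_interval"
# prints: renew every 450s (7m 30s)
```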

---

## Investigation

### Step 1 — Ruled out Unbound restarts

Checked the resolver log for Unbound stop/start events during the error windows:

```
grep -E "start of service|service stopped" /var/log/resolver/latest.log
```

Unbound was **not restarting** during the error windows. The TCP connection was being closed without Unbound being killed.

---

### Step 2 — Found the DHCP correlation

Cross-referencing the Pi-hole FTL log against the OPNsense system log revealed a consistent pattern:

**System log:**
```
2026-05-09T15:33:59 dhclient-script: Reason RENEW on igc0 executing
2026-05-09T15:33:59 dhclient-script: Creating resolv.conf
2026-05-09T15:41:29 dhclient-script: Reason RENEW on igc0 executing
2026-05-09T15:41:29 dhclient-script: Creating resolv.conf
2026-05-09T15:48:59 dhclient-script: Reason RENEW on igc0 executing
2026-05-09T15:48:59 dhclient-script: Creating resolv.conf
```

**Pi-hole FTL log:**
```
2026-05-09 15:38:15 WARNING Connection error (192.168.9.254#53): TCP connection failed
2026-05-09 15:41:46 WARNING Connection error (192.168.9.254#53): TCP connection failed
2026-05-09 15:50:07 WARNING Connection error (192.168.9.254#53): TCP connection failed
```

Every TCP error follows a DHCP RENEW, typically within seconds to a few minutes rather than at the same instant. No RENEW = no error.
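
To make the lag between each RENEW and the following error explicit, here is a small sketch that pairs the timestamps from the two log excerpts. The timestamps are copied from the logs above; `to_secs` and `lag_report` are hypothetical helper names, and the script assumes all events fall on the same day.

```sh
#!/bin/sh
# Sample timestamps copied from the system log and the FTL log above.
renews="15:33:59 15:41:29 15:48:59"
errors="15:38:15 15:41:46 15:50:07"

to_secs() {
    # Convert HH:MM:SS to seconds since midnight.
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

lag_report() {
    for e in $errors; do
        es=$(to_secs "$e")
        last="" lag=""
        for r in $renews; do
            rs=$(to_secs "$r")
            # Keep the latest RENEW at or before this error.
            if [ "$rs" -le "$es" ]; then
                last="$r"
                lag=$((es - rs))
            fi
        done
        echo "error $e follows RENEW ${last:-none} by ${lag:-?}s"
    done
}

lag_report
```

For the sample data the lags come out at 256 s, 17 s and 68 s, so every error falls inside the 450-second window that follows a RENEW.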

---

### Step 3 — Confirmed IP is not changing

The DHCP lease file shows the 900-second lease time:

```
option dhcp-lease-time 900;
```

The cached IP file confirms the WAN address is unchanged across renewals:
```
cat /tmp/igc0_oldip
175.32.32.48
```

The renewal is a simple lease extension — the IP, gateway, and DNS servers do not change.

---

### Step 4 — Traced the dhclient script

Reading `/usr/local/opnsense/scripts/interfaces/dhclient-script`, the `BOUND|RENEW|REBIND|REBOOT` block contains the following:

```sh
changes="no"
if [ -n "$old_ip_address" ]; then
    if [ "$old_ip_address" != "$new_ip_address" ]; then
        delete_old_address
        delete_old_routes
        changes="yes"
    fi
fi
if [ "$reason" = BOUND ] || \
   [ "$reason" = REBOOT ] || \
   [ -z "$old_ip_address" ] || \
   [ "$old_ip_address" != "$new_ip_address" ]; then
    add_new_address
    add_new_routes
    changes="yes"
fi
add_new_resolv_conf        # <-- called unconditionally on every RENEW
if [ "$changes" = "yes" ] ; then
    /usr/local/sbin/configctl -d interface newip $interface force
fi
```

`add_new_resolv_conf` is called **unconditionally** on every RENEW, regardless of whether anything has actually changed. When the IP has not changed, `changes` remains `"no"` and `configctl interface newip` is correctly skipped — but `add_new_resolv_conf` still runs every time.
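
The control flow can be checked without touching the firewall by replacing each action with a stub that only records its name. This is a dry-run sketch of the block quoted above, not the shipped script: `record`, `simulate` and the `configctl_newip` label are hypothetical names, and 198.51.100.7 is a documentation placeholder address.

```sh
#!/bin/sh
# Dry-run of the BOUND|RENEW|REBIND|REBOOT block: every real action is
# replaced by a stub that only records its name.
record() { actions="$actions $1"; }
delete_old_address()  { record delete_old_address; }
delete_old_routes()   { record delete_old_routes; }
add_new_address()     { record add_new_address; }
add_new_routes()      { record add_new_routes; }
add_new_resolv_conf() { record add_new_resolv_conf; }

simulate() {
    reason=$1 old_ip_address=$2 new_ip_address=$3
    actions="" changes="no"
    if [ -n "$old_ip_address" ] && [ "$old_ip_address" != "$new_ip_address" ]; then
        delete_old_address
        delete_old_routes
        changes="yes"
    fi
    if [ "$reason" = BOUND ] || [ "$reason" = REBOOT ] || \
       [ -z "$old_ip_address" ] || \
       [ "$old_ip_address" != "$new_ip_address" ]; then
        add_new_address
        add_new_routes
        changes="yes"
    fi
    add_new_resolv_conf            # no guard, same as the original block
    if [ "$changes" = "yes" ]; then
        record configctl_newip     # stands in for the configctl call
    fi
    echo "${actions# }"
}

simulate RENEW 175.32.32.48 175.32.32.48   # unchanged IP
simulate RENEW 175.32.32.48 198.51.100.7   # hypothetical new IP
```

With an unchanged IP the only recorded action is `add_new_resolv_conf`, which matches the "Creating resolv.conf" line that appears in the system log on every RENEW.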

---

### Step 5 — Traced what add_new_resolv_conf does

```sh
add_new_resolv_conf()
{
    $LOGGER "Creating resolv.conf"
    ARGS="-i ${interface} -4nd"
    for nameserver in ${new_domain_name_servers}; do
        ARGS="${ARGS} -a ${nameserver}"
    done
    /usr/local/sbin/ifctl ${ARGS}
    /usr/local/sbin/ifctl -i ${interface} -4sd ${new_domain_name:+"-a ${new_domain_name}"}
    return 0
}
```

This calls `ifctl` to update the interface nameserver state on every RENEW. Even though the nameservers haven't changed, the timing indicates that this briefly disrupts Unbound's **outgoing DNS over TLS connections** to 1.1.1.1:853 and 9.9.9.9:853.
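
To see what the function actually hands to `ifctl`, the sketch below builds the same argument strings but prints them instead of executing anything. The flags are copied from the function above; `show_ifctl_calls` is a hypothetical name, and the nameservers and domain are documentation placeholders rather than the real Optus values.

```sh
#!/bin/sh
# Print the ifctl command lines that add_new_resolv_conf would run,
# instead of executing them.
show_ifctl_calls() {
    interface=$1 new_domain_name=$2
    shift 2
    ARGS="-i ${interface} -4nd"
    for nameserver in "$@"; do
        ARGS="${ARGS} -a ${nameserver}"
    done
    echo "/usr/local/sbin/ifctl ${ARGS}"
    echo "/usr/local/sbin/ifctl -i ${interface} -4sd${new_domain_name:+ -a ${new_domain_name}}"
}

# Placeholder values: 192.0.2.x and example.net are not the real lease data.
show_ifctl_calls igc0 example.net 192.0.2.1 192.0.2.2
```

Identical lease data produces identical command lines, so on an unchanged RENEW the two calls rewrite state that is already in place.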

---

### Step 6 — Connected to Pi-hole FTL behaviour

Pi-hole FTL v6 holds TCP connections open for reuse as a performance feature. During the DoT disruption window caused by `add_new_resolv_conf`, Unbound closes the inbound TCP connection from Pi-hole. FTL doesn't detect the closure until it tries to reuse the connection — at which point it logs the warning.

The Pi-hole dnsmasq log confirms the connection fails and then immediately succeeds on retry with a new connection:

```
11:44:13 dnsmasq[928]: TCP connection failed: Connection prematurely closed by remote server
11:44:13 dnsmasq[928]: config error is REFUSED (EDE: network error)
11:44:13 dnsmasq[929]: forwarded rumt-zh.com to 192.168.9.254
11:44:13 dnsmasq[929]: reply rumt-zh.com is 113.240.76.236
```

DNS resolution continues normally — this is a reliability and logging noise issue rather than a complete outage.

---

## Things Tried That Did Not Fix It

- `edns-tcp-keepalive: yes` on Unbound — Pi-hole FTL does not send the EDNS keepalive option in requests so Unbound never includes it in responses
- `tcp-idle-timeout: 120000` on Unbound — does not address DoT disruption triggered by RENEW
- Outgoing Network Interfaces → All — stopped Unbound from fully restarting on RENEW but `add_new_resolv_conf` still disrupts DoT
- `serve-expired-client-timeout` adjustments — no effect
- `edns-packet-max=1232` on Pi-hole — reduces TCP usage but does not eliminate it

---

## Suggested Fix

Wrap `add_new_resolv_conf` with a condition so it only runs when the IP actually changes, or on first `BOUND`/`REBOOT` — not on every `RENEW` when nothing has changed:

```sh
# Current (runs unconditionally on every RENEW):
add_new_resolv_conf

# Suggested (only runs when something has actually changed):
if [ "$changes" = "yes" ] || [ "$reason" = "BOUND" ] || [ "$reason" = "REBOOT" ]; then
    add_new_resolv_conf
fi
```

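This prevents `ifctl` from unnecessarily updating the interface nameserver state on RENEW events where the IP, gateway, and nameservers are identical to what was already configured. One caveat: the condition only tracks the IP address, so a RENEW that delivers new nameservers with the same IP would also be skipped. A complete fix should compare the old and new nameserver values as well.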
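
A guard that also compares the resolver-related lease values could look like the sketch below. It assumes dhclient exports `old_domain_name_servers` and `old_domain_name` alongside the `new_` variables on RENEW, as described in dhclient-script(8); `resolv_conf_needs_update` is a hypothetical helper, not something in the shipped script, and the values are placeholders.

```sh
#!/bin/sh
# Hypothetical helper: succeed only when the resolver state could differ
# from what the previous lease configured.
resolv_conf_needs_update() {
    [ "$changes" = "yes" ] && return 0
    [ "$old_domain_name_servers" != "$new_domain_name_servers" ] && return 0
    [ "$old_domain_name" != "$new_domain_name" ] && return 0
    return 1
}

# Stand-in so the sketch runs outside dhclient-script.
add_new_resolv_conf() { echo "Creating resolv.conf"; }

# Example: a RENEW with identical IP, nameservers and domain.
changes="no"
old_domain_name_servers="192.0.2.1 192.0.2.2"
new_domain_name_servers="192.0.2.1 192.0.2.2"
old_domain_name="example.net"
new_domain_name="example.net"

if resolv_conf_needs_update; then
    add_new_resolv_conf
else
    echo "resolv.conf unchanged, skipping"
fi
```

If the `old_` variables turn out to be unset, the comparison sees a difference and `add_new_resolv_conf` runs, so the guard falls back to the current behaviour.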

---

## Note

I have not applied this fix directly as the file `/usr/local/opnsense/scripts/interfaces/dhclient-script` is owned by an OPNsense package and would be overwritten on upgrades. Ideally this would be addressed in the script itself in a future release.

Happy to provide any additional logs or information if helpful.