Messages - Multiplex

System Admin here, looking for a permanent fix for the following issue (without manual system file edits). Since networking isn't my primary focus, I've used AI to help document the technical details, which I've pasted below.

## Environment

- **OPNsense version:** 26.1.6_2-amd64
- **FreeBSD:** 14.3-RELEASE-p10
- **OpenSSL:** 3.0.20
- **ISP:** Optus NBN Australia
- **WAN:** DHCP with 900 second lease (renews every 7.5 minutes)
- **Unbound:** DNS over TLS (DoT) to 1.1.1.1:853 and 9.9.9.9:853
- **Downstream:** Two Pi-hole instances forwarding to Unbound at 192.168.9.254:53
  - Pi-hole Core v6.4.1
  - FTL v6.6
  - Web Interface v6.5

---

## The Problem

Both Pi-hole instances intermittently log the following warning:

```
WARNING Connection error (192.168.9.254#53): TCP connection failed while
receiving payload length from upstream (Connection prematurely closed by
remote server)
```
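For context, the "payload length" in this warning is the two-byte, big-endian length field that precedes every DNS message sent over TCP (RFC 1035, section 4.2.2; RFC 7766). The warning therefore means the stream ended before those two bytes arrived. The sketch below shows that read. `read_tcp_length` is a hypothetical name, and stdin stands in for the socket purely for illustration.

```sh
# read_tcp_length: read the two-byte, big-endian length field that precedes
# each DNS message on a TCP stream. stdin stands in for the socket here.
read_tcp_length() {
    # Keep only the hex digits od prints for (at most) the first two bytes.
    hex=$(od -An -N2 -tx1 | tr -cd '0-9a-f')
    if [ "${#hex}" -ne 4 ]; then
        # Fewer than two bytes before end-of-stream: the peer closed first.
        echo "connection closed before length field" >&2
        return 1
    fi
    echo $(( 0x$hex ))
}

printf '\000\035' | read_tcp_length   # prints 29
```

An empty or one-byte input returns 1. This is the shell equivalent of the "prematurely closed" condition FTL reports.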

The errors recur at roughly the same 7.5-minute cadence as the Optus DHCP lease renewals, although the exact gap between successive errors varies.
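The 7.5-minute figure follows from the lease time. This assumes Optus does not send its own renewal time (DHCP option 58). In that case the client renews at T1, which RFC 2131 (section 4.4.5) defaults to half the lease duration. `renewal_interval` is a hypothetical helper for the arithmetic:

```sh
# renewal_interval LEASE_SECONDS: default DHCP renewal time T1, i.e. half
# the lease duration (integer seconds), printed as seconds and as m/s.
renewal_interval() {
    t1=$(( $1 / 2 ))
    printf '%ss (%dm%02ds)\n' "$t1" $(( t1 / 60 )) $(( t1 % 60 ))
}

renewal_interval 900   # prints: 450s (7m30s)
```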

---

## Investigation

### Step 1 — Ruled out Unbound restarts

Checked the resolver log for Unbound stop/start events during the error windows:

```
grep "start of service\|service stopped" /var/log/resolver/latest.log
```

Unbound was **not restarting** during the error windows, so the TCP connections were being closed while Unbound itself kept running.
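The same check can be wrapped so it can be narrowed to a time window and treats "no matches" as a normal result. This is a minimal sketch. `unbound_lifecycle_events` is a hypothetical helper, and `grep -E` is used so the alternation behaves the same under BSD and GNU grep.

```sh
# unbound_lifecycle_events [TEXT]: print Unbound start/stop lines from a
# resolver log on stdin, optionally only those that also contain TEXT
# (for example a timestamp fragment). An empty TEXT matches every line.
# "|| true" makes an empty result succeed instead of returning grep's 1.
unbound_lifecycle_events() {
    grep -E 'start of service|service stopped' | grep -F -- "${1:-}" || true
}
```

For example, `unbound_lifecycle_events 2026-05-09T15:4 < /var/log/resolver/latest.log` lists any start/stop events between 15:40 and 15:49. Empty output for each error window supports the conclusion above.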

---

### Step 2 — Found the DHCP correlation

Cross-referencing the Pi-hole FTL log against the OPNsense system log revealed a consistent pattern:

**System log:**
```
2026-05-09T15:33:59 dhclient-script: Reason RENEW on igc0 executing
2026-05-09T15:33:59 dhclient-script: Creating resolv.conf
2026-05-09T15:41:29 dhclient-script: Reason RENEW on igc0 executing
2026-05-09T15:41:29 dhclient-script: Creating resolv.conf
2026-05-09T15:48:59 dhclient-script: Reason RENEW on igc0 executing
2026-05-09T15:48:59 dhclient-script: Creating resolv.conf
```

**Pi-hole FTL log:**
```
2026-05-09 15:38:15 WARNING Connection error (192.168.9.254#53): TCP connection failed
2026-05-09 15:41:46 WARNING Connection error (192.168.9.254#53): TCP connection failed
2026-05-09 15:50:07 WARNING Connection error (192.168.9.254#53): TCP connection failed
```

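Every TCP error in this sample follows a DHCP RENEW, after a delay ranging from 17 seconds to just over four minutes. In the logs reviewed, no error occurred without a preceding RENEW.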
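The correlation can be computed rather than eyeballed by measuring the gap between each error and the most recent RENEW before it. This is a minimal sketch. `renew_delays` is a hypothetical helper, and it assumes the timestamps have already been cut down to `HH:MM:SS` on the same day and that both logs use the same clock and time zone. The FTL and system logs above use different timestamp formats, so that alignment is an assumption.

```sh
# renew_delays RENEWS ERRORS: for each error time, print the most recent
# RENEW at or before it and the gap in seconds. Both arguments are
# space-separated HH:MM:SS lists from one day; RENEWS must be ascending.
renew_delays() {
    awk -v renews="$1" -v errors="$2" '
        function secs(t,    p) {            # HH:MM:SS -> seconds since midnight
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        BEGIN {
            nr = split(renews, r, " ")
            ne = split(errors, e, " ")
            for (i = 1; i <= ne; i++) {
                es = secs(e[i])
                last = ""
                for (j = 1; j <= nr; j++)
                    if (secs(r[j]) <= es)
                        last = r[j]
                if (last == "")
                    print e[i], "no-preceding-renew"
                else
                    print e[i], last, es - secs(last)
            }
        }'
}

renew_delays "15:33:59 15:41:29 15:48:59" "15:38:15 15:41:46 15:50:07"
```

With the sample timestamps above, this prints gaps of 256, 17 and 68 seconds. The variation fits FTL only noticing a closed connection when it next tries to reuse it.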

---

### Step 3 — Confirmed IP is not changing

The DHCP lease file shows the 900-second lease time:

```
option dhcp-lease-time 900;
```

The cached IP file confirms the WAN address has not changed:
```
cat /tmp/igc0_oldip
175.32.32.48
```

The renewal is a simple lease extension — the IP, gateway, and DNS servers do not change.
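The lease file can also back up the IP, gateway and DNS claims directly, by listing the distinct values of each field across the leases it still holds. This is a minimal sketch. `lease_values` is a hypothetical helper. The usage below assumes FreeBSD dhclient's default lease path, `/var/db/dhclient.leases.<interface>`, which OPNsense may override.

```sh
# lease_values KEY: print each distinct value of KEY (for example
# fixed-address, routers, domain-name-servers) found in a dhclient lease
# file read from stdin, in order of first appearance.
lease_values() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop indentation
            sub(/^option[ \t]+/, "", line)    # "option routers x;" -> "routers x;"
            if (index(line, key " ") == 1) {
                val = substr(line, length(key) + 2)
                sub(/;[ \t]*$/, "", val)      # drop the trailing semicolon
                if (!(val in seen)) {
                    seen[val] = 1
                    print val
                }
            }
        }'
}
```

For example, `lease_values fixed-address < /var/db/dhclient.leases.igc0` should print a single address if every lease the file retains used the same IP. `routers` and `domain-name-servers` can be checked the same way. The check only covers the leases the file still contains.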

---

### Step 4 — Traced the dhclient script

Reading `/usr/local/opnsense/scripts/interfaces/dhclient-script`, the `BOUND|RENEW|REBIND|REBOOT` block contains the following:

```sh
changes="no"
if [ -n "$old_ip_address" ]; then
    if [ "$old_ip_address" != "$new_ip_address" ]; then
        delete_old_address
        delete_old_routes
        changes="yes"
    fi
fi
if [ "$reason" = BOUND ] || \
   [ "$reason" = REBOOT ] || \
   [ -z "$old_ip_address" ] || \
   [ "$old_ip_address" != "$new_ip_address" ]; then
    add_new_address
    add_new_routes
    changes="yes"
fi
add_new_resolv_conf        # <-- called unconditionally on every RENEW
if [ "$changes" = "yes" ] ; then
    /usr/local/sbin/configctl -d interface newip $interface force
fi
```

`add_new_resolv_conf` is called **unconditionally** on every RENEW, regardless of whether anything has actually changed. When the IP has not changed, `changes` remains `"no"` and `configctl interface newip` is correctly skipped — but `add_new_resolv_conf` still runs every time.
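That control flow can be checked without touching the live script by replaying the quoted block with stub helpers. This is a minimal sketch. `trace_dhclient_block` and `record` are hypothetical names, the stubs only note which helper would have run, and `configctl_newip` stands for the `configctl -d interface newip` call.

```sh
# Hypothetical replay of the BOUND|RENEW|REBIND|REBOOT logic quoted above.
# Real helpers are replaced by record(), which only notes the helper's name.
record() { calls="${calls:+$calls }$1"; }

trace_dhclient_block() {
    reason=$1 old_ip_address=$2 new_ip_address=$3
    calls=""
    changes="no"
    if [ -n "$old_ip_address" ]; then
        if [ "$old_ip_address" != "$new_ip_address" ]; then
            record delete_old_address
            record delete_old_routes
            changes="yes"
        fi
    fi
    if [ "$reason" = BOUND ] || [ "$reason" = REBOOT ] || \
       [ -z "$old_ip_address" ] || [ "$old_ip_address" != "$new_ip_address" ]; then
        record add_new_address
        record add_new_routes
        changes="yes"
    fi
    record add_new_resolv_conf          # unconditional, as in the original
    if [ "$changes" = "yes" ]; then
        record configctl_newip          # stands in for configctl ... newip
    fi
    echo "$calls"
}

trace_dhclient_block RENEW 192.0.2.10 192.0.2.10   # prints: add_new_resolv_conf
trace_dhclient_block RENEW 192.0.2.10 192.0.2.20
```

A RENEW with an unchanged address runs nothing except `add_new_resolv_conf`. An address change runs the full sequence, ending with the `newip` call.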

---

### Step 5 — Traced what add_new_resolv_conf does

```sh
add_new_resolv_conf()
{
    $LOGGER "Creating resolv.conf"
    ARGS="-i ${interface} -4nd"
    for nameserver in ${new_domain_name_servers}; do
        ARGS="${ARGS} -a ${nameserver}"
    done
    /usr/local/sbin/ifctl ${ARGS}
    /usr/local/sbin/ifctl -i ${interface} -4sd ${new_domain_name:+"-a ${new_domain_name}"}
    return 0
}
```

This calls `ifctl` to rewrite the interface nameserver state on every RENEW, even when the nameservers have not changed. The working theory is that this update briefly disrupts Unbound's **outgoing DNS over TLS connections** to 1.1.1.1:853 and 9.9.9.9:853. The exact path from the `ifctl` call to Unbound has not been traced.
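The disruption may be triggered by the act of rewriting rather than by any change in content. A generic way to avoid that is to compare before writing. The sketch below is hypothetical and is not how `ifctl` works. `write_if_changed` writes a temporary file next to the target and leaves the original untouched, including its modification time, when the content is identical. Otherwise it replaces the original with a rename within the same directory.

```sh
# write_if_changed FILE: replace FILE with the data on stdin only when it
# differs from FILE's current content. Returns 0 if FILE was (re)written,
# 1 if it was already identical, 2 if the temporary file could not be made.
write_if_changed() {
    tmp=$(mktemp "$1.XXXXXX") || return 2
    cat > "$tmp"
    if [ -f "$1" ] && cmp -s "$tmp" "$1"; then
        rm -f "$tmp"             # identical: leave FILE and its mtime alone
        return 1
    fi
    mv -f "$tmp" "$1"            # same directory, so this is an atomic rename
}
```

`mktemp` creates the temporary file with mode 0600, so a real implementation would also need to preserve the original file's permissions.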

---

### Step 6 — Connected to Pi-hole FTL behaviour

Pi-hole FTL v6 holds TCP connections open for reuse as a performance feature. During the DoT disruption window caused by `add_new_resolv_conf`, Unbound closes the inbound TCP connection from Pi-hole. FTL doesn't detect the closure until it tries to reuse the connection — at which point it logs the warning.

The Pi-hole dnsmasq log shows the failed connection and the affected query being answered with REFUSED. The next forwarded query, handled by a separate dnsmasq process, succeeds immediately over a new connection:

```
11:44:13 dnsmasq[928]: TCP connection failed: Connection prematurely closed by remote server
11:44:13 dnsmasq[928]: config error is REFUSED (EDE: network error)
11:44:13 dnsmasq[929]: forwarded rumt-zh.com to 192.168.9.254
11:44:13 dnsmasq[929]: reply rumt-zh.com is 113.240.76.236
```

Apart from the individual query that hits the closed connection, DNS resolution continues normally. This is a reliability and log-noise issue rather than a complete outage.

---

## Things Tried That Did Not Fix It

- `edns-tcp-keepalive: yes` on Unbound: Pi-hole FTL does not send the EDNS keepalive option in its requests, so Unbound never includes it in responses
- `tcp-idle-timeout: 120000` (milliseconds, i.e. 120 s) on Unbound: does not address the DoT disruption triggered by RENEW
- Outgoing Network Interfaces → All: stopped Unbound from fully restarting on RENEW, but `add_new_resolv_conf` still disrupts DoT
- `serve-expired-client-timeout` adjustments: no effect
- `edns-packet-max=1232` on Pi-hole: reduces TCP usage but does not eliminate it

---

## Suggested Fix

Wrap `add_new_resolv_conf` in a condition so that it runs only on `BOUND`/`REBOOT` or when the IP address actually changes, rather than on every `RENEW` where nothing has changed:

```sh
# Current (runs unconditionally on every RENEW):
add_new_resolv_conf

# Suggested (only runs when something has actually changed):
if [ "$changes" = "yes" ] || [ "$reason" = "BOUND" ] || [ "$reason" = "REBOOT" ]; then
    add_new_resolv_conf
fi
```

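This prevents `ifctl` from needlessly rewriting the interface nameserver state on RENEW events where the IP address is unchanged. However, `changes` only tracks the IP address. A RENEW that brings new DNS servers with the same IP would therefore also be skipped, leaving the old servers in place until the next `BOUND`, `REBOOT` or address change.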
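A stricter variant could also refresh the resolver state when the DNS servers or domain name change at RENEW while the IP stays the same. This is a minimal sketch that assumes FreeBSD's dhclient exports the previous lease's options to the script with an `old_` prefix, as it does for `old_ip_address`. That would make `old_domain_name_servers` and `old_domain_name` available. `should_refresh_resolv` is a hypothetical helper name.

```sh
# should_refresh_resolv: succeed when add_new_resolv_conf should run for this
# dhclient event. Assumes dhclient exported old_* and new_* lease variables
# and that $changes was set by the preceding address/route block.
should_refresh_resolv() {
    case "$reason" in
        BOUND|REBOOT) return 0 ;;
    esac
    [ "$changes" = "yes" ] && return 0
    [ "$old_domain_name_servers" != "$new_domain_name_servers" ] && return 0
    [ "$old_domain_name" != "$new_domain_name" ] && return 0
    return 1
}

# Intended use inside the BOUND|RENEW|REBIND|REBOOT block:
#   if should_refresh_resolv; then
#       add_new_resolv_conf
#   fi
```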

---

## Note

I have not applied this fix directly as the file `/usr/local/opnsense/scripts/interfaces/dhclient-script` is owned by an OPNsense package and would be overwritten on upgrades. Ideally this would be addressed in the script itself in a future release.

Happy to provide any additional logs or information if helpful.