OPNSense / Caddy - Runs on one system, won't start on another (corrupting key)

Started by Linwood, August 05, 2025, 02:08:51 AM

I have a weird issue.  I'm running OPNsense 25.7.1.1_1 on two separate machines (hardware, not virtual). One is a new, simple-minded Beelink EQi12, and it works there.

I have an older system, a Z170-WS motherboard with an i7-6700K.  It's a pretty powerful desktop and I was going to use it as a backup system. So I installed OPNsense, then all the plugins, then restored the configuration from the first system, changing the interface names.

Everything is fine -- almost.  Caddy won't start.  It does not give an error as it fails to start, it just disappears. The log file looks identical to the one on the other system.

A validation gives an error about missing TLS.  So I started looking around and in /var/db/caddy/data/caddy/certificates/temp is a key pair with a numeric name.  If I look at the key file on the working system it's a private key and looks right.

On the non-working system, about two-thirds of the way down the ASCII characters turn into binary junk.  So ... I think "something corrupted it, I'll just put it back".

So I edited the file, put it back correctly, and started Caddy and ... it disappears and the file is corrupted again.
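
For anyone wanting to pin down exactly where a key file stops being valid PEM text, here is a minimal Python sketch. The function name `first_bad_line` is made up for illustration, and the check is deliberately crude: it only looks for bytes outside the base64 alphabet and does not parse the key itself.

```python
import string

# Bytes allowed on a PEM body line: the base64 alphabet plus "=" padding.
_B64 = set((string.ascii_letters + string.digits + "+/=").encode())


def first_bad_line(data: bytes):
    """Return the 1-based number of the first line that is not PEM text, or None."""
    for number, line in enumerate(data.splitlines(), start=1):
        if line.startswith(b"-----") and line.endswith(b"-----"):
            continue  # BEGIN/END armor line
        if any(byte not in _B64 for byte in line):
            return number
    return None


# Possible usage on the firewall (path taken from the Caddyfile in this post):
#   with open("/var/db/caddy/data/caddy/certificates/temp/63a4a8494b255.key", "rb") as fh:
#       print(first_bad_line(fh.read()))
```

Running the same check on the working and the non-working system would show whether the damage always starts on the same line.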

I have no idea why or how.  So I deleted the plugin and re-installed it, and replaced the config file and rebooted and -- corrupted again.

There is no sign of any disk related issues, it's ZFS and happy.   The binary crap appears identical each time it gets corrupted (at least to the eye).

What makes this a bit more confusing is that the primary system is itself a copy from yet a different firewall system I replaced, and that one is just fine.  So it's not the process of restoring the config file.  Indeed if I replace the file all I have to do is start caddy to corrupt it, nothing else running.

I am using caddy only for proxy, it is not tied to ACME (I run that separately and it doesn't run during these issues, plus this isn't related to that cert I think).

Here's the log (but this is also the log, including that confusing host-checking error, on the working system).

<14>1 2025-08-04T19:59:32-04:00 OPNsense.leferguson.com caddy - - [meta sequenceId="1"] {"level":"info","ts":"2025-08-04T23:59:32Z","logger":"admin","msg":"admin endpoint started","address":"unix//var/run/caddy/caddy.sock|0220","enforce_origin":false,"origins":[]}
<12>1 2025-08-04T19:59:32-04:00 OPNsense.leferguson.com caddy - - [meta sequenceId="2"] {"level":"warn","ts":"2025-08-04T23:59:32Z","logger":"admin","msg":"admin endpoint on open interface; host checking disabled","address":"unix//var/run/caddy/caddy.sock|0220"}


This is the config file (slightly munged to remove private stuff):

# DO NOT EDIT THIS FILE -- OPNsense auto-generated file


# caddy_user=root

# Global Options
{
        log {
                include http.log.access.0920f7c1-fc06-4693-b451-93f7ec3e50d1
                output net unixgram//var/run/caddy/log.sock {
                }
                format json {
                        time_format rfc3339
                }
        }

        http_port  xxxx
        https_port qqqq

        servers {
                protocols h1 h2 h3
        }

        auto_https off
        grace_period 10s
        import /usr/local/etc/caddy/caddy.d/*.global
}

# Reverse Proxy Configuration


xxx.xxxxxxxxxxxxx.com:qqqq {
        log 0920f7c1-fc06-4693-b451-93f7ec3e50d1
        tls /var/db/caddy/data/caddy/certificates/temp/63a4a8494b255.pem /var/db/caddy/data/caddy/certificates/temp/63a4a8494b255.key {
        }

        handle {
                reverse_proxy 192.168.131.210:80 {
                        transport http {
                        }
                }
        }
}

import /usr/local/etc/caddy/caddy.d/*.conf

There are no config files in the import folder.

Any idea what is going on?  Or where to look?

Linwood

PS. On the working system, the reverse proxy does actually work, it's not just that caddy runs.

The certificates in the temp folder get extracted from config.xml.

You can run the script manually:

https://github.com/opnsense/plugins/blob/d51930bffad504573f570c1923b27a4e56d8a1f0/www/caddy/src/opnsense/scripts/OPNsense/Caddy/caddy_certs.php

It's in /usr/local/opnsense... followed by the path above.

If it does not extract the certificate correctly, I assume it's faulty in config.xml.

Try downloading the same cert from System - Trust - Certificates or System - Trust - Authorities and see if you get the same mangled result.
Hardware:
DEC740

Quote from: Monviech (Cedrik) on August 05, 2025, 06:59:00 AM
If it does not extract the certificate correctly, I assume it's faulty in config.xml.

I think I explained the problem poorly.

I've replaced the cert correctly, but as soon as I start the service it becomes corrupted.  It's not the config import.  I literally edited the file to get the right cert, and as soon as I hit the start button (a) it doesn't start, disappearing without error in the log, and (b) the cert key file is corrupt.

Can I somehow run it manually with output to an SSH session the same way it is run when hitting the service start?  I feel like there's an error not being captured that might give me a clue.

I just have no idea what is different between this machine and the other two where it ran fine (a very old HP I'm retiring and a new Beelink which is going to be the primary FW). This one is rather old, but it was powerful for its time and not unusual for its time.  While it is a "K" it is not overclocked, and it has been completely stable for many years. I relegated it to spare only because that processor is not supported on Windows (no TPM, and something about the processor itself).

I explained it above.

In your Caddy configuration, you set a certificate on a domain.

This is a certificate that is stored in "System - Trust - Certificates".

Upon each service reload, Caddy executes the above PHP script to extract the file from "System - Trust - Certificates" and put it into /var/db/caddy/.../certificates/temp/

That script uses the trust store as its source. Check whether the certificate in the trust store is corrupted by downloading it from the WebGUI under System - Trust - Certificates.

Quote from: Monviech (Cedrik) on August 05, 2025, 09:42:12 PM
Upon each service reload,

Ah, I was stuck on reboot not service reload.  Let me investigate (it takes a bit as I can't have the two up at the same time without making a mess, so need to run back and forth and wait for reboots a lot).  More in a bit.

Sorry about that and thanks.

Bingo.  I changed "re0" to "igb0" for the interfaces, thinking it was safe, and bang in the middle of one of the certs was that string.
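
For the curious, here is a small Python sketch of why the damage showed up as "readable text, then binary junk". It assumes the key is stored base64-encoded in config.xml (the thread does not say so explicitly) and uses made-up stand-in data rather than a real key. Swapping a three-character run for "igb0" shifts every later base64 character by one position, so everything decoded after that point is garbage, and it is the same garbage every time.

```python
import base64

# Stand-in for key text; 126 bytes, so the base64 form needs no "=" padding.
original = b"line-of-key-material\n" * 6
encoded = base64.b64encode(original).decode()

# Simulate the search and replace: 3 characters at offset 80 become "igb0".
damaged = encoded[:80] + "igb0" + encoded[83:]

# Trim to a multiple of 4 so the decoder accepts it (a lenient decoder
# would do something similar on its own).
usable = len(damaged) - len(damaged) % 4
decoded = base64.b64decode(damaged[:usable])

print(decoded[:60] == original[:60])  # bytes before the edit survive
print(decoded == original)            # everything after it does not
```

The first 80 base64 characters map to exactly 60 bytes, which is why the top of the file still looks like a normal key.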

Sigh.  Mea culpa. Global search and replace was a bit too global.
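
A less global alternative, sketched in Python: only touch the old name when it is the entire text of an XML element, so a match buried inside a long encoded blob is left alone. The function `rename_interface` and the sample element names are hypothetical, not taken from a real config.xml, and this would miss an interface name that appears inside a comma-separated list.

```python
import re


def rename_interface(xml_text: str, old: str, new: str) -> str:
    """Replace `old` only where it is the whole text of an element."""
    # Must be preceded by ">" and followed by "</", i.e. <tag>old</tag>.
    pattern = re.compile(r"(?<=>)" + re.escape(old) + r"(?=</)")
    return pattern.sub(lambda _match: new, xml_text)


sample = "<if>re0</if><prv>QUJDre0REVG</prv>"
print(rename_interface(sample, "re0", "igb0"))
```

Diffing the edited file against the original before restoring it would catch the same mistake.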

Thank you thank you.