no WAN, caddy hangs at startup

Started by dirtyfreebooter, June 22, 2024, 04:30:56 PM

i was debugging a WAN problem and had to reboot OPNsense (24.1.9_4). the WAN was dropping packets, etc, and during boot the "starting caddy..." step just hung. i had to SSH in and kill the caddy processes to get the rest of the boot to finish.

has anybody else experienced this? why does caddy need internet access to start?

Are there any steps to reproduce this? Or was it a one time issue?
Hardware:
DEC740

i have not tried reproducing it with the WAN cable unplugged; that case might work fine. in my case, my fiber ONT ended up needing a reboot, so my WAN interface was getting an IP, but no packets were getting out and DNS could not be resolved.

whatever the case, i found it strange that caddy seems to need any sort of outside communication to start up.

since i rebooted my ONT, all my connectivity is back, so it no longer reproduces. before rebooting the ONT, though, i rebooted OPNsense 3 times, and all 3 times i had to SSH in and kill caddy, which was stuck in the "service caddy start" phase; otherwise the rest of the boot sequence wouldn't happen. i didn't wait all that long: the first time i waited around 2-3 minutes.

When developing I put caddy on isolated VMs that don't have any outside connection.

On older plugin versions it hung during bootup once or twice for me, but I did not experience it again in later versions.

When no explicit interface binding is configured (default_bind), Caddy binds to "any" interface, thus not needing any specific interface or IP to be available during startup.
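To illustrate the point about binding to "any": the following is a minimal Python sketch, not Caddy's own code, showing that a listener bound to the wildcard address does not depend on a particular interface address being present. The helper name can_bind is made up for this example.

```python
import socket


def can_bind(address: str, port: int = 0) -> bool:
    """Return True if a TCP listener can be bound to address:port.

    Port 0 lets the operating system pick any free port.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
            listener.bind((address, port))
            listener.listen()
    except OSError:
        # Typically "address not available" or "address already in use".
        return False
    return True
```

Calling can_bind("0.0.0.0") asks for the wildcard address, which does not require any particular interface to have an IP. Passing an address that is not configured on any local interface would normally fail instead, which is the situation an explicit interface binding can run into during boot.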

I'll keep an eye out for it. I have not experienced this myself in quite a while now, though I have before.

yea, i think the difference with your VM setup is that all interfaces are acting properly. in my case, the WAN interface was UP but connectivity was broken. i run all the standard services, plus vnstat, zenarmor, etc, and caddy was the only one to hang in this scenario.

looking at the caddy logs, it seems to be related to ACME certs maybe? it was trying to go out to http://r3.o.lencr.org for OCSP stapling, and since the WAN was UP but dropping / erroring packets at the ONT, that request was taking a long time to fail.

the last thing in the logs before i either killed the process or rebooted was

"warn","ts":"2024-06-22T14:22:28Z","logger":"tls","msg":"stapling OCSP","error":"no OCSP stapling for [*.<hidden>]: making OCSP request: Post \"http://r3.o.lencr.org\": read tcp 71.218.123.180:18842->23.38.100.180:80: read: connection reset by peer","identifiers":["*.<hidden>"]}

that message seems to indicate that this particular request actually returned an error ("connection reset by peer") rather than hanging. but if i had to take a wild guess, i would assume the ACME stuff, if it runs at startup, is where caddy tries to make an outside connection.
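For anyone who wants to check for the same "interface is UP but connectivity is broken" state from a shell, here is a minimal Python sketch of a time-bounded HTTP probe. It is only a diagnostic idea and not how caddy performs its OCSP requests; the function name probe_http and the 5 second default are assumptions made for this example.

```python
import socket


def probe_http(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Send a HEAD request and report 'ok', 'timeout' or 'error: ...'.

    The timeout applies to the connect and to each send/receive call.
    Name resolution inside create_connection is NOT covered by it, so a
    broken resolver can still make this call take longer than `timeout`.
    """
    request = (
        f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    ).encode()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            reply = sock.recv(64)
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Covers connection refused, connection reset and DNS failures.
        return f"error: {exc}"
    return "ok" if reply.startswith(b"HTTP/") else "error: unexpected reply"
```

Something like probe_http("r3.o.lencr.org") would then distinguish a responder that answers from one that times out or resets the connection, which is roughly the difference between a healthy WAN and the state described above.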

June 22, 2024, 08:36:02 PM #5 Last Edit: June 22, 2024, 09:01:17 PM by Monviech
Are you willing to share your logfiles?

/var/log/caddy/caddy
and any of the other recent logs like
/var/log/caddy/@latest
and a date before that.

The time at which the hang happened would be helpful too.

You can PM them to me if you don't want to post them publicly. I can then share them with the Caddy developers.
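If it helps with narrowing down the timestamps before sending anything, here is a rough Python sketch for pulling the "stapling OCSP" entries out of a log. It assumes each line contains one JSON object, possibly after a syslog-style prefix, with the "ts", "msg" and "error" fields seen in the line quoted above; the function name ocsp_warnings is made up for this example.

```python
import json


def ocsp_warnings(lines):
    """Yield (timestamp, error) for every 'stapling OCSP' log entry.

    Lines without a parseable JSON object are skipped silently.
    """
    for line in lines:
        start = line.find("{")  # skip any prefix before the JSON object
        if start == -1:
            continue
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict) and entry.get("msg") == "stapling OCSP":
            yield entry.get("ts"), entry.get("error")
```

It can be fed an open file, for example: with open("/var/log/caddy/caddy") as log: print(list(ocsp_warnings(log))). The path is the one mentioned above; adjust it if the log is stored elsewhere.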