Caddy/Caddy Plugin memory leak?

Started by FredsterNL, June 02, 2024, 06:39:24 AM

Hi all,

I recently installed the latest OPNsense release, 24.1.8, on a DEC750 firewall.

I am using the Caddy plugin to set up a reverse proxy, and it is failing with inconsistent errors. When I requested a webpage through the reverse proxy, I noticed that a Netflix stream on my TV stalled at times.

In the OPNsense Lobby, with the client still trying to load the page, I saw the following:

- Memory being gobbled up completely,
- ...followed by the Swap disk filling up completely,
- ...followed by either a complete freeze (no response from the GUI) or random services stopping (suffocated?), such as Clam, FreshClam, Crowdsec and ALWAYS the Caddy service

The whole process takes about 2 minutes. After that, either the GUI (Lobby) starts updating again and shows several services stopped, or I have to log in to the GUI again and find all services running (I presume a full crash restarts the machine?).

Either way: if I refresh the request from the Firefox client, this 'self-inflicted DDoS' starts all over again.

I captured a video of the event, but I have to blur out my details before I can post it.

My questions:
- Are others seeing this behavior (possibly by unrelated services stopping as described)?
- Which log/config files are needed for troubleshooting?

Any input is welcome as always 🙂
Running OPNsense on a Deciso DEC750 with upgraded memory (16GB ECC) and active cooling

June 02, 2024, 08:22:08 AM #1 Last Edit: June 02, 2024, 08:29:17 AM by Monviech
- Did you check how much memory the caddy process had?
- Does the memory consumption of the caddy process increase steadily when you trigger that error?
- Can you check if the same behavior happens when caddy is disabled?
- Can you look at "top" to see which processes take the most memory?

- Please list all additional plugins you have installed on your OPNsense. I see clamav, does that mean you use squid too? That could be a candidate.

- Caddy having a real memory leak is highly unlikely, since it is written in Go, which is a memory-safe language.
Hardware:
DEC740

June 02, 2024, 03:02:32 PM #2 Last Edit: June 02, 2024, 03:58:01 PM by Monviech
The problem is a configuration issue:

When reverse proxying, the frontend and upstream have to be different socket destinations (IP:PORT) or (DOMAIN:PORT). If they're both the same, there is a loop that will crash caddy.

Example:

example.com:8443 {
    reverse_proxy example.com:8443
}

This configuration will loop the traffic indefinitely in Caddy: Caddy forwards each request to its own listener, accepts it there as a new request, and forwards it again.


EDIT:

This cannot be caught by input validation, since the above configuration /can/ be valid in certain cases:

There is an external and an internal DNS server with a split DNS zone.

- For Caddy only (and the OPNsense host itself), example.com resolves to 192.168.1.1
- For internal and external clients, example.com resolves to 1.1.1.1

Now:

example.com {
    reverse_proxy example.com
}

is entirely valid.
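
The distinction above (the same name on both sides, but a loop only when that name leads back to the proxy itself) can be sketched in Go. This is only an illustration under stated assumptions, not code from Caddy or the OPNsense plugin: `wouldLoop` and `resolver` are hypothetical names, the upstream must carry an explicit port, the caller supplies the proxy host's own IP addresses, and the resolver is injected so the lookup reflects what the proxy host sees. The addresses are the ones from the example above.

```go
package main

import (
	"fmt"
	"net"
)

// resolver maps a host name to IP strings as seen from the proxy host
// (hypothetical helper type, injected so split DNS can be modelled).
type resolver func(host string) ([]string, error)

// wouldLoop reports whether upstream ("host:port") points back at the proxy's
// own listener: same port, and an IP that belongs to the proxy host.
func wouldLoop(listenPort, upstream string, localIPs []string, resolve resolver) (bool, error) {
	host, port, err := net.SplitHostPort(upstream)
	if err != nil {
		return false, err // e.g. upstream given without a port
	}
	if port != listenPort {
		return false, nil
	}
	ips := []string{host}
	if net.ParseIP(host) == nil { // not an IP literal, so resolve the name
		if ips, err = resolve(host); err != nil {
			return false, err
		}
	}
	for _, s := range ips {
		ip := net.ParseIP(s)
		if ip == nil {
			continue
		}
		for _, l := range localIPs {
			if ip.Equal(net.ParseIP(l)) {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	local := []string{"1.1.1.1"} // address clients use to reach the proxy
	// Single DNS view: the proxy host resolves example.com to itself.
	single := func(string) ([]string, error) { return []string{"1.1.1.1"}, nil }
	// Split DNS: the proxy host resolves example.com to the internal backend.
	split := func(string) ([]string, error) { return []string{"192.168.1.1"}, nil }

	a, _ := wouldLoop("8443", "example.com:8443", local, single)
	b, _ := wouldLoop("8443", "example.com:8443", local, split)
	fmt.Println(a, b) // prints: true false
}
```

The identical upstream string gives different answers depending on the resolver, which is why a plain string comparison in the GUI would either miss loops or reject valid split-DNS setups.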

Thanks for your help!

DDoSing myself, quite effectively as well :o

As input validation is 'iffy/not always invalid', as you describe in your EDIT, would throwing a warning (with your explanation) be possible?

Alternatively, not having any Go knowledge whatsoever: is it maybe possible to limit (configurably) how many goroutines are created within X seconds, in order to stop memory/swap from being consumed until it runs out?
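
For illustration, the general technique behind such a limit looks roughly like this in Go: a counting semaphore wrapped around an HTTP handler, which caps requests in flight rather than goroutines per second. This is a minimal sketch, not Caddy's code or an existing plugin option; `limiter`, `newLimiter` and `limitInFlight` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// limiter is a counting semaphore built on a buffered channel.
type limiter struct{ slots chan struct{} }

func newLimiter(max int) *limiter { return &limiter{slots: make(chan struct{}, max)} }

// tryAcquire takes a slot without blocking; false means the cap is reached.
func (l *limiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot taken by tryAcquire.
func (l *limiter) release() { <-l.slots }

// limitInFlight answers 503 once the cap of in-progress requests is reached.
func limitInFlight(l *limiter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !l.tryAcquire() {
			http.Error(w, "too many requests in flight", http.StatusServiceUnavailable)
			return
		}
		defer l.release()
		next.ServeHTTP(w, r)
	})
}

func main() {
	l := newLimiter(1)
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {})
	h := limitInFlight(l, ok)

	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", "/", nil))
	fmt.Println(rec.Code) // prints: 200 (a slot was free)

	l.tryAcquire() // simulate a request that is still in progress
	rec = httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", "/", nil))
	fmt.Println(rec.Code) // prints: 503 (cap reached)
}
```

In a proxy loop every hop holds its slot while it waits for the next hop, so with a cap like this the chain would stop growing once the cap is hit and the 503 would travel back to the client instead of memory filling up.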

There are some possibilities but it's patching around a configuration error that happens very very rarely. So spending a lot of effort on fixing this or inflating the code in either caddy or the plugin is not that fruitful.

I mean, it's also a bad idea to put your hand onto a stove, but the stove won't automatically turn off because it has a hand detection built in.  :)

Quote from: Monviech on June 02, 2024, 05:16:15 PM
I mean, it's also a bad idea to put your hand onto a stove, but the stove won't automatically turn off because it has a hand detection built in.  :)

I like the analogy, as I got burned today ;)

I must be very, very, rare :P, and I fully agree that not every eventuality can/should be handled.