Messages - planetf1

#1
I'm using PPPOE with dual ipv4/ipv6 stack.
Version 25.1.5_5.

I decided to enable gateway monitoring.

The IPv6 gateway shows as OK, but oddly the IPv4 one just shows as offline (i.e. a red status icon).

Update: I set the monitor IP to a known-good target (1.1.1.1), and that works fine. My PPP gateway is not responding to ICMP.





#2
Now I realise something I did.

After the upgrade I noticed I had no flow stats, so I clicked the button to 'repair' the database. I'm guessing that is what caused the sudden demand, and subsequent OOM errors. I vaguely seem to remember having this in the past.
#3
I've had opnsense running very reliably for months. Simple home network - nothing fancy. I tend to update soon after release. I applied the latest minor update yesterday.
I'm running on an N100 host (16GB RAM) under Proxmox, with 4GB RAM assigned to the OPNsense VM.

Today for the first time I hit a memory limit - no obvious change in workload. I don't recollect running close to limits before. Suricata was active, but my traffic volumes are low.

Many OPNsense services stopped or were killed. Traffic is still flowing normally, but the UI is failing (e.g. I'm unable to view logs), and some components are not running.

pid 64347 (suricata), jid 0, uid 0, was killed: failed to reclaim memory
pid 88258 (unbound), jid 0, uid 59, was killed: failed to reclaim memory
pid 85419 (crowdsec), jid 0, uid 0, was killed: failed to reclaim memory
pid 51196 (python3.11), jid 0, uid 0, was killed: failed to reclaim memory
pid 9872 (python3.11), jid 0, uid 0, was killed: failed to reclaim memory
pid 91615 (crowdsec-firewall-b), jid 0, uid 0, was killed: failed to reclaim memory
pid 28803 (python3.11), jid 0, uid 0, was killed: failed to reclaim memory
pid 53875 (php-cgi), jid 0, uid 0, was killed: failed to reclaim memory
pid 20235 (python3.11), jid 0, uid 0, was killed: failed to reclaim memory
pid 19721 (haproxy), jid 0, uid 80, was killed: failed to reclaim memory
pid 35672 (python3.11), jid 0, uid 0, was killed: failed to reclaim memory
pid 28774 (python3.11), jid 0, uid 0, was killed: failed to reclaim memory
pid 54200 (php-cgi), jid 0, uid 0, was killed: failed to reclaim memory
pid 54650 (php-cgi), jid 0, uid 0, was killed: failed to reclaim memory
pid 54426 (php-cgi), jid 0, uid 0, was killed: failed to reclaim memory
pid 54185 (php-cgi), jid 0, uid 0, was killed: failed to reclaim memory
pid 54065 (php-cgi), jid 0, uid 0, was killed: failed to reclaim memory

For now I'm going to assume it was suricata - so I may disable, or increase vm size. Just wanted to mention it in case anyone else has found a change with the latest build ...

Also worth saying that the data I have from Proxmox suggests only 3GB was in use, so could there have been a sudden spike in demand that caused the issue? (Proxmox only polls at intervals, though I'm not sure how many minutes apart, so a short spike could be missed.)
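To see at a glance which processes were hit, the kill messages can be tallied per process name. This is only a rough sketch: the function name `summarise_kills` is made up for illustration, and the pattern assumes the messages look exactly like the lines quoted in this post.

```python
import re
from collections import Counter

# Matches kernel messages of the form quoted in the post, e.g.
#   pid 64347 (suricata), jid 0, uid 0, was killed: failed to reclaim memory
KILL_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<name>[^)]+)\), jid \d+, uid (?P<uid>\d+), "
    r"was killed: failed to reclaim memory"
)


def summarise_kills(log_text: str) -> Counter:
    """Count 'failed to reclaim memory' kills per process name."""
    return Counter(m.group("name") for m in KILL_RE.finditer(log_text))


if __name__ == "__main__":
    # A few of the lines from the post, used as sample input.
    sample = """\
pid 64347 (suricata), jid 0, uid 0, was killed: failed to reclaim memory
pid 88258 (unbound), jid 0, uid 59, was killed: failed to reclaim memory
pid 53875 (php-cgi), jid 0, uid 0, was killed: failed to reclaim memory
pid 54200 (php-cgi), jid 0, uid 0, was killed: failed to reclaim memory
"""
    for name, count in summarise_kills(sample).most_common():
        print(f"{count:3d}  {name}")
```

A tally like this only shows what was killed, not which process actually caused the memory pressure.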
#4
Just to say this was merged, and is now available in 25.1.2. Yay!
#6

* You could inspect network traffic with wireshark or similar?
* Any firewall rules blocking traffic?
* Is vlan configuration correct?
* Any managed switches involved?
#8
General Discussion / Re: unbound and fallback resolvers
February 02, 2025, 04:28:00 PM
I've made a PR for this - feedback welcome

https://github.com/opnsense/core/pull/8275
#9
General Discussion / Re: unbound and fallback resolvers
January 31, 2025, 12:49:30 PM
I see this has been asked a few times.

It seems as if specifying this in /var/unbound/etc/dot.conf

forward-first: yes
would help

My current file is

# Forward zones

# Forward zones over TLS
server:
  tls-cert-bundle: /usr/local/etc/ssl/cert.pem

forward-zone:
  name: "."
  forward-tls-upstream: yes
  forward-addr: 2620:fe::9@853#dns.quad9.net
  forward-addr: 149.112.112.112@853#dns.quad9.net
  forward-addr: 9.9.9.9@853#dns.quad9.net
  forward-addr: 2620:fe::fe@853#dns.quad9.net

This at least would fall back to recursive resolution if required (albeit silently).

I also wonder if I have too many forwarders here: would unbound try just one, or all of them? I suspect all might fail at once.

Still, this could be a useful OPNsense enhancement. I previously did a PR to add another parameter for unbound (now merged), so I may see if I can suggest a change for this.
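For anyone reading the forwarder entries in the file above: each forward-addr value has the form address[@port][#tls-auth-name]. A small sketch that splits an entry into its parts, handy for checking a list of forwarders; the names `Forwarder` and `parse_forward_addr` are my own and not part of unbound or OPNsense.

```python
from typing import NamedTuple, Optional


class Forwarder(NamedTuple):
    address: str             # IPv4 or IPv6 address of the upstream resolver
    port: Optional[int]      # None if no @port was given
    tls_name: Optional[str]  # None if no #name was given


def parse_forward_addr(value: str) -> Forwarder:
    """Split a forward-addr value of the form addr[@port][#tls-name]."""
    rest, _, tls_name = value.strip().partition("#")
    address, _, port = rest.partition("@")
    return Forwarder(address, int(port) if port else None, tls_name or None)


if __name__ == "__main__":
    # The four entries from the configuration quoted in the post.
    for entry in (
        "2620:fe::9@853#dns.quad9.net",
        "149.112.112.112@853#dns.quad9.net",
        "9.9.9.9@853#dns.quad9.net",
        "2620:fe::fe@853#dns.quad9.net",
    ):
        print(parse_forward_addr(entry))
```

All four entries share the same TLS name, i.e. they all belong to the same provider, which fits the suspicion that they might all fail together.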
#10
General Discussion / unbound and fallback resolvers
January 31, 2025, 12:37:09 PM
I'm using opnsense 24.10 (rc2) with unbound.

I use quad9 as my resolver in unbound, via TLS.

All these components are generally reliable. However from time to time quad9 has been known to have an outage -- for example it did today in parts of the UK. Most queries timed out for many in the north.

I do configure unbound to allow serving results after the TTL expires, mostly to handle a few Chinese sites where on occasion the upstream resolution can take >5s.

In the past I used a 'ctrld' daemon from ControlD which had a nice feature - as well as defining multiple resolvers, you could configure error handling, for example what to do after a timeout. So you could imagine having an initial 2.5s timeout, then a fallback to another resolver etc.

unbound doesn't seem to have this - I can only specify multiple resolvers. However, rather than some random/round-robin selection I want a more predictable work distribution, i.e. always hit quad9 first, and only hit an alternate after a short timeout. Monitor, and if there are multiple timeouts, flip over to a backup. Then check periodically to see if the situation has returned to normal before switching the default back. Alert the admin of changes.
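To make the behaviour I'm describing concrete, here is a minimal sketch of the selection logic only. It is not something unbound or ctrld provides as written; the class name `FailoverSelector`, its thresholds, and the `print` calls standing in for an admin alert are all assumptions for illustration. The caller is expected to do the actual query (with its own short timeout) and report the outcome back.

```python
from dataclasses import dataclass


@dataclass
class FailoverSelector:
    primary: str
    backup: str
    max_failures: int = 3     # consecutive primary timeouts before flipping
    recheck_every: int = 10   # while on backup, retry primary every N queries
    failures: int = 0
    queries_on_backup: int = 0
    using_backup: bool = False

    def choose(self) -> str:
        """Return the resolver to try for the next query."""
        if not self.using_backup:
            return self.primary
        self.queries_on_backup += 1
        if self.queries_on_backup % self.recheck_every == 0:
            return self.primary  # periodic check: has the primary recovered?
        return self.backup

    def report(self, resolver: str, ok: bool) -> None:
        """Record whether a query to `resolver` succeeded or timed out."""
        if resolver != self.primary:
            return  # only the primary's health drives the state
        if ok:
            if self.using_backup:
                print(f"ALERT: {self.primary} recovered, switching back")
            self.failures = 0
            self.queries_on_backup = 0
            self.using_backup = False
            return
        self.failures += 1
        if not self.using_backup and self.failures >= self.max_failures:
            print(f"ALERT: {self.primary} failing, switching to {self.backup}")
            self.using_backup = True


if __name__ == "__main__":
    sel = FailoverSelector("quad9", "other-resolver", max_failures=2, recheck_every=3)
    for outcome in (False, False):   # two timeouts in a row
        sel.report(sel.choose(), outcome)
    print([sel.choose() for _ in range(3)])  # third choice retries the primary
    sel.report("quad9", True)                # primary answered again
    print(sel.choose())
```

Retrying a single failed query against the backup straight away (the "short timeout, then alternate" part) would sit in the caller; this sketch only covers the longer-term flip and flip-back.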

Why? Well, since quad9 does malicious filtering & seems to be the most accurate - few false positives, yet up to date with current threats. Also benefits from being open & non-profit.

Any thoughts? I can imagine any of the following
 - Give up on malicious filtering, and just load up unbound with more TLS resolvers
 - as above but use recursive resolve (more data exposure, slower)
 - move from unbound to ctrld (flexible, but I think it's a bit flaky)
 - multiple TLS resolvers, and implement local filtering (unbound, or pihole)
 - do nothing, as a quad9 outage is rare
 - Implement some external monitoring. Flip configuration when outage detected, flip back later

Any thoughts on good approaches here?
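For the external monitoring option, the detection half could start as simply as checking whether a verified TLS session can be opened to the resolver's DNS-over-TLS port. A rough sketch using only the Python standard library; the function name `dot_reachable` and the 2.5s default timeout are my own choices, and it only tests the TLS handshake, not whether DNS queries are actually answered.

```python
import socket
import ssl


def dot_reachable(address: str, tls_name: str, port: int = 853,
                  timeout: float = 2.5) -> bool:
    """Return True if a verified TLS session to the resolver can be opened.

    This checks TCP connectivity and the certificate against `tls_name`.
    It does not send a DNS query, so a resolver that accepts connections
    but fails to answer would still look healthy here.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((address, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=tls_name):
                return True
    except OSError:
        # Covers refused connections, timeouts and TLS/certificate errors.
        return False
```

For example, `dot_reachable("9.9.9.9", "dns.quad9.net")` could be run from a cron job, with the configuration flip triggered only after several consecutive failures so that a single dropped connection does not cause flapping.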
#11
Answering my own question - I realised the 'insight aggregator' was not actually running. Not sure why not as insight was enabled. I've not rebooted yet so will watch out to see if it comes back. Anyway, it's working ...
#12
I updated from 24.7.12 to 25.1rc1 via a full upgrade (boot/config import).

Generally opnsense is working well. However I've noticed that I am not seeing any insight data (Reporting->Insight). No data in graphs or tables. Netflow is enabled on wan+lan & appears to be configured correctly, as it was with 24.7.12

I've also used 'repair netflow data' and much later 'reset netflow data' which has not made any difference
#13
Thanks for clarifying - I'll keep an eye out on the announcements.
#14
If I toggle back to community I get prompted to install 24.7.12 - though I've not done this yet.
#15
The system shows as '25.1.b_121' which is from the beta.
I had previously installed this, but had an update problem, so I restored from a snapshot. Perhaps update files were lingering.

Not rebooted yet - any recommendation?