Messages - gromit

#1
Quote from: BertQuodge on January 12, 2025, 08:54:03 PM
Just had another OPNSense crash, just over a day from the last, right in the middle of watching a film with the family. The wife acceptance factor has reduced even further. OPNSense recovered and rebooted itself, though it took a while.

The RAM and SSD have been re-seated again, just in case. Memtest64 shows no issues.

I use LibreNMS to monitor my house equipment, and OPNSense has lots of free memory and disk space, and wasn't very warm at the time of the crash. The OPNSense was near(ish) to a WiFi AP, but I moved it a few days ago in case EMI was an issue; this hasn't helped. OPNSense seemed to be fine until I upgraded to 24.7.11, though this could be a coincidence. I've just run an "opnsense-revert -r 24.7.10 opnsense" with a reboot to see if this helps. I'm not sure if I need to run more commands to fully revert to 24.7.10. Any suggestions would be appreciated, or the number of a good divorce lawyer ;-)

I am also running OPNsense on a Protectli system (a Protectli Vault FW6A, so not the exact same hardware as yours), and I also experienced random crash/reboots like you describe.  In my case, I updated the kernel via "opnsense-update -fk" to get a newer, fixed one.  That stopped the random crash/reboot behaviour for me.
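
In case the exact steps help anyone, the update-and-reboot is basically just the following (a sketch from memory, not a pasted session):

# opnsense-update -fk
# reboot

and checking "uname -v" after the reboot confirms which kernel you ended up on.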

I've recently updated to OPNsense 24.7.12-amd64, and I hope the behaviour remains fixed.

I'm posting this in the hope that it lets you know this is probably not a hardware problem on your end.
#2
Quote from: franco on December 05, 2024, 09:42:16 PM
If you reapply Unbound it should be ok again? If yes it's a race condition because these constructs are fragile, but people keep asking for them. I think we discussed exactly this when including these or at least a few times now on similar subjects.

Thank you for the response.  Restarting the Unbound service has resulted in correctly-expanded IPv6 addresses.  Good to know for next time, if this happens again.
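
For anyone searching later: I restarted Unbound from the GUI, but doing it from a shell should amount to the same thing, something like the following (assuming the stock service actions; sketch only):

# configctl unbound restart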
#3
In my IPv6 setup I have IPv6 on WAN configured as DHCPv6 and all the local interfaces configured as "Track Interface". I have several ISC DHCPv6 static mappings configured for these "Track Interface" interfaces, using the "::1:2:3:4" suffix notation accordingly.

Since upgrading to 24.7.10 I've noticed that DHCPv6 static mapping hostnames are not resolved correctly by Unbound. Instead of prepending the DHCPv6-PD prefix to the suffix, it simply returns the suffix as-is. Looking at /var/unbound/host_entries.conf, I see both the local-data: and local-data-ptr: entries using just the suffix rather than the full prefix+suffix address.
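
To illustrate with a made-up prefix and hostname (not my real ones): with a delegated prefix of 2001:db8:12:34::/64 and a static-mapping suffix of ::1:2:3:4, I would expect host_entries.conf to contain something like

local-data: "host1.mydomain. IN AAAA 2001:db8:12:34::1:2:3:4"
local-data-ptr: "2001:db8:12:34::1:2:3:4 host1.mydomain."

but instead the entries come out as

local-data: "host1.mydomain. IN AAAA ::1:2:3:4"
local-data-ptr: "::1:2:3:4 host1.mydomain."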

When I view the DHCPv6 static mappings in "Services: ISC DHCPv6: Leases" the correct, full IPv6 addresses are displayed.

Is this a regression in 24.7.10?
#4
Since about release 23.1.2 I have been getting complaints from Monit about its RootFs service check failing.  The e-mail is as follows:

Does not exist Service RootFs

Date:        Wed, 29 Mar 2023 17:07:23
Action:      restart
Host:        my.opnsense.host
Description: unable to read filesystem '/' state

Your faithful employee,
Monit


This still doesn't work as of the recent 23.1.5 update.

I am running OPNsense 23.1.5 on a ZFS-based install.  The install was bootstrapped from a FreeBSD install via opnsense-bootstrap.  It does have a ZFS file system mounted on /, so I'm not sure how to interpret the "unable to read filesystem '/' state" message.
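
For context, the failing check is essentially Monit's filesystem test on the root mount, i.e. a stanza along these lines (a sketch, not a verbatim copy of the generated config):

check filesystem RootFs with path /
    if space usage > 75% then alert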

Is anyone else experiencing this?  If so, is there a fix?  For now, I have disabled the service check so I don't get spammed with Monit e-mails about this service failing.  I would like to have the check working, though.
#5
Quote from: franco on March 29, 2023, 08:46:48 PM
That could be it, but 23.1.5 would address that. Previously we recommended removing these tunables or moving them to /boot/loader.conf.local where they are not being triggered after bootup.

I checked the LAG configuration on my switch, and everything checks out on both ends.  All the ports are set to Long timeout on the switch, and I have Fast timeout unchecked.  Also, the default for Administrative Flow Control on the switch is Disable, which actually matches the tunable setting of dev.em.X.fc being 0 for the LAGG members on the OPNsense side.  I've changed the switch to Auto Negotiation, removed the tunables, and will see if that helps matters when I do a test reboot of OPNsense later today.

Unless something else changed between 22.7.x and 23.1.x, I can't think what might have caused the OPNsense LAGG to fail to come up after a reboot.  None of the workarounds mentioned in the thread work for me.  The only thing that helps seems to be rebooting my managed switch.
#6
Quote from: nghappiness on March 29, 2023, 02:46:50 PM
Just updated to 23.1.5.   LAG stays up after reboot.

See the information from reddit. 

https://www.reddit.com/r/opnsense/comments/1255xr8/2314_lagg_wont_come_up_after_reboot/

The Reddit link is very useful, thanks.

In my case, updating to 23.1.5 today did not fix the problem of the LAGG not coming up: I still had to reboot my switch.  But one of the Reddit replies says, "It seems to be a driver issue and custom eee/fc tunables set for your NIC," and I do have dev.em.0.fc set to 0 for my NICs.  Maybe that is the cause of the problem?  I will test and report back.
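
For anyone who wants to check their own box, the current driver setting can be read from a shell before deciding whether to remove the tunables (sketch; em0 stands in for whichever NICs are LAGG members):

$ sysctl dev.em.0.fc
dev.em.0.fc: 0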
#7
BTW, this is still happening for me with the latest 23.1.3 update.  Rebooting my managed switch is the easiest way to get the LAGG established; otherwise it does not come up properly at the OPNsense end.  :(
#8
I have a 3-port LACP LAGG configured on my OPNsense system that is connected to a Cisco SG350 managed switch.  This has worked fine in previous versions of OPNsense going back years, but since upgrading to 23.1 it has given problems.  Specifically, the LAGG has trouble becoming active (configured) after boot.  The individual laggports change status, with the flags moving through various states such as <>, <COLLECTING>, <ACTIVE,COLLECTING>, and even with some (but not all) in the desired <ACTIVE,COLLECTING,DISTRIBUTING> state.

This is what it looks like when it is properly configured:

$ ifconfig lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4812098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,NOMAP>
	ether 00:eb:ca:c0:05:c5
	laggproto lacp lagghash l2,l3,l4
	laggport: em1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	laggport: em2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	laggport: em3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	groups: lagg
	media: Ethernet autoselect
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>


It seems that even after 10 minutes or so the LAGG is still cycling through various states, with the member laggports and the interfaces built on this LAGG going UP and DOWN accordingly as it tries to configure.  The easiest way to fix it is to restart the Cisco switch.  ???
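
In case it's useful to anyone debugging the same thing, the laggport flags can be watched from a shell with a simple loop while it cycles, e.g. (rough sketch):

$ while true; do ifconfig lagg0 | grep laggport; sleep 5; done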

Has the way LAGG interfaces are configured changed in 23.1?  I see these two entries in the Changelog for 23.1:


  • interfaces: register LAGG, PPP, VLAN and wireless devices as plugins
  • src: assorted FreeBSD 13 stable fixes for e.g. bpf, bridge, bsdinstall ifconfig, iflib, ipfw, ipsec, lagg, netmap, pf, route and vlan components

I don't understand the import of either of those statements.  This setup worked flawlessly up to 22.7.11, so whatever the problem is, it appears to have crept in with 23.1.

Any hints or suggestions on how to get the LAGG to activate reliably are most appreciated.
#9
I don't know whether it's a "best practice" or not, but in our setup we use explicit Unbound domain overrides to do forward and reverse lookups for IPv4 private addresses not handled by the local Unbound.

We have two sites joined via a site-to-site IPSec VPN.  Each site has local (non-overlapping) IPv4 subnets and a local domain name for the addresses its Unbound manages.  In the Domain Overrides for each site, there is an N.N.N.in-addr.arpa override that sends PTR (reverse) queries to the other site's Unbound, as well as a site.local.domain. entry that forwards the corresponding forward lookups.  The same is done in the opposite direction at the other site.
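
In raw Unbound terms the overrides end up as forward-zone entries, roughly like the following (names and addresses made up for illustration; in OPNsense these are entered via the Domain Overrides GUI rather than written by hand):

forward-zone:
    name: "siteb.internal."
    forward-addr: 192.168.20.1
forward-zone:
    name: "20.168.192.in-addr.arpa."
    forward-addr: 192.168.20.1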

I guess you could use a similar approach to forward all the local IPv4 ranges you're interested in to the outer network's DNS servers.
#10
Quote from: franco on January 27, 2023, 09:10:25 AM
Does this do the job? https://github.com/opnsense/plugins/commit/16cbe99ebf

# opnsense-patch -c plugins 16cbe99ebf


Cheers,
Franco

This patch fixes the problem for me and allows NUT to start up.
#11
General Discussion / Re: Periodic.conf tunables?
November 19, 2022, 05:30:04 AM
Thank you.  I added the setting to /etc/periodic.conf.local.  I just updated to 22.7.8 and the file persisted across that update.
#12
General Discussion / Periodic.conf tunables?
November 10, 2022, 05:09:05 PM
The official OPNsense documentation about tunables (https://docs.opnsense.org/manual/settingsmenu.html) says they cover loader.conf and sysctl.conf settings. I have a ZFS setup on which I would like to enable a periodic scrub. FreeBSD has a built-in /etc/periodic/daily/800.scrub-zfs task that is disabled by default. I'd like to enable this via the daily_scrub_zfs_enable setting.

Assuming this can't be added as a tunable to the "System: Settings: Tunables" section, the normal way in FreeBSD would be to add it to /etc/periodic.conf or /etc/periodic.conf.local. Will the latter persist across updates?
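
Concretely, what I'd be putting in /etc/periodic.conf.local is just something like this (a sketch, based on the variables in /etc/defaults/periodic.conf):

daily_scrub_zfs_enable="YES"
# optionally restrict the scrub to specific pools, e.g.
# daily_scrub_zfs_pools="zroot"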

(Are there any plans to add periodic settings to "System: Settings: Tunables"?)
#13
PS: If anyone has any insight on how to get DNS tools like host and dig to work from the client side, I'd be glad to hear about it.
#14
Here is an update on this from me:

Well, it appears that split-DNS was actually "largely working" for me with the macOS IKEv2 built-in client.  It was the way I was testing it that made it seem like it wasn't working at all.

"Largely working" means that resolver-based client DNS resolution works.  More simply, hostname resolution works for commands such as ssh, ping, curl, etc.  Where DNS resolution fails is for tools such as host and dig. These use the wrong resolver at the client side.  (I had been testing with host and dig.)

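(As far as I can tell, this is because dig and host query whatever is listed in /etc/resolv.conf directly and bypass the macOS scoped/split resolver configuration that ssh, ping, curl and friends go through.  They can still be pointed at the VPN's resolver explicitly, e.g.

$ dig @192.0.2.1 somehost.internal.example

where 192.0.2.1 stands in for the Unbound address handed out over the VPN.)
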
Although it would be nice for everything to work, I can live with the "largely working" state right now.  :)

One thing I did have to do to get split-DNS (or any IPSec VPN DNS) working was to add the IPSec client network range as an explicit access list in Services -> Unbound DNS -> Access Lists.  I believe this is because IPSec is not available as an interface to select in Services -> Unbound DNS -> General -> Network Interfaces, and so doesn't get included in the "Internal" access lists.  Without this explicit access list entry, I was getting REFUSED responses to DNS lookups from the VPN client to the server.
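
The GUI access list boils down to an Unbound access-control rule for the client pool, i.e. roughly this (with a made-up pool; substitute the actual IPSec client range):

access-control: 10.99.99.0/24 allow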
#15
Quote from: cypher2001 on April 29, 2022, 06:01:16 PM
In my case, the IOT device pulled a DHCP address from the pool.  I clicked on the + sign next to that lease entry and made that same address static.

The system does allow this.  It was my understanding this essentially creates that static reservation.  This way that client will ALWAYS pull the same IP from the pool.

I see folks testing this by creating a new static IP outside of the pool.  In my case, I'm looking to create a DHCP static reservation by using the + sign in the leases tab.


I recall this happening to me recently, too.  Converting an existing lease to static via the "+" button (even when changing the IP in the form for the static entry) yielded duplicate entries for that MAC address, whereas creating a static DHCP lease from scratch (without the client having a pre-existing lease) did not.

The duplicates did eventually go away; I don't remember whether that was because I nuked the leases or because they simply disappeared when the initial dynamic lease expired.

(Caveat: this happened at a time when I was converting my setup over from HA dynamic DHCP to static DHCP due to DNS registration issues.)