Messages - jenix

#1
24.7, 24.10 Legacy Series / Re: New Dashboard
July 26, 2024, 09:35:34 AM
The new dashboard really looks nice. Great job, I like it.

I did notice some issues with resizing the widgets, though:
- The "Firewall Live Log" widget looks as if it can be expanded vertically (the mouse cursor changes to the 'arrow down' cursor at the bottom of the widget), but I can only expand it horizontally.
- The "Interfaces" widget can't be expanded vertically, which would be really nice for seeing the status of all my interfaces.
- The "Services" widget can be expanded vertically, but the change is not saved. After clicking the 'save' button and reloading the dashboard, it is back to its previous height of around half the screen.
- The "Disk" widget can't be expanded vertically only. That would be great for switching to the detailed bar view while keeping the width aligned with all the other widgets.
- The "Traffic Graph" widget shows the mouse cursor for vertical expansion although this is not possible (the same as the "Firewall Live Log").

It would also be nice to have a separate editing mode (the default dashboard would be view-only, and you would have to click an 'edit' button to change any of the widgets). This would also help prevent accidental widget changes on mobile.
#2
I want to give another brief update about my findings, hoping someone might find them useful.

First of all, it turns out that at least some of my issues may indeed have come from a defective disk. My SSD has now died completely (the infamous 'Solaris: WARNING: Pool 'zroot' has encountered an uncorrectable I/O failure and has been suspended.' error causing OPNsense to freeze, followed by I/O timeouts and device losses during a reinstall), so some file corruption seems possible.

Furthermore (as is most often the case), I guess there were multiple issues coming together:

- No USB serial access after config import: This seems to be a bug with the "Use USB-based serial ports" setting (in System -> Settings -> Administration). The setting is disabled during the install and after the first boot of the fresh install. If I import my config (from a system where the setting is disabled as well), it gets enabled. As a result, the console is not available via USB after the reboot. To solve this, you can either disable the "Exclude console settings from import" setting during import, or disable the automatic reboot after import, go to the settings page and apply the settings again.

- No access via SSH / HTTP after config restore: This was an issue with my Suricata config. I had enabled IPS on my LAN interface (after all, you want to detect suspicious activity inside your network as well, right?). After the import / reboot, Suricata had some errors and blocked all access to OPNsense. This was difficult to identify, as some traffic to and through the firewall still worked (pinging the firewall and hosts in a different network worked, DNS resolution worked, but HTTP or SSH access to the firewall or beyond did not). To solve this, you can access the shell via the console and kill Suricata ('killall -TERM suricata').
If you are having similar issues after your upgrade / config import, I suggest temporarily disabling IDS/IPS as a test.
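
The symptom described above (ping and DNS still work, while SSH and HTTP time out) can be confirmed from a client machine with a small TCP probe. The following is only a minimal sketch in Python; the function names are made up for illustration, and ports 22 and 443 are assumptions about where SSH and the WebGUI listen.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_services(host: str, ports: dict[str, int]) -> dict[str, bool]:
    """Probe each named service port and report whether it accepts connections."""
    return {name: tcp_port_open(host, port) for name, port in ports.items()}
```

Calling `check_services("192.168.1.1", {"ssh": 22, "https": 443})` (the address is a placeholder for the firewall's LAN IP) returns a dict of booleans. If both are False while ping still works, something is dropping TCP traffic selectively, which matches what a misbehaving IPS would do.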
#3
Thanks for the reply. I finally figured out what went wrong:
My /usr/local/etc/swanctl/swanctl.conf config somehow got corrupted. It looked absolutely correct, but couldn't be loaded by swanctl. When I tried to manually reload my settings using swanctl --load-all, it complained about invalid characters on one of the lines containing my pool configuration. Those were in fact the pools I had configured in the new 'connection' settings. Unfortunately, I can't reproduce the exact issue with the naming. But after deleting the pools in the WebGUI and adding them again, the config was valid, swanctl loaded it and my connection appeared in the connection overview.
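
A config that looks correct but is rejected for "invalid characters" may contain characters that are invisible in an editor (for example a non-breaking space). Whether that was the cause here is not confirmed, since the issue could not be reproduced. As a minimal sketch in Python, a file's text can be scanned for anything outside printable ASCII; the function name is hypothetical.

```python
import unicodedata


def find_suspicious_chars(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, description) for every character that is
    neither printable ASCII nor a tab."""
    findings = []
    # Split on "\n" only, so that unusual line separators are reported too.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\t" or 32 <= ord(ch) <= 126:
                continue
            name = unicodedata.name(ch, "UNNAMED")
            findings.append((line_no, col, f"U+{ord(ch):04X} {name}"))
    return findings
```

To check a file, read it with `open(path, encoding="utf-8", errors="replace").read()` and pass the result in. Bytes that are not valid UTF-8 then show up as U+FFFD, and carriage returns from Windows line endings are reported as U+000D.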
#4
Hi all

I had to do a fresh install of 24.1 due to some difficulties during the upgrade. During the import of my old config, OPNsense seems to have discarded the IPsec config of my tunnel settings (not my issue, just an annoyance). I have recreated them according to the documentation (https://docs.opnsense.org/manual/how-tos/ipsec-s2s.html), but it seems that strongSwan does not 'see' the configured connection. After enabling the tunnel and restarting IPsec, nothing happens and nothing is displayed in the "Status Overview" or "Lease Status".

The log file (set to debug) just reads:
2024-02-10T10:17:27 Informational charon 00[JOB] spawning 16 worker threads
2024-02-10T10:17:27 Informational charon 00[LIB] loaded plugins: charon aes des blowfish rc2 sha2 sha1 md4 md5 random nonce x509 revocation constraints pubkey pkcs1 pkcs7 pkcs12 pgp dnskey sshkey pem openssl pkcs8 fips-prf curve25519 xcbc cmac hmac kdf gcm drbg curl attr kernel-pfkey kernel-pfroute resolve socket-default stroke vici updown eap-identity eap-md5 eap-mschapv2 eap-radius eap-tls eap-ttls eap-peap xauth-generic xauth-eap xauth-pam whitelist addrblock counters
2024-02-10T10:17:27 Informational charon 00[CFG] loaded 0 RADIUS server configurations
2024-02-10T10:17:27 Informational charon 00[CFG] loading secrets from '/usr/local/etc/ipsec.secrets'
2024-02-10T10:17:27 Informational charon 00[CFG] loading crls from '/usr/local/etc/ipsec.d/crls'
2024-02-10T10:17:27 Informational charon 00[CFG] loading attribute certificates from '/usr/local/etc/ipsec.d/acerts'
2024-02-10T10:17:27 Informational charon 00[CFG] loading ocsp signer certificates from '/usr/local/etc/ipsec.d/ocspcerts'
2024-02-10T10:17:27 Informational charon 00[CFG] loading aa certificates from '/usr/local/etc/ipsec.d/aacerts'
2024-02-10T10:17:27 Informational charon 00[CFG] loading ca certificates from '/usr/local/etc/ipsec.d/cacerts'
2024-02-10T10:17:27 Informational charon 00[NET] enabling UDP decapsulation for IPv6 on port 4500 failed
2024-02-10T10:17:27 Informational charon 00[KNL] unable to set UDP_ENCAP: Invalid argument
2024-02-10T10:17:27 Informational charon 00[CFG] using '/sbin/resolvconf' to install DNS servers
2024-02-10T10:17:27 Informational charon 00[LIB] providers loaded by OpenSSL: default legacy
2024-02-10T10:17:27 Informational charon 00[DMN] Starting IKE charon daemon (strongSwan 5.9.13, FreeBSD 13.2-RELEASE-p9, amd64)


When modifying the phase 1 / phase 2 settings, the following entries appear in the log:
2024-02-10T10:56:43 Informational charon 05[CFG] loaded 0 RADIUS server configurations
2024-02-10T10:56:43 Informational charon 05[CFG] loaded 0 entries for attr plugin configuration
2024-02-10T10:56:43 Informational charon 05[LIB] no files found matching '/usr/local/etc/strongswan.opnsense.d/*.conf'
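
When the debug log gets long, it can help to split each charon line into its fields and look at one subsystem (CFG, LIB, NET, ...) at a time. The following is only a sketch in Python: the line layout is inferred from the excerpts in this post, and the names `parse_line` and `entries_for` are made up for illustration.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed layout: <timestamp> <severity> <process> <thread>[<SUBSYSTEM>] <message>
LINE_RE = re.compile(
    r"^(?P<time>\S+)\s+(?P<severity>\S+)\s+(?P<process>\S+)\s+"
    r"(?P<thread>\d+)\[(?P<subsystem>[A-Z]+)\]\s+(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    time: str
    severity: str
    process: str
    thread: str
    subsystem: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the assumed layout."""
    match = LINE_RE.match(line.strip())
    return LogEntry(**match.groupdict()) if match else None


def entries_for(lines: Iterable[str], subsystem: str) -> list[LogEntry]:
    """Return all parsed entries that belong to the given subsystem."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry.subsystem == subsystem]
```

For example, `entries_for(log_lines, "CFG")` keeps only the configuration-related messages, which makes it easier to spot whether any connection is loaded at all.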

Phase 1 and phase 2 are enabled, as well as the "Enable IPsec" option. Does anyone have an idea what I am doing wrong?

I also tried to migrate my tunnel to the new "connection" configuration. But I was not able to find the correct documentation for my use case (an IPsec tunnel over the internet between my OPNsense and a pfSense firewall with DynDNS). Is there a good guide for that?

Thank you very much in advance for your support.
#5
I tried to debug the issue again, but was unsuccessful. The firewall simply never booted to a point where I could access the device (neither via serial console nor via SSH or WebGUI). The serial console never got to the login prompt, stopping at the fingerprint of the HTTP / HTTPS access. This may be due to a bug where the "USB-based serial" option gets re-enabled after importing a config (I could not find it in the config, so I suspect it is not saved there and therefore gets enabled during import), which disables console logins on the DEC840. SSH and WebGUI simply do not respond, leading to timeouts when trying to access them.

Having wasted days trying to figure out what is going on, I gave up on further troubleshooting. I did a fresh install of 24.1, pulled a clean config export from it, copied over the most important settings from my 23.7 config file (interfaces, aliases, firewall rules, DHCP configuration, IPsec) and (more or less) successfully imported it on the clean install. Now my firewall is back up and running with 24.1, although it did not import my IPsec settings and won't recognize them after I reconfigure them. I will create a new thread for this issue.
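
Copying selected sections from the old config into a clean export, as described above, can also be scripted instead of done by hand. This is only a minimal sketch in Python, assuming both exports are XML files whose settings live in top-level sections under one root element. The function name is made up, and the section names have to be taken from your own config file.

```python
import copy
import xml.etree.ElementTree as ET


def copy_sections(clean_xml: str, old_xml: str, sections: list[str]) -> str:
    """Return clean_xml with the named top-level sections taken from old_xml."""
    clean_root = ET.fromstring(clean_xml)
    old_root = ET.fromstring(old_xml)
    for name in sections:
        old_section = old_root.find(name)
        if old_section is None:
            continue  # The old config has no such section; keep the clean one.
        replacement = copy.deepcopy(old_section)
        existing = clean_root.find(name)
        if existing is not None:
            # Replace in place so the order of the sections is preserved.
            index = list(clean_root).index(existing)
            clean_root.remove(existing)
            clean_root.insert(index, replacement)
        else:
            clean_root.append(replacement)
    return ET.tostring(clean_root, encoding="unicode")
```

Adding one section per run and rebooting in between would also show which section triggers the problem, at the cost of several slow boots.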

After multiple attempts at importing different parts of my old config, I suspect that my IPS / Suricata config may have been the culprit behind my issues. When I tried to restore the <IPS> block of my config, my firewall started to act up again during boot. But I didn't test this further to get conclusive proof for that suspicion.

Anyhow, for me this problem is solved.
#6
Thanks for getting back to me. I'm not sure if DNS is required at that early stage. But I did enable Unbound (usually I'm using a separate DNS server in my network) to make sure. It did not make a difference.

After some more testing, I can narrow the issue down somewhat:
- Everything works fine up to 23.7.11. I can upgrade to this release without issues
- Starting with 23.7.12 (both the initial release as well as the hotfixes) and 24.1, the firewall gets extremely slow as soon as it loads my config during boot (starting with the ">>> Invoking early script 'upgrade'" output on console).
- I can reproduce this behaviour consistently when upgrading from 23.7 (starting after the reboot following the upgrade), restoring my config to a fresh install of 24.1 (starting after the reboot) or loading my config while booting 24.1 from a usb drive.

As mentioned above, as soon as 23.7.12 / 24.1 loads my config, everything gets extremely slow (steps taking minutes instead of the usual seconds), leading to a boot time of around 30 minutes (instead of roughly 1 minute).
Once the boot completes, the firewall is not accessible via WebGUI or SSH, probably because the services are occupied (or have even crashed).

Booting the firewall with the serial console attached does not show any conclusive errors. I do see errors like "Generating configuration: error in configd communication %s, see syslog for details" sometimes. But I believe them to be symptoms of the system being occupied as they do not occur on every boot.

Is there any way to increase the console output during boot? I'd like to figure out what is happening and ideally what I need to fix in my config to get 24.1 working.

#7
After spending the morning testing different scenarios, I'm hoping that I have found some new information for my issue.

I got in touch with Deciso support (as my DEC840 is only just still under warranty). They suggested updating the BIOS and trying again, which I did. While my issue is still unsolved, I noticed the following behaviour:

My DEC boots fine with 23.7 (the major release version available for download). I can import my config and reboot the firewall without issue.
When I do a clean install of 24.1, the firewall also boots normally.

The issue arises when the firewall tries to boot with my production config, either after upgrading to 23.7.12 or after importing the config into a fresh 24.1 installation. Then the boot process is extremely slow (e.g. configuring the routes takes 2-3 minutes instead of mere seconds), which initially led me to believe that my system froze. But given enough waiting time (the boot process takes around 30 minutes), the firewall manages to complete the boot process. Yet I'm then still not able to log in and analyze further, as my HTTP or SSH access attempts time out. I once managed to log into the WebGUI while the firewall was still completing the interface configuration, but lost access shortly after.
This makes me wonder if there is an issue either with my configuration or with the config migration during the update. It feels like the firewall gets so occupied with loading / migrating the configuration during boot that it struggles to run.

Now I'm at a loss as to how to proceed. I don't see a way to figure out which part of my configuration (if any) is responsible for my issues. Recreating my whole configuration from scratch in 24.1 seems pretty undesirable.
I already tried to skip non-essential configuration parts during import (e.g. IPS), but this resulted in no change. I also imported the 23.7 config into the fresh 24.1 install, instantly exported it again before reboot and looked at a diff to see what changed. But I can't see any obvious changes which hint at problematic config parts.
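
A plain text diff of two large config exports is noisy, so comparing them section by section can narrow things down. The following is only a sketch in Python, assuming both files are XML with one root element and uniquely named top-level sections (if a tag occurs twice, only the last one is compared). The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def normalize(elem: ET.Element) -> tuple:
    """Reduce an element to a comparable structure, ignoring surrounding whitespace."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(normalize(child) for child in elem),
    )


def changed_sections(before_xml: str, after_xml: str) -> dict[str, str]:
    """Map each differing top-level section to 'added', 'removed' or 'changed'."""
    before = {child.tag: normalize(child) for child in ET.fromstring(before_xml)}
    after = {child.tag: normalize(child) for child in ET.fromstring(after_xml)}
    report = {}
    for tag in sorted(before.keys() | after.keys()):
        if tag not in after:
            report[tag] = "removed"
        elif tag not in before:
            report[tag] = "added"
        elif before[tag] != after[tag]:
            report[tag] = "changed"
    return report
```

Sections that do not appear in the result are identical apart from formatting, which leaves a much shorter list to inspect by hand.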

Does anyone have an idea how I can continue analyzing this issue to find out what causes it?

Thanks in advance.
#8
Thanks for the reply.
While a hardware defect with the disk certainly is possible, my assumption for now is that it is rather a software issue. What are the odds that the filesystem (which should handle disk corruption up to a certain point) writes the same file to the same (corrupt) block 3 times in a row? Nevertheless, I contacted Deciso about warranty, as my device is just shy of 2 years old.

As I said, I rather assume a software issue during the upgrade, like a faulty config migration. Or maybe a race condition that blocks access to a system config, prevents it from being written correctly and thus causes the boot to fail.
In this case, installing 24.1 would solve the issue as there would be no upgrade / migration steps which can fail. But without any debug information what is going on when the system hangs during boot, this is hard to tell.

#9
Thanks, I already tried this, sadly without any luck. I don't believe the SFPs or NICs are the problem, but rather whatever OPNsense tries to initialize after them.
I suspect the dirty filesystem during the shutdown causes a corruption which prevents the system from correctly reading some files during boot and thus getting stuck. It does not crash (when I unplug the SFPs, this gets noted by the kernel and logged as "Link state changed to down"), but also does not continue to load the system.

Unfortunately, I'm not experienced enough to analyze the issue further. Is there a way to get more information about the boot process to figure out which file (if my assumption is correct) is corrupt? Is it possible to run fsck when single user mode won't boot either (e.g. from the USB install drive)? Can I update to an earlier minor version (23.7.11)?

At this point, with 24.1 around the corner I suspect the best way forward is to wait for the new major version, make a fresh install and restore my config.
#10
Thanks for the reply. My initial installation was the one that the hardware came with (about 2 years ago), converted to community edition once the subscription ended. I then reinstalled with the serial version which gave me the observed error.
#11
EDIT: I adjusted the topic, as I now believe my initial assumption (device freezes during boot) was wrong; instead, the boot process is extremely slow (around 30 minutes). I suspect the culprit to be either my configuration or the config migration. Please see my latest post (https://forum.opnsense.org/index.php?topic=38266.msg188463#msg188463) for more details.

Initial Post:
So the upgrade to 23.7.12 bricks my DEC840 firewall. I can reproduce the issue as it happened 3 times in a row (initial upgrade, after the fresh installation and once again to verify).
The installation of the upgrades completes, but there seems to be an issue when syncing the filesystem during the shutdown. Console output reads:
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 6 fsync: giving up on dirty (error = 35) 0xfffff800017f73d0: type VCHR
    usecount 1, writecount 0, refcount 435 seqc users 0 rdev 0xfffff80001785000
    hold count flags ()
    flags ()
    v_object 0xfffff800017e4e70 ref 0 pages 12163 cleanbuf 432 dirtybuf 1
    lock type mntfs: EXCL by thread 0xfffffe00917dee40 (pid 16, syncer, tid 100092)
3 2 0 0 done
All buffers synced.


Then after the reboot, the system gets stuck after enabling the interfaces:
uart0: <8250 or 16450 or compatible> port 0x3f8-0x3ff irq 3 flags 0x10 on acpi0
hwpstate0: <Cool`n'Quiet 2.0> on cpu0
Timecounter "TSC" frequency 2096061312 Hz quality 1000
Timecounters tick every 1.000 msec
Trying to mount root from ufs:/dev/ada0p2 [rw]...
ugen0.1: <AMD XHCI root HUB> at usbus0
uhub0 on usbus0
uhub0: <AMD XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <TS256GMTS952T2 02J0T4GB> ACS-2 ATA SATA 3.x device
ada0: Serial Number G752440056
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 1024bytes)
ada0: Command Queueing enabled
ada0: 244198MB (500118192 512 byte sectors)
uhub0: 8 ports with 8 removable, self powered
ax1: Link is UP - 10Gbps/Full - flow control off
ax1: link state changed to UP
ax0: Link is UP - 10Gbps/Full - flow control off
ax0: link state changed to UP


After that, changes to the interfaces are detected (showing "Link is UP" / "Link is DOWN" when plugging cables in / out), but the system never continues the boot. Both Safe Mode and Single User Mode get stuck at the same stage.

The only solution for me was to reinstall OPNsense 23.7 via USB. But as mentioned above, trying to upgrade the fresh install results in the same issue.

I'm not sure what more info I can provide for this issue. Please let me know if I can contribute logs or additional information.
As this is my primary firewall, testing upgrades is difficult (but not impossible if need be).

Any ideas what the issue is and how I can solve it?
Thanks in advance and kind regards

Jens