
Show posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Messages - ze0Ood0O

#1
I think I am starting to understand this problem better. I originally installed 25.1 on an internal mmcsd0 drive, then added an internal nda0 drive and installed OPNsense on that. However, efibootmgr -v shows that the UEFI firmware is still booting from the mmcsd0 drive, which I think caused some things to be updated on the MMC disk and others on the NVMe disk, leading to an entire mess.
Currently it is booting from mmcsd0:
# efibootmgr -v
Boot to FW : false
BootCurrent: 0004
Timeout    : 5 seconds
BootOrder  : 0004, 0003
+Boot0004* UEFI OS HD(1,GPT,8aa2d5db-721c-11f0-9374-6462662f3ea6,0x28,0x82000)/File(\EFI\BOOT\BOOTX64.EFI)
                      gpt/efiboot0:/EFI/BOOT/BOOTX64.EFI /boot/efi//EFI/BOOT/BOOTX64.EFI
# gpart list
1. Name: mmcsd0p1
   Mediasize: 272629760 (260M)
   Sectorsize: 512
   Stripesize: 512
   Stripeoffset: 0
   Mode: r1w1e2
   efimedia: HD(1,GPT,8aa2d5db-721c-11f0-9374-6462662f3ea6,0x28,0x82000)
   rawuuid: 8aa2d5db-721c-11f0-9374-6462662f3ea6
   rawtype: c12a7328-f81f-11d2-ba4b-00a0c93ec93b
   label: efiboot0
   length: 272629760
   offset: 20480
   type: efi
   index: 1
   end: 532519
   start: 40
It should be booting from this partition instead:
1. Name: nda0p1
   Mediasize: 272629760 (260M)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 20480
   Mode: r0w0e0
   efimedia: HD(1,GPT,bfe40c7a-721a-11f0-bd70-6462662f3ea6,0x28,0x82000)
   rawuuid: bfe40c7a-721a-11f0-bd70-6462662f3ea6
   rawtype: c12a7328-f81f-11d2-ba4b-00a0c93ec93b
   label: efiboot0
   length: 272629760
   offset: 20480
   type: efi
   index: 1
   end: 532519
   start: 40
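The mismatch can be confirmed mechanically: the GUID inside the BootCurrent entry's HD(...) path is the partition's rawuuid, and gpart list prints the same HD(...) string as efimedia. Below is a minimal Python sketch, assuming output in exactly the format pasted above; the function names are hypothetical and the samples are trimmed copies of that output, not a general-purpose parser.

```python
import re

# Partition GUID inside a UEFI device path such as
# HD(1,GPT,8aa2d5db-721c-11f0-9374-6462662f3ea6,0x28,0x82000)
HD_GUID = re.compile(r"HD\(\d+,GPT,([0-9A-Fa-f-]{36})")
# Boot entry lines look like "+Boot0004* UEFI OS ..." ('+' = BootCurrent, '*' = active)
ENTRY = re.compile(r"^\+?Boot([0-9A-Fa-f]{4})\*?\s")


def current_boot_guid(efibootmgr_text):
    """Return the partition GUID of the BootCurrent entry, or None."""
    current = re.search(r"BootCurrent:\s*([0-9A-Fa-f]{4})", efibootmgr_text)
    if current is None:
        return None
    for line in efibootmgr_text.splitlines():
        entry = ENTRY.match(line)
        if entry and entry.group(1).upper() == current.group(1).upper():
            guid = HD_GUID.search(line)
            return guid.group(1).lower() if guid else None
    return None


def providers_by_uuid(gpart_text):
    """Map each provider's rawuuid to its name (e.g. mmcsd0p1) from `gpart list`."""
    mapping = {}
    name = None
    for raw in gpart_text.splitlines():
        line = raw.strip()
        named = re.match(r"^\d+\. Name: (\S+)", line)
        if named:
            name = named.group(1)
        elif line.startswith("rawuuid:") and name:
            mapping[line.split(":", 1)[1].strip().lower()] = name
    return mapping


def booted_from(efibootmgr_text, gpart_text):
    """Name of the partition the firmware booted from, or None if unknown."""
    return providers_by_uuid(gpart_text).get(current_boot_guid(efibootmgr_text))


# Trimmed copies of the output posted above.
EFI_SAMPLE = r"""BootCurrent: 0004
BootOrder  : 0004, 0003
+Boot0004* UEFI OS HD(1,GPT,8aa2d5db-721c-11f0-9374-6462662f3ea6,0x28,0x82000)/File(\EFI\BOOT\BOOTX64.EFI)
"""
GPART_SAMPLE = """1. Name: mmcsd0p1
   rawuuid: 8aa2d5db-721c-11f0-9374-6462662f3ea6
1. Name: nda0p1
   rawuuid: bfe40c7a-721a-11f0-bd70-6462662f3ea6
"""

print(booted_from(EFI_SAMPLE, GPART_SAMPLE))  # mmcsd0p1
```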
#2
Digging into this further, it looks like the WireGuard interface didn't come up either, so I tried to reboot into a snapshot, but it won't boot into it. The boot menu screen says 25.1 and doesn't have option 8; it then continues booting, loads 25.7.1, and reports that it failed to load if_wg.ko because the module is not available or there is a version mismatch.
#3
The health audit came back clean; I'm not sure what else to look at.
***GOT REQUEST TO AUDIT HEALTH***
Currently running OPNsense 25.7.1_1 (amd64) at Tue Aug  5 06:56:44 CDT 2025
>>> Root file system: zroot/ROOT/default
>>> Check installed kernel version
Version 25.7 is correct.
>>> Check for missing or altered kernel files
No problems detected.
>>> Check installed base version
Version 25.7 is correct.
>>> Check for missing or altered base files
No problems detected.
>>> Check installed repositories
OPNsense (Priority: 11)
>>> Check installed plugins
os-ddclient 1.27_4
os-nut 1.9
os-smart 2.3_1
os-theme-cicada 1.40
>>> Check locked packages
No locks found.
>>> Check for missing package dependencies
Checking all packages: .......... done
>>> Check for missing or altered package files
Checking all packages: .......... done
>>> Check for core packages consistency
Core package "opnsense" at 25.7.1_1 has 68 dependencies to check.
Checking packages: ..................................................................... done
***DONE***
#4
I upgraded from 25.1.12 to 25.7.1 and now Unbound won't start. Unbound logs:
<29>1 2025-08-05T05:38:45-05:00 firewall unbound 79276 - [meta sequenceId="1"] [79276:0] notice: init module 0: python
<27>1 2025-08-05T05:38:45-05:00 firewall unbound 79276 - [meta sequenceId="2"] [79276:0] error: python exception in Py_InitializeFromConfig: init_fs_encoding: failed to get the Python codec of the filesystem encoding
<27>1 2025-08-05T05:38:45-05:00 firewall unbound 79276 - [meta sequenceId="3"] [79276:0] error: module init for module python failed
<26>1 2025-08-05T05:38:45-05:00 firewall unbound 79276 - [meta sequenceId="4"] [79276:0] fatal error: failed to init modules
<165>1 2025-08-05T05:38:46-05:00 firewall unbound 80536 - [meta sequenceId="5"] Backgrounding unbound logging backend.
<163>1 2025-08-05T05:38:56-05:00 firewall unbound 80536 - [meta sequenceId="6"] Unable to open pipe. This is likely because Unbound isn't running.
<29>1 2025-08-05T05:38:57-05:00 firewall unbound 21068 - [meta sequenceId="7"] [21068:0] notice: init module 0: python
<27>1 2025-08-05T05:38:57-05:00 firewall unbound 21068 - [meta sequenceId="8"] [21068:0] error: python exception in Py_InitializeFromConfig: init_fs_encoding: failed to get the Python codec of the filesystem encoding
<27>1 2025-08-05T05:38:57-05:00 firewall unbound 21068 - [meta sequenceId="9"] [21068:0] error: module init for module python failed
<26>1 2025-08-05T05:38:57-05:00 firewall unbound 21068 - [meta sequenceId="10"] [21068:0] fatal error: failed to init modules
<165>1 2025-08-05T05:38:58-05:00 firewall unbound 22528 - [meta sequenceId="11"] Backgrounding unbound logging backend.
<163>1 2025-08-05T05:39:08-05:00 firewall unbound 22528 - [meta sequenceId="12"] Unable to open pipe. This is likely because Unbound isn't running.
<29>1 2025-08-05T05:39:11-05:00 firewall unbound 65213 - [meta sequenceId="1"] [65213:0] notice: init module 0: python
<27>1 2025-08-05T05:39:11-05:00 firewall unbound 65213 - [meta sequenceId="2"] [65213:0] error: python exception in Py_InitializeFromConfig: init_fs_encoding: failed to get the Python codec of the filesystem encoding
<27>1 2025-08-05T05:39:11-05:00 firewall unbound 65213 - [meta sequenceId="3"] [65213:0] error: module init for module python failed
<26>1 2025-08-05T05:39:11-05:00 firewall unbound 65213 - [meta sequenceId="4"] [65213:0] fatal error: failed to init modules
<165>1 2025-08-05T05:39:12-05:00 firewall unbound 66644 - [meta sequenceId="5"] Backgrounding unbound logging backend.
<163>1 2025-08-05T05:39:22-05:00 firewall unbound 66644 - [meta sequenceId="6"] Unable to open pipe. This is likely because Unbound isn't running.
<29>1 2025-08-05T05:41:00-05:00 firewall unbound 54540 - [meta sequenceId="1"] [54540:0] notice: init module 0: python
<27>1 2025-08-05T05:41:00-05:00 firewall unbound 54540 - [meta sequenceId="2"] [54540:0] error: python exception in Py_InitializeFromConfig: init_fs_encoding: failed to get the Python codec of the filesystem encoding
<27>1 2025-08-05T05:41:00-05:00 firewall unbound 54540 - [meta sequenceId="3"] [54540:0] error: module init for module python failed
<26>1 2025-08-05T05:41:00-05:00 firewall unbound 54540 - [meta sequenceId="4"] [54540:0] fatal error: failed to init modules
<165>1 2025-08-05T05:41:00-05:00 firewall unbound 56288 - [meta sequenceId="5"] Backgrounding unbound logging backend.
<165>1 2025-08-05T05:41:04-05:00 firewall unbound 56288 - [meta sequenceId="6"] Closing logger
<29>1 2025-08-05T05:41:05-05:00 firewall unbound 91858 - [meta sequenceId="7"] [91858:0] notice: init module 0: python
<27>1 2025-08-05T05:41:05-05:00 firewall unbound 91858 - [meta sequenceId="8"] [91858:0] error: python exception in Py_InitializeFromConfig: init_fs_encoding: failed to get the Python codec of the filesystem encoding
<27>1 2025-08-05T05:41:05-05:00 firewall unbound 91858 - [meta sequenceId="9"] [91858:0] error: module init for module python failed
<26>1 2025-08-05T05:41:05-05:00 firewall unbound 91858 - [meta sequenceId="10"] [91858:0] fatal error: failed to init modules
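Every line above starts with a syslog PRI value, which encodes facility * 8 + severity (RFC 5424), so <27> is daemon/err and <26> is daemon/crit, while the <165> and <163> lines use facility 20 (local4). Below is a minimal Python sketch that decodes the PRI and lists the unbound PIDs that hit the fatal error, assuming the exact line layout shown; the regex and helper names are illustrative, and SAMPLE is copied from the log above.

```python
import re
from collections import Counter

# Syslog PRI = facility * 8 + severity; severities are numbered 0..7.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
# Assumes the layout pasted above: "<PRI>1 TIMESTAMP HOST unbound PID ..."
LINE = re.compile(r"^<(\d{1,3})>1 \S+ \S+ unbound (\d+) ")


def decode_pri(pri):
    """Split a syslog PRI value into (facility number, severity name)."""
    return pri // 8, SEVERITIES[pri % 8]


def summarize(log_text):
    """Count lines per severity and collect PIDs that logged 'fatal error'."""
    severities = Counter()
    fatal = []
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        severities[decode_pri(int(m.group(1)))[1]] += 1
        if "fatal error" in line:
            fatal.append(int(m.group(2)))
    return severities, fatal


SAMPLE = """<29>1 2025-08-05T05:38:45-05:00 firewall unbound 79276 - [meta sequenceId="1"] [79276:0] notice: init module 0: python
<27>1 2025-08-05T05:38:45-05:00 firewall unbound 79276 - [meta sequenceId="3"] [79276:0] error: module init for module python failed
<26>1 2025-08-05T05:38:45-05:00 firewall unbound 79276 - [meta sequenceId="4"] [79276:0] fatal error: failed to init modules
<26>1 2025-08-05T05:38:57-05:00 firewall unbound 21068 - [meta sequenceId="10"] [21068:0] fatal error: failed to init modules
"""

print(decode_pri(27))  # (3, 'err'); facility 3 is daemon
print(summarize(SAMPLE))
```

Run over the full excerpt, each restart attempt appears as a new PID that fails at the same python module init step.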

If I try to start Unbound by hand, I get this error:
root@firewall:/var/log/resolver # /usr/local/sbin/unbound -dc /var/unbound/unbound.conf
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = (not set)
  program name = 'unbound'
  isolated = 0
  environment = 1
  user site = 1
  safe_path = 0
  import site = 0
  is in build tree = 0
  stdlib dir = '/usr/local/lib/python3.11'
  sys._base_executable = ''
  sys.base_prefix = '/usr/local'
  sys.base_exec_prefix = '/usr/local'
  sys.platlibdir = 'lib'
  sys.executable = ''
  sys.prefix = '/usr/local'
  sys.exec_prefix = '/usr/local'
  sys.path = [
    '/usr/local/lib/python311.zip',
    '/usr/local/lib/python3.11',
    '/usr/local/lib/python3.11/lib-dynload',
  ]
root@firewall:/var/log/resolver #
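The two "Could not find ... libraries" lines suggest the embedded interpreter did not find its stdlib under sys.prefix, and init_fs_encoding most likely fails because it cannot import the encodings package from that stdlib. Below is a minimal Python sketch that checks for those files, assuming the landmarks are roughly what CPython 3.11 looks for (lib/python3.11/os.py and lib/python3.11/lib-dynload) plus encodings/__init__.py; the helpers are hypothetical. It needs a working interpreter, so if python3.11 itself is broken, run it with another one or simply ls the same paths.

```python
import os
import re

# Assumption: roughly the landmarks CPython 3.11 uses to locate its stdlib
# (lib/pythonX.Y/os.py) and platform-dependent modules (lib/pythonX.Y/lib-dynload),
# plus the encodings package that init_fs_encoding must import at startup.
def missing_landmarks(prefix="/usr/local", version="3.11"):
    """Return labels of the expected stdlib files/dirs missing under prefix."""
    stdlib = os.path.join(prefix, "lib", f"python{version}")
    checks = {
        "stdlib os.py": os.path.join(stdlib, "os.py"),
        "lib-dynload": os.path.join(stdlib, "lib-dynload"),
        "encodings package": os.path.join(stdlib, "encodings", "__init__.py"),
    }
    return [label for label, path in checks.items() if not os.path.exists(path)]


def installed_versions(prefix="/usr/local"):
    """List the pythonX.Y directories that actually exist under prefix/lib."""
    lib = os.path.join(prefix, "lib")
    if not os.path.isdir(lib):
        return []
    return sorted(n for n in os.listdir(lib) if re.fullmatch(r"python\d+\.\d+", n))


if __name__ == "__main__":
    missing = missing_landmarks()
    print("missing: " + ", ".join(missing) if missing else "all landmarks present")
    print("stdlib trees found:", installed_versions())
```

The second helper is there in case the Python version Unbound was built against is not the one actually installed under /usr/local/lib.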

Any help or pointers are greatly appreciated.
#5
Thanks for the suggestion. I added that and rebooted, but I still get the same results:
# sysctl -a | grep igb.2 | grep eee
dev.igb.2.eee_control: 1
# pciconf -lbcevV igb2@pci0:3:0:0
igb2@pci0:3:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x157b subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'I210 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0x91200000, size 131072, enabled
    bar   [18] = type I/O Port, range 32, base 0x3000, size 32, enabled
    bar   [1c] = type Memory, range 32, base 0x91220000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR RO NS
                 max read 512
                 link x1(x1) speed 2.5(2.5) ASPM L1(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0003[140] = Serial 1 00e067ffff22f83e
    ecap 0017[1a0] = TPH Requester 1
#6
Disabling ASPM via the Tunables did not seem to disable ASPM on the interfaces:
igb0@pci0:1:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x157b subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'I210 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0x91000000, size 131072, enabled
    bar   [18] = type I/O Port, range 32, base 0x1000, size 32, enabled
    bar   [1c] = type Memory, range 32, base 0x91020000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR RO NS
                 max read 512
                 link x1(x1) speed 2.5(2.5) ASPM L1(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0003[140] = Serial 1 00e067ffff22f83c
    ecap 0017[1a0] = TPH Requester 1
The system then hung again because of that thread using 100% system CPU, so I reinstalled 24.7 and restored from backup. If anyone knows how to disable ASPM in coreboot, or by another method in FreeBSD, I would love to try it and see whether that resolves my problems so I can upgrade to 25.
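To confirm whether the tunable changed anything on each port, the link line of the pciconf capability output can be checked across all devices at once. As I read pciconf's format, "ASPM L1(L0s/L1)" means L1 is currently enabled and L0s/L1 are supported, with "disabled" printed in the first position when ASPM is off; treat that as an assumption to confirm against pciconf's man page or source. Below is a minimal Python sketch; the helper names are hypothetical and SAMPLE is trimmed from the igb0 and igb2 output above.

```python
import re

# Link line from the pciconf capability listing, e.g.
# "link x1(x1) speed 2.5(2.5) ASPM L1(L0s/L1)".
# Assumed reading: first value = ASPM currently enabled, parentheses = supported.
ASPM = re.compile(r"\bASPM (\S+?)\(([^)]*)\)")
DEVICE = re.compile(r"^(\w+)@pci")


def aspm_states(pciconf_text):
    """Map device name -> (enabled, supported) ASPM states."""
    states = {}
    device = None
    for line in pciconf_text.splitlines():
        dev = DEVICE.match(line)
        if dev:
            device = dev.group(1)
            continue
        m = ASPM.search(line)
        if m and device:
            states[device] = (m.group(1), m.group(2))
    return states


def still_enabled(states):
    """Devices whose enabled ASPM state is anything other than 'disabled'."""
    return sorted(d for d, (enabled, _) in states.items() if enabled != "disabled")


SAMPLE = """igb0@pci0:1:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x157b
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR RO NS
                 link x1(x1) speed 2.5(2.5) ASPM L1(L0s/L1)
igb2@pci0:3:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x157b
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR RO NS
                 link x1(x1) speed 2.5(2.5) ASPM L1(L0s/L1)
"""

print(still_enabled(aspm_states(SAMPLE)))  # ['igb0', 'igb2']
```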
#7
Quote from: meyergru on March 18, 2025, 12:27:23 PM
I had hangs like that because my hardware could not handle ASPM correctly. After disabling that in the BIOS, the problem went away.
This system runs coreboot as its BIOS, and I am not currently sure how to disable ASPM in coreboot, so I think I have disabled ASPM through the Tunables section of OPNsense instead:
System -> Settings -> Tunables
Tunable: hw.pci.enable_aspm
Value: 0
I hit Apply and then rebooted the firewall. I am currently trying to verify whether ASPM is disabled.
#8
Quote from: meyergru on March 18, 2025, 12:01:19 PM
Which NIC hardware?
Thanks for the reply!  This firewall has the Intel(R) I210 NICs.

# sysctl -a | grep -E 'dev.(igb|ix|em).*.%desc:'
dev.igb.3.%desc: Intel(R) I210 Flashless (Copper)
dev.igb.2.%desc: Intel(R) I210 Flashless (Copper)
dev.igb.1.%desc: Intel(R) I210 Flashless (Copper)
dev.igb.0.%desc: Intel(R) I210 Flashless (Copper)
#9
Good morning. I upgraded from 24.7 to 25.1.2 a couple of weeks ago and did not notice any problems right away, but about 4 days after upgrading, my firewall became unresponsive and was routing traffic only intermittently. Investigating shows kernel{if_io_tqg_2} seemingly hung, using 100% of one core, which causes the load on the box to increase gradually until services stop responding.

198 threads:   7 running, 177 sleeping, 14 waiting
CPU 0:  1.2% user,  0.0% nice,  1.9% system,  0.0% interrupt, 96.9% idle
CPU 1:  0.4% user,  0.0% nice,  0.8% system,  0.0% interrupt, 98.8% idle
CPU 2:  0.0% user,  0.0% nice,  100% system,  0.0% interrupt,  0.0% idle
CPU 3:  0.4% user,  0.0% nice,  1.5% system,  0.0% interrupt, 98.1% idle
Mem: 130M Active, 1069M Inact, 1172M Wired, 343M Buf, 1607M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    0 root        -60    -     0B   704K CPU2     2   5:28  99.97% kernel{if_io_tqg_2}
    2 root        -60    -     0B    64K WAIT     1   1:26   1.03% clock{clock (0)}
36345 unbound      20    0   278M   217M kqread   0   0:21   0.68% unbound{unbound}
    0 root        -60    -     0B   704K -        1   0:24   0.24% kernel{if_io_tqg_1}
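Since the only recovery so far is a reboot, it might help to catch the condition in saved top output before services stop responding. Below is a minimal Python sketch, assuming the per-CPU and per-thread layout shown above; the 95% threshold and the helper names are arbitrary choices for illustration, and TOP_SAMPLE is trimmed from that output.

```python
import re

# Per-CPU summary line, e.g. "CPU 2:  0.0% user,  0.0% nice,  100% system, ..."
CPU_LINE = re.compile(r"^CPU (\d+):.*?\s([\d.]+)% system")
# Thread line: PID first, then (eventually) the WCPU percentage, then COMMAND.
THREAD_LINE = re.compile(r"^\s*\d+\s.*?\s([\d.]+)%\s+(.+)$")


def pegged_cpus(top_text, threshold=95.0):
    """CPU numbers whose system time is at or above threshold percent."""
    return [int(m.group(1)) for m in map(CPU_LINE.match, top_text.splitlines())
            if m and float(m.group(2)) >= threshold]


def busiest_thread(top_text):
    """(command, wcpu) of the thread with the highest WCPU, or None."""
    best = None
    for line in top_text.splitlines():
        m = THREAD_LINE.match(line)
        if m and (best is None or float(m.group(1)) > best[1]):
            best = (m.group(2).strip(), float(m.group(1)))
    return best


TOP_SAMPLE = """CPU 0:  1.2% user,  0.0% nice,  1.9% system,  0.0% interrupt, 96.9% idle
CPU 1:  0.4% user,  0.0% nice,  0.8% system,  0.0% interrupt, 98.8% idle
CPU 2:  0.0% user,  0.0% nice,  100% system,  0.0% interrupt,  0.0% idle
CPU 3:  0.4% user,  0.0% nice,  1.5% system,  0.0% interrupt, 98.1% idle
  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    0 root        -60    -     0B   704K CPU2     2   5:28  99.97% kernel{if_io_tqg_2}
    2 root        -60    -     0B    64K WAIT     1   1:26   1.03% clock{clock (0)}
36345 unbound      20    0   278M   217M kqread   0   0:21   0.68% unbound{unbound}
"""

print(pegged_cpus(TOP_SAMPLE))     # [2]
print(busiest_thread(TOP_SAMPLE))  # ('kernel{if_io_tqg_2}', 99.97)
```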

I have not found a way to recover from this other than rebooting the firewall. After the first reboot, it became unresponsive again within 2 hours. I rebooted it again and it was fine for another ~3 days, and then the same problem occurred. I upgraded to 25.1.3 as soon as it came out, hoping it would resolve my problem, but it did not.

I've Googled around and not found a definitive answer, but I did find this post, https://www.reddit.com/r/PFSENSE/comments/1ags2z6/pfsense_locks_after_a_few_days_routes_traffic_but/, which describes a very similar problem, although on pfSense with different software versions and with no clear solution other than 'patched'.

I did not see this in 24.7.  Does anyone have some ideas on what I could look at next to help diagnose and resolve this?  Any help is greatly appreciated.