Messages - DocGonzo74

#1
24.7, 24.10 Series / Re: Upgrade Failed - Fail to rename
September 06, 2024, 05:42:31 PM
I got it to work.

Had to do a pkg update from the CLI.

Hit a few more snags. I just renamed the problematic directories and tried again; it took a few tries.
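
For anyone following along, the "move it to a new name and retry" step can be sketched roughly like this. `move_aside` is a hypothetical helper (not part of pkg or OPNsense), the example path is the one from the pkg-static error in the original report, and the exact pkg commands are an assumption about what "pkg update from the CLI" meant.

```sh
#!/bin/sh
# Sketch only: rename a path that pkg cannot overwrite so the upgrade can be retried.
# move_aside is a hypothetical helper, not part of pkg or OPNsense.
move_aside() {
    target=$1
    # Nothing to do if the path does not exist.
    [ -e "$target" ] || return 1
    backup="${target}.old.$(date +%Y%m%d%H%M%S)"
    # Rename and print the new name so it can be cleaned up later.
    mv -- "$target" "$backup" && printf '%s\n' "$backup"
}

# Example (as root on the firewall; path taken from the "Fail to rename" error):
#   move_aside /usr/local/lib/perl5/5.36/HTTP
#   pkg update -f && pkg upgrade -y   # assumed commands, repeat for each new failure
```

Renaming instead of deleting keeps the old directory around until the upgrade has gone through cleanly.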
#2
24.7, 24.10 Series / Re: Upgrade Failed - Fail to rename
September 06, 2024, 05:24:22 PM
Continuing troubleshooting.

I moved and deleted the problem file, then another file hit the same error. I moved and deleted that one too, and repeated this a few times until the package installed. Then I got another failure:

perl5-5.36.3_1: missing file /usr/local/share/licenses/perl5-5.36.3_1/ART10
perl5-5.36.3_1: missing file /usr/local/share/licenses/perl5-5.36.3_1/GPLv1+
perl5-5.36.3_1: missing file /usr/local/share/licenses/perl5-5.36.3_1/LICENSE
perl5-5.36.3_1: missing file /usr/local/share/licenses/perl5-5.36.3_1/catalog.mk
#3
24.7, 24.10 Series / Upgrade Failed - Fail to rename
September 06, 2024, 05:12:46 PM
I am trying to update my OPNsense system (bare metal) to 24.7.3_1-amd64.

Failure below. Anyone seen this before? Any ideas?


***GOT REQUEST TO UPDATE***
*** Truncated ***
Number of packages to be upgraded: 31

The process will require 3 MiB more space.
[1/31] Upgrading perl5 from 5.36.3_1 to 5.36.3_2...
[1/31] Extracting perl5-5.36.3_2: .......... done
perl5-5.36.3_1: missing file /usr/local/share/licenses/perl5-5.36.3_1/ART10
perl5-5.36.3_1: missing file /usr/local/share/licenses/perl5-5.36.3_1/GPLv1+
perl5-5.36.3_1: missing file /usr/local/share/licenses/perl5-5.36.3_1/LICENSE
perl5-5.36.3_1: missing file /usr/local/share/licenses/perl5-5.36.3_1/catalog.mk
pkg-static: Fail to rename /usr/local/lib/perl5/5.36/.pkgtemp.HTTP.9MjtVee74MPg -> /usr/local/lib/perl5/5.36/HTTP:Invalid argument
Starting web GUI...done.
Generating RRD graphs...done.
***DONE***
#4
High availability / 24.7 HA with KEA DHCP
July 31, 2024, 08:54:12 PM
I was running into issues trying to migrate to Kea DHCP in my OPNsense HA environment. It's still somewhat half-baked, but I have it working well enough for my purposes.

Word of caution: when you change something in Kea DHCP on your master node and a config sync happens, some settings get improperly changed on the backup. I'll highlight these as I walk through the install.

I did the whole configuration on the primary (I called it master above) and then synced to the backup. Everything below is done on the primary.

Configure Control Agent:
I used my CARP IP address (local IP) and left the port at 8000.

Configure Kea DHCP > Settings:
I leave it disabled until I'm done, then disable the ISC instances and enable Kea DHCP. It's a PITA to change back when I'm testing, but it is what it is.

Interfaces:
I checked all my inside network interfaces (LAN, IoT, Guest, Lab). With just those selected, clients would intermittently fail to get an address. I figured my CARP interface might help somehow. I'm not sure how I got there, but once I added my CARP interface to the group, it started working. My CARP link is directly connected between my firewalls, so no man-in-the-middle worries there, unless my cats are up to something.

The valid lifetime (lease timer) is set to 4000 seconds by default. I feel that's too low, so I'm running 7200. That said, I tried something like 28800 and a bunch of my IoT devices (cameras, alarms) lost their leases and couldn't reconnect. I checked the leases, and the clients were reporting a lease timer of 0. I'm guessing the IoT devices are hard-coded to some lower number and don't understand the longer lease time.
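
To put those numbers in perspective, here is a quick sketch of when a well-behaved client should try to renew (T1) and rebind (T2) for each lease length. It assumes the RFC 2131 client defaults of 50% and 87.5% of the lease; it does not explain the 0 lease timer I saw, it only shows what the timers should look like.

```sh
#!/bin/sh
# Print the default renew (T1) and rebind (T2) times for a lease length in seconds.
# Assumes the RFC 2131 client defaults: T1 = 0.5 * lease, T2 = 0.875 * lease.
lease_timers() {
    lease=$1
    t1=$((lease / 2))
    t2=$((lease * 7 / 8))
    printf 'lease=%s T1=%s T2=%s\n' "$lease" "$t1" "$t2"
}

# The three values mentioned above: Kea default, my setting, and the one that broke IoT.
for secs in 4000 7200 28800; do
    lease_timers "$secs"
done
```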


For High Availability:
Check "Enabled". Here you have to enter your full server name (PRIMARY.awesomeserver.com). I had this set to just PRIMARY, and HA wouldn't work until it matched my full hostname.
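
A quick way to sanity-check this from the shell is to compare what you typed into the HA settings with what the box reports as its hostname. `name_matches_host` is a made-up helper for illustration; the assumption is that `hostname` on the firewall prints the full name.

```sh
#!/bin/sh
# Sketch: compare the server name entered in the Kea HA settings with the system hostname.
# name_matches_host is a hypothetical helper; the second argument defaults to `hostname`.
name_matches_host() {
    configured=$1
    actual=${2:-$(hostname)}
    if [ "$configured" = "$actual" ]; then
        echo "ok: $configured matches the hostname"
    else
        echo "mismatch: configured '$configured', hostname is '$actual'"
        return 1
    fi
}

# Example: name_matches_host PRIMARY.awesomeserver.com
```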


The next tab is Subnets. I left this at the defaults and all kinds of oddness occurred. What I found is that, by default, the option data checkbox is checked and the default values are hidden. When I unchecked it, I saw that Kea was handing out my physical IP and not my virtual IP, so the default gateway was wrong. I also had to fix DNS and NTP. Kea assumes a single-server configuration, so the defaults match a non-HA environment.


Reservations: I have about 100. There is a tool out there that will convert your ISC DHCP reservations to Kea DHCP reservations. It worked for me: https://forum.opnsense.org/index.php?topic=39342.0

When you add new reservations, make sure you use the lowercase, colon-separated a1:b2:c3 format, not CAPS or dashes. I put some in manually with dashes and caps and they didn't work.
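
If you have a pile of MACs in the wrong format, something like this cleans them up before you paste them in. `normalize_mac` is just an illustrative helper; it lowercases the address and swaps dashes for colons, nothing more.

```sh
#!/bin/sh
# Sketch: convert a MAC such as A1-B2-C3-D4-E5-F6 to the lowercase colon form.
# normalize_mac is a hypothetical helper, not an OPNsense or Kea command.
normalize_mac() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-/:/g'
}

# Example: normalize_mac 'A1-B2-C3-D4-E5-F6'
```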


Finally, HA peers: this is another one that was part of getting HA to work properly. It's right there in the title.

You have to create both the PRIMARY and BACKUP HA peers and assign them the roles primary/standby. (Another thing I think is half-baked: the active node should consider itself the primary when it's the HA MASTER. It appears that the secondary is always considered secondary, regardless of its current HA state.)

When I first set this up, I assumed you only had to create the remote peer. I was looking over everything and said "why not", set up both primary and secondary, and poof, it worked.
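
As a rough checklist for the peers step, this sketch takes a list of "name role" lines and confirms there is exactly one primary and one standby. The input format and the `check_peers` helper are made up for illustration (this is not how OPNsense stores the peers), and BACKUP.awesomeserver.com is a hypothetical name.

```sh
#!/bin/sh
# Sketch: read "name role" lines on stdin and confirm one primary and one standby exist.
# check_peers and the input format are hypothetical, for illustration only.
check_peers() {
    awk '
        $2 == "primary" { p++ }
        $2 == "standby" { s++ }
        END {
            if (p == 1 && s == 1) {
                print "ok: primary and standby both defined"
                exit 0
            }
            printf "incomplete: primary=%d standby=%d\n", p, s
            exit 1
        }'
}

# Example:
#   printf '%s\n' 'PRIMARY.awesomeserver.com primary' 'BACKUP.awesomeserver.com standby' | check_peers
```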


I hope this helps someone set up Kea DHCP with HA on OPNsense. Figured I'd type it up, stream-of-consciousness style, in case someone else is stuck like I was.
#5
24.7, 24.10 Series / Re: 24.7 KEA DHCP w/ HA
July 31, 2024, 08:53:35 PM
Moving this to the HA thread.. just noticed it again.
#6
24.7, 24.10 Series / 24.7 KEA DHCP w/ HA
July 31, 2024, 01:03:17 AM
#7
24.7, 24.10 Series / Re: Kernel panic after upgrade
July 31, 2024, 12:44:19 AM
I ran into the exact same issue. I ended up grabbing a config backup, reinstalling OPNsense 24.7 from scratch, and importing the config. Before that I was crashing randomly after 2-20 minutes. It's been a couple of days since the change and it's still working.

Something got corrupted during the upgrade process, I suspect.. or goblins.
#8
I have ZeroTier configured with my OPNsense firewall as an endpoint. The VPN works great as a default gateway and a remote access solution.

The problem I'm having is that all of my interfaces are trying to establish connections to the ZeroTier network.  All of the sessions are being caught and denied by the auto-created default deny rule.  My IPS is seeing these sessions as well.

Has anyone else seen this activity? If so, are you blocking it or just chalking it up to the ZeroTier plugin being a bit chatty and trying to talk out of every port?

2024-07-06T10:05:53.509258-0400   2039784   allowed   1_LAN   192.168.1.253   9993   103.195.103.66   9993   ET INFO ZeroTier Related Activity (udp)   
2024-07-06T10:05:53.509241-0400   2039784   allowed   1_LAN   192.168.1.254   9993   103.195.103.66   9993   ET INFO ZeroTier Related Activity (udp)   
2024-07-06T10:05:53.509228-0400   2039784   allowed   1_LAN   10.254.254.253   9993   103.195.103.66   9993   ET INFO ZeroTier Related Activity (udp)   
2024-07-06T10:05:53.509215-0400   2039784   allowed   1_LAN   172.16.200.253   9993   103.195.103.66   9993   ET INFO ZeroTier Related Activity (udp)   
2024-07-06T10:05:53.509195-0400   2039784   allowed   1_LAN   172.16.100.253   9993   103.195.103.66   9993   ET INFO ZeroTier Related Activity (udp)   
2024-07-06T10:05:53.509177-0400   2039784   allowed   1_LAN   172.16.1.253   9993   103.195.103.66   9993   ET INFO ZeroTier Related Activity (udp)   
2024-07-06T10:05:53.509154-0400   2039784   allowed   1_LAN   172.16.200.1   9993   103.195.103.66   9993   ET INFO ZeroTier Related Activity (udp)   
2024-07-06T10:05:53.509135-0400   2039784   allowed   1_LAN   172.16.100.1   9993   103.195.103.66   9993   ET INFO ZeroTier Related Activity (udp)   
2024-07-06T10:05:53.509112-0400   2039784   allowed   1_LAN   172.16.1.1   9993   103.195.103.66   9993   ET INFO ZeroTier Related Activity (udp)
#9
I updated the BIOS to 2801 a few weeks back trying to solve the issue.

Weirdness, though.

I did the latest update after a few tries, to 24.1.8.  After that update, my system has been responsive for 24 hours.  I re-installed the rest of my RAM and have restarted to see if the system remains stable. 

I can't imagine something in that new build fixed my issue, but I've seen much more weirdness in the past.

Fingers crossed.
#10
I ran a memtest and all 4 passes came back clean.

Memory is 64GB (4 x 16) of Patriot DDR4 memory 2400. 

CPU is an i9-9900K with a passive cooling tower cooler.  I ran this as a gaming rig for a while and repurposed it. Never had an overheating problem on this system.

Found SMBIOS entry point in EFI, reading table from /dev/mem.
SMBIOS 3.2 present.

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
        Manufacturer: ASUSTeK COMPUTER INC.
        Product Name: ROG STRIX Z390-E GAMING
        Version: Rev 1.xx
        Serial Number: 190449688100832
        Asset Tag: Default string


NIC info:
root@PRIMARY:/home/RitchieHome #  sysctl -a | grep -E 'dev.(igb|ix|em).*.%desc:'
dev.em.0.%desc: Intel(R) I219-V CNP(7)
dev.igb.7.%desc: Intel(R) PRO/1000 ET 82576 (Quad Copper)
dev.igb.6.%desc: Intel(R) PRO/1000 ET 82576 (Quad Copper)
dev.igb.5.%desc: Intel(R) PRO/1000 ET 82576 (Quad Copper)
dev.igb.4.%desc: Intel(R) PRO/1000 ET 82576 (Quad Copper)
dev.igb.3.%desc: Intel(R) PRO/1000 ET 82576 (Quad Copper)
dev.igb.2.%desc: Intel(R) PRO/1000 ET 82576 (Quad Copper)
dev.igb.1.%desc: Intel(R) PRO/1000 ET 82576 (Quad Copper)
dev.igb.0.%desc: Intel(R) PRO/1000 ET 82576 (Quad Copper)


The system works fine save for the webgui.  Once it crashes, I have to reboot to get it to come back. 
configctl webgui restart renew doesn't work when it crashes.





#11
I have an issue where, after anywhere from 5 to 25 minutes, my OPNsense GUI hangs and becomes unresponsive. When I try to issue a reboot from the CLI, the reboot hangs after it shuts down services. After a reboot, the GUI is responsive again for some time.

This is a bare-metal server, not a VM. Currently running a memtest (2/4 passes so far came back clean).

I'm running out of ideas on what to try to fix this.

System Firmware Reporter shows the crash logs.  At the end of /var/crash/textdump.tar.1, I get this output:

Fatal trap 12: page fault while in kernel mode
cpuid = 10; apic id = 0a
fault virtual address   = 0x0
fault code      = supervisor read data, page not present
instruction pointer   = 0x20:0xffffffff8239957f
stack pointer           = 0x28:0xfffffe00e1abc600
frame pointer           = 0x28:0xfffffe00e1abc660
code segment      = base 0x0, limit 0xfffff, type 0x1b
         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags   = interrupt enabled, resume, IOPL = 0
current process      = 0 (if_io_tqg_10)
trap number      = 12
panic: page fault
cpuid = 10
time = 1711851180
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e1abc3c0
vpanic() at vpanic+0x151/frame 0xfffffe00e1abc410
panic() at panic+0x43/frame 0xfffffe00e1abc470
trap_fatal() at trap_fatal+0x387/frame 0xfffffe00e1abc4d0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00e1abc530
calltrap() at calltrap+0x8/frame 0xfffffe00e1abc530
--- trap 0xc, rip = 0xffffffff8239957f, rsp = 0xfffffe00e1abc600, rbp = 0xfffffe00e1abc660 ---
pf_test_state_udp() at pf_test_state_udp+0x28f/frame 0xfffffe00e1abc660
pf_test() at pf_test+0xc57/frame 0xfffffe00e1abc7d0
pf_check_in() at pf_check_in+0x25/frame 0xfffffe00e1abc7f0
pfil_run_hooks() at pfil_run_hooks+0x97/frame 0xfffffe00e1abc830
ip_tryforward() at ip_tryforward+0x181/frame 0xfffffe00e1abc8f0
ip_input() at ip_input+0x724/frame 0xfffffe00e1abc980
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00e1abc9d0
ether_demux() at ether_demux+0x159/frame 0xfffffe00e1abca00
ng_ether_rcv_upper() at ng_ether_rcv_upper+0x8c/frame 0xfffffe00e1abca20
ng_apply_item() at ng_apply_item+0x2bf/frame 0xfffffe00e1abcab0
ng_snd_item() at ng_snd_item+0x28e/frame 0xfffffe00e1abcaf0
ng_apply_item() at ng_apply_item+0x2bf/frame 0xfffffe00e1abcb80
ng_snd_item() at ng_snd_item+0x28e/frame 0xfffffe00e1abcbc0
ng_ether_input() at ng_ether_input+0x4c/frame 0xfffffe00e1abcbf0
ether_nh_input() at ether_nh_input+0x1f2/frame 0xfffffe00e1abcc50
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00e1abcca0
ether_input() at ether_input+0x69/frame 0xfffffe00e1abcd00
iflib_rxeof() at iflib_rxeof+0xbcb/frame 0xfffffe00e1abce00
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe00e1abce40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x15d/frame 0xfffffe00e1abcec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc3/frame 0xfffffe00e1abcef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe00e1abcf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e1abcf30
--- trap 0x7eb4a043, rip = 0x9566c7f262db2889, rsp = 0xc14793d336fa7ca8, rbp = 0xb3244499abe27515 ---
KDB: enter: panic
panic.txt: page fault
version.txt: FreeBSD 13.2-RELEASE-p7 stable/23.7-n254871-d5ec322cffc SMP
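
The textdump is a plain tar archive, so its members (panic.txt and version.txt show up at the end of the output above) can be unpacked and read one at a time instead of scrolling through the raw file. A minimal sketch, where `show_textdump` is a hypothetical helper:

```sh
#!/bin/sh
# Sketch: unpack a textdump archive and show the panic summary and kernel version.
# show_textdump is a hypothetical helper; pass the archive path as the first argument.
show_textdump() {
    archive=$1
    workdir=$(mktemp -d) || return 1
    tar -xf "$archive" -C "$workdir" || { rm -rf "$workdir"; return 1; }
    # Only print the two members mentioned above, if they are present.
    for member in panic.txt version.txt; do
        if [ -f "$workdir/$member" ]; then
            printf '== %s ==\n' "$member"
            cat "$workdir/$member"
        fi
    done
    rm -rf "$workdir"
}

# Example: show_textdump /var/crash/textdump.tar.1
```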
#12
Before my upgrade, CARP worked fine: entering CARP maintenance mode under Virtual IPs would trigger a failover. Now, when I enable maintenance mode, the node stays master and never triggers a failover. Anyone else seeing this?
#13
22.1 Legacy Series / Re: Acme client + Namecheap DNS
April 14, 2022, 01:21:59 PM
Did more troubleshooting and figured it out. The problem was on the DNS side (names weren't set up properly and didn't match the certs). Fixed the names and everything worked great.
#14
22.1 Legacy Series / Acme client + Namecheap DNS
April 13, 2022, 06:05:53 PM
I am having an issue using the ACME client's DNS verification against Namecheap DNS (DDNS).

My DDNS works great after properly configuring the newer client.
In the ACME client, I'm set up to issue a wildcard cert for *.mydomain.org. The wildcard DDNS works fine.

When I try to issue the certificate, I'm getting this error:

*.mydoman.org:Verify error:During secondary validation: DNS problem: query timed out looking up TXT for _acme-challenge.mydomain.org


Watching the process, my challenge appears to be working (the ACME client adds its own _acme-challenge TXT record automatically). After the timer expires, it checks for cert issuance and I get the query timed out error.
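
For checking the record by hand while the timer is running: a wildcard cert is validated against the TXT record on the base name, so the lookup target for *.mydomain.org is _acme-challenge.mydomain.org, as in the error above. A small sketch, where `acme_challenge_name` is a hypothetical helper and the `host` lookup assumes that tool is available wherever you run it:

```sh
#!/bin/sh
# Sketch: derive the DNS-01 challenge record name for a (possibly wildcard) domain.
# acme_challenge_name is a hypothetical helper, not part of the ACME client.
acme_challenge_name() {
    # Strip a leading "*." so wildcards map to the base domain.
    domain=${1#\*.}
    printf '_acme-challenge.%s\n' "$domain"
}

# Example lookup (needs network access and the host utility):
#   host -t TXT "$(acme_challenge_name '*.mydomain.org')"
```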

Has anyone run into this in the past?
#15
Been messing with this on and off over the last week. I stabilized my dual-WAN HA setup by removing the MAC spoofing and hostname from the WAN interfaces on the primary firewall. Once I did that, the flapping stopped completely on the primary.

Whenever I moved to my backup, the backup's WAN interfaces would flap and my WANs would take turns going up and down.

I tried a bunch of different things along the lines of what you all have tried, then I thought about it and added a simple script at the following path:

/usr/local/etc/rc.syshook.d/start/08-setwanmac

08-setwanmac contains this:

#!/bin/sh

# Change WAN MAC addresses
ifconfig igb4 ether yy:yy:yy:yy:yy:yx
ifconfig igb5 ether xx:xx:xx:xx:xx:xy

The 08-setwanmac script is silly: it just uses ifconfig to change each MAC to the desired one (a clone of my primary firewall's NIC MAC addresses).

Super static and simple, but it has survived quite a few reboots and forced swaps with minimal packet loss and zero flaps. I placed the MAC change ahead of the newwanip script so that the MACs are set before newwanip runs. Working out so far.
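
If you want to try the same thing, dropping the script in place looks roughly like this. `install_hook` is a hypothetical helper, and making the file executable is my assumption about what the syshook directory expects, so double-check on your own box.

```sh
#!/bin/sh
# Sketch: copy a hook script into the start syshook directory and make it executable.
# install_hook is a hypothetical helper; the second argument lets you test in another directory.
install_hook() {
    src=$1
    dest_dir=${2:-/usr/local/etc/rc.syshook.d/start}
    cp "$src" "$dest_dir/08-setwanmac" && chmod 755 "$dest_dir/08-setwanmac"
}

# Example (as root on the backup firewall):
#   install_hook ./08-setwanmac
```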