Show posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Messages - davidfi01

#1
This seems to be working fine. I am updating the subject to SOLVED.
#2
SOLVED:  After some thought, and a bit of experimentation, the solution was quite simple. I created one additional CARP VIP for the primary WAN interface on the primary node. This ties the loss of the WAN gateway, through its interface, into the CARP demotion processing. As soon as the primary node's WAN gateway monitor sees the gateway is down, it raises the CARP demotion value (via the new WAN interface CARP VIP) from 0 to 240, which cascades across all the CARP VIPs, puts the primary node into BACKUP, and promotes the secondary node to MASTER. With DNS and DHCP properly set up in HA, failover and failback are now completely seamless.

I have physically removed the primary WAN cable, triggered the failover and the transfer of DNS and DHCP processing, and verified with whatsmyip (plus checking both the primary and secondary VIP status) to confirm successful failover. Reconnecting the WAN cable triggered the failback. I also shut down the primary node and powered it back up to simulate loss of the node. Failover and failback work seamlessly now.
There is no need for WAN grouping, or for the WAN interface IPs and gateways to be on the same switch or segment. There is NO load balancing. The primary node serves all traffic; the secondary node is a hot standby, only routing when the primary node is down or the primary node's gateway is down.
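The cascade described above can be sanity-checked from the console. A minimal sketch, assuming a FreeBSD/OPNsense node where the live value comes from `sysctl -n net.inet.carp.demotion`; `role_for_demotion` is a hypothetical helper used only to illustrate the 0-vs-240 threshold:

```shell
#!/bin/sh
# Hypothetical helper: map a CARP demotion value to the role a node
# should end up in. 0 = healthy (stays MASTER); 240 is the demotion
# added when a monitored interface/gateway goes down.
role_for_demotion() {
    if [ "$1" -ge 240 ]; then
        echo "BACKUP"
    else
        echo "MASTER"
    fi
}

# On a live node the current value would be read like this:
#   demotion=$(sysctl -n net.inet.carp.demotion)
role_for_demotion 0     # healthy primary
role_for_demotion 240   # gateway down: primary yields to the secondary
```

During a simulated ISP outage you would expect the primary's demotion value to jump from 0 to 240 while the secondary's CARP VIPs report MASTER.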
#3
It's a 2-node configuration. dpinger monitors the primary node's WAN connection by pinging a known public address; when that fails, dpinger issues a WAN interface down message. CARP is supposed to see that message and raise the demotion level from 0 to 240. CARP is not seeing the interface down message, although it appears in the system logs. Either something is not configured properly, or CARP is not receiving or acting upon the WAN interface down message.

D
#4
Yes, but it depends on the config. In my case the primary router is connected to the primary ISP and the secondary router to the secondary ISP. This is a dual-WAN, not multi-WAN, config. In dual WAN, the primary router uses the primary ISP exclusively. If dpinger is active, then upon failure to reach the primary ISP, dpinger flags the gateway as down, and LAN CARP should move traffic over to the secondary router using the secondary ISP.

Still not working, although I am thinking I need to create a CARP VIP for the WAN interface so the primary runs as MASTER; then, if its WAN link goes down, the secondary will be notified over the CARP link?

D
#5
I have implemented LAN CARP and can confirm, using "sysctl net.inet.carp.demotion=240", that the primary's demotion level sets to 240 and master -> backup traffic flow works. However, in my situation with 2 independent WAN gateways (one on the primary, the other on the secondary), dpinger's DOWN state on the primary isn't converted into a CARP demotion on the primary. The CARP demotion always remains at 0 when I simulate an ISP outage.

How can I get CARP to see dpinger's report of down state on isp side?

D

#6
I found that recommendation and implemented it today. I will post the results after I test it for a few cycles.

Thanks for the heads up.

D
#7
It seems that Unbound DNS is losing its binding to a CARP VIP after failover/failback: it does not automatically re-attach to the VIP when the VIP returns. I need to manually restart Unbound to refresh its interface/IP bindings and restore full DNS service on the VIP.

This seems to be a common issue with Unbound in CARP HA environments. When the CARP VIP fails back to the original master, Unbound sometimes needs to be restarted because it does not dynamically re-bind to the VIP after it returns to the interface.

Is there a way to force an Unbound restart when a failover-to-backup or failback-to-master event occurs?

D
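One way to automate this is OPNsense's syshook mechanism: executables placed under `/usr/local/etc/rc.syshook.d/carp/` are, as I understand it, invoked on CARP state transitions with the vhid as the first argument and the new state (MASTER/BACKUP/INIT) as the second. A minimal sketch, assuming that interface and assuming `configctl unbound restart` is the right configd action on your version (both worth verifying against the docs):

```shell
#!/bin/sh
# Hypothetical syshook sketch: /usr/local/etc/rc.syshook.d/carp/20-restart-unbound
# Assumption: OPNsense invokes CARP hooks with $1 = vhid, $2 = new state.

wants_restart() {
    # Restart Unbound only when this node becomes MASTER again,
    # so it re-binds to the returning CARP VIP.
    [ "$1" = "MASTER" ] && echo "yes" || echo "no"
}

if [ "$(wants_restart "$2")" = "yes" ]; then
    # configctl drives OPNsense's configd (assumed action name).
    /usr/local/sbin/configctl unbound restart
fi
```

The script must be executable (`chmod +x`); restarting only on the MASTER transition avoids needless restarts while the node sits in BACKUP.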

#8
High availability / Re: Trouble setting up HA 25.7.3
September 13, 2025, 11:19:40 PM
Please, I need some help here ...
#9
High availability / Re: Trouble setting up HA 25.7.3
September 11, 2025, 05:35:13 PM
Updated image:



Primary is:
LAN: 192.168.1.1/24
identifier "lan",device LAGG0

Secondary is:
LAN: 192.168.1.2/24
identifier "lan",device LAGG0

Not sure what attribute needs changing. The netmask (per the OPNsense docs) is set to /24.
The interface is lagg0, which consists of 2 physical ports (eth1 & eth2). Both routers are identical. On the switch side, the primary router's lagg0 is connected to Ch1/I1 (ports 1 and 2 on switch 1), and the secondary router's lagg0 is connected to ports 3 and 4 on switch 2. Switches 1 and 2 are connected via another LAGG (Ch2/I2). All ports used pass default (VLAN 1) untagged traffic, so the routers should be able to ping each other.
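As a quick sanity check that the two LAN addresses really can sit on one L2 segment, here is a small sketch; `same_net24` is a hypothetical helper (it only compares the first three octets, i.e. a /24), and the commented commands are typical live checks for each console:

```shell
#!/bin/sh
# Hypothetical helper: do two IPv4 addresses fall in the same /24?
same_net24() {
    n1=$(echo "$1" | cut -d. -f1-3)
    n2=$(echo "$2" | cut -d. -f1-3)
    [ "$n1" = "$n2" ] && echo "yes" || echo "no"
}

same_net24 192.168.1.1 192.168.1.2   # the two LAN IPs above

# Live checks, run from each node's console:
#   ifconfig lagg0          # is LACP negotiated and the link active?
#   ping -c 3 192.168.1.2   # from the primary, reach the secondary
#   arp -an                 # was an ARP entry learned for the peer?
```

If the helper says yes but the pings fail, the problem is below layer 3: LAGG negotiation, switch port channel membership, or VLAN tagging.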
#10
High availability / Trouble setting up HA 25.7.3
September 10, 2025, 03:56:17 AM
I am trying to configure an HA setup using 2 OPNsense routers connecting to 2 different ISP WANs.
I created:




Existing Primary/master router:
GW: 154.59.210.1
WAN: 154.59.210.30/24
Identifier "wan", device icg0
LAN: 192.168.1.1/24
identifier "lan",device LAGG0
ADDED:
CARP VIP: 10.0.0.1

New Secondary Router:
GW: 154.59.188.1
WAN: 154.59.188.66/24
Identifier "wan", device icg0
LAN: 192.168.1.2/24
identifier "lan",device LAGG0
CARP VIP:10.0.0.2

For test purposes, I added firewall rule for LAN that allows all traffic on each router.

1) The CARP VIPs use a direct connection. From the console I can ping each machine over that direct link.

However, I cannot ping from the console of either router to the other router across the LAN, even though a firewall rule allowing all traffic on LAN is active on each router.

I cannot access the opnsense gui on the secondary router unless I enable it on the WAN port.  It is not accessible from the LAN.

What do I need to do to ensure the secondary router is pingable from the LAN and that I can access the secondary OPNsense GUI from the LAN?

Thanks in advance,
D
#11
@Jackknife4782 - the checksum error results from the patch you applied. It is to be expected.

D
#12
I also tried usr login and got this as well:

File "/usr/local/opnsense/scripts/filter/pftablecount.py", line 49, in <module>
Missing name for redirect.
if "-" in parts[0]:
if: Badly formed number.
IndexError: list index out of range
IndexError:: Too many arguments.
#13
It happens on reboots for me. Reproducible; a screenshot is attached. It looks like it runs fine from the CLI. Attached is the output I get.

Best,
D
#14
25.7, 25.10 Series / 25.7.2 Backend Log error report
August 21, 2025, 05:05:51 PM
Saw this in the Backend Log after updating to 25.7.2:

opnsense error:

[3910c104-62e6-4f14-8bd5-148de80c702e] Script action failed with Command '/usr/local/opnsense/scripts/filter/pftablecount.py ''' returned non-zero exit status 1. at Traceback (most recent call last):
  File "/usr/local/opnsense/service/modules/actions/script_output.py", line 78, in execute
    subprocess.run(script_command, env=self.config_environment, shell=True,
  File "/usr/local/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/usr/local/opnsense/scripts/filter/pftablecount.py ''' returned non-zero exit status 1.

It is non-critical.

D
#15
Can the tunable "hw.pci.enable_aspm" be used to do this in the absence of an OEM BIOS update?

D