Messages - Phoenix4

#1
High availability / pfsync0 interface output errors
September 24, 2025, 12:31:13 PM
After a week spent scouring the internet on this and getting nowhere, I'm hoping someone here has some wisdom.

I have two instances in an HA setup: the primary on bare metal (4x 2.5GbE Intel i226-V ports) and the backup in a VM on Ubuntu (1x 10GbE X550-AT2 and 2x 1GbE I210-AT, hardware passthrough to the VM). There is a dedicated, directly cabled SYNC connection between the two (no LAGs or VLANs). The LAN and WAN ports are abstracted through LAGs (although that should only become relevant later in the pfsync process).
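
For anyone retracing my steps, the pfsync settings themselves can be sanity-checked from a shell (standard FreeBSD commands; field names in the ifconfig output vary a little between versions):
   ifconfig pfsync0              # reports syncdev/syncpeer plus the maxupd and defer settings
   pfctl -si | grep -i current   # confirms pf actually has state entries to sync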

Basically, I can't get any pfsync frames out of the SYNC port of either instance.
  • CARP on the WAN and LAN sides works fine, as does XMLRPC over the SYNC link. I also tried using the LAN ports for pfsync and had the same issues, so I've ruled out the SYNC link itself.
  • Tried both 1500 and 9000 MTUs with no difference.
  • Tried both multicast and unicast pfsync destinations (IPv4).
  • All hardware offloads are disabled (the default).
  • Firewall rule is completely open on the SYNC link (and firewall logs show no dropped packets).
  • Packet capture on the SYNC link confirms that no pfsync packets are being transmitted (capture filter shown below).
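
For reference, the capture filter was essentially the one below; pfsync rides directly on IP as protocol 240, so this catches both the multicast and unicast variants. Nothing matches on either node:
   tcpdump -ni igc3 ip proto 240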

Logs show the same ~65-second timeout every time I trigger pfsync:
 2025-09-23T22:47:23  Notice  kernel  <6>[5605] carp: demoted by -240 to 0 (pfsync bulk fail)
 2025-09-23T22:46:18  Notice  kernel  <6>[5540] carp: demoted by 240 to 240 (pfsync bulk start)
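
The demotion is also visible directly via sysctl while a bulk update is in flight (stock FreeBSD sysctl):
   sysctl net.inet.carp.demotion   # 240 during the bulk attempt, back to 0 once it fails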

netstat shows output errors on the pfsync0 interface, but not on the physical interfaces (output from the master below, where SYNC is on igc3):
   netstat -i
   Name        Mtu Network                  Address                                 Ipkts Ierrs Idrop     Opkts Oerrs  Coll
   igc0       1500 <Link#1>                 a8:b8:e0:0a:34:d4                    13226238     0     0   4065000     0     0
   igc1       1500 <Link#2>                 a8:b8:e0:0a:34:d5                     4337795     0     0  13291901     0     0
   igc2*      1500 <Link#3>                 a8:b8:e0:0a:34:d6                           0     0     0         0     0     0
   igc3       1500 <Link#4>                 a8:b8:e0:0a:34:d7                        4242     0     0      4402     0     0
   igc3          - fe80::%igc3/64           fe80::aab8:e0ff:fe0a:34d7%igc3              0     -     -        10     -     -
   igc3          - 192.168.168.0/30         192.168.168.1                             607     -     -        57     -     -
   igc3          - fd83:f1f2:f3f4:a8::/126  fd83:f1f2:f3f4:a8::1                        0     -     -         0     -     -
   lo0       16384 <Link#5>                 lo0                                      3645     0     0      3645     0     0
   lo0           - localhost                localhost                                2708     -     -      2708     -     -
   lo0           - fe80::%lo0/64            fe80::1%lo0                                 0     -     -         0     -     -
   lo0           - your-net                 localhost                                 937     -     -       937     -     -
   enc0*      1536 <Link#6>                 enc0                                        0     0     0         0     0     0
   pflog0*   33152 <Link#7>                 pflog0                                      0     0     0     40534     0     0
   pfsync0    1500 <Link#8>                 pfsync0                                     0     0     0       983 9852824     0
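
The Oerrs counter climbs every time a sync is attempted; netstat's interval mode makes that easy to watch (standard usage):
   netstat -I pfsync0 -w 1   # per-second packet and error counters for pfsync0 only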

netstat for the pfsync protocol shows steadily increasing mbuf memory errors:
   netstat -s -p pfsync
   pfsync:
      0 packets received (IPv4)
      0 packets received (IPv6)
   ...
      963 packets sent (IPv4)
      0 packets sent (IPv6)
         0 clear all requests sent
         0 13.1 state inserts sent
         0 state inserted acks sent
         0 13.1 state updates sent
         1285 compressed state updates sent
         14 uncompressed state requests sent
         0 state deletes sent
         334 compressed state deletes sent
         0 fragment inserts sent
         0 fragment deletes sent
         0 bulk update marks sent
         0 TDB replay counter updates sent
         983 end of frame marks sent
         442 state inserts sent
         0 state updates sent
         9935758 failures due to mbuf memory error
         20 send errors

All of the above is also seen on the backup (VM) instance, so it doesn't appear to be related to specific hardware.

Google has not been very helpful, so the netstat errors are as far as I've been able to get.
OPNsense 25.7.3_7
#2
Thank you so much! I was trying to re-image an older OPNsense device but was stuck with coreboot, which doesn't support UEFI. I couldn't get either the OPNsense installer or the Protectli flash image to boot with legacy BIOS.
This finally solved it!
#3
Same problem here :(

I can also see in the logs that the balancing is being done as if the setting was 128:

2024-05-20T21:42:49   Informational   dhcpd   balanced pool 8256ae180 192.168.0.0/24 total 64 free 28 backup 28 lts 0 max-misbal 8   
2024-05-20T21:42:49   Informational   dhcpd   balancing pool 8256ae180 192.168.0.0/24 total 64 free 34 backup 22 lts 6 max-own (+/-)6   
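
(For anyone comparing configs: the "128" refers to the failover split in the generated ISC dhcpd config; splitting 128 of the 256 hash buckets gives a 50/50 split, which matches the balancing in the log above. The peer name, addresses and timers below are placeholders, not my actual values:)
   failover peer "dhcpd" {
       primary;
       address 192.168.0.2;
       peer address 192.168.0.3;
       port 519;
       peer port 520;
       max-response-delay 10;
       max-unacked-updates 10;
       mclt 600;
       split 128;
       load balance max seconds 3;
   }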
#4
High availability / Re: How to do IPv6 with DHCPv6-PD?
January 08, 2022, 12:50:55 PM
Quote from: pmhausen on November 21, 2021, 07:32:19 PM
Same here - hadn't noticed. Possibly it is trying to use the CARP address to ping the GW ...

For me, the IPv6 gateway shows as down on both master and backup because dpinger binds to the WAN interface address rather than the CARP VIP. Since I only have a single GUA for the WAN link, the WAN interface addresses are ULAs (i.e. not in the same subnet as the VIP):
root@router-nuc:~ # ps x | grep dpinger
40628  -  Is      0:00.02 /usr/local/bin/dpinger -f -S -r 0 -i WAN_GWv6 -B fd00:1234:5678:90ab::5 -p /var/run/dpinger_WAN_GWv6.pid -u /var/run/dpinger_WAN_GWv6.sock -C /usr/local/etc/rc.syshook monitor -s


On the other hand, the IPv4 gateway monitor binds to the VIP, and it works.

From what I can see, the bind address is determined in dpinger_configure_do(). For IPv4, it iterates through the interface IPs until it finds one in the same subnet as the monitor address, so in my case it finds the VIP. For IPv6, if the monitor address isn't an LLA, it uses interfaces_primary_address6() to get the bind address, which basically returns the first valid address that isn't an alias; it never does the subnet check.
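
A rough shell approximation of what interfaces_primary_address6() ends up picking (purely illustrative; igb0 stands in for the WAN NIC, and the real PHP code also filters out aliases):
   # first inet6 address that isn't link-local wins; no subnet check against the monitor address
   ifconfig igb0 inet6 | awk '$1 == "inet6" && $2 !~ /^fe80/ { print $2; exit }'
On my setup that prints the fd00: ULA, which is why dpinger ends up bound there instead of the CARP VIP.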