After a week of scouring the internet without getting anywhere, I'm hoping someone here has some wisdom to share.
I have two instances in an HA setup. The primary runs on bare metal (4x2.5GE Intel i226-V ports), and the backup runs in a VM on Ubuntu (1x10GE X550-AT2 and 2x1GE I210-AT, hardware passthrough to the VM). There is a dedicated, directly cabled SYNC connection between the two (no LAGs or VLANs). The LAN and WAN ports are abstracted through LAGs (although that should only become relevant later in the pfsync process).
Basically I can't get any pfsync frames out of the SYNC port of either instance.
- CARP on the WAN and LAN sides works fine, as does XMLRPC over the SYNC link. I also tried using the LAN ports for pfsync and had the same issue, so I've ruled out the SYNC link itself.
- Tried both 1500 and 9000 MTUs with no difference.
- Tried both multicast and unicast pfsync destinations (IPv4).
- All hardware offloads are disabled (as per default).
- Firewall rule is completely open on the SYNC link (and firewall logs show no dropped packets).
- Packet capture on the SYNC link confirms that no pfsync packets are being transmitted.
Logs show the same ~65sec timeout every time I trigger pfsync:
2025-09-23T22:47:23  Notice  kernel  <6>[5605] carp: demoted by -240 to 0 (pfsync bulk fail)
2025-09-23T22:46:18  Notice  kernel  <6>[5540] carp: demoted by 240 to 240 (pfsync bulk start)
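As a sanity check on the ~65 s figure, here is a minimal Python sketch that computes the gap between the two entries. The timestamps are copied from the log lines above; the variable names are just mine for illustration.

```python
from datetime import datetime

# Timestamps copied from the two kernel log entries above.
bulk_start = datetime.fromisoformat("2025-09-23T22:46:18")
bulk_fail = datetime.fromisoformat("2025-09-23T22:47:23")

# Seconds between "pfsync bulk start" and "pfsync bulk fail".
elapsed = (bulk_fail - bulk_start).total_seconds()
print(f"bulk update gave up after {elapsed:.0f} s")
```

Every attempt I have triggered so far shows roughly the same gap between the start and fail entries.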
netstat shows output errors on the pfsync0 interface, but not on the physical interfaces (output from the primary below, where SYNC is on igc3):
netstat -i
Name     Mtu    Network                  Address                         Ipkts     Ierrs  Idrop  Opkts     Oerrs    Coll
igc0     1500   <Link#1>                 a8:b8:e0:0a:34:d4               13226238  0      0      4065000   0        0
igc1     1500   <Link#2>                 a8:b8:e0:0a:34:d5               4337795   0      0      13291901  0        0
igc2*    1500   <Link#3>                 a8:b8:e0:0a:34:d6               0         0      0      0         0        0
igc3     1500   <Link#4>                 a8:b8:e0:0a:34:d7               4242      0      0      4402      0        0
igc3     -      fe80::%igc3/64           fe80::aab8:e0ff:fe0a:34d7%igc3  0         -      -      10        -        -
igc3     -      192.168.168.0/30         192.168.168.1                   607       -      -      57        -        -
igc3     -      fd83:f1f2:f3f4:a8::/126  fd83:f1f2:f3f4:a8::1            0         -      -      0         -        -
lo0      16384  <Link#5>                 lo0                             3645      0      0      3645      0        0
lo0      -      localhost                localhost                       2708      -      -      2708      -        -
lo0      -      fe80::%lo0/64            fe80::1%lo0                     0         -      -      0         -        -
lo0      -      your-net                 localhost                       937       -      -      937       -        -
enc0*    1536   <Link#6>                 enc0                            0         0      0      0         0        0
pflog0*  33152  <Link#7>                 pflog0                          0         0      0      40534     0        0
pfsync0  1500   <Link#8>                 pfsync0                         0         0      0      983       9852824  0
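To make the comparison explicit, here is a small Python sketch that picks out the interfaces with a non-zero Oerrs count. The link-level rows are copied from the output above (the per-address rows are left out), and `output_errors` is a hypothetical helper name, not part of any tool.

```python
# Link-level rows copied from the netstat -i output above (address rows omitted).
NETSTAT_I = """\
igc0 1500 <Link#1> a8:b8:e0:0a:34:d4 13226238 0 0 4065000 0 0
igc1 1500 <Link#2> a8:b8:e0:0a:34:d5 4337795 0 0 13291901 0 0
igc2* 1500 <Link#3> a8:b8:e0:0a:34:d6 0 0 0 0 0 0
igc3 1500 <Link#4> a8:b8:e0:0a:34:d7 4242 0 0 4402 0 0
lo0 16384 <Link#5> lo0 3645 0 0 3645 0 0
enc0* 1536 <Link#6> enc0 0 0 0 0 0 0
pflog0* 33152 <Link#7> pflog0 0 0 0 40534 0 0
pfsync0 1500 <Link#8> pfsync0 0 0 0 983 9852824 0
"""


def output_errors(text):
    """Return {interface: (Opkts, Oerrs)} for link rows with non-zero Oerrs."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Columns: Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
        if len(fields) != 10 or not fields[2].startswith("<Link#"):
            continue
        opkts, oerrs = int(fields[7]), int(fields[8])
        if oerrs:
            # Strip the trailing "*" that netstat appends to some interface names.
            result[fields[0].rstrip("*")] = (opkts, oerrs)
    return result


print(output_errors(NETSTAT_I))
```

Only pfsync0 is reported; igc3, which carries the SYNC link, has no output errors at the link level.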
The per-protocol netstat statistics show a steadily increasing count of mbuf memory errors:
netstat -s -p pfsync
pfsync:
    0 packets received (IPv4)
    0 packets received (IPv6)
    ...
    963 packets sent (IPv4)
    0 packets sent (IPv6)
    0 clear all requests sent
    0 13.1 state inserts sent
    0 state inserted acks sent
    0 13.1 state updates sent
    1285 compressed state updates sent
    14 uncompressed state requests sent
    0 state deletes sent
    334 compressed state deletes sent
    0 fragment inserts sent
    0 fragment deletes sent
    0 bulk update marks sent
    0 TDB replay counter updates sent
    983 end of frame marks sent
    442 state inserts sent
    0 state updates sent
    9935758 failures due to mbuf memory error
    20 send errors
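To put the failure counter in proportion, here is a rough Python sketch that parses a few of the counters above and compares them. The values are copied from the output; `parse_counters` is a hypothetical helper, and I am not assuming anything about what a single "failure" corresponds to inside pfsync.

```python
import re

# Counters copied from the netstat -s -p pfsync output above.
PFSYNC_STATS = """\
963 packets sent (IPv4)
983 end of frame marks sent
9935758 failures due to mbuf memory error
20 send errors
"""


def parse_counters(text):
    """Map each counter label to its integer value."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+)", line)
        if match:
            counters[match.group(2).strip()] = int(match.group(1))
    return counters


stats = parse_counters(PFSYNC_STATS)
sent = stats["packets sent (IPv4)"]
failed = stats["failures due to mbuf memory error"]
print(f"{failed} mbuf failures vs {sent} packets sent "
      f"(about {failed // sent} failures per packet counted as sent)")
```

Running the same parse on two samples taken a minute apart would show how fast the failure counter grows.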
All of the above is also seen on the backup (VM) instance, so it doesn't appear to be related to specific hardware.
Google has not been very helpful, so the netstat errors are as far as I have been able to get.
Version: OPNsense 25.7.3_7