FRR not starting

Started by Kirk, June 20, 2023, 06:42:54 AM

I'm in a bit of a strange situation here. I have two OPNsense instances running, and I recently updated my secondary router (CARP failover) from 22.7.11 to 23.1.19. After doing this, I noticed that FRR wouldn't start.

I'm not able to get much output or information about what the issue is:

root@opn2:~ # /usr/local/etc/rc.d/frr configtest
Checking zebra.conf
2023/06/20 01:30:48 ZEBRA: [EC 100663303] if_ioctl(SIOCGIFMEDIA) failed: Inappropriate ioctl for device
2023/06/20 01:30:48 ZEBRA: [EC 4043309111] Disabling MPLS support (no kernel support)
OK
Checking bgpd.conf
2023/06/20 01:30:48 BGP: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
2023/06/20 01:30:48 BGP: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
OK
root@opn2:~ #


root@opn2:~ # /usr/local/etc/rc.d/frr start
Checking zebra.conf
2023/06/20 01:31:49 ZEBRA: [EC 100663303] if_ioctl(SIOCGIFMEDIA) failed: Inappropriate ioctl for device
2023/06/20 01:31:49 ZEBRA: [EC 4043309111] Disabling MPLS support (no kernel support)
OK
/usr/local/etc/rc.d/frr: WARNING: failed precmd routine for zebra
Checking bgpd.conf
2023/06/20 01:31:50 BGP: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
2023/06/20 01:31:50 BGP: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
OK
/usr/local/etc/rc.d/frr: WARNING: failed precmd routine for bgpd
root@opn2:~ #


Some of these errors also show up on my primary router, where FRR is still working:

root@opn1:~ # /usr/local/etc/rc.d/frr configtest
Checking zebra.conf
2023/06/20 01:38:22 ZEBRA: [EC 4043309111] Disabling MPLS support (no kernel support)
OK
Checking bgpd.conf
2023/06/20 01:38:22 BGP: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
2023/06/20 01:38:22 BGP: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
OK
root@opn1:~ #


So I assume the problematic errors here are:

ZEBRA: [EC 100663303] if_ioctl(SIOCGIFMEDIA) failed: Inappropriate ioctl for device
/usr/local/etc/rc.d/frr: WARNING: failed precmd routine for zebra
/usr/local/etc/rc.d/frr: WARNING: failed precmd routine for bgpd
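The comparison above (which lines appear on opn2 but not on opn1) can be done mechanically. Below is a minimal sketch, assuming FRR's log lines always start with a `YYYY/MM/DD HH:MM:SS ` timestamp as they do in the outputs posted here. It strips that timestamp and takes a set difference of the two `configtest` outputs. The helper names `normalize` and `only_on_failing_node` are hypothetical and used only for illustration.

```python
import re

# FRR prefixes each log line with "YYYY/MM/DD HH:MM:SS "; strip it so runs taken
# at different times on different hosts can be compared line by line.
TIMESTAMP = re.compile(r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")


def normalize(output: str) -> set[str]:
    """Return the distinct non-blank lines of an output, without timestamps."""
    return {
        TIMESTAMP.sub("", line).strip()
        for line in output.splitlines()
        if line.strip()
    }


def only_on_failing_node(failing: str, working: str) -> list[str]:
    """Lines printed on the failing node that never appear on the working node."""
    return sorted(normalize(failing) - normalize(working))


# `frr configtest` output copied from the two routers in this thread.
opn2_configtest = """\
Checking zebra.conf
2023/06/20 01:30:48 ZEBRA: [EC 100663303] if_ioctl(SIOCGIFMEDIA) failed: Inappropriate ioctl for device
2023/06/20 01:30:48 ZEBRA: [EC 4043309111] Disabling MPLS support (no kernel support)
OK
Checking bgpd.conf
2023/06/20 01:30:48 BGP: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
2023/06/20 01:30:48 BGP: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
OK
"""

opn1_configtest = """\
Checking zebra.conf
2023/06/20 01:38:22 ZEBRA: [EC 4043309111] Disabling MPLS support (no kernel support)
OK
Checking bgpd.conf
2023/06/20 01:38:22 BGP: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
2023/06/20 01:38:22 BGP: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
OK
"""

for line in only_on_failing_node(opn2_configtest, opn1_configtest):
    print(line)  # prints only the SIOCGIFMEDIA line
```

For these two outputs the only difference is the `if_ioctl(SIOCGIFMEDIA)` line. The two `failed precmd routine` warnings come from `frr start`, not `configtest`, so they do not show up in this comparison.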



Now, to clarify: it doesn't appear that the update directly caused this issue. I rolled the machine back to an earlier snapshot and saw the same problem. These OPNsense instances run in VMs, and I rolled back to a snapshot from 18 weeks ago. That is when my upstream provider says they last saw my secondary router check in on BGP, and the snapshot was still running the original version, so it should contain a known-good config. I don't see any logical reason why the restored snapshot suddenly has the issue too. Maybe something changed upstream, but I don't yet know enough about how FRR/Zebra work to know what would even cause these errors.

What tools and troubleshooting avenues are available here? The config test passes, and the output doesn't seem to help in figuring out the actual issue...

Thanks for any ideas!

I remember a similar topic earlier. Does it only happen in failover mode, and on both machines or only on the second one?

Quote from: mimugmail on June 20, 2023, 07:42:18 AM
I remember a similar topic earlier. Does it only happen in failover mode, and on both machines or only on the second one?
Fortunately, I'm currently only having the issue on the secondary router. If it were on the primary router, I would have rebuilt on different software already instead of looking for a solution, haha.

Just to be sure: your second node has the master role and it doesn't start FRR, correct?

Quote from: mimugmail on June 20, 2023, 12:26:56 PM
Just to be sure: your second node has the master role and it doesn't start FRR, correct?

Hmm... I don't actually see which setting this corresponds to. Both of them have "Enable CARP Failover" checked in Routing > General, which, as I see, will "activate the routing service only on the master device". I suppose my secondary is not the master, since BGP is live on the primary.
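The help text quoted above implies a simple rule for whether a node runs FRR. Below is a minimal sketch of that rule as a hypothetical function `frr_should_run` with an illustrative `CarpStatus` enum. Neither is OPNsense code, and the real check inside OPNsense may differ in detail. Under this reading, a BACKUP node with the option enabled is expected not to start FRR. It would presumably start once that node becomes MASTER.

```python
from enum import Enum


class CarpStatus(Enum):
    """CARP role of a node (hypothetical model, not OPNsense's own type)."""
    MASTER = "MASTER"
    BACKUP = "BACKUP"


def frr_should_run(carp_failover_enabled: bool, status: CarpStatus) -> bool:
    """Hypothetical gate modelled on the 'Enable CARP Failover' help text."""
    if not carp_failover_enabled:
        # Option off: every node starts FRR and brings up its own BGP sessions.
        return True
    # Option on: only the current CARP master runs the routing service.
    return status is CarpStatus.MASTER


# Print the outcome for every combination of setting and CARP role.
for enabled in (True, False):
    for status in CarpStatus:
        print("failover:", enabled, "role:", status.value,
              "-> FRR runs:", frr_should_run(enabled, status))
```

In this model, turning the option off makes the CARP role irrelevant, so both nodes would run FRR at the same time.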

The thing is I don't remember this behavior when I set it up, I swear at one point it showed both BGP peers connected at the same time.  So the implication here is if my primary goes offline, then FRR will auto start on the secondary and take over?

Not sure why I don't remember this. I only set this up about 6-8 months ago, and I did thoroughly test the failover. I don't recall making any changes to the configuration since then either.

If both firewalls have their own peers, you don't need to enable this feature :)

So, it works as intended.

Quote from: mimugmail on June 20, 2023, 09:03:43 PM
If both firewalls have their own peers, you don't need to enable this feature :)

So, it works as intended.

I have two BGP neighbors, so if I uncheck the CARP Failover checkbox, both FRR instances will run simultaneously? 

Will traffic continue to prefer the primary unless it is offline in that case?

Do you have both peers on one machine, or one peer on each?

Quote from: mimugmail on June 20, 2023, 09:28:26 PM
Do you have both peers on one machine, or one peer on each?
Both peers are currently configured on both routers. If I remember correctly, the High Availability tools synced the configuration over to the secondary router. I suppose I'd have to remove FRR from the HA sync if I wanted to split it up?