FRR and/or OSPF is buggy

Started by Layer8, May 22, 2023, 02:32:06 PM

May 22, 2023, 02:32:06 PM Last Edit: May 22, 2023, 02:35:13 PM by Layer8
Hi,

In one of our environments, we have two OPNsense firewalls (23.1.7_3) with FRR and OSPF enabled, and they work just fine.

Because of some problems with HAProxy and nginx on one of these two firewalls, we decided to install a fresh, clean OPNsense. When we tried to enable FRR with OSPF on the new installation, we could not get OSPF working: it was always stuck in Init/DROther.

We did a lot of troubleshooting, and in packet captures on the OSPF interfaces of both firewalls we saw HELLO packets from the neighbor with plausible values, but nothing more. And yes, we applied allow all/any rules on the OSPF interfaces.
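For context: an OSPF neighbor sits in Init when Hellos from it are arriving but it does not yet list this router's ID in its own Hellos, i.e. traffic is only getting through in one direction (RFC 2328, section 10). That fits captures showing valid Hellos and nothing more. The minimal sketch below models only that one decision; the router IDs are hypothetical and the rest of the neighbor state machine is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Hello:
    """Simplified OSPF Hello: just the sender and the neighbors it has heard."""
    router_id: str
    neighbors: list[str] = field(default_factory=list)


def neighbor_state(hello: Hello | None, own_router_id: str) -> str:
    """State of one neighbor after its latest Hello (simplified RFC 2328 logic)."""
    if hello is None:
        return "Down"   # nothing received from this neighbor yet
    if own_router_id in hello.neighbors:
        return "2-Way"  # the neighbor has seen our Hellos too
    return "Init"       # we hear the neighbor, but it does not list us


# Hypothetical router IDs; our own Hellos never reach the neighbor:
print(neighbor_state(Hello("10.90.2.2"), "10.90.2.1"))                 # Init
print(neighbor_state(Hello("10.90.2.2", ["10.90.2.1"]), "10.90.2.1"))  # 2-Way
```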

Then, I found this thread: https://forum.opnsense.org/index.php?topic=12413.0

We followed the workaround and disabled the firewall under Firewall -> Settings -> Advanced -> Disable Firewall on the new installation, and OSPF started exchanging routes immediately.

We re-enabled the firewall, and OSPF kept working, even after disabling and re-enabling FRR or OSPF, or rebooting the whole firewall.

We can't explain this behavior, but it looks as if some kind of cached data leaves OSPF in a non-initializing, non-working state.

This topic is just to let you know that there is an issue and what a possible workaround is.


May 22, 2023, 09:43:08 PM #2 Last Edit: May 22, 2023, 09:44:58 PM by Layer8
Nope, we tried to reboot. We also reinstalled FRR. Both without success.

Edit: We also changed maxsocbuf.

May 24, 2023, 12:29:48 PM #3 Last Edit: May 24, 2023, 01:24:35 PM by Layer8
We noticed another problem.


Our network configuration looks like this:

https://forum.opnsense.org/index.php?action=dlattach;topic=34171.0;attach=27698;image


On OPNSENSE WAN, under Routing -> OSPF, we checked "Advertise Default Gateway". This option should advertise a default route (0.0.0.0/0) via OSPF from OPNSENSE WAN, right?

In addition to the network shown in the diagram, we have a third OPNsense in the 10.90.2.0/24 network with OSPF enabled. It sees WAN and CORE as neighbors and receives all routes distributed over OSPF. We don't have any manually configured gateways or static routes on this firewall.

However, we noticed that there is no default route in the OSPF routing table. Shouldn't there be one if "Advertise Default Gateway" is enabled?

Edit: When we add a static default route on the OPNSENSE WAN (0.0.0.0/0 over 10.90.0.1), we see this kernel route in the other routers immediately.
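One plausible explanation for this, assuming the "Advertise Default Gateway" checkbox maps to FRR's `default-information originate`: without the `always` keyword, FRR only originates the 0.0.0.0/0 external LSA while a default route already exists in the router's own routing table. The sketch below models just that condition; the prefixes in the example are hypothetical.

```python
import ipaddress

DEFAULT = ipaddress.ip_network("0.0.0.0/0")


def originates_default(rib: list, always: bool = False) -> bool:
    """Would this router advertise a Type-5 LSA for 0.0.0.0/0? (simplified)

    Models `default-information originate [always]`: without `always`,
    a default route must already be present locally (e.g. a static route).
    """
    if always:
        return True
    return any(ipaddress.ip_network(prefix) == DEFAULT for prefix in rib)


# Hypothetical routing table of OPNSENSE WAN before/after adding the static default
before = ["10.90.0.0/24", "10.90.2.0/24"]
after = before + ["0.0.0.0/0"]
print(originates_default(before))               # False
print(originates_default(after))                # True
print(originates_default(before, always=True))  # True
```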


May 27, 2023, 12:39:14 AM #4 Last Edit: May 27, 2023, 01:06:09 AM by kpiq
Hmmm. Are you sure you want to advertise your internal network routing tables to the Internet? I have never allowed OSPF to become active on the WAN interface, although there might be a use case I'm not thinking of.

I may not be understanding you correctly. If I am, then adding a static default route defeats the purpose of using OSPF. The OSPF DR (with its BDR in sync) will drive the other OSPF area members to choose a default gateway from among all the OSPF routers that advertise an External LSA with Link State ID 0.0.0.0.

Each OSPF device will decide, based on the LSA's metric, cost, distance, and whatever other factors go into the calculation, which of the routers advertising that specific LSA it will use to reach the Internet. There is no need to statically define a default gateway on any of the devices (except in unique use cases that don't concern us now).
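The per-router choice described above can be sketched as follows. This is a simplified reading of the external-route preference rules in RFC 2328, section 16.4: E1 routes beat E2 routes, E1 compares the total cost, and E2 compares the external metric first and uses the internal cost to the advertising router only as a tie-breaker. Forwarding addresses, area checks and equal-cost multipath are left out, and the router IDs and costs are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DefaultLsa:
    """A Type-5 LSA for 0.0.0.0/0 as seen from one router (simplified)."""
    asbr: str          # router ID of the advertising router
    metric_type: int   # 1 (E1) or 2 (E2)
    metric: int        # external metric carried in the LSA
    cost_to_asbr: int  # this router's intra-area SPF cost to that ASBR


def preference(lsa: DefaultLsa) -> tuple[int, int, int]:
    """Sort key: smaller is better."""
    if lsa.metric_type == 1:
        # E1: internal cost plus external metric, compared as one total.
        return (1, lsa.cost_to_asbr + lsa.metric, 0)
    # E2: external metric first, internal cost only breaks ties.
    return (2, lsa.metric, lsa.cost_to_asbr)


def best_default(lsas: list[DefaultLsa]) -> DefaultLsa | None:
    """Pick the preferred default route; None if nobody advertises one."""
    return min(lsas, key=preference, default=None)


# Hypothetical: two firewalls advertise E2 defaults with the same metric.
lsas = [DefaultLsa("10.90.0.2", 2, 10, 20), DefaultLsa("10.90.0.3", 2, 10, 10)]
print(best_default(lsas).asbr)  # 10.90.0.3 (closer ASBR wins the tie)
```

When a firewall stops advertising its LSA, it simply drops out of the candidate list and the next one is chosen.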

One such case comes to mind, though: if you temporarily disable OSPF on a device and haven't configured a temporary static default gateway IP, that device becomes isolated from the rest of the network. Every device connected to it will then need temporarily defined static addresses so that it can keep reaching the device.

You may want to make sure that the links between your OPNsense boxes are direct. Otherwise, if there is switching equipment between them, you must make sure of several things:

1. If LAN is the interface that links your routers and OPNsense devices, make all other OPNsense interfaces "passive interfaces". You'll find this in the Passive Interfaces dropdown list under Routing: OSPF.

2. The switch(es) must use the same version of OSPF (v2) as your firewall.

3. The switch(es)' OSPF network type must match the network type of your LAN interfaces (the Network Type field below Hello Interval, Dead Interval, Retransmission Interval, Retransmission Delay, and Priority).

4. In Routing > OSPF > Interfaces, create a record that uses the LAN interface. Leave the OSPF Area field blank. The OSPF area must appear only in the Networks tab or the Interfaces tab, not both.

5. If you're not directly connecting your OPNsense computers, take the OSPF values for Hello Interval, Dead Interval, Retransmission Interval, Retransmission Delay, and Priority from the switches that interconnect them and apply those values to the record you created in #4. Authentication must use the same type, ID, and keys as your switches' OSPF configuration.

6. Create three records in Routing > OSPF > Networks:
   a. One each for 10.0.0.0/8 (or whatever private IP network you chose), 224.0.0.0/24, and 0.0.0.0/0. Populate your OSPF Area in these records.
   b. In the General tab,
      1. Enable must be checked,
      2. Router-ID must be populated, and it must match one of your router's static IP addresses (you need to activate Advanced Mode for this field to show up).
      3. Reference cost will be the available speed in Mbps of your LAN connection.
      4. Follow instructions above for the Passive Interfaces.   
      5. Choose Connected, Kernel, and Static in the Route Redistribution list.
      6. Advertise Default Gateway must be checked.
      7. Advertise Default Gateway Metric must be a fairly low number (high priority).  It won't hurt if both firewalls use the same value.

7. In System > Gateways > Groups (not a requirement for OSPF, but the Monit gateway alert script requires it), create a Gateway Group that contains only the WAN Gateway, and choose Packet Loss and High Latency as the Trigger.

8. Restart the OSPF service.

Try again.
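Several of the values in the steps above have to agree between neighbors before an adjacency forms. RFC 2328 (section 10.5) rejects a Hello whose Hello interval, dead interval or stub-area flag differ, and on broadcast networks also one whose network mask differs; packets with a different area ID or failed authentication are discarded even earlier. Retransmission interval, transmit delay and priority are not compared. The sketch below checks a hypothetical pair of interface settings for those fields only; MTU and network-type mismatches, which can cause trouble later in the exchange, are not covered.

```python
from __future__ import annotations

from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class OspfIface:
    """Per-interface OSPF settings that neighbors must agree on (simplified)."""
    area: str
    netmask: str
    hello_interval: int  # seconds
    dead_interval: int   # seconds
    stub_area: bool
    auth_type: str       # e.g. "none", "simple", "md5"
    auth_key: str


def mismatches(a: OspfIface, b: OspfIface) -> list[str]:
    """Names of the settings that differ and would keep the neighbors apart."""
    return [f.name for f in fields(OspfIface) if getattr(a, f.name) != getattr(b, f.name)]


# Hypothetical values: the firewall uses OSPF defaults, the switch uses faster timers.
firewall = OspfIface("0.0.0.0", "255.255.255.0", 10, 40, False, "md5", "secret")
switch = replace(firewall, hello_interval=5, dead_interval=20)
print(mismatches(firewall, switch))  # ['hello_interval', 'dead_interval']
```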

One question for others who are more experienced than myself.  Please take a look at OPNsense/plugins case 3445 (https://github.com/opnsense/plugins/issues/3445) and FRRouting/frr case 13596 (https://github.com/FRRouting/frr/issues/13597). 

I can't get OSPF on my OPNsense boxes to generate updates when I disable the WAN interface or disconnect the fibers that connect it to my ISP. Neither the OPNsense box itself nor any of the other OSPF devices on the network receives an update removing the External LSA with Link State ID 0.0.0.0 that corresponds to the disconnected OPNsense computer.

Am I missing something?

This is not really the answer you are looking for, I know....

Over the years I have had many issues with OSPF, running it on switches, pfSense and OPNsense, and a few years ago I made the decision to move to BGP.

BGP runs over TCP port 179, unlike OSPF, which runs directly over IP as protocol 89, and I think that causes some issues on some networks.
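To illustrate the difference: BGP's TCP segments carry port numbers, whereas OSPF packets sit directly after the IP header (protocol 89, Hellos sent to 224.0.0.5), so anything on the path that filters or tracks traffic by port has nothing to match on. The sketch builds two minimal IPv4 packets with made-up addresses and reads them back; the checksum is left at zero because only the classification matters here.

```python
import socket
import struct

IPPROTO_TCP = 6
IPPROTO_OSPF = 89  # OSPF has no transport header, hence no ports


def ipv4_packet(proto: int, src: str, dst: str, payload: bytes, ttl: int = 64) -> bytes:
    """Minimal IPv4 packet: 20-byte header (no options, checksum 0) plus payload."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(payload),  # version 4 / IHL 5, TOS, total length
        0, 0,                        # identification, flags/fragment offset
        ttl, proto, 0,               # TTL, protocol, checksum (not computed)
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    return header + payload


def describe(packet: bytes) -> str:
    """Classify a packet by its IPv4 protocol field."""
    ihl = (packet[0] & 0x0F) * 4  # header length in bytes
    proto = packet[9]
    if proto == IPPROTO_TCP:
        _, dport = struct.unpack("!HH", packet[ihl:ihl + 4])
        return f"tcp dport={dport}"
    if proto == IPPROTO_OSPF:
        return "ospf (no ports)"
    return f"ip proto={proto}"


bgp = ipv4_packet(IPPROTO_TCP, "10.90.2.1", "10.90.2.2", struct.pack("!HH", 50000, 179) + bytes(16))
ospf = ipv4_packet(IPPROTO_OSPF, "10.90.2.1", "224.0.0.5", bytes(24), ttl=1)
print(describe(bgp))   # tcp dport=179
print(describe(ospf))  # ospf (no ports)
```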

I recommend you do just that: move to BGP. Yes, it seems a lot more complex than OSPF, but for just a few sites you really can get it working quite easily. The BGP tools are great, the options are better, route filtering is easier and better, and you get things like BFD for fast failover, graceful restart and more.

BFD enables really fast convergence, so OSPF's fast-convergence advantage is gone if you run BGP with BFD. Add graceful restart, whereby traffic keeps being forwarded while BGP reloads, and the key reasons to run OSPF are no longer so compelling.
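To put numbers on the convergence argument: RFC 5880 (section 6.8.4) gives the asynchronous-mode BFD detection time as the remote detect multiplier times the larger of the local required minimum RX interval and the remote desired minimum TX interval. The 300 ms / x3 timers below are only example settings; the 40 s comparison is OSPF's default dead interval on broadcast networks (four 10 s Hellos).

```python
def bfd_detection_time_ms(local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int,
                          remote_detect_mult: int) -> int:
    """Asynchronous-mode BFD detection time (RFC 5880, section 6.8.4)."""
    agreed_interval = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval


OSPF_DEFAULT_DEAD_MS = 4 * 10_000  # dead interval = 4 x 10 s Hello by default

bfd_ms = bfd_detection_time_ms(300, 300, 3)  # example timers, not a recommendation
print(bfd_ms)                          # 900
print(OSPF_DEFAULT_DEAD_MS // bfd_ms)  # 44
```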

Under BGP, I only need to set up:
"neighbors", "prefix lists" and "route-maps"

Then, under BFD, I set up BFD neighbors.
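For anyone new to prefix lists, the matching rules are compact enough to sketch. This models FRR-style `ip prefix-list` entries with optional `ge`/`le` bounds: entries are tried in sequence order, the first match decides, and anything unmatched is denied. Without `ge`/`le` only the exact prefix length matches. The list contents are hypothetical, and how a route-map then uses the result is not modeled.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One prefix-list line: permit|deny PREFIX [ge N] [le M]."""
    action: str              # "permit" or "deny"
    prefix: str              # e.g. "10.0.0.0/8"
    ge: int | None = None
    le: int | None = None

    def matches(self, route: ipaddress.IPv4Network) -> bool:
        net = ipaddress.ip_network(self.prefix)
        if not route.subnet_of(net):
            return False
        low = self.ge if self.ge is not None else net.prefixlen
        if self.le is not None:
            high = self.le
        elif self.ge is not None:
            high = net.max_prefixlen  # "ge" alone means "up to /32"
        else:
            high = net.prefixlen      # no bounds: exact length only
        return low <= route.prefixlen <= high


def evaluate(entries: list[Entry], route: str) -> str:
    """Return the action of the first matching entry, else the implicit deny."""
    r = ipaddress.ip_network(route)
    for entry in entries:
        if entry.matches(r):
            return entry.action
    return "deny"


# Hypothetical list: accept the default route and 10/8 subnets up to /24.
pl = [Entry("permit", "0.0.0.0/0"), Entry("permit", "10.0.0.0/8", le=24)]
for route in ["0.0.0.0/0", "10.90.2.0/24", "10.90.2.128/25", "192.168.1.0/24"]:
    print(route, evaluate(pl, route))  # permit, permit, deny, deny
```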

The BGP diagnostics page is excellent and I can easily see what is really happening.

Spend the time and effort to move to BGP and once you get it, you won't go back.

Wow, excellent, thanks. 

From what I've researched, there seems to be a long-standing issue with FreeBSD not deleting IP addresses removed by FRR/OSPF 7.5 through 8.4.1. That seems to be the most likely cause of the problem on my network.

I would consider BGP if our multilayer switches supported it, but they don't.

Really appreciate it, expands my perspective.

Regards

Pedro

It seems there aren't many options with the hardware I have. I don't know if this drifts off-topic; let me know if I should open a new topic for it.

The goal is to have route (default gateway) redundancy over multiple Internet connections on different firewalls. Based on this, my next question is: with two OPNsense firewalls in different states, each with its own ISP (static public IPs), is CARP an option when one of them loses Internet connectivity? I am under the impression that CARP (or VRRP in other cases) is used for firewall failover when the firewalls are colocated and use the same Internet connection, not for route redundancy.

Appreciate your time and attention.

Quote from: nzkiwi68 on May 28, 2023, 10:54:22 PM
This is not really the answer you are looking for, I know....

I have over the years had many issues with OSPF, running it on switches, pfSense and OPNsense and made the decision a few years ago to move to BGP.


Quote from: kpiq on May 31, 2023, 10:31:00 AM
The goal is to have route (default gateway) redundancy over multiple Internet connections on different firewalls. Based on this, my next question is: with two OPNsense firewalls in different states, each with its own ISP (static public IPs), is CARP an option when one of them loses Internet connectivity? I am under the impression that CARP (or VRRP in other cases) is used for firewall failover when the firewalls are colocated and use the same Internet connection, not for route redundancy.
Correct. I have yet to implement such a setup with OPNsense but have years of experience doing exactly that with Cisco gear.

CARP/VRRP/HSRP provides a single default gateway to "dumb" client systems that use the HA cluster as their default gateway. Of course, this can be done over redundant layer 2 links with LACP or, in the worst case, with Spanning Tree, failing over should a switch fail.

Externally, each router (OPNsense in your case) has its own Internet connectivity, and both share another redundant inter-router link, e.g. two cables bundled with LACP, or separate transfer networks running OSPF/IS-IS/BGP. Should one router lose its uplink but still be used as a gateway by internal clients, routing must be set up so that it forwards everything over the inter-router link to the router with the active Internet connection.
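The colocated design Patrick describes can be reduced to two decisions, sketched below with hypothetical firewall names and values. CARP decides which box answers for the virtual gateway: in the FreeBSD implementation a host advertises every advbase + advskew/256 seconds, and with preemption enabled the most frequent advertiser becomes master. Routing then decides where that master forwards Internet traffic. In this simplified model CARP does not react to an uplink failure at all, which is exactly why the inter-router link matters.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Firewall:
    name: str
    advbase: int     # CARP base advertisement interval in seconds
    advskew: int     # CARP skew, 0-254; lower wins
    uplink_up: bool  # is this box's own ISP link working?


def carp_master(pair: list[Firewall]) -> Firewall:
    """With preemption, the box advertising most often (lowest interval) is master."""
    return min(pair, key=lambda fw: fw.advbase + fw.advskew / 256)


def internet_path(fw: Firewall, peer: Firewall) -> str:
    """Where fw sends client traffic for the Internet (simplified)."""
    if fw.uplink_up:
        return f"{fw.name} -> own ISP"
    if peer.uplink_up:
        return f"{fw.name} -> inter-router link -> {peer.name} -> ISP"
    return "no Internet path"


# Hypothetical pair: fw-a is preferred by CARP but has lost its uplink.
fw_a = Firewall("fw-a", advbase=1, advskew=0, uplink_up=False)
fw_b = Firewall("fw-b", advbase=1, advskew=100, uplink_up=True)
master = carp_master([fw_a, fw_b])
print(master.name)                  # fw-a
print(internet_path(master, fw_b))  # fw-a -> inter-router link -> fw-b -> ISP
```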

We ran this for years with BGP for external routes and OSPF(3) as our IGP.

HTH,
Patrick