OPNsense Forum

English Forums => General Discussion => Topic started by: rossigee on August 02, 2023, 12:53:49 AM

Title: bgp_process_packet: BGP OPEN receipt failed for peer: n.n.n.n
Post by: rossigee on August 02, 2023, 12:53:49 AM
I have an awesome home network setup that now revolves around an OPNsense router. So, massive thanks and kudos to the devs and the whole community.

I have been trying to configure BGP to gather routes from my home K8S cluster and cloud-based K8S clusters and redistribute them to each other. I had it basically working, but then for some reason it started spitting out these errors, one or two per second, which I'm trying to investigate...

```
bgpd[79135] [EC 33554451] bgp_process_packet: BGP OPEN receipt failed for peer: 10.234.234.7
```

Looking into the FRR source, I see this is generated on this line in the `bgp_process_packet` function.

https://github.com/FRRouting/frr/blob/5da58d355a094100ddedb861aa5555be8a4ea1bf/bgpd/bgp_packet.c#L2926

Basically, it's triggered if the `bgp_open_receive` function returns `BGP_Stop`. However, there are a number of reasons this could happen, and the problem I am facing is that I am not seeing the reason logged anywhere, which makes it difficult to determine which step is failing or what might have broken since it was working.

Within `bgp_open_receive`, it attempts to do various things and make various checks. If any of these steps fails, it 'flog_err's the message, sends a NOTIFY and returns `BGP_Stop`. In some cases, though, it 'zlog's the error instead. I'm not sure why that inconsistency exists in the upstream code, but I expect there is a reason.

https://github.com/FRRouting/frr/blob/5da58d355a094100ddedb861aa5555be8a4ea1bf/bgpd/bgp_packet.c#L1365

Given that I see the 'receipt failed for peer' message that is 'flog_err'ed with EC_BGP_PKT_OPEN, I would also expect to see the error for any steps that 'flog_err'ed their condition. So, I suspect that the cause of my problem is one of the conditions that 'zlog's its error. But which one?!

My question at the moment is, where are the 'zlog's getting sent to?

I have set log level to 'Debugging' in the Routing/General section.

Cheers,

--
Ross
Title: Re: bgp_process_packet: BGP OPEN receipt failed for peer: n.n.n.n
Post by: mimugmail on August 02, 2023, 05:57:26 AM
Did this happen after 23.7 upgrade?
Title: Re: bgp_process_packet: BGP OPEN receipt failed for peer: n.n.n.n
Post by: rossigee on August 02, 2023, 11:25:10 PM
No, I'm still on 23.1.11_1.

--
Ross
Title: Re: bgp_process_packet: BGP OPEN receipt failed for peer: n.n.n.n
Post by: mimugmail on August 03, 2023, 06:28:29 AM
Hm, it may be worth trying the update, as FRR gets bumped from 7 to 8.
Title: Re: bgp_process_packet: BGP OPEN receipt failed for peer: n.n.n.n
Post by: rossigee on August 05, 2023, 07:59:36 AM
Thanks, mimugmail. Good idea.

So, I updated, and now the debug logs are showing the zlogs, so I was able to determine which case was causing the error.

In case anyone is curious, the logs now reveal:

<27>1 2023-08-05T11:51:15+07:00 router1.golder.lan bgpd 44461 - [meta sequenceId="6790251"] [MVZKX-EG443][EC 33554452] bgp_process_packet: BGP OPEN receipt failed for peer: 10.234.234.7
<30>1 2023-08-05T11:51:15+07:00 router1.golder.lan bgpd 44461 - [meta sequenceId="6790252"] [HZN6M-XRM1G] %NOTIFICATION: sent to neighbor 10.234.234.7 6/7 (Cease/Connection Collision Resolution) 0 bytes
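
The `6/7` in the NOTIFICATION line is the BGP error code and subcode. For anyone wanting to pick these out of a long log, here is a minimal sketch, assuming the Cease subcode names from RFC 4486. The `parse_notification` helper and its regex are hypothetical (my own illustration, not part of FRR), and the pattern only covers the "sent to neighbor" form seen above.

```python
import re

# Cease (error code 6) subcodes as named in RFC 4486.
CEASE_SUBCODES = {
    1: "Maximum Number of Prefixes Reached",
    2: "Administrative Shutdown",
    3: "Peer De-configured",
    4: "Administrative Reset",
    5: "Connection Rejected",
    6: "Other Configuration Change",
    7: "Connection Collision Resolution",
    8: "Out of Resources",
}

# Hypothetical pattern: matches only the "sent to neighbor" form from the log above.
NOTIFY_RE = re.compile(
    r"%NOTIFICATION: sent to neighbor (?P<peer>\S+) (?P<code>\d+)/(?P<subcode>\d+)"
)


def parse_notification(line):
    """Return peer, code, subcode and a reason string, or None if no match."""
    match = NOTIFY_RE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    subcode = int(match.group("subcode"))
    # Only Cease subcodes are decoded here; other error codes are left as-is.
    if code == 6:
        reason = CEASE_SUBCODES.get(subcode, "unknown Cease subcode")
    else:
        reason = "not decoded by this sketch"
    return {
        "peer": match.group("peer"),
        "code": code,
        "subcode": subcode,
        "reason": reason,
    }
```

Feeding it the second log line above should give peer `10.234.234.7`, code 6, subcode 7, which is what pointed me at the collision.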


A little Googling led me to realise that I'd configured both Cilium BGP and 'kube-router' :doh: and they were both trying to connect at once. I disabled Cilium BGP and things are now working as expected again.
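
For context, here is a rough sketch of the general collision rule as I understand it from RFC 4271 section 6.8: when two connections to the same peer collide, the BGP Identifiers are compared as unsigned 32-bit integers to decide which connection gets closed. This is not FRR's actual code, the `connection_to_close` name is hypothetical, and the router IDs in the usage note are made up for illustration. I also haven't checked how exactly this played out with two separate speakers on one node.

```python
import ipaddress


def connection_to_close(local_id, remote_id):
    """Sketch of the RFC 4271 section 6.8 comparison.

    Returns "existing" if the connection already in OpenConfirm should be
    closed, or "new" if the connection carrying the newly received OPEN
    should be closed.
    """
    # BGP Identifiers are compared as 4-octet unsigned integers.
    local = int(ipaddress.IPv4Address(local_id))
    remote = int(ipaddress.IPv4Address(remote_id))
    if local < remote:
        # Lower local ID: drop the existing connection, keep the new one.
        return "existing"
    # Otherwise: drop the newly created connection, keep the existing one.
    return "new"
```

With a hypothetical local ID of `10.234.234.1` and a remote ID of `10.234.234.7`, this returns `"existing"`; swap the two and it returns `"new"`. Either way one side gets the Cease/Connection Collision Resolution NOTIFY, which matches what the log showed.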

Unfortunately, the upgrade seems to have broken my main PPPoE Internet gateway, so it's left us falling back to a backup wifi link for now. I'll look into that now, so if I can't figure it out expect another post shortly :smile:

--
Ross