[Solved]: Firewall states and Kubernetes networking with Cilium & BGP

Started by kryptonian, May 08, 2024, 06:52:53 PM

I'm really at a loss as to how I should configure my firewall to avoid connection hiccups. How should I properly address this problem?

I'm using Cilium as my CNI with native routing, which means that all pod traffic shows up on the network unencapsulated, with its pod IPs. My current problem is that I'm facing intermittent connection timeouts when connecting to a VIP LB address that is announced to my firewall via BGP from Cilium.

I have disabled state tracking on the rules allowing POD_CIDRs to traverse the firewall, which is more than likely part of the problem.


Kubernetes nodes are on 10.0.105.0/24, while one is also at 192.168.2.129 (single-node cluster, having this issue).
BGP LB prefix for the first cluster is 192.168.10.0/24. Firewall is at 192.168.2.1 (10.0.105.1).
When I do:
mtr -T -P 443 192.168.10.3

I can definitely see that packets are getting dropped, and Cilium's Hubble shows packets getting dropped due to TTL exceeded:


May  8 16:49:26.517: 192.168.2.129:33011 (world-ipv4) <> networking/nginx-internal-controller-6cc54b48b7-7z2js:443 (ID:106123) TTL exceeded DROPPED (TCP Flags: SYN)
May  8 16:49:26.520: 192.168.2.129:33015 (world-ipv4) <> networking/nginx-internal-controller-6cc54b48b7-xjsxs:443 (ID:106123) TTL exceeded DROPPED (TCP Flags: SYN)





It seems that I had configured Cilium in a way where it was not communicating with the nodes directly over the L3, but instead went through the gateway/firewall.

Now it works better, as it's routing inside the L3 instead. I had to enable the autoDirectNodeRoutes Helm value, as otherwise Cilium will not add the PodCIDR routes to the kernel.

https://docs.cilium.io/en/stable/network/concepts/routing/#native-routing:
Quote: In native routing mode, Cilium will delegate all packets which are not addressed to another local endpoint to the routing subsystem of the Linux kernel. This means that the packet will be routed as if a local process would have emitted the packet. As a result, the network connecting the cluster nodes must be capable of routing PodCIDRs.
Quote: In order to run the native routing mode, the network connecting the hosts on which Cilium is running on must be capable of forwarding IP traffic using addresses given to pods or other workloads.
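
For anyone hitting the same thing, here is a rough Python sketch of why the missing PodCIDR routes sent traffic via the firewall. It is only a toy model of a longest-prefix route lookup, not Cilium or kernel code. The firewall address 10.0.105.1 and the node subnet 10.0.105.0/24 are from the posts above; the second node's IP (10.0.105.12) and the PodCIDR (10.244.1.0/24) are made-up values for illustration, since the thread does not list them.

```python
import ipaddress

def next_hop(dst, routes, default_gw):
    """Simplified longest-prefix match over a {prefix: next_hop} table.

    Falls back to the default gateway when no prefix covers dst.
    """
    dst_ip = ipaddress.ip_address(dst)
    best = None
    for prefix, via in routes.items():
        net = ipaddress.ip_network(prefix)
        if dst_ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, via)
    return best[1] if best else default_gw

FIREWALL = "10.0.105.1"    # default gateway, from the thread
NODE_B = "10.0.105.12"     # hypothetical second node
POD_ON_B = "10.244.1.23"   # hypothetical pod IP hosted on NODE_B

# Without per-node PodCIDR routes: only the node subnet is known.
routes_without = {"10.0.105.0/24": "on-link"}

# With a direct node route installed: NODE_B's PodCIDR points at NODE_B.
routes_with = dict(routes_without)
routes_with["10.244.1.0/24"] = NODE_B  # hypothetical PodCIDR

print(next_hop(POD_ON_B, routes_without, FIREWALL))  # goes to the firewall
print(next_hop(POD_ON_B, routes_with, FIREWALL))     # goes straight to NODE_B
```

In the first lookup no route covers the pod IP, so the packet is handed to the default gateway (the firewall). In the second, the more specific PodCIDR route wins and the packet goes directly to the other node, which matches the behavior described above after enabling autoDirectNodeRoutes.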