Random networking issues

Started by toxic, January 22, 2024, 10:53:50 AM

I know my issue is probably in my Proxmox networking, so this may be more of a Linux question than an opnSense one, but the network experts I know of hang around here, so I'm trying here ;)

I'm facing strange, random networking issues where LXCs on my PVE cluster are intermittently unable to communicate.


For instance, sometimes 10.0.10.51, which is an LXC, is unable to communicate with 10.0.1.23, which is one of my switches.

When this occurs, I see no traffic at all coming in on the gateway (it runs opnSense; I made a packet capture there and saw nothing), which means the traffic is not leaving the LXC, or at least not leaving the network bridge. I believe I also tried a packet capture on the LXC's PVE host and saw no traffic on vmbr10 either...
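
For reference, the captures on the PVE side were roughly along these lines (eth0 being the interface name inside the LXC, adjust as needed):

# inside the LXC: does the ICMP even leave the container?
tcpdump -ni eth0 icmp and host 10.0.1.23
# on the PVE host: does it reach the bridge, and does it go out on the bond?
tcpdump -ni vmbr10 icmp and host 10.0.1.23
tcpdump -ni bond0 icmp and host 10.0.1.23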

I notice it often thanks to my uptime-kuma instance running on this LXC, and I can't really understand why: there is a timeout (60 seconds) during which uptime-kuma can neither ping the switch nor curl it over HTTP, and then, with me doing nothing, it starts working again a few minutes later...
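
To get finer-grained timestamps than uptime-kuma's 60-second checks, I'm thinking of leaving a small watchdog loop running inside the LXC, something like:

#!/bin/bash
# log a timestamped line every time a single ping to the switch fails
while true; do
    if ! ping -c1 -W2 10.0.1.23 >/dev/null 2>&1; then
        echo "$(date -Is) ping to 10.0.1.23 failed"
    fi
    sleep 5
done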


The LXC in question is Ubuntu Jammy, attached to vmbr10 with a static IP; the PVE host is running v8.1.3 on kernel 6.5.11-7-pve.
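
In case it matters, this is how I'd dump the container's network setup from the host (101 is just a placeholder for the real container ID):

# 101 is a placeholder container ID, replace with the real one
pct config 101 | grep ^net
# effective link state and MTU as seen from inside the container
pct exec 101 -- ip link show eth0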


While this is occurring, I can reproduce it by SSHing into the LXC, and communication is indeed down. During that same window I was able to SSH onto my opnSense gateway and confirm that it can ping and curl the switch with no problem, so if my opnSense were receiving the packets from the LXC, it would pass them along correctly...
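
Next time it drops, I also plan to check whether it's an ARP or bridge MAC-learning problem, roughly like this:

# inside the LXC: is the gateway's neighbour entry REACHABLE or FAILED?
ip neigh show 10.0.10.1
# on the PVE host: which port does the bridge think each MAC is behind?
bridge fdb show br vmbr10
# and the bond's view of its slaves
cat /proc/net/bonding/bond0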


uptime-kuma runs inside Docker inside the LXC, and I do believe I have similar issues within Docker networking itself (some containers time out between my traefik instance and the gitea container, for example), but that seems unrelated since it stays within Docker itself...


The host is an 8365U, so powerful enough; it sits at around 30% CPU usage, with no swapping on the 32 GB of RAM I added. It is quite busy running around 100 containers in total, some in LXCs, some in VMs, but overall there is no slowness or anything besides these random network dropouts.
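
I'll also keep an eye on the per-interface error and drop counters around these dropouts, e.g.:

# -s prints RX/TX statistics, including errors and drops
ip -s link show bond0
ip -s link show vmbr10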


I recently tried raising ulimit -n to 99999 (it was 1024 everywhere), but it doesn't seem to have helped...


Any ideas?


Here is my /etc/network/interfaces:


auto lo
iface lo inet loopback

auto enp1s0
iface enp1s0 inet manual
        mtu 9000
#eth0

auto enp2s0
iface enp2s0 inet manual
        mtu 9000
#eth1

auto enp3s0
iface enp3s0 inet manual
        mtu 9000
#eth2

auto enp4s0
iface enp4s0 inet manual
        mtu 9000
#eth3

auto enp5s0
iface enp5s0 inet manual
        mtu 9000
#eth4

auto enp6s0
iface enp6s0 inet manual
        mtu 9000
#eth5

iface enx00e04c534458 inet manual

auto bond1
iface bond1 inet manual
        bond-slaves enp5s0 enp6s0
        bond-miimon 100
        bond-mode balance-xor
        bond-xmit-hash-policy layer3+4
        mtu 9000
#LAGG_WAN

auto bond0
iface bond0 inet manual
        bond-slaves enp1s0 enp2s0 enp3s0 enp4s0
        bond-miimon 100
        bond-mode balance-xor
        bond-xmit-hash-policy layer3+4
        mtu 9000
#LAGG_Switch

auto vmbr1000
iface vmbr1000 inet manual
        bridge-ports bond0
        bridge-stp on
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 1-4094
        mtu 9000
#Bridge All VLANs to SWITCH

auto vmbr2000
iface vmbr2000 inet manual
        bridge-ports bond1
        bridge-stp on
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 1-4094
        mtu 9000
#Bridge WAN

auto vmbr1000.10
iface vmbr1000.10 inet manual
        mtu 9000
#VMs

auto vmbr1000.99
iface vmbr1000.99 inet manual
        mtu 9000
#VMs

auto vmbr10
iface vmbr10 inet static
        address 10.0.10.9/24
        gateway 10.0.10.1
        bridge-ports vmbr1000.10
        bridge-stp off
        bridge-fd 0
        post-up   ip rule add from 10.0.10.0/24 table 10Server prio 1
        post-up   ip route add default via 10.0.10.1 dev vmbr10 table 10Server
        post-up   ip route add 10.0.10.0/24 dev vmbr10 table 10Server
        mtu 9000

auto vmbr99
iface vmbr99 inet static
        address 10.0.99.9/24
        gateway 10.0.99.1
        bridge-ports vmbr1000.99
        bridge-stp off
        bridge-fd 0
        post-up   ip rule add from 10.0.99.0/24 table 99Test prio 1
        post-up   ip route add default via 10.0.99.1 dev vmbr99 table 99Test
        post-up   ip route add 10.0.99.0/24 dev vmbr99 table 99Test
        mtu 9000
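
To sanity-check that the VLANs actually make it through the bond and the bridges, this is what I'd run on the host:

# VLAN membership of each bridge port
bridge vlan show
# per-port bridge state
bridge link show
# bond mode, hash policy and detailed link info
ip -d link show bond0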





I believe I do have the proper routing tables created:


root@pve:~ # cat /etc/iproute2/rt_tables.d/200_10Server.conf
200 10Server
root@pve:~ # cat /etc/iproute2/rt_tables.d/204_99Test.conf
204 99Test
root@pve:~ #
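
For completeness, the runtime view of the policy rules and routes can be checked with:

ip rule show
ip route show table 10Server
ip route show table 99Test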


Thanks in advance for any help or ideas on how to fix it ;)