Strange routing behavior

Started by vOoPtNa, August 04, 2020, 02:30:56 PM

August 04, 2020, 02:30:56 PM Last Edit: August 06, 2020, 05:33:45 PM by vOoPtNa
Hey Guys,

I've discovered some strange routing behavior since the upgrade to 20.7.
To illustrate, I've drawn this schema  ;) It's kind of a complex setup... hopefully I can clarify.



Several OPNsense firewalls (running in VMware, with vNICs in different VM networks/VLANs) are connected to a transport net that is used for routing between the LANs. Every OPNsense has its own LAN network (again in different VLANs). Not every firewall knows every LAN route.


The (static) routing tables look like this:

OPNSense1:
default via internet-gw
LAN2 via transport.2
LANn via transport.n
and so on....

OPNSense2:
default via internet-gw
LANn via transport.n
and so on....
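
To double-check what the kernel actually has, the tables can be verified from the OPNsense shell (the destination below is just a documentation-range example address, not one of my real networks):

# dump the IPv4 kernel routing table (destination, gateway, interface)
netstat -rn -f inet

# ask the kernel which route a given destination would actually take
route -n get 192.0.2.10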

Since the upgrade to 20.7 I've seen excessive traffic (20-40 MBit/s where there was less than 1 MBit/s pre-20.7) on some of these firewalls, traffic that doesn't even belong to them. Those packets shouldn't get there at all, as no route for them is defined.
The wrong (misdirected) traffic appears on the WAN interface as well as on the transport interface.
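
To put a rough number on the misdirected share, a negated destination filter in tcpdump helps; vmx1 and the prefixes below are placeholders for the real transport interface and the networks this firewall actually routes:

# show everything arriving on the transport interface that is NOT
# destined for a network this firewall has a route for
tcpdump -ni vmx1 'not (dst net 10.0.1.0/24 or dst net 10.0.2.0/24)'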

traceroute (wrong: public IPs shouldn't appear here)

  1    <1 ms    <1 ms    <1 ms  fw01 [LAN1.1]
  2    <1 ms    <1 ms    <1 ms  transport.9
  3    <1 ms     2 ms    <1 ms  internet.87
  4    <1 ms    <1 ms    <1 ms  internet.19
  5     2 ms     3 ms     3 ms  internet.87
  6    <1 ms    <1 ms    <1 ms  internet.87
  7    <1 ms    <1 ms    <1 ms  internet.87
  8    <1 ms     1 ms     1 ms  internet.19
  9     2 ms    <1 ms     1 ms  internet.19
10     1 ms     1 ms     1 ms  internet.87
11     1 ms     3 ms     3 ms  internet.87
12     1 ms     1 ms     1 ms  internet.19
13     1 ms     1 ms     1 ms  internet.87
14     1 ms     2 ms     1 ms  LANn.2


The traceroute should look like this:
  1    <1 ms    <1 ms    <1 ms  fw01 [LAN1.1]
  2    <1 ms    <1 ms    <1 ms  transport.28
  3     1 ms     2 ms     1 ms  LANn.2


The traceroutes aren't even consistent; they change every time I test...

tcpdump

14:03:44.525830 IP transport.100 > LANn.2: ICMP time exceeded in-transit, length 100
14:03:44.525871 IP transport.9 > LANn.2: ICMP time exceeded in-transit, length 100
14:03:44.525900 IP transport.23 > LANn.2: ICMP time exceeded in-transit, length 100
14:03:44.525927 IP transport.100 > LANn.2: ICMP time exceeded in-transit, length 100
14:03:44.526040 IP transport.9 > LANn.2: ICMP time exceeded in-transit, length 100
14:03:44.526096 IP transport.100 > LANn.2: ICMP time exceeded in-transit, length 100
14:03:44.526206 IP transport.9 > LANn.2: ICMP time exceeded in-transit, length 100
14:03:44.526222 IP transport.9 > LANn.2: ICMP time exceeded in-transit, length 100
14:03:44.526237 IP transport.23 > LANn.2: ICMP time exceeded in-transit, length 100
14:03:44.526254 IP transport.23 > LANn.2: ICMP time exceeded in-transit, length 100
14:03:44.526264 IP transport.100 > LANn.2: ICMP time exceeded in-transit, length 100
14:03:44.526280 IP transport.100 > LANn.2: ICMP time exceeded in-transit, length 100
14:03:44.526342 IP transport.9 > LANn.2: ICMP time exceeded in-transit, length 100


And it's not only ICMP: every protocol appears in tcpdump on the wrong firewalls (TCP, UDP, ESP, ...).
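
A good way to confirm the NIC is accepting frames that were never addressed to it is to look at the link-level headers; the MAC below is a placeholder for the vNIC's own address:

# -e prints the Ethernet header; frames that are neither unicast to our
# own MAC nor multicast/broadcast should normally never reach us
tcpdump -eni vmx1 'not ether dst 00:50:56:aa:bb:cc and not ether multicast'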


Funny thing about the scenario: everything is currently working fine, except for the higher bandwidth  ???

I'm out of ideas here...
Was there a fundamental routing change in FreeBSD 12.1 somehow?!

Does anyone have a clue how to troubleshoot this?

Thanks in advance!

My idea now is that it has something to do with VMware promiscuous mode...

But why did the behavior change from 20.1.9 to 20.7 (nothing was changed on the VMware side)?
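
Whether the guest side is actually running the vNIC promiscuously is easy to check (interface name is again a placeholder):

# FreeBSD lists PROMISC among the interface flags when the NIC is in
# promiscuous mode, e.g. flags=...<UP,BROADCAST,RUNNING,PROMISC,...>
ifconfig vmx1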

There were some driver changes, could be promisc, yes.


The problem was caused by VMware promiscuous mode  ::)

It is solved now.
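
For reference, in case someone hits the same thing: on a standard vSwitch the promiscuous-mode policy can be set back to reject from the ESXi shell, something like this (the vSwitch name is an example):

# reject promiscuous mode in the vSwitch security policy
esxcli network vswitch standard policy security set \
    --vswitch-name=vSwitch0 --allow-promiscuous=false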