IPv4: wrong destination for Linux client on other side of transparent bridge

Started by Uwe@Home, April 11, 2021, 08:33:02 PM

Hello,

(This is an English copy of https://forum.opnsense.org/index.php?topic=22566.msg107282#msg107282)

I have the following setup with OPNsense 20.7.8 running on an APU4D4:

OPNsense  (LAN)                                                  (LAN)
                 - Switch - Transparent WLan-Bridge - Switch
                 - Client "A"                                        - Client "L" (Linux)
                                                                          - Client "W" (Windows)

Problem: "L" can't ping "A", but "W" can ping "A" ("A" can ping everything).

Important: this setup worked perfectly with Sophos UTM 9 before.

I compared the ICMP packets and found the following differences:

Request:
  IPv4
    Windows "W": no flags set / Linux "L": Don't Fragment bit set
  ICMP
    Windows "W": 32 bytes length (without timestamp) / Linux "L": 48 bytes (with timestamp)

Reply:
  IPv4
    Windows "W": no flags set / Linux "L": Don't Fragment bit set
    Windows "W": Destination Address = Tp-LinkT_49:33:52 (=> MAC of access point)
    Linux "L": Destination Address = Raspberr_52:ed:3c (=> MAC of client "L")
  ICMP
    Windows "W": 32 bytes length (without timestamp) / Linux "L": 48 bytes (with timestamp)
    Windows "W": response time = 0.088 ms / Linux "L": response time = 0.154 ms

It seems that OPNsense replaces the destination MAC address when "Don't Fragment" is set.
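
A comparison like this can be reproduced directly on the firewall with tcpdump, whose -e option prints the Ethernet source and destination addresses of every packet. The following is only a sketch: the interface name igb1 and the address 192.168.1.10 for client "A" are placeholders that do not come from this thread, and capture_icmp is just an illustrative wrapper name.

```shell
#!/bin/sh
# capture_icmp: illustrative wrapper (hypothetical name, not an OPNsense tool).
#   -e  print the link-level (MAC) header of each packet
#   -n  do not resolve addresses to names
#   -i  capture on the given interface
# The filter limits the output to ICMP traffic to or from one host.
capture_icmp() {
    iface="$1"
    host="$2"
    tcpdump -e -n -i "$iface" "icmp and host $host"
}

# Example (run as root; interface and address are placeholders):
#   capture_icmp igb1 192.168.1.10
```

Running the capture once while "W" pings and once while "L" pings shows which destination MAC the replies carry in each case.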

What I have tried so far (followed by reboot of OPNsense after each change):
System: Settings: Tunables: net.inet.ip.redirect: 0 => 1
Interfaces => Settings
  ARP Handling: [ ] => [X] Suppress ARP messages
  Hardware TSO: Disable hardware TCP segmentation offload [X] => [ ]
Firewall: Settings: Normalization:  IP Do-Not-Fragment [ ] => [X]
Firewall: Settings: Advanced:
  Network Address Translation: Reflection for port forwards [X] => [ ]
  Network Address Translation: Automatic outbound NAT for Reflection [X] => [ ]
  Disable automatic rules which force local services to use the assigned interface [ ] => [X] 

No interface has an "upstream gateway" set.
Intrusion Detection is disabled.

LAN-Interface is configured like this:
  Block private networks: [ ]
  Block bogon networks: [ ]
  MTU: 1500
  MSS: (not set)
  Dynamic gateway policy: [ ] This interface does not require an intermediate system to act as a gateway
  IPv4 Upstream Gateway: Auto-Detect

Is it possible to disable that "feature" and force OPNsense to always
return the packet to the source MAC address given in the initial request?

Root cause found: in the static (DHCPv4) IP reservations, the real MAC address
of the client was entered instead of the MAC address of the access point.

Ping works right after deleting the static entry and flushing the ARP table.
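
To see which MAC address OPNsense currently has cached for a client, the ARP table can be inspected from the shell. This is a minimal sketch and not something verified on this setup: mac_for_ip is a hypothetical helper name, the address is a placeholder, and the parsing assumes the usual FreeBSD `arp -an` line layout (`? (IP) at MAC on IFACE ...`).

```shell
#!/bin/sh
# mac_for_ip: hypothetical helper, not part of OPNsense.
# Reads `arp -an`-style lines on stdin and prints the MAC cached for one IP.
# Assumed line layout (FreeBSD):
#   ? (192.168.1.50) at 66:77:88:99:aa:bb on igb1 expires in 1200 seconds [ethernet]
mac_for_ip() {
    awk -v ip="($1)" '$2 == ip { print $4; exit }'
}

# Usage on the firewall (placeholder address):
#   arp -an | mac_for_ip 192.168.1.50
# Delete only that one entry instead of flushing the whole table:
#   arp -d 192.168.1.50
```

If the printed MAC is the client's own address rather than the access point's, replies are being sent to a MAC the bridge does not forward for.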

In this case the access point is not configured as a transparent bridge. I would try to fix that instead of messing with DHCP.

Quote: "In this case the access point is not configured as a transparent bridge"

With a Sophos UTM I never ever had this issue.
The configuration of the Access-Point has not been modified.
Both Access-Points of the bridge are TP-Link CPE510.
Those Access-Points have no configuration options for "transparency".
The bridge behaves transparently as long as OPNsense sends the packets back to the originating MAC address.

Every now and then, OPNsense uses the wrong MAC address.
My assumption is that the MAC-address gets replaced during renew of the DHCP lease.

How do I get OPNsense to always recognize the MAC address of the outermost header
(the "bridge entrance") instead of the MAC address of the device behind the bridge?

The OPNsense tunables are currently set like this (all others have their default values):
net.inet.icmp.drop_redirect = 1
net.inet.ip.redirect = 1

(Described here: https://www.freebsdhandbook.com/security/)

I upgraded to OPNsense 22.1.4, but it happened again today (about 3 weeks later).
By now I no longer think it has anything to do with DHCP, because the lease timeout
is set to the default value of 7200 seconds (2 hours).

I solved it again (temporarily) by flushing the ARP cache of OPNsense.

Am I the only one with this issue? What the heck is the root-cause?
Is it possible to configure a cron-job to flush the ARP-Table automatically after two weeks?

Meanwhile I figured out how to create a cron-entry to flush the arp-cache periodically:

Log in via SSH and choose 8 (Shell), then create the action file:


cd /usr/local/opnsense/service/conf/actions.d
vi actions_arp.conf



[clear]
command:/usr/sbin/arp -d -a
parameters:
type:script
message:clearing arp cache
description:Clear ARP Cache



service configd restart
configctl arp clear
exit


Then add the job in the GUI: System => Settings => Cron => + => Command: Clear ARP Cache
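
Flushing the whole table also throws away every correct entry. A narrower variant would delete only the entry of a client behind the bridge, and only when the cached MAC is not the access point's MAC. This is a sketch under assumptions, not something tested on this setup: entry_is_wrong is a hypothetical helper, the IP and MAC are placeholders, and the parsing assumes the FreeBSD `arp -an` layout `? (IP) at MAC on IFACE ...`.

```shell
#!/bin/sh
# Placeholders: replace with the client's IP and the access point's MAC.
CLIENT_IP="192.168.1.50"
BRIDGE_MAC="00:11:22:33:44:55"

# entry_is_wrong: hypothetical helper, not part of OPNsense.
# Reads arp output on stdin and succeeds (exit status 0) when an entry for
# the given IP exists and its MAC differs from the expected one.
entry_is_wrong() {
    awk -v ip="($1)" -v mac="$2" '
        $2 == ip && tolower($4) != tolower(mac) { bad = 1 }
        END { exit (bad ? 0 : 1) }
    '
}

# On the firewall, a script with this check could be called by the action
# instead of "arp -d -a":
#   if arp -an | entry_is_wrong "$CLIENT_IP" "$BRIDGE_MAC"; then
#       arp -d "$CLIENT_IP"
#   fi
```

An unresolved entry (shown as "(incomplete)" in place of a MAC) also counts as wrong here and would simply be deleted, which should be harmless.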

But it's sad that no one seems to know the root cause of the wrong entries that appear in the ARP cache at random intervals (between 5 and 30 days).