So I'm running OPNsense with the mentioned cable router as the gateway.
I'm starting the OPNsense device (AMD E1-6010, 4 GB RAM) manually every day to save some electricity.
But I'm running into a problem: the cable router's LAN switch doesn't seem to tell OPNsense its MAC address (no ARP reply), which leaves me without an internet connection.
I already had this issue on a Linux-based firewall, so it doesn't seem to be a software problem in the underlying FreeBSD.
But on OPNsense the issue appears much more frequently. I suspect either the WAN NIC of the firewall or the LAN NIC of the cable modem as the cause. Restarting the cable modem does not fix the issue, but restarting OPNsense does. Restarting all services on OPNsense does not.
Any tips on how to find the cause? How would I debug the ARP handling? Here are the ARP entries while the internet connection is down:
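One direct way to debug the ARP handling is to capture ARP on the WAN NIC with tcpdump and count requests for the gateway against replies from it. This is a minimal sketch. It assumes the capture comes from "tcpdump -n -l -e -i re0 -c 20 arp", with re0 as WAN and 192.168.0.1 as the gateway, as in the tables in this post. The function name summarize_arp is hypothetical. Keep in mind that tcpdump (BPF) sees outgoing frames when the stack hands them to the driver, so a request in the capture does not prove that it actually left the NIC.

```sh
#!/bin/sh
# Sketch: count ARP requests for a gateway and ARP replies from it in tcpdump text output.
# Assumed input on stdin: lines from "tcpdump -n -l [-e] -i re0 arp".

summarize_arp() {
    # $1 = gateway IPv4 address, e.g. 192.168.0.1
    # The space after the address keeps 192.168.0.1 from also matching 192.168.0.10.
    awk -v gw="$1" '
        index($0, "who-has " gw " ")     { req++ }   # "Request who-has GW tell ..."
        index($0, "Reply " gw " is-at ") { rep++ }   # "Reply GW is-at MAC"
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}
```

Usage: tcpdump -n -l -e -i re0 -c 20 arp | summarize_arp 192.168.0.1. The -c limit lets tcpdump exit on its own. Stopping the pipeline with Ctrl-C would also kill awk before it prints the totals.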
root@OPNsense:~ # arp -a
? (192.168.7.100) at a8:a1:59:aa:a1:0c on re1 expires in 1183 seconds [ethernet]
? (192.168.7.2) at 3c:8c:f8:15:78:80 on re1 expires in 1132 seconds [ethernet]
? (192.168.7.1) at f4:b5:20:4b:cd:53 on re1 permanent [ethernet]
? (192.168.0.1) at (incomplete) on re0 expired [ethernet]
? (192.168.0.2) at 00:e0:4c:68:1a:07 on re0 permanent [ethernet]
root@OPNsense:~ #
Here are the ARP entries after rebooting OPNsense:
root@OPNsense:~ # arp -a
? (192.168.7.100) at a8:a1:59:aa:a1:0c on re1 expires in 1136 seconds [ethernet]
? (192.168.7.2) at 3c:8c:f8:15:78:80 on re1 expires in 1185 seconds [ethernet]
OPNsense (192.168.7.1) at f4:b5:20:4b:cd:53 on re1 permanent [ethernet]
? (192.168.0.1) at 9c:c8:fc:3b:e3:7a on re0 expires in 1182 seconds [ethernet]
? (192.168.0.2) at 00:e0:4c:68:1a:07 on re0 permanent [ethernet]
root@OPNsense:~ #
NICs:
LAN: Onboard Realtek RTL8111H
WAN: Add-on Realtek RTL8111E
Adding a monitor IP to the gateway did not help.
Plugging the cable on both interfaces did not help.
Restarting the interface did not help. The re0 entries only show up in the ARP table again after taking the interface down and up ("ifconfig re0 down", "ifconfig re0 up"). "arp -a" takes about 10 seconds; is that normal?
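A likely reason "arp -a" is slow here is that it attempts a reverse DNS lookup for every entry (the "?" names), and those lookups can time out while the WAN is down. "arp -an" prints numeric addresses only. Below is a minimal sketch of a helper that classifies one address from that output. The function name arp_state is hypothetical, and the field positions assume the FreeBSD output format shown in this post.

```sh
#!/bin/sh
# Sketch: report the ARP state of one IPv4 address from FreeBSD "arp -an" output on stdin.
# Expected line format (as in this post):
#   ? (192.168.0.1) at (incomplete) on re0 expired [ethernet]
#   ? (192.168.0.1) at 9c:c8:fc:3b:e3:7a on re0 expires in 1182 seconds [ethernet]

arp_state() {
    # $1 = IPv4 address; prints "incomplete", "resolved <mac>" or "missing".
    awk -v ip="($1)" '
        $2 == ip { state = ($4 == "(incomplete)") ? "incomplete" : ("resolved " $4) }
        END { print (state == "" ? "missing" : state) }
    '
}
```

Going by the tables above, "arp -an | arp_state 192.168.0.1" would print "incomplete" in the broken state and "resolved 9c:c8:fc:3b:e3:7a" after a working boot.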
Debugging output while the gateway's ARP entry is expired:
root@OPNsense:~ # sysctl net.link.ether
net.link.ether.inet.allow_multicast: 0
net.link.ether.inet.log_arp_permanent_modify: 1
net.link.ether.inet.log_arp_movements: 1
net.link.ether.inet.log_arp_wrong_iface: 1
net.link.ether.inet.garp_rexmit_count: 0
net.link.ether.inet.max_log_per_second: 1
net.link.ether.inet.maxhold: 16
net.link.ether.inet.wait: 20
net.link.ether.inet.proxyall: 0
net.link.ether.inet.maxtries: 5
net.link.ether.inet.max_age: 1200
net.link.ether.arp.log_level: 6
root@OPNsense:~ # netstat -sp arp
arp:
73 ARP requests sent
0 ARP requests failed to sent
23 ARP replies sent
28 ARP requests received
1 ARP reply received
29 ARP packets received
215 total packets dropped due to no ARP entry
0 ARP entrys timed out
0 Duplicate IPs seen
root@OPNsense:~ # arp -a
? (192.168.7.100) at a8:a1:59:aa:a1:0c on re1 expires in 1183 seconds [ethernet]
? (192.168.7.2) at 3c:8c:f8:15:78:80 on re1 expires in 1182 seconds [ethernet]
? (192.168.7.1) at f4:b5:20:4b:cd:53 on re1 permanent [ethernet]
? (192.168.0.1) at (incomplete) on re0 expired [ethernet]
? (192.168.0.2) at 00:e0:4c:68:1a:07 on re0 permanent [ethernet]
root@OPNsense:~ # netstat -sp arp
arp:
118 ARP requests sent
0 ARP requests failed to sent
30 ARP replies sent
35 ARP requests received
1 ARP reply received
36 ARP packets received
379 total packets dropped due to no ARP entry
0 ARP entrys timed out
0 Duplicate IPs seen
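The two snapshots above tell a clear story. "ARP requests sent" went from 73 to 118 while "ARP reply received" stayed at 1, and the drop counter went from 215 to 379. So the stack keeps asking for the gateway but does not count a single new reply. Below is a small sketch that prints these deltas for two saved "netstat -sp arp" outputs. The function name arp_delta and the file paths in the usage line are hypothetical.

```sh
#!/bin/sh
# Sketch: print per-counter differences between two saved "netstat -sp arp" outputs.

arp_delta() {
    # $1 = "before" file, $2 = "after" file
    awk '
        $1 ~ /^[0-9]+$/ {
            n = $1; $1 = ""; label = substr($0, 2)   # drop the number, keep the label text
            if (NR == FNR) { before[label] = n; next }
            printf "%d %s\n", n - before[label], label
        }
    ' "$1" "$2"
}
```

Usage: netstat -sp arp > /tmp/arp.before; sleep 60; netstat -sp arp > /tmp/arp.after; arp_delta /tmp/arp.before /tmp/arp.after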
root@OPNsense:~ # service netif restart re0
Stopping Network: re0.
re0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: WAN (wan)
options=82088<VLAN_MTU,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
ether 00:e0:4c:68:1a:07
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
Starting Network: re0.
re0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: WAN (wan)
options=82088<VLAN_MTU,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
ether 00:e0:4c:68:1a:07
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
root@OPNsense:~ # sysctl net.link.ether
net.link.ether.inet.allow_multicast: 0
net.link.ether.inet.log_arp_permanent_modify: 1
net.link.ether.inet.log_arp_movements: 1
net.link.ether.inet.log_arp_wrong_iface: 1
net.link.ether.inet.garp_rexmit_count: 0
net.link.ether.inet.max_log_per_second: 1
net.link.ether.inet.maxhold: 16
net.link.ether.inet.wait: 20
net.link.ether.inet.proxyall: 0
net.link.ether.inet.maxtries: 5
net.link.ether.inet.max_age: 1200
net.link.ether.arp.log_level: 6
root@OPNsense:~ # netstat -sp arp
arp:
167 ARP requests sent
0 ARP requests failed to sent
42 ARP replies sent
47 ARP requests received
1 ARP reply received
48 ARP packets received
571 total packets dropped due to no ARP entry
0 ARP entrys timed out
0 Duplicate IPs seen
root@OPNsense:~ # arp -a
? (192.168.7.100) at a8:a1:59:aa:a1:0c on re1 expires in 1183 seconds [ethernet]
? (192.168.7.2) at 3c:8c:f8:15:78:80 on re1 expires in 1183 seconds [ethernet]
? (192.168.7.1) at f4:b5:20:4b:cd:53 on re1 permanent [ethernet]
root@OPNsense:~ # ifconfig re0 down
root@OPNsense:~ # ifconfig re0 up
root@OPNsense:~ # arp -a
? (192.168.7.100) at a8:a1:59:aa:a1:0c on re1 expires in 1183 seconds [ethernet]
? (192.168.7.2) at 3c:8c:f8:15:78:80 on re1 expires in 1175 seconds [ethernet]
? (192.168.7.1) at f4:b5:20:4b:cd:53 on re1 permanent [ethernet]
? (192.168.0.1) at (incomplete) on re0 expired [ethernet]
? (192.168.0.2) at 00:e0:4c:68:1a:07 on re0 permanent [ethernet]
root@OPNsense:~ # service netif restart re0
Stopping Network: re0.
re0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: WAN (wan)
options=82088<VLAN_MTU,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
ether 00:e0:4c:68:1a:07
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
Starting Network: re0.
re0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: WAN (wan)
options=82088<VLAN_MTU,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
ether 00:e0:4c:68:1a:07
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
root@OPNsense:~ # arp -a
? (192.168.7.100) at a8:a1:59:aa:a1:0c on re1 expires in 1185 seconds [ethernet]
? (192.168.7.2) at 3c:8c:f8:15:78:80 on re1 expires in 1129 seconds [ethernet]
? (192.168.7.1) at f4:b5:20:4b:cd:53 on re1 permanent [ethernet]
root@OPNsense:~ # ping 192.168.0.1
PING 192.168.0.1 (192.168.0.1): 56 data bytes
ping: sendto: No route to host
ping: sendto: No route to host
ping: sendto: No route to host
ping: sendto: No route to host
ping: sendto: No route to host
^C
--- 192.168.0.1 ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss
root@OPNsense:~ # ping 192.168.0.2
PING 192.168.0.2 (192.168.0.2): 56 data bytes
ping: sendto: No route to host
ping: sendto: No route to host
ping: sendto: No route to host
ping: sendto: No route to host
ping: sendto: No route to host
^C
--- 192.168.0.2 ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss
root@OPNsense:~ # arp -a
? (192.168.7.100) at a8:a1:59:aa:a1:0c on re1 expires in 1180 seconds [ethernet]
? (192.168.7.2) at 3c:8c:f8:15:78:80 on re1 expires in 1143 seconds [ethernet]
? (192.168.7.1) at f4:b5:20:4b:cd:53 on re1 permanent [ethernet]
root@OPNsense:~ # ifconfig re0 down
root@OPNsense:~ # ifconfig re0 up
root@OPNsense:~ # arp -a
? (192.168.7.100) at a8:a1:59:aa:a1:0c on re1 expires in 1179 seconds [ethernet]
? (192.168.7.2) at 3c:8c:f8:15:78:80 on re1 expires in 1130 seconds [ethernet]
? (192.168.7.1) at f4:b5:20:4b:cd:53 on re1 permanent [ethernet]
? (192.168.0.1) at (incomplete) on re0 expired [ethernet]
? (192.168.0.2) at 00:e0:4c:68:1a:07 on re0 permanent [ethernet]
root@OPNsense:~ # ping 192.168.0.2
PING 192.168.0.2 (192.168.0.2): 56 data bytes
^C
--- 192.168.0.2 ping statistics ---
77 packets transmitted, 0 packets received, 100.0% packet loss
("ping 192.168.0.1" reports "Host is down" while the incomplete ARP entry is present.)
The last ping didn't print anything (strange, because 192.168.0.2 has a valid MAC address). Then I did something stupid ("ifconfig re1 down"), but why can't I shut down the device via the power button once the LAN interface is down? I had to hold the power button. It's also strange that the expired ARP entry was gone after the cold start.
Today I swapped the NICs in Interfaces: Assignments. As expected, the LAN port then stopped working until I had rebooted several times. So I can pin the issue down to the RTL8111E NIC, which points to a hardware issue. But it must also be a driver issue, because on the Linux firewall the problem only appeared once in a blue moon; the Linux driver seems to be more forgiving.
dmesg shows that the RTL8111E (re0) uses a different PHY than the RTL8111H (re1):
pcib1: <ACPI PCI-PCI bridge> irq 24 at device 2.1 on pci0
pci1: <ACPI PCI bus> on pcib1
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0xfe900000-0xfe900fff,0xe0800000-0xe0803fff irq 24 at device 0.0 on pci1
re0: Using 1 MSI-X message
re0: ASPM disabled
re0: Chip rev. 0x2c800000
re0: MAC rev. 0x00100000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re0: Using defaults for TSO: 65518/35/2048
re0: Ethernet address: 00:e0:4c:68:1a:07
re0: netmap queues/slots: TX 1/256, RX 1/256
pcib2: <ACPI PCI-PCI bridge> irq 25 at device 2.2 on pci0
pci2: <ACPI PCI bus> on pcib2
re1: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xd000-0xd0ff mem 0xfe804000-0xfe804fff,0xfe800000-0xfe803fff irq 28 at device 0.0 on pci2
re1: Using 1 MSI-X message
re1: Chip rev. 0x54000000
re1: MAC rev. 0x00100000
miibus1: <MII bus> on re1
rgephy1: <RTL8251/8153 1000BASE-T media interface> PHY 1 on miibus1
rgephy1: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re1: Using defaults for TSO: 65518/35/2048
re1: Ethernet address: f4:b5:20:4b:cd:53
re1: netmap queues/slots: TX 1/256, RX 1/256
If it is a driver issue, there's probably no point in debugging it further; the card only cost 8 EUR anyway.
https://www.inter-tech.de/productdetails/ST-705_EN.html
The product page is so outdated that it still lists the RTL8111C.
Yesterday evening I installed the Realtek vendor driver (os-realtek-re) and was sure everything would be fine after the next cold boot. I was wrong: the problem persists.
Here is the dmesg output of the vendor driver:
pcib1: <ACPI PCI-PCI bridge> irq 24 at device 2.1 on pci0
pci1: <ACPI PCI bus> on pcib1
re0: <Realtek PCIe GbE Family Controller> port 0xe000-0xe0ff mem 0xfe900000-0xfe900fff,0xe0800000-0xe0803fff irq 24 at device 0.0 on pci1
re0: Using Memory Mapping!
re0: Using 1 MSI-X message
re0: ASPM disabled
re0: version:1.98.00
re0: Ethernet address: 00:e0:4c:68:1a:07
This product is covered by one or more of the following patents:
US6,570,884, US6,115,776, and US6,327,625.
re0: Ethernet address: 00:e0:4c:68:1a:07
pcib2: <ACPI PCI-PCI bridge> irq 25 at device 2.2 on pci0
pci2: <ACPI PCI bus> on pcib2
re1: <Realtek PCIe GbE Family Controller> port 0xd000-0xd0ff mem 0xfe804000-0xfe804fff,0xfe800000-0xfe803fff irq 28 at device 0.0 on pci2
re1: Using Memory Mapping!
re1: Using 1 MSI-X message
re1: version:1.98.00
re1: Ethernet address: f4:b5:20:4b:cd:53
This product is covered by one or more of the following patents:
US6,570,884, US6,115,776, and US6,327,625.
re1: Ethernet address: f4:b5:20:4b:cd:53
I saw in Interfaces that you can manually set a MAC address; I guess this is a static ARP entry? It would be nice if this option were also available for the gateway. I also saw "Enable Static ARP entries" in the DHCP service, but that is probably only relevant to the client IPs.
This thread https://forum.opnsense.org/index.php?topic=33127.0 brought the WOL plugin to my attention. I have now added the MAC address of the Arris Touchstone TG3442DE cable router to the WAN interface. Then it should be listed as "permanent", right?
Scratch that: I just realized that no IP address is entered in the WOL plugin. I could probably add a static entry with "arp -s", but it seems that such an entry doesn't survive reboots the way it does on Windows.
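On plain FreeBSD, static ARP entries can be made persistent through rc.conf(5). The static_arp_pairs variable names the pairs, and each static_arp_<name> value is passed to "arp -S" at boot. The sketch below uses the router MAC seen after a reboot in this post (9c:c8:fc:3b:e3:7a); the pair name "gw" is arbitrary. Two assumptions apply. First, it is not verified here that OPNsense's own boot process honours these rc.conf variables. Second, a static entry can only help if re0 still receives the router's frames; if the NIC has stopped receiving altogether, it won't fix anything.

```sh
# /etc/rc.conf fragment (FreeBSD rc.d/static_arp); values taken from this thread.
static_arp_pairs="gw"
static_arp_gw="192.168.0.1 9c:c8:fc:3b:e3:7a"
```

To try it once without a reboot: arp -S 192.168.0.1 9c:c8:fc:3b:e3:7a. Like the interface's own address, the entry should then be listed as "permanent" in "arp -a".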
The vendor driver seems to have some mitigation for this bug.
The last messages in dmesg are now:
re0: link state changed to DOWN
re0: link state changed to UP
re0: link state changed to DOWN
re0: link state changed to UP
That is the same fix as my earlier manual "ifconfig re0 down/up".
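Until the cause is found, the same down/up could be automated. This is a minimal watchdog sketch that assumes WAN = re0 and gateway = 192.168.0.1, as in this post. The script name, the DRY_RUN switch and the function names are hypothetical. The script pings the gateway once to trigger ARP resolution, then bounces re0 only if no MAC address has been learned. The earlier "ifconfig re0 down/up" kept the interface's address, while "service netif restart re0" did not, which is why the script uses ifconfig. How to schedule it on OPNsense (e.g. via cron) is left open.

```sh
#!/bin/sh
# Hypothetical ARP watchdog: bounce the WAN NIC when the gateway's ARP entry is unresolved.
# Assumptions (from this thread): WAN interface re0, gateway 192.168.0.1.
# Usage: sh arp-watchdog.sh run   (without "run" it only defines the functions)
IF="${IF:-re0}"
GW="${GW:-192.168.0.1}"
DRY_RUN="${DRY_RUN:-0}"   # 1 = only print what would be done

gw_resolved() {
    # Reads "arp -an" output on stdin; exit status 0 if $GW has a MAC address.
    awk -v ip="($GW)" '$2 == ip && $4 != "(incomplete)" { ok = 1 } END { exit (ok ? 0 : 1) }'
}

bounce_if() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would bounce $IF"
        return 0
    fi
    ifconfig "$IF" down && ifconfig "$IF" up
    logger -t arp-watchdog "bounced $IF: no ARP entry for $GW"
}

check_once() {
    # One ping to trigger an ARP request (FreeBSD ping: -t = overall timeout in seconds).
    ping -c 1 -t 2 "$GW" >/dev/null 2>&1
    if ! arp -an | gw_resolved; then
        bounce_if
    fi
}

if [ "${1:-}" = run ]; then
    check_once
fi
```

"DRY_RUN=1 sh arp-watchdog.sh run" shows what the script would do without touching the interface.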