Hardware: Lenovo M900 mini PC, 16 GB RAM | LAN interface: em0, Intel(R) I219-LM SPT-H(2) | WAN interface: igb0, Intel(R) I210 (Copper)
OPNsense version: 25.1.7_4-amd64 running on bare metal
Topology: Internet --> OPNsense --> unmanaged switch --> Sophos XG135 running SFOS 21.0.1 (Static IP) --> LAN devices
Issue:
Hi folks
The LAN interface is not reachable from any of the LAN devices, or from the Sophos box, whenever the OPNsense LAN link is interrupted. The trigger can be a reboot after an OPNsense software upgrade, or the OPNsense LAN cable being unplugged and plugged back in. The em0 interface shows as up when I check via the console, but it does not respond to ping unless I power the box off and back on, or cycle it with ifconfig em0 down followed by ifconfig em0 up. The WAN interface operates normally when this happens.
I tested both with the Sophos box connected directly and through an unmanaged switch, with the same result. MTU is set to 1500 on both sides, and the Lenovo box is running the most recent BIOS.
This issue has persisted for quite some time; I would appreciate any tips on how to resolve it.
Update: I replaced the OPNsense box with a spare Ubiquiti ER-X using exactly the same IP addressing and the same topology.
I ran the following tests, and in each case the connection was restored as soon as the ER-X LAN interface came back:
- Unplug ER-X LAN interface
- Unplug Sophos WAN interface
- Reboot ER-X
I believe this is definitely an issue with OPNsense.
Quote from: Lantern5 on June 11, 2025, 01:40:58 PM
I believe this is definitely an issue with OPNSense.
Why, when you have also changed the hardware?
Your problem appears to be that after a reboot of the Opnsense box you need to reboot it, or the LAN interface, a second time before it will respond on em0. Is that correct? I ask because your description is a little ambiguous.
The behaviour of em0 contrasts with igb0 which continues to operate normally.
Realtek aside, I know of no other cases of similar behaviour under Opnsense, certainly not on several different boxes which I have used. Your own case shows that Opnsense has no difficulty on igb0.
I would be looking at the NIC. Perhaps a workaround could be to add a script that issues ifconfig commands to cycle em0 after startup completes. I have not thought about how to do that.
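One way to sketch the suggested workaround is a small Python helper that takes em0 down and brings it back up once startup has finished. This is a hypothetical illustration, not an OPNsense feature: the helper names (`cycle_interface`, `run_cmd`), the 2-second pause and the `/sbin/ifconfig` path are assumptions, and how the script gets launched at boot is left open.

```python
import subprocess
import time
from typing import Callable, List, Sequence

# Hypothetical workaround: bounce a NIC once after startup has completed.
# Assumes FreeBSD's ifconfig(8) lives at /sbin/ifconfig.
IFCONFIG = "/sbin/ifconfig"

Runner = Callable[[Sequence[str]], None]


def run_cmd(argv: Sequence[str]) -> None:
    """Run a command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(list(argv), check=True)


def cycle_interface(ifname: str, settle_seconds: float = 2.0,
                    runner: Runner = run_cmd) -> List[List[str]]:
    """Take ifname down, pause, then bring it back up.

    Returns the commands issued so the sequence can be logged or tested.
    """
    issued: List[List[str]] = []
    for action in ("down", "up"):
        argv = [IFCONFIG, ifname, action]
        runner(argv)
        issued.append(argv)
        if action == "down":
            time.sleep(settle_seconds)  # give the driver a moment before re-enabling
    return issued
```

On the box this would be called as `cycle_interface("em0")` from whatever boot-time hook you choose. Keeping the command runner injectable lets the down/up sequence be checked without touching a live interface. Note that this only masks the symptom; it does not explain why em0 stops receiving.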
Quote from: passeri on June 12, 2025, 01:40:20 AM
Why, when you have also changed the hardware?
Since the Sophos + ER-X combination does not show the problem that Sophos + OPNsense does, I concluded that the issue is not on the Sophos box. This was confirmed again by putting an unmanaged switch between the devices, which ensured that the Sophos WAN interface saw no link interruption while the ER-X / OPNsense box was unplugged or rebooted.
Quote from: passeri on June 12, 2025, 01:40:20 AM
Your problem appears to be that after a reboot of the Opnsense box you need to reboot it, or the LAN interface, a second time before it will respond on em0. Is that correct? I ask because your description is a little ambiguous.
After a reboot or a LAN cable change, I need to completely power down the OPNsense box before it starts passing traffic on the LAN interface. The interface is up, but does not do anything.
Quote from: passeri on June 12, 2025, 01:40:20 AM
The behaviour of em0 contrasts with igb0 which continues to operate normally.
That is correct, and both NICs use Intel chipsets. The only difference is that em0 is onboard, while igb0 sits in an M.2 slot. I will also check the BIOS settings to make sure nothing in there is causing the issue.
Quote from: passeri on June 12, 2025, 01:40:20 AM
Realtek aside, I know of no other cases of similar behaviour under Opnsense, certainly not on several different boxes which I have used. Your own case shows that Opnsense has no difficulty on igb0.
Correct again.
Quote from: passeri on June 12, 2025, 01:40:20 AM
I would be looking at the NIC. Perhaps a workaround could be to add a script that issues ifconfig commands to cycle em0 after startup completes. I have not thought about how to do that.
The em0 NIC is onboard, so I could try swapping boxes and see how that goes. I will share an update after further testing with a different box.
Quote from: Lantern5 on June 12, 2025, 02:32:38 AM
Since the Sophos + ER-X combo does not exhibit the same issues as Sophos + OPNSense; I came to the conclusion that the issue is not on the Sophos Box.
I did not imply I thought it was the Sophos box. The swap was from the Lenovo M900 with its LAN ports to the Ubiquiti ER-X, was it not? If not, what exactly are you swapping, please? A labelled network diagram may be helpful.
Quote from: Lantern5 on June 11, 2025, 11:12:28 AM
The em0 interface is up when I check via console, but does not respond to ping unless I power off the box and power it back up; or issue the ifconfig em0 down | up command
Quote from: Lantern5 on June 12, 2025, 02:32:38 AM
I need to completely power down the OPNSense box after a reboot, or a LAN cable change; before the OPNSense box starts passing traffic on the LAN interface. The interface is up, but does not do anything.
[my emphases]
The bold parts of the statements are in conflict. What statement is both true and complete please?
Quote from: passeri on June 12, 2025, 03:44:00 AM
I did not imply I thought it was the Sophos box. The switch was from Lenovo M900 with its LAN ports to Ubiquiti ER-X, was it not? If not, what exactly are you swapping please? A labelled network diagram may be helpful
I swapped the OPNsense box for the ER-X for testing: OPNsense fails, ER-X passes. The Sophos device and the unmanaged switch are unchanged. Diagram attached.
Quote from: Lantern5 on June 11, 2025, 11:12:28 AM
The em0 interface is up when I check via console, but does not respond to ping unless I power off the box and power it back up; or issue the ifconfig em0 down | up command
Quote from: Lantern5 on June 12, 2025, 02:32:38 AM
I need to completely power down the OPNSense box after a reboot, or a LAN cable change; before the OPNSense box starts passing traffic on the LAN interface. The interface is up, but does not do anything.
Quote from: passeri on June 12, 2025, 03:44:00 AM
The bold parts of the statements are in conflict. What statement is both true and complete please?
My apologies, this statement is true and complete:
"The em0 interface is up when I check via console, but does not respond to ping unless I power off the box and power it back up; or issue the ifconfig em0 down | up command"
Thank you, Lantern5. One test would be to try OPNsense on different hardware, or at least with a different NIC, if you are able to. I am also curious which DHCP server you are running, though it seems unlikely to be the problem given that cycling the interface works. My conjecture is that the NIC itself enters an unresponsive state when it is disconnected physically (LAN cable) or virtually (OPNsense restart), and stays that way until it is re-initialised by one of the two means you mentioned. In that state its internal side (towards OPNsense) is up but its external side (towards the LAN) is not: it cannot even be pinged, quite apart from not issuing addresses. That does not sound to me like an OPNsense problem.
It sounds like you have no convenient GUI access when the problem occurs, which makes troubleshooting harder. Also, the consistency of the issue you describe doesn't match typical issues. Still, have you checked "arp -a" for correct mappings? How about the counters, say "netstat -i": do they increment as you expect?
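As a rough illustration of the counter check, the Python sketch below parses two snapshots of `netstat -i` and flags an interface whose output counter grew while its input counter did not, which is what a receive-side stall would look like. It assumes the FreeBSD layout in which the first four columns are Name, Mtu, Network and Address, reads the counters by header name (`Ipkts`, `Opkts`) from the right so an empty Address field does not shift them, and only looks at the `<Link#N>` rows. The function names are hypothetical.

```python
import subprocess
from typing import Dict

Counters = Dict[str, Dict[str, int]]

# Assumed FreeBSD layout: Name, Mtu, Network, Address, then the counter columns.
LEADING_COLUMNS = 4


def parse_netstat_i(text: str) -> Counters:
    """Map each interface to its counters, using only the <Link#N> rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    counter_names = lines[0].split()[LEADING_COLUMNS:]
    result: Counters = {}
    for line in lines[1:]:
        fields = line.split()
        # Per-address rows repeat the interface name; the link row holds the NIC totals.
        if len(fields) < 3 or not fields[2].startswith("<Link#"):
            continue
        # Read counters from the right so a blank Address column does not shift them.
        values = fields[-len(counter_names):]
        name = fields[0].rstrip("*")  # a trailing * marks an interface that is down
        result[name] = {
            col: int(raw) if raw.isdigit() else 0
            for col, raw in zip(counter_names, values)
        }
    return result


def snapshot() -> Counters:
    """Run `netstat -i -n` on the local host and parse its output."""
    out = subprocess.run(["netstat", "-i", "-n"],
                         capture_output=True, text=True, check=True)
    return parse_netstat_i(out.stdout)


def rx_stalled(before: Counters, after: Counters, ifname: str) -> bool:
    """True if ifname transmitted between the snapshots but received nothing."""
    sent = after[ifname]["Opkts"] - before[ifname]["Opkts"]
    received = after[ifname]["Ipkts"] - before[ifname]["Ipkts"]
    return sent > 0 and received == 0
```

Taking `before = snapshot()`, pinging the OPNsense LAN address from the Sophos side for a few seconds, and then calling `rx_stalled(before, snapshot(), "em0")` would give a quick yes/no answer without needing the GUI.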
@passeri / @pfry, thanks for your input.
Here's what I tried; none of it fixed the issue:
- Opnsense on a different mini pc. (Lenovo m910 with the same Intel(R) I219-LM onboard NIC as the M900)
- Static ARP on both sides
- MTU of 1300 on both sides
- Restarting the WAN interface on the Sophos
- Static IP and DHCP IP for the Sophos WAN interface
- Switched from Kea DHCP to ISC DHCP on OPNsense
Observations:
netstat -i on the OPNsense box shows output packets incrementing, while input packets stay at 0.
The Sophos works just fine when connected to a Ubiquiti ER-X, but not with OPNsense. I'll test with pfSense sometime this weekend to see if that fixes the problem.
Update: I think I have isolated the problem to Zenarmor, which is enabled on the OPNsense LAN interface. If I stop Zenarmor and then interrupt the OPNsense LAN connection, the issues I described go away. With Zenarmor running again, connectivity is not restored until I restart the OPNsense LAN interface.
That sounds promising, Lantern5, assuming you are willing to do without Zenarmor or wait for a fix.
While you had not mentioned you were running Zenarmor, I should still either have asked about other parts of the configuration or simply advised switching off all add-ons. As a comment, your additional tests changed the machine but not the NIC, which was the more likely culprit if Zenarmor were not intruding.
Quote from: passeri on Today at 03:08:38 AM
As a comment, your additional tests changed the machine but not the NIC which was the more likely culprit, if ZenArmor were not intruding.
I did change the NIC(s) when I changed the machine. em0 is onboard, and igb0 was another M.2 NIC I had as a spare. That rules out the NICs, or any hardware, as the root cause.
I'll move this thread to the Zenarmor section and see if anyone there has encountered this problem before.
Thanks for the assistance and inputs.
Second update, and I hope this finally resolves the issue.
A Google search led me to this thread (https://forum.opnsense.org/index.php?topic=35601.0). I checked my settings and found that I had configured Zenarmor to use the native netmap driver. I changed it to the emulated netmap driver, and all the tests that were failing now pass. Fingers crossed it stays that way.
I hope this helps anyone else who might be running into the same problem.