LAN interface issues.

Started by Lantern5, June 11, 2025, 11:12:28 AM

June 11, 2025, 11:12:28 AM Last Edit: June 11, 2025, 11:19:49 AM by Lantern5
Hardware: Lenovo M900 mini PC, 16 GB RAM | LAN interface - em0 - Intel(R) I219-LM SPT-H(2) | WAN interface - igb0 - Intel(R) I210 (Copper)
OPNsense version: 25.1.7_4-amd64 running on bare metal

Topology: Internet --> OPNsense --> unmanaged switch --> Sophos XG135 running SFOS 21.0.1  (Static IP) --> LAN devices


Issue:

Hi folks
The LAN interface is not reachable from any of the LAN devices or the Sophos box whenever there is an interruption on the OPNsense LAN interface. It could be a reboot after an OPNsense software upgrade, or the OPNsense LAN cable being unplugged and plugged back in. The em0 interface is up when I check via console, but does not respond to ping unless I power off the box and power it back up; or issue the ifconfig em0 down | up command. The WAN interface operates normally when this happens.
 
I tested both by connecting the Sophos box directly and by going through an unmanaged switch, but the result is the same. MTU is set to 1500 on both sides, and the BIOS on the Lenovo box is the most recent one.

This issue has persisted for quite some time; I would appreciate some tips on how to resolve it.

Update: I replaced the OPNsense box with a spare Ubiquiti ER-X with exactly the same IP addresses and the same topology.

I did the following tests, and the connection was restored as soon as the ER-X device LAN interface was available.
  • Unplug ER-X LAN interface
  • Unplug Sophos WAN interface
  • Reboot ER-X

I believe this is definitely an issue with OPNsense.

Quote from: Lantern5 on June 11, 2025, 01:40:58 PM
"I believe this is definitely an issue with OPNsense."
Why, when you have also changed the hardware?

Your problem appears to be that after a reboot of the Opnsense box you need to reboot it, or the LAN interface, a second time before it will respond on em0. Is that correct? I ask because your description is a little ambiguous.

The behaviour of em0 contrasts with igb0 which continues to operate normally.

Realtek aside, I know of no other cases of similar behaviour under Opnsense, certainly not on several different boxes which I have used. Your own case shows that Opnsense has no difficulty on igb0.

I would be looking at the NIC. Perhaps a workaround could be to add a script that issues ifconfig commands to cycle em0 after startup completes. I have not thought about how to do that.
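As a rough sketch of what such a script could look like: it is untested on this hardware, and the function name cycle_iface, the IFCONFIG override and the two-second pause are all illustrative assumptions. Where to run it from at boot is left open.

```shell
#!/bin/sh
# Hypothetical workaround sketch: bring an interface down and up again.
# Nothing here is OPNsense-specific; it only wraps the two ifconfig calls
# mentioned in this thread.

cycle_iface() {
    iface="$1"                    # interface to bounce, e.g. em0
    delay="${2:-2}"               # seconds to wait between down and up (assumed value)
    cmd="${IFCONFIG:-ifconfig}"   # set IFCONFIG=echo for a dry run

    "$cmd" "$iface" down || return 1
    sleep "$delay"
    "$cmd" "$iface" up
}
```

Run as root, cycle_iface em0 would issue ifconfig em0 down, pause, then ifconfig em0 up. With IFCONFIG=echo set, it only prints the two commands, which may be useful for checking the script before relying on it.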
Deciso DEC697

Quote from: passeri on June 12, 2025, 01:40:20 AM
"Why, when you have also changed the hardware?"
Since the Sophos + ER-X combo does not exhibit the same issues as Sophos + OPNsense, I came to the conclusion that the issue is not on the Sophos box. This was validated again by putting an unmanaged switch between the devices. By doing this, I ensured that there was no disruption to the Sophos WAN interface while the ER-X or OPNsense box was unplugged or rebooted.

Quote from: passeri on June 12, 2025, 01:40:20 AM
"Your problem appears to be that after a reboot of the Opnsense box you need to reboot it, or the LAN interface, a second time before it will respond on em0. Is that correct? I ask because your description is a little ambiguous."
I need to completely power down the OPNsense box after a reboot or a LAN cable change before it starts passing traffic on the LAN interface. The interface is up, but does not do anything.


Quote from: passeri on June 12, 2025, 01:40:20 AM
"The behaviour of em0 contrasts with igb0 which continues to operate normally."
That is correct, and both NICs are based on Intel chipsets. The only difference is that em0 is onboard, while igb0 is in an M.2 slot. I will also check the BIOS settings to make sure there's nothing in there causing the issue.

Quote from: passeri on June 12, 2025, 01:40:20 AM
"Realtek aside, I know of no other cases of similar behaviour under Opnsense, certainly not on several different boxes which I have used. Your own case shows that Opnsense has no difficulty on igb0."
Correct again.

Quote from: passeri on June 12, 2025, 01:40:20 AM
"I would be looking at the NIC. Perhaps a workaround could be to add a script that issues ifconfig commands to cycle em0 after startup completes. I have not thought about how to do that."
The em0 NIC is onboard, so I could try swapping boxes and see how that goes. I will share an update after further testing with a different box.

Quote from: Lantern5 on June 12, 2025, 02:32:38 AM
"Since the Sophos + ER-X combo does not exhibit the same issues as Sophos + OPNsense, I came to the conclusion that the issue is not on the Sophos box."
I did not imply I thought it was the Sophos box. The swap was from the Lenovo M900 with its LAN ports to the Ubiquiti ER-X, was it not? If not, what exactly are you swapping, please? A labelled network diagram may be helpful.

Quote from: Lantern5 on June 11, 2025, 11:12:28 AM
"The em0 interface is up when I check via console, but does not respond to ping unless I power off the box and power it back up; or issue the ifconfig em0 down | up command"
Quote from: Lantern5 on June 12, 2025, 02:32:38 AM
"I need to completely power down the OPNsense box after a reboot or a LAN cable change before it starts passing traffic on the LAN interface. The interface is up, but does not do anything."
[my emphases]

The bold parts of the statements are in conflict. Which statement is both true and complete, please?
Deciso DEC697

Quote from: passeri on June 12, 2025, 03:44:00 AM
"I did not imply I thought it was the Sophos box. The swap was from the Lenovo M900 with its LAN ports to the Ubiquiti ER-X, was it not? If not, what exactly are you swapping, please? A labelled network diagram may be helpful."

I swapped the OPNsense box with the ER-X for testing. OPNsense: fail. ER-X: pass. The Sophos device and the unmanaged switch are unchanged. Diagram attached.


Quote from: Lantern5 on June 11, 2025, 11:12:28 AM
"The em0 interface is up when I check via console, but does not respond to ping unless I power off the box and power it back up; or issue the ifconfig em0 down | up command"

Quote from: Lantern5 on June 12, 2025, 02:32:38 AM
"I need to completely power down the OPNsense box after a reboot or a LAN cable change before it starts passing traffic on the LAN interface. The interface is up, but does not do anything."
Quote from: passeri on June 12, 2025, 03:44:00 AM
"The bold parts of the statements are in conflict. Which statement is both true and complete, please?"


My apologies. This statement is true and complete:

"The em0 interface is up when I check via console, but does not respond to ping unless I power off the box and power it back up; or issue the ifconfig em0 down | up command"

Thank you Lantern5. One test would be to try Opnsense on different hardware or at least with a different NIC, if you are able to. While I am also curious to know what DHCP you are running, it seems unlikely to be the problem given cycling the interface works. My conjecture is that the NIC itself is entering an unresponsive state when disconnected physically (LAN cable) or virtually (Opnsense restart) until it is itself re-initialised by one of the two means you mentioned. In that state its internal (to Opnsense) interface is up but its external (to LAN) is not -- it cannot even be pinged quite apart from not issuing addresses. That does not sound to me like an Opnsense problem.
Deciso DEC697

Sounds like you have no convenient GUI access when the problem occurs, which makes troubleshooting less convenient. Also, the consistency of the issue you describe doesn't match typical issues. Still, have you checked "arp -a" for correct mappings? How about counters, say "netstat -i" - do they increment as you expect?
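To make the counter check concrete, here is a minimal sketch of how the packet counters could be sampled twice and compared from the console. The helper names counter_from_netstat and counter_delta are made up for illustration. It assumes the first line of the netstat -i output is a header naming the columns (such as Ipkts and Opkts) and that the link-level row has a Network field starting with <Link, so check it against the real output on the box before trusting it.

```shell
#!/bin/sh
# Hypothetical helpers for comparing interface counters between two samples.

# Read `netstat -i` style output on stdin and print the value of the named
# column (e.g. Ipkts) from the first link-level row.
counter_from_netstat() {
    awk -v col="$1" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
        idx && $3 ~ /^<Link/ { print $idx; exit }
    '
}

# Print how much a counter grew between an earlier and a later sample.
counter_delta() {
    echo $(( $2 - $1 ))
}
```

One way to use it: take before=$(netstat -i -n -I em0 | counter_from_netstat Ipkts), ping the LAN address from a client for a few seconds, take a second sample the same way, then run counter_delta on the two values. If Ipkts does not move while a client is pinging, frames are presumably not reaching the driver at all. If Ipkts grows but Opkts does not, the replies are the part going missing.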