Network Failure

Started by guywyers, January 23, 2023, 04:45:10 PM

Hi all,

I'm using OPNSense in a straightforward setup: a wired LAN with RJ45 sockets all over my house, with drop cables going into two connected switches, and the LAN port of OPNSense connected to that same LAN. There's also Wi-Fi, but that uses a wired access point connected to the same LAN, so it is not relevant to this issue.

I'm experiencing periodic network failures, not triggered by any particular event. They appear suddenly and then disappear without any concrete pattern.
I've been narrowing this down, starting by harassing my ISP, who played dumb as usual, but in this case, they really were not the cause. I came to that conclusion during one of the episodes when I logged on to the OPNSense console and discovered that pinging the outside world worked fine, but that I couldn't ping any machine on the LAN! I then tried pinging machines on the LAN from other machines on the LAN, but that didn't work either. So, during one of these episodes, my whole LAN is dead.

My next guess was the two network switches. These were indeed two old plasticky Netgear things, so I thought I'd replace them: didn't solve the problem.

I ruled out the drop cables, because the probability that all of them went bad at the same time is zero.

In terms of cables there were two potential single points of failure: the cable between the OPNSense and the LAN switch and the cable linking the two switches together. I replaced both: didn't solve the problem.

That leaves one potential issue that I can think of: a faulty network port on my OPNSense box (hardware issue). So here I have some questions:

  • Is there a way to diagnose this? I mean, are there logs where a faulty network port would leave traces? One thing I have noticed is that the usual message about interface down/interface up, which appears when you unplug a cable, DOES NOT appear on the console when one of these episodes happens.
  • Is it possible that such a failure would cause my whole LAN to fall over? I can understand that losing connection to the Firewall implies a loss of internet connection, but shouldn't the internal LAN keep functioning?


Finally, I have a more general question: are there any recommendations or tools to help me diagnose this issue?

Any help would be greatly appreciated, because this is driving me nuts.

Thanks

If you can't ping devices on the same LAN then it's got nothing to do with the router.
The router would never even see that traffic, it's layer 2 traffic only.
So you're back to the switches being bad or cables between them.

Next time, plug into a switch, ping devices on that switch. If successful, do the same on the other switch.
If both are successful then you have a bad cable or port between the switches.
Also, try from the switches to the internet, you don't say how they're connected but if you can ping the internet from the one connected to the router, then that helps to narrow it down more.
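
The check above can be sketched in Python, to be run from a machine plugged into one of the switches. This is only a rough sketch under some assumptions: the test machine runs Linux with the iputils `ping` (the `-c` and `-W` flags behave differently on other systems), and the IP addresses in the usage comment are made up, so substitute devices you know sit on each switch.

```python
import subprocess
from typing import List


def ping(host: str, timeout_s: int = 1) -> bool:
    """Return True if a single ICMP echo to `host` gets a reply.

    Assumes Linux iputils ping: -c is the packet count, -W the reply
    timeout in seconds. Other platforms use different flags.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 2,
        )
    except (subprocess.TimeoutExpired, OSError):
        # ping hung or is not installed: treat as "no reply"
        return False
    return result.returncode == 0


def any_reply(hosts: List[str]) -> bool:
    """True if at least one host in the list answers."""
    return any(ping(h) for h in hosts)


def interpret(local_ok: bool, remote_ok: bool) -> str:
    """Turn the two ping results into a rough suspect.

    local_ok:  a device on the SAME switch as the test machine replied
    remote_ok: a device on the OTHER switch replied
    """
    if not local_ok:
        return "No replies on this switch: suspect this switch, or the test machine's own cable/port."
    if not remote_ok:
        return "Local replies only: suspect the inter-switch cable/port, or the other switch."
    return "Both switches reply: layer 2 forwarding looks fine from here."


# Example use (hypothetical addresses, replace with your own devices):
#   same_switch = ["192.168.1.10", "192.168.1.11"]
#   other_switch = ["192.168.1.20"]
#   print(interpret(any_reply(same_switch), any_reply(other_switch)))
```

Run it during an episode, then move the test machine to the other switch and run it again with the two lists swapped.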

Quote from: Demusman on January 23, 2023, 05:34:43 PM
If you can't ping devices on the same LAN then it's got nothing to do with the router.
The router would never even see that traffic, it's layer 2 traffic only.
So you're back to the switches being bad or cables between them.

That's exactly what I thought initially, but what's the probability of this happening, after having both of them replaced? Will try your suggestion though, you never know...

Quote from: Demusman on January 23, 2023, 05:34:43 PM
Also, try from the switches to the internet, you don't say how they're connected
It's basically two switches with an Ethernet cable between them, all drop cables arriving in ports on one of the switches, and one cable between one switch and the LAN port on the router.

Come to think of it, since I cannot ping LAN devices from the router, and every device connects to the internet through the router, it would be very surprising that a ping to the internet initiated from a device connected to one of the switches would work, no? It would imply that network connectivity "device -> switch -> router" works, whereas "router -> switch -> device" does not.

Yeah, I missed the 'ping from router' part.
Do the "ping devices on the same switch" test and see where you get from there, but it's not going to be the router, so don't waste time there for now.

Yes I will do that. I had reached more or less the same conclusion, but since my network knowledge is more than rusty I thought I had overlooked something.
I will get things set up so that I have a device available to plug in the switches when it happens again. Will keep you posted. Thanks for your help so far.
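
Since the episodes come and go without warning, something like the following could be left running on the test device to record when a LAN host stops and starts answering. It is a rough sketch, not a finished tool: it assumes Linux iputils `ping` flags, and the address in the usage comment is a placeholder.

```python
import subprocess
import time
from datetime import datetime
from typing import Callable, Iterable, Iterator, Tuple


def ping(host: str) -> bool:
    """One ICMP echo; assumes Linux iputils ping (-c count, -W seconds)."""
    try:
        return subprocess.run(
            ["ping", "-c", "1", "-W", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=3,
        ).returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False


def samples(host: str, interval_s: float,
            probe: Callable[[str], bool] = ping) -> Iterator[Tuple[str, bool]]:
    """Yield (timestamp, reachable) forever, one sample per interval."""
    while True:
        yield datetime.now().isoformat(timespec="seconds"), probe(host)
        time.sleep(interval_s)


def state_changes(stream: Iterable[Tuple[str, bool]]) -> Iterator[Tuple[str, bool]]:
    """Pass through only the samples where reachability changed."""
    previous = None
    for stamp, up in stream:
        if up != previous:
            yield stamp, up
            previous = up


# Example use (hypothetical address, stop with Ctrl+C):
#   for stamp, up in state_changes(samples("192.168.1.10", 5.0)):
#       print(stamp, "UP" if up else "DOWN", flush=True)
```

Logging only the transitions keeps the output short enough to compare against the times the outages were noticed.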