Intermittent connectivity issues with LAN interface

Started by diskonekted, December 15, 2023, 08:27:17 AM

Hi all,

Fairly new to OPNsense - so this issue could just be a config error on my part, but equally I'm wondering if something funky is going on with my interface cards.

Setup is as follows:

Dell OptiPlex running OPNsense 23.7.10_1 with three interfaces: the onboard NIC is MGMT, and two 1 Gbps StarTech cards sit in the PCIe slots (one as LAN, the other as WAN).

Both LAN and MGMT share the same IP space (192.168.1.0/24, with MGMT being .200 and LAN being .249).

The WAN setup is solid - the RJ45 runs into a modem in bridge mode and OPNsense handles PPPoE flawlessly.

The issue is on the LAN side, which also serves DHCP. All clients get addresses assigned, no problem there. But occasionally the gateway (192.168.1.249) becomes unresponsive and stops passing traffic - it won't respond to ICMP, I can't raise the web UI, etc.

That is, until I plug in MGMT, at which point everything returns to normal. Not only can I hit .200 and pull up the web UI, but clients start communicating with internet resources again. The first few times this happened I thought I had some kind of routing issue and that the MGMT interface was acting as a gateway (with clients passing through it to get out the WAN), but I can disconnect the MGMT link and traffic still passes.

It's like the LAN address space is asleep and connecting another link via the MGMT port wakes it up.

I've scoured the config but can't find anything obvious - perhaps there's a setting somewhere that I've missed?

Dump from the system logs > general

Date   Severity   Process   Line
2023-12-15T18:10:14   Notice   kernel   <6>em0: link state changed to DOWN   
2023-12-15T18:10:14   Notice   opnsense   /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for opt1(em0)   
2023-12-15T18:07:22   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())   
2023-12-15T18:07:22   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure dns (execute task : dnsmasq_configure_do())   
2023-12-15T18:07:22   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure dns ()   
2023-12-15T18:07:22   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure dhcp (execute task : dhcpd_dhcp_configure())   
2023-12-15T18:07:22   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure dhcp ()   
2023-12-15T18:07:22   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure ipsec (execute task : ipsec_configure_do(,opt1))   
2023-12-15T18:07:22   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure ipsec (,opt1)   
2023-12-15T18:07:22   Notice   opnsense   /usr/local/etc/rc.linkup: ROUTING: entering configure using 'opt1'   
2023-12-15T18:07:22   Notice   kernel   <6>em0: link state changed to UP   
2023-12-15T18:07:22   Notice   opnsense   /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for opt1(em0)


All, further to the above, the logs from the DHCPv4 service are also of interest.

For reference, I'm running this service on the re1 (LAN) interface only, not on the em0 (MGMT) interface. I understand why it might warn about a possible collision, but I wouldn't have thought ERROR is the right logging level/tag.

I was considering removing the MGMT interface entirely, but it has come in handy when "resolving" this issue.

Date   Severity   Process   Line
2023-12-15T18:07:22   Error   dhcpd   Multiple interfaces match the same shared network: re1 em0   
2023-12-15T18:07:22   Error   dhcpd   Multiple interfaces match the same subnet: re1 em0   
2023-12-15T17:56:14   Error   dhcpd   Multiple interfaces match the same shared network: re1 em0   
2023-12-15T17:56:14   Error   dhcpd   Multiple interfaces match the same subnet: re1 em0   
2023-12-15T17:47:03   Error   dhcpd   Multiple interfaces match the same shared network: re1 em0   
2023-12-15T17:47:03   Error   dhcpd   Multiple interfaces match the same subnet: re1 em0   
2023-12-15T17:46:59   Error   dhcpd   Multiple interfaces match the same shared network: re1 em0   
2023-12-15T17:46:59   Error   dhcpd   Multiple interfaces match the same subnet: re1 em0   
2023-12-15T17:34:07   Error   dhcpd   Multiple interfaces match the same shared network: re1 em0   
2023-12-15T17:34:07   Error   dhcpd   Multiple interfaces match the same subnet: re1 em0   
2023-12-15T17:33:48   Error   dhcpd   Multiple interfaces match the same shared network: re1 em0   
2023-12-15T17:33:48   Error   dhcpd   Multiple interfaces match the same subnet: re1 em0   
2023-12-15T17:33:39   Error   dhcpd   Multiple interfaces match the same shared network: re1 em0   
2023-12-15T17:33:39   Error   dhcpd   Multiple interfaces match the same subnet: re1 em0   
2023-12-15T17:33:17   Error   dhcpd   Multiple interfaces match the same shared network: re1 em0   
2023-12-15T17:33:17   Error   dhcpd   Multiple interfaces match the same subnet: re1 em0   
2023-12-15T16:59:24   Error   dhcpd   icmp_echorequest 192.168.1.136: Network is down   
2023-12-15T16:56:10   Error   dhcpd   Multiple interfaces match the same shared network: re1 em0   
2023-12-15T16:56:10   Error   dhcpd   Multiple interfaces match the same subnet: re1 em0   
2023-12-15T16:55:50   Error   dhcpd   Multiple interfaces match the same shared network: re1 em0
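
From what I can tell, dhcpd compares every interface address against the subnets declared in its generated config when it starts. Because em0 (.200) and re1 (.249) both sit inside 192.168.1.0/24, both interfaces match the single subnet declaration and get merged into one shared network - hence the errors above, even though dhcpd is only told to listen on re1. Roughly like this (reconstructed from memory; the chroot path and the pool range are my best guess, not a capture from my box):

# path and pool range below are illustrative, not copied from my system
$ grep -A5 'subnet 192.168.1.0' /var/dhcpd/etc/dhcpd.conf
subnet 192.168.1.0 netmask 255.255.255.0 {
        pool {
                range 192.168.1.100 192.168.1.199;
        }
        option routers 192.168.1.249;
}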

Furthermore, this seems to be a fairly common problem - I've noticed a few other forum posts describing it, which leads me to think it stems from a default/out-of-the-box config somewhere.

You cannot have two interfaces with different IP addresses in the same network. That explains all your problems.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote from: Patrick M. Hausen on December 15, 2023, 11:53:51 AM
You cannot have two interfaces with different IP addresses in the same network. That explains all your problems.

Why is this? Please elaborate.

While it isn't present in my config/setup right now, you can have bonded/grouped interfaces with one network spanning multiple NICs.

I understand that the way I've configured my setup is non-standard, which may lead to some unexpected problems. I wouldn't have thought this would be the culprit, but I guess such is the nature of unexpected problems!  ;D

The original plan (and current implementation) was to have only the web UI and management functions served off the MGMT interface, with other equipment in my network handling the VLAN separation. As mentioned, I was thinking of removing this interface altogether for simplicity - but for now I've made a slight change to the power configuration (it was on 'hiadaptive', now set to 'maximum') and will soak test for a few days.
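
With powerd set to 'maximum' the CPU should sit at its top frequency, which is easy to eyeball from the shell. A quick sanity check (the sysctls are real, the values below are made up for illustration):

# confirm the new power mode took effect - frequencies shown are illustrative
$ sysctl dev.cpu.0.freq dev.cpu.0.freq_levels
dev.cpu.0.freq: 3300
dev.cpu.0.freq_levels: 3300/35000 3100/31000 2800/27000 800/6000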

If the problem returns, I'll swap MGMT to another address range and test again.

When physical interfaces are bonded or bridged, they form one logical interface that simply extends over more than one physical link. What you are trying to do is run two logical interfaces with the same subnet - that happens on a different OSI layer.

There must be routing in place between logical interfaces - how could that work if both have the same IP range?

This is basic networking knowledge. It won't work the way you tried it. Period.
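
To make this concrete, here is roughly what the routing table ends up looking like with such an overlap (illustrative output from memory, not from your box):

# the kernel keeps a single connected route for 192.168.1.0/24, tied to one of
# the two interfaces; everything the firewall sends into that subnet (DHCP
# replies, ARP, ping responses) leaves via that one NIC, and a link flap on
# either card can move the route - which would explain why plugging MGMT in
# appears to "wake up" the LAN
$ netstat -rn -f inet | grep '192\.168\.1'
192.168.1.0/24     link#2             U          em0
192.168.1.200      link#2             UHS        lo0
192.168.1.249      link#3             UHS        lo0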

Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 770 up, Bufferbloat A

Quote from: meyergru on December 15, 2023, 12:30:50 PM
When physical interfaces are bonded or bridged, they form one logical interface that simply extends over more than one physical link. What you are trying to do is run two logical interfaces with the same subnet - that happens on a different OSI layer.

There must be routing in place between logical interfaces - how could that work if both have the same IP range?

Static routes with priority weighting.

Quote
This is basic networking knowledge. It won't work the way you tried it. Period.

It does in fact work.

Both when one interface is connected and when both are.

The issue is that after multiple days one interface becomes unresponsive; plugging in the other then wakes up the former. So I'll concede that perhaps the config is causing some kind of issue with the routing engine that takes several days to manifest.
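
If it does lock up again before I re-address MGMT, I'll grab the state from the console before plugging MGMT back in - something along these lines (planned commands only, nothing captured yet; the client address is just the one that appears in the dhcpd log):

# which NIC currently owns the connected route for the LAN subnet?
netstat -rn -f inet | grep '192\.168\.1\.0'
# do re1 and em0 both still report link up and carry their addresses?
ifconfig re1 ; ifconfig em0
# what does the firewall's own ARP table look like for the LAN?
arp -an | grep '192\.168\.1\.'
# can the firewall itself reach a LAN client while the GUI is unreachable?
ping -c 3 -S 192.168.1.249 192.168.1.136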

Thanks for the response.