Interfaces randomly go down/unroutable

Started by n1nja, November 26, 2021, 11:12:38 PM

I have experienced a similar issue to some others here: the LAN interface is not reachable. It happened with Suricata enabled on the DMZ and LAN interfaces. I was able to reach the DMZ side, but not LAN. Turning off Suricata has kept my firewall working.

Although the status here is inconclusive, we're walking back the new Netmap API for 21.7 to see whether this is actually the issue.

https://github.com/opnsense/tools/commit/43679e8b1894

Expect this with 21.7.7 later this week.


Cheers,
Franco

Also having random issues with routing to WAN interface. Restarting all services using SSH console restores routing. Issue solved after disabling IDS/IPS and Suricata. Only disabling Suricata did not solve the issue. IDS/IPS was active on WAN and Suricata was active on all LAN/VLANs.

Quote from: UdK on December 13, 2021, 05:21:29 PM
Also having random issues with routing to WAN interface. Restarting all services using SSH console restores routing. Issue solved after disabling IDS/IPS and Suricata. Only disabling Suricata did not solve the issue. IDS/IPS was active on WAN and Suricata was active on all LAN/VLANs.

Hmmm, IDS/IPS actually is Suricata. What else have you disabled?
kind regards
chemlud
____
"The price of reliability is the pursuit of the utmost simplicity."
C.A.R. Hoare

Felix Eichhorn's premium cat food with the extra portion of energy

A router is not a switch - A router is not a switch - A router is not a switch - A rou....

It seems the actual issue is that non-physical interfaces are enabled in Suricata IPS mode. With the new libnetmap/Suricata combination, these previously defunct setups (IPS doesn't work but traffic goes through) become partially working ones (IPS is enabled on these interfaces in emulation mode, eventually causing traffic drops due to partial support).

We will revert the behaviour for 21.7.x, but in 22.1 and up we would ask users to take more care in verifying their setups beforehand.

None of this was ever an issue on using IPS mode for physical interfaces or VLAN parents in promiscuous mode when VLAN scanning is necessary.
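
To make the distinction between physical and non-physical interfaces concrete, here is a minimal Python sketch that flags entries in an IPS interface selection that do not look like plain physical NICs. The naming patterns are assumptions: only `igb1`, `igb2` and `igb2_vlan2` appear in this thread, and the prefix list (`pppoe`, `lagg`, and so on) is a heuristic guess, not something OPNsense or Suricata exposes.

```python
import re

# Heuristic only: "<driver><unit>" such as igb1 is treated as a physical NIC.
PHYSICAL_NAME = re.compile(r"[a-z][a-z0-9]*[0-9]")

# Assumed prefixes of software (non-physical) devices; extend as needed.
NON_PHYSICAL_PREFIXES = ("pppoe", "vlan", "lagg", "bridge", "tun", "tap", "lo")


def is_physical(ifname: str) -> bool:
    """Return True if the name looks like a plain physical NIC."""
    if not PHYSICAL_NAME.fullmatch(ifname):
        return False  # e.g. igb2_vlan2 contains an underscore
    return not ifname.startswith(NON_PHYSICAL_PREFIXES)


def risky_ips_interfaces(selected):
    """Return the selected interfaces that are probably not physical."""
    return [name for name in selected if not is_physical(name)]


if __name__ == "__main__":
    print(risky_ips_interfaces(["igb1", "igb2", "igb2_vlan2", "pppoe0"]))
```

Anything this returns would be a candidate to drop from IPS mode, or to replace with its physical parent in promiscuous mode, as described above.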


Cheers,
Franco

I don't have VLANs.
kind regards
chemlud


Sorry to ask: I only have my two LAN interfaces in IDS and was getting problems until I disabled it. The other day I was thinking of adding my WAN, which is PPPoE. What's the best way to do this, or do I have to stick with just the interfaces I already have?

Quote from: franco on December 14, 2021, 09:57:42 AM
It seems the actual issue is that non-physical interfaces are enabled in Suricata IPS mode. With the new libnetmap/Suricata combination, these previously defunct setups (IPS doesn't work but traffic goes through) become partially working ones (IPS is enabled on these interfaces in emulation mode, eventually causing traffic drops due to partial support).

We will revert the behaviour for 21.7.x, but in 22.1 and up we would ask users to take more care in verifying their setups beforehand.

None of this was ever an issue on using IPS mode for physical interfaces or VLAN parents in promiscuous mode when VLAN scanning is necessary.


Cheers,
Franco

I have the following set-up:
WAN PPPoE
LAN running several VLANs
Suricata running on WAN and LAN, with promiscuous mode

Some weeks (months?) ago I had the issue that IPS was running and traffic was working, but I did not get any alerts (it was around the time the policy/rules set-up was changed in IPS). This resolved itself at some point (I can't remember whether through an updated FW or some change in my config). It then ran OK until the recent update.
I think I run IPS only on the physical interfaces, not on the VLAN itself. So what should I change or what is wrong in my set-up?


By now my motivation to provide community support for relentless setup issues regarding IPS is almost zero, sorry.

I suggest switching to IDS or finding an expert who can spend the time to look at the setup and give a recommendation on how to solve the identified issue reasonably.


Cheers,
Franco

Franco, I fully understand that you cannot resolve my issue ad-hoc and for free.
Just from your sentence
"We will revert the behaviour for 21.7.x, but in 22.1 and up we would ask users to take more care in verifying their setups beforehand."
I understood that - if things got broken in 21.7.6 - it has to do with a faulty set-up, which I cannot see from my settings, as I used the recommended settings (IPS on physical devices). But I confess I'm not much of an expert and quite new to OPNsense...
But many thanks anyway for taking up the topic and submitting a fix / change. Appreciated!

It might not be that particular issue. I think in any version update, even the ones that have no relevant changes, IPS comes up as "unreliable". This can be a configuration issue, could be hardware related (actual chipsets, RAM considerations, CPU power, driver quality), could be software related (mainly netmap/iflib framework in FreeBSD or Suricata itself), or could be down to traffic spikes encountered.

There are some easy things to check: NIC driver name? "dmesg" output on the console? Is the problem the same as in the previous major version, or is it actually a minor version issue? Try to see whether IDS works fine, or reduce the number of interfaces inspected (most of the time LAN or DMZ is the one you need, not more).
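
For the first check, the NIC driver name can usually be read straight off the interface name, since FreeBSD names devices as driver plus unit number. A small Python sketch under that assumption follows; the `_vlan` suffix handling is based only on the `igb2_vlan2` style name seen in this thread, and `em0` in the example is a hypothetical interface.

```python
import re
from collections import defaultdict


def driver_of(ifname: str) -> str:
    """Strip an optional VLAN suffix and the unit number: igb2_vlan3 -> igb."""
    parent = ifname.split("_", 1)[0]
    return re.sub(r"\d+$", "", parent)


def group_by_driver(ifnames):
    """Group interface names by their (assumed) driver name."""
    groups = defaultdict(list)
    for name in ifnames:
        groups[driver_of(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_driver(["igb1", "igb2_vlan2", "em0"]))
```

Grouping the inspected interfaces this way makes it easier to state in a report which driver is involved.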

This is just the tip of the iceberg. Half an hour of dedicated support is not uncommon here just to make sure we have a clear picture.


Cheers,
Franco

December 14, 2021, 07:58:28 PM #42 Last Edit: May 26, 2022, 12:27:08 AM by allan
I have been chasing after this since upgrading to 21.7.6. No issues with 21.7.5 running the same configuration.

My hardware specs:

  • Qotom Q355G4
  • Intel Core i5-5300U
  • 8GB memory - 31% used
  • Load avg: 0.27, 0.19, 0.16
  • igb NIC driver
  • igb1 - LAN
  • igb2_vlan2 - IoT VLAN
  • igb2_vlan3 - Guest VLAN
  • Traffic shaping is configured
  • On a 1Gb/35Mb circuit

Symptom:

  • Issue comes up around every 2 days
  • DHCP clients on igb1 do not receive IP address - shows 169.254.0.0/16 APIPA
  • tcpdump on igb1 shows repeated DHCPDISCOVER and DHCPOFFER packets from/to the same client MAC
  • /usr/local/etc/rc.d/suricata stop and start fixes the problem with no other change
  • nothing in dmesg stands out. There are several lines of iflib_netmap_config but I read that is normal.
  • Suricata log shows rules update via cron, then no entries until service restart.
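
The DHCP symptom above (repeated DISCOVER/OFFER with no lease) can be checked mechanically once the message types have been pulled out of a capture. This is a minimal Python sketch; the `(mac, message_type)` input format, the threshold, and the MAC addresses in the example are all hypothetical, and extracting the events from tcpdump output is left out.

```python
from collections import defaultdict


def stuck_clients(events, min_discovers=3):
    """Return MACs that keep sending DHCPDISCOVER but never get a DHCPACK.

    events: iterable of (client_mac, dhcp_message_type) in capture order.
    """
    discovers = defaultdict(int)
    acked = set()
    for mac, msg in events:
        if msg == "DHCPDISCOVER":
            discovers[mac] += 1
        elif msg == "DHCPACK":
            acked.add(mac)
    return sorted(
        mac for mac, count in discovers.items()
        if count >= min_discovers and mac not in acked
    )


if __name__ == "__main__":
    sample = [
        ("aa:bb:cc:00:00:01", "DHCPDISCOVER"),
        ("aa:bb:cc:00:00:01", "DHCPOFFER"),
        ("aa:bb:cc:00:00:01", "DHCPDISCOVER"),
        ("aa:bb:cc:00:00:01", "DHCPOFFER"),
        ("aa:bb:cc:00:00:01", "DHCPDISCOVER"),
        ("aa:bb:cc:00:00:02", "DHCPDISCOVER"),
        ("aa:bb:cc:00:00:02", "DHCPACK"),
    ]
    print(stuck_clients(sample))
```

A client that shows up here while dhcpd logs offers for it matches the pattern of replies being lost between the server and the wire.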

Settings:

  • Enabled, IPS mode, and Promiscuous mode - CHECKED
  • Syslog alerts, Eve syslog output, and Log package payload - UNCHECKED
  • Pattern matcher - Hyperscan
  • Detect profile - Medium
  • Interfaces - LAN, [igb2]
  • Default packet size - BLANK

Troubleshooting attempts:

  • Completely disable IDS - ran for 1 week with no issues
  • Deleted all policies and disabled all rules except for opnsense.test.rules - still issues

My next step is to set Interfaces to just LAN and see if the situation improves. This system is in production so I can only gather info and quickly restart Suricata. I am not looking for or expecting support, just submitting another data point.

2021-12-15 1700Z update - still ran into issues after removing [igb2] from Interfaces. I also noticed this affects DHCPREQUEST and DHCPACK as well. Running Wireshark on a client on igb1, I see REQUESTs go out but no ACKs received, even though dhcpd logged sending these ACKs. Another observation: an IoT device in igb2_vlan2 was having issues (2 clients on igb1 flagged it offline) even after I removed [igb2] from the Suricata interface list. The device was immediately flagged online after the Suricata restart. I will upgrade to 21.7.7 and put [igb2] back in.

In case this is related, here are my non-default tunables. I also have Hardware CRC, TSO, LRO, and VLAN Hardware Filtering all DISABLED.

  • hw.ibrs_disable = 1
  • hw.igb.rx_process_limit = -1
  • hw.igb.tx_process_limit = -1
  • legal.intel_igb.license_ack = 1
  • net.inet.icmp.drop_redirect = 1
  • vm.pmap.pti = 0

I also noticed differences in the Suricata log. This may be related to the netmap change, but I want to put it out there:

21.7.6 - 03:18 is the ids rule update cron

2021-12-15T10:39:36 suricata[60953] [100520] <Notice> -- all 4 packet processing threads, 4 management threads initialized, engine started.
2021-12-15T10:39:34 suricata[58549] [100680] <Notice> -- This is Suricata version 6.0.4 RELEASE running in SYSTEM mode
2021-12-15T10:39:33 suricata[36929] [100447] <Notice> -- Signal Received. Stopping engine.
2021-12-15T03:18:01 suricata[36929] [100447] <Notice> -- rule reload complete
2021-12-15T03:18:01 suricata[36929] [100447] <Notice> -- rule reload starting
2021-12-14T23:24:18 suricata[36929] [100447] <Notice> -- all 4 packet processing threads, 4 management threads initialized, engine started.
2021-12-14T23:24:16 suricata[27169] [100448] <Notice> -- This is Suricata version 6.0.4 RELEASE running in SYSTEM mode
2021-12-14T23:24:15 suricata[88364] [100551] <Notice> -- Signal Received. Stopping engine.
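
To spot the "rule update, then silence until restart" pattern in a log like the one above, a small Python sketch can parse the lines and report long quiet periods. The regular expression is derived only from the lines shown here, the 6 hour threshold is an arbitrary assumption, and the input is assumed to be newest-first as in the log view. A gap by itself does not prove a stall, since Suricata logs little at notice level.

```python
import re
from datetime import datetime, timedelta

LINE = re.compile(
    r"^(?P<ts>\S+) suricata\[(?P<pid>\d+)\] \[(?P<tid>\d+)\] "
    r"<(?P<level>\w+)> -- (?P<msg>.*)$"
)


def parse(line):
    """Return (timestamp, pid, message) or None if the line does not match."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    return datetime.fromisoformat(m["ts"]), int(m["pid"]), m["msg"]


def silent_gaps(lines, threshold=timedelta(hours=6)):
    """Return (last_seen, next_seen, last_message) for each long quiet period.

    lines are assumed newest-first; they are reversed and stable-sorted by
    timestamp so entries sharing a second keep their original order.
    """
    entries = [e for e in map(parse, reversed(list(lines))) if e is not None]
    entries.sort(key=lambda e: e[0])
    gaps = []
    for (t0, _, msg0), (t1, _, _) in zip(entries, entries[1:]):
        if t1 - t0 >= threshold:
            gaps.append((t0, t1, msg0))
    return gaps
```

Run against the 21.7.6 excerpt, this reports a single quiet period that starts at the 03:18 rule reload and ends at the manual restart.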


21.7.7 - [igb2] added back to interface list

2021-12-15T11:28:24 suricata[29857] [100447] <Notice> -- all 4 packet processing threads, 4 management threads initialized, engine started.
2021-12-15T11:28:24 suricata[29857] [100888] <Notice> -- opened netmap:igb2/T from igb2: 0x8c35aa2300
2021-12-15T11:28:24 suricata[29857] [100888] <Notice> -- opened netmap:igb2^ from igb2^: 0x8c35aa2000
2021-12-15T11:28:23 suricata[29857] [100879] <Notice> -- opened netmap:igb2^ from igb2^: 0x8c0b5f7300
2021-12-15T11:28:23 suricata[29857] [100879] <Notice> -- opened netmap:igb2/R from igb2: 0x8c0b5f7000
2021-12-15T11:28:23 suricata[29857] [100878] <Notice> -- opened netmap:igb1/T from igb1: 0x8bcbdfd300
2021-12-15T11:28:23 suricata[29857] [100878] <Notice> -- opened netmap:igb1^ from igb1^: 0x8bcbdfd000
2021-12-15T11:28:22 suricata[29857] [100869] <Notice> -- opened netmap:igb1^ from igb1^: 0x8bb6ca3300
2021-12-15T11:28:22 suricata[29857] [100869] <Notice> -- opened netmap:igb1/R from igb1: 0x8bb6ca3000
2021-12-15T11:28:22 suricata[25946] [100432] <Notice> -- This is Suricata version 6.0.4 RELEASE running in SYSTEM mode
2021-12-15T11:28:21 suricata[92259] [100589] <Notice> -- Stats for 'igb1^': pkts: 51783, drop: 0 (0.00%), invalid chksum: 0
2021-12-15T11:28:21 suricata[92259] [100589] <Notice> -- Stats for 'igb1': pkts: 12554, drop: 0 (0.00%), invalid chksum: 0
2021-12-15T11:28:21 suricata[92259] [100589] <Notice> -- Signal Received. Stopping engine.
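
The 21.7.7 excerpt adds two kinds of lines that are useful to extract: the netmap ports Suricata opened and the per-port drop counters written at shutdown. The Python sketch below pulls both out; the patterns are inferred from the lines above only. As far as I understand netmap port naming, a trailing `^` refers to the host stack rings and `/R` and `/T` restrict a port to receive or transmit rings, but treat that reading as an assumption.

```python
import re

OPENED = re.compile(r"opened netmap:(?P<port>\S+) from")
STATS = re.compile(
    r"Stats for '(?P<port>[^']+)':\s+pkts: (?P<pkts>\d+), drop: (?P<drop>\d+)"
)


def netmap_ports(lines):
    """Return the sorted set of netmap port names Suricata reported opening."""
    ports = set()
    for line in lines:
        m = OPENED.search(line)
        if m:
            ports.add(m["port"])
    return sorted(ports)


def drop_stats(lines):
    """Return {port: (packets, drops)} from the shutdown statistics lines."""
    stats = {}
    for line in lines:
        m = STATS.search(line)
        if m:
            stats[m["port"]] = (int(m["pkts"]), int(m["drop"]))
    return stats
```

Comparing the opened ports with the interfaces selected in the GUI is a quick way to confirm which devices are actually in netmap mode.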


2021-12-22 1700Z update - it has been one week, and all has been working and stable since the upgrade to 21.7.7. No configuration changes were made during that time and Suricata logs only show rule updates. Although others still report issues, disabling netmap v14 fixed mine. I am going to start reenabling rules but I do not expect any more issues.

2022-05-25 2200Z update - to anyone still running into issues or holding back on upgrading, the solution I found is to set "VLAN Hardware Filtering" under Interfaces > Settings > Network Interfaces section to "Leave default". I ran several versions of 22.1.x without issues - currently on 22.1.6. This might be the fix for you as well.

Yes, my issue is more or less the same as Allan's. I also doubt it is a hardware issue: it's two Intel NICs in an x86 platform, supported without any extras by the stock firmware.