20.7.4 - WAN interface blown away when suricata active...

Started by chemlud, October 30, 2020, 11:17:07 AM

@franco

Quote from: chemlud on November 01, 2020, 01:51:02 PM
Oops! I thought it was "base" under System -> Firmware -> Packages, but it's "kernel"! And "base" is 20.7.4, while "kernel" is 20.7.3.

...discussion moved on quite a bit since the post you quoted ;-)
kind regards
chemlud
____
"The price of reliability is the pursuit of the utmost simplicity."
C.A.R. Hoare

felix eichhorn's premium cat food with the extra portion of energy

A router is not a switch - A router is not a switch - A router is not a switch - A rou....

I updated my third Dell box with 5 NICs to 20.7.4 and enabled IPS on WAN and LANs (with various rule sets set to block). Stable for the moment, but the differences are: WAN has a private IP (no chance to connect directly to the ISP during daytime), and there is virtually no traffic going back and forth, as there is only one client on the LAN...

kind regards
chemlud

Transferred the config to the newer Dell Optiplex 7020, and IPS can be started there without breaking WAN. Interesting. Both older Optiplex 790s had the error on boot described earlier:

https://forum.opnsense.org/index.php?topic=19377.msg89368#msg89368

and both recently started to lose the WAN interface with IPS enabled, one on 20.7.3 (after running fine for weeks on that version), the other after updating to 20.7.4.

This makes hardly any sense to me, but it seems hardware-related.

And: No, it's not the RAM. I moved the RAM from the 790 to the 7020, and it works just fine in that machine...
And: All machines are on the latest available BIOS version.

How do I get IPS back on the second OPNsense with the Optiplex 790? Do I have to buy a new machine? :-o
kind regards
chemlud

OK, fresh install of 20.7 on a 790 (the last fresh install was with 20.1...) and import of the config. No error message on boot, and IPS doing fine for more than an hour.

kind regards
chemlud

Quote from: chemlud on November 05, 2020, 09:23:05 PM
OK, fresh install of 20.7 on a 790 (the last fresh install was with 20.1...) and import of the config. No error message on boot, and IPS doing fine for more than an hour.

Thx, please keep us updated.

No idea why I didn't see this yesterday, but on the 7020 with the fresh install the error came back:

2020-11-04T16:59:29 opnsense[30930] /usr/local/etc/rc.filter_configure: There were error(s) loading the rules: /tmp/rules.debug:186: no routing address with matching address family found. - The line in question reads [186]: pass out route-to ( em2 aaa.bbb.ccc.ddd ) from {em2} to {!(em2:network)} keep state allow-opts label "470b24148e83cbf020300f9a54691951" # let out anything from firewall host itself (force gw)


....


2020-11-04T16:59:17 opnsense[98631] /usr/local/etc/rc.linkup: Hotplug event detected for xxxxxx(opt3) but ignoring since interface is configured with static IP (aaa.bbb.ccc.ddd ::)
2020-11-04T16:59:17 kernel em3: link state changed to UP
2020-11-04T16:59:16 opnsense[52609] /usr/local/etc/rc.linkup: Hotplug event detected for yyyyyyyy(opt2) but ignoring since interface is configured with static IP (vvv.bbb.nnn.mmm ::)
2020-11-04T16:59:16 kernel em0: link state changed to DOWN
2020-11-04T16:59:16 opnsense[25138] /usr/local/etc/rc.linkup: Hotplug event detected for LAN2(opt1) but ignoring since interface is configured with static IP (fff.ddd.sss.aaa ::)
2020-11-04T16:59:15 kernel em4: link state changed to DOWN
2020-11-04T16:59:15 opnsense[96867] /usr/local/etc/rc.linkup: Hotplug event detected for LAN(lan) but ignoring since interface is configured with static IP (qqq.www.eee.rrr ::)
2020-11-04T16:59:15 kernel em1: link state changed to DOWN
2020-11-04T16:59:14 opnsense[80384] /usr/local/etc/rc.linkup: Hotplug event detected for xxxxx(opt3) but ignoring since interface is configured with static IP (aaa.bbb.ccc.ddd ::)
2020-11-04T16:59:14 kernel em3: link state changed to DOWN


The log session starts with all interfaces (including WAN (em2), which is the built-in NIC of the Optiplex board) going down a few minutes after a reboot. Six seconds later the kernel reports WAN as up again, and a new WAN IP is requested from the ISP via DHCP.

THEN (!) the "DEVD: Ethernet detached event for wan" error message occurs, followed by the "no routing address with matching address family found" error message.

But in the seconds thereafter the interfaces go down again. Some 5-6 minutes later all interfaces go down again, and some 10 minutes later once more, but from then on (without any intervention by the admin) the interfaces are stable.

Does this make sense at all?
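To read an excerpt like the one above as a timeline, a minimal Python sketch could help (this is not an OPNsense tool; `link_events`, `bursts` and the 30-second gap are hypothetical choices for illustration). It assumes the GUI log format shown here: one `YYYY-MM-DDTHH:MM:SS` timestamp per line, newest entry first. It reorders the kernel link-state lines chronologically and groups them into bursts, so repeated flaps several minutes apart show up as separate groups.

```python
import re
from datetime import datetime, timedelta

# Kernel link-state lines in the format of the excerpt above, e.g.
# "2020-11-04T16:59:14 kernel em3: link state changed to DOWN".
LINK_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+kernel\s+"
    r"(\w+): link state changed to (UP|DOWN)\s*$"
)

# Kernel lines copied from the 2020-11-04 excerpt (newest first, as in the GUI).
LOG = """\
2020-11-04T16:59:17 kernel em3: link state changed to UP
2020-11-04T16:59:16 kernel em0: link state changed to DOWN
2020-11-04T16:59:15 kernel em4: link state changed to DOWN
2020-11-04T16:59:15 kernel em1: link state changed to DOWN
2020-11-04T16:59:14 kernel em3: link state changed to DOWN
""".splitlines()


def link_events(lines):
    """Return (timestamp, interface, state) tuples, oldest first."""
    events = []
    # The log view is newest-first; reversing keeps same-second entries in
    # their real order, and the stable sort below handles the rest.
    for line in reversed(lines):
        m = LINK_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m[1], "%Y-%m-%dT%H:%M:%S")
            events.append((ts, m[2], m[3]))
    events.sort(key=lambda ev: ev[0])
    return events


def bursts(events, gap=timedelta(seconds=30)):
    """Group events separated by at most `gap` of quiet time (arbitrary default)."""
    groups = []
    for ev in events:
        if groups and ev[0] - groups[-1][-1][0] <= gap:
            groups[-1].append(ev)
        else:
            groups.append([ev])
    return groups


if __name__ == "__main__":
    for burst in bursts(link_events(LOG)):
        first, last = burst[0][0], burst[-1][0]
        changes = ", ".join(f"{ifname} {state}" for _, ifname, state in burst)
        print(f"{first:%H:%M:%S}-{last:%H:%M:%S}: {changes}")
```

On these lines it prints a single burst, `16:59:14-16:59:17: em3 DOWN, em1 DOWN, em4 DOWN, em0 DOWN, em3 UP`. Fed the full log of that boot, any later flaps should appear as separate bursts.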

IPS with 20.7.4 stable for the moment.
kind regards
chemlud

Just to add: after the first reboot, the second OPNsense also shows the same error in the Dashboard:

2020-11-11T19:44:27 opnsense[59392] /usr/local/etc/rc.filter_configure: There were error(s) loading the rules: /tmp/rules.debug:207: no routing address with matching address family found. - The line in question reads [207]: pass out route-to ( em2 wan.add.res.1 ) from {em2} to {!(em2:network)} keep state allow-opts label "470b24148e83cbf020300f9a54691951" # let out anything from firewall host itself (force gw)
kind regards
chemlud


Nope, one install has a Cisco cable modem in bridge mode, the other one a fiber converter. In both cases the DHCP server is on the ISP's network (although at least one ISP uses a private subnet IP for DHCP, as far as I can see from the system logs), which is apparently not (!) blocked by "Block private networks" on the WAN interface.

Both installs have IPv6 disabled wherever I can find it. But sometimes when I test DNS (with DNS test websites), the results even include an IPv6 address for one of the configured DNS (DoT) servers. I have a feeling that the OPNsense services themselves (such as Unbound) might be using IPv6 even when it is generally disabled ("no routing address with matching address family found")...
kind regards
chemlud

PS: To be 100% clear: the topic of this thread (WAN blown away in Suricata IPS mode) is resolved by reinstalling OPNsense 20.7, but the other issue (error on reboot) is still present, starting from the SECOND reboot after re-importing config.xml on BOTH OPNsense installs.

On rebooting everything looks normal at first:

2020-11-11T19:43:07 kernel OK
2020-11-11T19:43:07 kernel pflog0: promiscuous mode enabled
2020-11-11T19:43:07 kernel pflog0: promiscuous mode disabled
2020-11-11T19:43:07 kernel
2020-11-11T19:43:07 kernel done.
2020-11-11T19:43:07 kernel ...
2020-11-11T19:43:07 /flowd_aggregate.py[36005] vacuum done
2020-11-11T19:43:07 /flowd_aggregate.py[36005] start watching flowd
2020-11-11T19:43:06 /flowd_aggregate.py[36005] startup, check database.


But after 1 min. and 7 seconds all interfaces go down and then up again:

2020-11-11T19:44:28 opnsense[24025] /usr/local/etc/rc.newwanip: IPv4 renewal is starting on 'ovpns5'
2020-11-11T19:44:27 opnsense[54704] /usr/local/etc/rc.newwanip: OpenVPN server 5 instance started on PID 29501.
2020-11-11T19:44:27 kernel ovpns5: link state changed to UP
2020-11-11T19:44:27 opnsense[59392]
2020-11-11T19:44:27 opnsense[59392] /usr/local/etc/rc.filter_configure: There were error(s) loading the rules: /tmp/rules.debug:207: no routing address with matching address family found. - The line in question reads [207]: pass out route-to ( em2 fff.ggg.hhh.jjj ) from {em2} to {!(em2:network)} keep state allow-opts label "470b24148e83cbf020300f9a54691951" # let out anything from firewall host itself (force gw)
2020-11-11T19:44:27 opnsense[94696] /usr/local/etc/rc.linkup: Clearing states for stale wan route on em2
2020-11-11T19:44:27 opnsense[94696] /usr/local/etc/rc.linkup: DEVD Ethernet detached event for wan
2020-11-11T19:44:27 opnsense[74449] plugins_configure newwanip (execute task : webgui_configure_do(,opt2))
2020-11-11T19:44:26 opnsense[74449] plugins_configure newwanip (execute task : vxlan_configure_interface())
2020-11-11T19:44:26 kernel ovpns5: link state changed to DOWN
2020-11-11T19:44:26 opnsense[54704] /usr/local/etc/rc.newwanip: Resyncing OpenVPN instances for interface WAN.
2020-11-11T19:44:26 kernel pflog0: promiscuous mode enabled
2020-11-11T19:44:26 kernel pflog0: promiscuous mode disabled
...
2020-11-11T19:44:19 kernel em1: link state changed to UP
2020-11-11T19:44:18 kernel em0: link state changed to UP
2020-11-11T19:44:18 kernel em2: link state changed to DOWN
2020-11-11T19:44:17 opnsense[74449] /usr/local/etc/rc.newwanip: On (IP address: aaa.bbb.ccc.ddd) (interface: LAN2[opt2]) (real interface: em4).
2020-11-11T19:44:17 opnsense[74449] /usr/local/etc/rc.newwanip: IPv4 renewal is starting on 'em4'
2020-11-11T19:44:17 opnsense[93354] /usr/local/etc/rc.linkup: Hotplug event detected for LAN3(opt2) but ignoring since interface is configured with static IP (ddd.eee.fff.ggg ::)
2020-11-11T19:44:17 kernel em4: link state changed to UP
2020-11-11T19:44:16 opnsense[50217] /usr/local/etc/rc.linkup: Hotplug event detected for LAN(lan) but ignoring since interface is configured with static IP (bbb.aaa.sss.ddd ::)
2020-11-11T19:44:16 kernel em1: link state changed to DOWN
2020-11-11T19:44:15 opnsense[82468] /usr/local/etc/rc.linkup: Hotplug event detected for LAN2(opt1) but ignoring since interface is configured with static IP (aaa.ccc.vvv.bbb ::)
2020-11-11T19:44:15 kernel em0: link state changed to DOWN
2020-11-11T19:44:14 opnsense[4008] /usr/local/etc/rc.linkup: Hotplug event detected for LAN3(opt2) but ignoring since interface is configured with static IP (ccc.qqq.www.eee ::)
2020-11-11T19:44:14 kernel em4: link state changed to DOWN


I skipped some lines with messages saying that aliases could not be resolved (as WAN is down...)

And: Yes, the line AFTER the message, starting with "kernel", is REALLY empty in the system log.
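To put numbers on the "down and then up again" above, a small Python sketch can pair each DOWN with the next UP on the same interface. The helper `outages` is hypothetical, and it assumes the newest-first order of the GUI log view.

```python
import re
from datetime import datetime

# e.g. "2020-11-11T19:44:14 kernel em4: link state changed to DOWN"
LINK_RE = re.compile(r"^(\S+)\s+kernel\s+(\w+): link state changed to (UP|DOWN)\s*$")

# Kernel lines copied from the 2020-11-11 excerpt above (newest first, as in the GUI).
LOG = """\
2020-11-11T19:44:19 kernel em1: link state changed to UP
2020-11-11T19:44:18 kernel em0: link state changed to UP
2020-11-11T19:44:18 kernel em2: link state changed to DOWN
2020-11-11T19:44:17 kernel em4: link state changed to UP
2020-11-11T19:44:16 kernel em1: link state changed to DOWN
2020-11-11T19:44:15 kernel em0: link state changed to DOWN
2020-11-11T19:44:14 kernel em4: link state changed to DOWN
""".splitlines()


def outages(lines):
    """Map each interface to a list of outage lengths in seconds.

    Each DOWN is paired with the next UP on the same interface; a DOWN with
    no later UP in the given lines is reported as None.
    """
    down_since = {}
    result = {}
    for line in reversed(lines):  # the log view lists newest entries first
        m = LINK_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m[1], "%Y-%m-%dT%H:%M:%S")
        ifname, state = m[2], m[3]
        if state == "DOWN":
            down_since.setdefault(ifname, ts)  # keep the first DOWN of a run
        elif ifname in down_since:
            start = down_since.pop(ifname)
            result.setdefault(ifname, []).append((ts - start).total_seconds())
    for ifname in down_since:  # DOWN without a matching UP in these lines
        result.setdefault(ifname, []).append(None)
    return result


if __name__ == "__main__":
    print(outages(LOG))
```

On these lines it reports 3 s each for em4, em0 and em1, and None for em2, whose UP is not among the lines shown.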
kind regards
chemlud


Now: Yes
Before reinstall: No.
kind regards
chemlud


What happens about 1 min after the reboot has completed that reproducibly kills off all physical interfaces? That makes no sense to me at all...
kind regards
chemlud

Hmmm:

Suricata starts something with netmap!

2020-11-11T19:44:23 suricata[11099] [100109] <Notice> -- all 8 packet processing threads, 4 management threads initialized, engine started.
2020-11-11T19:44:22 suricata[11099] [100325] <Notice> -- opened netmap:em2/T from em2: 0x3d4a9fcf300
2020-11-11T19:44:21 suricata[11099] [100325] <Notice> -- opened netmap:em2^ from em2^: 0x3d4a9fcf000
2020-11-11T19:44:19 suricata[11099] [100316] <Notice> -- opened netmap:em2^ from em2^: 0x3d4a84c5300
2020-11-11T19:44:18 suricata[11099] [100316] <Notice> -- opened netmap:em2/R from em2: 0x3d4a84c5000
2020-11-11T19:44:16 suricata[11099] [100315] <Notice> -- opened netmap:em1/T from em1: 0x3d4a7400300
2020-11-11T19:44:16 suricata[11099] [100315] <Notice> -- opened netmap:em1^ from em1^: 0x3d4a7400000
2020-11-11T19:44:16 suricata[11099] [100307] <Notice> -- opened netmap:em1^ from em1^: 0x3d4a6e18300
2020-11-11T19:44:16 suricata[11099] [100307] <Notice> -- opened netmap:em1/R from em1: 0x3d4a6e18000
2020-11-11T19:44:15 suricata[11099] [100306] <Notice> -- opened netmap:em0/T from em0: 0x3d4a5ec9300
2020-11-11T19:44:15 suricata[11099] [100306] <Notice> -- opened netmap:em0^ from em0^: 0x3d4a5ec9000
2020-11-11T19:44:15 suricata[11099] [100298] <Notice> -- opened netmap:em0^ from em0^: 0x3d4a5c38300
2020-11-11T19:44:15 suricata[11099] [100298] <Notice> -- opened netmap:em0/R from em0: 0x3d4a5c38000
2020-11-11T19:44:15 suricata[11099] [100297] <Notice> -- opened netmap:em4/T from em4: 0x3d4a4d85300
2020-11-11T19:44:14 suricata[11099] [100297] <Notice> -- opened netmap:em4^ from em4^: 0x3d4a4d85000
2020-11-11T19:44:14 suricata[11099] [100288] <Notice> -- opened netmap:em4^ from em4^: 0x3d4a328e300
2020-11-11T19:44:14 suricata[11099] [100288] <Notice> -- opened netmap:em4/R from em4: 0x3d4a328e000


The second it starts, the interface is blown away at that same moment...

And it takes about 3 seconds to recover...
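The match can be checked mechanically. A minimal Python sketch, with hypothetical helper names and assuming lines from the Suricata log and the system log of one boot pasted into a single list, takes per interface the first `opened netmap:<if>/R` line and the first kernel `link state changed to DOWN` line and reports the gap. The logs only have one-second resolution, so a 0 s gap means "same second": a strong hint, not proof of cause.

```python
import re
from datetime import datetime

TS_FMT = "%Y-%m-%dT%H:%M:%S"
# Suricata opening the netmap receive ring of a NIC, e.g. "opened netmap:em2/R from em2".
NETMAP_RX_RE = re.compile(r"^(\S+)\s+suricata\[\d+\].*opened netmap:(\w+)/R from")
# Kernel reporting a NIC going down.
LINK_DOWN_RE = re.compile(r"^(\S+)\s+kernel\s+(\w+): link state changed to DOWN")

# Lines copied from the 2020-11-11 Suricata and system logs in this thread.
LOG = """\
2020-11-11T19:44:18 suricata[11099] [100316] <Notice> -- opened netmap:em2/R from em2: 0x3d4a84c5000
2020-11-11T19:44:16 suricata[11099] [100307] <Notice> -- opened netmap:em1/R from em1: 0x3d4a6e18000
2020-11-11T19:44:15 suricata[11099] [100298] <Notice> -- opened netmap:em0/R from em0: 0x3d4a5c38000
2020-11-11T19:44:14 suricata[11099] [100288] <Notice> -- opened netmap:em4^ from em4^: 0x3d4a328e300
2020-11-11T19:44:14 suricata[11099] [100288] <Notice> -- opened netmap:em4/R from em4: 0x3d4a328e000
2020-11-11T19:44:18 kernel em2: link state changed to DOWN
2020-11-11T19:44:16 kernel em1: link state changed to DOWN
2020-11-11T19:44:15 kernel em0: link state changed to DOWN
2020-11-11T19:44:14 kernel em4: link state changed to DOWN
""".splitlines()


def first_seen(lines, regex):
    """Earliest timestamp per interface among lines matching `regex`."""
    seen = {}
    for line in lines:
        m = regex.match(line.strip())
        if m:
            ts = datetime.strptime(m[1], TS_FMT)
            if m[2] not in seen or ts < seen[m[2]]:
                seen[m[2]] = ts
    return seen


def netmap_to_linkdown(lines):
    """Seconds from Suricata opening each NIC's netmap RX ring to its first DOWN."""
    opened = first_seen(lines, NETMAP_RX_RE)
    down = first_seen(lines, LINK_DOWN_RE)
    return {ifname: (down[ifname] - t).total_seconds()
            for ifname, t in sorted(opened.items()) if ifname in down}


if __name__ == "__main__":
    for ifname, delta in netmap_to_linkdown(LOG).items():
        print(f"{ifname}: link DOWN {delta:+.0f} s after netmap open")
```

For all four NICs in these lines (em0, em1, em2, em4) the gap is 0 s.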
kind regards
chemlud