20.7.4 - WAN interface blown away when suricata active...

Started by chemlud, October 30, 2020, 11:17:07 AM

I ran

opnsense-revert -r 20.7.3 suricata

and rebooted, enabled IPS again (with WAN and all LANs), and within 2-3 minutes the WAN interface detached with the same log entry as posted above (DEVD Ethernet detached event for WAN).

So which package is the next suspect? :-O

kind regards
chemlud
____
"The price of reliability is the pursuit of the utmost simplicity."
C.A.R. Hoare

Felix Eichhorn's premium cat food with the extra portion of energy

A router is not a switch - A router is not a switch - A router is not a switch - A rou....

PS: in IPS mode it's stable for the moment...
kind regards
chemlud

You have to look at the difference between legacy and inline mode to find the culprit.

Quote from: mimugmail on November 02, 2020, 07:16:31 AM
Maybe disable all rules, perhaps your RAM is running out.

It's an i3 with 8 GB RAM, and usage is 15% at most... Is there no other package to downgrade? The OPNsense base, maybe?
kind regards
chemlud

Quote from: Supermule on November 02, 2020, 08:20:38 AM
You have to look at the difference between legacy and inline mode to find the culprit.

Hi, could you elaborate on that a little? I'm a user, not a coder... ;-)
kind regards
chemlud

I have a second box with comparable hardware that had been running fine with 20.7.3 and Suricata in IPS mode on LAN and WAN. But this morning at about 9:00 the WAN went down, and a reboot only helped for 1-2 minutes.

After the third loss of WAN I disabled IPS, and the connection has been stable, but only until I enable IPS again; then the WAN interface goes down pretty much immediately.

2020-11-02T12:23:08 opnsense[20113] /usr/local/etc/rc.linkup: DEVD Ethernet detached event for wan

I changed nothing on this box! Can the IPS interfere with the box? Scary....
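To line up these detach events with the moments IPS was switched on, a small shell function like the one below could list their timestamps. It is only a sketch: the name `detach_times` is hypothetical (not an OPNsense tool), and it assumes log lines in exactly the format shown above (timestamp as the first field, interface name as the last). It does not assume where the system log lives on a given OPNsense version, so you pass the file path yourself.

```shell
#!/bin/sh
# Hypothetical helper (not part of OPNsense): print the timestamp of every
# "DEVD Ethernet detached event" for the interface given as $1.
# Reads the log files named in the remaining arguments, or stdin if none.
# Assumes the line format: <timestamp> <process> <script>: DEVD Ethernet detached event for <iface>
detach_times() {
    iface="$1"
    shift
    # $1 is the timestamp, $NF the interface name at the end of the line.
    awk -v iface="$iface" '/DEVD Ethernet detached event for / && $NF == iface { print $1 }' "$@"
}
```

Usage, with a placeholder path: `detach_times wan /path/to/system.log`. Comparing the printed times with when IPS was enabled would show whether every WAN loss really follows an IPS restart.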
kind regards
chemlud

Quote from: mimugmail on November 02, 2020, 09:21:32 AM
Look at the console for the error that explains why it freezes

Normally there is no error message on the serial console when the WAN goes down, but a few minutes ago, while the WAN was down, I got this on the console (on the second machine, the one that started failing this morning, with 20.7.3 installed):

...466.167540 [1742] nm_rxsync_prologue        em0 RX1: fail 'head < kring->nr_hwcur || head > kring->nr_hwtail' h 0 c 0 t 1022 rh 0 rc 0 rt 1022 hc 1021 ht 1022
466.181403 [1787] netmap_ring_reinit        called for em0 RX1
472.463680 [1742] nm_rxsync_prologue        em0 RX1: fail 'head < kring->nr_hwcur || head > kring->nr_hwtail' h 31 c 31 t 21 rh 31 rc 31 rt 21 hc 20 ht 21
472.477197 [1787] netmap_ring_reinit        called for em0 RX1
473.254040 [1742] nm_rxsync_prologue        igb2 RX1: fail 'head < kring->nr_hwcur || head > kring->nr_hwtail' h 714 c 714 t 713 rh 714 rc 714 rt 713 hc 711 ht 713
473.268350 [1787] netmap_ring_reinit        called for igb2 RX1
475.718351 [1742] nm_rxsync_prologue        igb2 RX1: fail 'head < kring->nr_hwcur || head > kring->nr_hwtail' h 725 c 725 t 723 rh 725 rc 725 rt 723 hc 721 ht 723
475.732652 [1787] netmap_ring_reinit        called for igb2 RX1
475.740008 [1742] nm_rxsync_prologue        igb2 RX1: fail 'head < kring->nr_hwcur || head > kring->nr_hwtail' h 724 c 724 t 723 rh 724 rc 724 rt 723 hc 566 ht 723
475.754306 [1787] netmap_ring_reinit        called for igb2 RX1
483.402555 [1742] nm_rxsync_prologue        igb0 RX2: fail 'head < kring->nr_hwcur || head > kring->nr_hwtail' h 26 c 26 t 22 rh 26 rc 26 rt 22 hc 21 ht 22
483.416156 [1787] netmap_ring_reinit        called for igb0 RX2
489.248676 [1742] nm_rxsync_prologue        em0 RX1: fail 'head < kring->nr_hwcur || head > kring->nr_hwtail' h 106 c 106 t 100 rh 106 rc 106 rt 100 hc 99 ht 100
489.262805 [1787] netmap_ring_reinit        called for em0 RX1
492.266657 [1742] nm_rxsync_prologue        em0 RX1: fail 'head < kring->nr_hwcur || head > kring->nr_hwtail' h 114 c 114 t 108 rh 114 rc 114 rt 108 hc 107 ht 108
492.280870 [1787] netmap_ring_reinit        called for em0 RX1
492.324378 [1742] nm_rxsync_prologue        igb2 RX1: fail 'head < kring->nr_hwcur || head > kring->nr_hwtail' h 752 c 752 t 747 rh 752 rc 752 rt 747 hc 744 ht 747
492.338674 [1787] netmap_ring_reinit        called for igb2 RX1
496.058933 [1742] nm_rxsync_prologue        em0 RX1: fail 'head < kring->nr_hwcur || head > kring->nr_hwtail' h 142 c 142 t 127 rh 142 rc 142 rt 127 hc 126 ht 127
496.073143 [1787] netmap_ring_reinit        called for em0 RX1
502.387235 [1742] nm_rxsync_prologue        em0 RX1: fail 'head < kring->nr_hwcur || head > kring->nr_hwtail' h 144 c 144 t 142 rh 144 rc 144 rt 142 hc 141 ht 142
502.401452 [1787] netmap_ring_reinit        called for em0 RX1
504.390558 [1742] nm_rxsync_prologue        em0 RX1: fail 'head < kring->nr_hwcur || head > kring->nr_hwtail' h 147 c 147 t 145 rh 147 rc 147 rt 145 hc 144 ht 145
504.404775 [1787] netmap_ring_reinit        called for em0 RX1


But I could not capture all of the output.
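When only part of the console output can be captured, a quick tally shows which interfaces and rings keep hitting the netmap failure. The function below is a hypothetical helper (not part of OPNsense or netmap) and assumes the `nm_rxsync_prologue <iface> <ring>: fail ...` line format shown in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: count netmap rxsync failures per interface/ring pair
# in captured console output (files given as arguments, or stdin).
# Prints "<count> <interface> <ring>", most frequent first.
netmap_fail_summary() {
    awk '
        /nm_rxsync_prologue/ && /fail/ {
            # The interface and ring follow the function name, e.g. em0 RX1:
            for (i = 1; i <= NF; i++) {
                if ($i == "nm_rxsync_prologue") {
                    ring = $(i + 2)
                    sub(/:$/, "", ring)        # strip trailing colon from RX1:
                    count[$(i + 1) " " ring]++
                    break
                }
            }
        }
        END { for (k in count) print count[k], k }
    ' "$@" | sort -rn
}
```

Fed the excerpt above, it would report 7 failures on em0 RX1, 4 on igb2 RX1 and 1 on igb0 RX2, i.e. the problem is not confined to the WAN NIC.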
kind regards
chemlud

Quote from: chemlud on November 02, 2020, 01:15:21 PM
...

That's a netmap-related error, and enough of them will crash the machine.

Legacy mode doesn't use netmap.

Quote from: Supermule on November 02, 2020, 01:28:00 PM
Quote from: chemlud on November 02, 2020, 01:15:21 PM
...

That's a netmap-related error, and enough of them will crash the machine.

Legacy mode doesn't use netmap.

Hmm, I only know this legacy mode from Snort. Where do I configure it in the OPNsense IDS/IPS?
kind regards
chemlud

I updated the second box to 20.7.4: same game. Whenever I enable IPS, the WAN is dead within 1-2 minutes. No idea what to try next...
kind regards
chemlud

Quote from: chemlud on November 02, 2020, 10:57:27 PM
I updated the second box to 20.7.4: same game. Whenever I enable IPS, the WAN is dead within 1-2 minutes. No idea what to try next...

And it was 20.7.3 before?

Quote from: mimugmail on November 03, 2020, 06:30:47 AM
Quote from: chemlud on November 02, 2020, 10:57:27 PM
...

And it was 20.7.3 before?

Yes, that's the box that first started losing WAN yesterday around 9:00. I will activate my testing box with comparable (but slightly different, see https://forum.opnsense.org/index.php?topic=19377.msg89581#msg89581) hardware to see what happens there...
kind regards
chemlud

Quote from: chemlud on October 31, 2020, 04:18:24 PM
I get here:

# opnsense-update -kr 20.7.3
Fetching kernel-20.7.3-amd64.txz: ...... done
!!!!!!!!!!!! ATTENTION !!!!!!!!!!!!!!!
! A critical upgrade is in progress. !
! Please do not turn off the system. !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Installing kernel-20.7.3-amd64.txz... done
Please reboot.


After the reboot I enabled Suricata, but it took only about one minute to kill the WAN again. And then I saw that the kernel is still 20.7.4?!? I have disabled Suricata again for the moment. Any ideas what went wrong? opnsense-revert instead of opnsense-update, maybe?

Huh, what made you say it's still "20.7.4"? The commands to verify this are "opnsense-version kernel" and "uname -a".


Cheers,
Franco