OPNsense Forum

Archive => 16.1 Legacy Series => Topic started by: interfaSys on January 20, 2016, 11:04:37 pm

Title: Feedback on 15.7_41
Post by: interfaSys on January 20, 2016, 11:04:37 pm
OK, so lots of things went wrong :D

On boot there is a controller timeout

Code: [Select]
uhub0: 13 ports with 13 removable, self powered
ugen0.2:  at usbus0
uhub1:  on usbus0
sdhci_pci0-slot0:  Controller timeout
sdhci_pci0-slot0: ============== REGISTER DUMP ==============
sdhci_pci0-slot0: Sys addr: 0x00000000 | Version:  0x00001002
sdhci_pci0-slot0: Blk size: 0x00000000 | Blk cnt:  0x00000000
sdhci_pci0-slot0: Argument: 0x00000000 | Trn mode: 0x00000000
sdhci_pci0-slot0: Present:  0x1fff0001 | Host ctl: 0x00000001
sdhci_pci0-slot0: Power:    0x00000000 | Blk gap:  0x00000080
sdhci_pci0-slot0: Wake-up:  0x00000000 | Clock:    0x0000fa07
sdhci_pci0-slot0: Timeout:  0x00000000 | Int stat: 0x00000000
sdhci_pci0-slot0: Int enab: 0x01ff00fb | Sig enab: 0x01ff00fb
sdhci_pci0-slot0: AC12 err: 0x00000000 | Slot int: 0x00000000
sdhci_pci0-slot0: Caps:     0x446cc8b2 | Max curr: 0x00000000
sdhci_pci0-slot0: ===========================================
uhub1: 4 ports with 4 removable, self powered
ugen0.3:  at usbus0
umass0:  on usbus0
umass0:  SCSI over Bulk-Only; quirks = 0x4000
umass0:2:0:-1: Attached to scbus2
da0 at umass-sim0 bus 0 scbus2 target 0 lun 0
da0:  Removable Direct Access SPC-2 SCSI device
da0: Serial Number 20120926571200000
da0: 40.000MB/s transfers
da0: Attempt to query device size failed: NOT READY, Medium not present
da0: quirks=0x2
ugen0.4:  at usbus0
ukbd0:  on usbus0
kbd1 at ukbd0
sdhci_pci0-slot0:  Controller timeout

Then there were these, but I don't think they're new
Code: [Select]
opnsense: /usr/local/etc/rc.bootup: The command '/sbin/route delete -host 85.25.105.193' returned exit code '1', the output was 'route: writing to routing socket: No such process delete host 85.25.105.193 fib 0: not in table'Once per DNS server

Also, lots of
Code: [Select]
inetd[74756]: 19000/tcp: unknown service
After that suricata was started on the VPN and it crapped out

Code: [Select]
kernel: re1: link state changed to DOWN
devd: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup stop re1'
configd.py: [aaaaaaa-cccc-dddd-eeee-bbbbbbbb] Linkup stopping re1
kernel: 423.838079 [ 266] generic_find_num_desc     called, in tx 1024 rx 1024
kernel: 423.838564 [ 274] generic_find_num_queues   called, in txq 0 rxq 0
kernel: 423.839022 [ 798] generic_netmap_dtor       Restored native NA 0
kernel: 423.839509 [ 266] generic_find_num_desc     called, in tx 1024 rx 1024
kernel: 423.839982 [ 274] generic_find_num_queues   called, in txq 0 rxq 0
opnsense: /usr/local/etc/rc.linkup: DEVD Ethernet detached event for wan
opnsense: /usr/local/etc/rc.linkup: Clearing states to old gateway 192.168.1.1.
kernel: arpresolve: can't allocate llinfo for 192.168.1.1 on re1
kernel: arpresolve: can't allocate llinfo for 192.168.1.1 on re1
kernel: arpresolve: can't allocate llinfo for 192.168.1.1 on re1
kernel: arpresolve: can't allocate llinfo for 192.168.1.1 on re1

It was impossible to connect to the main WAN and the VPN could not be shut down.

Code: [Select]
opnsense: /usr/local/etc/rc.linkup: The command '/sbin/route add -inet default '192.168.1.1'' returned exit code '1', the output was 'route: writing to routing socket: Network is unreachable add net default: gateway 192.168.1.1 fib 0: Network is unreachable'

I turned off IPS and was able to restart all the other services.


I wasn't able to turn on IPS on suricata 2, but it worked on 3. I'll give it another go later.

Title: Re: Feedback on 15.7_41
Post by: franco on January 21, 2016, 07:59:09 am
OK, so lots of things went wrong :D

On boot there is a controller timeout

Code: [Select]
uhub0: 13 ports with 13 removable, self powered
ugen0.2:  at usbus0
uhub1:  on usbus0
sdhci_pci0-slot0:  Controller timeout
sdhci_pci0-slot0: ============== REGISTER DUMP ==============
sdhci_pci0-slot0: Sys addr: 0x00000000 | Version:  0x00001002
sdhci_pci0-slot0: Blk size: 0x00000000 | Blk cnt:  0x00000000
sdhci_pci0-slot0: Argument: 0x00000000 | Trn mode: 0x00000000
sdhci_pci0-slot0: Present:  0x1fff0001 | Host ctl: 0x00000001
sdhci_pci0-slot0: Power:    0x00000000 | Blk gap:  0x00000080
sdhci_pci0-slot0: Wake-up:  0x00000000 | Clock:    0x0000fa07
sdhci_pci0-slot0: Timeout:  0x00000000 | Int stat: 0x00000000
sdhci_pci0-slot0: Int enab: 0x01ff00fb | Sig enab: 0x01ff00fb
sdhci_pci0-slot0: AC12 err: 0x00000000 | Slot int: 0x00000000
sdhci_pci0-slot0: Caps:     0x446cc8b2 | Max curr: 0x00000000
sdhci_pci0-slot0: ===========================================
uhub1: 4 ports with 4 removable, self powered
ugen0.3:  at usbus0
umass0:  on usbus0
umass0:  SCSI over Bulk-Only; quirks = 0x4000
umass0:2:0:-1: Attached to scbus2
da0 at umass-sim0 bus 0 scbus2 target 0 lun 0
da0:  Removable Direct Access SPC-2 SCSI device
da0: Serial Number 20120926571200000
da0: 40.000MB/s transfers
da0: Attempt to query device size failed: NOT READY, Medium not present
da0: quirks=0x2
ugen0.4:  at usbus0
ukbd0:  on usbus0
kbd1 at ukbd0
sdhci_pci0-slot0:  Controller timeout

Looks like an embedded device's SD card reader. A card wasn't present. These things have all sorts of quirk bits that may need to be set in their driver. Which embedded device is this? Did the boot continue? I'm assuming yes...

Then there were these, but I don't think they're new
Code: [Select]
opnsense: /usr/local/etc/rc.bootup: The command '/sbin/route delete -host 85.25.105.193' returned exit code '1', the output was 'route: writing to routing socket: No such process delete host 85.25.105.193 fib 0: not in table'Once per DNS server

Annoying yet harmless. Definitely there before, this comes from GUI code, not FreeBSD.

Also, lots of
Code: [Select]
inetd[74756]: 19000/tcp: unknown service

Known issue. pfSense just recently worked through this by migrating from inetd to xinetd.

After that suricata was started on the VPN and it crapped out

Code: [Select]
kernel: re1: link state changed to DOWN
devd: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup stop re1'
configd.py: [aaaaaaa-cccc-dddd-eeee-bbbbbbbb] Linkup stopping re1
kernel: 423.838079 [ 266] generic_find_num_desc     called, in tx 1024 rx 1024
kernel: 423.838564 [ 274] generic_find_num_queues   called, in txq 0 rxq 0
kernel: 423.839022 [ 798] generic_netmap_dtor       Restored native NA 0
kernel: 423.839509 [ 266] generic_find_num_desc     called, in tx 1024 rx 1024
kernel: 423.839982 [ 274] generic_find_num_queues   called, in txq 0 rxq 0
opnsense: /usr/local/etc/rc.linkup: DEVD Ethernet detached event for wan
opnsense: /usr/local/etc/rc.linkup: Clearing states to old gateway 192.168.1.1.
kernel: arpresolve: can't allocate llinfo for 192.168.1.1 on re1
kernel: arpresolve: can't allocate llinfo for 192.168.1.1 on re1
kernel: arpresolve: can't allocate llinfo for 192.168.1.1 on re1
kernel: arpresolve: can't allocate llinfo for 192.168.1.1 on re1

It was impossible to connect to the main WAN and the VPN could not be shut down.

Was was the suricata version you used here, 2 or 3? 2 doesn't support this, that's why we'll flip the version with 16.1. Only upgrading the kernel doesn't do that at the moment.


Code: [Select]
opnsense: /usr/local/etc/rc.linkup: The command '/sbin/route add -inet default '192.168.1.1'' returned exit code '1', the output was 'route: writing to routing socket: Network is unreachable add net default: gateway 192.168.1.1 fib 0: Network is unreachable'

I turned off IPS and was able to restart all the other services.

That sounds like it...

I wasn't able to turn on IPS on suricata 2, but it worked on 3. I'll give it another go later.

Yes, this is how it is.

There is a lot of potential for error with the development version right now because it has the IPS mode configuration bits, but not the Suricata to support it. This will be tied together for 16.1 by upgrading everything in one batch, also unhiding the IPS mode in the stable version.
Title: Re: Feedback on 15.7_41
Post by: interfaSys on January 21, 2016, 10:20:16 am
Quote
Looks like an embedded device's SD card reader. A card wasn't present. These things have all sorts of quirk bits that may need to be set in their driver. Which embedded device is this? Did the boot continue? I'm assuming yes...

Indeed. I just don't get why the dumps if it's only missing a card. I haven't attempted to use the slots though, but I don't think I have a use for them.
It's a Zotac Nano CI323.

Quote
Was was the suricata version you used here, 2 or 3? 2 doesn't support this, that's why we'll flip the version with 16.1. Only upgrading the kernel doesn't do that at the moment.

I was on 3 and IPS was working before the upgrade.
Title: Re: Feedback on 15.7_41
Post by: interfaSys on January 22, 2016, 04:58:44 pm
OK, so 2 things I've noticed.

1. It's impossible to use Suricata and an OpenVPN client at the same time. Tuning on IPS brings the whole network down since it kills the gateway.
Usually a service reload brings it back up, but as soon as the VPN client connects, the WAN is taken down again.
If Suricata is off, then the VPN connection works fine.
If Suricata is on, enabling the VPN takes it down

I'm using Suricata on both WAN and VPN_CLIENT. Both of these gateways have NAT outbound rules.

2. It's impossible to turn off the OpenVPN client service. It's automatically brought back up. Disabling it does work though. When disabled, the gateway is still shown as online in the dashboard widget (edit: that was because it wasn't monitored)

EDIT: I also end up with multiple copies of openvpn using the same config
Title: Re: Feedback on 15.7_41
Post by: interfaSys on January 25, 2016, 10:27:49 am
That's the log when enabling IPS

Code: [Select]
Jan 25 10:04:59 router configd.py: [u-u-i-d] get suricata daemon status
Jan 25 10:04:59 router configd.py: [u-u-i-d] generate template OPNsense.IDS
Jan 25 10:04:59 router configd.py: generate template container OPNsense.IDS
Jan 25 10:05:00 router configd.py: [u-u-i-d] install suricata rules
Jan 25 10:05:03 router configd.py: [u-u-i-d] restart suricata daemon
Jan 25 10:05:03 router kernel: ovpnc1: promiscuous mode disabled
Jan 25 10:05:03 router kernel: re1: promiscuous mode disabled
Jan 25 10:05:06 router configd.py: [u-u-i-d] get suricata daemon status
Jan 25 10:05:22 router kernel: re1: link state changed to DOWN
Jan 25 10:05:22 router devd: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup stop re1'
Jan 25 10:05:22 router configd.py: [u-u-i-d] Linkup stopping re1
Jan 25 10:05:22 router kernel: 722.600747 [ 266] generic_find_num_desc     called, in tx 1024 rx 1024
Jan 25 10:05:22 router kernel: 722.601235 [ 274] generic_find_num_queues   called, in txq 0 rxq 0
Jan 25 10:05:22 router kernel: 722.601691 [ 798] generic_netmap_dtor       Restored native NA 0
Jan 25 10:05:22 router kernel: 722.602184 [ 266] generic_find_num_desc     called, in tx 1024 rx 1024
Jan 25 10:05:22 router kernel: 722.602654 [ 274] generic_find_num_queues   called, in txq 0 rxq 0
Jan 25 10:05:22 router opnsense: /usr/local/etc/rc.linkup: DEVD Ethernet detached event for wan
Jan 25 10:05:23 router opnsense: /usr/local/etc/rc.linkup: Clearing states to old gateway 192.168.1.1.
Jan 25 10:05:23 router kernel: arpresolve: can't allocate llinfo for 192.168.1.1 on re1
Jan 25 10:05:24 router kernel: re1: link state changed to UP
Jan 25 10:05:24 router devd: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup start re1'
Jan 25 10:05:24 router configd.py: [u-u-i-d] Linkup starting re1
Jan 25 10:05:24 router kernel: arpresolve: can't allocate llinfo for 192.168.1.1 on re1
Jan 25 10:05:24 router opnsense: /usr/local/etc/rc.linkup: DEVD Ethernet attached event for wan
Jan 25 10:05:24 router opnsense: /usr/local/etc/rc.linkup: HOTPLUG: Configuring interface wan
Jan 25 10:05:25 router opnsense: /usr/local/etc/rc.newwanip: rc.newwanip: Informational is starting re1.
Jan 25 10:05:25 router opnsense: /usr/local/etc/rc.newwanip: rc.newwanip: on (IP address: 192.168.1.102) (interface: WAN[wan]) (real interface: re1).
Jan 25 10:05:25 router opnsense: /usr/local/etc/rc.newwanip: The command '/sbin/route delete -host 85.25.105.193' returned exit code '1', the output was 'route: writing to routing socket: No such process delete host 85.25.105.193 fib 0: not in table'
Jan 25 10:05:25 router opnsense: /usr/local/etc/rc.newwanip: The command '/sbin/route delete -host 87.98.175.85' returned exit code '1', the output was 'route: writing to routing socket: No such process delete host 87.98.175.85 fib 0: not in table'
Jan 25 10:05:25 router opnsense: /usr/local/etc/rc.newwanip: ROUTING: remove current default route to 192.168.1.1
Jan 25 10:05:25 router opnsense: /usr/local/etc/rc.newwanip: ROUTING: setting default route to 192.168.1.1
Jan 25 10:05:25 router opnsense: /usr/local/etc/rc.newwanip: Resyncing OpenVPN instances for interface WAN.
Jan 25 10:05:25 router kernel: ovpnc1: link state changed to DOWN
Jan 25 10:05:25 router configd.py: [u-u-i-d] Reloading filter
Jan 25 10:05:26 router configd.py: [u-u-i-d] Reloading filter
Jan 25 10:05:28 router opnsense: /usr/local/etc/rc.newwanip: Creating rrd update script
Jan 25 10:05:31 router opnsense: /usr/local/etc/rc.linkup: ROUTING: remove current default route to 192.168.1.1
Jan 25 10:05:31 router opnsense: /usr/local/etc/rc.linkup: ROUTING: setting default route to 192.168.1.1
Jan 25 10:05:35 router configd.py: [u-u-i-d] updating dyndns wan
Jan 25 10:05:41 router configd.py: [u-u-i-d] updating dyndns GW_WAN
Jan 25 10:05:41 router configd.py: [u-u-i-d] Restarting ipsec tunnels
Jan 25 10:05:42 router configd.py: [u-u-i-d] Restarting OpenVPN tunnels/interfaces GW_WAN
Jan 25 10:05:42 router opnsense: /usr/local/etc/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use GW_WAN.
Jan 25 10:05:42 router opnsense: /usr/local/etc/rc.openvpn: OpenVPN: Resync client1 VPN1

At that point both WAN and VPN are down.

If Suricata is restarted, it dies silently and both interfaces are brought back online.
Even though traffic is routed through the VPN, under "System: Diagnostics: Services", the service is marked as down.

Title: Re: Feedback on 15.7_41
Post by: interfaSys on January 25, 2016, 10:50:02 am
After a reboot, all services are started. We'll see if it's stable or if a VPN restart per example brings everything down.
Title: Re: Feedback on 15.7_41
Post by: interfaSys on January 25, 2016, 01:52:14 pm
Apparently, a reboot is necessary.

Code: [Select]
25/1/2016 -- 10:42:56 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 're1+': Device not configured (6)
25/1/2016 -- 10:42:56 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'ovpnc1+': Device not configured (6)
Title: Re: Feedback on 15.7_41
Post by: franco on January 27, 2016, 08:59:38 pm
Your problems look related to re(4) driver + netmap(4) on top. I don't know how well this holds up under traffic knowing that re(4) has always been a bit flaky. Do you have hardware to double-check against a more stable em(4) or igb(4) Intel driver?
Title: Re: Feedback on 15.7_41
Post by: interfaSys on January 28, 2016, 02:38:34 pm
Quote
Do you have hardware to double-check against a more stable em(4) or igb(4) Intel driver?
Not at the moment, no, but prolly next month.