Messages - Phiolin

#1
1.15.1 - no access to Zenarmor GUI again, same as in 1.15 before, just an endlessly spinning loading circle.
#2
I'm seeing the same issue (and have reported that by mail as well).
Setting the timestamp brings the GUI back for me as well. :)

/usr/local/opnsense/mvc/app/library/OPNsense/Zenarmor/CLI.php settimestamp
#3
Yes, you can only have 3 policies in the home subscription (default policy + 2 custom policies).
#4
Keep a particular eye on "MBUF Usage" on the OPNsense dashboard. If you notice it increasing very quickly (several thousand over the span of an hour, say), you might be suffering from another mbuf leak, although that should be fixed in the latest build from Franco.
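To put a number on "increasing very quickly", here is a minimal Python sketch that compares two mbuf readings. It assumes the relevant line of `netstat -m` on FreeBSD looks like `20000/5000/25000 mbufs in use (current/cache/total)`; the function names and the sample readings are made up for illustration, and the dashboard figure may be derived from a different line of that output, so treat the result as an approximation.

```python
import re

# Assumed format of the relevant `netstat -m` line on FreeBSD:
#   "20000/5000/25000 mbufs in use (current/cache/total)"
MBUF_LINE = re.compile(r"^(\d+)/(\d+)/(\d+) mbufs in use")


def mbufs_in_use(netstat_output: str) -> int:
    """Return the 'current' mbuf count from `netstat -m` output."""
    for line in netstat_output.splitlines():
        match = MBUF_LINE.match(line.strip())
        if match:
            return int(match.group(1))
    raise ValueError("no 'mbufs in use' line found")


def growth_per_hour(first: int, second: int, seconds_apart: float) -> float:
    """Average change in mbufs per hour between two readings."""
    return (second - first) * 3600.0 / seconds_apart


# Hypothetical readings taken 30 minutes apart.
before = mbufs_in_use("20000/5000/25000 mbufs in use (current/cache/total)")
after = mbufs_in_use("22500/5000/27500 mbufs in use (current/cache/total)")
rate = growth_per_hour(before, after, 30 * 60)
print(f"{rate:.0f} mbufs/hour")  # 5000 mbufs/hour
```

A rate in the thousands per hour that never falls back would match the leak pattern described above.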
#5
Check out this thread, in particular this post from Franco: https://forum.opnsense.org/index.php?topic=32114.msg161656#msg161656

You can install the netmap testing kernel from the OPNsense command line with the command outlined in that post. Then make sure Zenarmor is configured to run in "Routed Mode (L3 Mode, Reporting + Blocking) with emulated netmap driver" and see whether that fixes your issue.
I had the same issue with connection stalls after 2-3 days on all Zenarmor protected interfaces and am currently trying it out as well.
#6
Are you running Zenarmor? That could well be a reason for this, as there are some netmap issues being worked on.
#7
I no longer see the mbuf leak with this version. Will keep running this one to see if there are any further issues.
#8
Thanks Franco. :)
I'll give that a try in emulated mode tomorrow to see if that fixes my mbuf issue.
#9
Not sure whether the netmap kernel is now part of 23.1.5. It is not mentioned in the changelog, so I'd assume it's still in the queue?

For me, all my Zenarmor issues remain.
In native mode with the igb Intel driver, Zenarmor will either stall or crash after 1-2 days, breaking all my inter-VLAN connections until Zenarmor is restarted.
In emulated mode, with or without the new netmap kernel, I'm seeing ever-increasing MBUF usage until it tops out at 100% and practically everything just stops working.

So Zenarmor is currently more or less unusable. It used to be rock solid for me a year ago; not sure what has changed that turned it into the major issue for my network connectivity. Of course I have long-running support cases open with them, but they're not really moving forward in either direction.
#10
Yes, of course a virtual VLAN adapter has been added for the VLAN 99 on the Mac.
Here are the network devices: the en7 parent adapter and the virtual vlan0 adapter.
That shouldn't really be the issue.
As I said - I can reach other devices on VLAN 99 just fine, just not the OPNsense device.
It goes so far that I can even ping OPNsense from the Mac and also get a DHCP address assigned; I just cannot access the web GUI via HTTPS, or SSH.

When I switch the Mac to a native VLAN 99 access port and configure en7 with appropriate IP addresses, everything works fine, so there are also no firewall rules in the way that would prevent the connections.

en7: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	options=6467<RXCSUM,TXCSUM,VLAN_MTU,TSO4,TSO6,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
	ether 80:6d:97:2b:1b:a6
	inet6 fe80::94:1c33:8678:aba9%en7 prefixlen 64 secured scopeid 0xd
	inet 10.0.11.168 netmask 0xfffffe00 broadcast 10.0.11.255
	nd6 options=201<PERFORMNUD,DAD>
	media: autoselect (1000baseT <full-duplex>)
	status: active
vlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	options=6063<RXCSUM,TXCSUM,TSO4,TSO6,PARTIAL_CSUM,ZEROINVERT_CSUM>
	ether 80:6d:97:2b:1b:a6
	inet6 fe80::77:1d43:f194:879d%vlan0 prefixlen 64 secured scopeid 0xf
	inet 10.0.99.50 netmask 0xffffff00 broadcast 10.0.99.255
	nd6 options=201<PERFORMNUD,DAD>
	vlan: 99 parent interface: en7
	media: autoselect (1000baseT <full-duplex>)
	status: active
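
For anyone not used to the hex netmasks macOS prints, here is a small Python sketch using only the standard `ipaddress` module. It converts the two masks above to prefix lengths and checks which of the two subnets contains 10.0.99.1, the OPNsense address on VLAN 99. The helper name `interface_from_hex` is made up for illustration.

```python
import ipaddress


def interface_from_hex(addr: str, hex_mask: str) -> ipaddress.IPv4Interface:
    """Build an interface from an address and a hex netmask as printed by ifconfig."""
    mask = ipaddress.IPv4Address(int(hex_mask, 16))
    return ipaddress.IPv4Interface(f"{addr}/{mask}")


en7 = interface_from_hex("10.0.11.168", "0xfffffe00")
vlan0 = interface_from_hex("10.0.99.50", "0xffffff00")

print(en7.network)    # 10.0.10.0/23
print(vlan0.network)  # 10.0.99.0/24

firewall = ipaddress.IPv4Address("10.0.99.1")
print(firewall in vlan0.network)  # True
print(firewall in en7.network)    # False
```

So the two interfaces sit in non-overlapping subnets, and traffic to 10.0.99.1 should leave via vlan0 rather than being routed over en7.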


Here's a curl test to see that traffic goes through and I can hit the HTTP-redirect rule, but cannot successfully establish the HTTPS session:


curl -vvv http://10.0.99.1:80
*   Trying 10.0.99.1:80...
* Connected to 10.0.99.1 (10.0.99.1) port 80 (#0)
> GET / HTTP/1.1
> Host: 10.0.99.1
> User-Agent: curl/7.87.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 301 Moved Permanently
< Location: https://10.0.99.1/
< Content-Length: 0
< Date: Tue, 04 Apr 2023 14:32:29 GMT
< Server: OPNsense
<
* Connection #0 to host 10.0.99.1 left intact



curl -vvv https://10.0.99.1:443
*   Trying 10.0.99.1:443...
* Connected to 10.0.99.1 (10.0.99.1) port 443 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* [CONN-0-0][CF-SSL] (304) (OUT), TLS handshake, Client hello (1):
* Recv failure: Connection reset by peer
* LibreSSL/3.3.6: error:02FFF036:system library:func(4095):Connection reset by peer
* Closing connection 0
curl: (35) Recv failure: Connection reset by peer
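
The two curl runs above separate "TCP connects" from "TLS handshake completes". A minimal Python sketch of the same two-stage check follows, using only the standard `socket` and `ssl` modules. The function name `probe` and its return strings are illustrative, and certificate verification is switched off on purpose because only the handshake outcome matters here.

```python
import socket
import ssl


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Report how far a connection gets: 'tcp-failed', 'tls-failed' or 'tls-ok'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp-failed"

    # Accept any certificate: this is a reachability test, not a trust check.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    try:
        with context.wrap_socket(sock, server_hostname=host):
            return "tls-ok"
    except (ssl.SSLError, OSError):
        # Covers resets, timeouts and protocol errors during the handshake.
        return "tls-failed"
    finally:
        sock.close()


# Example call (not executed here): probe("10.0.99.1", 443)
```

Under these assumptions, `probe("10.0.99.1", 443)` returning `tls-failed` would correspond to the reset seen in the second curl run.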
#11
A bit of a strange issue here that I fail to understand.

Client is a MacBook which I mainly use for all kinds of admin stuff.
Client is connected via a switch port that has untagged/native VLAN 10 and tagged VLAN 99 configured.

OPNsense admin web-gui and SSH are configured to listen on all interfaces and of course OPNsense has interfaces configured in VLAN 10 and in VLAN 99. Firewall rules allow the relevant connections.

Client can reach OPNsense on VLAN 10: no problem, web-gui and SSH access working fine.
Client fails to reach OPNsense on VLAN 99: no access to web-gui and SSH.
Client can however reach other devices on VLAN 99 perfectly fine, just not OPNsense, so generally VLAN 99 connectivity seems to be working.

When I move the client to a native/untagged VLAN 99 switch port to verify, the connection immediately works fine.
Client can reach OPNsense on VLAN 99: web-gui and SSH access working fine.

In the first scenario with VLAN 10 untagged and VLAN 99 tagged, a packet capture on the OPNsense side sees a lot of TCP retransmissions. It looks like there is some kind of connectivity between the devices (TLS handshake), but something seems to fail.
I have attached an image of the packet capture and the pcap file from the session, if that helps.

The VLAN 99 interface on the client side is a virtual interface on the adapter that also holds the VLAN 10 connection, so both share the same MAC address. Would that be an issue? I'd think switches can tell the two apart and shouldn't have a problem with the same MAC address in different VLANs. And since connections to other devices on VLAN 99 work fine, I don't think that is the issue here?
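
On the shared-MAC question, here is a toy Python model of a switch that does independent VLAN learning, i.e. keys its forwarding table on the (VLAN, MAC) pair. That is an assumption about the switch involved; one using shared VLAN learning would behave differently. The class name and port numbers are purely illustrative.

```python
from typing import Dict, Optional, Tuple


class VlanAwareSwitch:
    """Toy forwarding table keyed on (VLAN ID, MAC), as in independent VLAN learning."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[int, str], int] = {}

    def learn(self, vlan: int, src_mac: str, port: int) -> None:
        """Record which port a source MAC was seen on, per VLAN."""
        self.table[(vlan, src_mac.lower())] = port

    def lookup(self, vlan: int, dst_mac: str) -> Optional[int]:
        """Return the egress port, or None if the frame would be flooded in that VLAN."""
        return self.table.get((vlan, dst_mac.lower()))


switch = VlanAwareSwitch()
mac = "80:6d:97:2b:1b:a6"  # MAC shared by en7 (VLAN 10) and vlan0 (VLAN 99)

switch.learn(10, mac, port=5)
switch.learn(99, mac, port=5)

print(len(switch.table))       # 2
print(switch.lookup(99, mac))  # 5
print(switch.lookup(20, mac))  # None
```

In this model the same MAC simply produces two independent entries, one per VLAN, so by itself it would not explain the failure.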
#12
Updated to the new kernel yesterday and switched to Zenarmor emulated driver mode.
Unfortunately, not even 24 hours later my Protectli VP2410 running OPNsense was completely unreachable over the network: not only the Zenarmor-protected interfaces, but also my separate interface on a management VLAN. I had to do a hard reboot to get it back online, as I currently don't have serial console access at the location where it is installed.
For what it's worth, I was still able to get an IP via DHCP on the management interface, but couldn't access any services (web gui, SSH etc).
So possibly a hint that mainly TCP connections were affected.

At least in native driver mode the Zenarmor worker just crashes every 2-3 days and restarts automatically, so I only have a connection drop lasting a couple of seconds.
In emulated mode with the new kernel it doesn't really work longer than a few hours for me.

This is on a Protectli VP2410 with igb network interfaces, no virtualization, OPNsense installed directly on the hardware.
#13
Yes, I still see queue stalls with this kernel.
I have even gone to some effort to pass a hardware interface through to OPNsense, to move away from vtnet onto an igb driver interface, and I'm no longer using netmap generic mode (at least I no longer see it in dmesg). But I still see the queue stalls where traffic stops flowing and I need to stop eastpect to get it to work again.

It's a regular occurrence here, I pretty much see it every 2-3 days. So if you want me to test/debug something, I can probably do it within that timeframe.

Can also switch back to generic mode easily if required for further testing.
#14
Unfortunately I just had Zenarmor pass out again on the netmap test kernel. I use VLANs on a vtnet interface, so I'm the classic case for this issue I guess.
Had to restart Zenarmor and then everything came back.
Let me know if you need any more specific information!

% uname -a
FreeBSD redacted.local 13.1-RELEASE-p5 FreeBSD 13.1-RELEASE-p5 netmap-n250377-0c47d02eefe SMP amd64


Actually, why did this start happening anyway? I never had these issues before like... idk, November 2022 or so?
I guess there have been kernel changes in this area that are now causing the issue, so it's good that it is being looked into, but I wonder if it wouldn't be easier to just roll back whatever change introduced the problem in the first place?
#15
I'm affected by the netmap/Zenarmor issue and will install the patch today to test. Thanks for bringing this forward! :)
Will report back in 2-3 days as it usually took a while for Zenarmor to get stuck on the old kernel.