Messages - SirUffsALot

#1
Well, it seems that the hardware offload was the culprit after all.

I changed the setup to a VM with SR-IOV under Proxmox and had the same problem when I imported the old configuration.
With a fresh install (where the offload features were disabled) the problem disappeared, and I can still reach about 6 Gbit/s of routing throughput, which is enough for me.

Case closed. :D
#2
Hardware and Performance / mlx4en: CQE completed in Error
February 21, 2024, 02:14:46 PM
Hello, I have a strange problem.

For performance reasons, I switched from a Sophos XG105 to a Lenovo M720Q ThinClient (Pentium G5400T, 16GB RAM).
Initially, a Mellanox ConnectX-2 card was installed, but I replaced it relatively quickly with a Mellanox ConnectX-3 card (HP 546SFP+).

However, I see the following behavior with both cards (at first I thought the old one was broken):


Sometimes after a reboot (e.g. after updates), network connections are unusably slow. If I then connect a monitor, I see the console flooded with kernel messages like the following (see also the attached image):
"mlx4_en: mlxen0: CQE completed in error - vendor syndrom: 0xf9 syndrom: 0x5"
If I then restart the firewall via the OPNsense menu, the error disappears in 75% of cases and everything runs as desired (even for weeks). If not, a second reboot usually helps.
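
Rather than attaching a monitor, the kernel message buffer can be checked from a shell after each reboot. The following is a minimal sketch, not a tested procedure: the helper name `count_cqe_errors` is made up for illustration, and it only counts lines on standard input that contain the message text quoted above.

```sh
#!/bin/sh
# Hypothetical helper: count kernel log lines that report the CQE error.
# Reads log text on stdin and prints the number of matching lines.
count_cqe_errors() {
    # grep -c exits non-zero when nothing matches; "|| true" keeps the
    # helper usable under "set -e" while still printing 0.
    grep -c 'CQE completed in error' || true
}

# Usage on the firewall (dmesg prints the kernel message buffer):
#   dmesg | count_cqe_errors
```

A non-zero count right after boot would suggest the card came up in the bad state and a second reboot is needed.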

My setup consists of a single interface (the plan is to move to LACP later) with several VLANs, one of which carries PPPoE.
Hardware CRC, TSO, LRO and VLAN filtering are enabled (I did not change the defaults), and I don't think that's the cause, because the problem is not persistent.
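
One way to rule the offload features in or out is to switch them off at runtime and see whether the error still appears. The sketch below only builds the `ifconfig` call and prints it by default. The interface name `mlxen0` is taken from the kernel message above; the function name `offload_off_cmd`, the `IFACE`/`APPLY` variables and the exact flag list are assumptions, so drop any flag the driver rejects. Runtime changes like this are not expected to survive a reboot, so the matching checkboxes in the OPNsense interface settings would still need to be changed for a permanent fix.

```sh
#!/bin/sh
# Sketch: build the ifconfig call that turns off hardware offloads.
# Assumption: the interface is mlxen0 unless another name is passed in.
offload_off_cmd() {
    iface="${1:-mlxen0}"
    # Standard FreeBSD ifconfig capability flags, negated with a leading "-".
    echo "ifconfig ${iface} -rxcsum -txcsum -tso -lro -vlanhwfilter"
}

# Dry run by default: only show the command.
# Set APPLY=1 to actually run it (needs root on the firewall).
cmd="$(offload_off_cmd "${IFACE:-}")"
if [ "${APPLY:-0}" = "1" ]; then
    $cmd
else
    echo "$cmd"
fi
```

Afterwards, `ifconfig mlxen0` should no longer list the disabled capabilities in its `options=` line.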

The BIOS is up to date. I tried OPNsense 23.7 and 24.1; both are affected.

Any ideas where I could start?

#3
For the past week the system ran stably, but today it crashed again. Unfortunately, the BIOS is already the latest available: the firewall model is EOL, and the last Sophos release, XG 17.5.17, had already been installed on it.

I took the opportunity to reinstall the system completely and restore the config.xml; maybe the behavior will improve. Back then, I had installed OPNsense manually on top of FreeBSD so that I could use ZFS.
#4
Thanks to both of you for your input.

I'll have to see how to update the BIOS on this appliance. Sophos does not provide standalone update files; maybe it updates automatically when XG is installed. I'll have to try that.

I checked the C-states, and it seems that the CPU only supports C0/C1?


root@FWOPS01DEL:~ # sysctl -a | grep cx_
hw.acpi.cpu.cx_lowest: C1
dev.cpu.1.cx_method: C1/hlt
dev.cpu.1.cx_usage_counters: 130821134
dev.cpu.1.cx_usage: 100.00% last 89us
dev.cpu.1.cx_lowest: C1
dev.cpu.1.cx_supported: C1/1/0
dev.cpu.0.cx_method: C1/hlt
dev.cpu.0.cx_usage_counters: 251320055
dev.cpu.0.cx_usage: 100.00% last 32us
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_supported: C1/1/0
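
To make the `cx_supported` lines easier to read, the output can be split per CPU. This is a rough sketch: the function name `list_cstates` is made up, and reading each entry as state/type/latency in microseconds is my understanding of the FreeBSD ACPI CPU driver's format, not something confirmed in this thread.

```sh
#!/bin/sh
# Hypothetical helper: read "sysctl dev.cpu" style lines on stdin and print
# one line per C-state that each CPU reports in cx_supported.
list_cstates() {
    awk -F': ' '/^dev\.cpu\.[0-9]+\.cx_supported:/ {
        split($1, key, "[.]")              # key[3] is the CPU number
        n = split($2, states, " ")         # entries look like C1/1/0
        for (i = 1; i <= n; i++) {
            split(states[i], f, "/")       # assumed: state/type/latency
            printf "cpu%s %s type=%s latency=%sus\n", key[3], f[1], f[2], f[3]
        }
    }'
}

# Usage on the firewall:
#   sysctl dev.cpu | list_cstates
```

For the output above this yields a single C1 entry for each of the two CPUs, which matches the impression that nothing deeper than C1 is exposed.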
#5
Hi,
Unfortunately, my OPNsense firewall has been crashing randomly for some time now. I cannot identify a pattern. Sometimes it happens after a week or two, sometimes within 24 hours, mostly during low traffic (normal web surfing).
At first I thought it might be the combination of RAM disk and firewall logs, but the crashes continue to occur even after deactivating it.
CPU temperatures seem normal.

Hardware/Configuration:
OPNsense 22.1.8_1-amd64
Sophos XG 105
Intel Atom Processor E3930 @ 1.30GHz (2 cores, 2 threads)
2048MB RAM
4x Intel I211
64GB SSD (ZFS)
No CARP or IPS in use.

Installed Plugins:

os-acme-client
os-ddclient
os-dmidecode
os-dyndns
os-git-backup
os-hw-probe
os-iperf
os-mdns-repeater
os-smart
os-telegraf
os-theme-cicada
os-udpbroadcastrelay
os-vnstat
os-wireguard (+ kmod)


The following tunables are modified:

hw.ibrs_disable = 1
hw.igb.rx_process_limit = -1
hw.igb.tx_process_limit = -1
hw.mds_disable = 0
hw.pci.honor_msi_blacklist = 0
legal.intel_igb.license_ack = 1
net.inet.icmp.drop_redirect = 1
net.inet.ip.redirect = 0
vfs.zfs.arc_max = 256M
vm.pmap.pti = 0


I was able to record a crash message from the serial console. Unfortunately, I cannot post it in this message due to the character limit, but I uploaded it to my pastebin service and attached it as a file to this post.
https://paste.biocrafting.net/?ce2a1af0e2c5d868#FZUKBAbbQVpNkTaEyVsvc979ggYSfitZFNvNfZYR2njW

Does anybody have an idea what could be causing the crashes?

Best regards
#6
I run my XG105 with a Draytek Vigor 130. You can get one used for ~45€, and it is entirely sufficient for vectoring. If you don't need any of the Fritzbox's other features, I would sell it and come out with a small profit.

It works flawlessly for me, but unfortunately it is yet another extra device. As an SFP module, that would of course be a more elegant solution. It's a pity there is hardly anything like that available.