DEC3920 Quick Review

Started by dirtyfreebooter, April 03, 2026, 10:25:45 PM

yea i did upgrade the i226 firmware on my VP2440 to 2.32. i would not upgrade the DEC3920 without Deciso's blessing, or unless i decide to keep the hardware. i am currently trying to run a bunch of iperf3 / packet-sender tests in an isolated setup, to see if i can reproduce it or if it's only the ONT connection.

i put the VP2440 back in use sunday night and of course it has worked flawlessly since then. 100% uptime on connection and ONT, etc

Quote from: newsense on April 07, 2026, 10:13:20 AM
Is the issue reproducible on other ports?

Can you swap igc0 with another port to see if the same drops happen?

If you can confirm the issue on two ports then the next step I'd try after confirming with Deciso would be to update the firmware on one of the ports to 2.31 which is the latest we have publicly available and see if the issue persists.

yea, in my latest tests, i am trying WAN on igc0 and igc1


Another thing I didn't see mentioned here:

Did you try disabling the auto negotiation on the ports ?

What switch is it connected to then?

igc (on other hardware) and Netgear was a bad mix on my end and we replaced the switch with a Cisco for that reason.


Cheers,
Franco

On my end I have two DEC750 with igc connected to a Netgear MS510TXPP stack and it's rock solid, even with auto negotiation (2.5gig is negotiated).

Router is a Fritzbox and both igc are also directly connected there, no link drops.

I guess it really depends...
Hardware:
DEC740

Quote from: franco on April 08, 2026, 11:16:32 AM
What switch is it connected to then?

igc (on other hardware) and Netgear was a bad mix on my end and we replaced the switch with a Cisco for that reason.


Cheers,
Franco

it was connected directly to my Calix 711 GE ONT, which is the same setup i have had for > 5 years with other igc based firewalls: odroid h4 ultra, aliexpress mini pcs, protectli vp2440, etc. but of course the DEC3920 is different and has a different i226 firmware than all the others.

so i put the DEC3920 back in lab mode, isolated with a linux server on both sides. i put over 10TB of traffic through, and not a single issue. i even put a 1gbe switch on the WAN side so that the 2.5g wan would autonegotiate @ 1 gbps.

so last night i pulled the power from the ONT for 5 minutes, i know, its dumb. put the DEC3920 back, powered back the ONT. going to see if that helps, as i was not able to reproduce the behavior. if it goes out again, i will try putting that dumb switch in between the ONT and WAN.

maybe some progress, but this is all just trial and error: try this, try that. after running a bunch of tests in the lab, i made some minor tweaks by iterating with AI over sysctls, feeding it the output of netstat -Q and sysctl dev.igc and retesting with iperf3 and packet-sender.

i put the DEC3920 back on my WAN, and at this point, i have had the longest uptime since i got the device.

# uptime
10:24AM  up 2 days, 1 hr, 5 users, load averages: 1.17, 0.68, 0.52

  • i pulled the power from the ONT for 5 minutes, before connecting the ONT to DEC3920.
  • tested all cables with Klein Tools VDV526-200 Cable Tester

i switched the WAN to igc1, whose interrupts are mapped to cpu 4,5,6,7
changed ax0 cpu offset to 1, it only has 3 interrupt queues, mapping it to cpus 1,2,3
leaving cpu 0 for system interrupts.

this was based off looking at the interrupt counters.
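
For anyone wanting to sanity-check a core offset before applying it, here is a minimal sketch of the mapping described above. It assumes a simplified model (one queue per core, starting at the offset and wrapping at the CPU count), which may not match everything iflib does. The function name `queue_cpus`, the CPU count of 8, and the offset of 4 for igc1 are assumptions made to reproduce the numbers in the post.

```shell
#!/bin/sh
# Hypothetical helper, not part of OPNsense or FreeBSD.
# queue_cpus OFFSET NQUEUES NCPU
# Prints the CPUs that NQUEUES queues would land on, assuming one queue
# per core starting at OFFSET and wrapping around at NCPU.
queue_cpus() {
    offset=$1; nqueues=$2; ncpu=$3
    q=0; out=""
    while [ "$q" -lt "$nqueues" ]; do
        cpu=$(( (offset + q) % ncpu ))
        out="${out}${out:+,}${cpu}"   # comma-separate after the first entry
        q=$((q + 1))
    done
    echo "$out"
}

queue_cpus 1 3 8   # ax0 with core_offset=1 and 3 queues (8 CPUs assumed)
queue_cpus 4 4 8   # igc1 with 4 queues starting at cpu 4 (offset assumed)
```

Under those assumptions the two calls print `1,2,3` and `4,5,6,7`, which leaves cpu 0 free as described. Comparing against the per-queue counters in `vmstat -i` is still the real check.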

now, i kinda think this is all ridiculous for 1 Gbps, when this system has 10g and 2.5g interfaces. I would think the CPU could just plow through 1 Gbps, but it doesn't. especially when you add ZenArmor and Wireguard into the mix.

Tunables

change the cpu offset from 0 to 1 for ax0, mapping tx queues to cpu 1, 2, 3
dev.ax.0.iflib.core_offset=1

igc tweaks
these come from looking at r_drops, r_stalls on the tx queues from sysctl dev.igc.1.iflib while running iperf3 at 1 Gbps

dev.igc.0.fc=0
dev.igc.1.enable_aim=0
dev.igc.1.fc=0
dev.igc.1.iflib.override_nrxds=4096
dev.igc.1.iflib.override_ntxds=4096
dev.igc.1.iflib.rx_budget=512
dev.igc.2.fc=0
dev.igc.3.fc=0
hw.igc.max_interrupt_rate=64000

added to help spread the packets over the igc queues, otherwise 1 cpu processes everything (from netstat -Q)
kern.ipc.nmbclusters=2000000
net.inet.ip.intr_queue_maxlen=4096
net.inet.rss.bits=3
net.inet.rss.enabled=1
net.isr.bindthreads=1
net.isr.maxthreads=-1

added by zenarmor
dev.netmap.buf_num=1000000
dev.netmap.ring_num=1024
dev.netmap.admode=2

things i added to prevent netmap queue full messages in dmesg
dev.netmap.generic_rings=4
dev.netmap.generic_ringsize=2048

factory defaults
hw.ibrs_disable=1
vm.pmap.pti=0
ice_ddp_load=YES
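
With this many tunables it is easy to lose track of which ones are actually live after a reboot. A minimal sketch of a checker follows, assuming the wanted values are kept in a file of `name=value` lines and the live values are captured in the usual `name: value` format that `sysctl` prints. The function name `check_tunables`, the file names, and the sample "actual" values are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of OPNsense or FreeBSD.
# check_tunables WANTED ACTUAL
#   WANTED: file with one name=value per line
#   ACTUAL: file with sysctl-style "name: value" lines
# Prints one line per tunable that is missing or differs.
check_tunables() {
    awk '
        { sub(/: /, "=") }              # normalise "name: value" to "name=value"
        !/=/ { next }                   # skip blank lines and notes
        NR == FNR { split($0, kv, "="); want[kv[1]] = kv[2]; next }
        { split($0, kv, "="); have[kv[1]] = kv[2] }
        END {
            for (k in want)
                if (!(k in have)) print k " missing"
                else if (have[k] != want[k]) print k " is " have[k] ", wanted " want[k]
        }' "$1" "$2" | sort
}

# Demo with made-up "actual" values (NOT taken from the DEC3920):
tmp=$(mktemp -d)
printf '%s\n' 'dev.igc.1.fc=0' 'dev.igc.1.iflib.rx_budget=512' 'net.inet.rss.enabled=1' > "$tmp/wanted"
printf '%s\n' 'dev.igc.1.fc: 3' 'dev.igc.1.iflib.rx_budget: 512' > "$tmp/actual"
check_tunables "$tmp/wanted" "$tmp/actual"
rm -r "$tmp"
```

On a live box the second file could come from something like `sysctl $(cut -d= -f1 wanted) > actual`. Loader-only entries such as ice_ddp_load are not sysctl OIDs, so they would show up as missing and need to be checked separately.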

i wish i could just turn off ASPM for the igc NICs in the BIOS/firmware.

another alternative i thought of was to just use ax1 for WAN with an SFP to RJ45 adapter and ignore the i226 ports.

Any idea what "errors out" indicates? (Looks like "oerrs" from "netstat -i".) Does it increment during use? Speculating: Could be something as innocuous as cosmetic buffer evictions. I don't know of much that Ethernet hardware can detect on transmit, so I would tend to suspect a driver-level issue. If you want to look at it, a managed switch (inserted into the link) might show useful data... or not.

Quote from: pfry on April 10, 2026, 07:49:11 PM
Any idea what "errors out" indicates? (Looks like "oerrs" from "netstat -i".) Does it increment during use? Speculating: Could be something as innocuous as cosmetic buffer evictions. I don't know of much that Ethernet hardware can detect on transmit, so I would tend to suspect a driver-level issue. If you want to look at it, a managed switch (inserted into the link) might show useful data... or not.

Name        Mtu Network           Address                           Ipkts Ierrs Idrop      Opkts Oerrs  Coll
...
vlan0.201  1500 <Link#16>         f4:90:ea:01:ef:cd             444345350     0     0  277320308   767     0

though, 767 errors out of 277,320,308 packets.. that is probably in the range of normal i would think...
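
To put a number on that ratio, a small sketch that turns the Oerrs and Opkts counters into errors per million packets. The function name `oerr_ppm` is made up. The inputs are the two values from the netstat -i line above.

```shell
#!/bin/sh
# Hypothetical helper, not part of OPNsense or FreeBSD.
# oerr_ppm OERRS OPKTS
# Prints output errors per million transmitted packets, or "n/a" when
# no packets have been sent yet.
oerr_ppm() {
    awk -v e="$1" -v p="$2" 'BEGIN {
        if (p == 0) { print "n/a"; exit }
        printf "%.2f\n", e / p * 1000000
    }'
}

oerr_ppm 767 277320308   # counters from the vlan0.201 line
```

That works out to about 2.77 errors per million packets over the lifetime of the counters. A lifetime average hides whether the errors trickle in or arrive in bursts, so it is only a starting point.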

i haven't been monitoring that because, well, before, my WAN would just go out completely without any message until i physically unplugged the WAN cable (no amount of ifdown/ifup ever fixed it). so i have just been waiting for the WAN to go out and not watching things extremely closely, since before that i got 5x WAN outages in 1.5 days.

Quote from: dirtyfreebooter on April 10, 2026, 09:55:10 PM
though, 767 errors out of 277,320,308 packets.. that is probably in the range of normal i would think...

Yet, any number significantly greater than zero in production would spark my curiosity. With Gigabit and above full duplex, flow control, etc. layer 2 errors should just not happen.

If you reboot the connected switch for an update or unplug and replug the cable(s) while the firewall is in full operation - yes. But you should be able to name the cause easily because it was something like that. As soon as the interface goes down no output errors should occur, either. So in these reboot/unplug cases they come from the fraction of a second when the other side is disconnected and OPNsense has not yet noticed that.

On my system I have single digit numbers probably from my last Mikrotik update. If 767 errors out of 277,320,308 fits that bill in your scenario is yours to judge.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

April 10, 2026, 10:30:05 PM #41 Last Edit: April 10, 2026, 10:46:35 PM by dirtyfreebooter
Quote from: Patrick M. Hausen on April 10, 2026, 10:16:40 PM
Quote from: dirtyfreebooter on April 10, 2026, 09:55:10 PM
though, 767 errors out of 277,320,308 packets.. that is probably in the range of normal i would think...

Yet, any number significantly greater than zero in production would spark my curiosity. With Gigabit and above full duplex, flow control, etc. layer 2 errors should just not happen.

If you reboot the connected switch for an update or unplug and replug the cable(s) while the firewall is in full operation - yes. But you should be able to name the cause easily because it was something like that. As soon as the interface goes down no output errors should occur, either. So in these reboot/unplug cases they come from the fraction of a second when the other side is disconnected and OPNsense has not yet noticed that.

On my system I have single digit numbers probably from my last Mikrotik update. If 767 errors out of 277,320,308 fits that bill in your scenario is yours to judge.

yea, i am going to watch these values from now on, i had the ONT powered off when i booted the DEC3920, then after it was booted, i restored power, so those could have all been from that initial power on of the ONT, maybe the port flaps while initializing, who knows how this cheapo ONT reacts when initializing.

IMO, i still think this is related to i226 firmware/driver and ASPM. looking at my VMs i do some opnsense development on, they all have zero errors, but much less data going through and they are not connected to the cheapest hardware i own, which is the ONT. i wish intel would bring the freebsd igc driver up to par with linux, as i have never had a single issue with i226 and linux.

i will give this about a week, then i might try using the other SFP port with a UniFi 1G SFP to RJ45 adapter (it is listed among the supported SFP transceivers), just to compare.

This is a commercial device with great performance especially in relation to power consumption, well worth the money, IMHO, but still a commercial solution. So if ASPM is an issue, it's Deciso's job to provide a BIOS that either categorically disables it or at least gives you the option to.

Just saying ...

Not a big fan of 2.5 Gbit/s anyway. 1 G is enough for all connected desktop systems and 10 G fibre is just better than anything else up to that speed.

ok it already increased...

vlan0.201  1500 <Link#16>         f4:90:ea:01:ef:cd             446025589     0     0  278611733  1609     0
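
To see how much changed between the two samples, here is a minimal sketch that diffs two netstat -i lines for the same interface. The function name `oerr_delta` is made up. It reads Opkts and Oerrs relative to the end of the line, on the assumption that the last three columns are always Opkts, Oerrs, Coll as in the header posted earlier.

```shell
#!/bin/sh
# Hypothetical helper, not part of OPNsense or FreeBSD.
# oerr_delta: reads two netstat -i lines for one interface on stdin
# (older sample first) and prints the change in Opkts and Oerrs.
# Assumes the last three columns are Opkts, Oerrs, Coll.
oerr_delta() {
    awk 'NR == 1 { p0 = $(NF-2); e0 = $(NF-1) }
         NR == 2 { printf "opkts=%d oerrs=%d\n", $(NF-2) - p0, $(NF-1) - e0 }'
}

printf '%s\n' \
  'vlan0.201  1500 <Link#16>  f4:90:ea:01:ef:cd  444345350 0 0 277320308  767 0' \
  'vlan0.201  1500 <Link#16>  f4:90:ea:01:ef:cd  446025589 0 0 278611733 1609 0' |
  oerr_delta
```

For the two samples in this thread that is 842 new errors across 1,291,425 packets, roughly 650 per million, far denser than the lifetime average of under 3 per million. That suggests the errors arrive in bursts rather than as steady background noise.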

it did seem to happen on the WAN DHCP renewal.. i have quantum fiber, the DHCP leases are for 30 min, so it RENEWs every 15 minutes...

2026-04-10T14:55:01-06:00 Notice dhclient dhclient-script: Reason RENEW on vlan0.201 executing
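
The 15 minute renewal interval matches the DHCP defaults from RFC 2131, where the client renews at T1 = 50% of the lease and starts rebinding at T2 = 87.5%. Here is a small sketch of that arithmetic. The function name `dhcp_timers` is made up, and it assumes the ISP's server does not override the timers with options 58/59.

```shell
#!/bin/sh
# Hypothetical helper, not part of OPNsense or FreeBSD.
# dhcp_timers LEASE_SECONDS
# Prints the default RFC 2131 renewal (T1) and rebinding (T2) times in
# seconds, assuming the server does not send explicit T1/T2 values.
dhcp_timers() {
    lease=$1
    echo "T1=$(( lease / 2 )) T2=$(( lease * 7 / 8 ))"
}

dhcp_timers 1800   # 30 minute lease
```

For a 30 minute lease this gives T1=900 (15 min) and T2=1575 (26 min 15 s). If the Oerrs jumps line up with the RENEW timestamps in the log, sampling netstat -i just before and after each T1 would confirm it.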

so i am going to see if there is anything going on there when it happens. i guess i could also put my VP2440 back in service. i never looked at these values before, so i have no clue if this is normal for my setup or was happening all along. the DEC3920 WAN going completely out was new though..

Then again an ONT is not a switch - so as long as the numbers are zero or close to it for everything internal, it's quite plausible it's not Deciso's fault. I would insist on getting a statement on that ASPM setting, though. I don't have their latest generation of devices, never had a problem with the 1 G versions.