DEC697 vs virtual build / Inbound Performance difference

Started by lukas.liechti, December 13, 2025, 02:46:44 PM

December 13, 2025, 02:46:44 PM Last Edit: December 15, 2025, 09:18:14 AM by lukas.liechti Reason: better reading
Some quick information about what I had and what happened.

Before I bought the DEC697, I was running OPNsense in a VM (4 cores and 4 GB RAM) with a dedicated NIC (see below).
NIC: 2-port Gigabit Ethernet network card

The VM ran on Proxmox and I was getting 1 Gbit in and out without any problems. Only my VPN throughput was low (approx. 10 Mbit).
I assumed this was due to the encryption and so on, which does not work well in a VM.
This setup had been running for around a year, and I wanted better VPN performance and also to get the firewall out of the virtualization.
The VM firewall had also been running 25.7 for the last few months.


Now I have the DEC697, with its rated 5 Gbit of firewall throughput and 600 Mbit of VPN (IPsec).
I did the following:
  • Two days ago I installed and updated the firmware (to 25.7, via the serial interface).
  • Imported the backup from the old firewall (VM, Community edition) into the Community edition on the DEC697.
  • Reassigned the interfaces and adjusted the rules.
  • Confirmed the changes and was back online after a few minutes.
  • Ran a speed test and saw the following results.


[Screenshot: traffic inbound/outbound; first test 600 Mbit download, second 950 Mbit upload??]

[Screenshot: Virtual Machine / Performance]

[Screenshot: DEC697 / Performance]

How do I find the bottleneck here?

December 15, 2025, 07:10:08 PM #1 Last Edit: December 15, 2025, 07:17:51 PM by lukas.liechti
After some more research I found the problem. One question remains, though.


Official OPNsense docs:
https://docs.opnsense.org/troubleshooting/performance.html


net.isr.maxthreads = -1
net.isr.bindthreads = 1
Add another tunable. This time, we're allowing NIC drivers to use ISR queues.
net.isr.dispatch = deferred

Next up is to add tunables enabling RSS. (Note that net.inet.rss.bits should be set to the base-2 logarithm of how many cores you have.)
net.inet.rss.enabled = 1
net.inet.rss.bits = 2

For other systems (the DEC697 has 4 cores):
net.inet.rss.bits = x
  for 4-core systems, use '2'
  for 8-core systems, use '3'
  for 16-core systems, use '4'
  etc.
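
For reference, here is the whole set in one place as it would look on the DEC697 (in OPNsense these are added under System > Settings > Tunables; the sysctl check at the end is just standard FreeBSD, not something from the docs page):

net.isr.maxthreads = -1      # one netisr worker thread per CPU core
net.isr.bindthreads = 1      # pin each worker to its core
net.isr.dispatch = deferred  # queue packets instead of handling them in interrupt context
net.inet.rss.enabled = 1     # spread inbound flows across cores (RSS)
net.inet.rss.bits = 2        # log2(4 cores) = 2 for the DEC697

# after a reboot, verify the active values:
sysctl net.isr net.inet.rss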


Next up is working on latency, but that can wait a bit.
This can be closed :)

Quote from: lukas.liechti on December 15, 2025, 07:10:08 PM
Add another tunable. This time, we're allowing NIC drivers to use ISR queues.
net.isr.dispatch = deferred

Lukas, I was aware of the other tunables, but I did not find this particular one in the OPNsense docs, whether on the page you reference or in a search. I did find it offered by Gemini.

Are you able to comment further on the source for this one please, and its actual effects? My reading of the referenced page is that it may be unnecessary.
Deciso DEC697

The setting alters the way incoming packets (which are signaled via interrupts) are handled, namely directly or deferred: deferring puts a packet into a queue so that multiple packets can be handled more efficiently in one go rather than immediately. There are some more tunables under net.isr to limit how long the queue can get, and others:

net.isr.numthreads
net.isr.maxprot
net.isr.defaultqlimit
net.isr.maxqlimit
net.isr.bindthreads
net.isr.maxthreads
net.isr.dispatch

See this discussion: https://github.com/opnsense/core/issues/5415 and many others about the necessity of this setting (i.e. either deferred or hybrid) for ppp-type links.
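
As a side note (not from the linked discussion, just standard FreeBSD): you can inspect the active netisr policy and its queue statistics with netstat -Q. The per-workstream counters show whether packets are actually being dispatched directly or queued, which is a quick way to confirm the setting took effect:

# dump the netisr configuration plus per-protocol and per-workstream statistics
netstat -Q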

The general recommendation for PPPoE on WAN is to use:

net.isr.dispatch: deferred or hybrid
net.isr.maxthreads: -1
net.isr.bindthreads: 1

However, different NIC types also handle this differently. Some NICs already coalesce multiple packets into a single interrupt in hardware, so switching hardware can change the picture.
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

Thank you for the additional explanation, meyergru. From the links I conclude that it is a case of "test in your own environment". I already had maxthreads and bindthreads set, and have now set dispatch as well. I might redo the process with proper testing.
Deciso DEC697

When you have long-standing experience, you know how such low-level things work.

I remember that I once wrote a printer buffer for MS-DOS. The machine had a parallel port whose status told you whether it could accept the next byte. The default implementation of the "print one character" system call was to busy-wait for a clear status and only then output the character. This was a synchronous process.

Needless to say, with a non-concurrent OS like MS-DOS, the whole machine was blocked until the print job was finished. The remedy was to use a memory buffer queue and append every character there. In addition, a timed event checked whether the status was clear and sent as many bytes as it could from the buffer.

That way, if the buffer was large enough, you could "print" a job in virtually no time and continue working, while the real printing was done asynchronously in the background. There were also far fewer busy waits, so the overall overhead was reduced.

In reality, an attempt to send as many bytes as possible was also made at the end of the "print one character" call, but those bytes were of course taken from the buffer in order first.
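
If it helps to picture it, here is a minimal sketch of that pattern in modern C (the original was MS-DOS interrupt code; every name here is invented for illustration, and the "port" is simulated):

/* Buffered "printing": the print call only enqueues into a ring buffer,
 * and a separately triggered drain routine pushes out as many bytes as
 * the (simulated) port will accept right now. */
#include <stdio.h>
#include <stdbool.h>

#define BUF_SIZE 256

static char ring[BUF_SIZE];
static int head = 0, tail = 0;          /* head = next to send, tail = next free slot */

/* Simulated parallel-port status: ready for one byte every third poll. */
static bool port_ready(void) {
    static int tick = 0;
    return (++tick % 3) == 0;
}

static void port_write(char c) {
    putchar(c);                          /* stand-in for writing to the port */
}

/* The fast "print one character" call: just enqueue, no busy wait. */
static bool print_char(char c) {
    int next = (tail + 1) % BUF_SIZE;
    if (next == head)                    /* buffer full: caller must retry */
        return false;
    ring[tail] = c;
    tail = next;
    return true;
}

/* The deferred drain, originally run from a timer event: send as many
 * queued bytes as the port accepts, then return immediately. */
static void drain(void) {
    while (head != tail && port_ready()) {
        port_write(ring[head]);
        head = (head + 1) % BUF_SIZE;
    }
}

int main(void) {
    const char *job = "deferred output demo\n";
    for (const char *p = job; *p; p++)
        while (!print_char(*p))          /* enqueue the whole "print job" */
            drain();                     /* buffer full? make room first */
    while (head != tail)                 /* simulate the later timer ticks */
        drain();
    return 0;
}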

I guess you can see the similarity to net.isr.dispatch = "deferred". With "hybrid", the NIC interrupt is potentially handled immediately; that way you get the best of both worlds, because with "deferred" there can be small added latencies.

P.S.: I know - "war stories"... ;-)
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

I understand. I did similar for serial printer output (and keyboard input) on a CP/M machine, but that was a long time ago.
Deciso DEC697