Traffic (DSCP) priority: Normalization, shaper, or interface rules?

Started by Lakkiada, November 05, 2024, 08:25:59 PM

Greetings! Here's what I'm attempting to accomplish:

I want QoS/SQM enabled to fight bufferbloat, eliminating lag, stuttering and buffering in VoIP/conference calls on work machines and on game systems such as the Nintendo Switch and PS5. I also want to prioritize work and game machines over IoT devices, smartphones, etc.

Here's where I'm at:

The shaper is enabled with FQ_CoDel to combat bufferbloat, and it works as expected: I get an A+ on the Waveform test, as well as on Cloudflare, Ookla, etc.

Where I'm running into trouble is that I cannot get OPNsense to actually prioritize traffic to/from specific machines above other traffic, e.g. IoT devices. The IoT VLAN is already flagged as lowest priority under Interfaces > Other Types > VLAN ("PCP = Background (1, lowest)"). In the shaper config, queue weights are used, and the rules apply a DSCP value.

I have also made rules in Firewall > Settings > Normalization that mimic the shaper rules (same direction/source/destination, DSCP value, etc.) to raise work machines to CS5 and lower IoT to default (0x00).

My question is this:

What is the correct way to do this? Do normalization rules override shaper rules? Do you need both? Do DSCP values also need to be set in the interface (LAN/WAN/Floating) rules?

I'm on an asymmetrical 600/20 coax line; however, heaven forbid someone starts a Netflix stream on an iPad, because then call and game latency is all over the place. Any help, tips or advice is appreciated 8)

I think there is a bit of a misconception about how this stuff works.

Quote
The IoT VLAN is already flagged as lowest priority under Interfaces > Other Types > VLAN ("PCP = Background (1, lowest)").

This is an L2 feature; to make use of it, a switch would have to send a frame with the PCP value to OPNsense.


Quote
In the shaper config, queue weights are used, and the rules apply a DSCP value.

Rules in the shaper only match, i.e. classify; they do not mark a packet with a specific DSCP value.

Quote
I have also made rules in Firewall > Settings > Normalization that mimic the shaper rules (same direction/source/destination, DSCP value, etc.) to raise work machines to CS5 and lower IoT to default (0x00).

If I am not wrong, this does mark the packet with the specified DSCP value, which the shaper can then classify on.

However, there is no reason to do this, as you can classify based on the 5-tuple. If another device marked the packets before they hit OPNsense, or an application on an endpoint did it, you could then use OPNsense rules that match on DSCP values.
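To illustrate classifying on the 5-tuple alone, here is a minimal Python sketch of first-match, shaper-style rules where a field left empty means "any". The `Packet`/`Rule` types, queue names and addresses are hypothetical; this mimics the idea only and is not how OPNsense or ipfw evaluates rules internally.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    proto: str
    sport: int
    dport: int


@dataclass(frozen=True)
class Rule:
    """One shaper-style rule; a field left as None means "any"."""
    queue: str
    src: Optional[str] = None    # CIDR, e.g. "192.168.1.10/32"
    dst: Optional[str] = None
    proto: Optional[str] = None
    dport: Optional[int] = None

    def matches(self, p: Packet) -> bool:
        if self.src and ip_address(p.src) not in ip_network(self.src):
            return False
        if self.dst and ip_address(p.dst) not in ip_network(self.dst):
            return False
        if self.proto and p.proto != self.proto:
            return False
        if self.dport is not None and p.dport != self.dport:
            return False
        return True


def classify(rules: list[Rule], p: Packet, default: str = "default-queue") -> str:
    """Return the queue of the first matching rule (first-match semantics)."""
    for rule in rules:
        if rule.matches(p):
            return rule.queue
    return default


# Hypothetical upload rules: destination is "any" because the remote end is unknown.
RULES = [
    Rule(queue="up-work", src="192.168.1.10/32"),   # work desktop
    Rule(queue="up-iot", src="192.168.30.0/24"),    # IoT VLAN
]

if __name__ == "__main__":
    call = Packet("192.168.1.10", "203.0.113.5", "udp", 50000, 3478)
    print(classify(RULES, call))  # up-work
```

No DSCP value is read or written anywhere here; the queue choice comes straight from address, protocol and port.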


Quote
Where I'm running into trouble is that I cannot get OPNsense to actually prioritize traffic to/from specific machines above other traffic, e.g. IoT devices.

If you use FQ_CoDel, i.e. Flow Queueing with Controlled Delay, this is expected behavior. It is also called Fair Queueing with Controlled Delay.

https://docs.opnsense.org/manual/how-tos/shaper_bufferbloat.html
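For background on the "Controlled Delay" part: CoDel (RFC 8289) watches how long each packet waited in the queue (its sojourn time). Once that delay has stayed above a target (5 ms by default) for a full interval (100 ms by default), it drops a packet and schedules the next drop at interval / sqrt(count), so drops get more frequent until the delay falls back under target. The Python below is a simplified sketch of that control law only; it leaves out parts of the RFC (such as the MTU check and count hysteresis) and is not the FreeBSD dummynet implementation.

```python
import math

TARGET = 0.005    # seconds: acceptable standing queue delay (RFC 8289 default, 5 ms)
INTERVAL = 0.100  # seconds: sliding window, roughly a worst-case RTT (default, 100 ms)


class CoDelSketch:
    """Simplified CoDel drop decision; call should_drop() once per dequeued packet."""

    def __init__(self) -> None:
        self.first_above_time = 0.0  # 0.0 = delay currently below target
        self.dropping = False
        self.drop_next = 0.0
        self.count = 0

    def control_law(self, t: float) -> float:
        # Drops get closer together as count grows: interval / sqrt(count)
        return t + INTERVAL / math.sqrt(self.count)

    def should_drop(self, now: float, sojourn: float) -> bool:
        if sojourn < TARGET:
            # Queue delay is acceptable again: reset and stop dropping
            self.first_above_time = 0.0
            self.dropping = False
            return False
        if self.first_above_time == 0.0:
            # Delay just went above target: only act if it stays there a full interval
            self.first_above_time = now + INTERVAL
            return False
        if not self.dropping:
            if now >= self.first_above_time:
                self.dropping = True
                self.count = 1
                self.drop_next = self.control_law(now)
                return True
            return False
        if now >= self.drop_next:
            self.count += 1
            self.drop_next = self.control_law(self.drop_next)
            return True
        return False
```

Both `now` and `sojourn` are in seconds. The key design point is that CoDel reacts to how long packets wait, not to how many are queued, which is why it keeps latency low regardless of link speed.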


FQ_C will divide the BW capacity equally amongst all endpoints (strictly, amongst flows; this assumes one busy flow per host). Let's say you have 600 Mbps download: if 3 hosts want to go as fast as possible, each of them will get about 200 Mbps, etc.
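The "equal share" arithmetic is max-min fairness: capacity is split evenly, and a flow that needs less than its share keeps only what it needs, leaving the rest to the others. A Python sketch of that idealized fluid model (assuming one flow per host, values in Mbps; the real scheduler approximates this packet by packet):

```python
def max_min_fair(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Water-filling: split capacity evenly, capping each flow at its demand."""
    alloc: dict[str, float] = {}
    remaining = capacity
    pending = dict(demands)
    while pending:
        share = remaining / len(pending)
        # Flows that want no more than the current equal share are fully satisfied
        satisfied = {f: d for f, d in pending.items() if d <= share}
        if not satisfied:
            # Everyone left wants more than the share: split what remains evenly
            for f in pending:
                alloc[f] = share
            return alloc
        for f, d in satisfied.items():
            alloc[f] = d
            remaining -= d
            del pending[f]
    return alloc


inf = float("inf")
print(max_min_fair(600, {"a": inf, "b": inf, "c": inf}))
# {'a': 200.0, 'b': 200.0, 'c': 200.0}
print(max_min_fair(600, {"game": 0.5, "netflix": inf, "ipad": inf}))
# {'game': 0.5, 'netflix': 299.75, 'ipad': 299.75}
```

The second call shows that a light flow, such as a 0.5 Mbps game session, gets everything it asks for; fairness is about shares, not about which flow is more important.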


If you still have performance issues with FQ_C while the WAN is congested, you can tune it further. Keep in mind that one of the most important things when using FQ_C is to set the BW properly in the shaper > pipe.
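Why the pipe BW matters: the shaper can only manage the queue if OPNsense, not the modem, is the bottleneck, so the pipe is usually set somewhat below the measured line rate. A tiny Python helper to make that concrete; the 90% default is an illustrative starting point (an assumption, not a value from this thread), to be adjusted while re-running the bufferbloat tests.

```python
def pipe_bandwidth(measured_mbps: float, factor: float = 0.90) -> float:
    """Suggested shaper pipe bandwidth in Mbit/s, as a fraction of measured throughput."""
    if not 0 < factor <= 1:
        raise ValueError("factor must be in (0, 1]")
    return round(measured_mbps * factor, 1)


# Nominal 600/20 line; in practice use your own measured speeds
down = pipe_bandwidth(600)
up = pipe_bandwidth(20)
print(down, up)  # 540.0 18.0
```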

If you want to do BW prioritization, you have to use WFQ, not FQ_C.
Networking is love. You may hate it, but in the end, you always come back to it.

OPNSense HW
APU2D2 - deceased
N5105 - i226-V | Patriot 2x8G 3200 DDR4 | L 790 512G - VM HA(SOON)
N100   - i226-V | Crucial 16G  4800 DDR5 | S 980 500G - PROD

Thank you for the clarification on some of the feature sets.

Quote
If I am not wrong, this does mark the packet with the specified DSCP value, which the shaper can then classify on.

However, there is no reason to do this, as you can classify based on the 5-tuple. If another device marked the packets before they hit OPNsense, or an application on an endpoint did it, you could then use OPNsense rules that match on DSCP values.

The issue is on the asymmetrical WAN connection. On the LAN side, everything is connected at 1 Gbps full duplex and I see no issues to/from the media server (Jellyfin), etc. I have a Netgear 308gs that is currently configured for port-based priority (high, medium, low) but does allow DSCP values.

All the rules in OPNsense are based on 5-tuple info, except that, given the nature of the usage, "any" must be used for either the source or the destination because it is unknown (no set server, etc.), and there seems to be an issue if ports or IPs are not specifically declared, as seen here: https://forum.opnsense.org/index.php?topic=24756.0

Quote
If you still have performance issues with FQ_C while the WAN is congested, you can tune it further. Keep in mind that one of the most important things when using FQ_C is to set the BW properly in the shaper > pipe.

If you want to do BW prioritization, you have to use WFQ, not FQ_C.

In all of my testing with QoS setups, WFQ does not mitigate bufferbloat, and latency is all over the place for real-time traffic. FQ_CoDel does, and definitely helps when the house is busy.

However, the issue is actually prioritizing individual devices for throughput priority, NOT bandwidth sharing (i.e. all traffic from my desktop gets processed first). That is why I need to flag specific things as voice, etc. Bandwidth is not the issue.

For contrast: DD-WRT QoS allows FQ_CoDel (with HTB or HFSC) to be used directly in conjunction with priority based on traffic type, IP address, MAC address, etc., and can very simply accomplish my goal, all on one page. Unless I have misinterpreted what is happening there and it is merely bandwidth sharing as well.

So my original question remains: what is the correct way to do this in OPNsense?

To further illustrate my point, most of the applications in use (even real-time game traffic) only use between 250 and 500 Kbps. However, if anyone starts a show on Netflix or any other streaming service, that traffic simply trashes the throughput of everything else. Given today's culture, with so many people working from home, I imagine this is a very common request and/or scenario.

Am I missing something silly, or is a second device/another shaper needed to accomplish per-machine MAC/IP priority?

The short answer to your question is no, not with FQ_C.

The long answer >

Quote from: Lakkiada on November 06, 2024, 12:14:53 PM
Thank you for the clarification on some of the feature sets.

The issue is on the asymmetrical WAN connection. On the LAN side, everything is connected at 1 Gbps full duplex and I see no issues to/from the media server (Jellyfin), etc. I have a Netgear 308gs that is currently configured for port-based priority (high, medium, low) but does allow DSCP values.

All the rules in OPNsense are based on 5-tuple info, except that, given the nature of the usage, "any" must be used for either the source or the destination because it is unknown (no set server, etc.), and there seems to be an issue if ports or IPs are not specifically declared, as seen here: https://forum.opnsense.org/index.php?topic=24756.0


The point is that DSCP as such doesn't do any prioritization; it is only used to classify the traffic. The schedulers and queue management do the prioritizing. You only use the DSCP value or the 5-tuple to decide which packet goes where. Usually, when you have a large network with several routers, you define per 5-tuple which packets belong to which DSCP value, because this is then simpler to configure across the network: you mark the packet at the start of its journey (the "GW"), and on the rest of the routers you classify per DSCP. The same goes for applications: if they already set a DSCP value in the packet, you can use it to classify them into the proper queue.
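The "mark once at the edge, classify by DSCP everywhere else" pattern as a small Python sketch. The host addresses, DSCP choices and queue names are hypothetical. Only the edge function knows about individual hosts; downstream hops need just a short DSCP-to-queue table, and neither step prioritizes anything until a scheduler services those queues.

```python
from dataclasses import dataclass, replace

BE, CS5, EF = 0, 40, 46  # standard DSCP code points (decimal)


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dscp: int = BE


# Edge / gateway: the only place that knows which hosts matter (hypothetical IPs)
EDGE_MARKING = {"192.168.1.10": CS5, "192.168.1.20": CS5}


def mark_at_edge(p: Packet) -> Packet:
    """Set DSCP for known hosts; otherwise keep whatever the application set."""
    if p.src in EDGE_MARKING:
        return replace(p, dscp=EDGE_MARKING[p.src])
    return p


# Every other hop: a short, generic DSCP -> queue table
DSCP_TO_QUEUE = {EF: "voice", CS5: "interactive"}


def classify_downstream(p: Packet) -> str:
    return DSCP_TO_QUEUE.get(p.dscp, "bulk")


if __name__ == "__main__":
    desktop = mark_at_edge(Packet("192.168.1.10", "198.51.100.7"))
    softphone = mark_at_edge(Packet("192.168.1.60", "198.51.100.9", dscp=EF))
    ipad = mark_at_edge(Packet("192.168.1.50", "198.51.100.8"))
    print(classify_downstream(desktop), classify_downstream(softphone), classify_downstream(ipad))
    # interactive voice bulk
```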


Quote
In all of my testing with QoS setups, WFQ does not mitigate bufferbloat, and latency is all over the place for real-time traffic. FQ_CoDel does, and definitely helps when the house is busy.

However, the issue is actually prioritizing individual devices for throughput priority, NOT bandwidth sharing (i.e. all traffic from my desktop gets processed first). That is why I need to flag specific things as voice, etc. Bandwidth is not the issue.

For contrast: DD-WRT QoS allows FQ_CoDel (with HTB or HFSC) to be used directly in conjunction with priority based on traffic type, IP address, MAC address, etc., and can very simply accomplish my goal, all on one page. Unless I have misinterpreted what is happening there and it is merely bandwidth sharing as well.

So my original question remains: what is the correct way to do this in OPNsense?

This is the expected behavior: WFQ is not an SQM or AQM. FQ_C is designed to combat bufferbloat. As mentioned, you cannot do prioritization with FQ_C, as it shares the BW equally across hosts.

There is no such thing as throughput priority. There is, however, queue priority: usually, a priority queue sends its packets before all other queues. In FQ_C there is no priority queue. FQ_C is a scheduler (FQ scheduler + CoDel AQM) that creates sub-queues invisible to the user, i.e. flows per 5-tuple. It's a bit funny, because what you actually have is:

Queue > Sub-queues/flows (CoDel) > Scheduler (FQ) > Pipe

The first queue is the one you can configure and set up via Queues; the flows are created and managed by FQ_C. These sub-queues are where the bufferbloat magic happens, as described in the docs.
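The Queue > flows > FQ scheduler layering can be sketched in Python: each 5-tuple gets its own sub-queue, and a deficit round robin (DRR) loop serves the sub-queues so each flow gets a roughly equal share of bytes. This is a simplified model, not dummynet's code: flows are keyed directly by the 5-tuple (real FQ_CoDel, per RFC 8290, hashes into a fixed number of buckets and keeps new/old flow lists), the 1514-byte quantum is an assumption, and the per-flow CoDel step is left out.

```python
from collections import deque

QUANTUM = 1514  # bytes of credit a flow gets per round (assumed: one full Ethernet frame)


class FlowQueueDRR:
    """Per-flow sub-queues served by deficit round robin (no CoDel, no new/old lists)."""

    def __init__(self) -> None:
        self.flows: dict = {}          # 5-tuple -> deque of (size_bytes, label)
        self.deficit: dict = {}        # 5-tuple -> remaining byte credit
        self.active: deque = deque()   # round-robin order of non-empty flows

    def enqueue(self, five_tuple: tuple, size: int, label: str) -> None:
        q = self.flows.setdefault(five_tuple, deque())
        if not q:
            # Flow (re)appears: give it a fresh deficit and a place in the round
            self.deficit[five_tuple] = 0
            self.active.append(five_tuple)
        q.append((size, label))

    def dequeue(self):
        while self.active:
            flow = self.active[0]
            q = self.flows[flow]
            size, label = q[0]
            if self.deficit[flow] < size:
                # Not enough credit: add one quantum and move to the back of the round
                self.deficit[flow] += QUANTUM
                self.active.rotate(-1)
                continue
            self.deficit[flow] -= size
            q.popleft()
            if not q:
                self.active.popleft()  # flow is empty: leave the round
            return label
        return None  # nothing queued


if __name__ == "__main__":
    fq = FlowQueueDRR()
    bulk = ("192.168.1.50", "198.51.100.8", "tcp", 51000, 443)  # e.g. a stream
    voip = ("192.168.1.10", "203.0.113.5", "udp", 50000, 3478)  # e.g. a call
    for i in range(3):
        fq.enqueue(bulk, 1500, f"bulk-{i}")
    for i in range(3):
        fq.enqueue(voip, 200, f"voip-{i}")
    print([fq.dequeue() for _ in range(6)])
    # ['bulk-0', 'voip-0', 'voip-1', 'voip-2', 'bulk-1', 'bulk-2']
```

Small packets from a light flow get through quickly because they fit inside one quantum, but there is no notion of one flow being more important than another.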

HFSC can configure a priority queue or a rate limit. These use the FreeBSD ALTQ framework. However, it is not possible to configure them via the GUI (or at least I didn't see the option). Nor is it possible to use HFSC together with FQ_CoDel on OPNsense; you can only have one scheduler.
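For contrast with FQ, this is what "priority queue" means in the sense used above: the scheduler always serves the highest-priority non-empty queue first. A generic Python sketch of plain strict priority (PRIQ-style); it is not HFSC's service-curve algorithm and not something OPNsense exposes.

```python
from collections import deque


class StrictPriority:
    """Dequeue always serves the highest-priority non-empty queue (0 = highest)."""

    def __init__(self, levels: int) -> None:
        self.queues = [deque() for _ in range(levels)]

    def enqueue(self, pkt: str, prio: int) -> None:
        self.queues[prio].append(pkt)

    def dequeue(self):
        for q in self.queues:
            if q:
                return q.popleft()
        return None  # all queues empty


sched = StrictPriority(levels=2)
for i in range(3):
    sched.enqueue(f"netflix-{i}", prio=1)
sched.enqueue("voip-0", prio=0)  # arrives last, leaves first
print([sched.dequeue() for _ in range(4)])
# ['voip-0', 'netflix-0', 'netflix-1', 'netflix-2']
```

The catch with plain strict priority is that a busy high-priority class can starve everything below it, which is why such queues are normally paired with a rate limit.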



Based on this you have 3 options:
1. Tune FQ_C properly. The tests on the internet are not 100% bulletproof; they cannot cover all edge scenarios.

2. Use FQ_C, but try to classify traffic into separate Queues. A single Queue per DL and UP is usually enough, but in certain edge cases it can become a bottleneck before traffic reaches the FQ_C flows. Creating manual Queues and classifying traffic into them can help.

3. If you care about prioritization, you can go one step back: use WFQ as the scheduler and turn on CoDel in the Queues. This may not perform as well as FQ_C, but you will get prioritization based on weights.

First and foremost, @Seimus, thank you for staying patient with me and not interpreting my questions as argumentative or getting frustrated. I greatly appreciate the explanations; you have helped clarify some of the things I've read, such as FQ_CoDel ignoring queue weights, which also explains why I haven't been able to accomplish the goal.

Quote
3. If you care about prioritization, you can go one step back: use WFQ as the scheduler and turn on CoDel in the Queues. This may not perform as well as FQ_C, but you will get prioritization based on weights.

Thanks again for pointing me in the right direction and clarifying information in my brain ::)

Okie dokie: at least in initial testing, WFQ does not alleviate the issues at all. Latency spikes through the roof and it is nearly unusable for real-time traffic (even with CoDel enabled, weights configured, etc.).

Since we cannot configure an AQM via the GUI, how can this be done via the console? Any resources to point me to?

It seems to me that perhaps this is a feature the dev team should consider adding. We can fine-tune FQ_CoDel, yet have zero options regarding the mechanisms it relies on... just sayin' :)

In the Pipe you configure the scheduler; the weights are configured in the Queues.

Do not enable CoDel in the Pipe, because CoDel in the Pipe only applies to dynamic queues, i.e. when you don't use manually created Queues.

So do this:

Pipe:
- Configure BW
- Scheduler WFQ
- Everything else blank

Queues:
- Create as many Queues as you need, one per specific service
- Set proper weights: the higher the weight, the bigger the share of BW the classified traffic will get. Imagine a weight as a ratio of the total BW configured in the Pipe
- Create separate DL and UP Queues
- Enable Codel on the Queues

Rules:
- Create as many Rules as you need, one per Queue, to classify the packets into that specific Queue
- Create separate DL and UP Rules to classify the packet to the specific Queues
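To make "weight as a ratio of the pipe BW" concrete: under WFQ, each backlogged queue gets roughly its weight divided by the sum of the weights of the queues that currently have traffic. A Python sketch of that steady-state arithmetic, with hypothetical queue names and weights on an assumed 20 Mbit/s upload pipe (it ignores packet-level timing entirely):

```python
def wfq_shares(pipe_mbps: float, weights: dict[str, int], busy: set[str]) -> dict[str, float]:
    """Approximate long-run bandwidth per backlogged queue under WFQ."""
    total = sum(w for q, w in weights.items() if q in busy)
    if total == 0:
        return {}  # nothing is sending
    return {q: pipe_mbps * w / total for q, w in weights.items() if q in busy}


# Hypothetical upload queues and weights
WEIGHTS = {"up-voip": 60, "up-work": 30, "up-iot": 10}

print(wfq_shares(20, WEIGHTS, busy={"up-voip", "up-work", "up-iot"}))
# {'up-voip': 12.0, 'up-work': 6.0, 'up-iot': 2.0}
print(wfq_shares(20, WEIGHTS, busy={"up-work", "up-iot"}))
# {'up-work': 15.0, 'up-iot': 5.0}
```

Idle queues drop out of the sum, so their share is redistributed, and a weight buys a share of bandwidth rather than a guarantee that those packets are sent first.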


Go thru > https://docs.opnsense.org/manual/how-tos/shaper_prioritize_using_queues.html

Regards,
S.

Quote
Since we cannot configure an AQM via the GUI, how can this be done via the console? Any resources to point me to?

You can configure AQM via the GUI; it's either FQ_CoDel or CoDel.

Regards,
S.

Quote from: Seimus on November 08, 2024, 04:01:52 PM
In Pipe you configure the scheduler, the Weights are configured in the Queues.

[...]

This is exactly the current testing config. No go. Latency is much higher and very inconsistent.

Excerpt from: https://openwrt.org/docs/guide-user/network/traffic-shaping/sqm

Quote
"SQM is an integrated system that performs per-packet/per-flow network scheduling, active queue management (AQM), traffic shaping, rate limiting, and QoS prioritization. In comparison, "classic" AQM only manages queue length and "classic" QoS only does prioritization."

Excerpt from: https://www.bufferbloat.net/projects/cerowrt/wiki/Smart_Queue_Management/

Quote
"Smart Queue Management", or "SQM" is shorthand for an integrated network system that performs better per-packet/per flow network scheduling, active queue length management (AQM), traffic shaping/rate limiting, and QoS (prioritization).

"Classic" QoS does prioritization only.

"Classic" AQM manages queue lengths only.

"Classic" packet scheduling does some form of fair queuing only.

"Classic" traffic shaping and policing sets hard limits on queue lengths and transfer rates

"Classic" rate limiting sets hard limits on network speeds.

In reading: https://gist.github.com/bradoaks/940616

It appears that what I want to adjust in OPNsense, which is currently not included in the GUI, is the "congestion avoidance algorithms" (HFSC vs. HTB, etc.). Is that correct, or have I misunderstood some terminology or definitions here?

This is not fully true >
Quote
In comparison, "classic" AQM only manages queue length and "classic" QoS only does prioritization."
FQ_C does per-flow, per-packet scheduling and queue management.

Anyway,

What they call SQM, i.e. OpenWrt SQM, is basically HTB + FQ_C.

This is a special take on bufferbloat where they combined HTB + FQ_C. It was created to overcome some shortcomings that FQ_C has, such as rate limiting.
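The rate-limiting building block that HTB arranges into a hierarchy is the token bucket. A minimal single-bucket Python sketch follows; it is not HTB (no classes, no borrowing between them), and the 20 Mbit/s rate and 3000-byte burst are illustrative values.

```python
class TokenBucket:
    """Allows traffic at rate_bps on average, with bursts up to burst_bytes."""

    def __init__(self, rate_bps: float, burst_bytes: float) -> None:
        self.rate = rate_bps / 8.0      # refill speed in bytes per second
        self.capacity = burst_bytes
        self.tokens = burst_bytes       # start with a full bucket
        self.last = 0.0

    def allow(self, now: float, size: int) -> bool:
        # Refill for the time elapsed since the last check, capped at the burst size
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False  # a shaper would hold the packet here; a policer would drop it


tb = TokenBucket(rate_bps=20_000_000, burst_bytes=3000)
print([tb.allow(0.0, 1500) for _ in range(3)])  # [True, True, False]
```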

OpenWrt SQM is not available in FreeBSD. HTB and HFSC are not available in OPNsense because, I think, ALTQ is not supported in OPNsense.

The closest you can get to what OpenWrt SQM offers is to use WFQ or another weighted scheduler with CoDel in the Queues. But WFQ is neither HTB nor HFSC, and CoDel is not as good as FQ_C.



Regards,
S.

After reading (https://forum.opnsense.org/index.php?topic=15652.0) and looking over (https://en.wikipedia.org/wiki/Differentiated_services), I have set my shaper rules to match based on DSCP values that react to AQM, and set my switch to also act based on DSCP values.

Is the correct way to actually set the DSCP values in OPNsense to use normalization rules (which can be applied to all interfaces), or is it better to create a floating rule or per-interface rules?

Just found this:

Quote from: mimugmail on July 21, 2019, 07:45:23 AM
Yes, maybe you misinterpreted something.
Just a quick recap:

There are two methods to filter in FreeBSD: ipfw and pf (there are some older ones too).
In the beginning pf was the standard and all GUI stuff is based on this. Sadly pf under FreeBSD isn't very active (compared to OpenBSD) and more development goes into ipfw. But the work to rewrite all the GUI stuff would take too much time and is way too error prone since so many ppl use this in very complex ways. Don't get me wrong, pf is still the way to go, no downsides in security. Current shaping technology is only developed on ipfw so the OPN guys build a way to use both, pf for filtering and ipfw for shaping. You can in theory mark packets with DSCP values via pf (firewall rules), but you can't match them afterwards. May I have to recheck this when I find more time, perhaps I didn't test everything.

So in sum, if you want to speed up DNS, you don't need EF; you can just use the rule as a condition to give DNS more weight or bandwidth. The only case where DSCP really makes sense is in big enterprises where edge switches already mark the packets with DSCP. Then you don't need tons of rules linking traffic to queues/pipes. You can just have any/any rules with a given DSCP match.

It would seem that I'm trying to blend two incompatible systems; unfortunately, my switch cannot set DSCP values, only filter based on them.

**EDIT: After investigating further with Wireshark, I was able to see DSCP values of "Class Selector 5" etc. at the clients, so marking via normalization appears to be working. Additionally, I adjusted my shaper rules to be any/any and merely match on the DSCP values; the pipes are showing activity and the desktop is achieving A+ on bufferbloat. Shaping based on DSCP appears to be working. Further testing to come.
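For reference when reading such captures: DSCP occupies the upper 6 bits of the IPv4 ToS (IPv6 Traffic Class) byte, and the lower 2 bits are ECN. So Class Selector 5 (CS5, DSCP 40) shows up as a ToS byte of 0xa0. A small Python helper for converting between the two (the name table only lists a few common code points):

```python
DSCP_NAMES = {0: "CS0/default", 8: "CS1", 40: "CS5", 46: "EF"}


def dscp_from_tos(tos: int) -> int:
    """DSCP is the top 6 bits of the ToS/Traffic Class byte; the low 2 bits are ECN."""
    return (tos >> 2) & 0x3F


def tos_from_dscp(dscp: int, ecn: int = 0) -> int:
    """Build the full ToS/Traffic Class byte from a DSCP value and ECN bits."""
    return ((dscp & 0x3F) << 2) | (ecn & 0x3)


print(hex(tos_from_dscp(40)))           # 0xa0
print(DSCP_NAMES[dscp_from_tos(0xA0)])  # CS5
print(dscp_from_tos(0xB8))              # 46
```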