Show posts

Messages - emzy

#1
> 1. Use iperf -P8 to measure real throughput.
I'll try this out, but even if it performs better there's still the question of why a single stream performs so much worse than stock FreeBSD or Linux.

> 2. Disable add-ons like crowdsec, suricata and zenarmor.
I'm not running any of these.

> 3. Try vtnet, not pass-through NICs.
I'm using vtnet (i.e. virtio), not passthrough.

> 4. Enable RSS.
I've tried this previously and didn't notice any difference. I'll give it another try in my next round of testing though.
#2
I'm running all of the VMs on a Synology NAS using their somewhat basic VMM platform which I'm 99% sure runs KVM and QEMU. I think the comparisons between OPNsense and the Linux and stock FreeBSD machines are important since the results indicate that there's something unique happening with OPNsense (it runs on FreeBSD after all).

I don't think the issue has anything to do with the hardware NICs because I'm using virtio and not passing them through directly to OPNsense. The fact that I can't exceed ~1.3 Gbps between the VM host and the OPNsense VM when the traffic only passes through the vswitch is also suspicious, since that traffic shouldn't touch the physical NIC.

I am planning to test OPNsense on another VM host when I have time, possibly next weekend, to gather more data. My best guess right now is that OPNsense is doing more heavyweight packet processing, but that doesn't fully add up: it only uses about 2x the CPU of stock FreeBSD while achieving ~10x less throughput in the host-to-guest iperf3 test.
#3
There are many similar posts on this forum, Reddit, and others complaining about poor performance running OPNsense as a VM. I've read through many of them which describe the same symptoms, but haven't seen any which actually root-caused the issue. I'm hoping the developers or someone more familiar with OPNsense than me can offer some suggestions for what the culprit may be.

Using iperf3, I can easily saturate a 2.5 Gbps line between the client machine and both the Linux and FreeBSD VMs running on my VM host when they're on the same VLAN. If I run the iperf3 server on the VMs and the client on the VM host, I get around 18 Gbps of throughput. I checked top on the FreeBSD VM during the test and saw around 40% CPU usage on interrupts. For FreeBSD, I ran the test with network hardware offloading disabled.

CPU:  0.4% user,  0.0% nice,  9.8% system, 37.1% interrupt, 52.7% idle

For OPNsense, however, I can only achieve around 1.3 Gbps of throughput, regardless of whether the OPNsense box is the iperf3 server or simply routing traffic between VLANs for an iperf3 client and server on different machines. Even when running the iperf3 server on the OPNsense VM and the client on the VM host I get the same 1.3 Gbps. With stock FreeBSD and Linux I get around 18 Gbps in that scenario.

Under these loads OPNsense shows relatively high CPU usage for interrupts, around 84% vs 35-40% for stock FreeBSD. Even so, that core is still ~15% idle, so I'm not sure if I'm hitting a CPU limit.

CPU 0:  0.8% user,  0.0% nice,  1.2% system,  0.4% interrupt, 97.7% idle
CPU 1:  0.0% user,  0.0% nice,  0.8% system,  0.4% interrupt, 98.8% idle
CPU 2:  0.0% user,  0.0% nice,  0.4% system, 84.0% interrupt, 15.6% idle
CPU 3:  4.3% user,  0.0% nice,  0.4% system,  0.8% interrupt, 94.5% idle


All of the test VMs are using the same hypervisor configuration with virtio NICs, but OPNsense has substantially lower throughput. I'm not running any sort of IDS or IPS on OPNsense. I've tried applying the various tunables that are often mentioned in these threads, but nothing has helped in any substantial way.

I'm not sure this can be fixed, but I'd love to understand why it's happening. OPNsense is burning substantially more CPU than stock FreeBSD during the iperf3 test, but it's not burning 10x the CPU even though it's getting at least 10x less throughput in the VM host to guest test.
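To put numbers on the mismatch, here's the rough arithmetic (just a sketch; the percentages are the single busiest core from the top output above, so treat it as an estimate, not a measurement):

```python
# Back-of-envelope: interrupt CPU cost per Gbps of throughput,
# using the numbers measured above (busiest core in each case).
freebsd_interrupt = 0.40   # ~40% of a core at ~18 Gbps (stock FreeBSD)
freebsd_gbps = 18.0
opnsense_interrupt = 0.84  # ~84% of a core at ~1.3 Gbps (OPNsense)
opnsense_gbps = 1.3

freebsd_cost = freebsd_interrupt / freebsd_gbps    # core-fraction per Gbps
opnsense_cost = opnsense_interrupt / opnsense_gbps

ratio = opnsense_cost / freebsd_cost
print(f"OPNsense burns ~{ratio:.0f}x more interrupt CPU per bit")  # → ~29x
```

So per bit forwarded, OPNsense is spending on the order of 30x the interrupt CPU, which is why "only 2x the total CPU" doesn't rule out a per-packet processing cost.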

Does anyone have any insight into what's happening here? Is OPNsense just doing heavyweight processing for every packet and topping out at 1.3 Gbps?
#4
I asked this as a more general question about the algorithm on the codel mailing list and learned a lot. You can read the thread here:

- https://lists.bufferbloat.net/pipermail/codel/2024-December/002512.html

Quoting Dave Taht who implemented fq_codel on Linux:

> Further most ISPs use a non-native rate for their customer interfaces,
> either using a policer or FIFO shaper and thus the rise of combatting
> that with shaping via fq_codel to slightly below the ISPs' rate to
> move the bottleneck to your own hardware.

So essentially you need to move the bottleneck from the ISP to your hardware so that fq_codel has the back pressure it needs to work properly.
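In practice that means setting the shaper a bit below your measured ISP rate. A minimal sketch of that rule of thumb (the 5-15% margin is a commonly cited starting point, not an official fq_codel figure):

```python
def shaper_rate(isp_rate_mbps: float, margin: float = 0.10) -> float:
    """Set the shaper slightly below the ISP's measured rate so the
    queue (and thus the bottleneck) forms on our own hardware, where
    fq_codel can manage it instead of the ISP's policer/FIFO."""
    return isp_rate_mbps * (1.0 - margin)

# e.g. shape a 1000 Mbps plan down by 10%:
print(shaper_rate(1000))
```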
#5
As far as I can tell, the fq-codel algorithm (https://www.rfc-editor.org/rfc/rfc8290.html) doesn't use link bandwidth as a parameter. From reading the description, it seems like it should be able to work without knowing the link bandwidth since it decides to drop packets based on the amount of time they have sat in a queue. In fact, the original codel rfc (https://www.rfc-editor.org/rfc/rfc8289#section-4.1) even says that it's designed to be non-starving and work over variable bandwidth links.

So, why does configuring fq-codel in opnsense require the link bandwidth to be specified? I tried setting the bandwidth to a large number above the actual link bandwidth and then manually setting the fq-codel parameters to their defaults, but that resulted in poor performance. My concern, of course, is that I'm leaving performance on the table by artificially limiting the bandwidth at a value that may be less than the actual available bandwidth depending on the time of day, etc.

So concretely, what does opnsense do with the bandwidth number and how does that affect the performance of fq-codel? Is there a way to configure fq-codel without needing to know the link bandwidth?
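For reference, CoDel's drop decision itself really is bandwidth-free: it triggers on how long packets sit in the queue. A toy sketch of the RFC 8289 trigger condition (real implementations track first_above_time and use a square-root-decreasing drop interval; this simplified version only shows that no rate parameter appears anywhere):

```python
TARGET = 0.005     # 5 ms: acceptable standing-queue delay (RFC default)
INTERVAL = 0.100   # 100 ms: on the order of a worst-case RTT (RFC default)

def should_enter_drop_state(sojourn_times, interval=INTERVAL, target=TARGET):
    """Toy version of CoDel's trigger: start dropping only once the
    sojourn time (time each packet spent queued) has exceeded target
    continuously for at least one interval."""
    above = 0.0
    for dt, sojourn in sojourn_times:  # (time since prev packet, sojourn)
        if sojourn > target:
            above += dt
            if above >= interval:
                return True
        else:
            above = 0.0  # queue drained below target: reset
    return False

# A standing queue above 5 ms for >100 ms triggers dropping...
assert should_enter_drop_state([(0.02, 0.009)] * 10)
# ...while brief excursions do not.
assert not should_enter_drop_state([(0.02, 0.009), (0.02, 0.001)] * 10)
```

Which is exactly why I'm puzzled that the opnsense shaper configuration front-loads a bandwidth number.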
#6
Ok Maurice, I dug through the code this morning and I think I've figured out exactly what's happening. I think you might have suspected something like this could be the problem.

I filed a bug report with a full description and pointers to the code.

- https://github.com/opnsense/core/issues/7202

Here is the summary:

If the ISP only delegates an ipv6 prefix but no GUA address for the WAN interface, the rc.newwanipv6 script exits early and does not configure or restart radvd. This means it never starts to advertise a prefix on the LAN, and clients don't get ipv6 addresses.

The script can't tell if the ISP has only delegated a prefix or not, so when it fails to find the WAN GUA it exits early. If you check the "Request only an IPv6 prefix" option on the WAN interface then the script does not exit early, and radvd is properly configured.

I would have thought that the "Request only an IPv6 prefix" option only affected the solicitation opnsense sends to the ISP, not that checking it would be required whenever the ISP only delegates a prefix. I'm not sure how difficult it would be to actually detect that the ISP only delegated a prefix, but if it's possible I think that would be less surprising to the user.
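For anyone skimming, here's the gist of the decision as I read it (a simplified Python model of the shell script's logic per the bug report, not the actual implementation):

```python
def newwanipv6_configures_radvd(wan_has_gua: bool,
                                request_prefix_only: bool) -> bool:
    """Simplified model of rc.newwanipv6's decision, per the bug report:
    without a WAN GUA the script bails out before radvd is (re)configured,
    unless the "Request only an IPv6 prefix" option is set."""
    if not wan_has_gua and not request_prefix_only:
        return False  # early exit: radvd never restarted, no RAs on LAN
    return True

# ISP delegates a prefix but assigns no WAN address -> radvd never set up:
assert newwanipv6_configures_radvd(False, request_prefix_only=False) is False
# Checking "Request only an IPv6 prefix" avoids the early exit:
assert newwanipv6_configures_radvd(False, request_prefix_only=True) is True
```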

Anyways, I think we can say case closed for now and follow up on the bug to see if there's a way to make opnsense smarter about the prefix delegation. Thanks for all the help!

#7
Here are logs from when I restarted opnsense instead of reloading the WAN interface. vtnet1 is the WAN interface here.

2024-02-04T12:46:40   Notice   opnsense   /usr/local/etc/rc.newwanipv6: Failed to detect IP for interface wan
2024-02-04T12:46:37   Notice   kernel   <118>>>> Invoking start script 'freebsd'   
2024-02-04T12:46:37   Notice   kernel   <118>Reconfiguring IPv6 on vtnet1   
2024-02-04T12:46:37   Notice   kernel   <118>Reconfiguring IPv4 on vtnet1   
2024-02-04T12:46:37   Notice   kernel   <118>>>> Invoking start script 'newwanip'
2024-02-04T12:46:23   Notice   opnsense   /usr/local/etc/rc.newwanipv6: IP renewal deferred during boot on 'vtnet1'   
2024-02-04T12:46:23   Notice   dhcp6c   dhcp6c_script: REQUEST on vtnet1 renewal   
2024-02-04T12:46:23   Notice   dhcp6c   dhcp6c_script: REQUEST on vtnet1 executing   
2024-02-04T12:46:21   Notice   kernel   <118>Starting router advertisement service...done.   
2024-02-04T12:46:21   Notice   opnsense   /usr/local/etc/rc.bootup: plugins_configure dns (execute task : unbound_configure_do(1))   
2024-02-04T12:46:21   Notice   opnsense   /usr/local/etc/rc.bootup: plugins_configure dns (execute task : dnsmasq_configure_do(1))   
2024-02-04T12:46:21   Notice   opnsense   /usr/local/etc/rc.bootup: plugins_configure dns (1)   
2024-02-04T12:46:21   Notice   opnsense   /usr/local/etc/rc.bootup: plugins_configure dhcrelay (execute task : dhcpd_dhcrelay_configure(1))   
2024-02-04T12:46:21   Notice   opnsense   /usr/local/etc/rc.bootup: plugins_configure dhcrelay (1)   
2024-02-04T12:46:21   Notice   dhcp6c   RTSOLD script - Sending SIGHUP to dhcp6c   
2024-02-04T12:46:21   Warning   opnsense   /usr/local/etc/rc.bootup: dhcpd_radvd_configure(manual) found no suitable IPv6 address on lan(vtnet0)


I'm not certain, but it seems like what's happening is that opnsense starts radvd before ipv6 is configured on the WAN. After the ipv6 prefix is established, the newwanipv6 procedure runs and doesn't detect a WAN ipv6 address, so it doesn't trigger any of the service restarts that normally happen when the WAN ip changes. If the new-ip procedure were aware of or listening for prefix changes, this probably wouldn't be an issue.
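As a sanity check on what "detect a WAN ipv6 address" likely means: with only a delegated prefix, the WAN interface carries just a link-local address, which doesn't count as a usable global address. An illustration with Python's stdlib (not the actual check in the script):

```python
import ipaddress

def has_gua(addrs):
    """True if any address is a global unicast address (GUA);
    link-local fe80::/10 addresses don't qualify."""
    return any(ipaddress.ip_address(a).is_global for a in addrs)

# A prefix-only WAN typically holds just a link-local address:
assert not has_gua(["fe80::44f4:77ff:fe93:ebbe"])
# Whereas a WAN with an assigned GUA passes the check:
assert has_gua(["fe80::44f4:77ff:fe93:ebbe", "2600:1234:1234:1234::1"])
```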

I'm still not sure if there's a race happening or not between the radvd setup and the wan/lan ipv6 address assignment. I thought that everything was working for me at some point, but I haven't been able to get to that state again.
#8
Steps to reproduce:

1. Set LAN ipv6 to "track interface"

2. Click "Reload" on WAN in the "Commands" column of Interfaces > Overview on 24.1

Looking at the logs, it looks like opnsense sends a RELEASE for the ipv6 prefix after reloading the interface, and radvd tries to configure itself before dhcp6c sends a request for a new prefix. Perhaps, like you said, since no ipv6 address is assigned to the WAN interface radvd doesn't get reset again, and fails to advertise.

This is with "prevent release" checked in the interface settings, by the way. Maybe "prevent release" doesn't do anything if you manually reset the interface. I am definitely getting a new prefix every time I hit the button.

I will test this again later with "prevent release" checked, but instead of using the reload command I'll just reboot opnsense completely.


2024-02-04T02:27:48   Notice   opnsense   /usr/local/etc/rc.newwanipv6: Failed to detect IP for interface wan   
2024-02-04T02:27:45   Notice   opnsense   /usr/local/etc/rc.newwanipv6: Failed to detect IP for interface wan   
2024-02-04T02:27:45   Notice   dhcp6c   dhcp6c_script: REQUEST on vtnet1 renewal   
2024-02-04T02:27:45   Notice   dhcp6c   dhcp6c_script: REQUEST on vtnet1 executing

2024-02-04T02:27:43   Warning   opnsense   /usr/local/etc/rc.configure_interface: dhcpd_radvd_configure(auto) found no suitable IPv6 address on lan(vtnet0)

2024-02-04T02:27:41   Notice   dhcp6c   RTSOLD script - Sending SIGHUP to dhcp6c   
2024-02-04T02:27:41   Notice   dhcp6c   dhcp6c_script: RELEASE on vtnet1 executing   
2024-02-04T02:27:41   Notice   opnsense   /usr/local/etc/rc.configure_interface: ROUTING: entering configure using 'wan'   
2024-02-04T02:27:41   Notice   dhcp6c   RTSOLD script - Sending SIGHUP to dhcp6c
#9
Sorry, I think I grabbed an RA as well. I don't think I see anything about a /64 there.

Internet Control Message Protocol v6
    Type: Router Advertisement (134)
    Code: 0
    Checksum: 0xe011 [correct]
    [Checksum Status: Good]
    Cur hop limit: 64
    Flags: 0x80, Managed address configuration, Prf (Default Router Preference): Medium
        1... .... = Managed address configuration: Set
        .0.. .... = Other configuration: Not set
        ..0. .... = Home Agent: Not set
        ...0 0... = Prf (Default Router Preference): Medium (0)
        .... .0.. = ND Proxy: Not set
        .... ..00 = Reserved: 0
    Router lifetime (s): 1800
    Reachable time (ms): 0
    Retrans timer (ms): 0
    ICMPv6 Option (Source link-layer address : 46:f4:77:93:eb:be)
        Type: Source link-layer address (1)
        Length: 1 (8 bytes)
        Link-layer address: 46:f4:77:93:eb:be (46:f4:77:93:eb:be)
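Decoding the flags byte from that capture myself, just to double-check Wireshark's interpretation (plain stdlib bit-twiddling):

```python
flags = 0x80  # flags byte from the RA above

managed = bool(flags & 0x80)  # M: hosts should get addresses via DHCPv6
other = bool(flags & 0x40)    # O: hosts should get other config via DHCPv6

print(f"Managed={managed}, Other={other}")  # → Managed=True, Other=False
```

So M is set and there's no Prefix Information Option at all, meaning no /64 for SLAAC; addresses would have to come from DHCPv6, which matches what I'm seeing.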


I'll do some more testing to see if I can reproduce the LAN RA issue and figure out what causes it. It seemed like leaving DHCPv6 enabled on LAN helped, but maybe that was just a coincidence.
#10
Being able to see the delegated prefix seems pretty important. It would be really nice if it was added back somewhere in the UI. I'm not even sure what command, if any, I can run in the terminal to retrieve it as a workaround. Does anyone know?
#11
Shoot... I spoke too soon. The issue with radvd not advertising properly has returned. Manually restarting it resolves the issue.
#12
In the process of grabbing that pcap I turned ipv6 on and off for both the wan and lan interfaces. I'm not sure what's changed, and I'm pretty sure everything is configured the same way as it was before. But I'm no longer able to reproduce the issue where opnsense doesn't send router advertisements on the lan. I've rebooted several times and my lan devices have consistently gotten ipv6 addresses without me needing to manually restart radvd.

Go figure... I'll keep an eye on things and see if anything changes.
#13
24.1, 24.4 Legacy Series / Re: IPv6 Prefix Alias
February 03, 2024, 07:22:36 PM
Hmm, I'm not sure if you can create an alias like that. Maybe someone else knows.

But there might be another way to achieve your end goal. What are you trying to do?
#14
Ok, I'm pretty sure I grabbed the DHCPv6 advertise from my ISP this time. I'm not 100% sure how to read this, but I think it indicates that my ISP isn't going to assign an address to the router itself.

DHCPv6
    Message type: Advertise (2)
    Transaction ID: 0x7c6989
    Client Identifier
        Option: Client Identifier (1)
        Length: 14
        DUID: 000100012d492931021132229627
        DUID Type: link-layer address plus time (1)
        Hardware type: Ethernet (1)
        DUID Time: Jan 28, 2024 10:03:13.000000000 EST
        Link-layer address: 02:11:32:22:96:27
        Link-layer address (Ethernet): MS-NLB-PhysServer-17_32:22:96:27 (02:11:32:22:96:27)
    Server Identifier
        Option: Server Identifier (2)
        Length: 26
        DUID: 00020000058334343a66343a37373a39333a66323a3030000000
        DUID Type: assigned by vendor based on Enterprise number (2)
        Enterprise ID: Juniper Networks/Funk Software (1411)
        Identifier: 34343a66343a37373a39333a66323a3030000000
    Identity Association for Non-temporary Address
        Option: Identity Association for Non-temporary Address (3)
        Length: 59
        IAID: 00000000
        T1: 0
        T2: 0
        Status code
            Option: Status code (13)
            Length: 43
            Status Code: NoAddrAvail (2)
            Status Message: No addresses have been assigned for IA_NA
    Identity Association for Prefix Delegation
        Option: Identity Association for Prefix Delegation (25)
        Length: 41
        IAID: 00000000
        T1: 3600
        T2: 5760
        IA Prefix
            Option: IA Prefix (26)
            Length: 25
            Preferred lifetime: 7200
            Valid lifetime: 7200
            Prefix length: 56
            Prefix address: 2600:1234:1234:1234::


I redacted the prefix address to 2600:1234:1234:1234::, but it's a valid prefix and I'm able to use it to get GUAs for my LAN.
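Incidentally, that /56 is plenty. A quick check of how many /64 LAN networks it yields (Python stdlib; prefix redacted as above, so strict=False since the redacted address isn't aligned on the /56 boundary):

```python
import ipaddress

# The (redacted) delegated prefix from the Advertise above.
pd = ipaddress.ip_network("2600:1234:1234:1234::/56", strict=False)

# Each LAN gets a /64 via "track interface"; a /56 yields:
lan_subnets = 2 ** (64 - pd.prefixlen)
print(lan_subnets)  # → 256
```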
#15
My ipv6 connectivity is working fine and I can see that I'm getting a /56 from my ISP with pcap.

But since upgrading to 24.1 I don't see the prefix listed for the wan interface under Interfaces > Overview.

Is anyone else having this issue / is there something I can do to fix it?