Topics - OPNenthu

#21
I came across this mailing list thread while searching online about FreeBSD instabilities with N100, as many have been reporting upgrade issues.  I'm not sure if this is related to the problematic microcode updates.

https://lists.freebsd.org/archives/freebsd-current/2025-January/006984.html

ChatGPT (for what it's worth) describes the issues like this:

Quote:
2. PCID / Cache Corruption Bug

    The N100 has a known CPU erratum: INVLPG instruction with PCID enabled fails to flush TLB entries, causing data corruption on UFS file systems (sometimes panics or inode mangling) [ref]

    The workaround: add

    vm.pmap.pcid_enabled=0 

    to loader.conf, ideally tested in production. Users report stability regained after disabling PCID [ref]

3. UFS Filesystem Instability

    Severe issues such as inode corruption, filesystem panics, or UFS failure have been seen repeatedly when PCID remains enabled and UFS is used [ref]

    ZFS appears to avoid these issues entirely.

Quote:
⚠️ Why Might You Want to Disable It?

Some CPUs (including Intel N100/Alder Lake-N) exhibit hardware bugs when PCID is used. Specifically:

    A known CPU erratum causes INVLPG (used to invalidate specific TLB entries) to fail when PCID is active.

    This can result in stale or corrupted memory mappings, leading to:

        Filesystem corruption (especially UFS)

        Kernel panics

        Data loss

        Subtle stability problems

Disabling PCID (vm.pmap.pcid_enabled=0) avoids using the broken logic path.
🧪 Who Should Set It?

If you're using:

    Intel N100 or other Alder Lake-N CPUs

    UFS as a filesystem

    FreeBSD 13.x or 14.x

👉 You should absolutely set vm.pmap.pcid_enabled=0 to ensure stability.

Seemed a little concerning and I thought I'd bring it up here for more technical insight.

I'm not affected personally as I don't have an N100 at this time.
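
For anyone who wants to check whether the tunable is already set, here is a minimal Python sketch. It only parses loader.conf-style text; the helper name `loader_tunable` and the sample contents are made up for illustration, and it says nothing about whether the workaround is actually needed on a given system.

```python
def loader_tunable(conf_text, name):
    """Return the value of `name` from loader.conf-style text, or None if unset."""
    value = None
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if "=" not in line:
            continue
        key, _, raw = line.partition("=")
        if key.strip() == name:
            value = raw.strip().strip('"')     # a later entry overrides an earlier one
    return value

# Hypothetical file contents, not taken from a real system
sample = 'zfs_load="YES"\n# PCID workaround\nvm.pmap.pcid_enabled=0\n'
print(loader_tunable(sample, "vm.pmap.pcid_enabled"))
```

A return value of None means the tunable is not set in the given text.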
#22
Using the 25.7 installer on a test VM, I used console option 2 to set the LAN interface IP and a custom DHCP range of 192.168.160.100 - 192.168.160.199.  I then launched the installer and installed to disk.  This was a fresh install over an existing one (ZFS pool overwrite, no config import).

After logging in to the GUI and completing the initial setup wizard, Dnsmasq was enabled but the DHCP range was set to 192.168.160.41 - 192.168.160.245.

(No argument from me that .41-.245 is a better default, since in Dnsmasq the static reservations should be within the pool range, unlike in ISC.)

Just reporting it for consideration in the 26.1 installers :)  Maybe someone else can also verify if they saw this since I was kind of in a hurry.
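
To illustrate why the wider range matters for reservations, here is a small Python sketch that checks whether an address falls inside a DHCP range. The two ranges are the ones mentioned above; the reservation address 192.168.160.50 and the function name `in_dhcp_range` are assumptions for the example.

```python
import ipaddress

def in_dhcp_range(addr, start, end):
    """Return True if addr lies within the inclusive range start..end."""
    a = ipaddress.ip_address(addr)
    return ipaddress.ip_address(start) <= a <= ipaddress.ip_address(end)

# Hypothetical static reservation at .50
print(in_dhcp_range("192.168.160.50", "192.168.160.100", "192.168.160.199"))  # False
print(in_dhcp_range("192.168.160.50", "192.168.160.41", "192.168.160.245"))   # True
```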
#23
I don't know if this issue affects more units than just mine, or if this is maybe on my end (defective PC motherboard?), but I wanted to document the recovery steps.  I did forward my observations to Protectli as well.

Specs:
Vault V1410 (Intel N5105, 8GB) with Protectli coreboot version 0.9.3.  The coreboot detail is important because there's no option to set the RTC that I'm aware of, so that will need to be done through the OS.

I don't know if this affects units with the stock AMI UEFI.

The issue:
Following an extended power loss event where the Vault remains disconnected from mains power or battery backup for some time (>1hr in my case), the USB COM port becomes inoperable and fails to list in Windows Device Manager or in the Linux /dev filesystem.  The firewall remains accessible only via GUI and SSH. The issue persists across power cycles.

(EDIT: I haven't tried the vga console as mine runs headless.)

Evidence:
You can observe that the device is not being recognized on the PC that is connected with the USB COM cable.  For example, on Linux the 'dmesg' output will not contain any reference to the expected device (/dev/ttyUSB0) that is typically there.  There will also be no such character device under /dev:

$ sudo dmesg | grep tty
[    0.152646] printk: legacy console [tty0] enabled
[    0.791432] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A

$ ls -l /dev/ttyUSB*
ls: cannot access '/dev/ttyUSB*': No such file or directory

On Windows there will be an error indicator in Device Manager.  The COM port will have become an unrecognized device that fails to initialize.

The fix:
1) Open the device and perform a CMOS reset by shorting two pins.  https://kb.protectli.com/kb/cmos-reset/

This will restore the serial port, but will also wipe out the system time.

2) DNS resolution may be blocked due to the time error and the NTP service in OPNsense will be unable to sync.  Manually set the date to the current wall clock time to within 1-2 minutes.  On the OPNsense console (as root):

# date yymmddHHMM

For example, enter "2507301300" for "Jul 30 2025 at 1pm"

This will get NTP and other services unstuck, but may take a few minutes to sync.
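
If you'd rather not assemble the string by hand, the argument can be generated on another machine. A minimal Python sketch, assuming the two-digit-year form shown above (the function name `date_arg` is just for illustration):

```python
from datetime import datetime

def date_arg(moment):
    """Format a datetime as yymmddHHMM for the 'date' command."""
    return moment.strftime("%y%m%d%H%M")

print(date_arg(datetime(2025, 7, 30, 13, 0)))  # 2507301300
```

Passing `datetime.now()` instead gives the string for the current local time of the machine running the script.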

Validation:

$ sudo dmesg | grep tty
[    0.152646] printk: legacy console [tty0] enabled
[    0.791432] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[ 8856.625541] usb 1-1: FTDI USB Serial Device converter now attached to ttyUSB0     <--- this reappears

$ ls -l /dev/ttyUSB*
crw-rw---- 1 root dialout 188, 0 Jul 30 00:37 /dev/ttyUSB0

If on Windows, you should again see the COM port in Device Manager.
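
The Linux check can also be scripted. This is a rough Python sketch that scans dmesg text for the "now attached" line; the function name `attached_usb_serial` is an assumption, and the sample text is copied from the output above.

```python
import re

def attached_usb_serial(dmesg_text):
    """Return ttyUSB device names that dmesg reports as attached."""
    return re.findall(r"now attached to (ttyUSB\d+)", dmesg_text)

before = (
    "[    0.152646] printk: legacy console [tty0] enabled\n"
    "[    0.791432] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A"
)
after = before + "\n[ 8856.625541] usb 1-1: FTDI USB Serial Device converter now attached to ttyUSB0"

print(attached_usb_serial(before))  # []
print(attached_usb_serial(after))   # ['ttyUSB0']
```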


-----

I did see this issue multiple times already as we have been having more frequent power outages in our region, and also I've seen it on two separate V1410s.  I had gotten a replacement for my original one due to a different issue, but that one exhibited this problem as well. 

On the most recent occurrence I did happen to have a serial terminal session open when the power went out, so it could be a factor.  I don't recall if I was able to close the session cleanly and I don't remember if that was always the case the other times it happened.  If you own this device and depend on the USB COM then maybe consider keeping around a small screwdriver.
#24
The 25.7 release announcement references this change:

Quote:
o system: allow experimental feature to run web GUI privilege separated as "wwwonly" user

I don't see any option to enable this in the web GUI settings, unless I missed it.  How do we try this?
#25
I've been meaning to ask about this for some time to satisfy my curiosity.  I upgraded to 25.7 and let it run for 24 hours, to see if some recent changes around ICMP (like this one: https://forum.opnsense.org/index.php?topic=45991.0) would possibly affect this, but it hasn't.

Let's call my local network 'Site 1' and with ISP provider 'A'.  The ISP is a large cable internet provider with coaxial to the home, terminated at the DOCSIS modem, and then Ethernet to the OPNsense box.  This firewall is using dark UI theme.

Let's call a remote network 'Site 2' and with ISP provider 'B'.  The ISP is a large mobile telecom provider with fiber to the home, terminated at an exterior ONT, and then Ethernet to the OPNsense box.  This firewall is using light UI theme.

Both sites are using UniFi brand switches for LAN clients, just different models.  Everything purchased within the last year.  All new Cat6 cabling (pre-made).  Router specs are as listed in my sig.

---

Site 1 almost never has a perfectly "smooth" GW health graph, at least compared to Site 2, which is the opposite: it almost never has a "spikey" one.  These are examples of the typically observed graphs (1 minute granularity views), although Site 1 is behaving quite well today.  It usually has more loss spikes, oftentimes with clusters of short-duration spikes throughout the day.

You cannot view this attachment.

You cannot view this attachment.


Only when I zoom in closer on a section without those loss spikes does Site 1 start to resemble Site 2.  I consider this the baseline.


You cannot view this attachment.


It's a similar thing on both of the gateways, DHCP and DHCPv6, so doesn't much matter which one I'm looking at.  The curious thing though is that the loss spikes in Site 1 don't always line up between the gateways.  Sometimes the v4 gateway will have them at different times than the v6 gateway and vice-versa.  So, based on that, I think I can rule out exterior cabling (ISP lines) as the culprit, as I would expect both gateways to experience disruptions at the same times if there were an ISP infrastructure issue.  (I could be naive about this.)
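
One way to put a number on "don't always line up" is to compare the minutes in which each gateway recorded loss. A minimal Python sketch follows; the timestamps are hypothetical (I have not exported my real data) and `shared_spikes` is a made-up helper name.

```python
def shared_spikes(v4_minutes, v6_minutes):
    """Return (minutes with loss on both gateways, fraction of all spike minutes shared)."""
    v4, v6 = set(v4_minutes), set(v6_minutes)
    both = v4 & v6
    union = v4 | v6
    fraction = len(both) / len(union) if union else 0.0
    return sorted(both), fraction

# Hypothetical minute-of-day values at which loss was recorded
v4 = [61, 305, 306, 900]
v6 = [61, 410, 900, 1200]
print(shared_spikes(v4, v6))
```

A fraction close to 1 would point at something common to both gateways, while a low fraction fits the picture of independent spikes.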

The other clue is that the losses are typically very short (1 min. duration) and very uniform (vast majority of the time have the same amplitude of just under 1% loss, but never exceeding 20% AFAIK).  The uniformity of it does make me wonder if this is a behavior of the router or of the different spec Intel NICs?  Maybe even an issue with the ISP provided modem.  I'm not sure how to even begin diagnosing.

What do your graphs look like?  Are these spikes typical, for those of you with varied experience?  Is it normal for some residential ISPs to do this?

TIA!


P.S. - I don't notice the effect of these loss spikes subjectively, but doesn't mean they aren't problematic for connection quality or stream disruptions.  I don't have data to prove that.
#26
I did a fresh install with the 25.7-r1 DVD image in a test VM and upgraded it to -r2.  The Router Advertisement service is not listed in the left side menu, nor in the list of running services on the Lobby Dashboard, however the 'radvd' package is installed according to System->Firmware->Packages.

During installation I had only configured IPv4 for WAN & LAN.  After the setup wizard completed, Dnsmasq was enabled for me by default for DHCP and Unbound for DNS (I chose it from the Wizard).  The Dnsmasq RA function was disabled, as is to be expected.

I then enabled IPv6 from Interface settings and chose "Allow manual adjustment..." in the LAN settings.  At this point I expect to see the Router Advertisement service listed in the menu but it's not showing.  Not sure if this is expected (?)
#27
Hi all,

I followed the section DHCPv4 with DNS registration in the guide as closely as I could, so that Unbound is the default resolver for clients and forwards to Dnsmasq for internal domains.  For external domains Unbound forwards to Quad9 over TLS (unchanged from my previous setup with ISC).

These are the issues I am seeing so far with the latest update today.  If any of this is deemed valid here I can submit scoped issue(s) in GitHub.

For all of these examples, I only have IPv4 configurations in Dnsmasq.  Presently I am still using Services->Router Advertisements for IPv6 RAs.

My system default domain in System->Settings->General is "h1.home.arpa" (to distinguish from a remote site "h2.home.arpa").  I am using this domain for my LAN.

For each of the VLANs where clients connect, I defined a respective ".internal" domain in Dnsmasq per the examples in the guide.

You cannot view this attachment.

In Unbound I configured the forwarding as follows:

You cannot view this attachment.

Unbound is configured on all interfaces ('All (recommended)' in GUI options) at port 53.

Dnsmasq is on all explicit interfaces (LAN, GUEST, etc.) at port 53053 with "Strict Interface Binding" disabled.

root@firewall:~ # sockstat -l | grep :53
unbound  unbound     4479 5   udp6   *:53                  *:*
unbound  unbound     4479 6   tcp6   *:53                  *:*
unbound  unbound     4479 7   udp4   *:53                  *:*
unbound  unbound     4479 8   tcp4   *:53                  *:*
unbound  unbound     4479 9   udp6   *:53                  *:*
unbound  unbound     4479 10  tcp6   *:53                  *:*
unbound  unbound     4479 11  udp4   *:53                  *:*
unbound  unbound     4479 12  tcp4   *:53                  *:*
unbound  unbound     4479 13  udp6   *:53                  *:*
unbound  unbound     4479 14  tcp6   *:53                  *:*
unbound  unbound     4479 15  udp4   *:53                  *:*
unbound  unbound     4479 16  tcp4   *:53                  *:*
unbound  unbound     4479 17  udp6   *:53                  *:*
unbound  unbound     4479 18  tcp6   *:53                  *:*
unbound  unbound     4479 19  udp4   *:53                  *:*
unbound  unbound     4479 20  tcp4   *:53                  *:*
nobody   dnsmasq    19536 13  udp4   *:53053               *:*
nobody   dnsmasq    19536 14  tcp4   *:53053               *:*
nobody   dnsmasq    19536 15  udp6   *:53053               *:*
nobody   dnsmasq    19536 16  tcp6   *:53053               *:*
root     mdns-repea 50866 3   udp4   *:5353                *:*
root     mdns-repea 50866 4   udp4   192.168.20.1:5353     *:*
root     mdns-repea 50866 6   udp4   192.168.30.1:5353     *:*
root     mdns-repea 50866 7   udp4   192.168.40.1:5353     *:*

I do not have any system default DNS servers in System->Settings->General and I am not allowing DNS overrides from WAN.
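
To summarize a long sockstat listing like the one above, the lines can be grouped by port. This is a small Python sketch, assuming the column layout shown (USER, COMMAND, PID, FD, PROTO, LOCAL ADDRESS, FOREIGN ADDRESS); the function name `listeners_by_port` is made up.

```python
from collections import defaultdict

def listeners_by_port(sockstat_text):
    """Map each listening port to the set of commands bound to it."""
    ports = defaultdict(set)
    for line in sockstat_text.splitlines():
        fields = line.split()
        # The local address is the sixth column; skip headers and short lines
        if len(fields) < 6 or ":" not in fields[5]:
            continue
        command, local = fields[1], fields[5]
        port = local.rsplit(":", 1)[1]
        ports[port].add(command)
    return dict(ports)

# Three lines taken from the listing above
sample = """unbound  unbound     4479 5   udp6   *:53                  *:*
nobody   dnsmasq    19536 13  udp4   *:53053               *:*
root     mdns-repea 50866 4   udp4   192.168.20.1:5353     *:*"""
print(listeners_by_port(sample))
```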


Observation #1: Incorrect DNS options in DHCP offer

Per the guide, these DHCP options do not need to be explicitly defined and are defaulted as follows:

Quote:
router[3] -> IPv4 address of the receiving interface
dns-server[6] -> IPv4 address of the receiving interface
domain-search[119] -> Domain set in DHCP range

This is the DHCP offer as captured in Wireshark to my client on the HOME network, which has a static reservation (192.168.30.2):

Dynamic Host Configuration Protocol (Offer)
    Message type: Boot Reply (2)
    Hardware type: Ethernet (0x01)
    Hardware address length: 6
    Hops: 0
    Transaction ID: 0x20ab4ae0
    Seconds elapsed: 0
    Bootp flags: 0x8000, Broadcast flag (Broadcast)
    Client IP address: 0.0.0.0
    Your (client) IP address: 192.168.30.2
    Next server IP address: 192.168.30.1
    Relay agent IP address: 0.0.0.0
    Client MAC address: ASUSTekCOMPU_xx:xx:xx (24:4b:fe:xx:xx:xx)   (*redacted)
    Client hardware address padding: 00000000000000000000
    Server host name not given
    Boot file name not given
    Magic cookie: DHCP
    Option: (53) DHCP Message Type (Offer)
    Option: (54) DHCP Server Identifier (192.168.30.1)
    Option: (51) IP Address Lease Time
    Option: (58) Renewal Time Value
    Option: (59) Rebinding Time Value
    Option: (1) Subnet Mask (255.255.255.0)
    Option: (28) Broadcast Address (192.168.30.255)
    Option: (3) Router
        Length: 4
        Router: 192.168.30.1
    Option: (15) Domain Name
        Length: 12
        Domain Name: h1.home.arpa
    Option: (6) Domain Name Server
        Length: 4
        Domain Name Server: 192.168.30.1
    Option: (255) End

- 'router[3]' is correct
- 'dns-server[6]' is correct
- 'domain-search[119]' is missing
- 'domain-name[15]' is incorrect  (should be 'home.internal')
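
The comparison can be expressed as a quick check of the option codes in the capture against the three defaults the guide lists. A minimal Python sketch; the code list is read off the Wireshark output above and `missing_options` is a hypothetical helper.

```python
# Defaults the guide says are sent without explicit configuration
EXPECTED = {3: "router", 6: "dns-server", 119: "domain-search"}

def missing_options(offered_codes):
    """Return the expected options that do not appear in the offer."""
    return {code: name for code, name in EXPECTED.items() if code not in offered_codes}

# Option codes present in the captured DHCP offer
offer = [53, 54, 51, 58, 59, 1, 28, 3, 15, 6, 255]
print(missing_options(offer))  # {119: 'domain-search'}
```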


Observation #2: Frequent DNS timeouts / slow resolution

It doesn't matter whether the internal host being resolved is static (for example, 'firewall' is in /etc/hosts) or not, the requests are experiencing a lot of timeouts and resolution takes several seconds.

C:\>nslookup firewall
DNS request timed out.
    timeout was 2 seconds.
Server:  UnKnown
Address:  192.168.30.1

Non-authoritative answer:
Name:    firewall.h1.home.arpa
Addresses:  2601:xx:xxxx:xxxx:xxxx:xxxx:xxxx:39a0
          192.168.1.1

C:\>nslookup firewall.h1.home.arpa
Server:  UnKnown
Address:  192.168.30.1

DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.
Non-authoritative answer:
Name:    firewall.h1.home.arpa
Addresses:  2601:xx:xxxx:xxxx:xxxx:xxxx:xxxx:39a0
          192.168.1.1

The same is happening for external requests, which previously had no issue:

C:\>nslookup opnsense.org
Server:  UnKnown
Address:  192.168.30.1

DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.
Non-authoritative answer:
Name:    opnsense.org
Addresses:  2001:1af8:2050:a001:1::1
          89.149.225.137

Observation #3: Intermittent resolution failures

Sometimes there is no response, even for statically defined hosts in Dnsmasq:

C:\>nslookup unifi
DNS request timed out.
    timeout was 2 seconds.
Server:  UnKnown
Address:  192.168.30.1

DNS request timed out.
    timeout was 2 seconds.
*** UnKnown can't find unifi: Server failed

Ditto for fully qualified queries:

C:\>nslookup unifi.h1.home.arpa
DNS request timed out.
    timeout was 2 seconds.
Server:  UnKnown
Address:  192.168.30.1

DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.
*** Request to UnKnown timed-out


Observation #4: Static addresses not registered

My Proxmox node (pve) has a static IP which I also defined as a static reservation in Dnsmasq for tracking purposes. My UniFi controller (running on Proxmox) is also static on the host and has a static reservation.  Neither of these is reflected in the Dnsmasq leases table, although both are running and responding.

A Gitea instance also running on Proxmox, but with a dynamic lease, is shown in the table.

A static reservation for my desktop PC is also shown in the table.

In general, it appears that static reservations are only shown for hosts which receive their IPs from DHCP but are omitted for hosts which have static IPs set on the host itself even if a static entry is present in Dnsmasq.


 
#28
I'm starting the migration from ISC to Dnsmasq w/ Unbound upstream on OPN 25.1.6_4 and I've quickly hit a bind error:

2025-05-18T11:28:54-04:00 Critical dnsmasq FAILED to start up
2025-05-18T11:28:54-04:00 Critical dnsmasq failed to bind DHCP server socket: Address already in use

Checking sockstat I see that service 'dhcpd' is listening on *:67.  I believe this is used by ISC?

root@firewall:~ # sockstat -4 -l
USER     COMMAND    PID   FD  PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
_flowd   flowd      68062 3   udp4   127.0.0.1:2056        *:*
root     mdns-repea 66175 5   udp4   *:5353                *:*
root     mdns-repea 66175 6   udp4   192.168.20.1:5353     *:*
root     mdns-repea 66175 8   udp4   192.168.30.1:5353     *:*
root     mdns-repea 66175 9   udp4   192.168.40.1:5353     *:*
nobody   samplicate 24413 5   udp4   127.0.0.1:2055        *:*
nobody   samplicate 24413 6   udp4   *:5269                *:*
root     ntpd       26908 21  udp4   *:123                 *:*
root     ntpd       26908 23  udp4   xx.xxx.xxx.xxx:123    *:*   *(public IP - redacted)
root     ntpd       26908 27  udp4   127.0.0.1:123         *:*
root     ntpd       26908 29  udp4   10.2.2.1:123          *:*
root     ntpd       26908 32  udp4   192.168.1.1:123       *:*
root     ntpd       26908 36  udp4   192.168.20.1:123      *:*
root     ntpd       26908 39  udp4   192.168.30.1:123      *:*
root     ntpd       26908 43  udp4   192.168.40.1:123      *:*
root     ntpd       26908 46  udp4   192.168.50.1:123      *:*
root     ntpd       26908 49  udp4   192.168.60.1:123      *:*
unbound  unbound     6410 7   udp4   *:53                  *:*
unbound  unbound     6410 8   tcp4   *:53                  *:*
unbound  unbound     6410 11  udp4   *:53                  *:*
unbound  unbound     6410 12  tcp4   *:53                  *:*
unbound  unbound     6410 15  udp4   *:53                  *:*
unbound  unbound     6410 16  tcp4   *:53                  *:*
unbound  unbound     6410 19  udp4   *:53                  *:*
unbound  unbound     6410 20  tcp4   *:53                  *:*
unbound  unbound     6410 21  tcp4   127.0.0.1:953         *:*
dhcpd    dhcpd      82937 14  udp4   *:67                  *:*
root     lighttpd   71873 7   tcp4   *:443                 *:*
root     sshd       47538 7   tcp4   *:22                  *:*
?        ?          ?     ?   udp4   *:51820               *:*

I have Dnsmasq set to listen only on the specific interface that I'm migrating and its DNS service is on 53053.  Unbound is on port 53 (All interfaces).  I get the same error both with and without the "Strict Interface Binding" option under advanced settings.  I also tried restarting all services from the console with Option 11.

Is it possible to migrate a live system one interface at a time?  I was expecting that if I disable an interface from Services->ISC DHCPv4, then there wouldn't be any conflicts.

Thanks!
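
For reference, the "Address already in use" condition can be reproduced with a short Python sketch. It demonstrates the error on an unprivileged loopback port chosen by the OS rather than on port 67, which needs root; the function name `udp_port_in_use` is an assumption, and this does not model how dhcpd or Dnsmasq actually bind their sockets.

```python
import errno
import socket

def udp_port_in_use(port, host="127.0.0.1"):
    """Return True if binding a UDP socket to host:port fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. permission denied on privileged ports
        return False

# Hold a port, then probe it while it is still bound
holder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
holder.bind(("127.0.0.1", 0))
port = holder.getsockname()[1]
print(udp_port_in_use(port))  # True while `holder` is bound
holder.close()
```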
#29
I didn't see a similar topic in this section but only did a cursory title search with some keywords.  Apologies if this is covered elsewhere.

By default OPNsense filters private IP ranges on incoming connections.  My goal was to also filter outgoing connections as an added layer so that all private IPv4 and local IPv6 destinations would be rejected on WAN out, thus avoiding leaks.  This might also be prudent on shared ISP lines with neighbors.

In addition I wanted an alias for management clients, such as my PC/laptop, to have an exclusion so that those could continue to reach my leased cable modem UI.  In my case the modem is at 10.0.0.1 and I cannot change this, but I don't have any 10.x networks in OPNsense nor any static route to it.  I rely on Outbound NAT to reach the modem UI.  I suspect this is a common scenario for a lot of home internet users with basic setups.

Normally you'd have to put OPNsense into Manual NAT mode and add some translation rules because you cannot simply add a WAN rule with a "source" address to include your management PC/alias.  That field gets overwritten by NAT.  I wish to avoid this and leave it in the default Automatic NAT mode.  That way I don't have to worry about managing the NAT rules as interfaces and networks change over time.  Fortunately, pf/OPNsense support IP exclusions in alias tables and we can leverage this.  (Read more at: OPNsense docs, feature PR, 'pf' handbook.)

Disclaimer: You might have some more complicated routing setup that breaks with these instructions, so be sure to do your homework if this applies.  I tested with my Wireguard site-to-site setup and saw no issues there.


The procedure:

1) Create an Alias to specify your management PCs. I called it "management_clients".

This will be used in step 5.

(This one is optional as you may already be using a dedicated management network or can even use a trusted network (e.g. LAN) for this purpose.)

2) Create an Alias to specify your modem configuration IP.  I called it "modem_ui_address".

This will be used in step 5.

3) Create an Alias of type "Network group" to define RFC1918 private IPv4 ranges, excluding the modem management IP.

In order to use IP exclusions you have to use a nested alias, so it's really a combination of two aliases here into a final third one.

The first one is for the private IPv4 ranges.  You may already have one like this as it's commonly used in interface rules:

Name: IPv4_private_ranges
Type: Network(s)
Content: 192.168.0.0/16, 172.16.0.0/12, 10.0.0.0/8
Description: RFC1918 private ranges

The second one is the exclusion (negated IP) of the modem or upstream device that you connect to over WAN.  This one also needs to be a "Network(s)" alias in order to combine with the previous one, so just give it a /32 notation if it's a single IP:

Name: modem_ui_exclusion
Type: Network(s)
Content: !10.0.0.1/32
Description: Inversion of modem MGMT IP

The "!" prefix is important here. Substitute whatever IP you need.

The final one is the combined alias:

Name: IPv4_dont_NAT
Type: Network group
Content: IPv4_private_ranges, modem_ui_exclusion
Description: Private IPv4 ranges to filter outbound

You can inspect the final alias in Firewall->Diagnostics->Aliases to confirm the exclusion IP is there sorted among the RFC1918 ranges.
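
To see why the exclusion works, here is a rough Python emulation of a pf table lookup, where the most specific matching entry wins and a "!" entry turns that match into a non-match. It is only a sketch of the lookup logic, not pf itself; the function name `table_matches` is made up, and the entries are the ones from the combined alias above.

```python
import ipaddress

def table_matches(entries, addr):
    """Emulate a pf table lookup: the most specific entry wins, '!' negates it."""
    ip = ipaddress.ip_address(addr)
    best, best_negated = None, False
    for entry in entries:
        negated = entry.startswith("!")
        net = ipaddress.ip_network(entry.lstrip("!"))
        if ip.version == net.version and ip in net:
            if best is None or net.prefixlen > best.prefixlen:
                best, best_negated = net, negated
    return best is not None and not best_negated

ipv4_dont_nat = ["192.168.0.0/16", "172.16.0.0/12", "10.0.0.0/8", "!10.0.0.1/32"]
print(table_matches(ipv4_dont_nat, "10.1.2.3"))  # True, rejected on WAN out
print(table_matches(ipv4_dont_nat, "10.0.0.1"))  # False, the modem stays reachable
```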

3b) Create an Alias of type "Network(s)" to define at least IPv6 ULA (fc00::/7) and link-local (fe80::/10) prefixes.  The latter shouldn't be routable but I like having it.
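
As a quick sanity check of which addresses those two prefixes cover, here is a minimal Python sketch using the standard ipaddress module. The helper name `is_local_v6` is an assumption; the test addresses are the ULA used in the ping test further down and a public address for contrast.

```python
import ipaddress

# ULA and link-local prefixes, as in the alias
LOCAL_V6 = [ipaddress.ip_network("fc00::/7"), ipaddress.ip_network("fe80::/10")]

def is_local_v6(addr):
    """Return True if addr is an IPv6 address inside one of the local prefixes."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 6 and any(ip in net for net in LOCAL_V6)

print(is_local_v6("fd50:2e32:f043::"))          # True
print(is_local_v6("2001:1af8:2050:a001:1::1"))  # False
```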

4) Add two respective WAN rules with direction "out" (post-NAT) to reject these destinations.

   *Important to use "reject" and not "block" here, so that LAN clients get notification and don't hang. RFC4193 for IPv6 ULA also recommends this.

I made mine non-Quick rules to act as defaults on WAN, but you can keep them as Quick also.

5) Add a Floating interface rule to block everything except the management hosts/network from reaching the modem management IP.

Action: Block
Quick: checked
Interface: <none selected>
Direction: In
TCP/IP Version: IPv4
Protocol: TCP
Source invert: checked
Source: management_clients  (or network)
Destination: modem_ui_address
Destination port: HTTP
Description: Block all except MGMT hosts to modem UI

Clone and add an HTTPS rule if your modem uses it.

6) Test with some pings.  You should get timeouts on all private IP addresses which are not defined in OPNsense, except for the modem IP.

C:\>ping 10.1.2.3

Pinging 10.1.2.3 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 10.1.2.3:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

The firewall live view should reflect that these pings are dropped.  The "Source" address should be your WAN public IP, confirming that the rule was applied post-NAT.

For IPv6 you can ping a random ULA address:

C:\>ping -6 [fd50:2e32:f043::]

Pinging fd50:2e32:f043:: with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for fd50:2e32:f043:::
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

In this case the "Source" IP in the FW logs will be your public IPv6 address (GUA) or a ULA if you have one configured on your local interface, as these aren't NAT-ed.  (Note: the intention is not to confirm filtering of ICMPv6 here, as that's generally frowned upon in IPv6.  This is just testing the rule.)

If you open a browser and try to navigate to these IPs you should not get stuck waiting.  You should get a "Connection Refused" error after a few seconds, confirming the Reject action on the WAN rules.

Finally test the modem admin UI is working.

7) Profit!

From now on you only need to update the management client and modem IPs in the respective aliases, if those change.  In summary those are:

Name: management_clients
Name: modem_ui_address
Name: modem_ui_exclusion
#30
I usually consume about 25% of the total firewall table entries under Aliases, but I just noticed it took a huge dive to 9%.  Looking closer, I see that I have suddenly lost the majority of the bogons.

Another firewall I admin still has thousands of bogons (pasting for comparison).  Both are 25.1.5_5 and set to update monthly under Firewall -> Settings -> Advanced -> Bogon Networks.

Is there a recent change?

You cannot view this attachment.

You cannot view this attachment.
#31
EDIT: As explained in the thread below, this is not technically a work-around as I originally thought.  It is an implementation of an IPv6 control plane (a valid technique) for ICMP traffic; an example of Multi-color Shaping.  Please ignore references to "work-around."

------

This is a work-around for those of us wanting to combat bufferbloat with FQ-CoDel and ECN as per the OPNsense guide, but are seeing high packet loss on the IPv6 gateway (specifically on upload) with the shaping applied.  This issue is discussed here and here, as well as in several forum posts.

(Note: some have experienced loss of IPv6 connectivity altogether although it's not clear if it's the same underlying cause.  In some cases the ISP may not be supporting ECN, as observed by @meyergru.  This won't help in those situations.)

I took the inspiration to try this from the comments in https://github.com/opnsense/core/issues/6714.  Thanks to GitHub user @aque for the hint.

Starting with the configuration from the OPNsense guide as the basis:

1. Under Firewall->Shaper->Pipes add an additional upload pipe named something like "Upload-Control".  We'll be using it to separate ICMP and ICMPv6 traffic from the CoDel shaper. You can name this more specifically like "Upload-ICMP" but you may wish to use this pipe for additional control protocols (e.g. DHCP, NTP, DNS) in the future so I went with a generic name.

I set the bandwidth for this pipe to 1 Mbit/s in my case, which seems more than enough for my home internet usage (your mileage may vary). So for example if your existing upload pipe was 40 Mbit/s, you'll reduce it to 39 Mbit/s and give the 1 Mbit/s to the new pipe.

Leave everything else default.

I personally did not create a manual queue for this (it's working without one) so I will skip over Firewall->Shaper->Queues.

2. Under Firewall->Shaper->Rules, clone the existing Upload rule and make the following edits:

- Sequence: <upload rule sequence> - 2
- Protocol: icmp
- Target: Upload-Control (the pipe you created in step 1)

Save the rule with a descriptive name like "Upload-Rule-ICMP".  The sequence needs to be at least 1 less than the default Upload rule and you may need to adjust the other rule sequence values accordingly.

3. Repeat step 2 for the ICMPv6 rule:

- Sequence: <upload rule sequence> - 1
- Protocol: ipv6-icmp
- Target: Upload-Control

Save as "Upload-Rule-ICMPv6". 

Make sure "Direction" is "out" for both of these rules (under the advanced settings).
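
The reason the sequence numbers matter can be sketched in a few lines of Python, assuming that the lowest-sequence matching rule decides the target (which is what the ordering requirement above implies). The target name "Upload-Queue" and the helper `pick_target` are placeholders for illustration, not values from my config.

```python
def pick_target(rules, proto):
    """Return the target of the lowest-sequence rule whose protocol matches."""
    for rule in sorted(rules, key=lambda r: r["seq"]):
        if rule["proto"] in (proto, "ip"):   # "ip" stands for any protocol
            return rule["target"]
    return None

rules = [
    {"seq": 3, "proto": "ip", "target": "Upload-Queue"},           # default Upload rule
    {"seq": 1, "proto": "icmp", "target": "Upload-Control"},
    {"seq": 2, "proto": "ipv6-icmp", "target": "Upload-Control"},
]
print(pick_target(rules, "ipv6-icmp"))  # Upload-Control
print(pick_target(rules, "tcp"))        # Upload-Queue
```

If the ICMP rules had a higher sequence than the catch-all rule, they would never be reached in this model.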

Now when you run a speed test you should no longer see the high packet loss on the IPv6 gateway and you should see the ICMP traffic starting to get tallied under the respective rules in Firewall->Shaper->Status.

You cannot view this attachment.

You cannot view this attachment.

Hope this helps.  Do let me know if I've done something stupid here.  I am not an expert.

(If you're curious about the TCP ACK rules in the screenshot, I followed the advice given by @Seimus in this post.)
#32
My parents recently got Verizon service at their home and I have remote access.  I logged in today to upgrade their OPNsense and saw something unusual in the firewall logs which is still on-going.  Screens attached.

I've never seen this pattern before (I don't have Verizon myself) so am not sure if this is typical for the ISP, or is this part of an attack attempt?  I have no idea what service corresponds to port 35313/udp or what they might be trying to do here.  The reason I'm assuming it's Verizon is because the IPv6 prefix (2600:4040:7e...) belongs to them.

Mixed in with these are a bunch of foreign IPv4 addresses trying to reach 8080, 443, 22, etc., but I see this kind of activity all the time.  One of them is a FireHOL-listed source so those could be unrelated to whatever the Verizon IPs are doing.  Or maybe coordinated, I don't know.

The interface IP they are trying to access (:fe4c:19b) is non-existent AFAIK.  I don't find any reference to it in the firewall, nor is it listed in the NDP table under Interfaces->Diagnostics.  I also cannot ping it.  The local UniFi controller does incidentally have :fe4c: in its address, but the remaining bits are different.

The firewall is doing its job so the only reason I'm concerned is mostly out of abundance of caution for my parents.  This doesn't look typical to me.  I've advised them to return the ISP-provided gateway so they wouldn't be charged monthly for it, though I hope I'm providing adequate replacement security with just a plain OPNsense install.  There are no services exposed on the internet except for a WG port for the s2s tunnel, so I felt no need for anything more (IDS/IPS, Crowdsec, etc.)
#33
I'm looking for a way to adjust my DNS rules so that I can unblock some websites.  I have a simple rule that blocks requests from network clients to known public DoH providers on port 443.  It looks like this:

Action: Block
Interface: HomeSubnets (Group)
Proto: IPv4+6 TCP
Source: ! This Firewall
Dest: Public_DNS_Providers
Port: HTTPS

The sources for the "Public_DNS_Providers" alias are:
- https://public-dns.info/nameservers.txt
- https://raw.githubusercontent.com/jameshas/Public-DoH-Lists/refs/heads/main/lists/doh_ips_plain.txt

The second list contains an entry for a GitHub IP (185.199.111.153) according to WhoIs.

I don't know whether this IP is really a DNS server.  I tried querying it from the DNS diagnostic in OPNsense and it failed to connect, so it could be a bad entry.  Or it could be a stealth DNS server that certain web packages are embedding.  One affected site that is trying to make requests to this IP is http://networkupstools.org/ (the NUT project site) and my firewall is blocking it.  As a result I cannot access the site at all.
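
To double-check which source list an address comes from, a minimal sketch like the one below can help.  It assumes the list contents have already been downloaded as plain text, one IP per line.  The function name and the sample entries are made up for illustration; only 185.199.111.153 in the second list comes from what I observed.

```python
import ipaddress


def lists_containing(ip, lists):
    """Return the names of the lists that contain `ip`.

    `lists` maps a list name to its raw text (one address per line).
    Blank lines, '#' comments and non-IP entries are skipped.
    """
    target = ipaddress.ip_address(ip)
    hits = []
    for name, text in lists.items():
        for raw in text.splitlines():
            entry = raw.strip()
            if not entry or entry.startswith("#"):
                continue
            try:
                candidate = ipaddress.ip_address(entry)
            except ValueError:
                continue  # not a bare IP address, ignore it
            if candidate == target:
                hits.append(name)
                break  # one match per list is enough
    return hits


# Hypothetical sample contents standing in for the two alias sources.
sample_lists = {
    "nameservers.txt": "1.1.1.1\n9.9.9.9\n",
    "doh_ips_plain.txt": "# DoH endpoints\n1.1.1.1\n185.199.111.153\n",
}

print(lists_containing("185.199.111.153", sample_lists))
```

This only identifies where an entry originates; it doesn't say whether the address really serves DoH.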

Is there a way I can adjust my ruleset, without resorting to more advanced application-level firewall tools, to conditionally allow connections to 185.199.111.153:443 only when I'm visiting the NUT project site (or other apps/sites as needed)?  I don't want to unblock it for all connections.
#34
Today's adventure is a small home network for my parents, and I'm trying it with an inexpensive Netgear GS308EP managed switch.  Unlike my UniFi switch, the Netgear doesn't provide an obvious way to create a tags-only trunk through its GUI, as recommended for OPNsense.  It enforces that every port has a PVID, which I interpret to mean it only allows mixed-mode trunks (?)

Indeed in my initial attempt I had a DHCP leak and the switch picked up an IP from the Guest network after a reboot.

What I have done now is defined a throw-away VLAN (3999) that will only act as the PVID for the OPNsense trunk.

With this change I was initially seeing some ICMPv6 traffic on the 'igb2' parent interface in the live firewall view, so I went ahead and also defined a VLAN in OPNsense (igb2_vlan3999) and assigned it to an interface named BLACKHOLE with an empty rule set (default deny).  I'm not sure if it's also necessary to assign an IP to this interface for 'pf' to function?  I've just left it enabled for now.

The setup at the moment uses two switch ports for OPNsense and I've not tried to consolidate them, though I'm thinking that keeping the management network on its own link has some benefits.  I'm undecided on this.

Am I on the right track with this?
#35
Noticed this after upgrade from 25.1.2 which, if I recall correctly, was working.  I had left my browser window open and logged-in to OPNsense GUI overnight.  Today when I sat down at the computer the session was still active and did not present a login prompt.  I then applied some Windows updates and rebooted, and after launching the browser again the session was still active.

GUI timeout is set to the default 240 minutes in System->Settings->Administration.  Browser is Firefox.  I did not check the shell timeout.

Will try again to reproduce.
#36
I'm working on setting up OPNsense as a NUT client to receive shutdown signals not from the UPS, but from a LAN-hosted NUT server.  The server is a Raspberry Pi connected to the UPS via the 'usbhid-ups' driver and has an IP in the 192.168.1.0/24 net (same as OPNsense).

In the 'os-nut' plugin configuration there are two places where an IP address is requested:

- General Settings -> Listen Address
- UPS Type -> Netclient

I'm clear on the latter one: this is the location of the NUT server.  The former is a little ambiguous.  Is it asking for the same listen address of the NUT server (in which case it's redundant) or is it asking where I would like to set up a NUT server to listen on in OPNsense (not what I want)?

It's not clear how to configure the plugin strictly as a client and not bind any listen interfaces in OPNsense itself.
#37
24.7.11_2-amd64

The IPv6 gateway goes into "offline" status with high packet loss reported when uploading to the web (observed while running online speed tests). Once upload activity ceases the gateway gradually returns to online status.  Health graphs reflect the packet loss on WAN_DHCP6. The IPv4 gateway is not impacted.

Despite what OPNsense says, the packet loss is not real.  The gateway remains online and speed tests indicate 0% actual loss.  It appears to be a reporting issue with no real consequence as far as I can tell.

I found two necessary conditions for this:

- Traffic shaping must be in use; in my case I am exactly following the guide on fixing Bufferbloat with FQ_CoDel.  I have one download pipe fixed to 760 Mbit/s and one upload pipe at 21 Mbit/s.

- The 'Monitor IP' in the gateway configuration must be default (to ping the gateway itself).

If either of these is changed, e.g. disabling the shaping or setting a public DNS as the monitor IP, then the issue is not observed.

Only uploads cause the symptom.  I confirmed with "speedtest-cli --no-download" from a wireless client.  Doing the inverse test with "--no-upload" has zero impact.  I'm seeing exactly the same from my wired clients also when using e.g. speedtest.net or CloudFlare speed test.

It doesn't matter if the upload is over IPv4 or IPv6; both routes will cause the v6 gateway (only) to virtually go offline as the packet "loss" accumulates.

For now I've set the CloudFlare DNS as the gateway monitor IP to work around the issue.

#38
UPDATE (26 Dec., 2024):

For owners of UniFi "Pro Max" switches facing this issue, see page 2 for the solution that worked for me.  It involves enabling IGMP Snooping and configuring the switch as the IGMP Querier.  This is counter-intuitive as those are IPv4 technologies, however there is some reason to believe that UniFi have, at the time of this writing, co-mingled those features with IPv6 MLD under the hood.  Enabling the former may also impact the latter, which is needed for reliable IPv6 NDP.


Edit: The fix didn't hold and as of today this is still happening.  Please disregard any "fix" advice posted prior to 13 Feb. 2025.

======

Based on my searches I understand that clients (Windows, Linux, etc.) are responsible for regenerating their own SLAAC temporary IPv6 addresses after the configured 'preferred' or 'valid' lifetime has elapsed.  This is a host-side setting that needs to be enabled, and the lifetime is usually set between 1 and 7 days.

I have 1x Windows 10 and 2x Linux Debian 12 clients all configured to regen temp IPs.  Initially this works and all the clients are showing temporary in addition to global and link-local IPv6 addresses.  They even seem to invalidate and regenerate automatically.  After a few days however, all the clients mysteriously lose their temporary IPs and fail to generate new ones.  Releasing and renewing the DHCP leases doesn't do anything, which is expected I think since I'm using SLAAC and not DHCPv6 (just trying anyway).

It's strange that all three clients are showing this behavior at the same time.  I am thinking either there is some dependency on Router Advertisements / OPNsense, or my understanding is incomplete.

OPNsense configuration:

- WAN DHCPv4 and DHCPv6 with /60 prefix delegation
- 5 VLANs with static DHCPv4, IPv6 'Track Interface' with unique prefix IDs and 'Allow manual adjustment of RAs'
- ISC DHCPv4 service enabled
- ISC DHCPv6 service disabled
- Router Advertisements, all VLANs -  Unmanaged (A flag).


Windows 10 client:
> netsh interface ipv6 show privacy
Querying active state...

Temporary Address Parameters
---------------------------------------------
Use Temporary Addresses             : enabled
Duplicate Address Detection Attempts: 3
Maximum Valid Lifetime              : 1d
Maximum Preferred Lifetime          : 1d
Regenerate Time                     : 5s
Maximum Random Time                 : 10m
Random Time                         : 4m14s

Linux clients:
$ nmcli connection show "Wired connection 1" | grep ipv6
ipv6.method:                            auto
ipv6.dns:                               --
ipv6.dns-search:                        --
ipv6.dns-options:                       --
ipv6.dns-priority:                      0
ipv6.addresses:                         --
ipv6.gateway:                           --
ipv6.routes:                            --
ipv6.route-metric:                      -1
ipv6.route-table:                       0 (unspec)
ipv6.routing-rules:                     --
ipv6.replace-local-rule:                -1 (default)
ipv6.ignore-auto-routes:                no
ipv6.ignore-auto-dns:                   no
ipv6.never-default:                     no
ipv6.may-fail:                          yes
ipv6.required-timeout:                  -1 (default)
ipv6.ip6-privacy:                       2 (enabled, prefer temporary IP)      <------ HERE ------
ipv6.addr-gen-mode:                     stable-privacy      <------  HERE ------
ipv6.ra-timeout:                        0 (default)
ipv6.mtu:                               auto
ipv6.dhcp-duid:                         --
ipv6.dhcp-iaid:                         --
ipv6.dhcp-timeout:                      0 (default)
ipv6.dhcp-send-hostname:                yes
ipv6.dhcp-hostname:                     --
ipv6.dhcp-hostname-flags:               0x0 (none)
ipv6.auto-route-ext-gw:                 -1 (default)
ipv6.token:                             --

I don't currently have screenshots to prove that temporary addresses were previously active on Windows, but I can attest.

The current state is that the temp addresses are either expired (Linux) or disappeared entirely (Windows).

>ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : BLACKBOX
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : home.arpa

Ethernet adapter Ethernet 2:

   Connection-specific DNS Suffix  . : home.arpa
   Description . . . . . . . . . . . : Realtek PCIe 2.5GbE Family Controller
   Physical Address. . . . . . . . . : xx-xx-xx-12-5A-xx
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   IPv6 Address. . . . . . . . . . . : 26xx:xx:xxxx:xxx5:3147:9377:xxx:xxxx(Preferred)
   Link-local IPv6 Address . . . . . : fe80::f93c:a1b3:5a5b:1e03%13(Preferred)
   IPv4 Address. . . . . . . . . . . : 192.168.50.100(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Lease Obtained. . . . . . . . . . : Monday, December 2, 2024 7:07:13 PM
   Lease Expires . . . . . . . . . . : Wednesday, December 4, 2024 2:07:13 AM
   Default Gateway . . . . . . . . . : fe80::xxxx:xxxx:xxxx:c2e%13
                                       192.168.50.1
   DHCP Server . . . . . . . . . . . : 192.168.50.1
   DHCPv6 IAID . . . . . . . . . . . : xxxx164xx
   DHCPv6 Client DUID. . . . . . . . : xx-xx-xx-xx-xx-xx-3C-91-78-2D-xx-xx-xx-xx
   DNS Servers . . . . . . . . . . . : 192.168.50.1
                                       fd83:cc80:4fc3::1
   NetBIOS over Tcpip. . . . . . . . : Enabled
   Connection-specific DNS Suffix Search List :
                                       home.arpa

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether xx:xx:xx:d0:9a:xx brd ff:ff:ff:ff:ff:ff
    inet 192.168.40.100/24 brd 192.168.40.255 scope global dynamic noprefixroute eth0
       valid_lft 4054sec preferred_lft 4054sec
    inet6 26xx:xx:xxxx:xxx4:bb4c:2c5e:d39:6125/64 scope global temporary deprecated dynamic
       valid_lft 85993sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx4:bd85:3fe6:6d2e:7f9b/64 scope global temporary deprecated dynamic
       valid_lft 85993sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx4:f932:c89:dd5d:6a53/64 scope global temporary deprecated dynamic
       valid_lft 85993sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx4:4f6e:xxxx:xxxx:xxx/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 85993sec preferred_lft 13993sec
    inet6 fe80::1009:f06b:fa78:524e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether xx:xx:xx:85:cf:xx brd ff:ff:ff:ff:ff:ff
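
To track this over several days, a minimal sketch like the following could pull the temporary addresses out of saved `ip a` output and flag the deprecated ones.  It assumes the iproute2 layout shown above, where the lifetimes sit on the line after each `inet6` entry.  The function name and the sample addresses (from the 2001:db8::/32 documentation prefix) are made up for illustration.

```python
import re

ADDR_RE = re.compile(r"^\s*inet6\s+(\S+)\s+(.*)$")
LFT_RE = re.compile(r"valid_lft\s+(\S+)\s+preferred_lft\s+(\S+)")


def temporary_addresses(ip_output):
    """Return one dict per inet6 address carrying the 'temporary' flag."""
    results = []
    lines = ip_output.splitlines()
    for i, line in enumerate(lines):
        match = ADDR_RE.match(line)
        if not match:
            continue
        address, flags = match.group(1), match.group(2).split()
        if "temporary" not in flags:
            continue
        preferred = None
        # Lifetimes are printed on the line that follows the address.
        if i + 1 < len(lines):
            lft = LFT_RE.search(lines[i + 1])
            if lft:
                preferred = lft.group(2)
        results.append({
            "address": address,
            "deprecated": "deprecated" in flags,
            "preferred_lft": preferred,
        })
    return results


# Made-up sample in the same layout as the output above.
sample = """\
    inet6 2001:db8:0:4:bb4c:2c5e:d39:6125/64 scope global temporary deprecated dynamic
       valid_lft 85993sec preferred_lft 0sec
    inet6 2001:db8:0:4:4f6e:1:2:3/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 85993sec preferred_lft 13993sec
"""

for entry in temporary_addresses(sample):
    print(entry)
```

Run periodically (for example from cron, writing to a log), this would show when the last non-deprecated temporary address disappears.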

Is there a misconfiguration here, or some bug?  Thanks!
#39
Hi all,

I have an IoT network with some older devices that only support 2.4GHz but the network is shared with newer devices and mobile phones.  To get around this I have created two SSIDs for the subnet (IoT-2.4 and IoT-5).

Pertaining to the mobile phones, they need to sometimes connect to IoT-2.4 so that they can control the first-gen Google Chromecast devices.  For whatever reason they need to be on the same SSID to inter-operate and it's not enough to be on the same subnet.  Other times, they are on IoT-5 to take advantage of the bandwidth.

All is fine except I noticed that when the phones switch bands they may present with different MACs and thus new leases from the DHCP pool, so I cannot just add them to a rule alias for filtering.

The documentation specifies that a persistent random MAC should be used for the same SSID, except in some software-defined circumstances I think are outside my control.

I thought about just connecting all the phones to each of the SSIDs one by one and then converting them to static leases.  I'm not sure how reliable this would be, but I would end up with 2x static leases per phone and the method seems a bit ridiculous (it certainly wouldn't scale, but that is not much of a problem in the home).
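
To see which leases come from randomized MACs in the first place, a minimal sketch can test the "locally administered" bit (0x02 in the first octet), which private/randomized addresses normally set.  The function name and the sample MACs are made up for illustration, and a set bit only suggests randomization; it doesn't prove it.

```python
def is_locally_administered(mac):
    """Return True if the MAC has the locally administered (U/L) bit set."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)


# Hypothetical lease table entries: (hostname, MAC).
leases = [
    ("phone-iot-2.4", "da:a1:19:00:00:01"),
    ("chromecast", "3c:22:fb:00:00:01"),
]

for host, mac in leases:
    kind = "likely randomized" if is_locally_administered(mac) else "vendor-assigned"
    print(f"{host}: {mac} ({kind})")
```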

Are there better ways to track these clients in an alias?  I want to stick with WPA2 Personal because a FreeRADIUS setup wouldn't be practical at home and I think many of the IoT things would be incompatible.
#40
I have an OPNsense appliance with a UniFi L3 switch and an Asus RT-N66U WiFi router converted to an AP with FreshTomato firmware.  Strange bed-fellows, but I think they can get along.

Attached is the topology I'd like, and I need your input, particularly on using VLAN 1.  This VLAN ID is a source of confusion.

I want all the network devices on a subnet, which I refer to as the "Default" network as that is what UniFi calls it and assigns as VLAN 1.  I'd like to manage devices in this network from my PC on the "Home" subnet, VLAN 20, and would like only a single tagged trunk from OPNsense.   All the subnets, including Default, should be set up on igc0 (what is typically the default LAN interface parent).

A small complication here is that I don't have a dedicated host for the UniFi controller to keep on the Default network, so I'm having to run it within a VM on my PC on the "Home" net. I know of some tricks to host it like this using DHCP option 43 in ISC, DNS overrides for the "unifi" host name, and firewall rules to allow the inter-VLAN traffic.  There is a chicken-and-egg problem here though, as the switch needs to be adopted before the VLANs can be set up.  I might need to migrate my desktop PC between subnets while setting things up.

My main questions are relating to VLAN 1.  I've read many comments about it, and remain unsure what to do with it.  I'm thinking that I can use it just like any other VLAN tag.  Please correct me.  Is VLAN 1 special in some way, or is it just conventionally used for untagged frames?  Can I safely use it for tagged traffic instead on the OPNsense trunk?
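
For reference, here is a minimal sketch of how an 802.1Q tag encodes the VLAN ID, assuming the standard 4-byte layout (TPID 0x8100 followed by a 16-bit TCI holding PCP, DEI and a 12-bit VID).  The helper names are my own.  On the wire VID 1 is encoded like any other ID; what makes it "special" is vendor convention (default/native VLAN), which this sketch doesn't model.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def build_vlan_tag(vid, pcp=0, dei=0):
    """Pack a 4-byte 802.1Q tag for the given VLAN ID."""
    if not 1 <= vid <= 4094:
        raise ValueError("VID must be 1..4094 (0 and 4095 are reserved)")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_tag(tag):
    """Unpack a 4-byte 802.1Q tag into its fields."""
    tpid, tci = struct.unpack("!HH", tag)
    return {
        "tpid": tpid,
        "pcp": tci >> 13,
        "dei": (tci >> 12) & 1,
        "vid": tci & 0x0FFF,
    }


for vid in (1, 10, 20, 30):
    print(vid, build_vlan_tag(vid).hex())
```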

In UniFi I would tag all the VLANs (1, 10, 20, 30) on the OPNsense trunk and leave nothing as Default.  On the AP trunk I would leave VLAN 1 as default (required) and tag only 10, 20, 30.

I'm thinking to reset OPNsense and when it asks for manual interface configuration, I will tell it to create 4 VLANs with igc0 as the parent:  VLAN 1, 10, 20 and 30.  I assign VLAN 1 (igc0_vlan01) as the "LAN" with the IP 192.168.1.1.  I configure WAN as usual on igc1, and do absolutely nothing with igc2 and igc3 (leave them unassigned and disabled).  My router has 4 NICs but I think I only need to use 2.

Am I on the right track with this or am I misusing VLAN 1?