Messages - thelittleblackbird

#1
Quote from: cookiemonster on July 01, 2026, 12:17:06 AM
If you have set RSS for that performance, it might need revisiting. Maybe it does not help and is detrimental in your case.

The problem was present before that optimization; in fact, I thought the problem was there because I had selected "safe defaults".

#2
Quote from: pfry on June 30, 2026, 11:37:34 PM
How about the PCI query?

Not a single error logged; see the attached file.

Quote from: pfry on June 30, 2026, 11:37:34 PM
It could be a software issue. My experience is limited to a couple FreeBSD and a couple OPNsense machines, all bare-metal. Recent/relevant experience, that is. Most of the pf messages could point to really wacky network issues, but (in particular) I wouldn't expect a duplicate flow ID to be a soft issue (within the scope of a standard OPNsense install). I just don't have enough experience poking into pf to say with certainty.

Same for me: I don't have a lot of experience with BSD in general or OPNsense in particular, but in my experience all these messages about bad states in the connection state table sound strange.

Thanks for the ideas anyway.
#3
Quote from: pfry on June 30, 2026, 08:44:26 PM
Yak! I haven't debugged pf (no need thus far), but I would not expect such output on a functioning system. Unless someone else has a better idea, I'd run a thorough set of tests on the computer. "pciconf -lcev" (correctable errors seem to be normal), memtest86+, mprime (I run mprime from a live Linux image, as the executable is handy), etc.

Oh, from the firewall itself, pf activity will (normally) be "let anything out..." - very simple state creation/matching. The CLI test likely doesn't spew sessions like a web app, but that's an assumption.

It is not the hardware...
Quote
root@OPNsense:~ # stress-ng --cpu 0 --vm 2 --vm-bytes 80% -t 30m
stress-ng: info:  [19049] setting to a 30 mins run per stressor
stress-ng: info:  [19049] dispatching hogs: 4 cpu, 2 vm
stress-ng: info:  [19295] vm: using 1.52G per stressor instance (total 3.03G of 3.79G available memory)
stress-ng: info:  [19049] skipped: 0
stress-ng: info:  [19049] passed: 6: cpu (4) vm (2)
stress-ng: info:  [19049] failed: 0
stress-ng: info:  [19049] metrics untrustworthy: 0
stress-ng: info:  [19049] successful run completed in 30 mins

Any other ideas? Honestly, unless somebody comes up with something, I am starting to think this is a BSD problem, because I cannot really understand what is going on...
#4
Quote from: pfry on June 27, 2026, 10:31:22 PM
I'd watch the live log (rule logging must be enabled) and make sure the ruleset is working as expected. (I'm lazy, and also look at the "Firewall States" dashboard widget for a total, as well as the "Sessions" and "States" GUI diags.) With so many rules I would not expect a functional loop. Also, "netstat" - "-m", "-i", perhaps "-Q", "-T", "-x", "-s" options (most have to be issued separately), and see if anything looks bad.

I didn't see anything in vmstat; the IRQ rates didn't rise compared with the other tests.

To add some more confusion to this problem, I did a small test with the speedtest-cli utility and got these results:
- the pf output is almost silent
- the CPU usage is < 15% for a 350 Mbps line

In comparison, when I use the web version in Firefox I get this:
- pf is quite verbose (see attached file)
- the CPU goes to 100% in interrupts

By the way, I noticed that the CPU spikes happen when the test is finishing (closing/cleaning up connections?).

Is this level of RST and bad-state messages normal?
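
One way to put numbers on that question is to look at pf's own counters instead of the debug log. The following is a minimal sketch, assuming the `pfctl -si` output contains an unindented "Counters" header followed by indented lines of the form name, total, rate. The helper name `pf_nonzero_counters` is made up for illustration.

```shell
# Print only the pf counters whose total is non-zero.
# Assumption: counter lines look like "  state-mismatch   12   0.0/s" and
# follow an unindented line that starts with "Counters".
pf_nonzero_counters() {
    awk '
        /^Counters/     { in_counters = 1; next }  # start of the counters block
        /^[^[:space:]]/ { in_counters = 0 }         # any other unindented header ends it
        in_counters && $2 ~ /^[0-9]+$/ && $2 > 0 { print $1, $2 }
    '
}

# Usage on the firewall (as root), before and after a speed test:
#   pfctl -si | pf_nonzero_counters
```

If a counter such as state-mismatch grows a lot during the browser test but not during the speedtest-cli run, that would be consistent with the bad-state messages, although it would not by itself explain them.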
#5
Quote from: Monviech (Cedrik) on June 28, 2026, 09:36:09 PM
In the ipv4 and ipv6 subnets, enable advanced mode and set Domain Type to Interface.

If you run into validation issues, delete the v6 subnet, then change domain type on the v4 subnet, then recreate the v6 subnet.

Reason is that partial ipv6 networks (::...) do not match the configured dhcp domain otherwise.

Its explained in the second attention box here:
https://docs.opnsense.org/manual/dnsmasq.html#dhcpv6-and-router-advertisements

Thanks, it really solved the issue.
#6
Quote from: dseven on June 28, 2026, 07:19:11 PM
If you want Unbound (as your resolver) to be able to lookup internal domains managed by dnsmasq, you'll need to configure query forwarding as described at https://docs.opnsense.org/manual/dnsmasq.html#dhcpv4-with-dns-registration

If I understood the how-to correctly, this is not going to solve the issue, because in this case Unbound will forward the DNS request to dnsmasq and then we will hit the problem I am describing again.

My problem is that there is no internal association inside dnsmasq between DHCPv4 and DHCPv6 leases, and therefore I don't get consolidated A and AAAA records for a specific domain name.
#7
So that's the question. I would rather avoid static pinning or assignments, because that defeats the purpose of the DHCP server.

Some extra details:

I have a dual-stack network with 2 separate segments (LAN and DMZ). My dnsmasq is acting as the DHCP server for both protocols and as the DNS server; the upstream is Unbound.

Up to now, I have not been able to get both records for a name resolution. Does anyone know how to achieve it?

NOTE: as a side effect of the investigation, I realized that the domain name is not included in the leases file even if the required full FQDN is set. Is this the expected behavior? Every independent segment of the network has a different domain name, so device names in different segments may collide.
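
To make the symptom easy to reproduce, here is a minimal sketch that asks a given server for both record types and reports which one is missing. It assumes drill(1) is available on the firewall and that answer lines use the usual "name TTL IN TYPE data" layout. The function name `check_dual_stack` and the server address in the usage comment are hypothetical.

```shell
# Report whether a name resolves to both an A and an AAAA record.
# Assumptions: drill(1) is installed, and answer lines carry the record
# type in the 4th field; lines starting with ";" are comments.
check_dual_stack() {
    name=$1
    server=$2
    for rtype in A AAAA; do
        if drill "$name" "@$server" "$rtype" |
            awk -v t="$rtype" '!/^;/ && $4 == t { found = 1 } END { exit !found }'
        then
            echo "$name $rtype: ok"
        else
            echo "$name $rtype: missing"
        fi
    done
}

# Usage (the server address is an assumption; use the one dnsmasq listens on):
#   check_dual_stack clouddocker.dmz.home.internal 127.0.0.1
```

If dnsmasq listens on a non-default port, drill's `-p` option would have to be added to the call.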
#8
Quote from: nero355 on June 27, 2026, 10:27:54 PM
Why not boot the system with the Live Image of OPNsense and see how that performs with a very basic setup just good enough to get your WAN working ?

Perhaps you made some weird loop somewhere or something got corrupted over time ?!

Yeah, that's a possibility, but even if I can do that test, I still need a way to profile the rules; the test would only show that there is no implementation problem in OPNsense/BSD.

By the way, remember that when the FW is disabled everything works fine.

Quote from: nero355 on June 27, 2026, 10:27:54 PM
Another thing to keep in mind is this : https://www.tomshardware.com/news/intel-apollo-lake-refresh-degradation-cpu-failure,40362.html
Your CPU might be affected by something like that too IIRC from a very long time ago, but I could be wrong...

Ummm, I don't think this is the problem. If the CPU were dying or having problems, I would see strange behaviour in all cases and not only with a specific interface and a specific protocol.

In any case, thanks for the answers.
#9
It could be, but how do I check it?

Honestly, my FW ruleset is quite small (<200 rules) and most of the rules are autogenerated, so I would be surprised if it is something like this.

I don't know where or how to look at this point.

EDIT: WAN speed test over IPv6 done; CPU usage stayed within limits (65% idle). I think I have a serious performance issue in the NAT table or in some IPv4 filters...
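
One possible way to check is to rank the loaded rules by how often pf evaluated them. This is a minimal sketch, assuming `pfctl -vvsr` prints each rule on a line starting with "@<number>" followed by a line containing "Evaluations: <n>". The helper name `pf_rules_by_evaluations` is made up for illustration.

```shell
# Rank pf rules by evaluation count, highest first.
# Assumption: each rule line starts with "@<number>" and is followed by a
# statistics line that contains "Evaluations: <n>".
pf_rules_by_evaluations() {
    awk '
        /^@[0-9]+/ { rule = $0; next }           # remember the current rule
        /Evaluations:/ {
            for (i = 1; i < NF; i++)
                if ($i == "Evaluations:") print $(i + 1) "\t" rule
        }
    ' | sort -rn | head -n "${1:-10}"            # default: top 10
}

# Usage on the firewall (as root), after a speed test:
#   pfctl -vvsr | pf_rules_by_evaluations 10
```

A high evaluation count is only a hint, not a cost measurement, but a rule that is evaluated far more often than its neighbours is a reasonable place to start looking. `pfctl -vsn` shows similar statistics for the NAT rules.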
#10
As extra information, I ran a LAN-to-DMZ speed test with iperf3:

Quote
iperf3 -c clouddocker.dmz.home.internal -p 4000 -M 400 -P 8 -l 9000

I tried to simulate a high packet rate with a small payload to see if this reproduces the issue. While the interrupts and system tasks were significantly higher, both of them reached only around 70% of the processor (not great, but it could be acceptable).

Could this test suggest that the problem has something to do with NAT or perhaps with a WAN rule? (Can we rule out a general pf performance issue?)
#11
Nope, intel NIC
root@OPNsense:~ # pciconf -lv | grep -A4 ethernet
    subclass   = ethernet
igb1@pci0:2:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x1539 subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb2@pci0:3:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x1539 subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb3@pci0:4:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x1539 subvendor=0x10f3 subdevice=0x0101
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet

I don't think it has anything to do with the NIC; remember that when I disable the FW, the system load under the same test is < 18%...

I am pretty sure it is something related to the processing of the rules / NAT. But I am surprised by the numbers I get, and I cannot imagine what is going on...
#12
Here you have it, in the attached file.

I tried to implement the tunables described in the OPNsense documentation about performance:
https://docs.opnsense.org/troubleshooting/performance.html

If you need something else, just ask.

Thanks.

#13
Hi all,

I think I have debugged this to the limit of my knowledge, and I have reached a point where I need a bit of support and guidance.

During a regular speed test over the internet (a 350 Mbps connection) I noticed that the "swi1: netisr" routines take 100% of the CPU. This is only noticeable if the FW is enabled; if the FW is disabled, the CPU usage for the same test does not go beyond 18% (which is what I would expect).

I don't have any IDS/IPS active, my services are limited to dnsmasq, Unbound and Tailscale, and the NICs are Intel, so I am out of ideas.

Could somebody be so kind as to point me to where to look to find where the problem originates?

Thanks in advance.
#14
Fuck! I feel ashamed of myself.

I promise I checked that for hours and I didn't see anything wrong.

Thanks for the help.

For IPv6 I am not so worried; I only wanted to have a rule that could be triggered when one of the devices on the network is not behaving "nominally". I don't care if DNS queries over IPv6 are not resolved when they are not directed to the FW.
#15
Hi all,

I hope I can tap into some of the collective intelligence here about a port forwarding rule, so that someone can explain to me what I am doing wrong.

I set a port forwarding rule so that every DNS request to port 53 that is NOT addressed to the firewall is redirected to the firewall itself. I want to prevent devices from using a DNS server other than the default one.

But in the firewall log I can see that the rule is always triggered.

Am I doing something wrong? Important info: the rule is only triggered by IPv6 and not by IPv4.
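
To compare what is actually loaded for each address family, a minimal sketch like the following can list the DNS redirect rules from `pfctl -sn`. It assumes the rules are printed as lines starting with "rdr" and containing "port = 53" or "port = domain". The helper name `dns_rdr_by_family` is made up for illustration.

```shell
# List rdr rules that match destination port 53, prefixed by address family.
# Assumption: pfctl prints them as "rdr ... inet|inet6 ... port = 53 ..."
# (or "port = domain" when the service name is shown instead of the number).
dns_rdr_by_family() {
    awk '
        $1 == "rdr" && / port = (53|domain)( |$)/ {
            fam = "unspecified"
            for (i = 1; i <= NF; i++)
                if ($i == "inet" || $i == "inet6") fam = $i
            print fam ": " $0
        }
    '
}

# Usage on the firewall (as root):
#   pfctl -sn | dns_rdr_by_family
```

Seeing the inet and inet6 lines side by side makes it easier to spot a difference in the source, destination or negation between the two families.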