Messages - Saarbremer

#1
Hi,

that sounds like an issue on your Docker host. First verify whether any traffic reaches OPNsense at all and, if it does, whether it gets blocked. Also check your IP ranges for possible overlap, just in case.
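
For the overlap check, something along these lines might help. It is only a minimal sketch using Python's standard ipaddress module; the network names and ranges are made-up examples, not taken from your setup.

```python
import ipaddress


def find_overlaps(networks):
    """Return pairs of named networks whose address ranges overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    names = sorted(parsed)
    overlaps = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Only compare networks of the same IP version.
            if parsed[a].version == parsed[b].version and parsed[a].overlaps(parsed[b]):
                overlaps.append((a, b))
    return overlaps


# Hypothetical ranges for illustration only.
example = {
    "docker_bridge": "172.17.0.0/16",
    "lan": "192.168.1.0/24",
    "vpn": "172.17.5.0/24",
}
print(find_overlaps(example))
```

Replace the example dictionary with the ranges used by your Docker networks and your OPNsense interfaces; any pair that is reported deserves a closer look.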
#2
Hi,

could you please elaborate more on your setup?

What network segments do you have (VLANs and/or Wi-Fi SSIDs) and where do they occur in your setup? Is your modem just a bridge to your ISP or is it actually doing the PPPoE (or whatever it is you have)?

#3
Hi,

I am currently dealing with an annoying kernel panic on a Deciso DEC750.

Situation:
- The device was used for 12+ months behind a "Digitalisierungsbox Premium 2" router, provided on a fiber business account by Deutsche Telekom, with a static IPv4 address and a static /56 net + /64 address included. The router provided RFC1918 IPv4 addresses and NAT and set the Deciso box as "exposed". This worked OK (up to the latest release of the 25.1 branch). Since the business router wouldn't provide DHCPv6 PD, we decided to change the setup.

- The "Digitalisierungsbox" came with a plugged-in "Digitalisierungsbox Smart 2" (aka Zyxel's GPON SFP module), which we then plugged directly into the DEC750 on X0. After configuring PPPoE as usual, the connection was established and worked perfectly on IPv4 and IPv6, and tracking the WAN for IPv6 subnets worked on all local VLANs. Firewall rules were extended to IPv6 and everything worked again, no problems detected.

- The OPNsense box also provides OpenVPN and IPv6 was working here as well. This includes OpenVPN access via IPv6 as well as handing out IPv6 addresses on OpenVPN.

- After a couple of days the box restarted itself after a kernel panic and has kept doing so every 2-7 days. Trying to trigger a panic with heavy traffic load and/or OpenVPN load was not successful. There is no indication of when the next restart will happen. We upgraded from the latest 25.1 to 25.7.3_7-amd64, and it still takes a couple of days for the next kernel panic to occur.

- Based on https://forum.opnsense.org/index.php?topic=41808.15 we tried disabling IPv6 on WAN and removed it from all VLANs and OpenVPN. Since then the box has been running smoothly for at least 10 days. No further kernel panic so far.

- The stack trace is like this:

panic: vm_fault_lookup: fault on nofault entry, addr: 0xfffffe003c242000
cpuid = 3
time = 1757468281
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00aa80c780
vpanic() at vpanic+0x131/frame 0xfffffe00aa80c8b0
panic() at panic+0x43/frame 0xfffffe00aa80c910
vm_fault() at vm_fault+0x15af/frame 0xfffffe00aa80ca30
vm_fault_trap() at vm_fault_trap+0x81/frame 0xfffffe00aa80ca80
trap_pfault() at trap_pfault+0x1be/frame 0xfffffe00aa80cad0
calltrap() at calltrap+0x8/frame 0xfffffe00aa80cad0
--- trap 0xc, rip = 0xffffffff80f7ec41, rsp = 0xfffffe00aa80cba0, rbp = 0xfffffe00aa80cbb0 ---
vm_radix_remove() at vm_radix_remove+0x51/frame 0xfffffe00aa80cbb0
vm_page_object_remove() at vm_page_object_remove+0x69/frame 0xfffffe00aa80cbd0
vm_page_free_prep() at vm_page_free_prep+0x24/frame 0xfffffe00aa80cbf0
vm_page_free_toq() at vm_page_free_toq+0x12/frame 0xfffffe00aa80cc20
vm_object_page_remove() at vm_object_page_remove+0x6a/frame 0xfffffe00aa80cc80
vm_map_entry_delete() at vm_map_entry_delete+0xf5/frame 0xfffffe00aa80ccc0
vm_map_delete() at vm_map_delete+0x7b/frame 0xfffffe00aa80cd30
vm_map_remove() at vm_map_remove+0x96/frame 0xfffffe00aa80cd60
vmspace_exit() at vmspace_exit+0xab/frame 0xfffffe00aa80cd90
exit1() at exit1+0x53a/frame 0xfffffe00aa80cdf0
sys_exit() at sys_exit+0xd/frame 0xfffffe00aa80ce00
amd64_syscall() at amd64_syscall+0x10e/frame 0xfffffe00aa80cf30

- The failure mode suggests some memory corruption that does not show itself immediately but causes a delayed kernel panic. So other than "switching off IPv6" I have no idea how to proceed right now. I checked system.log and did not find anything helpful or interesting other than a signal 11 on a Python process:


...
(repeating for some days until the crash)
2025-09-21T23:17:05 Notice dhcp6c dhcp6c_script: RENEW on pppoe0 executing
2025-09-21T23:02:05 Notice dhcp6c dhcp6c_script: RENEW on pppoe0 executing
2025-09-21T22:51:00 Notice kernel <6>[449403] pid 85765 (python3.11), jid 0, uid 0: exited on signal 11 (no core dump - bad address)
2025-09-21T22:47:05 Notice dhcp6c dhcp6c_script: RENEW on pppoe0 executing
2025-09-21T22:32:05 Notice dhcp6c dhcp6c_script: RENEW on pppoe0 executing
...
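
To see whether such signal exits cluster before each panic, the log could be scanned with a small script. This is a minimal sketch only: the regular expression is derived from the single kernel line quoted above, so the exact field layout on other systems is an assumption.

```python
import re

# Pattern derived from the one kernel log line quoted above; other
# log formats may need a different expression (assumption).
SIGNAL_EXIT = re.compile(
    r"^(?P<ts>\S+)\s+\S+\s+kernel\s+.*?pid (?P<pid>\d+) \((?P<proc>[^)]+)\), "
    r"jid \d+, uid (?P<uid>\d+): exited on signal (?P<sig>\d+)"
)


def signal_exits(lines):
    """Return (timestamp, process, pid, signal) for every signal exit found."""
    events = []
    for line in lines:
        match = SIGNAL_EXIT.search(line)
        if match:
            events.append(
                (match["ts"], match["proc"], int(match["pid"]), int(match["sig"]))
            )
    return events


sample = [
    "2025-09-21T23:02:05 Notice dhcp6c dhcp6c_script: RENEW on pppoe0 executing",
    "2025-09-21T22:51:00 Notice kernel <6>[449403] pid 85765 (python3.11), "
    "jid 0, uid 0: exited on signal 11 (no core dump - bad address)",
]
for event in signal_exits(sample):
    print(event)
```

Feeding it the lines of an exported system log and comparing the timestamps with the panic times would show whether the signal 11 exits are a precursor or just noise.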



My questions:
- Is there anything I can do to locate the problem and maybe work around it? I would really like to write a bug report, but just having a kernel panic in a page fault handler is not really helpful, I guess.

- Can I do something about the DEC750? I am not an expert on this piece of hardware and its properties and/or quirks (if any).

- Does anybody else have issues with that kind of setup and IPv6 and possibly a workaround?

#4
25.1, 25.4 Series / Re: Alias database [resolved]
June 05, 2025, 11:55:24 AM
Quote from: cookiemonster on June 02, 2025, 05:46:07 PM
> I don't know then with the limited info available for two firewalls and their setup.

Sorry for being unclear: Neither instance was running in HA mode or connected to the other in any way. One is the network's edge router, the other one acts as a DevOps testing protection gateway. The only thing they had in common was the software version, yet they behaved differently.

#5
25.1, 25.4 Series / Re: Alias database [resolved]
June 02, 2025, 05:12:46 PM
Quote from: cookiemonster on June 02, 2025, 04:19:24 PM
> Are you running them as a HA setup with CARP and pfsync enabled?

No, I'm not.
#6
25.1, 25.4 Series / Re: Alias database
June 02, 2025, 03:51:24 PM
Yes, I can ping the IP address.

Had the chance to run configctl filter refresh_aliases
On the Proxmox machine with no issues:
{"status": "ok"}
On the bare metal machine with issues:

(yes, just an empty line)

#7
25.1, 25.4 Series / Alias database [resolved]
June 02, 2025, 03:27:21 PM
Hi,

I am running out of ideas what to check with the following issue:

I have two instances of OPNsense, running on 25.1.7_4. One is within a Proxmox VM and works fine. The other is my edge router (bare metal) and this is unable to handle new aliases.

What I did to reproduce the problem:
1. Create new Alias "PC" (Host, 1 IPv4 LAN). Yes, clicked "Apply"!
2. Create a rule on LAN (Source "PC", Protocol enabled), pass. Yes, clicked "Apply"!
3. Trigger some traffic, nothing in the LiveView Log
4. Updated the rule to use the literal IP address instead of the alias.
5. LiveView is showing a lot of traffic from the protocol rule.

Observations:
- In the alias section in firewall, the "last updated" column remains empty for "PC", load count is 0
- In the alias section in diagnostics, PC shows up as selectable item but shows no contents.
- Global configuration in /conf/config.xml contains the alias definition
- Checked /var/db/aliastables, no entry for "PC" - the filesystem has plenty of space left and permissions seem ok
- Checked backend log: Nothing of a warning or higher severity, nothing relevant (from my perspective) in less severe levels.
- Checked firewall log: No warning or higher, nothing about alias (had to search for the term "alias")
- Cloudflare, Spamhaus DROP and GeoIP seem to regenerate as usual, timestamp of /var/db/aliastables matches log entries

The only "interesting" part about this machine is that I replaced the SSD 4 weeks ago, ran a full install and reloaded the last known config / backup. Updated to 25.1.7_4 in the process afterwards.

I know I can stick to hard coded IP addresses for now - and I will not reboot until the next weekend at least, so testing it is currently not possible. My second instance on Proxmox does not have this issue and updates everything as required.

EDIT: (See reply below for more) running configctl filter refresh_aliases returned no output other than an empty line.

Are there any other locations where I could look for diagnostics, or a way to trigger an alias re-generation from the shell?

Thanks.

EDIT2/Resolution: flock was blocking forever on a lock that had existed for more than 21 days. However, I'd expect the firewall not to silently do nothing in such a case.
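
To illustrate what I would have expected instead of a silent hang: a lock attempt can be bounded with a timeout so the caller is able to report the problem. This is a minimal sketch of that general technique using Python's fcntl module; it is not how OPNsense implements its locking, and the function name, timeout values and any lock path passed to it are hypothetical.

```python
import errno
import fcntl
import os
import time


def try_lock(path, timeout=5.0, poll=0.1):
    """Try to take an exclusive flock on path.

    Returns the open file descriptor on success, or None if the lock
    could not be acquired within `timeout` seconds.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking forever.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except OSError as exc:
            if exc.errno not in (errno.EAGAIN, errno.EACCES):
                os.close(fd)
                raise
        if time.monotonic() >= deadline:
            os.close(fd)
            return None
        time.sleep(poll)
```

A caller that gets None back can log a warning naming the lock file rather than returning an empty line.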

#8
You may want to provide more information.

Please show your exact rules and diagnostics information.
#9
Given how little information you provided here, I can only guess that you're using public DNS to connect to your proxy, which resolves to the WAN IP and results in a 403 due to NAT on the client's side. When you use the local IP of nginx you're routed through the tunnel and, since there is no NAT involved, everything works.

But again, I might be wrong, as you describe neither the IP ranges, the DNS resolution nor the IP protocols involved.
#10
Another solution is to use

ifconfig [-]rxcsum,[-]txcsum

etc.

Check its man page for more options, or the source code of the web UI for its usage.

https://github.com/opnsense/core/blob/3cbc7927db174f51eec007739b4fcf4247a18948/src/etc/inc/interfaces.lib.inc#L548

#11
24.7, 24.10 Legacy Series / Re: NTOP problem & REDIS
September 16, 2024, 06:21:13 PM
First of all:
The default ntopng credentials are the user admin and the root password from OPNsense.

Second:
redis sucks a lot in combination with ntopng. Restart redis. On my machine it crashes from time to time, probably a memory issue. But since I no longer need ntopng, I removed both.
#12
There's no need to flush the state table of the firewall, as the firewall remains unaffected by unbound's blacklisting.

What unbound does is return NXDOMAIN (if selected) or 0.0.0.0 (the default, unless another IP is entered). In both cases, your client's local DNS resolver will cache that result for the TTL (time to live). So, when you change the behaviour of unbound via whitelists or blacklists, you might want to restart unbound and flush the DNS cache on all affected clients. Or wait for the TTL to expire.

E.g. on Windows you can do ipconfig /flushdns, if I remember correctly.
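
Why the client keeps seeing the old answer can be shown with a toy model of a TTL cache. This is only an illustrative sketch: the class is a made-up stand-in for a client-side cache, not a real resolver, and the host name and TTL are example values.

```python
class TtlCache:
    """Tiny stand-in for a client-side DNS cache (not a real resolver)."""

    def __init__(self):
        self._entries = {}  # name -> (answer, expires_at)

    def put(self, name, answer, ttl, now):
        self._entries[name] = (answer, now + ttl)

    def get(self, name, now):
        entry = self._entries.get(name)
        if entry is None:
            return None
        answer, expires_at = entry
        if now >= expires_at:
            # Entry expired: drop it so the client has to ask again.
            del self._entries[name]
            return None
        return answer

    def flush(self):
        self._entries.clear()


cache = TtlCache()
# The blocked answer is cached at t=0 with a TTL of 300 seconds.
cache.put("ads.example.com", "0.0.0.0", ttl=300, now=0)
print(cache.get("ads.example.com", now=120))  # still the cached blocked answer
print(cache.get("ads.example.com", now=300))  # expired, client must re-query
```

Until the entry expires or the cache is flushed, the client never asks unbound again, so a change on the unbound side stays invisible to it.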
#13
Interesting scenario.

However, if you want full IPv6 deployment you need to delegate IPv6 address space. Enabling IPv6 only on LAN just gives you IPv6 connectivity for the switch.

Hence:
- Get a /48 from HE if not already there
- Configure DHCPv6 on OPNsense to delegate a (let's say) /56 (taken by the switch)
- On the switch make sure it can deal with a delegated prefix
- Check that your OPNsense local networks (LAN, DMZ, WAN) do NOT have the same prefix as the delegated one.

Did you do any of that and if yes, with which outcome?
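
The last check in the list (delegated prefix must not collide with the prefixes on the local networks) can be done by hand or with a few lines of Python. A minimal sketch, assuming a /48 split into /56 blocks; the addresses use the 2001:db8::/32 documentation prefix and are placeholders, not your real ranges.

```python
import ipaddress


def pick_delegation(parent, delegated_len, local_networks):
    """Return the first sub-prefix of `parent` that overlaps no local network."""
    parent_net = ipaddress.ip_network(parent)
    local_nets = [ipaddress.ip_network(n) for n in local_networks]
    for candidate in parent_net.subnets(new_prefix=delegated_len):
        if not any(candidate.overlaps(net) for net in local_nets):
            return candidate
    return None


# Placeholder prefixes for the firewall's own interfaces (assumption).
local = ["2001:db8:0:1::/64", "2001:db8:0:2::/64"]
print(pick_delegation("2001:db8::/48", 56, local))
```

With these example values the first /56 is skipped because it contains both local /64 networks, and the next free /56 is offered for delegation to the switch.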
#14
> Will add screenshots of nat, wan and portforward rules below since max 4 attachments

Would be really helpful.
#15
In 24.7.2 I found ICMPv6 134 Destination Unreachable before the messages like these, my comments in ():
fe80::xxx (Android device)   -> 2a00:1450:4005:801::200a (google as)   ICMPv6   134   Destination Unreachable (no route to destination)

A neighbor advertisement for the device's gua is not shown until after that destination unreachable. I wonder if that was the issue.

Nevertheless, 24.7.3 works and I will now update the UniFi APs back to the most recent firmware.