Messages - Ben S

#31
Thank you for this quick fix, I'm trying 24.1.10_1 now.

Last night I did:
opnsense-revert -r 24.1.9 dhcp6c
opnsense-revert -r 24.1.9 opnsense


And _one_ of these resolved the problem; it was late and I didn't have time to run them separately and confirm which one.

I'm not sure, though, whether mine is the same problem: when IPv6 is working well, the DHCP replies all seem to come from a link-local address, so I'm not sure if or how the GUA fix here is relevant.  Still, I'll wait and see.

Thanks
Ben
#32
Hi,

This problem may be related to https://forum.opnsense.org/index.php?topic=41508.0 but my problems appear slightly different.

After the update to 24.1.10 I rebooted and everything appeared fine.  After 30 minutes (the first IPv6 renewal) I seemed to lose IPv6 connectivity.  Logs showed dhcp6c sending RENEW and then SOLICIT messages.  While trying to diagnose this I could see the packets were being allowed out - checked with
$ sudo tcpdump -ve -i pflog0 udp port 546 or udp port 547
tcpdump: listening on pflog0, link-type PFLOG (OpenBSD pflog file), capture size 262144 bytes
21:37:43.196073 rule 105/0(match) [uid 0]: pass out on igb0: (hlim 1, next-header UDP (17) payload length: 89) fe80::2e0:xxx.dhcpv6-client > ff02::1:2.dhcpv6-server: [bad udp cksum 0x29e8 -> 0x1be3!] dhcp6 solicit (xid=2feda3 (client-ID hwaddr/time type 1 time xx xx) (elapsed-time 65535) (option-request DNS-server DNS-search-list) (IA_PD IAID:0 T1:0 T2:0 (IA_PD-prefix ::/56 pltime:4294967295 vltime:4294967295)))
21:39:35.293493 rule 105/0(match) [uid 0]: pass out on igb0: (hlim 1, next-header UDP (17) payload length: 89) fe80::2e0:xxx.dhcpv6-client > ff02::1:2.dhcpv6-server: [bad udp cksum 0x29e8 -> 0x2881!] dhcp6 solicit (xid=e134 (client-ID hwaddr/time type 1 time xx xx) (elapsed-time 0) (option-request DNS-server DNS-search-list) (IA_PD IAID:0 T1:0 T2:0 (IA_PD-prefix ::/56 pltime:4294967295 vltime:4294967295)))
21:41:41.236162 rule 103/0(match) [uid 0]: pass out on igb0: (hlim 1, next-header UDP (17) payload length: 89) fe80::2e0:xxx.dhcpv6-client > ff02::1:2.dhcpv6-server: [bad udp cksum 0x29e8 -> 0x484f!] dhcp6 solicit (xid=419032 (client-ID hwaddr/time type 1 time xx xx) (elapsed-time 12531) (option-request DNS-server DNS-search-list) (IA_PD IAID:0 T1:0 T2:0 (IA_PD-prefix ::/56 pltime:4294967295 vltime:4294967295)))


But running a similar tcpdump command on igb0 (WAN) did not show the packets actually being sent, despite them showing as 'pass out' in the pf log, which I find rather confusing.  I'm not sure if the bad checksum notices in the pflog output are significant (I gather tcpdump often reports bad UDP checksums on outbound packets when checksum offload is in use, so they may be harmless).
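For matching packets between the two captures, I sketched a small throwaway helper that pulls the pf rule number and the DHCPv6 transaction id (xid) out of each pflog line; the regex is my own and is tuned to the tcpdump format above, so treat it as a rough sketch:

```python
import re

# Rough helper (my own sketch): extract the pf rule number and the DHCPv6
# xid from a 'tcpdump -e -i pflog0' line like those above, so the pflog
# entries can be matched against a parallel capture on the WAN interface.
LINE_RE = re.compile(r"rule (\d+)/\d+\(match\).*\(xid=([0-9a-f]+)")

def parse_pflog_line(line):
    m = LINE_RE.search(line)
    return (int(m.group(1)), m.group(2)) if m else None

sample = ("21:37:43.196073 rule 105/0(match) [uid 0]: pass out on igb0: "
          "dhcp6 solicit (xid=2feda3 (client-ID hwaddr/time type 1))")
print(parse_pflog_line(sample))  # (105, '2feda3')
```

Any line where the xid shows up in pflog but not in the igb0 capture is one that pf passed but that never made it onto the wire.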

After rebooting the IPv6 has come back, but I don't yet know if it will stay up.  Reloading the WAN interface from the UI > Interfaces > Overview didn't bring it back.

My IPv6 has been working fine on 24.1.7 and 24.1.9.

Any suggestions would be much appreciated.

Thanks
Ben
#33
Glad I could help and you got it sorted.  :)
#34
Did you check permissions on all directories in the path?

ls -ld / /usr /usr/local /usr/local/sbin /usr/local/opnsense /usr/local/opnsense/service

Maybe one of those has become restricted somehow.  If it's to do with setfacl then I probably can't help, as I'm not familiar with that.
#35
crontab -l will only show root's cron jobs; that's expected.  Check whether the cron job shows under crontab -l -u nobody

(I think that just shows the contents of the file you already checked so I expect it will)

Also try running it from the prompt using su -m nobody -c 'configctl acmeclient cron-auto-renew'


in case there is a permissions problem or something else that breaks when the command runs as nobody.  Not sure why it would fail as nobody, but it's perhaps worth checking.

I have this same command as a custom cron job and it runs fine as expected.
#36
When the logger process gets stuck, the end of the ktrace file is as below.  So it's stuck trying to get a lock which is already held, possibly by itself via the earlier fcntl F_SETLK call?  Hard to be sure, though.  But I don't know what else would hold a lock on this file, and fstat indicates that nothing else has it open.

This is also the only call to flock at any point over many hours.  However, the process had not been running for 24 hours, so it doesn't seem to be due to the daily export/import which the logger script does.

Could it be a bug in duckdb?

root@barium:~ # tail -20 kdump.txt
67596 python3.11 CALL  fstatat(AT_FDCWD,0x87d89bed0,0x820ff3bd0,0)
67596 python3.11 NAMI  "/var/unbound/data/unbound.duckdb"
67596 python3.11 STRU  struct stat {dev=101, ino=3445703, mode=0100640, nlink=1, uid=59, gid=59, rdev=6911328, atime=1716448568.853894000, mtime=1716448559.667945000, ctime=1716448559.667945000, birthtime=1716370706.399868000, size=6303744, blksize=32768, blocks=12416, flags=0x0 }
67596 python3.11 RET   fstatat 0
67596 python3.11 CALL  openat(AT_FDCWD,0x87d89be70,0x100002<O_RDWR|O_CLOEXEC>)
67596 python3.11 NAMI  "/var/unbound/data/unbound.duckdb"
67596 python3.11 RET   openat 9
67596 python3.11 CALL  fstat(0x9,0x820ff39e0)
67596 python3.11 STRU  struct stat {dev=101, ino=3445703, mode=0100640, nlink=1, uid=59, gid=59, rdev=6911328, atime=1716448568.853894000, mtime=1716448559.667945000, ctime=1716448559.667945000, birthtime=1716370706.399868000, size=6303744, blksize=32768, blocks=12416, flags=0x0 }
67596 python3.11 RET   fstat 0
67596 python3.11 CALL  fcntl(0x9,F_SETLK,0x820ff3940)
67596 python3.11 RET   fcntl -1 errno 35 Resource temporarily unavailable
67596 python3.11 CALL  fcntl(0x9,F_GETLK,0x820ff3940)
67596 python3.11 RET   fcntl 0
67596 python3.11 CALL  fcntl(0x9,F_SETLK,0x820ff3940)
67596 python3.11 RET   fcntl 0
67596 python3.11 CALL  openat(AT_FDCWD,0x882d53840,0x100001<O_WRONLY|O_CLOEXEC>)
67596 python3.11 NAMI  "/var/unbound/data/unbound.duckdb"
67596 python3.11 RET   openat 10/0xa
67596 python3.11 CALL  flock(0xa,0x2<LOCK_EX>)
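For anyone following along: the errno 35 (EAGAIN) from F_SETLK means the non-blocking lock attempt found a conflicting POSIX lock held by some other process.  A minimal Python sketch of that behaviour (throwaway temp file and fork, nothing to do with duckdb itself):

```python
import fcntl
import os
import tempfile

# Parent takes a whole-file POSIX (fcntl) lock; a forked child then tries
# the same non-blocking lock and fails with EAGAIN, the same
# 'Resource temporarily unavailable' seen in the ktrace.  POSIX locks are
# per-process and are not inherited across fork, so the child conflicts.
path = tempfile.mkstemp()[1]
holder = open(path, "r+b")
fcntl.lockf(holder, fcntl.LOCK_EX | fcntl.LOCK_NB)

r, w = os.pipe()
pid = os.fork()
if pid == 0:  # child: try to take the same lock, non-blocking
    os.close(r)
    try:
        with open(path, "r+b") as f:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        os.write(w, b"acquired")
    except OSError:  # EAGAIN: lock held elsewhere
        os.write(w, b"busy")
    os._exit(0)
os.close(w)
result = os.read(r, 16).decode()
os.waitpid(pid, 0)
print(result)  # busy
```

That only demonstrates the errno, of course; it doesn't tell us who holds the lock on unbound.duckdb, which is what fstat should show.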
#37
Hi,

I'm quite new to OPNsense, so hopefully I haven't missed anything obvious here.  I'm running Unbound with statistics enabled, and since updating to 24.1.7 I'm seeing the stats sometimes just stop, although Unbound keeps answering queries just fine.  I didn't notice this on 24.1, but I'd only run it for a few days; since the upgrade it usually stops within an hour.  Restarting the service will generally get the stats working again, until they stop again.  I have reset the DNS data since the upgrade, and that didn't help any more than a restart does.

I can see that the timestamp on /var/unbound/data/unbound.duckdb matches the time that the UI graphs stop, so it appears to be a data collection problem and not just a UI display problem.

If I ktrace the /usr/local/opnsense/scripts/unbound/logger.py process, it appears to be doing nothing while the stats are not updating.  If I then restart the service, the first line from kdump is

55384 python3.11 RET   flock -1 errno 4 Interrupted system call

which suggests to me it was stuck trying to get a lock on a file.  Unfortunately it's not clear which file, or why it couldn't get the lock (presumably another process held it, but what, and why?).

I've checked /var/log/resolver and there is nothing useful around the time that stats stop.

This is just a home installation, so there is not a huge volume of DNS queries being handled; /var/unbound/data/unbound.duckdb is currently only around 2MB after resetting earlier today.

Does anyone have any suggestions for anything else I can check, please?  It stops fairly often, so I should be able to find out quickly whether any suggestion helps.

Thanks
Ben
#38
Have you tried Interfaces > Settings > IPv6 DHCP > Prevent release?  Turning this on helped my IPv6 stability, although your problem sounds different (I was losing the address completely; you say you're keeping the address, but connectivity fails?).  Still, it could be worth a try.