Messages - ikkeT

#1
I had dnsmasq in use in OPNsense and moved the config to Kea and Unbound. It's no biggie, just one evening of useless work. As it sounds, I don't need to do anything to keep the current setup.
#2
Damn, then I did the migration from dnsmasq to Kea for nothing. I thought Kea was the way forward. Well, it works now...
#3
thanks, makes sense now that you point it out :D
#4
Hi,

I have had some instability in my OPNsense for over a year now. After long digging, I found it is likely caused by ARP entries for an IP jumping from one interface to the other on my laptop. Why does this keep happening?

I have a laptop with two interfaces: WLAN and, when wired, a USB dongle Ethernet adapter:

2: wlp0s20f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:cb:3f:c6:c9:09 brd ff:ff:ff:ff:ff:ff permaddr 9c:67:d6:0f:8f:c0
    inet 192.168.117.59/24 brd 192.168.117.255 scope global dynamic noprefixroute wlp0s20f3
       valid_lft 4000sec preferred_lft 4000sec
    inet6 fe80::66f9:af89:6d28:703a/64 scope link tentative noprefixroute
       valid_lft forever preferred_lft forever
4: enp0s13f0u1u2u1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 48:65:ee:15:7f:c2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.117.56/24 brd 192.168.117.255 scope global dynamic noprefixroute enp0s13f0u1u2u1
       valid_lft 2936sec preferred_lft 2936sec
    inet6 fe80::4993:59f0:25d:240a/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

Both MACs are pinned to separate IP addresses on the Kea reservations page:

192.168.117.0/24   192.168.117.56   48:65:ee:15:7f:c2   satechi
192.168.117.0/24   192.168.117.59   6a:cb:3f:c6:c9:09   iklap

Satechi is the usbdongle brand. Iklap is the Fedora laptop name.

While I have both connected (docked), I see this bouncing in OPNsense:

arp: 192.168.117.56 moved from 6a:cb:3f:c6:c9:09 to 48:65:ee:15:7f:c2 on igb2
arp: 192.168.117.59 moved from 6a:cb:3f:c6:c9:09 to 48:65:ee:15:7f:c2 on igb2
arp: 192.168.117.56 moved from 48:65:ee:15:7f:c2 to 6a:cb:3f:c6:c9:09 on igb2
arp: 192.168.117.56 moved from 6a:cb:3f:c6:c9:09 to 48:65:ee:15:7f:c2 on igb2
arp: 192.168.117.56 moved from 6a:cb:3f:c6:c9:09 to 48:65:ee:15:7f:c2 on igb2

And I believe that will drain OPNsense out of memory soonish. What causes the IPs to bounce away from their MACs? I suspect it's somehow the laptop sending a DHCP client query with the laptop name in it (NetworkManager), which Kea then uses to overrule what the Reservations page says.

Is this a bug somewhere? Why does Kea allow an IP to move from one MAC to another, not respecting the reservations?

Any idea what should be done here? It's annoying to have to toggle WLAN off every time I dock because of this. Is it a) a misconfiguration on my Fedora laptop, b) a misconfiguration in Kea, or c) a bug somewhere?
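
To see which address flips most often, the "moved from" lines can be tallied per IP. This is only a rough sketch: the regular expression is modelled on the log lines quoted above, and the function name `count_arp_moves` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines of the form quoted above:
# arp: <ip> moved from <old mac> to <new mac> on <interface>
ARP_MOVE = re.compile(
    r"arp: (?P<ip>\d+\.\d+\.\d+\.\d+) moved from "
    r"(?P<old>[0-9a-f:]{17}) to (?P<new>[0-9a-f:]{17}) on (?P<ifname>\S+)"
)

def count_arp_moves(lines):
    """Return a Counter of how often each IP changed MAC address."""
    moves = Counter()
    for line in lines:
        match = ARP_MOVE.search(line)
        if match:
            moves[match.group("ip")] += 1
    return moves

# Three of the lines from the post, used as sample input.
sample = [
    "arp: 192.168.117.56 moved from 6a:cb:3f:c6:c9:09 to 48:65:ee:15:7f:c2 on igb2",
    "arp: 192.168.117.59 moved from 6a:cb:3f:c6:c9:09 to 48:65:ee:15:7f:c2 on igb2",
    "arp: 192.168.117.56 moved from 48:65:ee:15:7f:c2 to 6a:cb:3f:c6:c9:09 on igb2",
]

for ip, count in count_arp_moves(sample).most_common():
    print(f"{ip}: {count} move(s)")
```

Feeding it the full system log instead of the sample would show whether both addresses bounce or mostly one of them.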

#5
Sorry, I only now noticed your reply, and thanks. I have tried disabling their collection, and I recall it still hung. I will disable it again after the next memleak to verify.
#6
My guess is it just reads a lot of files, leaving them in memory buffers for quick access until the memory is needed for something else. Hence the jump. But why >40 php-cgi, is that normal?

Normally, before the box dies, something starts leaking memory and the system goes down within half an hour.
#7
I noticed this as well today, as I've been monitoring memory usage a bit, trying to figure out why OPNsense runs out of memory every two weeks.

I also notice there are 43 cgi-bins. And the problem occurs around this time:

root@OPNsense:~ # grep 3.\*configctl /var/cron/tabs/root
1       3       1       *       *       (/usr/local/sbin/configctl -d filter schedule bogons) > /dev/null

I wonder what it does?
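
As for when it runs: the five leading fields of the crontab line can be split out and named. A minimal sketch, assuming the usual user-crontab layout (minute, hour, day of month, month, day of week, then the command); `split_cron_line` is just an illustrative helper and says nothing about what the configctl action itself does.

```python
# Field order of a user crontab entry:
# minute, hour, day of month, month, day of week, then the command.
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")

def split_cron_line(line):
    """Split a crontab line into its five schedule fields and the command."""
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five schedule fields and a command")
    schedule = dict(zip(FIELD_NAMES, parts[:5]))
    return schedule, parts[5]

# The entry quoted above, copied from the grep output.
entry = (
    "1       3       1       *       *       "
    "(/usr/local/sbin/configctl -d filter schedule bogons) > /dev/null"
)
schedule, command = split_cron_line(entry)
print(schedule)
print(command)
```

Read that way, the entry fires at minute 1 of hour 3 on day 1 of every month, i.e. 03:01 on the first of the month.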

See mem graph:
#8
Hi,

I have been experiencing this for quite a long time, but would now like to get to the root of it. I installed Telegraf, InfluxDB and Grafana to see when and what starts going wrong. I see that the flowd_aggregate.py script at least keeps using a lot of CPU. But I can't find from the logs what causes the sudden memory usage, which raises CPU usage too. See Grafana:

I didn't know where to put the image, as I can't upload it here, but see from mastodon: https://mementomori.social/@ikkeT/113957621410576425


  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
99283 root          1 120    0    51M    38M CPU0     0  46.7H  99.05% python3.11


root@OPNsense:~ # ps awfux|grep 99283
root     99283  83.5  0.9   52676  39024  -  Rs   24Jan25  2804:23.00 /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.11)

Any ideas what could cause this, or how to find the problem from the logs?
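
As a sanity check, the two CPU time figures above can be compared. A small sketch, assuming the ps TIME column is in minutes:seconds form; `cpu_time_to_hours` is a made-up helper name.

```python
def cpu_time_to_hours(time_field):
    """Convert a ps TIME value in minutes:seconds form to hours."""
    minutes, seconds = time_field.split(":")
    return (int(minutes) * 60 + float(seconds)) / 3600

# TIME value copied from the ps line above.
hours = cpu_time_to_hours("2804:23.00")
print(f"{hours:.1f} hours of CPU time")
```

That works out to roughly 46.7 hours, in line with the 46.7H that top reports for the same PID.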
#9
I toggled the NICs off and back on in NetFlow, and also disabled the local service and cleared the NetFlow data a few times. Now I got the CPU usage down, at least for a while. Let's see if it stays that way.
#10
Hi,

I've had this problem for several months, but it's now happening more often. OPNsense works just fine for several days, but all of a sudden home traffic starts slowing down, and then I can't access the box any longer and the network dies. I keep it up to date, and this is nothing new: the problem has been around for several releases. Now I'm running 24.7.11.

I just had to pull the plug and reboot. I thought I'd look around a bit. I disabled RRD collection just to make sure it's not that. No help. I run the following services at home, not much traffic:
- HAProxy (mainly traffic to a Nextcloud instance)
- dnsmasq for home gadgets
- kea dhcp
- captive portal for guest VLAN, hardly ever used.

I used to have IPv6 enabled, but after moving, the new connection only has IPv4.

So not much running. Immediately I notice some problems:

1. Flowd is eating CPU:


76462 root          1 135    0    58M    44M CPU0     0  16:38 100.00% python3.11
# ps awfux|grep 76462
root   76462 100.0  1.1  59844 44944  -  Rs   09:23   16:57.09 /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.11)



2. configd errors in logs

(I have never touched unbound, it's not running)

2024-12-18T09:44:55 Error configd.py [8741e584-e8e0-47d1-940e-639b0fe9a307] Script action failed with Command '/usr/local/opnsense/scripts/unbound/wrapper.py -s ' returned non-zero exit status 1. at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/actions/script_output.py", line 78, in execute subprocess.check_call(script_command, env=self.config_environment, shell=True, File "/usr/local/lib/python3.11/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '/usr/local/opnsense/scripts/unbound/wrapper.py -s ' returned non-zero exit status 1.
2024-12-18T09:30:11 Error configd.py Timeout (120) executing : system diag log '20' '0' '' 'core' 'audit' 'Emergency,Alert,Critical,Error,Warning' '1734420490.461'
2024-12-18T08:55:33 Error configd.py [eb377147-ead9-4e22-b070-4066dc2a5e25] Script action failed with Command '/usr/local/opnsense/scripts/interfaces/list_macdb.py ' died with <Signals.SIGBUS: 10>. at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/actions/script_output.py", line 78, in execute subprocess.check_call(script_command, env=self.config_environment, shell=True, File "/usr/local/lib/python3.11/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '/usr/local/opnsense/scripts/interfaces/list_macdb.py ' died with <Signals.SIGBUS: 10>.
2024-12-18T08:55:33 Error configd.py [47cd8873-4e90-45dd-81a7-66fa3dfee38c] Script action failed with Command '/usr/local/sbin/pluginctl -D ''' died with <Signals.SIGBUS: 10>. at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/actions/script_output.py", line 78, in execute subprocess.check_call(script_command, env=self.config_environment, shell=True, File "/usr/local/lib/python3.11/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '/usr/local/sbin/pluginctl -D ''' died with <Signals.SIGBUS: 10>.
2024-12-18T08:53:14 Warning configd.py Stopping daemon.
2024-12-18T08:53:14 Error configd.py Configd disconnected while executing : interface list macdb
2024-12-18T08:52:52 Error configd.py Configd disconnected while executing : openvpn connections client,server
2024-12-18T08:52:52 Warning configd.py Stopping daemon.
2024-12-18T08:50:06 Error api no active session, user not found
2024-12-18T08:45:08 Error configd.py Timeout (120) executing : firmware remote
2024-12-18T08:43:06 Error configd.py Timeout (120) executing : firmware tiers
2024-12-18T08:41:28 Error configd.py Timeout (120) executing : firmware remote
2024-12-18T08:38:06 Error configd.py Timeout (120) executing : firmware remote
2024-12-18T08:38:05 Error configd.py Timeout (120) executing : firmware tiers
2024-12-18T08:36:05 Error configd.py Timeout (120) executing : firmware tiers
2024-12-18T08:33:04 Error configd.py Timeout (120) executing : firmware tiers
2024-12-18T08:23:11 Error configd.py Timeout (120) executing : firmware remote
2024-12-18T08:20:03 Error configd.py Timeout (120) executing : firmware tiers
2024-12-18T08:16:03 Error configd.py Timeout (120) executing : firmware tiers
2024-12-18T08:12:01 Error configd.py Timeout (120) executing : firmware tiers


3. Disk space should be OK

root@OPNsense:~ # ls -ltrh /var/crash && df -hT
total 4
-rw-r--r--  1 root wheel    5B Dec  2 21:45 minfree
Filesystem       Type     Size    Used   Avail Capacity  Mounted on
/dev/gpt/rootfs  ufs       13G    8.1G    4.3G    65%    /
devfs            devfs    1.0K      0B    1.0K     0%    /dev
tmpfs            tmpfs    2.0G    3.5M    2.0G     0%    /tmp
devfs            devfs    1.0K      0B    1.0K     0%    /var/dhcpd/dev
devfs            devfs    1.0K      0B    1.0K     0%    /var/captiveportal/zone0/dev


So the question: what the heck is this flowd doing, and how do I disable it? Perhaps that's what is overcooking the CPU. I found an old thread about removing the interfaces from it and adding them back, so I'll try that. Let's see what else is there.
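
To get an overview of which configd actions keep timing out, the "Timeout (120) executing" lines from the log above can be counted per action. This is only a sketch: the pattern is modelled on the quoted log lines, grouping by the first two words of the action is an arbitrary choice, and `count_timeouts` is a made-up name.

```python
import re
from collections import Counter

# Matches the timeout errors quoted above, e.g.
# "... configd.py Timeout (120) executing : firmware remote"
TIMEOUT = re.compile(r"Timeout \((?P<seconds>\d+)\) executing : (?P<action>.+)$")

def count_timeouts(lines):
    """Count configd timeout errors, keyed by the first two words of the action."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT.search(line)
        if match:
            action = " ".join(match.group("action").split()[:2])
            counts[action] += 1
    return counts

# A few lines copied from the log in the post.
sample = [
    "2024-12-18T08:45:08 Error configd.py Timeout (120) executing : firmware remote",
    "2024-12-18T08:43:06 Error configd.py Timeout (120) executing : firmware tiers",
    "2024-12-18T08:41:28 Error configd.py Timeout (120) executing : firmware remote",
    "2024-12-18T08:53:14 Warning configd.py Stopping daemon.",
]

for action, count in count_timeouts(sample).most_common():
    print(f"{action}: {count}")
```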
#11
I just lost IPv4 DHCP. I updated days ago already, but today all of a sudden it won't serve. I will pull in the very latest updates and see. Overall, I have had to reboot my APU2 box several times after the upgrade because the network was sluggish.

Did you find any reason for the failure? At a quick look I see nothing in the logs.
#12
Problem solved after several hours of wondering. And of course, it was a user problem again (me!). I found this issue, where someone had the same problem, and he pointed out that it's not enough to save the peers and apply: they also need to be listed separately in the server's peers list.

I know it's my bad, but it is easy to miss. It might be worth adding a reminder text to the dialog where one creates new peers. Or better yet, ask right there which servers the peer should be assigned to, with the list shown in the same dialog. As it is, it's super easy to miss.

https://github.com/opnsense/plugins/issues/2926
#13
See attached client list screenshot.
#14
Hi,

I got my first two WireGuard peers to connect. But when I added the third peer, it doesn't get picked up from the GUI into the system config. OPNsense is the latest version as of today, OPNsense 22.1.10-amd64. The config is just the same as for the two previous ones, listing name, public key and allowed IP (192.168.116.22/24).

But when I apply the settings, only the first two are written into the wg0 config file, which is also what the GUI shows in the peer list. The third one never gets there. See the Peer List view:

interface: wg1
  public key: (hidden)
  private key: (hidden)
  listening port: 55555

peer: (hidden)
  endpoint: 1.1.11.24:24472
  allowed ips: 192.168.116.21/32
  latest handshake: 10 minutes, 34 seconds ago
  transfer: 4.89 MiB received, 1.00 GiB sent

peer: (hidden)
  endpoint: 1.1.1.24:26682
  allowed ips: 192.168.116.20/32
  latest handshake: 36 minutes, 6 seconds ago
  transfer: 340.61 KiB received, 480.98 KiB sent


How can this be? I have tried saving and applying it several times, but the third one never gets there. I also restarted WireGuard several times. The config of the peer is just like the others, only the public key and IP are different. What am I missing?
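
A quick way to spot which peer is missing from the running config is to pull the "allowed ips" entries out of the peer list and compare them with what was entered in the GUI. A minimal sketch, assuming output shaped like the Peer List view above; the `expected` set is typed in by hand from the addresses mentioned in this post, and `allowed_hosts` is a made-up helper.

```python
import ipaddress

def allowed_hosts(peer_list_text):
    """Collect the host addresses from 'allowed ips' lines in the peer list."""
    found = set()
    for line in peer_list_text.splitlines():
        key, _, value = line.strip().partition(":")
        if key != "allowed ips":
            continue
        for item in value.split(","):
            item = item.strip()
            try:
                found.add(ipaddress.ip_interface(item).ip)
            except ValueError:
                # Skip anything that is not an address, e.g. a placeholder.
                pass
    return found

# Trimmed copy of the Peer List view from the post.
peer_list = """\
interface: wg1
  listening port: 55555

peer: (hidden)
  allowed ips: 192.168.116.21/32

peer: (hidden)
  allowed ips: 192.168.116.20/32
"""

# Addresses configured in the GUI, entered by hand for this comparison.
expected = {
    ipaddress.ip_address("192.168.116.20"),
    ipaddress.ip_address("192.168.116.21"),
    ipaddress.ip_address("192.168.116.22"),
}

missing = expected - allowed_hosts(peer_list)
print(sorted(str(ip) for ip in missing))
```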
#15
Aaaand found the final error. There was a copy-paste problem: I had the server's public key also set on the Android peer in OPNsense. D'oh, some hours well spent :D

I'll try to see if I can delete the post.