Messages - rwhitton

#1
When the problem came to my attention I was running 25.1.4
#2
Well I found allow.py was invoked from: /usr/local/opnsense/service/conf/actions.d/actions_captiveportal.conf

I took a backup and then edited the file to replace all instances of "--" with "-". This fixed up allow.py and set_session_restrictions.py which seemed to have the same issue.
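The edit described above can be sketched in Python. This is only an illustration of the "back up, then turn double-dash options into single-dash options" step, not the exact procedure I used. The function names, the `.bak` suffix and the sample `parameters:` line are assumptions made up for the example. The regex only touches a `--` that starts an option name, which is narrower than replacing every `--` in the file.

```python
import re
import shutil
from pathlib import Path

# Matches "--" only where it begins an option name, e.g. " --zoneid".
# A "--" in the middle of a word (e.g. "foo--bar") is left alone.
DOUBLE_DASH_OPTION = re.compile(r"(?<![\w-])--(?=[A-Za-z_])")


def single_dash_options(text):
    """Return text with leading '--' on option names reduced to '-'."""
    return DOUBLE_DASH_OPTION.sub("-", text)


def rewrite_with_backup(path):
    """Copy the file to <name>.bak, then rewrite it in place.

    The '.bak' suffix is an arbitrary choice for this sketch.
    """
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)
    path.write_text(single_dash_options(path.read_text()))
    return backup


if __name__ == "__main__":
    # Hypothetical sample line; the real file content may differ.
    sample = "parameters:--zoneid=%s --username=%s"
    print(single_dash_options(sample))
```

To apply it to the real file you would call `rewrite_with_backup()` with the path from this post. configd probably has to re-read its action files afterwards; I believe `service configd restart` does that without a reboot, but treat that as an assumption and check it on your version.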

I couldn't quickly see a way to restart the captive portal service so I rebooted the server.

Problem solved, but rather concerning how it got into this state.

If anybody has any ideas or further info I'd love to hear from you.

#3
I've been using a captive portal for a guest wifi network for a long time without a problem. Recently, due to a hardware failure, I had to move to new (identical) hardware, and I restored my config from backup. Some weeks later I found that my captive portal no longer works. I get the landing page without a problem, but all attempts to sign in (local database) get "authentication failed". I've tried deleting and recreating the captive portal, deleting the database, etc., all with no joy. There is nothing in the captive portal log file at all.

Then I noticed that in the backend log file I get these clearly related errors:

2025-04-16T17:55:36   Error   configd.py   [d556c8a3-baa1-4358-b8fa-fb2d3b421855] Script action failed with Command '/usr/local/opnsense/scripts/OPNsense/CaptivePortal/allow.py --zoneid='0' --username='guest' --ip_address='192.168.200.103' --authenticated_via='Local Database'' returned non-zero exit status 2. at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/actions/script_output.py", line 78, in execute subprocess.check_call(script_command, env=self.config_environment, shell=True, File "/usr/local/lib/python3.11/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '/usr/local/opnsense/scripts/OPNsense/CaptivePortal/allow.py --zoneid='0' --username='guest' --ip_address='192.168.200.103' --authenticated_via='Local Database'' returned non-zero exit status 2.

Huh - but those arguments should be passed with a single "-" and not with "--". Let's test it from the command line:

root@bosk:/usr # /usr/local/opnsense/scripts/OPNsense/CaptivePortal/allow.py --zoneid='0' --username='guest' --ip_address='192.168.200.103' --authenticated_via='Local Database'
usage: allow.py [-h] -username USERNAME -zoneid ZONEID [-authenticated_via AUTHENTICATED_VIA] [-ip_address IP_ADDRESS]
allow.py: error: the following arguments are required: -username, -zoneid
root@bosk:/usr # echo $?
2
root@bosk:/usr #

So now if I try it with single dashes:

root@bosk:/usr # /usr/local/opnsense/scripts/OPNsense/CaptivePortal/allow.py -zoneid='0' -username='guest' -ip_address='192.168.200.103' -authenticated_via='Local Database'
{"zoneid":"0","authenticated_via":"Local Database","userName":"guest","ipAddress":"192.168.200.103","macAddress":null,"startTime":1744823522.2908046,"sessionId":"YYEtFOnxAm9ihn+yuvvgPg==","clientState":"AUTHORIZED"}
root@bosk:/usr # echo $?
0
root@bosk:/usr #

So that works fine. Does anybody know how to fix this? Or know how it got into this state? Does anybody know where the allow.py script is invoked from?
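For anyone wondering why the double-dash form fails, here is a minimal sketch that reproduces the behaviour with Python's argparse. The parser is a stand-in rebuilt from the usage line printed above, not the actual contents of allow.py, and `exit_status` is a helper name made up for this example.

```python
import argparse


def build_parser():
    # Stand-in parser reconstructed from the usage message in this post;
    # the real allow.py may define its arguments differently.
    parser = argparse.ArgumentParser(prog="allow.py")
    parser.add_argument("-username", required=True)
    parser.add_argument("-zoneid", required=True)
    parser.add_argument("-authenticated_via")
    parser.add_argument("-ip_address")
    return parser


def exit_status(argv):
    """Return the exit status argparse would produce for argv."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        # argparse reports usage errors by exiting with status 2.
        return exc.code
    return 0


if __name__ == "__main__":
    double = ["--zoneid=0", "--username=guest"]
    single = ["-zoneid=0", "-username=guest"]
    print("double dash:", exit_status(double))
    print("single dash:", exit_status(single))
```

Because the options are registered with a single dash, argparse does not recognise `--zoneid` or `--username` as the same options, so the required arguments count as missing and the script exits with status 2, which matches the error in the backend log.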

Many thanks,

Rob





#4
Are you on 22.1.6? I upgraded to this a few days ago and I've been unable to get any port-forwards to work since.

It's consistently failing in the firewall being caught by the default deny rule even though there is a matching rule in the firewall for the port-forward.
#5
I tried "any" previously. I just tried "This Firewall" and unfortunately I get the same result.

I think this is some sort of recent regression or change in behaviour, possibly in 22.1.6, which I only upgraded to the other day. I set up a port forward rule two weeks ago and it was fine.
#6
(Version 22.1.6)

Having spent several hours on it, I'm unable to get a simple NAT port forward rule working. It's always caught by the default deny rule.

It's a really simple NAT rule from WAN:5051 -> MY_INTERNAL_IP:5051 TCP. See attached.

I have the associated rule created and if I look at the firewall rules then I can see that the rule is there.

When I attempt to connect then looking at the live view I can see that it's being consistently caught by the default deny rule as shown below:

__timestamp__   2022-04-18T12:59:52
ack   
action    [block]
anchorname   
datalen   0
dir    [in]
dst   x.x.x.x
dstport   5051
ecn   
id   30452
interface   pppoe1
interface_name   WAN
ipflags   DF
ipversion   4
label   Default deny / state violation rule
length   52
offset   0
protoname   tcp
protonum   6
reason   match
rid   02f4bab031b57d1e30553ce08e0ec131
rulenr   9
seq   2845703226
src   y.y.y.y
srcport   51702
subrulenr   
tcpflags   S
tcpopts   
tos   0x0
ttl   121
urp   64240

I've had port forwards working previously without any issues. I've tried all the usual things such as rebooting; deleting the NAT rule and recreating; using different ports; changing NAT reflection, but the problem persists. Does anybody have any idea what might be wrong and how to fix this?

Many thanks


#7
I can restart it and the CPU hog seems to go away for a while, but it soon comes back and runs like this 24/7. Let's take a look with top:

root@bosk:~ # top

last pid:  1685;  load averages:  1.74,  1.54,  1.34                                                                     up 5+21:38:23  11:13:46
60 processes:  3 running, 57 sleeping
CPU: 29.7% user,  0.0% nice,  1.7% system,  0.3% interrupt, 68.3% idle
Mem: 388M Active, 5572M Inact, 202M Laundry, 1416M Wired, 735M Buf, 312M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
51222 root          1 103    0    37M    26M CPU3     3 141.5H  99.72% python3.8
86842 root          7  20    0  3076M  1166M nanslp   2 662:06   6.80% suricata
49827 root          1  20    0    61M    36M select   1   0:01   1.21% php-cgi
  426 root          2  52    0   105M    59M accept   3   0:27   0.53% python3.8
61091 root          1  20    0    61M    37M accept   2   0:03   0.12% php-cgi
35054 root          1  20    0    14M  4052K CPU0     0   0:00   0.07% top
88400 root          4  20    0    43M    12M kqread   1  30:55   0.06% syslog-ng
// SNIP


So that's PID 51222:

root@bosk:~ # ps 51222
  PID TT  STAT       TIME COMMAND
51222  -  Rs   8487:33.77 /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.8)
root@bosk:~ #


Searching around, I'm not the first to report such a thing, so this looks like a longstanding problem. I'm not seeing any plausible solutions though.

If I had strace available I'd take a look to see what the process was doing.

Any thoughts? I'm sure if I disable netflow the problem will go away but that does remove rather an important feature of opnsense.

#8
In Unbound -> Blacklist, disable the blacklisting of "WindowsSpyBlocker (extra)".
#9
I recently switched to OPNsense (fantastic decision). I'm using Unbound DNS as a local DNS server. In general it all works perfectly, but I noticed that certain domains failed to resolve. In particular login.microsoftonline.com (which I need for my work) wouldn't resolve from an Android client, although it would resolve fine from a Linux machine on the same network. Curious.

The Android client was fine if I switched it to use 1.1.1.1 as the DNS server.

I didn't make much progress with the problem until I saw this eloquent article that describes the exact problem I'm seeing:

https://techcommunity.microsoft.com/t5/office-365/dns-resolution-issues-when-attempting-to-connect-to-login/m-p/146379

I couldn't find any control over TCP DNS requests in opnsense except for the number of incoming/outgoing TCP packets (I had the defaults of 10). On a whim I increased to 20 of each. Making this change seems to have fixed the issue. This seems very surprising. I wondered if perhaps there was a bug in the GUI such that 10 was in fact setting 0 which I believe would disable TCP DNS requests.
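One way to check whether the resolver answers over TCP at all is to send a query over TCP by hand. Below is a minimal sketch using only the Python standard library. The function names and the server address in the usage note are placeholders made up for the example, and it only reads the response code and answer count rather than decoding the records.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a DNS query message asking for the A record of name."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def frame_for_tcp(message):
    """DNS over TCP prefixes each message with its length as two bytes."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock, count):
    """Read exactly count bytes or raise if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full reply")
        data += chunk
    return data


def query_over_tcp(server, name, timeout=3.0):
    """Return (rcode, answer_count) from a TCP query to server port 53."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(frame_for_tcp(build_query(name)))
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        reply = recv_exact(sock, length)
    _, flags, _, ancount, _, _ = struct.unpack("!HHHHHH", reply[:12])
    return flags & 0x000F, ancount
```

For example, `query_over_tcp("192.168.1.1", "login.microsoftonline.com")` would query a resolver at that (placeholder) address. A timeout or refused connection would suggest TCP queries are not being served, while an rcode of 0 with at least one answer would suggest they are.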

Thoughts?

Rob