Captive Portal seems to be broken following update to 23.1.2

Started by BondiBlueBalls, March 08, 2023, 12:13:10 AM

Previous topic - Next topic
Hey, all. I noticed that after updating to 23.1.2, the captive portal is no longer working. I checked the logs and found these entries.


2023-03-07T17:06:55-06:00 Error captiveportal sqlite3 repair /var/captiveportal/captiveportal.sqlite [done]
2023-03-07T17:06:55-06:00 Error captiveportal sqlite3 repair /var/captiveportal/captiveportal.sqlite
2023-03-07T17:06:55-06:00 Error captiveportal Forcefully repair database (Traceback (most recent call last): File "/usr/local/opnsense/scripts/OPNsense/CaptivePortal/cp-background-process.py", line 219, in main bgprocess.sync_zone(zoneid) File "/usr/local/opnsense/scripts/OPNsense/CaptivePortal/cp-background-process.py", line 115, in sync_zone concurrent_users = self.db.find_concurrent_user_sessions(zoneid) File "/usr/local/opnsense/scripts/OPNsense/CaptivePortal/lib/db.py", line 245, in find_concurrent_user_sessions cur.execute(""" select cc.sessionid sessionId sqlite3.OperationalError: no such column: )


Please let me know if I can provide any additional info.

Thanks much!

Database broken and beyond repair. Not good, but also not really avoidable. Are you using UFS?


Cheers,
Franco

Thank you for the reply!

No, I'm using a ZFS mirror. I disabled the portal, deleted the db, enabled the portal again, but the newly created db started returning the same errors immediately.

Is there a different process I should use to get things up and running again?

Thanks!

Just to provide some additional info, I just stopped the service, deleted the zone and the database, recreated the zone, and started the service. It immediately started throwing the same three errors every minute.

Is the best move to just nuke, reinstall OPNsense, and restore from config? If so, it sounds like more than the portal db is corrupted.

Thanks again!

Hmm, I heard that sqlite may have screwed something up. Can you try this?

# opnsense-revert -r 23.1.1 sqlite3


Cheers,
Franco

I ran the command successfully, deleted the existing db, rebooted just in case, added a new zone, and I'm seeing the same three error messages once a minute.

I also confirmed that I'm using sqlite3 3.41.0,1, along with (stock) php81-sqlite3 8.1.16_1 and py39-sqlite3 3.9.16_7.

I also saw the other conversation about sqlite and had my fingers crossed, but no go. :)

Thanks!

Just realized that I'm still running the new version of sqlite3 (3.41.0,1) after running "opnsense-revert -r 23.1.1 sqlite3". It doesn't seem to revert back to 3.40.1,1 (as was used in 23.1.1_2).


Fetching sqlite3.pkg: .... done
Verifying signature with trusted certificate pkg.opnsense.org.20221213... done
sqlite3-3.41.0,1: already unlocked
Updating OPNsense repository catalogue...
OPNsense repository is up to date.
All repositories are up to date.
Checking integrity... done (0 conflicting)
The following 1 package(s) will be affected (of 0 checked):

Installed packages to be REINSTALLED:
        sqlite3-3.41.0,1

Number of packages to be reinstalled: 1
[1/1] Reinstalling sqlite3-3.41.0,1...
[1/1] Extracting sqlite3-3.41.0,1: 100%


I guess I'm not sure if there are dependencies preventing the rollback of the package or something. Thoughts?

Thank you again!

Hi

Same problem for me ( database broken , revert to previous sqllite3 not working)

Thanks.


Yep I have the same problem.
Same logs about corrupt database. Cannot downgrade sqlite either.

I CAN login to my portal (I just have a simple login button with no password authentication or anything else).
But, I do have my portal setup with some allowed IPs - these usually show permanently in the session list, and they still do, but they do not work. They are blocked and I have to login from an IP, that should otherwise be allowed.
After login, I can connect to Internet again.

But it defeats the purpose for me. I need to be able to whitelist my IPs.

attaching screenshots

Same problem here as well.

The ability to whitelist clients on the guest network is a crucial component of the network's design. I would greatly appreciate a solution to this issue.

So I just mocked around with sqlite on my instance, and there seem to be no problem with my database.
I can manually select all entries in the table.

So I tried to manually fire off the query generated in db.py - and it seems sqlite don't like double quotes in a query:
sqlite> select cc.sessionid sessionId, cc.username userName from cp_clients cc where cc.zoneid = 0 and cc.deleted = 0 and cc.username is not null and cc.username <> "" order by case when cc.username is not null then cc.username else cc.ip_address end, cc.created desc;
Parse error: no such column:
   0 and cc.username is not null and cc.username <> "" order by case when cc.use
                                      error here ---^
sqlite>


Code at line 245 in /usr/local/opnsense/scripts/OPNsense/CaptivePortal/lib/db.py (from the error message) looks like this:
cur.execute(""" select   cc.sessionid   sessionId
                        ,        cc.username    userName
                        from     cp_clients cc
                        where   cc.zoneid = :zoneid
                        and     cc.deleted = 0
                        and     cc.username is not null
                        and     cc.username <> ""   <---- problem here !!!
                        order by case when cc.username is not null then cc.username else cc.ip_address end
                        ,        cc.created desc
                        """, {'zoneid': zoneid})


Workaround:
In /usr/local/opnsense/scripts/OPNsense/CaptivePortal/lib/db.py line 251 - replace the double quotes (") with single quotes (').
After doing this on my box, the captive portal and my whitelist is working again, and I get no more errors about forced database repair.

Although I still get spammed in the logs with an error message about no active session, user not found.
Don't know what that is about - oh well...

2023-03-09T14:12:40 Error api [2023-03-09T14:12:40+01:00][ERROR] no active session, user not found
2023-03-09T14:12:39 Error api [2023-03-09T14:12:39+01:00][ERROR] no active session, user not found
2023-03-09T14:12:35 Error api [2023-03-09T14:12:35+01:00][ERROR] no active session, user not found
2023-03-09T14:12:35 Error api [2023-03-09T14:12:35+01:00][ERROR] no active session, user not found
2023-03-09T14:12:29 Error api [2023-03-09T14:12:29+01:00][ERROR] no active session, user not found
2023-03-09T14:12:28 Error api [2023-03-09T14:12:28+01:00][ERROR] no active session, user not found
2023-03-09T14:12:23 Notice captiveportal starting captiveportal background process


So I guess this needs to be officially fixed in the next release.
Take this info for what it is: not a permanent solution. Use at your own risk.

Can you report ir on github with the solution so the developers can make a patch ?

TBH, is is pretty impossible to catch and complicated to hotfix without redoing a full 10 hour build... https://github.com/opnsense/tools/commit/9290c79485a5


Cheers,
Franco

@Miwer
QuoteAlthough I still get spammed in the logs with an error message about no active session, user not found.
it's actually not topic related. this is messages from mvc api controller (browser session expired on some tab?)
captive  log may contain unrelated messages: syslog-ng assignes all messages with local4 facility to captive log (also). so it  may be messages from api or (for example) from firewall tables update (local4 facility also)..
wanted to make a pr to remove the 'local4 facilty' captive syslog filter and ask the devs about this. but haven't had time to test it yet

23.1.3 reverts the sqlite change. I'm not sure if it's a smooth upgrade experience from 23.1.2 but from 23.1(.1) it should be without incident.

If your 23.1.3 is still funky coming from 23.1.2 please run:

# opnsense-update -fp

This will reinstall all packages using the 23.1.3 state.


Cheers,
Franco