24.7.11_2 "Danger Unexpected error, check log for details" in gateway config

Started by PerpetualNewbie, January 14, 2025, 08:28:25 AM

OPNSense: 24.7.11_2

Using WebUI

Pop-up: "Danger Unexpected error, check log for details"

When: Accessing: Web Interface: System -> Gateways -> Configuration

Client: MS Windows 11, Firefox, 134.0 (64-bit)

When pop-up is dismissed:
List of gateways shows: "No results found!"

And no gateways are listed.

Checking System -> Gateways -> Log File: there are no logs

Checking System -> Gateways -> Group: the 2 gateway group configurations (IPv4 and IPv6) are listed.

However, even with the web interface error reported, the gateways both appear to be working.

Checking logs on OPNSense:
/var/log/lighttpd/lighttpd_202501??.log :
<30>1 2025-01-13T22:48:24-08:00 HOST_REDACTED lighttpd 3347 - [meta sequenceId="9"] 192.0.2.22 HOST_REDACTED - [13/Jan/2025:22:48:23 -0800]  "POST /api/routing/settings/searchGateway/ HTTP/2.0" 500 58 "https://HOST_REDACTED/ui/routing/configuration" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:134.0) Gecko/20100101 Firefox/134.0"

HTTP status 500 points to a server-side issue, probably in a script (PHP). Maybe an issue with parsing data from a configuration?
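
To list every failing request rather than a single line, a filter along these lines might help. It is only a sketch: it assumes the access log format shown above, where the status code directly follows the quoted request line, and http_5xx is a made-up helper name, not an OPNsense tool.

```shell
#!/bin/sh
# http_5xx is a hypothetical helper name, not part of OPNsense.
# It prints access-log lines whose HTTP status is 5xx, assuming the status
# code directly follows the quoted request line as in the log line above.
http_5xx() {
    grep -E '" 5[0-9]{2} ' "$@"
}

# Scan the real logs only if they exist on this machine.
for f in /var/log/lighttpd/lighttpd_*.log; do
    if [ -f "$f" ]; then
        http_5xx "$f" | tail -n 5
    fi
done
```

With no file argument the helper reads stdin, so it can also be fed from "tail -F" while reproducing the error in the WebUI.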

/var/log/audit/audit_202501??.log
<38>1 2025-01-13T22:48:23-08:00 HOST_REDACTED configd.py 2995 - [meta sequenceId="5"] action allowed interface.list.ifconfig for user root
<38>1 2025-01-13T22:48:24-08:00 HOST_REDACTED configd.py 2995 - [meta sequenceId="6"] action allowed interface.gateways.status for user root
<38>1 2025-01-13T22:48:24-08:00 HOST_REDACTED configd.py 2995 - [meta sequenceId="7"] action allowed system.status for user root

/var/log/configd/configd_202501??.log
<13>1 2025-01-13T22:48:23-08:00 HOST_REDACTED configd.py 2995 - [meta sequenceId="5"] [8fd462db-f97d-410a-a789-016930d11bed] request ifconfig
<13>1 2025-01-13T22:48:24-08:00 HOST_REDACTED configd.py 2995 - [meta sequenceId="6"] [150d18bd-c7f5-4a7e-b20d-a85cbfd3ce30] list gateway status
<13>1 2025-01-13T22:48:24-08:00 HOST_REDACTED configd.py 2995 - [meta sequenceId="7"] [9abbbdd0-0e86-43f0-a860-2f0e44f0d723] system status

Since the server returns HTTP status 500, this is probably not a client issue, so changing to a different web browser probably won't change the result.

We completed a reboot after upgrade to 24.7.11_2 even though it was not required.

Health Audit doesn't show any obvious issues.

Security audit: "0 problem(s) in 0 installed package(s) found."

Connectivity audit was fine.

Suggestions on what to check next to diagnose this issue?

TIA!


Can you look for the PHP_ERROR file (I don't know if it's called exactly that) in:

/tmp/

And look for the last entries? If you post them we could see what the error was.
Hardware:
DEC740

Sure:

/tmp/PHP_errors.log :

[13-Jan-2025 23:26:25 America/Los_Angeles] TypeError: array_key_exists(): Argument #2 ($array) must be of type array, null given in /usr/local/opnsense/mvc/app/controllers/OPNsense/Routing/Api/SettingsController.php:116
Stack trace:
#0 /usr/local/opnsense/mvc/app/controllers/OPNsense/Routing/Api/SettingsController.php(116): array_key_exists('ixl2', NULL)
#1 /usr/local/opnsense/mvc/app/library/OPNsense/Mvc/Dispatcher.php(166): OPNsense\Routing\Api\SettingsController->searchGatewayAction()
#2 /usr/local/opnsense/mvc/app/library/OPNsense/Mvc/Router.php(156): OPNsense\Mvc\Dispatcher->dispatch(Object(OPNsense\Mvc\Request), Object(OPNsense\Mvc\Response), Object(OPNsense\Mvc\Session))
#3 /usr/local/opnsense/mvc/app/library/OPNsense/Mvc/Router.php(139): OPNsense\Mvc\Router->performRequest(Object(OPNsense\Mvc\Dispatcher))
#4 /usr/local/opnsense/www/api.php(36): OPNsense\Mvc\Router->routeRequest('/api/routing/se...', Array)
#5 {main}


From command line...
Running "ifconfig" to check interfaces shows there is an interface "ixl2" and "netstat -rn | grep ixl2" shows routes for that interface.

Using WebUI, Interfaces -> The Interface that has "ixl2" shows
# Identifier    opt5
# Device    ixl2

That interface is configured as enabled and working : traffic is being passed to and from that interface.

HTH


Can you execute this from the shell?


/usr/local/opnsense/scripts/routes/gateway_status.php

Please post the array it returns. When it returns nothing or ERR please look at the PHP_errors.log again and post the error.

Quote from: Monviech (Cedrik) on January 14, 2025, 01:58:03 PM
Can you execute this from the shell?


/usr/local/opnsense/scripts/routes/gateway_status.php

Please post the array it returns. When it returns nothing or ERR please look at the PHP_errors.log again and post the error.


# Results of shell call: "/usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php" :

[{"name":"GW_v4_ISP_NAME","address":"$IPV4_ADDRESS","status":"none","loss":"~","delay":"~","stddev":"~","monitor":"$IPV4_ADDRESS_2","status_translated":"Online"},{"name":"GW_v6_ISP_NAME","address":"$IPV6_ADDRESS","status":"none","loss":"0.0 %","delay":"1.0 ms","stddev":"0.2 ms","monitor":"$IPV6_ADDRESS","status_translated":"Online"},{"name":"EXTERNAL_GW","address":"~","status":"none","loss":"~","delay":"~","stddev":"~","monitor":"~","status_translated":"Online"},{"name":"INTERNAL_INTERFACE_GW_V6","address":"$INTERNAL_IPv6_ADDRESS","status":"none","loss":"~","delay":"~","stddev":"~","monitor":"~","status_translated":"Online"},{"name":"INTERNAL_INTERFACE_GW_V4","address":"$INTERNAL_IPv4_ADDRESS","status":"none","loss":"~","delay":"~","stddev":"~","monitor":"~","status_translated":"Online"},{"name":"INTERNAL_MANAGEMENT_INTERFACE","address":"$INTERNAL_IPv6_ADDRESS_MANAGEMENT","status":"none","loss":"~","delay":"~","stddev":"~","monitor":"~","status_translated":"Online"}]

# Happy exit status:
echo $?
0


Values replaced in the above to not expose ISP and IP Addresses and names given to interfaces that might indicate services:

# GW_v4_ISP_NAME : was a different literal string, but same format.

# $IPV4_ADDRESS : not as printed... it is a literal IPv4 address for the IPv4 default gateway. When issuing "netstat -rn" the IPv4 address for this shows up as "default            $IPV4_ADDRESS      UGS       ixl12" where that is an actual IPv4 address not a string/variable name

# $IPV4_ADDRESS_2 : not as printed... it is a literal IPv4 address beyond the gateway, used to monitor gateway.

# GW_v6_ISP_NAME : was a different literal string, but same format.

# $IPV6_ADDRESS : not as printed... it is a literal IPv6 address for the IPv6 default gateway. When issuing "netstat -rn" the IPv6 address for this shows up as "default            $IPV6_ADDRESS      UGHS       ixl12" where that is an actual IPv6 address not a string/variable name

# INTERNAL_INTERFACE_GW_V6 : different literal string, but same format for an internal interface routing table entry on how to route to a specific subnet.

# $INTERNAL_IPv6_ADDRESS : not as printed... it is a literal IPv6 address for an internal network / subnet handled by another internal routing device.

# INTERNAL_INTERFACE_GW_V4 : different literal string, but same format for an internal interface routing table entry on how to route to a specific subnet.

# $INTERNAL_IPv4_ADDRESS : not as printed... it is a literal IPv4 address for an internal network / subnet handled by another internal routing device.

# INTERNAL_MANAGEMENT_INTERFACE : different literal string, but same format for an internal interface subnet for management services.

# $INTERNAL_IPv6_ADDRESS_MANAGEMENT : not as printed... it is a literal IPv6 address for an internal network / subnet for management services.



# The real output does not contain the device names that "ifconfig" lists for these interfaces. It shows only the upper-case, underscore-separated names assigned to the interfaces in OPNsense. Opening any of these names in the OPNsense WebUI (for example Interfaces -> INTERNAL_MANAGEMENT_INTERFACE in the left frame) shows details in the right frame such as "Device $actual_ifconfig_name_of_interface", where "$actual_ifconfig_name_of_interface" stands for the literal device name reported by ifconfig, like "ixl0" or "ixl2" or "ixl3" etc.



# Issuing a "tail -F /tmp/PHP_errors.log" then running the above command shows ZERO new entries. No errors logged by PHP.

# Changing to use "sh" and then re-run the above command as:
/usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php > /tmp/php-gateway-stdout 2> /tmp/php-gateway-stderr

# Then: "ls -l /tmp/php-gateway-stdout /tmp/php-gateway-stderr" shows a zero-length stderr file (no stderr sent) and a non-zero sized stdout file. No stderr content provided when running this from sh or csh.

LMK if you need more information.

Thanks!

Appended information:
I just upgraded to Firefox 134.0.1 (64-bit) but no change.
(No change was expected with server logging HTTP Status code of "500" but thought I would mention this one change.)

Sorry I forgot to ask for

configctl interface gateways status
Even though you get an array from the above php script it executes, maybe this casts ERR?

Quote from: Monviech (Cedrik) on January 15, 2025, 08:30:07 AM
Sorry I forgot to ask for

configctl interface gateways status
Even though you get an array from the above php script it executes, maybe this casts ERR?




# Running "configctl interface gateways status"

[{"name":"GW_v4_ISP_NAME","address":"$IPV4_ADDRESS","status":"none","loss":"~","delay":"~","stddev":"~","monitor":"$IPV4_ADDRESS_2","status_translated":"Online"},{"name":"GW_v6_ISP_NAME","address":"$IPV6_ADDRESS","status":"none","loss":"0.0 %","delay":"1.0 ms","stddev":"0.2 ms","monitor":"$IPV6_ADDRESS","status_translated":"Online"},{"name":"EXTERNAL_GW","address":"~","status":"none","loss":"~","delay":"~","stddev":"~","monitor":"~","status_translated":"Online"},{"name":"INTERNAL_INTERFACE_GW_V6","address":"$INTERNAL_IPv6_ADDRESS","status":"none","loss":"~","delay":"~","stddev":"~","monitor":"~","status_translated":"Online"},{"name":"INTERNAL_INTERFACE_GW_V4","address":"$INTERNAL_IPv4_ADDRESS","status":"none","loss":"~","delay":"~","stddev":"~","monitor":"~","status_translated":"Online"},{"name":"INTERNAL_MANAGEMENT_INTERFACE","address":"$INTERNAL_IPv6_ADDRESS_MANAGEMENT","status":"none","loss":"~","delay":"~","stddev":"~","monitor":"~","status_translated":"Online"}]


# And check the exit status to see it is happy:
echo $?
0


# Same substitutions as earlier post:

GW_v4_ISP_NAME : was a different string, but same format.

$IPV4_ADDRESS : not as printed... it is a literal IPv4 address for the IPv4 default gateway. When issuing "netstat -rn" the IPv4 address for this shows up as "default            $IPV4_ADDRESS      UGS       ixl12" where that is an actual IPv4 address not a string/variable name

$IPV4_ADDRESS_2 : not as printed... it is a literal IPv4 address beyond the gateway, used to monitor gateway.

GW_v6_ISP_NAME : was a different string, but same format.

$IPV6_ADDRESS : not as printed... it is a literal IPv6 address for the IPv6 default gateway. When issuing "netstat -rn" the IPv6 address for this shows up as "default            $IPV6_ADDRESS      UGHS       ixl12" where that is an actual IPv6 address not a string/variable name

INTERNAL_INTERFACE_GW_V6 : different string, but same format for an internal interface routing table entry on how to route to a specific subnet.

$INTERNAL_IPv6_ADDRESS : not as printed... it is a literal IPv6 address for an internal network / subnet handled by another internal routing device.

INTERNAL_INTERFACE_GW_V4 : different string, but same format for an internal interface routing table entry on how to route to a specific subnet.

$INTERNAL_IPv4_ADDRESS : not as printed... it is a literal IPv4 address for an internal network / subnet handled by another internal routing device.

INTERNAL_MANAGEMENT_INTERFACE : different string, but same format for an internal interface subnet for management services.

$INTERNAL_IPv6_ADDRESS_MANAGEMENT : not as printed... it is a literal IPv6 address for an internal network / subnet for management services.



Also, switched from csh to sh and ran: "configctl interface gateways status > /tmp/configctl-gw-status_stdout 2> /tmp/configctl-gw-status_stderr"

The "stderr" file was empty, while stdout had the data. Exit status was still 0.
Nothing new was added to "/tmp/PHP_errors.log"

When I visit the OPNSense Web UI System -> Gateways -> Configuration : I still get the error mentioned earlier and the "/tmp/PHP_errors.log" gets a new line.


More data?

When I visit the Web UI Lobby -> Dashboard, there is a panel for Gateways which is displayed, showing the gateway summary information. No errors on screen in Web UI, and nothing logged in the /tmp/PHP_errors.log

The "dashboard" panel for "gateways" summary shows information for networks with the OPNsense named interfaces for:
GW_v4_ISP_NAME
GW_v6_ISP_NAME
INTERNAL_INTERFACE_GW_V6
INTERNAL_INTERFACE_GW_V4
INTERNAL_MANAGEMENT_INTERFACE
(All populated with address details and, I assume where monitoring is enabled, monitoring details)

But it also shows another OPNSense named interface with no information:
EXTERNAL_GW 

Instead of IP address and other details following "EXTERNAL_GW" in the dashboard gateway panel, it shows "~"  (tilde)

Maybe there was once an interface named "EXTERNAL_GW" that was removed but is somehow still present without settings?

It seems represented in the array as this "element"
{"name":"EXTERNAL_GW","address":"~","status":"none","loss":"~","delay":"~","stddev":"~","monitor":"~","status_translated":"Online"},

Clicking on any of the gateway links in this gateway panel on the dashboard attempts to take me to System -> Gateways -> Configuration which results in the same error mentioned above, and a new logged entry in the /tmp/PHP_errors.log
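
To pick out entries like EXTERNAL_GW from the status output without reading the whole line by eye, something like the following could work. It is a rough sketch, not a JSON parser: it assumes every entry is a flat object with no nested braces, as in the output posted above, and tilde_gateways is a made-up helper name.

```shell
#!/bin/sh
# tilde_gateways is a hypothetical helper name. It reads gateway status JSON
# on stdin and prints the name of each entry whose address is "~".
# It relies on the entries being flat objects (no nested braces), as in the
# output posted above; it is not a general JSON parser.
tilde_gateways() {
    grep -o '{[^}]*}' |
        grep '"address":"~"' |
        sed -n 's/.*"name":"\([^"]*\)".*/\1/p'
}

# Query the live system only where configctl is installed.
if command -v configctl >/dev/null 2>&1; then
    configctl interface gateways status | tilde_gateways
fi
```

This only reports which entries carry the "~" placeholder as their address; it says nothing about why.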

LMK if you need more log info.

Thanks!


Can you try a different browser or incognito mode? If the PHP error reappears each time it's a bit strange the backend call that is supposed to be failing is working fine. Otherwise the browser likes to cache an old invalid response for whatever reason...


Cheers,
Franco

Quote from: franco on January 15, 2025, 10:19:21 AM
Can you try a different browser or incognito mode? If the PHP error reappears each time it's a bit strange the backend call that is supposed to be failing is working fine. Otherwise the browser likes to cache an old invalid response for whatever reason...

Cheers,
Franco

I've already tried clearing cache and all cookies for the domain used by this OPNSense install, then restarted the Firefox browser, and see the same problem.

Using incognito mode also provided no difference.

I also tested with Linux-based Firefox instead of Windows-based Firefox, and saw the same problem.
That Firefox browser had never visited the OPNsense service.

AFAIK,

It looks like, in order to try a browser other than Firefox, my choices are:
 * Some Chromium-based browser (Chrome, Edge, Brave, etc.)
 * Safari


Chromium-based browsers and Safari abandoned support for SECP521 host certs, so I will need to see if my boss will agree to downgrade to a SECP384 host cert so that browsers with weaker host cert support can connect. If that is approved, I will generate a CSR, get it signed, install the new host cert, and see if I am able to reproduce this issue in other browsers.

IIRC, Safari and Chromium-based browsers are fine with a CA that uses P-521, but not host certs.

This may take a while....


Let's go by the code here because this is important:

https://github.com/opnsense/core/blob/18506613353f72abde8f9686dabbe7d986dfd464/src/opnsense/mvc/app/controllers/OPNsense/Routing/Api/SettingsController.php#L116

array_key_exists() complains about $ifconfig being null, so the assignment of $ifconfig is here:

https://github.com/opnsense/core/blob/18506613353f72abde8f9686dabbe7d986dfd464/src/opnsense/mvc/app/controllers/OPNsense/Routing/Api/SettingsController.php#L56

So my guess is json_decode() may fail and it would be very important to know the exact return value of configctl in order to be able to catch the problem here.  Feel free to send a PM.
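
One way to check for that, sketched under assumptions: PHP's json_decode() returns null for empty or whitespace-only input, so testing whether a configd action prints anything besides whitespace is a quick first filter. is_blank is a made-up helper name, and the two actions are the ones named in the audit log and earlier in this thread.

```shell
#!/bin/sh
# is_blank is a hypothetical helper: it succeeds when stdin contains nothing
# but whitespace. PHP's json_decode() returns null for such input.
is_blank() {
    [ -z "$(tr -d '[:space:]')" ]
}

# Check the configd actions discussed in this thread, but only on a machine
# where configctl is installed.
if command -v configctl >/dev/null 2>&1; then
    for action in "interface list ifconfig" "interface gateways status"; do
        # $action is left unquoted on purpose so it splits into arguments.
        if configctl $action | is_blank; then
            echo "blank output: configctl $action"
        else
            echo "non-blank output: configctl $action"
        fi
    done
fi
```

Non-blank output can still be invalid JSON, so this narrows things down rather than proving the output decodes. Piping the same command through "od -c" shows the exact bytes returned.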

As for the output you posted, it decodes fine and creates an array representation of the data which is not null.

https://www.php.net/manual/en/function.json-decode.php#refsect1-function.json-decode-returnvalues

"null is returned if the json cannot be decoded or if the encoded data is deeper than the nesting limit."


Cheers,
Franco

Oh and now I see we should have focused on:

# configctl interface list ifconfig


Sorry,
Franco

Quote from: franco on January 15, 2025, 01:20:13 PM
Oh and now I see we should have focused on:

# configctl interface list ifconfig


Sorry,
Franco

No need to be sorry: tracking down the source of a problem is a complicated process where even an ideal test gives a 50/50 outcome: it either identifies an area to investigate or rules out 50% of the possible causes. I appreciate all of your help! Thanks!

 

(The new command) Good, a possible clue: the call to "configctl interface list ifconfig" is empty:

[no content appears, just a single vertical whitespace character (appears to be '\012', ^J or control-j)]



Checking the exit status afterwards with "echo $?" shows a happy exit status of "0".


Moving from csh to sh and running "configctl interface list ifconfig > /tmp/config-int-list_stdout 2> /tmp/config-int-list_stderr": the stderr file is empty, while the stdout file has a single character ('\012').


Running "tail -F /tmp/PHP_errors.log" while calling this shows no new content added to "/tmp/PHP_errors.log".



I'm still waiting for my boss to approve switching host cert so I can test from other browsers.

Ok, no need for browser things I think we are getting closer.

# /usr/local/sbin/pluginctl -D

This is empty as well?

You can cycle through each interface on your system to find the "bad" one, e.g.

 # /usr/local/sbin/pluginctl -D wg1

You can get a list of interfaces using:

# ifconfig -l

I'm sure there is one that's causing all the trouble.


Cheers,
Franco

Quote from: franco on January 15, 2025, 07:49:50 PM
Ok, no need for browser things I think we are getting closer.

# /usr/local/sbin/pluginctl -D

This is empty as well?

You can cycle through each interface on your system to find the "bad" one, e.g.

 # /usr/local/sbin/pluginctl -D wg1

You can get a list of interfaces using:

# ifconfig -l

I'm sure there is one that's causing all the trouble.


Cheers,
Franco

Command "/usr/local/sbin/pluginctl -D" output is like the previous command output, "empty," with a single whitespace '\012' or ^j or control-j

exit status is also happy "0"

No additions to "/tmp/PHP_errors.log"

Switching from csh to sh and:

for i in `ifconfig -l` ; do
    echo "# $i :"
    echo -n "## ifconfig:"
    ifconfig $i
    echo "## pluginctl:"
    /usr/local/sbin/pluginctl -D $i
    echo " * ES: $?"
done > /tmp/plugin-int_stdout 2> /tmp/plugin-int_stderr

There are 54 interfaces (physical, VLAN and more), and I am reluctant to include them all. I'll only comment on interfaces which DO HAVE one or more IP addresses assigned according to "ifconfig", but where "/usr/local/sbin/pluginctl -D $interfacename" shows empty content. I'm skipping interfaces that have no configuration and also produce no output from the pluginctl command. (This is based on the assumption that interfaces not yet configured in OPNsense would also show no configuration when running "/usr/local/sbin/pluginctl -D $i" against them.)
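
The same filter could be automated roughly as below instead of reading the combined output by eye. This is a sketch, not the exact command I ran: has_address and is_blank are made-up helper names, and it assumes ifconfig prints addresses on indented "inet" / "inet6" lines as in the example further down.

```shell
#!/bin/sh
# has_address and is_blank are hypothetical helper names.
# has_address succeeds when ifconfig-style text on stdin has an inet or
# inet6 line.
has_address() {
    grep -Eq '^[[:space:]]+inet6? '
}

# is_blank succeeds when stdin contains nothing but whitespace.
is_blank() {
    [ -z "$(tr -d '[:space:]')" ]
}

# Print each interface that has an address but no "pluginctl -D" output.
# Runs only where pluginctl exists at the path used in this thread.
if [ -x /usr/local/sbin/pluginctl ]; then
    for i in $(ifconfig -l); do
        if ifconfig "$i" | has_address &&
           /usr/local/sbin/pluginctl -D "$i" | is_blank; then
            echo "$i"
        fi
    done
fi
```

It prints one interface name per line, which makes the list easier to compare against the NIC groupings.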



Several found. Here is an example of the format for output:

# ixl$INT:
## ifconfig:ixl$INT: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu $INTEGER_1500_or_9000
        description: OPNSENSE_INTERFACE_NAME (opt[0-9]*)
        options=4e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
        ether $UNIQUE_MAC_ADDRESS
        inet $IPV4_ADDRESS_1 netmask 0x[a-f0-9]* broadcast $IPV4_ADDRESS_1_BROADCAST
        inet6 $IPV6_ADDRESS_1%ixl$INT prefixlen $IPV6_PREFIX_1 scopeid 0x5
        inet6 $IPV6_ADDRESS_2 prefixlen $IPV6_PREFIX_2
        media: Ethernet autoselect (10Gbase-T <full-duplex>)
        status: active
        nd6 options=121<PERFORMNUD,AUTO_LINKLOCAL,NO_DAD>
## pluginctl:
(Empty except whitespace '\012' or ^J or control-j)

IP Addresses and prefix/subnet mask are different, as are the OPNSENSE_INTERFACE_NAME and the opt[0-9]* name. Also, some are set with JUMBO option when MTU is 9000.

Interfaces where this is the case: IP addresses assigned to an interface, but "/usr/local/sbin/pluginctl -D $interface_name" shows no output:
# ixl0 :
# ixl1 :
# ixl2 :
# ixl3 :
# ixl4 :
# ixl5 :
# ixl6 :
# ixl7 :
# ixl8 :
# ixl9 :
# ixl10 :
# ixl11 :
# ixl16 :
# ixl17 :

Those with no config reported by "/usr/local/sbin/pluginctl -D $interface_name"  but also have no IP addresses assigned to them according to ifconfig:
# ixl18 :
# ixl19 :

Possibly useful information:
Each of these is a 4-port NIC: [1] ixl0 - ixl3 , [2] ixl4 - ixl7 , [3] ixl8 - ixl11 , [4] ixl12 - ixl15 , [5] ixl16 - ixl19. Of these 5 NICs, only the one with ixl12 - ixl15 shows configuration with "/usr/local/sbin/pluginctl -D $interface_name", while the rest give the "empty" result.

It is likely that some of the NICs with no configuration reported, but with IP addresses set according to ifconfig, were replaced after NIC hardware failures.

ixl14 has no IP addresses assigned to it according to ifconfig and is "no carrier", but "/usr/local/sbin/pluginctl -D $interface_name" still reports a configuration for it.

Most interfaces (including VLAN, and more) that have one or more configured IP addresses include content output with the "/usr/local/sbin/pluginctl -D $interface_name"  command.

When I open the OPNsense Web UI in a browser and look at one of the interfaces that show empty output when running "/usr/local/sbin/pluginctl -D $interface_name", I can see interface configuration settings for these interfaces, but "/usr/local/sbin/pluginctl -D $interface_name" either doesn't have those settings, or is unable to provide them.

LMK if you need more data.

Thanks!