Upgrade to 24.1.3 (HA Backup Node) -> Unbound crash

Started by patient0, March 06, 2024, 09:59:36 PM

Previous topic - Next topic
Maybe a heath audit would help shed some light on the situation.


Cheers,
Franco

March 21, 2024, 10:09:49 AM #16 Last Edit: March 22, 2024, 10:04:40 AM by patient0
Unfortunately a health check didn't give any clues:

***GOT REQUEST TO AUDIT HEALTH***
Currently running OPNsense 24.1.4 at Thu Mar 21 08:45:43 UTC 2024
>>> Root file system: /dev/gpt/rootfs
>>> Check installed kernel version
Version 24.1.2 is correct.
>>> Check for missing or altered kernel files
No problems detected.
>>> Check installed base version
Version 24.1.2 is correct.
>>> Check for missing or altered base files
No problems detected.
>>> Check installed repositories
OPNsense
>>> Check installed plugins
os-theme-vicuna 1.45_1
os-zerotier 1.3.2_4
>>> Check locked packages
No locks found.
>>> Check for missing package dependencies
Checking all packages: .......... done
>>> Check for missing or altered package files
Checking all packages: .......... done
>>> Check for core packages consistency
Core package "opnsense" has 68 dependencies to check.
Checking packages: ..................................................................... done
***DONE***


Next I'll remove the zerotier plugin, just to make sure. (Update: removing zerotier didn't help)

Addition: I reinstalled the HA Backup server and restored the config, no more issue. Maybe now I can compare the Master <> Backup and see what different.

Quote from: patient0 on March 20, 2024, 05:52:19 PM
Quote from: franco on March 18, 2024, 02:10:33 PM
Did you guys install anything from the ports tree manually?

Sorry, missed your reply: No, I don't have ports installed at all (and 24.1.4 didn't change anything)

I came to this problem as well. And no, I did not install anything from ports. The problem rose after upgrading from 24.1.4 to 24.1.5.

Ok, let's try to find where this is coming from, does reinstalling py-idna make a difference?

pkg install -f py39-idna-3.6


If it doesn't, can you try to start python manually and run the following commands? (in python3)


import importlib.metadata
importlib.metadata.version('idna')

import idna
idna.__path__



I expect the second line to crash with the same error as reported earlier, then we will import idna to check its actual location.


Quote from: AdSchellevis on April 06, 2024, 11:03:36 AM
I expect the second line to crash with the same error as reported earlier, then we will import idna to check its actual location.

Thanks AdSchellevis for looking into it. The situation for me is right now:

  • HA master not touched, was left with the issue
  • HA slave was reset to factory defaults and restored the config, which solved it
Now, reinstaledl the package and run the commands manually.

The second line didn't crash (I guess) on the master but didn't print a version:
root@OPNsense-1:~ # pkg install -f py39-idna-3.6
Updating OPNsense repository catalogue...
OPNsense repository is up to date.
All repositories are up to date.
Checking integrity... done (0 conflicting)
The following 1 package(s) will be affected (of 0 checked):

Installed packages to be REINSTALLED:
        py39-idna-3.6

Number of packages to be reinstalled: 1

Proceed with this action? [y/N]: y
[1/1] Reinstalling py39-idna-3.6...
[1/1] Extracting py39-idna-3.6: 100%
root@OPNsense-1:~ # python3
Python 3.9.18 (main, Apr  3 2024, 19:59:28)
[Clang 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc1238 on freebsd13
Type "help", "copyright", "credits" or "license" for more information.
>>> import importlib.metadata
>>> importlib.metadata.version('idna')
>>>
>>> import idna
>>> idna.__path__
['/usr/local/lib/python3.9/site-packages/idna']


On the reset-and-restored slave the idna version is correctly displayed:
root@OPNsense-2:~ # pkg install -f py39-idna-3.6
Updating OPNsense repository catalogue...
OPNsense repository is up to date.
All repositories are up to date.
The following 1 package(s) will be affected (of 0 checked):

Installed packages to be REINSTALLED:
        py39-idna-3.6

Number of packages to be reinstalled: 1

65 KiB to be downloaded.

Proceed with this action? [y/N]: y
[1/1] Fetching py39-idna-3.6.pkg: 100%   65 KiB  66.8kB/s    00:01
Checking integrity... done (0 conflicting)
[1/1] Reinstalling py39-idna-3.6...
[1/1] Extracting py39-idna-3.6: 100%
root@OPNsense-2:~ # python3
Python 3.9.18 (main, Apr  3 2024, 19:59:28)
[Clang 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc1238 on freebsd13
Type "help", "copyright", "credits" or "license" for more information.
>>> import importlib.metadata
>>> importlib.metadata.version('idna')
'3.6'
>>>
>>> import idna
>>> idna.__path__
['/usr/local/lib/python3.9/site-packages/idna']

Quote from: AdSchellevis on April 06, 2024, 11:03:36 AM
Ok, let's try to find where this is coming from, does reinstalling py-idna make a difference?

pkg install -f py39-idna-3.6


If it doesn't, can you try to start python manually and run the following commands? (in python3)


import importlib.metadata
importlib.metadata.version('idna')

import idna
idna.__path__



I expect the second line to crash with the same error as reported earlier, then we will import idna to check its actual location.


# pkg install -f py39-idna-3.6
Updating OPNsense repository catalogue...
OPNsense repository is up to date.
All repositories are up to date.
Checking integrity... done (0 conflicting)
The following 1 package(s) will be affected (of 0 checked):

Installed packages to be REINSTALLED:
py39-idna-3.6

Number of packages to be reinstalled: 1

Proceed with this action? [y/N]: y
[1/1] Reinstalling py39-idna-3.6...
[1/1] Extracting py39-idna-3.6: 100%

# python3.9
Python 3.9.18 (main, Apr  3 2024, 19:59:28)
[Clang 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc1238 on freebsd13
Type "help", "copyright", "credits" or "license" for more information.
>>> import importlib.metadata
>>> importlib.metadata.version('idna')
>>>
>>> import idna
>>> idna.__path__
['/usr/local/lib/python3.9/site-packages/idna']


but on these machines (after reinstalling the package), is unbound still crashing if you stop/start it from gui?

Quote from: AdSchellevis on April 07, 2024, 11:40:48 AM
but on these machines (after reinstalling the package), is unbound still crashing if you stop/start it from gui?

In my case Unbound still crashes after stop/start, yes.

ok, very weird, it looks like this only happens inside the chroot environment for unbound in that case.
Can you run the following for me and report back the output?


find /var/unbound/usr/local/lib/python3.9 -name "*idna*"


Quote from: AdSchellevis on April 08, 2024, 06:34:12 PM

find /var/unbound/usr/local/lib/python3.9 -name "*idna*"


Here is the output the command run on the HA master where the issue exists:
root@OPNsense-1:~ # find /var/unbound/usr/local/lib/python3.9 -name "*idna*" | sort
/var/unbound/usr/local/lib/python3.9/encodings/__pycache__/idna.cpython-39.opt-1.pyc
/var/unbound/usr/local/lib/python3.9/encodings/__pycache__/idna.cpython-39.opt-2.pyc
/var/unbound/usr/local/lib/python3.9/encodings/__pycache__/idna.cpython-39.pyc
/var/unbound/usr/local/lib/python3.9/encodings/idna.py
/var/unbound/usr/local/lib/python3.9/site-packages/idna
/var/unbound/usr/local/lib/python3.9/site-packages/idna-3.4.dist-info
/var/unbound/usr/local/lib/python3.9/site-packages/idna-3.6.dist-info
/var/unbound/usr/local/lib/python3.9/site-packages/idna/__pycache__/idnadata.cpython-39.opt-1.pyc
/var/unbound/usr/local/lib/python3.9/site-packages/idna/__pycache__/idnadata.cpython-39.pyc
/var/unbound/usr/local/lib/python3.9/site-packages/idna/idnadata.py


And the output of the command run on the HA slave where there's no issue:

root@OPNsense-2:~ # find /var/unbound/usr/local/lib/python3.9 -name "*idna*" | sort
/var/unbound/usr/local/lib/python3.9/encodings/__pycache__/idna.cpython-39.opt-1.pyc
/var/unbound/usr/local/lib/python3.9/encodings/__pycache__/idna.cpython-39.opt-2.pyc
/var/unbound/usr/local/lib/python3.9/encodings/__pycache__/idna.cpython-39.pyc
/var/unbound/usr/local/lib/python3.9/encodings/idna.py
/var/unbound/usr/local/lib/python3.9/site-packages/idna
/var/unbound/usr/local/lib/python3.9/site-packages/idna-3.6.dist-info
/var/unbound/usr/local/lib/python3.9/site-packages/idna/__pycache__/idnadata.cpython-39.opt-1.pyc
/var/unbound/usr/local/lib/python3.9/site-packages/idna/__pycache__/idnadata.cpython-39.pyc
/var/unbound/usr/local/lib/python3.9/site-packages/idna/idnadata.py


The {/var/unbound}/usr/local/lib/python3.9/site-packages/idna-3.4.dist-info was empty. And deleting /usr/local/lib/python3.9/site-packages/idna-3.4.dist-info resolved issue for me.

Thanks a lot for your help!

I am still a bit lost on why this operates differently from within unbound, but glad this worked for you.

At least we know where to look next time, a structural fix is likely more difficult as we don't know when this got stuck, pkg should have removed the old one with the upgrade (likely some time ago).

Thanks for reporting back,

Best regards,

Ad

There are a ton of other left-over package directories in that folder (and none on the other OPNsense instance). Maybe a failed update attempt, although I don't remember having one (but I'm old so that doesn't say much).

Does Python have a way to verify that there are no old leftover site-packages?

root@OPNsense-1:/usr/local/lib/python3.9/site-packages # ls
Babel-2.12.1-py3.9.egg-info cffi-1.16.0.dist-info idna requests-2.31.0-py3.9.egg-info
Babel-2.14.0-py3.9.egg-info charset_normalizer idna-3.6.dist-info service_identity
Bottleneck-1.3.8-py3.9.egg-info charset_normalizer-3.2.0-py3.9.egg-info jinja2 service_identity-23.1.0.dist-info
Cython-0.29.36-py3.9.egg-info charset_normalizer-3.3.2-py3.9.egg-info markupsafe setuptools
Jinja2-3.1.3-py3.9.egg-info cryptography netaddr setuptools-63.1.0-py3.9.egg-info
MarkupSafe-2.1.5-py3.9.egg-info cryptography-3.4.8-py3.9.egg-info netaddr-0.8.0-py3.9.egg-info six-1.16.0-py3.9.egg-info
OpenSSL cryptography-42.0.5.dist-info netaddr-1.2.1.dist-info six.py
PySocks-1.7.1-py3.9.egg-info dateutil numexpr sniffio
PyYAML-6.0-py3.9.egg-info distutils-precedence.pth numexpr-2.8.4-py3.9.egg-info sniffio-1.3.1.dist-info
PyYAML-6.0.1-py3.9.egg-info dns numexpr-2.9.0-py3.9.egg-info socks.py
README.txt dnspython-2.4.0.dist-info numpy sockshandler.py
__pycache__ dnspython-2.6.1.dist-info numpy-1.25.0-py3.9.egg-info socksio
_cffi_backend.cpython-39.so duckdb outcome socksio-1.0.0-py3.9.egg-info
_distutils_hack duckdb-0.10.1-py3.9.egg-info outcome-1.2.0-py3.9.egg-info sortedcontainers
_unbound.so duckdb-stubs outcome-1.3.0.post0-py3.9.egg-info sortedcontainers-2.4.0-py3.9.egg-info
_yaml easy-install.pth packaging trio
adbc_driver_duckdb easy-install.pth.dist packaging-23.2.dist-info trio-0.22.2-py3.9.egg-info
aioquic exceptiongroup pandas trio-0.25.0.dist-info
aioquic-0.9.21.dist-info exceptiongroup-1.1.2.dist-info pandas-2.0.3.dist-info typing_extensions-4.10.0.dist-info
aioquic-0.9.25.dist-info exceptiongroup-1.2.0.dist-info pkg_resources typing_extensions.py
anyio h11 pyOpenSSL-21.0.0-py3.9.egg-info tzdata
anyio-3.7.1.dist-info h11-0.14.0-py3.9.egg-info pyOpenSSL-23.2.0-py3.9.egg-info tzdata-2023.3.dist-info
anyio-4.3.0.dist-info h2 pyasn1 tzdata-2024.1.dist-info
async_generator h2-4.0.0-py3.9.egg-info pyasn1-0.5.0.dist-info ujson-5.8.0-py3.9.egg-info
async_generator-1.10-py3.9.egg-info h2-4.1.0-py3.9.egg-info pyasn1_modules ujson-5.9.0-py3.9.egg-info
attr hpack pyasn1_modules-0.3.0.dist-info ujson.cpython-39.so
attrs hpack-4.0.0-py3.9.egg-info pycparser unbound.py
attrs-23.2.0.dist-info httpcore pycparser-2.21-py3.9.egg-info unboundmodule.py
babel httpcore-0.17.3-py3.9.egg-info pylsqpack urllib3
bottleneck httpcore-1.0.5.dist-info pylsqpack-0.3.17.dist-info urllib3-1.26.16-py3.9.egg-info
certifi httpx pylsqpack-0.3.18.dist-info urllib3-1.26.18-py3.9.egg-info
certifi-2023.5.7-py3.9.egg-info httpx-0.24.1.dist-info python_dateutil-2.9.0-py3.9.egg-info vici
certifi-2024.2.2-py3.9.egg-info httpx-0.27.0.dist-info pytz vici-5.9.11-py3.9.egg-info
cffi hyperframe pytz-2024.1-py3.9.egg-info yaml
cffi-1.15.1-py3.9.egg-info hyperframe-6.0.0-py3.9.egg-info requests



I'm not aware of a way to identify the files that aren't needed, from python's perspective there is no register.

The most logical check would be a diff between this directory and the files registered in `pkg list`. Files not registered via pkg usually don't belong as we only use the package manager to install software.


Ok, I run a find for empty directories in site-packages and deleted them, that worked for me.

And since all that runs on Proxmox I did restore a backup of the HA master to OPNsense 23.7.

  • The mess with old versions in the site-packages folder was already there at that point (idna-3.4 etc)
  • Upgrading to 24.1.2_1 did not show the issue, the old versions where of course still there. Restarting OPNsense and or Unbound worked well
  • Upgradiing from 24.1.2_1 to the lastest 24.1.5_3 (but it 24.1.4 had the issue already) and that's when the issue started