NUT package broken?

Started by henningkessler, June 10, 2022, 03:17:34 PM

I am not really sure when this happened, but today I realized that the nut package seems to be broken for me on two installations (HA mode, version 22.1.8_1). Each of them has an APC UPS attached via USB. Neither can connect to its UPS, although the same configuration worked a couple of versions ago when I set them up.
nut.conf
# Please don't modify this file as your changes might be overwritten with
# the next update.
#
MODE=standalone

ups.conf
# Please don't modify this file as your changes might be overwritten with
# the next update.
#
[ber0ups02]
driver=usbhid-ups
port=auto
ignorelb
default.battery.runtime.low = 300
default.battery.charge.low = 25


When I run
upsc -l
from the command line it hangs, and when I try to start the service
# service nut start
Network UPS Tools - UPS driver controller 2.8.0
Network UPS Tools - Generic HID driver 0.47 (2.8.0)
USB communication driver (libusb 1.0) 0.43
interrupt pipe disabled (add 'pollonly' flag to 'ups.conf' to get rid of this message)
Can't claim USB device [051d:0003]@0/0: Other error
Driver failed to start (exit status=1)
/usr/local/etc/rc.d/nut: WARNING: failed precmd routine for nut

this is the result. It looks like there is a driver issue.

I see what may be a related error, introduced around 22.1.7 or 22.1.8(_1):


Network UPS Tools - UPS driver controller 2.8.0
Network UPS Tools - Generic HID driver 0.47 (2.8.0)
USB communication driver (libusb 1.0) 0.43
libusb1: Could not open any HID devices: no USB buses found
No matching HID UPS found
Driver failed to start (exit status=1)


Guessing maybe there was a change in the usb library that could be causing the issue, but not sure how to confirm.

The apcupsd plugin works fine. There's a thread about it somewhere here.

Quote from: pilotboy72 on June 10, 2022, 03:35:36 PM
Guessing maybe there was a change in the usb library that could be causing the issue, but not sure how to confirm.
Check the version of the NUT tools, head over to the NUT community on whatever platform they use, look and/or ask for any known issues with FreeBSD 13.

HTH,
Patrick
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

June 14, 2022, 07:19:58 AM #4 Last Edit: June 14, 2022, 07:48:55 AM by ZPrime
I've been having problems with this since 22.1.7, possibly a little earlier...

My first question / concern - is nut supposed to be running as the "uucp" user?

What I'm noticing, using a usbhid-ups driver, is that it chokes on access to a socket file that is supposed to live in /var/db/nut.

If I manually start that driver as root (/usr/local/libexec/nut/usbhid-ups -a <upsname> -DD -u root), it throws a complaint that the permissions on this socket file are wrong and that it has fixed them.

From that point forward, I can kill the debug copy, and then "restart" nut in the GUI and it works OK... but this doesn't survive a reboot.

So then I tried rm'ing the entire contents of /var/db/nut, and then restarting again... and this seems to work OK, but I suspect it too won't survive a reboot.

I've seen discussion elsewhere that the problem is permissions on the /dev tree (and specifically the USB entry for the UPS), but that did not need to be modified on my system.

I'm trying to figure out now how to permanently correct the permission issue so it survives a reboot...

[edit]
OK, it's not the /var/db/nut thing, permissions are fine by default.

But trying to run the usbhid-ups "nut driver" as root first is key. That forces the kernel USB HID driver to detach from the device, which then allows nut to bind to it, and everything works OK. Then you can kill off that process, get the GUI to start the driver as normal, and it all works.

Work-around:
usbconfig -d <usb device entry - mine is ugen8.2> detach_kernel_driver

After running that, you can kick the nut service in the webUI and everything works.
So, this is something weird going on with FBSD13 and the HID driver not wanting to let go of the device when nut asks for it; presumably somehow combined with permissions (maybe a change in how 13 handles this?)
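
A minimal sketch of how the work-around above might be scripted, so the ugen name does not have to be looked up by hand. It assumes the plain usbconfig listing prints one "ugenX.Y: <description> ..." line per device (the same shape as the first line of the dump_device_desc output further down this thread). find_ugen is a hypothetical helper name, and the match string is an assumption to adapt to your UPS.

```shell
# find_ugen: hypothetical helper, not part of usbconfig or NUT.
# Reads a usbconfig device listing on stdin and prints the first
# ugenX.Y name whose line contains the given substring.
find_ugen() {
    awk -v pat="$1" '
        /^ugen[0-9]+\.[0-9]+:/ && index($0, pat) {
            sub(/:.*/, "", $1)   # "ugen0.2:" -> "ugen0.2"
            print $1
            exit
        }'
}

# Example use on the firewall (run as root; commented out on purpose):
# dev=$(usbconfig | find_ugen "American Power Conversion")
# [ -n "$dev" ] && usbconfig -d "$dev" detach_kernel_driver
```

If nothing matches, the helper prints nothing, so the detach command is skipped rather than run against the wrong device.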

Now I'm trying to see if there's a way to tell the kernel to just ignore the UPS device so I don't have to get it to release first...

Thanks a lot for your investigation! Even if it does not survive a reboot, at least it is running again and the attached devices get notified when the UPS kicks in.

June 14, 2022, 08:33:33 AM #6 Last Edit: June 14, 2022, 08:49:25 AM by ZPrime
Conversing with myself, but this felt like it should be a new post.

https://www.freebsd.org/cgi/man.cgi?query=usb_quirk&sektion=4&apropos=0&manpath=FreeBSD+13.1-RELEASE+and+Ports

You'll need to use usbconfig to find your device's descriptor info first.
usbconfig -d ugen0.2 dump_device_desc
ugen0.2: <American Power Conversion Smart-UPS 1500 FW:UPS 09.4 / ID18> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (2mA)

  bLength = 0x0012
  bDescriptorType = 0x0001
  bcdUSB = 0x0200
  bDeviceClass = 0x0000  <Probed by interface class>
  bDeviceSubClass = 0x0000
  bDeviceProtocol = 0x0000
  bMaxPacketSize0 = 0x0040
  idVendor = 0x051d
  idProduct = 0x0003
  bcdDevice = 0x0106
  iManufacturer = 0x0001  <American Power Conversion >
  iProduct = 0x0002  <Smart-UPS 1500 FW:UPS 09.4 / ID=18>
  iSerialNumber = 0x0003  <xxxxx redacted  >
  bNumConfigurations = 0x0001

Note the "idVendor" and "idProduct" as you'll need these.
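
As a rough sketch, the two IDs can also be pulled out of that output automatically. This assumes the "name = value" layout shown above. usb_ids is a hypothetical helper name, not an existing tool.

```shell
# usb_ids: hypothetical helper. Reads `usbconfig ... dump_device_desc`
# output on stdin and prints "idVendor idProduct" on one line.
# Prints nothing if either field is missing.
usb_ids() {
    awk '
        $1 == "idVendor"  { vendor = $3 }
        $1 == "idProduct" { product = $3 }
        END { if (vendor != "" && product != "") print vendor, product }'
}

# Example use (commented out; needs FreeBSD and the right ugen name):
# usbconfig -d ugen0.2 dump_device_desc | usb_ids
```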

In /boot/loader.conf.local, you'll need to add the following, where x is a number starting at 0 (you may have other quirks already):
hw.usb.quirk.x = "vendorID productID minprodVer maxprodver QUIRKNAME"

MinprodVer / maxProdVer are only important if you have multiple UPSes attached and are trying to be specific... otherwise, they can be "0" and "0xffff" for min and max (should catch all versions).

In our case, the problem is that the kernel is attaching the USB HID driver to the UPS, which Nut is having problems detaching. There's a "quirk" to tell the HID driver to ignore it... UQ_HID_IGNORE

Note that normally OPNsense would want us to set this "hw.usb.quirk.x" value in the GUI as a "System tunable," but I suspect that doing this won't work because tunables aren't loaded until after the kernel and modules are situated... which is too late. Option B would be to hack one of the nut startup scripts and call usbconfig and tell it to detach the kernel driver from the device (as I discovered in my earlier post)... but I like just telling the kernel to gtfo, especially since loader.conf.local should survive reboots and upgrades.

So, in my example, the final line added to /boot/loader.conf.local (which didn't exist on my system, so I created it) should be:
hw.usb.quirk.0 = "0x051d 0x0003 0 0xffff UQ_HID_IGNORE"

I can confirm that this has worked around my problem with nut + usbhid-ups (an APC) on 22.1.8_1, after a reboot. Nut loads correctly at boot and everything is working.
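
For anyone wanting to script this step, here is a minimal sketch that builds the same line and appends it only if it is not already present. Both function names are hypothetical. The quirk index 0 and the 0 / 0xffff version range are taken from the example above and would need adjusting if other quirks are already defined.

```shell
# quirk_line: hypothetical helper.
# $1 = quirk index, $2 = idVendor, $3 = idProduct
quirk_line() {
    printf 'hw.usb.quirk.%s = "%s %s 0 0xffff UQ_HID_IGNORE"\n' "$1" "$2" "$3"
}

# add_line_once: hypothetical helper.
# $1 = file, $2 = line. Appends the line unless an identical line exists.
# Creates the file if it does not exist yet.
add_line_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example use (run as root; commented out on purpose):
# add_line_once /boot/loader.conf.local "$(quirk_line 0 0x051d 0x0003)"
```

Running it twice leaves a single copy of the line, so it is safe to call from a provisioning script.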

How do we fix this in OPNsense "correctly" instead of hacking around it? I dunno... maybe have nut run as root again instead of the uucp user? (I'm not sure why it's running as uucp in the first place?) Alternatively, figure out how to add the correct kernel-level permission to allow nut's user account to detach the kernel device driver without being UID 0. But my little work-around here is functional... just a pain in the butt since it requires manual intervention and is hard to script.  :P

OPNsense tunables do end up in loader.conf.local and are activated during kernel initialisation.

When I add this tunable via the GUI, it ends up in loader.conf (not loader.conf.local) and "Type" is shown as "unsupported".
NRG Systems IPU675, Intel Core i7-7500U 2,7 GHz, 6xIntel i211AT Gigabit LAN, 16 GB RAM, 256 GB SSD

loader.conf is read even before loader.conf.local. "unsupported" is just OPNsense's way of telling you the system doesn't have a clue what this is supposed to be. It will be activated from loader.conf anyway: it is set before the kernel is loaded and passed to the kernel at boot time.
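
One way to check after a reboot whether the setting actually reached the kernel, sketched under the assumption that running kenv without arguments dumps the kernel environment as name="value" lines. list_usb_quirks is a hypothetical helper name.

```shell
# list_usb_quirks: hypothetical helper. Reads a kernel environment dump
# (one name="value" per line) on stdin and prints only the
# hw.usb.quirk.N entries.
list_usb_quirks() {
    grep '^hw\.usb\.quirk\.[0-9][0-9]*='
}

# Example use on the firewall (commented out; kenv is FreeBSD-specific):
# kenv | list_usb_quirks
```

If the quirk line shows up there, it was picked up by the loader regardless of which of the two files it was written to.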

June 22, 2022, 08:21:56 AM #10 Last Edit: June 22, 2022, 08:25:00 AM by ZPrime
Quote from: pmhausen on June 14, 2022, 09:26:52 AM
OPNsense tunables do end up in loader.conf.local and are activated during kernel initialisation.

I appreciate the clarification, I wish this was documented more clearly in the official docs. I knew that "tunables" were just a way to set sysctl flags, but it wasn't obvious to me what the load order is. I assumed (incorrectly) that the process by which they were loaded happened after initial kernel load...

FWIW, I ended up moving to a "netclient" Nut config. I was passing through the USB device (running OPNsense virtualized on Proxmox VE), so I just set up Nut on the PVE server directly and am now connecting the OPNsense VM to it via Nut's "netclient" mode instead.

It's probably all redundant in my case, since Proxmox should gracefully shutdown the OPNsense guest anyway... but I'd rather the guest shut itself down on its own terms.