Messages - Werner Fischer

#1
Hi Franco,

I think there is an important comment here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=269133#c47 with a commit that should perhaps be included in 23.7.4 as well:
Quote
bnxt: Don't restart on VLAN changes
In rS360398, a new iflib device method was added with default of opt out
for VLAN events needing an interface reset.

This is unintentional for bnxt(4) and is causing another bug in its VLAN
initialization code to affect the common case of adding and removing
VLANs on an existing interface.

What do you think?
Best regards, Werner
#2
Regarding NICs:

  • Affected: NICs with Broadcom BCM574xx (NXE / Wh+) Chips, e.g. Broadcom P225P
  • Not affected: NICs with Broadcom BCM575xx (Thor) Chips, e.g. Broadcom P425g, P2100G

For an overview of which chips are used in which NICs, see https://www.thomas-krenn.com/de/wiki/Broadcom_Netzwerkkarten

Regarding Kevin Bowling's patch: we currently do not know whether it fixes the problem, as we have not compiled it. I assume other FreeBSD users will comment soon in the bug report:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=269133#c34
#3
Franco has changed "unsupported" to "environment"
https://github.com/opnsense/core/commit/b1e270957c800

Thank you for this Franco :)
#4
I happened to notice that FreeBSD has SATA Link Power Management disabled by default.

This is also the case with the vanilla Linux kernel so far. However, numerous Linux distributions (e.g. Ubuntu as of Linux kernel 5.15) enable this for mobile/embedded chipsets using "SATA_LPM_POLICY=3".

For the upstream Linux kernel, "SATA_LPM_POLICY=3" has also been under discussion: https://www.spinics.net/lists/linux-ide/msg62918.html However, due to a hot-plug problem with AMD-based systems (the SATA controller there is identical for EPYC and Ryzen, so the PCIe IDs of the SATA controller cannot reveal whether it is a mobile/embedded chipset), this has not been implemented yet.

What are the advantages of enabled SATA Link Power Management?


  • Reduced power consumption (approx. 1-1.8 watts for a system with one SATA SSD.)
  • Lower waste heat, cooler SSD controller chips (e.g. 30 °C lower temperature of the SSD controller for an ATP AF120GSTIC-T22). Especially in summer this can be very advantageous for small fanless firewall systems that are located in rooms without air conditioning, possibly with the sun shining on the device.

What are the disadvantages of enabled SATA Link Power Management?

  • Possible problems with hot-plug. This is not relevant for smaller/embedded firewalls with an M.2 SATA SSD. Likewise, it does not affect rack systems that boot from M.2 SATA. The hot-plug limitation really only affects rack systems with hot-swap drive bays that have two SATA SSDs in a ZFS mirror.
  • Slightly increased SATA I/O latency. When the link is in "Partial" power saving state, this is max. 10 microseconds. In the "Slumber" power saving state, this is 10ms max (but this is only activated if no I/O happens for a long time). See SATA AHCI Spec https://www.intel.com/content/www/us/en/io/serial-ata/serial-ata-ahci-spec-rev1-3-1.html chapter 8.2 Power State Mappings (Partial - Phy logic is powered, but in a reduced state. Exit latency is no longer than 10µs / Slumber - Phy logic is powered, but in a reduced state. Exit latency can be up to 10ms)

Under FreeBSD there are several SATA link power management options available - see https://man.freebsd.org/cgi/man.cgi?ahci(4)

  • 0 -> Off (default)
  • 1 -> Device Initiated Power Management (DIPM)
  • 2 to 5 -> different options for Host Initiated Power Management (HIPM)

However, a combination of DIPM+HIPM would be ideal, as it would achieve the highest energy savings. Linux has offered this since kernel 4.15 - see https://hansdegoede.livejournal.com/18412.html and https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f4ac6476945ff62939420bcf8266e39f8d5d54bd. This emulates the functionality of the Intel RST driver from Windows, which is explained in detail in this PDF: https://www-ssl.intel.com/content/dam/doc/reference-guide/sata-devices-implementation-recommendations.pdf. In the medium and long term, it would be ideal to implement this functionality (combination of DIPM+HIPM) in FreeBSD as well. However, this is beyond my qualifications. For now, supporting DIPM alone would already be a big gain.

Currently it is possible to enable DIPM for an SSD via a tunable: hint.ahcich.0.pm_level = 1 (Attention: a system can have multiple SATA ports - if in doubt, set the value for all of them). I have documented all the details here: https://www.thomas-krenn.com/en/wiki/Activate_SATA_Link_Power_Management_in_OPNsense
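
Setting the value for every port by hand is repetitive. The following is a minimal sketch that prints one tunable line per AHCI channel in loader.conf(5) syntax. The helper name ahci_pm_hints is hypothetical, and the channel count (4 in the example) is an assumption that has to be adjusted to the actual system.

```shell
#!/bin/sh
# ahci_pm_hints COUNT LEVEL
# Print one pm_level tunable for each of the channels ahcich0 .. ahcich(COUNT-1).
# LEVEL 1 corresponds to DIPM as listed above.
ahci_pm_hints() {
    count=$1
    level=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'hint.ahcich.%d.pm_level="%s"\n' "$i" "$level"
        i=$((i + 1))
    done
}

# Example: a system assumed to have four SATA channels, DIPM on all of them.
ahci_pm_hints 4 1
```

The same names and values can also be entered one by one as tunables in the web interface; the script only saves typing.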

My questions to you in the OPNsense development:

  • (for OPNsense 23.7): Currently OPNsense shows the tunables hint.ahcich.0.pm_level, hint.ahcich.1.pm_level, hint.ahcich.2.pm_level, ... with the "Type" info "unsupported" in red. Could you change this to "boot-time" (in black)? Background: the red "unsupported" label could discourage users from using this function.
  • (later after OPNsense 23.7 release): Option in web interface to enable SATA link power management. Would probably also be solvable via plugin. So far I haven't developed a plugin, but I could imagine it.
  • (for discussion in the medium term): Enable DIPM by default for mobile/embedded SATA chipsets. Because of the hot-plug limitation, this would have to be well considered. The advantage would be that thousands (or tens/hundreds of thousands - I don't know how many OPNsense systems with SATA are in use worldwide) of systems would then potentially need 1 watt less energy each. Globally, this would result in a relevant CO2 saving. OPNsense could call itself "Green IT developed Software" or something like that, or just emphasize that energy optimization (in this case practically without cost to network performance) is an important point.

What do you think?
Best regards, Werner
#5
High availability / Re: Triggered scripts on failover
October 02, 2020, 05:24:14 PM
Hi,

you can check what gets triggered by following the log with "clog -f /var/log/system.log" during a failover - see https://www.thomas-krenn.com/de/wiki/OPNsense_HA_Cluster_einrichten#Ausfalls-Test

Regarding WireGuard failover, I have not done any tests yet, but as far as I can see from the forum, it is not supported yet: https://forum.opnsense.org/index.php?topic=16339.0

Best regards,
Werner
#6
For reference: the issue has been solved. For details see: https://github.com/opnsense/src/issues/67

Root cause was https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242406

Thank you @mimugmail and @franco for all your help.
#7
Hi again,

I have now run another test via the web interface. In parallel, I executed "clog -f /var/log/system.log" and "clog -f /var/log/ppps.log" (both files are attached).

I ran into the same issue as before when using the web interface to configure the LTE connection, but the information in those two files is interesting:

It does indeed seem that establishing the LTE connection works and that the issue is caused by something that is executed afterwards.

Best regards,
Werner
#8
Hi again,

I did some further extensive testing today and noticed that I forgot to mention one step in my previous post (copying /usr/local/opnsense/scripts/interfaces/mpd.script to /var/etc/ before starting mpd5 on the command line).

So, to sum up, the following procedure works without any issues (using the command line):

------------
1) Install OPNsense 20.7 beta
2) Apply all updates as of July 15th
3) Configure opt interface with LTE modem but do not enable it (I think this step would not be necessary)
4) Activate SSH access
5) Create /var/etc/mpd_opt1-wernertest.conf with the contents of the attached file from the former post above
6) Execute:
# cp -a /usr/local/opnsense/scripts/interfaces/mpd.script /var/etc/
# /usr/local/sbin/mpd5 -b -k -d /var/etc -f mpd_opt1-wernertest.conf -p /var/run/ppp_opt1-wernertest.pid -s ppp pppclient
------------

# ifconfig
[...]
ppp0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
   inet 10.170.120.117 --> 10.64.64.0 netmask 0xffffffff
   inet6 fe80::de58:bcff:fee0:16%ppp0 prefixlen 64 scopeid 0x9
   nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
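
Whether the link really came up can also be checked in a script instead of by eye. A minimal sketch, assuming the ifconfig output format shown above; the helper name has_ipv4 is hypothetical, and the sample text is taken from the output in this post so that the snippet runs without a modem:

```shell
#!/bin/sh
# has_ipv4: read `ifconfig <interface>` output on stdin and succeed
# if it contains an "inet a.b.c.d" line (inet6 lines do not count).
has_ipv4() {
    grep -Eq '^[[:space:]]*inet [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'
}

# Sample taken from the output above; on a live system one would run
#   ifconfig ppp0 | has_ipv4
sample='ppp0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
   inet 10.170.120.117 --> 10.64.64.0 netmask 0xffffffff
   inet6 fe80::de58:bcff:fee0:16%ppp0 prefixlen 64 scopeid 0x9'

if printf '%s\n' "$sample" | has_ipv4; then
    echo "ppp0 has an IPv4 address"
else
    echo "ppp0 has no IPv4 address"
fi
```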


------------
Log output /var/log/system.log:

Jul 15 11:36:45 OPNsense kernel: WARNING: attempt to domain_add(netgraph) after domainfinalize()
Jul 15 11:36:45 OPNsense kernel: ng0: changing name to 'ppp0'
Jul 15 11:36:46 OPNsense opnsense-devel[3553]: /usr/local/etc/rc.newwanip: IPv4 renewal is starting on 'ppp0'
Jul 15 11:36:46 OPNsense opnsense-devel[3553]: /usr/local/etc/rc.newwanip: Interface 'opt1' is disabled or empty, nothing to do.


------------
Log output /var/log/ppps.log is attached (as it is longer)

So doing this on the command line does not trigger the issue. Therefore I currently have nothing that I could report on FreeBSD's Bugzilla.

Best regards,
Werner
#9
Hi Franco,

we did some further extensive testing with both OPNsense 20.1 and 20.7.

Establishing a LTE connection from the command line works with both OPNsense 20.1 and 20.7beta.

I have now run the following test three times in a row on the command line of an OPNsense 20.7beta (all updates applied):

1. Boot OPNsense 20.7beta
2. Execute /usr/local/sbin/mpd5 -b -k -d /var/etc -f mpd_opt1-wernertest.conf -p /var/run/ppp_opt1-wernertest.pid -s ppp pppclient (see attachment for content of mpd_opt1-wernertest.conf)
3. Executing ifconfig ppp0 shows:

root@OPNsense:~ # ifconfig ppp0
ppp0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
inet 10.164.12.43 --> 10.64.64.0 netmask 0xffffffff
inet6 fe80::de58:bcff:fee0:16%ppp0 prefixlen 64 scopeid 0x9
nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>


I then did a poweroff / poweron and tried again (3 times in total, and it was successful every time).

When configuring it via the OPNsense web interface, I get the problem described above.

Yesterday I had the issue that, when I tried it on the command line, the ppp0 interface did not get an IP address. The interesting thing was that as soon as a manually started mpd5 process was running (with my own manual pid file), activating the LTE interface via the OPNsense web interface succeeded (without causing the "Fatal trap 12: page fault while in kernel mode").
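
To make this observation repeatable, it helps to confirm that the manually started mpd5 is still alive before touching the web interface. A minimal sketch, assuming mpd5 wrote its process ID to the file given with -p; the helper name pid_alive is hypothetical, and the demonstration uses the shell's own PID so that it runs anywhere:

```shell
#!/bin/sh
# pid_alive PIDFILE: succeed if PIDFILE is readable and names a running process.
# kill -0 sends no signal; it only checks that the process exists and may be signalled.
pid_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# On the firewall the call would be:
#   pid_alive /var/run/ppp_opt1-wernertest.pid && echo "manual mpd5 is running"

# Self-contained demonstration with a temporary pid file:
demo_pidfile=$(mktemp)
echo "$$" > "$demo_pidfile"
if pid_alive "$demo_pidfile"; then
    echo "process from pid file is running"
fi
rm -f "$demo_pidfile"
```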

So my question is:

  • What commands/things does OPNsense trigger/execute when enabling an LTE interface via the OPNsense web interface? (I think it must be more than simply executing "/usr/local/sbin/mpd5 -b -k -d /var/etc -f mpd_opt1-wernertest.conf -p /var/run/ppp_opt1-wernertest.pid -s ppp pppclient" as doing this on the command line in my tests did not trigger the "Fatal trap 12: page fault while in kernel mode" error)

Best regards,
Werner
#10
Hi Franco,

so far we have not managed to reproduce the issue under FreeBSD. It seems we cannot get mpd5 running correctly.

I'm curious what happens in the background on an OPNsense system when I as a user activate an LTE connection.

On an OPNsense 20.1 system I see this process running:

root@fw-home:/var/etc # ps -auxww | grep -i ppp
root    56849   0.0  0.3 1067716  6400  -  Ss   06:51      0:04.04 /usr/local/sbin/mpd5 -b -k -d /var/etc -f mpd_wan.conf -p /var/run/ppp_wan.pid -s ppp pppclient


So the configuration file for mpd is:

/var/etc/mpd_wan.conf
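
The path follows from the -d (directory) and -f (file) arguments in the process list. A minimal sketch that derives it from a command line; the helper name mpd_conf_path is hypothetical, and the sample is the command line from the ps output above:

```shell
#!/bin/sh
# mpd_conf_path: read mpd5 command lines (or whole ps lines) on stdin and
# print "<dir>/<file>" built from the arguments following -d and -f.
mpd_conf_path() {
    awk '{
        dir = ""; file = ""
        for (i = 1; i < NF; i++) {
            if ($i == "-d") dir = $(i + 1)
            if ($i == "-f") file = $(i + 1)
        }
        if (dir != "" && file != "") print dir "/" file
    }'
}

cmdline='/usr/local/sbin/mpd5 -b -k -d /var/etc -f mpd_wan.conf -p /var/run/ppp_wan.pid -s ppp pppclient'
printf '%s\n' "$cmdline" | mpd_conf_path
```

On a live system the input could come from "ps -auxww | grep mpd5" instead of the sample string.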


Its contents are:

startup:
  # configure the console
  set console close
  # configure the web server
  set web close

default:
pppclient:
  create bundle static wan
  set bundle enable ipv6cp
  set iface name ppp0
  set iface route default
  set iface disable on-demand
  set iface idle 0
  set iface enable tcpmssfix
  set iface up-script /usr/local/opnsense/scripts/interfaces/ppp-linkup.sh
  set iface down-script /usr/local/opnsense/scripts/interfaces/ppp-linkdown.sh
  set ipcp ranges 0.0.0.0/0 10.64.64.0/0
  set ipcp enable req-pri-dns
  set ipcp enable req-sec-dns
  create link static wan_link0 modem
  set link action bundle wan
  set link disable multilink
  set link keep-alive 10 60
  set link max-redial 0
  set link disable chap pap
  set link accept chap pap eap
  set link disable incoming
  set link mtu 1492
  set auth authname "user"
  set auth password ���
  set modem device /dev/cuaU0.2
  set modem script DialPeer
  set modem idle-script Ringback
  set modem watch -cd
  set modem var $DialPrefix "DT"
  set modem var $Telephone "*99#"
  set modem var $APN "FixedIPRange1.mass.at"
  set modem var $APNum "1"
open


Could you give us some hints on which steps we could take on the command line of a FreeBSD 12.1 system to create the LTE connection the same way OPNsense does?

Thanks in advance for your help,
best regards,
Werner
#11
Thank you for your hint.

I will test it next week using FreeBSD 12.1 and will report the issue then here: https://bugs.freebsd.org/bugzilla/

I will keep you updated in this thread, too.

Best regards,
Werner
#12
Thank you for the hint.
I tried this: I added the content to the new file /usr/local/etc/rc.loader.d/20-netgraph, executed /usr/local/etc/rc.loader once and then rebooted.
I no longer get the warning, but the "Fatal trap 12..." happens again:


root@OPNsense-beta:~ #

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80e00f86
stack pointer           = 0x28:0xfffffe00004d4500
frame pointer           = 0x28:0xfffffe00004d4540
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 16 (usbus0)
trap number             = 12
panic: page fault
cpuid = 3
time = 1590504781
__HardenedBSD_version = 1200059 __FreeBSD_version = 1201000
[...]


The full output is again in the attachment.
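
For comparing several captured traces, the key fields can be pulled out of the console text. A minimal sketch, assuming the trap output format shown above; the helper name panic_summary is hypothetical, and the sample lines are copied from this post:

```shell
#!/bin/sh
# panic_summary: read a captured console log on stdin and print only the
# fault address, fault code, current process and trap number lines.
panic_summary() {
    sed -n -E 's/^(fault virtual address|fault code|current process|trap number)[[:space:]]*=[[:space:]]*(.*)$/\1: \2/p'
}

trace='Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
current process         = 16 (usbus0)
trap number             = 12'

printf '%s\n' "$trace" | panic_summary
```

On a saved serial console capture the call would be "panic_summary < capture.txt", where capture.txt stands for whatever file the output was saved to.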

Best regards, Werner
#13
Thank you for your fast reply.

I added "netgraph_load="YES"" to /boot/loader.conf.local, rebooted and tried again.
The behaviour is still the same.
I have switched the console to the serial console and I have captured the output (see attachment here).
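
Adding such a line can be done idempotently, so that repeated test runs do not duplicate it. A minimal sketch; the helper name ensure_line is hypothetical, and the demonstration writes to a temporary file rather than to /boot/loader.conf.local:

```shell
#!/bin/sh
# ensure_line FILE LINE: append LINE to FILE unless an identical line is already present.
ensure_line() {
    file=$1
    line=$2
    touch "$file"
    # -x: match whole lines only, -F: treat LINE as a fixed string, not a regex
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# On the firewall the call would be:
#   ensure_line /boot/loader.conf.local 'netgraph_load="YES"'

# Self-contained demonstration: calling it twice still leaves a single line.
demo_conf=$(mktemp)
ensure_line "$demo_conf" 'netgraph_load="YES"'
ensure_line "$demo_conf" 'netgraph_load="YES"'
cat "$demo_conf"
rm -f "$demo_conf"
```

After the reboot, "kldstat -q -m netgraph" can be used on FreeBSD to check whether the module is actually loaded.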

Here is the area where you can see the WARNING and then "Fatal trap 12: page fault while in kernel mode"

root@OPNsense-beta:~ # cat /boot/loader.conf.local
netgraph_load="YES"
root@OPNsense-beta:~ #
root@OPNsense-beta:~ # WARNING: attempt to domain_add(netgraph) after domainfinalize()


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80e00f86
stack pointer           = 0x28:0xfffffe00004c6500
frame pointer           = 0x28:0xfffffe00004c6540
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 16 (usbus0)
trap number             = 12
panic: page fault
cpuid = 3
time = 1590500997
__HardenedBSD_version = 1200059 __FreeBSD_version = 1201000
version = FreeBSD 12.1-RELEASE-p4-HBSD #1  6673d781c3f(master)-dirty: Wed Apr 29 05:17:47 CEST 2020
[...]
#14
Hi Franco,

> I would expect the same outcome. We can look at the crash, but if it works on 11.2 / 20.1 it may be due to new OS code.

Regarding this LTE issue, I have opened a new topic: https://forum.opnsense.org/index.php?topic=17417.0

Please let me know in case I should do any further testing.

Best regards, Werner
#15
Hi OPNsense team,

I did some in-depth testing today with three LTE modems (Quectel EG25-G, Quectel EG25-E, Huawei ME909u-521). All of these modems work fine with OPNsense 20.1 (and they did with 19.7, 19.1 and 18.7). With the current OPNsense 20.7 beta, the firewall reboots after I try to set the WAN connection to the LTE interface.

Steps to reproduce:

Immediately after that, on the console there is the following output (in bold):
WARNING: attempt to domain_add(netgraph) after domainfinalize()

About 2 seconds after that, a lot of output scrolls through the console and after a while the firewall reboots. On the next login the dashboard mentions "A problem was detected. Click here for more information." (I did this and submitted the output).

I have attached the four files here, too.

Any ideas what the root cause of this issue could be? (I think this is a general ppp0/LTE issue, as both the Quectel and Huawei modems show the same issue)

Best regards,
Werner