Hi OPNsense team,
I did some in-depth testing today with three LTE modems (Quectel EG25-G, Quectel EG25-E, Huawei ME909u-521). All of these modems work fine with OPNsense 20.1 (as they did with 19.7, 19.1 and 18.7). With the current OPNsense 20.7 beta, the firewall reboots after I try to set the WAN connection to the LTE interface.
Steps to reproduce:
- Install the current OPNsense 20.7 beta using https://pkg.opnsense.org/FreeBSD:12:amd64/snapshots/OPNsense-devel-20.7.b-OpenSSL-vga-amd64.img.bz2
- Use igb0 as LAN, igb1 as WAN and have a working wired Internet connection on igb1
- Apply all updates as of today (May 26th). Versions are then "OPNsense 20.7.b_108-amd64", "FreeBSD 12.1-RELEASE-p4-HBSD"
- Configure the modem via "Interfaces -> Point-to-Point -> Devices" as described in https://www.thomas-krenn.com/de/wiki/OPNsense_LTE_Verbindung#Konfiguration_Modem. For a Quectel modem use /dev/cuaU0.2; for a Huawei ME909u-521 use /dev/cuaU0.0
- Switch to "Interfaces -> Assignments", assign the network port "ppp0" to "WAN" and click "Save"
Immediately after that, the console shows the following output:
WARNING: attempt to domain_add(netgraph) after domainfinalize()
About 2 seconds later, a lot of output scrolls through the console, and after a while the firewall reboots. On the next login the dashboard shows "A problem was detected. Click here for more information." (I clicked it and submitted the output.)
I have attached the four files here, too.
Any ideas what the root cause of this issue could be? (I think this is a general ppp0/LTE issue, as both the Quectel and the Huawei modems show the same behaviour.)
Best regards,
Werner
Can you add the following to your system, reboot and try again?
echo 'netgraph_load="YES"' > /boot/loader.conf.local
The warning is normal (it should also appear on 20.1), the crash not so much.
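The echo line above uses ">", which replaces /boot/loader.conf.local entirely; on a box where that file already holds other settings, appending is safer. A minimal POSIX sh sketch of an idempotent append, assuming one KEY="VALUE" tunable per line; the function name add_loader_tunable is hypothetical and not part of OPNsense:

```shell
# Hypothetical helper (not part of OPNsense): append KEY="VALUE" to a loader
# file only if KEY is not already set there, instead of overwriting the file.
add_loader_tunable() {
    file=$1 key=$2 value=$3
    # grep -q succeeds if a line starting with KEY= already exists
    if [ -f "$file" ] && grep -q "^${key}=" "$file"; then
        return 0
    fi
    printf '%s="%s"\n' "$key" "$value" >> "$file"
}
```

For example, "add_loader_tunable /boot/loader.conf.local netgraph_load YES" followed by a reboot; running it twice leaves a single line in the file.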
Cheers,
Franco
Thank you for your fast reply.
I have added netgraph_load="YES" to /boot/loader.conf.local, rebooted and tried again.
The behaviour is still the same.
I switched to the serial console and captured the output (see attachment).
Here is the relevant excerpt, where you can see the WARNING followed by "Fatal trap 12: page fault while in kernel mode":
root@OPNsense-beta:~ # cat /boot/loader.conf.local
netgraph_load="YES"
root@OPNsense-beta:~ #
root@OPNsense-beta:~ # WARNING: attempt to domain_add(netgraph) after domainfinalize()
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x28
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80e00f86
stack pointer = 0x28:0xfffffe00004c6500
frame pointer = 0x28:0xfffffe00004c6540
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 16 (usbus0)
trap number = 12
panic: page fault
cpuid = 3
time = 1590500997
__HardenedBSD_version = 1200059 __FreeBSD_version = 1201000
version = FreeBSD 12.1-RELEASE-p4-HBSD #1 6673d781c3f(master)-dirty: Wed Apr 29 05:17:47 CEST 2020
[...]
Let's go one step further then:
https://raw.githubusercontent.com/opnsense/core/130436ca745bcc2f2b4ce93c0264a2aae1cd5dbc/src/etc/rc.loader.d/20-netgraph
Put its contents into the file /usr/local/etc/rc.loader.d/20-netgraph, run /usr/local/etc/rc.loader once, then reboot and try again.
Cheers,
Franco
Thank you for the hint.
I tried this: I added the content to the new file /usr/local/etc/rc.loader.d/20-netgraph, executed /usr/local/etc/rc.loader once and rebooted.
The warning no longer appears, but the "Fatal trap 12..." still happens:
root@OPNsense-beta:~ #
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x28
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80e00f86
stack pointer = 0x28:0xfffffe00004d4500
frame pointer = 0x28:0xfffffe00004d4540
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 16 (usbus0)
trap number = 12
panic: page fault
cpuid = 3
time = 1590504781
__HardenedBSD_version = 1200059 __FreeBSD_version = 1201000
[...]
The full output is again in the attachment.
Best regards, Werner
This is enough evidence that a regression in FreeBSD 12.1 is more likely than anything we did. At this point, raising an issue with FreeBSD is probably the best way forward: https://bugs.freebsd.org/bugzilla/
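The comparison behind this conclusion can be made repeatable: both captures fault at the same instruction pointer (0x20:0xffffffff80e00f86) in the same process (usbus0), and only the stack and frame pointers differ between runs. A minimal POSIX sh sketch, assuming each serial-console capture was saved to a plain text file; the function names panic_summary and same_panic are hypothetical:

```shell
# Hypothetical helper: print the fields that identify where a kernel page
# fault happened, from a saved serial-console capture. Stack and frame
# pointers are left out on purpose because they change from run to run.
panic_summary() {
    grep -E '^(fault virtual address|instruction pointer|current process|trap number)' "$1" \
        | sed 's/[[:space:]]*=[[:space:]]*/=/'
}

# Exit status 0 if two captures show the same identifying fields.
same_panic() {
    [ "$(panic_summary "$1")" = "$(panic_summary "$2")" ]
}
```

For example, "same_panic capture1.txt capture2.txt && echo same crash site" (file names are placeholders).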
Cheers,
Franco
Thank you for your hint.
I will test it next week on FreeBSD 12.1 and then report the issue here: https://bugs.freebsd.org/bugzilla/
I will keep you updated in this thread, too.
Best regards,
Werner
Hi Franco,
so far we have not managed to reproduce the issue on FreeBSD. It seems we cannot get mpd5 running correctly.
I'm curious what happens in the background on an OPNsense system when a user activates an LTE connection.
On an OPNsense 20.1 system I see this process running:
root@fw-home:/var/etc # ps -auxww | grep -i ppp
root 56849 0.0 0.3 1067716 6400 - Ss 06:51 0:04.04 /usr/local/sbin/mpd5 -b -k -d /var/etc -f mpd_wan.conf -p /var/run/ppp_wan.pid -s ppp pppclient
So the configuration file for mpd is:
/var/etc/mpd_wan.conf
Its contents are:
startup:
	# configure the console
	set console close
	# configure the web server
	set web close
default:
pppclient:
	create bundle static wan
	set bundle enable ipv6cp
	set iface name ppp0
	set iface route default
	set iface disable on-demand
	set iface idle 0
	set iface enable tcpmssfix
	set iface up-script /usr/local/opnsense/scripts/interfaces/ppp-linkup.sh
	set iface down-script /usr/local/opnsense/scripts/interfaces/ppp-linkdown.sh
	set ipcp ranges 0.0.0.0/0 10.64.64.0/0
	set ipcp enable req-pri-dns
	set ipcp enable req-sec-dns
	create link static wan_link0 modem
	set link action bundle wan
	set link disable multilink
	set link keep-alive 10 60
	set link max-redial 0
	set link disable chap pap
	set link accept chap pap eap
	set link disable incoming
	set link mtu 1492
	set auth authname "user"
	set auth password ********
	set modem device /dev/cuaU0.2
	set modem script DialPeer
	set modem idle-script Ringback
	set modem watch -cd
	set modem var $DialPrefix "DT"
	set modem var $Telephone "*99#"
	set modem var $APN "FixedIPRange1.mass.at"
	set modem var $APNum "1"
	open
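To reproduce this setup on a plain FreeBSD system, the same file can be regenerated with only the site-specific values changed. A minimal POSIX sh sketch, assuming mpd5 expects labels in column 0 with commands indented underneath; write_mpd_lte_conf is a hypothetical name, and it emits only the directives shown in the file above. The DialPeer chat script it refers to is presumably defined in OPNsense's /usr/local/opnsense/scripts/interfaces/mpd.script, which would have to sit in the same directory as the config.

```shell
# Hypothetical helper: write an mpd5 client config using the same directives
# OPNsense 20.1 put into /var/etc/mpd_wan.conf. Labels start in column 0;
# commands are indented underneath them.
# Usage: write_mpd_lte_conf OUTFILE DEVICE APN USER PASSWORD
write_mpd_lte_conf() {
    out=$1 dev=$2 apn=$3 user=$4 pass=$5
    # Unquoted EOF: $dev, $apn, $user and $pass are expanded by the shell,
    # while \$DialPrefix etc. stay literal for mpd's own variable syntax.
    cat > "$out" <<EOF
startup:
    # configure the console
    set console close
    # configure the web server
    set web close
default:
pppclient:
    create bundle static wan
    set bundle enable ipv6cp
    set iface name ppp0
    set iface route default
    set iface disable on-demand
    set iface idle 0
    set iface enable tcpmssfix
    set iface up-script /usr/local/opnsense/scripts/interfaces/ppp-linkup.sh
    set iface down-script /usr/local/opnsense/scripts/interfaces/ppp-linkdown.sh
    set ipcp ranges 0.0.0.0/0 10.64.64.0/0
    set ipcp enable req-pri-dns
    set ipcp enable req-sec-dns
    create link static wan_link0 modem
    set link action bundle wan
    set link disable multilink
    set link keep-alive 10 60
    set link max-redial 0
    set link disable chap pap
    set link accept chap pap eap
    set link disable incoming
    set link mtu 1492
    set auth authname "$user"
    set auth password "$pass"
    set modem device $dev
    set modem script DialPeer
    set modem idle-script Ringback
    set modem watch -cd
    set modem var \$DialPrefix "DT"
    set modem var \$Telephone "*99#"
    set modem var \$APN "$apn"
    set modem var \$APNum "1"
    open
EOF
}
```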
Could you give us some hints on which steps we could run on the command line of a FreeBSD 12.1 system to create the LTE connection the same way OPNsense does?
Thanks in advance for your help,
best regards,
Werner
Hi Franco,
we did some further extensive testing with both OPNsense 20.1 and 20.7.
Establishing an LTE connection from the command line works with both OPNsense 20.1 and 20.7beta.
I ran the following test three times in a row on the command line of an OPNsense 20.7beta (all updates applied):
1. Boot OPNsense 20.7beta
2. Execute
/usr/local/sbin/mpd5 -b -k -d /var/etc -f mpd_opt1-wernertest.conf -p /var/run/ppp_opt1-wernertest.pid -s ppp pppclient
(see attachment for the content of mpd_opt1-wernertest.conf)
3. Running "ifconfig ppp0" shows:
root@OPNsense:~ # ifconfig ppp0
ppp0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
inet 10.164.12.43 --> 10.64.64.0 netmask 0xffffffff
inet6 fe80::de58:bcff:fee0:16%ppp0 prefixlen 64 scopeid 0x9
nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
I then power-cycled the box and tried again (3 times in total, successful every time).
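When repeating the test across power cycles, the "did ppp0 get an address" check can be scripted. A minimal POSIX sh sketch, assuming the ifconfig output format shown above; ppp_ipv4 is a hypothetical helper name:

```shell
# Hypothetical helper: read `ifconfig IFACE` output on stdin and print the
# first IPv4 address (the local end of the PPP link); prints nothing if the
# interface has no IPv4 address yet. inet6 lines are ignored.
ppp_ipv4() {
    awk '$1 == "inet" { print $2; exit }'
}
```

For example, addr=$(ifconfig ppp0 | ppp_ipv4) followed by [ -n "$addr" ] to decide whether a run succeeded.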
When I configure it via the OPNsense web interface, I get the problem described above.
Yesterday I had the problem that, when I tried it on the command line, the ppp0 interface did not get an IP address. The interesting thing was that as soon as a manual mpd5 process was running (with my own manual pid file), activating the LTE interface via the OPNsense web interface worked (without causing the "Fatal trap 12: page fault while in kernel mode").
So my question is:
- Which commands does OPNsense trigger/execute when enabling an LTE interface via the OPNsense web interface? (I think it must be more than simply executing "/usr/local/sbin/mpd5 -b -k -d /var/etc -f mpd_opt1-wernertest.conf -p /var/run/ppp_opt1-wernertest.pid -s ppp pppclient", as doing this on the command line in my tests did not trigger the "Fatal trap 12: page fault while in kernel mode" error.)
Best regards,
Werner
Hi again,
I did some further extensive testing today and noticed that I forgot to mention one step in my previous post: copying /usr/local/opnsense/scripts/interfaces/mpd.script to /var/etc/ before starting mpd5 on the command line.
To sum up, the following procedure works without any issues (using the command line):
------------
1) Install OPNsense 20.7 beta
2) Apply all updates as of July 15th
3) Configure an opt interface with the LTE modem but do not enable it (I think this step is not necessary)
4) Activate SSH access
5) Create /var/etc/mpd_opt1-wernertest.conf with the contents of the file attached to my previous post above
6) Execute:
# cp -a /usr/local/opnsense/scripts/interfaces/mpd.script /var/etc/
# /usr/local/sbin/mpd5 -b -k -d /var/etc -f mpd_opt1-wernertest.conf -p /var/run/ppp_opt1-wernertest.pid -s ppp pppclient
------------
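Because step 6 only worked once mpd.script was copied next to the config (the step missing from the earlier post), a small pre-flight check can guard it. A minimal POSIX sh sketch, assuming mpd5 reads both the config file and mpd.script from the directory passed with -d; mpd_preflight is a hypothetical name:

```shell
# Hypothetical pre-flight check before starting mpd5 by hand: verify that
# the config file and mpd.script are both readable in the -d directory.
# Usage:
#   mpd_preflight /var/etc mpd_opt1-wernertest.conf && \
#     /usr/local/sbin/mpd5 -b -k -d /var/etc -f mpd_opt1-wernertest.conf \
#       -p /var/run/ppp_opt1-wernertest.pid -s ppp pppclient
mpd_preflight() {
    dir=$1 conf=$2
    for f in "$dir/$conf" "$dir/mpd.script"; do
        if [ ! -r "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done
    return 0
}
```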
# ifconfig
[...]
ppp0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
inet 10.170.120.117 --> 10.64.64.0 netmask 0xffffffff
inet6 fe80::de58:bcff:fee0:16%ppp0 prefixlen 64 scopeid 0x9
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
------------
Log output /var/log/system.log:
Jul 15 11:36:45 OPNsense kernel: WARNING: attempt to domain_add(netgraph) after domainfinalize()
Jul 15 11:36:45 OPNsense kernel: ng0: changing name to 'ppp0'
Jul 15 11:36:46 OPNsense opnsense-devel[3553]: /usr/local/etc/rc.newwanip: IPv4 renewal is starting on 'ppp0'
Jul 15 11:36:46 OPNsense opnsense-devel[3553]: /usr/local/etc/rc.newwanip: Interface 'opt1' is disabled or empty, nothing to do.
------------
Log output /var/log/ppps.log is attached (as it is longer)
So doing this on the command line does not trigger the issue. Therefore I have nothing I could report on FreeBSD's Bugzilla right now.
Best regards,
Werner
Hi again,
I have now done another test via the web interface. In parallel, I ran "clog -f /var/log/system.log" and "clog -f /var/log/ppps.log" (both files are attached).
I ran into the same issue as before when using the web interface to configure the LTE connection, but the information in those two files is interesting:
It does seem that establishing the LTE connection works and that the issue is caused by something executed afterwards.
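Following both logs while reproducing the problem is easier with the unrelated messages filtered out. A minimal sketch, assuming the message formats seen in the system.log excerpt earlier in this thread; lte_events is a hypothetical name and its pattern list is only a starting point:

```shell
# Hypothetical filter: keep only lines related to the netgraph/ppp0
# bring-up, e.g.
#   clog -f /var/log/system.log | lte_events
lte_events() {
    grep -E 'domain_add\(netgraph\)|ng[0-9]+: changing name|ppp0|rc\.newwanip'
}
```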
Best regards,
Werner
For reference: the issue has been solved; for details see: https://github.com/opnsense/src/issues/67
The root cause was: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242406
Thank you @mimugmail and @franco for all your help.