OPNsense Forum

Archive => 20.7 Legacy Series => Topic started by: Werner Fischer on May 26, 2020, 02:34:29 pm

Title: SOLVED: LTE usage broken - WARNING: attempt to domain_add(netgraph) after ...
Post by: Werner Fischer on May 26, 2020, 02:34:29 pm
Hi OPNsense team,

I did some in-depth testing today with LTE modems from two vendors (Quectel EG25-G, Quectel EG25-E and Huawei ME909u-521). All of these modems work fine with OPNsense 20.1 (as they did with 19.7, 19.1 and 18.7). With the current OPNsense 20.7 beta, however, the firewall reboots after I try to set the WAN connection to the LTE interface.

Steps to reproduce:

Immediately afterwards, the console shows the following output:
WARNING: attempt to domain_add(netgraph) after domainfinalize

About two seconds later, a lot of output scrolls through the console, and after a while the firewall reboots. After the next login, the dashboard shows "A problem was detected. Click here for more information." (I clicked it and submitted the output.)

I have attached the four files here, too.

Any ideas what the root cause of this issue could be? (I think this is a general ppp0/LTE issue, since both the Quectel and the Huawei modems show the same behaviour.)

Best regards,
Werner

Title: Re: LTE usage broken - WARNING: attempt to domain_add(netgraph) after domainfinalize
Post by: franco on May 26, 2020, 03:24:57 pm
Can you add the following to your system, reboot and try again?

echo 'netgraph_load="YES"' > /boot/loader.conf.local

The warning is normal (it should also appear on 20.1), the crash not so much.
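Note that the `>` redirection in the one-liner above truncates /boot/loader.conf.local, so any tunables already in that file are lost. A minimal, non-destructive alternative, assuming a POSIX `sh`; `add_loader_tunable` is a hypothetical helper name, not an OPNsense tool:

```sh
#!/bin/sh
# Set KEY="VALUE" in a loader.conf-style file without truncating it.
# Assumes KEY and VALUE are plain words (no "|", "&" or regex characters),
# which holds for tunables such as netgraph_load.
add_loader_tunable() {
    file=$1
    key=$2
    value=$3
    touch "$file" || return 1
    if grep -q "^${key}=" "$file"; then
        # Key already present: rewrite that line through a temp file, which
        # sidesteps the differing "sed -i" syntax of BSD and GNU sed.
        tmp=$(mktemp "${TMPDIR:-/tmp}/loaderconf.XXXXXX") || return 1
        sed "s|^${key}=.*|${key}=\"${value}\"|" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        # Key not present yet: append it and keep every existing line.
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}
```

Called as `add_loader_tunable /boot/loader.conf.local netgraph_load YES`, it adds the line once and, on later calls, rewrites only that line, leaving other entries untouched.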


Cheers,
Franco
Title: Re: LTE usage broken - WARNING: attempt to domain_add(netgraph) after domainfinalize
Post by: Werner Fischer on May 26, 2020, 04:02:01 pm
Thank you for your fast reply.

I have added netgraph_load="YES" to /boot/loader.conf.local, rebooted and tried again.
The behaviour is still the same.
I switched to the serial console and captured the output (see attachment).

Here is the excerpt where you can see the WARNING followed by "Fatal trap 12: page fault while in kernel mode":

root@OPNsense-beta:~ # cat /boot/loader.conf.local
netgraph_load="YES"
root@OPNsense-beta:~ #
root@OPNsense-beta:~ # WARNING: attempt to domain_add(netgraph) after domainfinalize()


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80e00f86
stack pointer           = 0x28:0xfffffe00004c6500
frame pointer           = 0x28:0xfffffe00004c6540
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 16 (usbus0)
trap number             = 12
panic: page fault
cpuid = 3
time = 1590500997
__HardenedBSD_version = 1200059 __FreeBSD_version = 1201000
version = FreeBSD 12.1-RELEASE-p4-HBSD #1  6673d781c3f(master)-dirty: Wed Apr 29 05:17:47 CEST 2020
[...]
Title: Re: LTE usage broken - WARNING: attempt to domain_add(netgraph) after domainfinalize
Post by: franco on May 26, 2020, 04:38:07 pm
Let's go one step further then:

https://raw.githubusercontent.com/opnsense/core/130436ca745bcc2f2b4ce93c0264a2aae1cd5dbc/src/etc/rc.loader.d/20-netgraph

Put its contents into the file /usr/local/etc/rc.loader.d/20-netgraph, run /usr/local/etc/rc.loader once, then reboot and try again.


Cheers,
Franco
Title: Re: LTE usage broken - WARNING: attempt to domain_add(netgraph) after domainfinalize
Post by: Werner Fischer on May 26, 2020, 04:58:05 pm
Thank you for the hint.
I tried this: I added the content to the new file /usr/local/etc/rc.loader.d/20-netgraph, executed /usr/local/etc/rc.loader once and then rebooted.
The warning no longer appears, but the "Fatal trap 12" happens again:


root@OPNsense-beta:~ #

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80e00f86
stack pointer           = 0x28:0xfffffe00004d4500
frame pointer           = 0x28:0xfffffe00004d4540
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 16 (usbus0)
trap number             = 12
panic: page fault
cpuid = 3
time = 1590504781
__HardenedBSD_version = 1200059 __FreeBSD_version = 1201000
[...]


The full output is again in the attachment.

Best regards, Werner
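Both captures report the same fault virtual address (0x28), instruction pointer (0x20:0xffffffff80e00f86) and current process (usbus0); only the stack/frame pointers and timestamps differ. That points to the same faulting code path both times, and a fault address as small as 0x28 usually means a structure member was read through a NULL pointer. A small sketch for pulling these identifying fields out of saved console captures so they can be compared; `panic_signature` is a hypothetical helper, and the field names it matches are taken from the output above:

```sh
#!/bin/sh
# Print the fields that identify a kernel page-fault panic, one per line,
# from console output on stdin.
panic_signature() {
    # Split "name   = value" lines on the "=" and its surrounding spaces.
    awk -F' *= *' '
        /^fault virtual address/ { print "fault_va=" $2 }
        /^instruction pointer/   { print "ip=" $2 }
        /^current process/       { print "process=" $2 }
    '
}
```

For example, `panic_signature < crash1.txt > a.sig; panic_signature < crash2.txt > b.sig; diff a.sig b.sig` (file names hypothetical) prints nothing when the two panics match on these fields.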
Title: Re: LTE usage broken - WARNING: attempt to domain_add(netgraph) after domainfinalize
Post by: franco on May 26, 2020, 05:13:07 pm
This is enough evidence that a regression in FreeBSD 12.1 is more likely than anything we changed. At this point, raising an issue with FreeBSD is probably the best way forward: https://bugs.freebsd.org/bugzilla/


Cheers,
Franco
Title: Re: LTE usage broken - WARNING: attempt to domain_add(netgraph) after domainfinalize
Post by: Werner Fischer on May 29, 2020, 02:21:55 pm
Thank you for your hint.

I will test it on FreeBSD 12.1 next week and then report the issue here: https://bugs.freebsd.org/bugzilla/

I will keep you updated in this thread, too.

Best regards,
Werner
Title: Re: LTE usage broken - WARNING: attempt to domain_add(netgraph) after domainfinalize
Post by: Werner Fischer on June 17, 2020, 03:03:18 pm
Hi Franco,

so far we have not managed to reproduce the issue on FreeBSD. It seems we cannot get mpd5 running correctly.

I'm curious what happens in the background on an OPNsense system when I, as a user, activate an LTE connection.

On an OPNsense 20.1 system I see this process running:
root@fw-home:/var/etc # ps -auxww | grep -i ppp
root    56849   0.0  0.3 1067716  6400  -  Ss   06:51      0:04.04 /usr/local/sbin/mpd5 -b -k -d /var/etc -f mpd_wan.conf -p /var/run/ppp_wan.pid -s ppp pppclient

So the configuration file for mpd is /var/etc/mpd_wan.conf.

Its contents are:
startup:
  # configure the console
  set console close
  # configure the web server
  set web close

default:
pppclient:
  create bundle static wan
  set bundle enable ipv6cp
  set iface name ppp0
  set iface route default
  set iface disable on-demand
  set iface idle 0
  set iface enable tcpmssfix
  set iface up-script /usr/local/opnsense/scripts/interfaces/ppp-linkup.sh
  set iface down-script /usr/local/opnsense/scripts/interfaces/ppp-linkdown.sh
  set ipcp ranges 0.0.0.0/0 10.64.64.0/0
  set ipcp enable req-pri-dns
  set ipcp enable req-sec-dns
  create link static wan_link0 modem
  set link action bundle wan
  set link disable multilink
  set link keep-alive 10 60
  set link max-redial 0
  set link disable chap pap
  set link accept chap pap eap
  set link disable incoming
  set link mtu 1492
  set auth authname "user"
  set auth password "********"
  set modem device /dev/cuaU0.2
  set modem script DialPeer
  set modem idle-script Ringback
  set modem watch -cd
  set modem var $DialPrefix "DT"
  set modem var $Telephone "*99#"
  set modem var $APN "FixedIPRange1.mass.at"
  set modem var $APNum "1"
open
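When rebuilding this connection by hand on plain FreeBSD, the site-specific parts of the generated file are mainly the modem device and the APN (plus the credentials). A sketch that extracts the first two from an OPNsense-generated mpd5 config; `mpd_modem_summary` is a hypothetical helper that only understands the `set modem device` and `set modem var $APN` lines shown above:

```sh
#!/bin/sh
# Print the modem device and APN found in an mpd5 config file ($1).
mpd_modem_summary() {
    awk '
        # "set modem device /dev/cuaU0.2"
        $1 == "set" && $2 == "modem" && $3 == "device" { print "device=" $4 }
        # "set modem var $APN "..."": strip the double quotes around the value
        $1 == "set" && $2 == "modem" && $3 == "var" && $4 == "$APN" {
            apn = $5
            gsub(/"/, "", apn)
            print "apn=" apn
        }
    ' "$1"
}
```

Run against the /var/etc/mpd_wan.conf above, it should print `device=/dev/cuaU0.2` followed by `apn=FixedIPRange1.mass.at`.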

Could you give us some hints on which command-line steps we would need on a FreeBSD 12.1 system to create the LTE connection the same way OPNsense does?

Thanks in advance for your help,
best regards,
Werner
Title: Re: LTE usage broken - WARNING: attempt to domain_add(netgraph) after domainfinalize
Post by: Werner Fischer on July 08, 2020, 02:02:27 pm
Hi Franco,

we did some more extensive testing with both OPNsense 20.1 and 20.7.

Establishing an LTE connection from the command line works with both OPNsense 20.1 and 20.7beta.

I ran the following test three times in a row on the command line of OPNsense 20.7beta (all updates applied):

1. Boot OPNsense 20.7beta
2. Execute /usr/local/sbin/mpd5 -b -k -d /var/etc -f mpd_opt1-wernertest.conf -p /var/run/ppp_opt1-wernertest.pid -s ppp pppclient (see attachment for content of mpd_opt1-wernertest.conf)
3. Executing ifconfig ppp0 shows:
root@OPNsense:~ # ifconfig ppp0
ppp0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
inet 10.164.12.43 --> 10.64.64.0 netmask 0xffffffff
inet6 fe80::de58:bcff:fee0:16%ppp0 prefixlen 64 scopeid 0x9
nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>

I then power-cycled the system and tried again (three times in total; it succeeded every time).

When configuring it via the OPNsense web interface, I get the problem described above.

Yesterday, when I tried it on the command line, the ppp0 interface did not get an IP address. Interestingly, as long as a manually started mpd5 process was running (with my own pid file), activating the LTE interface via the OPNsense web interface succeeded without causing the "Fatal trap 12: page fault while in kernel mode".

So my question is:

Best regards,
Werner
Title: Re: LTE usage broken - WARNING: attempt to domain_add(netgraph) after domainfinalize
Post by: Werner Fischer on July 15, 2020, 03:46:25 pm
Hi again,

I did some more extensive testing today and noticed that I forgot to mention one step in my previous post: copying /usr/local/opnsense/scripts/interfaces/mpd.script to /var/etc/ before starting mpd5 on the command line.

To sum up, the following command-line procedure works without any issues:

------------
1) Install OPNsense 20.7 beta
2) Apply all updates as of July, 15th
3) Configure an opt interface with the LTE modem but do not enable it (I think this step is not strictly necessary)
4) Activate SSH access
5) Create /var/etc/mpd_opt1-wernertest.conf with the contents of the file attached to my previous post
6) Execute:
# cp -a /usr/local/opnsense/scripts/interfaces/mpd.script /var/etc/
# /usr/local/sbin/mpd5 -b -k -d /var/etc -f mpd_opt1-wernertest.conf -p /var/run/ppp_opt1-wernertest.pid -s ppp pppclient
------------
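Step 6 above, wrapped into a small reusable sketch. It assumes the OPNsense paths and mpd5 flags shown in this thread (the same flags appear in the `ps` output from the 20.1 system); `mpd5_cmd`, `start_mpd5` and the `MPD_DIR` override are hypothetical names introduced for illustration:

```sh
#!/bin/sh
# Sketch: start mpd5 the way OPNsense does for $MPD_DIR/mpd_<name>.conf.
MPD_DIR=${MPD_DIR:-/var/etc}

# Print the mpd5 command line used for a given config name.
mpd5_cmd() {
    printf '/usr/local/sbin/mpd5 -b -k -d %s -f mpd_%s.conf -p /var/run/ppp_%s.pid -s ppp pppclient\n' \
        "$MPD_DIR" "$1" "$1"
}

# Copy the modem script next to the config, show the command, then run it.
start_mpd5() {
    name=$1
    # mpd5 reads modem scripts such as DialPeer and Ringback from mpd.script
    # in its config directory (-d); this copy was the step missing at first.
    cp -a /usr/local/opnsense/scripts/interfaces/mpd.script "$MPD_DIR/" || return 1
    mpd5_cmd "$name"
    /usr/local/sbin/mpd5 -b -k -d "$MPD_DIR" -f "mpd_$name.conf" \
        -p "/var/run/ppp_$name.pid" -s ppp pppclient
}
```

As root, `start_mpd5 opt1-wernertest` would run the same command as step 6.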

# ifconfig
[...]
ppp0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
   inet 10.170.120.117 --> 10.64.64.0 netmask 0xffffffff
   inet6 fe80::de58:bcff:fee0:16%ppp0 prefixlen 64 scopeid 0x9
   nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
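To check whether mpd5 actually brought the link up (on one earlier attempt ppp0 got no address at all), the local and peer IPv4 addresses can be read from the `ifconfig` output above. A minimal sketch; `ppp_ipv4` is a hypothetical helper and assumes the FreeBSD point-to-point format `inet LOCAL --> PEER`:

```sh
#!/bin/sh
# Print "LOCAL PEER" for the first point-to-point IPv4 address found in
# ifconfig output on stdin; exit non-zero if there is none.
ppp_ipv4() {
    awk '
        $1 == "inet" && $3 == "-->" { print $2, $4; found = 1; exit }
        END { exit !found }
    '
}
```

For example, `ifconfig ppp0 | ppp_ipv4 || echo "ppp0 has no IPv4 address yet"` prints `10.170.120.117 10.64.64.0` for the output above.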


------------
Log output /var/log/system.log:

Jul 15 11:36:45 OPNsense kernel: WARNING: attempt to domain_add(netgraph) after domainfinalize()
Jul 15 11:36:45 OPNsense kernel: ng0: changing name to 'ppp0'
Jul 15 11:36:46 OPNsense opnsense-devel[3553]: /usr/local/etc/rc.newwanip: IPv4 renewal is starting on 'ppp0'
Jul 15 11:36:46 OPNsense opnsense-devel[3553]: /usr/local/etc/rc.newwanip: Interface 'opt1' is disabled or empty, nothing to do.


------------
Log output /var/log/ppps.log is attached (as it is longer)

So doing this on the command line does not trigger the issue, which means I currently have nothing I could report in FreeBSD's Bugzilla.

Best regards,
Werner
Title: Re: LTE usage broken - WARNING: attempt to domain_add(netgraph) after domainfinalize
Post by: Werner Fischer on July 16, 2020, 03:50:22 pm
Hi again,

I have now run another test via the web interface. In parallel, I ran "clog -f /var/log/system.log" and "clog -f /var/log/ppps.log" (both captures are attached).

I ran into the same issue as before when using the web interface to configure the LTE connection, but the information in those two files is interesting:

It does indeed seem that establishing the LTE connection works and that the issue is caused by something executed afterwards.

Best regards,
Werner
Title: SOLVED: LTE usage broken - WARNING: attempt to domain_add(netgraph)
Post by: Werner Fischer on July 23, 2020, 11:54:03 am
For reference: the issue has been solved; for details, see https://github.com/opnsense/src/issues/67

The root cause was https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242406

Thank you @mimugmail and @franco for all your help.