OPNsense Forum

Archive => 20.1 Legacy Series => Topic started by: Waschbuesch on February 01, 2020, 09:50:17 pm

Title: flowd not working after upgrade.
Post by: Waschbuesch on February 01, 2020, 09:50:17 pm
Hi all,

I upgraded a firewall from 19.7 to 20.1 yesterday.
The upgrade itself went well, but since then flowd has not been working.

The relevant section of config.xml:

Code: [Select]
    <Netflow version="1.0.1">
      <capture>
        <interfaces>lan,opt7,opt10,opt1,opt2</interfaces>
        <egress_only>opt1,opt2</egress_only>
        <version>v9</version>
        <targets>127.0.0.1:2056</targets>
      </capture>
      <collect>
        <enable>1</enable>
      </collect>
      <activeTimeout>1800</activeTimeout>
      <inactiveTimeout>15</inactiveTimeout>
    </Netflow>
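For reference, the capture settings in this section are stored as comma-separated OPNsense interface keys (lan, opt1, ...), not as device names. Here is a minimal Python sketch that reads them back out of the fragment above using only the standard library's ElementTree. The helper name capture_settings is made up for illustration. It is not part of OPNsense.

```python
import xml.etree.ElementTree as ET

# The <Netflow> fragment quoted above (timeouts omitted).
CONFIG_XML = """
<Netflow version="1.0.1">
  <capture>
    <interfaces>lan,opt7,opt10,opt1,opt2</interfaces>
    <egress_only>opt1,opt2</egress_only>
    <version>v9</version>
    <targets>127.0.0.1:2056</targets>
  </capture>
  <collect>
    <enable>1</enable>
  </collect>
</Netflow>
"""


def capture_settings(xml_text):
    """Return the <capture> settings as Python lists/strings (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    cap = root.find("capture")

    def csv(tag):
        # Comma-separated values; an empty or missing element gives [].
        text = cap.findtext(tag) or ""
        return [item for item in text.split(",") if item]

    return {
        "interfaces": csv("interfaces"),
        "egress_only": csv("egress_only"),
        "version": cap.findtext("version"),
        "targets": csv("targets"),
        "collect": root.findtext("collect/enable") == "1",
    }


settings = capture_settings(CONFIG_XML)
print(settings)
```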

/var/log/flowd.log is empty, and the flowd processes show zero CPU time despite having run for hours with plenty of traffic:

Code: [Select]
gw01:~ # ps ax | grep flow
 6611  -  Is      0:00.00 flowd: net (flowd)
57722  -  Is      0:00.00 flowd: monitor (flowd)

Neither rebooting nor deleting flowd.log and the /var/netflow/* files has made a difference.

I have a very similar setup on another box where this still works, even after the upgrade to 20.1.

Any ideas what else to try?
Title: Re: flowd not working after upgrade.
Post by: Waschbuesch on February 01, 2020, 10:27:11 pm
OK, there seems to be a problem when /usr/local/etc/netflow.conf is generated:

Code: [Select]
#
# Automatic generated configuration for netflow.
# Do not edit this file manually.
#
netflow_interfaces="bridge1 bridge0 bridge2 ovpnc1 pppoe0 "
netflow_egress_only="ovpnc1pppoe0 "
netflow_version="9"
netflow_int_destination="127.0.0.1:2055"
netflow_destinations="127.0.0.1:2056"
netflow_active_timeout=1800
netflow_inactive_timeout=15

The egress_only line *should* read:
Code: [Select]
netflow_egress_only="ovpnc1 pppoe0 "

The template (/usr/local/opnsense/service/templates/OPNsense/Netflow/netflow.conf) has that space in the wrong place:

Lines 23-24 should read:
Code: [Select]
}} {%
  endfor%}{%endif%}"

(instead of)
Code: [Select]
}}{%
  endfor%} {%endif%}"

I guess?
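The effect of the misplaced space can be reproduced without Jinja. What follows is a plain-Python emulation of the two template variants, not the actual template code. It assumes the loop sits inside an {% if %} block that is only entered when egress-only interfaces are configured. The function name is hypothetical.

```python
def render_egress_line(devices, space_inside_loop):
    """Emulate the two variants of the netflow_egress_only template line."""
    out = ""
    if devices:  # assumed: the loop is wrapped in an {% if %} block
        for dev in devices:
            out += dev
            if space_inside_loop:
                # fixed variant: "}} {% endfor %}{% endif %}" -> space after every item
                out += " "
        if not space_inside_loop:
            # buggy variant: "}}{% endfor %} {% endif %}" -> one space after the loop
            out += " "
    return f'netflow_egress_only="{out}"'


devices = ["ovpnc1", "pppoe0"]
print(render_egress_line(devices, space_inside_loop=False))
print(render_egress_line(devices, space_inside_loop=True))
```

The first call yields the broken line from the generated file and the second yields the expected one.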
Title: Re: flowd not working after upgrade.
Post by: Waschbuesch on February 01, 2020, 11:07:07 pm
The syntax in /usr/local/etc/netflow.conf is now correct, but /var/log/flowd.log still has no content.

 :-\
Title: Re: flowd not working after upgrade.
Post by: AdSchellevis on February 02, 2020, 11:30:18 am
It looks like we did accidentally remove a separator; https://github.com/opnsense/core/commit/5c1756e6e91f17a9295e7300ced27f547239f9cf#diff-7d2fae400f9afec04a38ecab7e0f9150 should bring it back.

If you execute the following from a console, you should be able to apply settings again from the user interface.
Code: [Select]
opnsense-patch 5c1756e6

Title: Re: flowd not working after upgrade.
Post by: Waschbuesch on February 02, 2020, 12:38:57 pm
Thanks!

Since flowd still does not write anything to its logfile, even with the corrected netflow.conf, do you have any advice on what else I could try to get NetFlow reporting working again?
Title: Re: flowd not working after upgrade.
Post by: AdSchellevis on February 02, 2020, 12:41:17 pm
Once it's restarted with the new config, it usually runs fine again. Keep in mind that you should use the GUI to restart it, since a couple of services are involved here (netgraph, flowd).
Title: Re: flowd not working after upgrade.
Post by: Waschbuesch on February 02, 2020, 12:44:33 pm
I did use the GUI and also tried a reboot. flowd.log is still not written to.
Title: Re: flowd not working after upgrade.
Post by: Waschbuesch on February 03, 2020, 07:28:58 pm
Is there any other piece of information I could supply that might help diagnose this?

Even after the patch mentioned by AdSchellevis, followed by a reboot, flowd.log does not get written to. At all.
Title: Re: flowd not working after upgrade.
Post by: AdSchellevis on February 03, 2020, 07:47:29 pm
Do the Netflow cache counters increase (Reporting -> Netflow)? flowd is a consumer of those messages.
Title: Re: flowd not working after upgrade.
Post by: Waschbuesch on February 04, 2020, 09:08:29 am
Actually, no. There are no entries on that page at all.
What process should be generating those?

And do I understand the process correctly:
XYZ generates the counters, flowd picks them up and writes them to /var/log/flowd.log, and then the aggregate script writes them into the SQLite DBs in /var/netflow?

I would really like to see a chart for this process, btw. :-)
Title: Re: flowd not working after upgrade.
Post by: AdSchellevis on February 04, 2020, 09:24:58 am
NetFlow in FreeBSD is described in the ng_netflow man page, which you can find here:
https://www.freebsd.org/cgi/man.cgi?query=ng_netflow

This script outputs the counters.
Code: [Select]
/usr/local/opnsense/scripts/netflow/flowctl_stats.py

ng_netflow --> samplicate [127.0.0.1:2055] --> flowd [127.0.0.1:2056]

The startup for ng_netflow and samplicate is handled here:
Code: [Select]
/usr/local/etc/rc.d/netflow

samplicate makes it possible to send NetFlow data to multiple targets (localhost for flowd, plus any others).

The normal startup for netflow (/usr/local/etc/rc.d/netflow restart) is likely a good starting point.
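To see which endpoints the generated file wires together, you can parse and sanity-check the shell-style netflow.conf. In the chain above, ng_netflow exports to samplicate on 127.0.0.1:2055, and samplicate forwards to flowd on 127.0.0.1:2056. The Python sketch below assumes the file only contains simple KEY=value or KEY="value" lines, as in the copy quoted earlier in the thread. parse_rc_vars is a made-up helper name. In this thread's setup every egress-only device is also a capture device, so an egress-only name that matches none of them hints at the missing separator.

```python
import shlex

# Generated file as quoted earlier in the thread (before the template fix).
NETFLOW_CONF = """\
#
# Automatic generated configuration for netflow.
# Do not edit this file manually.
#
netflow_interfaces="bridge1 bridge0 bridge2 ovpnc1 pppoe0 "
netflow_egress_only="ovpnc1pppoe0 "
netflow_version="9"
netflow_int_destination="127.0.0.1:2055"
netflow_destinations="127.0.0.1:2056"
netflow_active_timeout=1800
netflow_inactive_timeout=15
"""


def parse_rc_vars(text):
    """Parse shell-style KEY="value" assignments, skipping comments and blanks."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, raw = line.partition("=")
        if sep:
            # shlex removes the quotes but keeps spaces inside them.
            words = shlex.split(raw)
            values[key] = words[0] if words else ""
    return values


conf = parse_rc_vars(NETFLOW_CONF)
interfaces = conf["netflow_interfaces"].split()
egress_only = conf["netflow_egress_only"].split()

print("ng_netflow on:", ", ".join(interfaces))
print("  exports to samplicate at", conf["netflow_int_destination"])
print("  samplicate forwards to", conf["netflow_destinations"])

# An egress-only entry that is not a capture interface suggests a rendering problem.
unknown = [dev for dev in egress_only if dev not in interfaces]
if unknown:
    print("egress-only entries not in netflow_interfaces:", unknown)
```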

Title: Re: flowd not working after upgrade.
Post by: Waschbuesch on February 04, 2020, 11:05:00 am
Thank you for explaining the flow of things.

The result of /usr/local/etc/rc.d/netflow restart:
Code: [Select]
root@gw01:~ # /usr/local/etc/rc.d/netflow restart
setup bridge1
ngctl: send msg: No such file or directory
error bridge1: cannot create netflow node for bridge1
setup bridge0
ngctl: send msg: No such file or directory
error bridge0: cannot create netflow node for bridge0
setup bridge2
ngctl: send msg: No such file or directory
error bridge2: cannot create netflow node for bridge2
setup ovpnc1 [egress only]
ngctl: send msg: No such file or directory
error ovpnc1: cannot create netflow node for ovpnc1
setup pppoe0 [egress only]
ngctl: send msg: No such file or directory
error pppoe0: cannot create netflow node for pppoe0

I am reading man pages to try to understand the syntax, but for the record:
for the pppoe0 device, for example, /usr/local/etc/rc.d/netflow restart tries to run:

Code: [Select]
root@gw01:~ # /usr/sbin/ngctl shutdown netflow_pppoe0
ngctl: shutdown: No such file or directory
root@gw01:~ # /usr/sbin/ngctl mkpeer pppoe0: netflow lower iface19
ngctl: send msg: No such file or directory
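When testing loader changes, it helps to reduce the restart output to a per-interface result so two runs can be compared. The sketch below assumes the output keeps the "setup <dev>" and "error <dev>: cannot create netflow node" lines shown above. summarize and newly_working are hypothetical names.

```python
import re

# Output of "/usr/local/etc/rc.d/netflow restart" as posted above.
RESTART_OUTPUT = """\
setup bridge1
ngctl: send msg: No such file or directory
error bridge1: cannot create netflow node for bridge1
setup bridge0
ngctl: send msg: No such file or directory
error bridge0: cannot create netflow node for bridge0
setup bridge2
ngctl: send msg: No such file or directory
error bridge2: cannot create netflow node for bridge2
setup ovpnc1 [egress only]
ngctl: send msg: No such file or directory
error ovpnc1: cannot create netflow node for ovpnc1
setup pppoe0 [egress only]
ngctl: send msg: No such file or directory
error pppoe0: cannot create netflow node for pppoe0
"""

SETUP_RE = re.compile(r"^setup (\S+)", re.MULTILINE)
ERROR_RE = re.compile(r"^error ([^:\s]+): cannot create netflow node", re.MULTILINE)


def summarize(output):
    """Map each interface the script tried to set up to True (ok) or False (failed)."""
    failed = set(ERROR_RE.findall(output))
    return {dev: dev not in failed for dev in SETUP_RE.findall(output)}


def newly_working(before, after):
    """Interfaces that failed (or were absent) in `before` but succeed in `after`."""
    old, new = summarize(before), summarize(after)
    return sorted(dev for dev, ok in new.items() if ok and not old.get(dev, False))


for dev, ok in summarize(RESTART_OUTPUT).items():
    print(f"{dev:8} {'ok' if ok else 'FAILED'}")
```

To compare runs, save the output of each restart and pass the two strings to newly_working.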
Title: Re: flowd not working after upgrade.
Post by: franco on February 04, 2020, 11:24:43 am
It looks like this is a side effect of no longer loading the netgraph drivers from the boot loader. See the following for a workaround to try:

https://forum.opnsense.org/index.php?topic=15653.0


Cheers,
Franco
Title: Re: flowd not working after upgrade.
Post by: Waschbuesch on February 04, 2020, 02:35:13 pm
Thanks, Franco.

Putting the content of this file https://github.com/opnsense/core/blob/stable/19.7/src/etc/rc.loader.d/20-netgraph
into /boot/loader.conf.local brought some improvement:

Code: [Select]
root@gw01:~ # /usr/local/etc/rc.d/netflow restart
setup bridge1
setup bridge0
setup bridge2
setup ovpnc1 [egress only]
ngctl: send msg: No such file or directory
error ovpnc1: cannot create netflow node for ovpnc1
setup pppoe0 [egress only]
ngctl: send msg: No such file or directory
error pppoe0: cannot create netflow node for pppoe0

So some of the necessary modules that weren't loaded before are now loaded.
At least Reporting -> Netflow -> Cache now lists the bridges and their counters.

Obviously, however, my egress interfaces still don't collect data.
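Adding modules to /boot/loader.conf.local is a manual step, so a small helper can prevent duplicate or conflicting entries. The sketch below emits <name>_load="YES" lines, the usual FreeBSD loader.conf form, only for modules that are not already enabled. The module names in the usage are placeholders, not the contents of the 20-netgraph file. The helper names are hypothetical.

```python
import re

# Matches lines such as: ng_ether_load="YES"  (optionally followed by a comment)
LOAD_RE = re.compile(r'^\s*([A-Za-z0-9_]+)_load\s*=\s*"?([A-Za-z]+)"?\s*(?:#.*)?$')


def enabled_modules(conf_text):
    """Module names whose <name>_load variable is set to YES."""
    enabled = set()
    for line in conf_text.splitlines():
        match = LOAD_RE.match(line)
        if match and match.group(2).upper() == "YES":
            enabled.add(match.group(1))
    return enabled


def missing_load_lines(conf_text, wanted):
    """loader.conf lines for wanted modules that are not already enabled."""
    have = enabled_modules(conf_text)
    lines = []
    for name in wanted:
        if name not in have:
            lines.append(f'{name}_load="YES"')
            have.add(name)  # avoid emitting duplicates
    return lines


# Hypothetical current /boot/loader.conf.local and candidate list.
current = 'netgraph_load="YES"\nng_ether_load="NO"\n# local additions\n'
for line in missing_load_lines(current, ["netgraph", "ng_ether", "ng_bridge"]):
    print(line)
```

Note that an explicit ng_ether_load="NO" is treated as not enabled, so the helper proposes a YES line. In that case, remove or edit the existing line instead of appending a second one.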
Title: Re: flowd not working after upgrade.
Post by: Waschbuesch on March 07, 2020, 09:58:30 am
Giving this a bump as it is still the same behavior in OPNsense 20.1.2.

I can get some of the interfaces to log netflow data by loading kernel modules that are no longer loaded automatically since 20.1.x.

But I have not been able to get my PPPoE or OpenVPN WAN ports to log egress traffic. (I have to admit, though, that I do not know for certain that these two ever did.)

At any rate, manually adding kernel modules to be loaded on boot in order to get built-in features (Netflow) to work seems like a band-aid to me. :-) Are there plans to overhaul the reporting section?
Title: Re: flowd not working after upgrade.
Post by: AdSchellevis on March 07, 2020, 07:14:50 pm
Normally the netgraph modules should be loaded automatically, but not all of them seem to be doing that at the moment.

If you can identify the minimal set of modules needed for your issue, please open a ticket here https://github.com/opnsense/core/issues and we can add those to the netflow loader script, like the ng_ether module added in https://github.com/opnsense/core/commit/4edbacc5193319337f4c1004e2505fe0821cb0c3

You can see the ones loaded (automatically) using the following command:

Code: [Select]
kldstat | grep ng_
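Note that grep ng_ does not match netgraph.ko itself. A slightly more thorough check parses the kldstat table and compares it against a candidate list. The sample output below is illustrative, not from the affected box, and its addresses and sizes are made up. kldstat only lists separately loaded .ko files, so anything compiled into the kernel will not show up. A "missing" entry is therefore a candidate, not a certainty. The candidate list is hypothetical.

```python
# Illustrative kldstat output (addresses and sizes are made up).
KLDSTAT = """\
Id Refs Address                Size Name
 1   60 0xffffffff80200000  2295a98 kernel
 2    1 0xffffffff82a10000     2a50 ng_netflow.ko
 3    5 0xffffffff82a13000    1aa28 netgraph.ko
 4    1 0xffffffff82a2e000     3ba0 ng_ether.ko
 5    1 0xffffffff82a32000     4fd8 if_enc.ko
"""


def loaded_netgraph_modules(output):
    """Names of loaded netgraph-related .ko files (netgraph and ng_*)."""
    names = set()
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a numeric Id; the header row does not.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        name = fields[4]
        if name.endswith(".ko"):
            name = name[: -len(".ko")]
        if name == "netgraph" or name.startswith("ng_"):
            names.add(name)
    return names


# Hypothetical candidate list, e.g. copied from a loader.conf snippet.
wanted = ["netgraph", "ng_ether", "ng_netflow", "ng_ksocket", "ng_bridge"]
loaded = loaded_netgraph_modules(KLDSTAT)
missing = [m for m in wanted if m not in loaded]
print("loaded :", sorted(loaded))
print("missing:", missing)
```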

Best regards,

Ad
Title: Re: flowd not working after upgrade.
Post by: Waschbuesch on March 08, 2020, 11:31:28 am
Thanks, Ad.

I have tried to find out what the missing modules might be, but so far, no luck.

If I do not touch the modules, I end up with the graph shown in the attached vanilla.png, and the output of
Code: [Select]
ngctl types
is:

Code: [Select]
There are 10 total types:
      Type name   Number of living nodes
      ---------   ----------------------
        ksocket       1
        netflow       1
         tcpmss       1
          pppoe       1
          ether      10
            tee       1
            ppp       1
          iface       1
           mppc       0
         socket       6

If I put all these (https://github.com/opnsense/core/blob/stable/19.7/src/etc/rc.loader.d/20-netgraph) into /boot/loader.conf.local, I end up with the graph shown in the attached modules.png and this output:

Code: [Select]
There are 31 total types:
      Type name   Number of living nodes
      ---------   ----------------------
        netflow       4
         socket       6
           vlan       0
            vjc       0
            tty       0
            tee       1
         tcpmss       1
          ether      13
         eiface       0
        rfc1490       0
          pred1       0
           echo       0
        pptpgre       0
          pppoe       1
        deflate       0
            ppp       1
          async       0
           pipe       0
          cisco       0
       one2many       0
           mppc       0
            car       0
         bridge       0
            lmi       0
           l2tp       0
        ksocket       4
            bpf       0
          iface       1
             UI       0
           hole       0
    frame_relay       0

I assumed I could see which types are missing by listing the node types in use, but apparently that does not work. (Or perhaps there are ng_xyz modules that are not node types themselves but add functionality to existing types?)
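The two listings can be compared mechanically. Below is a small Python sketch that parses the rows and reports which types are new and which living-node counts changed. parse_ngctl_types is a made-up name. On the numbers above, the only counts that change belong to types already present without the extra modules (ksocket, netflow, ether). None of the 21 newly available types has any living nodes, which fits the observation that the type listing alone does not point at the missing module.

```python
# Rows copied from the two "ngctl types" listings above.
VANILLA = """\
ksocket 1
netflow 1
tcpmss 1
pppoe 1
ether 10
tee 1
ppp 1
iface 1
mppc 0
socket 6
"""

WITH_MODULES = """\
netflow 4
socket 6
vlan 0
vjc 0
tty 0
tee 1
tcpmss 1
ether 13
eiface 0
rfc1490 0
pred1 0
echo 0
pptpgre 0
pppoe 1
deflate 0
ppp 1
async 0
pipe 0
cisco 0
one2many 0
mppc 0
car 0
bridge 0
lmi 0
l2tp 0
ksocket 4
bpf 0
iface 1
UI 0
hole 0
frame_relay 0
"""


def parse_ngctl_types(output):
    """Map node type name -> number of living nodes, skipping headers and rulers."""
    counts = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


before = parse_ngctl_types(VANILLA)
after = parse_ngctl_types(WITH_MODULES)

new_types = sorted(set(after) - set(before))
new_with_nodes = [t for t in new_types if after[t] > 0]
changed = {t: (before[t], after[t]) for t in before if t in after and before[t] != after[t]}

print("newly available types:", len(new_types))
print("new types with living nodes:", new_with_nodes)
print("living-node count changed:", changed)
```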

Also note that, apparently, I have to reboot after adding modules. If I use
Code: [Select]
kldload ng_xyz
to load each of the modules mentioned in https://github.com/opnsense/core/blob/stable/19.7/src/etc/rc.loader.d/20-netgraph without a reboot, and then run
Code: [Select]
/usr/local/etc/rc.d/netflow restart
I do not get the same results.

How would I go about narrowing this down without trial and error, module by module, with a reboot each time? Is there another service that needs restarting, or is a reboot really necessary?
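One way to reduce the number of reboots is to shrink the full list rather than add modules one at a time. You repeatedly try dropping a chunk of modules, keep the drop if capture still works, and halve the chunk size. This is a simplified variant of delta debugging, sketched in Python with a hypothetical minimize function. It assumes two things: the full 20-netgraph list fixes the symptom being tested (e.g. bridge counters appearing under Reporting -> Netflow -> Cache), and loading extra modules never breaks capture. Every works() call still means one edit of loader.conf.local plus a reboot. It only saves effort when most of the list turns out to be unneeded, and for short lists it can take more experiments than testing each module individually.

```python
def minimize(modules, works):
    """Shrink `modules` to a subset where dropping any single module breaks `works`.

    Assumes works(modules) is True and that loading extra modules never
    makes capture stop working (monotonic). Each works() call stands for one
    manual experiment (edit loader.conf.local, reboot, check the counters).
    """
    current = list(modules)
    chunk = max(len(current) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if works(candidate):
                current = candidate  # this chunk is not needed
            else:
                i += chunk  # something in this chunk is needed; keep it
        chunk //= 2
    return current


# Simulated experiment: pretend ng_b and ng_d (hypothetical names) are required.
probes = []


def fake_works(subset):
    probes.append(list(subset))
    return {"ng_b", "ng_d"} <= set(subset)


result = minimize(["ng_a", "ng_b", "ng_c", "ng_d", "ng_e"], fake_works)
print(result, "after", len(probes), "experiments")
```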
Title: Re: flowd not working after upgrade.
Post by: franco on March 08, 2020, 01:14:51 pm
I am kind of lost now as to what works when and how, and whether this can actually be restored to pre-20.1 behaviour, given the general lack of reports suggesting that there is a pre-20.1 behaviour.


Cheers,
Franco
Title: Re: flowd not working after upgrade.
Post by: Waschbuesch on March 09, 2020, 06:03:06 pm
Franco, let's set the uncertain parts aside for a moment (for those I will have to reinstall 19.7 on some box and test).
What I am certain about:

Say you create a bridge, add ports, assign the bridge as an interface (e.g. OPT1),
and then enable flowd for OPT1.

Before 20.x, it would record traffic out of the box. With 20.x, it does not.

This is definitely due to the modules mentioned, and it is what I can get working again by adding the modules back in.
The previous post explained how I was so far unable to narrow it down to which specific modules are responsible.
Title: Re: flowd not working after upgrade.
Post by: franco on March 10, 2020, 03:44:07 pm
Thanks for the explanation. How about we load the "netgraph" kernel module at boot time like before, but load all other required modules on demand? ng_bridge seems like a good candidate.

I just don't want to come full circle on this, as netgraph modules may needlessly slow down processing.


Cheers,
Franco
Title: Re: flowd not working after upgrade.
Post by: Waschbuesch on March 13, 2020, 08:40:03 am
That's what I had attempted. The problem is, adding ng_bridge on boot is not sufficient (though that would have seemed like the obvious thing).
I have not yet figured out which other module is needed. :-(