flowd not working after upgrade.

Started by Waschbuesch, February 01, 2020, 09:50:17 PM

Hi all,

I upgraded a firewall from 19.7 to 20.1 yesterday.
The upgrade itself went well, but afterwards, flowd is not working.

The relevant passage in config.xml:


    <Netflow version="1.0.1">
      <capture>
        <interfaces>lan,opt7,opt10,opt1,opt2</interfaces>
        <egress_only>opt1,opt2</egress_only>
        <version>v9</version>
        <targets>127.0.0.1:2056</targets>
      </capture>
      <collect>
        <enable>1</enable>
      </collect>
      <activeTimeout>1800</activeTimeout>
      <inactiveTimeout>15</inactiveTimeout>
    </Netflow>


/var/log/flowd.log is empty, and the flowd processes show zero CPU time despite running for hours with a lot of traffic passing through:


gw01:~ # ps ax | grep flow
6611  -  Is      0:00.00 flowd: net (flowd)
57722  -  Is      0:00.00 flowd: monitor (flowd)


Rebooting and deleting flowd.log and the /var/netflow/* files have not made a difference.

I have a very similar setup on another box where this still works, even after the upgrade to 20.1.

Any ideas what else to try?

Ok, there seems to be a problem when generating

/usr/local/etc/netflow.conf


#
# Automatic generated configuration for netflow.
# Do not edit this file manually.
#
netflow_interfaces="bridge1 bridge0 bridge2 ovpnc1 pppoe0 "
netflow_egress_only="ovpnc1pppoe0 "
netflow_version="9"
netflow_int_destination="127.0.0.1:2055"
netflow_destinations="127.0.0.1:2056"
netflow_active_timeout=1800
netflow_inactive_timeout=15


The egress_only line *should* read:

netflow_egress_only="ovpnc1 pppoe0 "
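
A quick way to catch this kind of missing separator is to check that every name in the egress-only list also appears in the interface list. The following is only a minimal sketch: check_egress_only is a hypothetical helper, not part of OPNsense, and the two strings are copied from the generated file above.

```shell
# Hypothetical helper (not an OPNsense tool): prints every egress-only name
# that is not also listed as a capture interface.
# Empty output means the two lists are consistent.
check_egress_only() {
    interfaces=$1
    egress_only=$2
    for name in $egress_only; do
        case " $interfaces " in
            *" $name "*) ;;                               # known interface
            *) echo "unknown egress interface: $name" ;;  # e.g. two names fused
        esac
    done
}

# Values taken from the broken netflow.conf shown above.
check_egress_only "bridge1 bridge0 bridge2 ovpnc1 pppoe0 " "ovpnc1pppoe0 "
```

With the broken value this reports ovpnc1pppoe0 as unknown; with the corrected value it prints nothing.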


The template (/usr/local/opnsense/service/templates/OPNsense/Netflow/netflow.conf) has that space in the wrong place:

Lines 23-24 should read:

}} {%
  endfor%}{%endif%}"


instead of:

}}{%
  endfor%} {%endif%}"


I guess?

The syntax in /usr/local/etc/netflow.conf is now correct, but there is still no content in /var/log/flowd.log.

:-\

It looks like we did accidentally remove a separator; https://github.com/opnsense/core/commit/5c1756e6e91f17a9295e7300ced27f547239f9cf#diff-7d2fae400f9afec04a38ecab7e0f9150 should bring it back.

If you execute the following from a console, you should be able to apply the settings again from the user interface:

opnsense-patch 5c1756e6



Thanks!

Since flowd does not write anything to its log file even with the corrected netflow.conf, do you have any advice on what else I could try to make NetFlow reporting work again?

If it's restarted with the new config, it usually runs fine again. Keep in mind that you should use the GUI to restart it, since a couple of services are involved here (netgraph, flowd).

I did use the GUI and also tried a reboot. flowd.log still does not get written to.

Is there any other piece of information I could supply that might help diagnose this?

Even after the patch mentioned by AdSchellevis, followed by a reboot, flowd.log does not get written to. At all.

Do the Netflow cache counters increase (Reporting -> Netflow)? flowd is a consumer of those messages.

Actually, no. There are no entries on that page at all.
What process should be generating those?

And do I understand the process correctly?
XYZ generates the counters, flowd picks them up and writes them to /var/log/flowd.log, and then the aggregate script writes them into the SQLite DBs in /var/netflow?

I would really like to see a chart for this process, btw. :-)

Netflow in FreeBSD is described in the ng_netflow man page, which you can find here:
https://www.freebsd.org/cgi/man.cgi?query=ng_netflow

This script outputs the counters:
/usr/local/opnsense/scripts/netflow/flowctl_stats.py


ng_netflow --> samplicate [127.0.0.1:2055] --> flowd [127.0.0.1:2056]

The ng_netflow + samplicate startups can be found here:

/usr/local/etc/rc.d/netflow


Samplicate makes it possible to send netflow data to multiple targets (localhost for flowd + others).

The normal startup for netflow (/usr/local/etc/rc.d/netflow restart) is likely a good starting point.
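
To see which addresses the hops in the chain above use on a given box, the values can be read from the generated netflow.conf. This is only a sketch: describe_pipeline is a hypothetical helper, and it assumes the file contains plain shell-style assignments for netflow_int_destination and netflow_destinations, as in the listing earlier in this thread.

```shell
# Hypothetical helper: prints the capture chain with the addresses found in
# a netflow.conf-style file. The file is sourced inside a subshell so that
# its variables do not leak into the calling shell.
describe_pipeline() {
    conf=$1
    (
        . "$conf" || exit 1
        echo "ng_netflow -> samplicate [$netflow_int_destination] -> flowd [$netflow_destinations]"
    )
}

# Only run against the real file when it exists and is readable.
if [ -r /usr/local/etc/netflow.conf ]; then
    describe_pipeline /usr/local/etc/netflow.conf
fi
```

Pass the file as an absolute path; a bare file name would be looked up in PATH by the `.` builtin.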


Thank you for explaining the flow of things.

The result of /usr/local/etc/rc.d/netflow restart:

root@gw01:~ # /usr/local/etc/rc.d/netflow restart
setup bridge1
ngctl: send msg: No such file or directory
error bridge1: cannot create netflow node for bridge1
setup bridge0
ngctl: send msg: No such file or directory
error bridge0: cannot create netflow node for bridge0
setup bridge2
ngctl: send msg: No such file or directory
error bridge2: cannot create netflow node for bridge2
setup ovpnc1 [egress only]
ngctl: send msg: No such file or directory
error ovpnc1: cannot create netflow node for ovpnc1
setup pppoe0 [egress only]
ngctl: send msg: No such file or directory
error pppoe0: cannot create netflow node for pppoe0
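
For comparing runs before and after a change, the failing interfaces can be pulled out of this output with a small filter. A minimal sketch, assuming the error lines keep the exact wording shown above; failed_interfaces is a hypothetical name.

```shell
# Hypothetical filter: reads the rc script output on stdin and prints one
# interface name per "cannot create netflow node" error line.
failed_interfaces() {
    sed -n 's/^error \([^:]*\): cannot create netflow node for .*/\1/p'
}

# Usage on the firewall (restarts the service, so not executed here):
#   /usr/local/etc/rc.d/netflow restart 2>&1 | failed_interfaces
```

Lines that do not match, such as the "setup ..." and "ngctl: ..." lines, are ignored.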


I am reading man pages to try to understand the syntax, etc., but for the record:
For the pppoe0 device, for example, /usr/local/etc/rc.d/netflow restart tries to run:


root@gw01:~ # /usr/sbin/ngctl shutdown netflow_pppoe0
ngctl: shutdown: No such file or directory
root@gw01:~ # /usr/sbin/ngctl mkpeer pppoe0: netflow lower iface19
ngctl: send msg: No such file or directory


It looks like this is a side effect of removing the netgraph driver load from the boot loader. See the following for a workaround to try:

https://forum.opnsense.org/index.php?topic=15653.0


Cheers,
Franco

February 04, 2020, 02:35:13 PM #13 Last Edit: February 04, 2020, 02:47:39 PM by Waschbuesch
Thanks, franco

Putting the content of this file https://github.com/opnsense/core/blob/stable/19.7/src/etc/rc.loader.d/20-netgraph
into /boot/loader.conf.local brought some improvement:


root@gw01:~ # /usr/local/etc/rc.d/netflow restart
setup bridge1
setup bridge0
setup bridge2
setup ovpnc1 [egress only]
ngctl: send msg: No such file or directory
error ovpnc1: cannot create netflow node for ovpnc1
setup pppoe0 [egress only]
ngctl: send msg: No such file or directory
error pppoe0: cannot create netflow node for pppoe0


So, some of the necessary modules are now loaded that weren't before.
At least, Reporting -> Netflow -> Cache now lists the bridges and their counters.

Obviously, however, my egress interfaces still don't collect data.
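
Since the rc script runs "ngctl mkpeer <interface>: ...", one thing worth checking is whether a netgraph node with that name exists at all. The following is a rough sketch: interfaces_without_node is a hypothetical helper, and it assumes that each node line printed by ngctl list contains a "Name: <node>" field.

```shell
# Hypothetical helper: reads `ngctl list` output on stdin and prints every
# interface given as an argument that has no netgraph node of the same name.
# Assumption: node lines contain "Name: <node>".
interfaces_without_node() {
    nodes=$(sed -n 's/.*Name: *\([^ ]*\).*/\1/p')
    for ifname in "$@"; do
        case " $(echo $nodes) " in
            *" $ifname "*) ;;       # a node with this name exists
            *) echo "$ifname" ;;    # no matching node found
        esac
    done
}

# Usage on the firewall:
#   ngctl list | interfaces_without_node bridge0 bridge1 bridge2 ovpnc1 pppoe0
```

Any interface it prints is one for which the mkpeer call has no node to attach to.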

Giving this a bump as it is still the same behavior in OPNsense 20.1.2.

I can get some of the interfaces to log netflow data by loading kernel modules that are no longer loaded automatically since 20.1.x.

But I have not been able to get my pppoe or openvpn WAN ports to log egress traffic. (I have to admit, though, that I do not know for certain that these two ever did.)

At any rate, manually adding kernel modules to be loaded on boot in order to get built-in features (Netflow) to work seems like a band-aid to me. :-) Are there plans to overhaul the reporting section?