os-realtek-re plugin

Started by hemirunner426, September 23, 2021, 12:58:42 AM

Is it recommended to install this plugin on 21.7.3?

The way I interpret the changelog, it sounds like the Realtek vendor driver will be in this plugin package and the OPNsense kernel will ship with the standard FreeBSD driver.  Is that correct?

If so, then will the standard driver be recommended over the plugin once 22 drops?

The standard driver has been avoided because it was (and probably still is) much worse. Back in 2017, however, there was no FreeBSD port of the vendor driver, so we replaced the driver in the base system. That isn't necessary anymore, so we would like to give people the opportunity to try the FreeBSD default driver again. The vendor driver actually had a few devices dropped from support over the years, and WoL wasn't working as well anymore.

If you can, try the plugin (install + reboot) and report back. :)


Thanks,
Franco

September 24, 2021, 10:55:49 PM #2 Last Edit: September 25, 2021, 02:29:21 AM by hemirunner426
Seems that both drivers give good performance on a 1G PPPoE link.

Issues start to pop up when anything using netmap is enabled.  I had seen this problem before installing the Realtek plugin, too...

1. The router stops routing.
2. CPU pins at 100% (a suricata or sensei process is the culprit).
3. The state table size grows quickly.

The only way out of this state is to reboot the router.

I'm not sure how else to troubleshoot this.

EDIT: I'm seeing the same logging noise as in this post when enabling sensei in IPS mode with native or emulated netmap support:

https://forum.opnsense.org/index.php?topic=21458.0

September 25, 2021, 05:14:51 AM #3 Last Edit: September 25, 2021, 05:45:54 AM by hemirunner426
I wonder if the netmap issue comes down to a buffer size mismatch between the re driver and netmap?

The re driver sets its buffer to the highest MTU size supported by the NIC, probably 9000 on a gigabit card.  It can be overridden by setting hw.re.max_rx_mbuf_sz to something like 2048 or 4096.

I've seen several mentions that netmap does not do well with a buffer size greater than 4096.

Does anyone have more information on this?
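
For anyone who wants to poke at this, here is roughly how I'd compare the two sizes (this assumes the loaded re driver actually honors the tunable, which is exactly the open question):

# netmap's per-buffer size (defaults to 2048 bytes)
sysctl dev.netmap.buf_size

# in /boot/loader.conf: cap the re driver's RX mbuf size if jumbo frames aren't needed
hw.re.max_rx_mbuf_sz="2048"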

September 25, 2021, 11:15:50 PM #4 Last Edit: September 25, 2021, 11:44:35 PM by hemirunner426
So far I've had no luck sifting through the logs for anything that gives an indication of what is going on.  I had it happen twice in the span of 45 minutes.

The common symptoms:
1. re0 (WAN) goes down every time.  The TX/RX light stops blinking when this occurs.
2. re1 (LAN) remains responsive and functional.
3. CPU spikes to 100%.  Unbound, python, and suricata (or sensei-related processes when testing it) are the culprits.  Stopping/killing those processes does not change the state of the system.
4. The state table goes through the roof, as does memory and eventually swap.
5. A reboot is the only way to get the router back in a fully usable state.

The only relevant thing I see in dmesg is:

re0: reset never completed!

This seems to only happen when sensei or suricata is enabled.  I've only run suricata in IDS mode, and while I was testing sensei I tried native netmap, emulated, and passive modes.



I'm not sure what else I can do to gather more information.
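
For now, here's what I plan to capture the next time it wedges; this is just my guess at what might be informative:

dmesg | tail -n 50        # any driver resets or watchdog messages
ifconfig re0              # link state and flags on the WAN NIC
netstat -i                # error/drop counters per interface
pfctl -si                 # state table size and packet counters
top -aPSH                 # which threads are pinning the CPU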


So I've been doing some code comparison today.  You'll have to forgive me; C/C++ and driver development are not my areas of expertise...

I was reviewing the Realtek vendor driver in the OPNsense GitHub here:
https://raw.githubusercontent.com/opnsense/src/21.7.2/sys/dev/re/if_re.c

What I noticed is that there is no reference to it reading the hw.re.max_rx_mbuf_sz tunable that is specified in the following README:

Quote: Add the following lines to your /boot/loader.conf
to override the built-in FreeBSD re(4) driver.

if_re_load="YES"
if_re_name="/boot/modules/if_re.ko"

By default, the size of allocated mbufs is enough
to receive the largest Ethernet frame supported
by the card.  If your memory is highly fragmented,
trying to allocate contiguous pages (more than
4096 bytes) may result in driver hangs.
For this reason the value is tunable at boot time,
e.g. if you don't need Jumbo frames you can lower
the memory requirements and avoid this issue with:

hw.re.max_rx_mbuf_sz="2048"

Unless I am somehow missing it, I don't see how the vendor driver in OPNsense is utilizing this tunable.
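
For what it's worth, this is how I checked (a match count of zero would mean the tunable isn't referenced anywhere in that copy of the source):

fetch https://raw.githubusercontent.com/opnsense/src/21.7.2/sys/dev/re/if_re.c
grep -c max_rx_mbuf_sz if_re.c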

I'm using the following branch as a reference, which has this sysctl enabled as a tunable:
https://github.com/kostikbel/rere

This person seems to have had similar issues on a NAS box.  I suspect that somewhere along the line his commits may have been pushed to the FreeBSD driver tree, but for some reason they're not present in OPNsense.

My running theory is that the increased load from something like suricata or sensei is causing this memory fragmentation issue and eventually killing the driver.
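
If that theory holds, I'd expect the mbuf and jumbo cluster counters to show allocation failures.  This is what I'm assuming would show it:

netstat -m                   # look for "requests for mbufs denied" and jumbo cluster failures
vmstat -z | grep -i jumbo    # per-zone failure counts (FAIL column) for the jumbo cluster zones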

I may try to build this myself and try it out... That would require me building a dev VM and all that.
If someone would/could beat me to it, that would be great!




I went ahead and compiled the driver from https://github.com/kostikbel/rere
against the OPNsense kernel source.

I replaced the binary if_re.ko in /boot/modules with the one I compiled and rebooted.
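
Roughly what that looked like, in case anyone wants to repeat it (paths are from my box; keep a copy of the stock module in case the new one misbehaves):

cp /boot/modules/if_re.ko /boot/modules/if_re.ko.stock    # back up the vendor module
cp ./if_re.ko /boot/modules/if_re.ko                      # drop in the freshly built one
grep if_re /boot/loader.conf                              # confirm loader.conf still loads it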

I enabled sensei with the native netmap module (although from the dmesg output it doesn't appear native mode works/is supported).

I will report back and see if these commits take care of my issue.

This was also a no-go.

I believe this may be a hardware fault/incompatibility.


Let's slow down. You are mixing up multiple things:

1. The vendor driver is in src.git master branch and most recent stable branches reaching back to 2017.
2. hw.re.max_rx_mbuf_sz exists ONLY in the newly added realtek-re-kmod port installed by the plugin of the same name.
3. The FreeBSD driver supports NATIVE netmap mode, the vendor driver (port or OPNsense src.git) uses the EMULATED driver. I haven't heard a lot of bad things about the emulated driver use so far. In fact, reports were a lot more positive towards EMULATED driver back in 2017 when we did the switch.
4. I'm unsure what you are trying to achieve. At least we need to establish a better baseline and also inspect the actual hardware chipset you have at hand.


Cheers,
Franco

September 28, 2021, 05:24:19 PM #9 Last Edit: September 28, 2021, 05:25:52 PM by hemirunner426
Let's slow down. You are mixing up multiple things:

I wouldn't be surprised.   :)

1. The vendor driver is in src.git master branch and most recent stable branches reaching back to 2017.
    - OK that is what I thought.

2. hw.re.max_rx_mbuf_sz exists ONLY in the newly added realtek-re-kmod port installed by the plugin of the same name.
    - So this is not the os-realtek-re plugin?  If I search 'realtek-re-kmod' in plugins it does not show.  I took these to be the same because their dmesg output when the driver loads appears to be the same, i.e. it prints patent and driver version info where the FreeBSD driver does not.  This leaves me wondering why one would make reference to max_rx_mbuf_sz while the other is static?

3. The FreeBSD driver supports NATIVE netmap mode, the vendor driver (port or OPNsense src.git) uses the EMULATED driver. I haven't heard a lot of bad things about the emulated driver use so far. In fact, reports were a lot more positive towards EMULATED driver back in 2017 when we did the switch.
    - So this can be used (alongside the dmesg output on boot when the driver loads) to confirm which driver is installed; I've put the quick check I use further down.  Emulated vs. native didn't really matter to me.  I wanted to see if I could figure out why my WAN link dies when these services are enabled under any sort of load.

4. I'm unsure what you are trying to achieve. At least we need to establish a better baseline and also inspect the actual hardware chipset you have at hand.

I'm trying to figure out if the NIC is bad or if there is a driver issue.  For now, I am going to replace this unit with a Protectli, as IPS is important to me.

As of now using anything IDS/IPS related on this device will render it unworkable after some amount of time.  The only hint from dmesg that something is wrong is "re0: reset never completed!".  The only way out of this broken state is a reboot.
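
For completeness, here is the quick check I've been using to confirm which driver is actually in play (my assumption, based on the tunable only being present in the kmod port):

kldstat | grep if_re            # module loaded from /boot/modules vs. re(4) built into the kernel
dmesg | grep '^re0'             # the vendor driver prints a version/patent banner at attach time
sysctl hw.re.max_rx_mbuf_sz     # if the kmod port exposes it as a sysctl; "unknown oid" with the other drivers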

I can hang on to the unit for a while if you'd like me to do some exploratory work.  Here is the output from pciconf:

# pciconf -lbcevV re0
re0@pci0:1:0:0: class=0x020000 card=0x012310ec chip=0x816810ec rev=0x15 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    bar   [10] = type I/O Port, range 32, base 0xe000, size 256, enabled
    bar   [18] = type Memory, range 64, base 0xa1304000, size 4096, enabled
    bar   [20] = type Memory, range 64, base 0xa1300000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D1 D2 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 2 endpoint MSI 1 max data 128(128) RO
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[b0] = MSI-X supports 4 messages, enabled
                 Table in map 0x20[0x0], PBA in map 0x20[0x800]
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0002[140] = VC 1 max VC0
    ecap 0003[160] = Serial 1 01000000684ce000
    ecap 0018[170] = LTR 1
    ecap 001e[178] = unknown 1
  PCI-e errors = Correctable Error Detected



Installed the plugin as well. Using top -aPSH I don't see much difference in interrupt use, so performance is on par. I'm using it on an Odroid H2+ with two RTL8125B NICs.