OPNsense Forum

Archive => 15.7 Legacy Series => Topic started by: fraenki on December 27, 2015, 04:24:52 pm

Title: [SOLVED] HA: secondary node -> high system load / packet loss
Post by: fraenki on December 27, 2015, 04:24:52 pm
Hi,

I'm running two OPNsense nodes in HA mode with pfsync and CARP. The master is a physical server and the slave is a virtual machine. After starting the slave node, everything is fine for ~15-45 minutes. After that, the system load increases dramatically on both nodes:

Code:
last pid: 37844;  load averages:  0.19,  0.07,  0.03    up 0+00:36:01  15:18:51
176 processes: 3 running, 123 sleeping, 50 waiting
CPU:  0.0% user,  0.0% nice,  0.0% system, 10.8% interrupt, 89.2% idle
Mem: 73M Active, 64M Inact, 107M Wired, 568K Cache, 86M Buf, 719M Free
Swap:

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   11 root       155 ki31     0K    32K RUN     1  35:45 100.00% idle{idle: cpu1}
   11 root       155 ki31     0K    32K CPU0    0  35:32  81.98% idle{idle: cpu0}
   12 root       -92    -     0K   800K WAIT    0   0:12  23.00% intr{irq257: virtio_p}


Once this happens, all(?) networks experience massive packet loss. Any ideas?
With only the master node running (slave node shut down), everything is fine.


Thanks
- Frank
Title: Re: HA: secondary node -> high system load / packet loss
Post by: fraenki on December 27, 2015, 04:56:16 pm
When the packet loss happens, there's not much going on, but on the secondary node the WAN interface (vtnet0) shows unusually high traffic (for the *secondary* node!):

Code:
# systat -ifstat 1

                    /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average   |

      Interface           Traffic               Peak                Total
        pfsync0  in     10.032 KB/s         31.589 KB/s            4.780 MB
                 out     4.438 KB/s         31.328 KB/s            1.976 MB

         vtnet9  in      0.067 KB/s          0.171 KB/s           47.361 KB
                 out     0.000 KB/s          0.000 KB/s            0.477 KB

         vtnet8  in      0.608 KB/s          1.085 KB/s          407.436 KB
                 out     0.000 KB/s          0.000 KB/s            2.262 KB

         vtnet7  in      0.067 KB/s          0.171 KB/s           47.635 KB
                 out     0.000 KB/s          0.000 KB/s            0.613 KB

         vtnet6  in      0.067 KB/s          0.170 KB/s           47.635 KB
                 out     0.000 KB/s          0.000 KB/s            0.613 KB

         vtnet5  in      0.067 KB/s          6.289 KB/s           90.687 KB
                 out     0.000 KB/s          0.000 KB/s            0.613 KB

         vtnet4  in      0.067 KB/s          0.170 KB/s           47.635 KB
                 out     0.000 KB/s          0.000 KB/s            0.613 KB

         vtnet3  in      0.176 KB/s         10.911 KB/s          673.893 KB
                 out     0.000 KB/s         10.325 KB/s          555.955 KB

         vtnet2  in      0.683 KB/s          1.314 KB/s          354.385 KB
                 out     0.000 KB/s          0.039 KB/s            2.420 KB

         vtnet1  in     10.521 KB/s         32.222 KB/s            5.835 MB
                 out     4.706 KB/s         32.696 KB/s            2.488 MB

         vtnet0  in      7.749 MB/s          8.372 MB/s          383.252 MB
                 out     7.825 MB/s         16.263 MB/s          380.792 MB

And on the primary node I can see ~50k packets/s across all VLANs, which is relatively high, I guess:

Code:
# netstat -i -b -n -I re0 1
            input            re0           output
   packets  errs idrops      bytes    packets  errs      bytes colls
     47954     0     0    8383392      48001     0    8410276     0
     47212     0     0    7416337      47216     0    7402466     0
     49305     0     0    8875731      49317     0    8883694     0
     49921     0     0    7824459      49922     0    7833479     0
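
To put those per-second counters in perspective, here is a minimal sketch that derives the average packet rate, packet size and throughput from the four input samples above. The function name `summarize` is made up for illustration; only the numbers come from the netstat output.

```python
# Input samples copied from the netstat output above:
# (packets, bytes) received on re0 during each one-second interval.
samples = [
    (47954, 8383392),
    (47212, 7416337),
    (49305, 8875731),
    (49921, 7824459),
]


def summarize(samples):
    """Return average packets/s, bytes per packet and Mbit/s (hypothetical helper)."""
    total_packets = sum(packets for packets, _ in samples)
    total_bytes = sum(nbytes for _, nbytes in samples)
    seconds = len(samples)  # netstat was run with a 1-second interval
    return {
        "avg_pps": total_packets / seconds,
        "avg_packet_bytes": total_bytes / total_packets,
        "avg_mbit_per_s": total_bytes * 8 / seconds / 1e6,
    }


if __name__ == "__main__":
    for key, value in summarize(samples).items():
        print(f"{key}: {value:.1f}")
```

For these samples this works out to roughly 48,600 packets/s at about 167 bytes per packet, or about 65 Mbit/s, which is in the same range as the rate seen on vtnet0 of the secondary node.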


Thanks
- Frank
Title: Re: HA: secondary node -> high system load / packet loss
Post by: fraenki on December 27, 2015, 06:06:32 pm
Hmm. Somehow my WAN interface gets flooded with DHCP requests:

Code:
18:05:09.184607 IP A.B.C.D.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:1a:4a:25:c3:48, length 300
Obviously there's some kind of loop causing the high load and packet loss. And yes, I have DHCP relay enabled, though I don't know why this only happens when the secondary node is running...
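
One way to confirm a flood like this is to count how often the same client MAC shows up in the capture. The following is only a sketch: it assumes tcpdump summary lines in the format shown above, and the helper names and the threshold of 100 are arbitrary choices for illustration.

```python
import re
from collections import Counter

# Matches the client MAC in a tcpdump BOOTP/DHCP summary line such as:
# "... BOOTP/DHCP, Request from 00:1a:4a:25:c3:48, length 300"
REQUEST_RE = re.compile(r"BOOTP/DHCP, Request from ([0-9a-fA-F:]{17})")


def count_dhcp_requests(lines):
    """Count DHCP requests per client MAC in an iterable of tcpdump lines."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


def suspected_loops(counts, threshold=100):
    """Return MACs seen at least `threshold` times (assumed cut-off, tune as needed)."""
    return [mac for mac, seen in counts.items() if seen >= threshold]
```

Fed with a few seconds of captured output (for example from `tcpdump -n -i vtnet0 port 67 or port 68`, saved to a file and read line by line), a single MAC with thousands of identical requests points to a forwarding loop rather than to a real client.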
Title: Re: HA: secondary node -> high system load / packet loss
Post by: fraenki on December 29, 2015, 11:01:50 pm
Found the loop: it was related to my Dual-WAN setup. I had messed up my rules and was erroneously forwarding some DHCP requests to my WAN interface.
Title: Re: [SOLVED] HA: secondary node -> high system load / packet loss
Post by: franco on January 09, 2016, 12:33:56 am
Whoops. :)