OPNsense Forum

Archive => 16.1 Legacy Series => Topic started by: Pedro on April 22, 2016, 04:00:59 pm

Title: WAN with DHCP loosing internet access.
Post by: Pedro on April 22, 2016, 04:00:59 pm
Hi all,

I finally managed to convince the higher-ups to implement OPNsense. Initially, all seemed to be working OK in testing, but now that I've moved into production (the setup is slightly different) we occasionally lose internet access. Everything seems to be working OK, but OPNsense shows the gateway as being down and we have no internet. I've yet to find a pattern, and usually the only way I get internet access back is by rebooting OPNsense. Even a release/renew of the DHCP lease for WAN doesn't solve the problem.

Being rather new to OPNsense and FreeBSD, I'm at a loss as to what else I can do to troubleshoot this issue and would appreciate any help/guidance in solving it.

Specs
OPNsense 16.1.11_1-i386
FreeBSD 10.2-RELEASE-p14
LibreSSL 2.2.6
WAN using DHCP with gateway monitoring.

Title: Re: WAN with DHCP loosing internet access.
Post by: Pedro on April 26, 2016, 10:26:19 am
Really hate to bump this after just a couple of days, but the situation persists and I really could do with some help from the more experienced.

Any ideas?
Title: Re: WAN with DHCP loosing internet access.
Post by: fabian on April 26, 2016, 12:48:57 pm
I think nobody can help you with your problem because the required information is missing:

+ state of the interfaces (for example, provide the output of ifconfig - you can replace your IP address) when it is working and when it is not
+ which protocol are you running on your WAN?
+ ping from the firewall - does it work?
+ syslog messages
+ are the services running?
+ do you have any special configuration which is usually not used?
+ ...

Fabian
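
To gather most of the items on this list in one go, a small POSIX shell helper can snapshot the interface, routing, and mbuf state into a file, once while the WAN works and once while it is down. This is only a sketch: the function name `snapshot_wan_state` and the file paths are made up for illustration, and it assumes the stock FreeBSD `ifconfig`, `netstat`, and `ping` tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of OPNsense): write a state snapshot to a file.
# Usage: snapshot_wan_state OUTPUT_FILE [PING_TARGET]
snapshot_wan_state() {
    out=$1
    target=${2:-}
    {
        echo "=== snapshot taken: $(date) ==="
        for cmd in "ifconfig -a" "netstat -rn" "netstat -m"; do
            echo "--- $cmd ---"
            # Word splitting of $cmd is intentional; failures are recorded, not fatal.
            $cmd 2>&1 || echo "(command failed or is unavailable)"
        done
        # Only ping when a target was given, so the helper also works offline.
        if [ -n "$target" ]; then
            echo "--- ping -c 3 $target ---"
            ping -c 3 "$target" 2>&1 || echo "(ping failed)"
        fi
    } > "$out"
}
```

One possible use: run `snapshot_wan_state /tmp/wan-good.txt GATEWAY_IP` while everything works, run it again as `/tmp/wan-bad.txt` during an outage, and compare the two files with `diff`.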
Title: Re: WAN with DHCP loosing internet access.
Post by: Zeitkind on April 27, 2016, 03:03:40 am
"WAN using DHCP with gateway monitoring."
WAN failover?
This is often a big problem, depending on how you try to achieve monitoring. Many pingable gateways tend to ignore pings, so you get false alarms and failovers. Even the often-used Google DNS servers (8.8.8.8 or 8.8.4.4) tend to drop pings a lot. What are your settings?
Also, some providers lack even minimally intelligent DHCP handling and often disregard the TTL or handle it in a non-RFC-compliant way; check the logs for such errors.
A combination of these (no ICMP echo replies, a WAN restart, and no answer from the DHCP server) makes the WAN appear offline even though it never was. I've seen this quite often.
Title: Re: WAN with DHCP loosing internet access.
Post by: Pedro on April 27, 2016, 12:15:37 pm
Hi all, thanks for your answers so far. I'll try to address them in order:

@Fabian:
- WAN is using DHCP to connect to the outside world;
- When internet fails, ping fails with "no buffer space available";
- Services are all running fine but I have not yet managed to look into syslog messages, that's the next step;
- No special configuration or tunable set yet

@Zeitkind:
Thanks for your input. I've disabled gateway monitoring for the time being and the problem persists, only now OPNsense thinks everything is fine. The "no buffer space available" gave me a little more to go on and I've managed to collect the following:

Code:
root@gw:~ # netstat -m
781/1499/2280 mbufs in use (current/cache/total)
762/758/1520/26368 mbuf clusters in use (current/cache/total/max)
762/756 mbuf+clusters out of packet secondary zone in use (current/cache)
0/31/31/13184 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/3906 9k jumbo clusters in use (current/cache/total/max)
0/0/0/2197 16k jumbo clusters in use (current/cache/total/max)
1719K/2014K/3734K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/8/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile

Any further ideas? I'll keep watching the logs to see if anything pertinent pops up.
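
For reading `netstat -m` output like the above at a glance, the two numbers that matter most for "no buffer space available" are how close the mbuf clusters are to their maximum and whether any requests were denied. The following is a sketch only; `mbuf_summary` is a made-up name, and it assumes the FreeBSD 10.x output format shown in this post.

```shell
#!/bin/sh
# Hypothetical helper: summarize `netstat -m` output read from stdin.
mbuf_summary() {
    awk '
        /mbuf clusters in use/ {
            split($1, f, "/")    # fields: current/cache/total/max
            if (f[4] > 0)
                printf "clusters: %d of %d (%.1f%%)\n", f[3], f[4], 100 * f[3] / f[4]
        }
        /denied/ {
            # First field is a single counter or a slash-separated group.
            n = split($1, d, "/")
            for (i = 1; i <= n; i++) denied += d[i]
        }
        END { printf "denied requests: %d\n", denied }
    '
}
```

Usage would be `netstat -m | mbuf_summary`. With the figures posted here the cluster pool is only a few percent used and nothing was denied, which suggests the mbuf pool itself is not exhausted. As far as I know, FreeBSD's send(2) manual page also lists a full interface output queue as a cause of ENOBUFS, which would point at the NIC or driver rather than at these limits.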
Title: Re: WAN with DHCP loosing internet access.
Post by: Zeitkind on April 27, 2016, 02:23:21 pm
"- When internet fails, ping fails with "no buffer space available";"

- check for bad cables
- check for bad NIC
- check for kern.ipc.nmbclusters and kern.maxusers
- check for net.inet.tcp.recvbuf_max and net.inet.tcp.sendbuf_max
- check for lost default route

tbh, I suspect a hardware/driver related problem.
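
The software side of this checklist can be inspected from a shell on the firewall. A minimal sketch, assuming the FreeBSD `sysctl` and `netstat` tools; the function names `show_tunables` and `has_default_route` are invented for this example, and the values printed are whatever the system reports, not recommendations.

```shell
#!/bin/sh
# Hypothetical helper: print the tunables named in the checklist above.
show_tunables() {
    for name in kern.ipc.nmbclusters kern.maxusers \
                net.inet.tcp.recvbuf_max net.inet.tcp.sendbuf_max; do
        value=$(sysctl -n "$name" 2>/dev/null) || value="(not available on this system)"
        printf '%s = %s\n' "$name" "$value"
    done
}

# Hypothetical helper: succeed if the routing table on stdin has a default route.
# Intended input: the output of `netstat -rn`.
has_default_route() {
    awk '$1 == "default" { found = 1 } END { exit !found }'
}
```

Usage could look like `show_tunables` followed by `netstat -rn | has_default_route || echo "default route is missing"`, run once while the WAN works and once during an outage.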
Title: Re: WAN with DHCP loosing internet access.
Post by: franco on April 27, 2016, 03:29:49 pm
Can we get the full unmodified log line of "no buffer space available" output?
Title: Re: WAN with DHCP loosing internet access.
Post by: Pedro on April 27, 2016, 05:36:08 pm
So, it failed again (making that 5 times today). This time, however, I managed to get a little more info. Also, I neglected to mention earlier that we're in temporary facilities and are "sharing" another network on a separate VLAN, so in essence we have:

partner network on vlan => WAN with DHCP => LAN

When internet fails, any traffic on LAN continues to work just fine.

After internet failed

Code:
root@gw:~ # netstat -m
781/1244/2025 mbufs in use (current/cache/total)
746/524/1270/26368 mbuf clusters in use (current/cache/total/max)
746/519 mbuf+clusters out of packet secondary zone in use (current/cache)
0/83/83/13184 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/3906 9k jumbo clusters in use (current/cache/total/max)
0/0/0/2197 16k jumbo clusters in use (current/cache/total/max)
1687K/1691K/3378K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/7/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile

Code:
root@gw:~ # ifconfig -a
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=82098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
        ether f8:1a:67:00:23:73
        inet 192.168.200.1 netmask 0xffffff00 broadcast 192.168.200.255
        inet6 fe80::fa1a:67ff:fe00:2373%re0 prefixlen 64 scopeid 0x1
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (none)
        status: no carrier
re1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=82098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
        ether 64:70:02:00:ef:4c
        inet 10.10.0.1 netmask 0xffff0000 broadcast 10.10.255.255
        inet6 fe80::6670:2ff:fe00:ef4c%re1 prefixlen 64 scopeid 0x2
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
vr0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=82808<VLAN_MTU,WOL_UCAST,WOL_MAGIC,LINKSTATE>
        ether 00:1b:fc:1e:62:1b
        inet6 fe80::21b:fcff:fe1e:621b%vr0 prefixlen 64 scopeid 0x3
        inet 192.168.4.20 netmask 0xffffff00 broadcast 192.168.4.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
pflog0: flags=100<PROMISC> metric 0 mtu 33184
pfsync0: flags=0<> metric 0 mtu 1500
        syncpeer: 0.0.0.0 maxupd: 128 defer: off
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x6
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
enc0: flags=0<> metric 0 mtu 1536
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

Code:
root@gw:~ # ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
ping: sendto: No buffer space available
ping: sendto: No buffer space available
ping: sendto: No buffer space available
ping: sendto: No buffer space available
^C
--- 8.8.8.8 ping statistics ---
4 packets transmitted, 0 packets received, 100.0% packet loss
root@gw:~ # ping 192.168.4.1
PING 192.168.4.1 (192.168.4.1): 56 data bytes
ping: sendto: No buffer space available
ping: sendto: No buffer space available
ping: sendto: No buffer space available
^C
--- 192.168.4.1 ping statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss

After reboot

Code:
root@gw:~ # netstat -m
658/1367/2025 mbufs in use (current/cache/total)
640/630/1270/26368 mbuf clusters in use (current/cache/total/max)
640/625 mbuf+clusters out of packet secondary zone in use (current/cache)
0/27/27/13184 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/3906 9k jumbo clusters in use (current/cache/total/max)
0/0/0/2197 16k jumbo clusters in use (current/cache/total/max)
1444K/1709K/3154K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/6/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile

Code:
root@gw:~ # ifconfig -a
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=82098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
        ether f8:1a:67:00:23:73
        inet 192.168.200.1 netmask 0xffffff00 broadcast 192.168.200.255
        inet6 fe80::fa1a:67ff:fe00:2373%re0 prefixlen 64 scopeid 0x1
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (none)
        status: no carrier
re1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=82098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
        ether 64:70:02:00:ef:4c
        inet 10.10.0.1 netmask 0xffff0000 broadcast 10.10.255.255
        inet6 fe80::6670:2ff:fe00:ef4c%re1 prefixlen 64 scopeid 0x2
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
vr0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=82808<VLAN_MTU,WOL_UCAST,WOL_MAGIC,LINKSTATE>
        ether 00:1b:fc:1e:62:1b
        inet6 fe80::21b:fcff:fe1e:621b%vr0 prefixlen 64 scopeid 0x3
        inet 192.168.4.20 netmask 0xffffff00 broadcast 192.168.4.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
pflog0: flags=100<PROMISC> metric 0 mtu 33184
pfsync0: flags=0<> metric 0 mtu 1500
        syncpeer: 0.0.0.0 maxupd: 128 defer: off
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x6
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
enc0: flags=0<> metric 0 mtu 1536
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>


Code:
root@gw:~ # ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
ping: sendto: Operation not permitted
ping: sendto: Operation not permitted
64 bytes from 8.8.8.8: icmp_seq=2 ttl=56 time=124.951 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=56 time=110.339 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=56 time=120.774 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=56 time=88.388 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=56 time=202.519 ms
^C
--- 8.8.8.8 ping statistics ---
7 packets transmitted, 5 packets received, 28.6% packet loss
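
Since no pattern has been found yet, it may help to log these ping results in a compact form over time. The sketch below counts replies and the two `sendto` errors seen in this thread; `summarize_ping` is a hypothetical name, and the matched strings are taken from the FreeBSD ping output quoted above.

```shell
#!/bin/sh
# Hypothetical helper: condense ping output (stdin) into one summary line.
summarize_ping() {
    awk '
        /bytes from/                { replies++ }
        /No buffer space available/ { enobufs++ }
        /Operation not permitted/   { eperm++ }
        END { printf "replies=%d enobufs=%d eperm=%d\n", replies, enobufs, eperm }
    '
}
```

A possible invocation is `ping -c 5 192.168.4.1 2>&1 | summarize_ping`; the `2>&1` matters because ping writes the `sendto` errors to standard error. As far as I know, "Operation not permitted" usually means a local packet filter rejected the packet, which would fit the first two pings right after the reboot, but that is an assumption and not something the logs here confirm.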

As far as system log goes, I saw this and am wondering if it could be related somehow:
Code:
Apr 27 14:13:43 gw kernel: warning: total configured swap (4194304 pages) exceeds maximum recommended amount (2097312 pages).
Apr 27 14:13:43 gw kernel: warning: increase kern.maxswzone or reduce amount of swap.


@Zeitkind:
We've tested with various cables already, but I'll test with another NIC as soon as I get my hands on one. I'll also look into tunables and adjust kern.ipc.nmbclusters and kern.maxusers. Any pointers as to which would be best? As far as default route goes, I do see this in the logs. Just can't quite be sure if it's the cause of the fault or part of startup/restart:

Code:
Apr 27 15:50:07 gw opnsense: /usr/local/etc/rc.bootup: ROUTING: remove current default route to 192.168.4.1
Apr 27 15:50:07 gw opnsense: /usr/local/etc/rc.bootup: ROUTING: setting default route to 192.168.4.1
Apr 27 15:50:07 gw dhcpleases: kqueue error: unkown


Once again, thanks for all the help, really appreciate it.
Title: Re: WAN with DHCP loosing internet access.
Post by: franco on April 27, 2016, 05:51:25 pm
At first glance, looking at the good old docs from the parent project:

https://doc.pfsense.org/index.php/No_buffer_space_available

That's a good checklist to go through. What Zeitkind said is likely true... vr(4) and re(4) are not the best drivers.

What device is this?
Title: Re: WAN with DHCP loosing internet access.
Post by: DFink on June 28, 2016, 03:52:20 am
I ran into a similar situation testing OPNsense in a Hyper-V VM. Running an iperf3 client with multiple parallel streams (-P option) causes traffic to stop flowing through the OPNsense VM. Trying to ping any IP address through the outside interface returns "ping: sendto: No buffer space available". Bringing the interface down and up restores connectivity. I was running into the same issue with an earlier version of OPNsense on a physical box (Intel N3150 CPU, dual RTL8111/8168/8411, 4GB RAM).
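
The down/up workaround mentioned here can be wrapped in a small function so it is easy to try before resorting to a reboot. This is a sketch under assumptions: `bounce_interface` and the `DRY_RUN` variable are invented for the example, it must run as root, and on a DHCP WAN the lease may still need to be renewed afterwards.

```shell
#!/bin/sh
# Hypothetical helper: take an interface down and bring it back up.
# Usage: bounce_interface IFNAME [DELAY_SECONDS]
# Set DRY_RUN=1 to print the commands instead of running them.
bounce_interface() {
    [ -n "${1:-}" ] || { echo "usage: bounce_interface IFNAME [DELAY]" >&2; return 2; }
    ifname=$1
    delay=${2:-2}
    run=""
    [ -n "${DRY_RUN:-}" ] && run="echo"
    $run ifconfig "$ifname" down || return 1
    sleep "$delay"    # give the driver a moment before re-enabling the link
    $run ifconfig "$ifname" up
}
```

For example, `DRY_RUN=1 bounce_interface vr0` only prints the two `ifconfig` commands, while `bounce_interface vr0` actually runs them. The interface name `vr0` is taken from the ifconfig output earlier in this thread.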
Title: Re: WAN with DHCP loosing internet access.
Post by: Julien on June 30, 2016, 11:13:20 pm
I had the same issue the other day with my VM on ESXi.
The VM had 2 GB of memory, and after I enabled the proxy server I lost connectivity with the firewall.
I added some CPU and memory to the firewall and, voilà, the issue was gone.
What I'm trying to say is: make sure your firewall has enough memory.