OPNsense Forum

Archive => 17.1 Legacy Series => Topic started by: Mr.Goodcat on February 13, 2017, 07:45:12 pm

Title: Loss of internet access and OPNSense reachability
Post by: Mr.Goodcat on February 13, 2017, 07:45:12 pm
Hi,

a strange, new problem has reared its ugly head  :-\
Every few hours the connection to the internet through the OPNSense Box is severed and the Box becomes unreachable from the LAN, i.e. doesn't load the GUI and doesn't answer pings. Devices on the LAN-bridge however can still ping each other. From the VGA Console on the Box itself it is possible to ping out to internet but not any LAN client. Oddly enough DHCP seams to work correctly and sets DNS and Gateway appropriately if any new LAN clients are startet.

Two options to get the setup woking again have been identified: restarting OPNSense, or switching one of the previously inactive LAN ports on (e.g. via starting an attached switch).

From the logs these lock-ups always occur after the following WAN-side event (note: re0 is the NIC towards the WAN):

Feb. 12 18:14:32   opnsense: /usr/local/etc/rc.newwanip: ROUTING: setting IPv4 default route to X.Y.116.1
Feb. 12 18:14:31   opnsense: /usr/local/etc/rc.newwanip: rc.newwanip: on (IP address: X.Y.119.64) (interface: WAN[wan]) (real interface: re0).
Feb. 12 18:14:31   opnsense: /usr/local/etc/rc.newwanip: rc.newwanip: Informational is starting re0.

I tried changing the default setting for gateway monitoring but that didn't help. Actually I'm not quite sure if the observed behaviour also existed under 17.1RC from which an update was performed. Do you have any idea what could be the problem/solution? I suspect the WAN's DHCP lease but that shouldn't affect the ability to ping the OPNSense box from LAN. In such a case the problem should also occur in fixed intervals, but sometimes it takes ~24h and in other instances just a few hours. A case of PEBKAC is always a possibility as well but I'm at a loss regardless.


Details regarding my setup:


Thanks and kind regards,
Fabian
Title: Re: Loss of internet access and OPNSense reachability
Post by: markus on February 13, 2017, 08:54:41 pm
Hi Fabian,

I just updated to 17.1.1 and I am also stuck with the problem, that I can not ping the LAN address from my LAN clients. Everything else seems to be working alright, though.

Did you find any solution to this, yet?

Cheers,
Markus
Title: Re: Loss of internet access and OPNSense reachability
Post by: Mr.Goodcat on February 14, 2017, 05:14:58 pm
Hi Markus,

unfortunately there are no advancements to report. So far I tried to reconfigure most settings and fiddled with the gateway config. Rebooting the modem and OPNSense also didn't help. Maybe the problem is related to OPNSense locking up completely (i.e. https://forum.opnsense.org/index.php?topic=4414.0 (https://forum.opnsense.org/index.php?topic=4414.0))?

In any case, having to reboot OPNSense multiple times per day to get back out to WAN isn't an acceptable workaround. If all else fails I'll try 16.x and see if that works.

Does your hardware setup have something in common with mine? Just to make sure this doesn't come from some FreeBSD 11.0 driver issue.

Cheers,
Fabian

Title: Re: Loss of internet access and OPNSense reachability
Post by: djGrrr on February 14, 2017, 06:05:54 pm
Please try out a kernel we are testing with a new Realtek NIC driver and see if this makes any difference:

From console/shell as root:

# opnsense-update -kr 17.1.1-re
# /usr/local/etc/rc.reboot

Title: Re: Loss of internet access and OPNSense reachability
Post by: Mr.Goodcat on February 14, 2017, 10:41:06 pm
Please try out a kernel we are testing with a new Realtek NIC driver and see if this makes any difference:

From console/shell as root:

# opnsense-update -kr 17.1.1-re
# /usr/local/etc/rc.reboot

Hi djGrrr,

thank you for your help! I've just performed the update and the system was up and running again in no time. So far so good.

From the logs I also saw a few other bits that might help to illuminate the situation.

The part below keeps repeating but that shouldn't affect my main problem of losing the path out to WAN and OPNSense itself:
Code: [Select]
apinger: Error while feeding rrdtool: Broken pipe
apinger: rrdtool respawning too fast, waiting 300s.


Throughout the day the part below occurs multiple times as well. What's strange about this is that opt1 (i.e. the chelsio cxgbe1 NIC) is connected to a switch which was powered down during the day. So the port shouldn't come up or down but simply keep its off state. As you can see DynDNS is also mentioned in the log though it was never set up in the first place.
Code: [Select]
Feb. 14 17:26:36 configd.py: [5747513c-e9e6-4c9d-ab51-c7797f363eb3] updating dyndns opt1
Feb. 14 17:26:31 kernel: cxgbe0: cxgbe_media_change unimplemented.
Feb. 14 17:26:30 opnsense: /usr/local/etc/rc.linkup: HOTPLUG: Configuring interface opt1
Feb. 14 17:26:30 opnsense: /usr/local/etc/rc.linkup: DEVD Ethernet attached event for opt1
Feb. 14 17:26:30 configd.py: [5d214b1c-b327-480c-8ca2-db2090fb041e] Linkup starting cxgbe0
Feb. 14 17:26:30 kernel: cxgbe0: link state changed to UP
Feb. 14 17:26:30 opnsense: /usr/local/etc/rc.linkup: DEVD Ethernet detached event for opt1
Feb. 14 17:26:30 configd.py: [b24b7a30-bb2c-48b5-b10c-dcdc04d9c060] Linkup stopping cxgbe0
Feb. 14 17:26:30 kernel: cxgbe0: link state changed to DOWN


Last but not least something regarding my original observeration has changed in the night:
Code: [Select]
Feb. 14 04:40:01 opnsense: /usr/local/etc/rc.newwanip: The command '/sbin/route delete -inet '176.199.116.1' -interface 're0'' returned exit code '1', the output was 'route: route has not been found delete host 176.199.116.1: gateway re0 fib 0: not in table'
Feb. 14 04:40:01 opnsense: /usr/local/etc/rc.newwanip: ROUTING: setting IPv4 default route to 176.199.116.1
Feb. 14 04:40:00 opnsense: /usr/local/etc/rc.newwanip: rc.newwanip: on (IP address: 176.199.119.64) (interface: WAN[wan]) (real interface: re0).
Feb. 14 04:40:00 opnsense: /usr/local/etc/rc.newwanip: rc.newwanip: Informational is starting re0.

Previously that was the part after which connectivity to the WAN and OPNSense itself broke off. Either it "fixed" itself after some time during which no one noticed it broke, or me reselecting a few settings triggered something. Anyhow I'll keep monitoring it and see if it's fixed with the new kernel. Thanks again!


Cheers,
Fabian
Title: Re: Loss of internet access and OPNSense reachability
Post by: Mr.Goodcat on February 15, 2017, 10:19:24 am
Hi,

new day, new problem.  :(
In the morging my connection to WAN was down again, so I did the usual trick and powered on the switch attached to one of the LAN ports. Only this time that didn't help to revive OPNSense. Thus I logged into the VGA concole and tried to ping out to Internet and LAN but neither worked. Upon reboot: no change. Next step was to reboot the entire modem->OPNSense part of the network and reload settings, all of which didn't help either. Interestingly the LAN bridge and DHCP didn't do their jobs as well, though OPNSense reported all services as running. However during boot it took considerably longer for WAN, DHCP and DNS to come up.

As I haven't had the time to look at the logs from console, the setup is back to OpenWRT on an old router for now. After the kernel update yesterday everything worked smoothly at first, so I'm not sure what could have gone that wrong during the night. Perhaps my cron job which looks for updates broke someting?

Cheers,
Fabian
Title: Re: Loss of internet access and OPNSense reachability
Post by: markus on February 16, 2017, 10:00:37 am
Hi Fabian,

sorry for my laggy response. I didn't realize that I need to enable e-mail notification per thread in the forum.

Maybe the problem is related to OPNSense locking up completely (i.e. https://forum.opnsense.org/index.php?topic=4414.0 (https://forum.opnsense.org/index.php?topic=4414.0))?

It did that once, pretty fast after the reboot that took place after the update to 17.1.1. I was in the middle of configuring the IDS when I wondered why the box became unresponsive.

In any case, having to reboot OPNSense multiple times per day to get back out to WAN isn't an acceptable workaround. If all else fails I'll try 16.x and see if that works.

Funny thing was, 17.1 did not issue any problems...

Does your hardware setup have something in common with mine? Just to make sure this doesn't come from some FreeBSD 11.0 driver issue.

Not that much. It runs on a Lex Barebone with em0 (LAN) and igb0-igb4 (WAN1, WAN2, GUEST) network devices. So I'd rule out a driver problem since we use different vendors.

Will perform a clean 17.1.1 install this weekend to ensure the update process didn't mess things up.

Cheers,
Markus
Title: Re: Loss of internet access and OPNSense reachability
Post by: Mr.Goodcat on February 25, 2017, 03:29:19 pm
Hi,

sorry for the delay!
I reinstalled the box and am now on 17.1.2. However the issue with being able to ping out to WAN from the box itself but not to or from LAN still occures. This is always after "opnsense: /usr/local/etc/rc.newwanip" is shown by the logs, which I assume is the renewed IP lease from my DOCSIS modem. I also checked the dhcpd log and believe the cause for my problems comes from routes being broken once the WAN DHCP client renews its IP.

Here is the relevant part of the system log:
Code: [Select]
Feb 25 14:05:33 opnsense: /usr/local/etc/rc.newwanip: ROUTING: setting IPv4 default route to 37.201.232.1
Feb 25 14:05:32 opnsense: /usr/local/etc/rc.newwanip: rc.newwanip: on (IP address: 37.201.235.74) (interface: WAN[wan]) (real interface: re0).
Feb 25 14:05:32 opnsense: /usr/local/etc/rc.newwanip: rc.newwanip: Informational is starting re0.

And here the corresponding part of the dhcpd log:
Code: [Select]
Feb 25 14:05:29 OPNSENSEBOX dhclient: EXPIRE
Feb 25 14:05:29 OPNSENSEBOX dhclient: Starting delete_old_states()
Feb 25 14:05:29 OPNSENSEBOX dhclient: Comparing IPs: Old: 37.201.235.74 New:
Feb 25 14:05:29 OPNSENSEBOX dhclient: Removing states from old IP '37.201.235.74' (new IP '')
Feb 25 14:05:30 OPNSENSEBOX dhclient: Deleting old routes
Feb 25 14:05:30 OPNSENSEBOX dhclient: PREINIT
Feb 25 14:05:30 OPNSENSEBOX dhclient: Starting delete_old_states()
Feb 25 14:05:30 OPNSENSEBOX dhclient[65959]: DHCPDISCOVER on re0 to 255.255.255.255 port 67 interval 2
Feb 25 14:05:30 OPNSENSEBOX dhclient[65959]: DHCPOFFER from 37.201.232.1
Feb 25 14:05:30 OPNSENSEBOX dhclient: ARPSEND
Feb 25 14:05:32 OPNSENSEBOX dhclient: ARPCHECK
Feb 25 14:05:32 OPNSENSEBOX dhclient[65959]: DHCPREQUEST on re0 to 255.255.255.255 port 67
Feb 25 14:05:32 OPNSENSEBOX dhclient[65959]: DHCPACK from 37.201.232.1
Feb 25 14:05:32 OPNSENSEBOX dhclient: BOUND
Feb 25 14:05:32 OPNSENSEBOX dhclient: Starting add_new_address()
Feb 25 14:05:32 OPNSENSEBOX dhclient: ifconfig re0 inet 37.201.235.74 netmask 255.255.248.0 broadcast 255.255.255.255
Feb 25 14:05:32 OPNSENSEBOX dhclient: New IP Address (re0): 37.201.235.74
Feb 25 14:05:32 OPNSENSEBOX dhclient: New Subnet Mask (re0): 255.255.248.0
Feb 25 14:05:32 OPNSENSEBOX dhclient: New Broadcast Address (re0): 255.255.255.255
Feb 25 14:05:32 OPNSENSEBOX dhclient: New Routers (re0): 37.201.232.1
Feb 25 14:05:32 OPNSENSEBOX dhclient: Adding new routes to interface: re0
Feb 25 14:05:32 OPNSENSEBOX dhclient: /sbin/route add default 37.201.232.1
Feb 25 14:05:32 OPNSENSEBOX dhclient: Creating resolv.conf
Feb 25 14:05:35 OPNSENSEBOX dhclient[65959]: bound to 37.201.235.74 -- renewal in 1800 seconds.


Feb 25 14:15:08 OPNSENSEBOX dhcpd: send_packet: Invalid argument
Feb 25 14:15:08 OPNSENSEBOX dhcpd: dhcp.c:1699: Failed to send 300 byte long packet over fallback interface.

After the WAN ip renew at 14:05:xx pings to LAN result in this output: ping: sendto: invalid argument.
After reloading all services the connection out to WAN works again.

This seems to be related to a previously reported bug: https://github.com/opnsense/core/issues/190

Franco, could you or someone else with knowledge of the inner workings of OPNSense perhaps have a look at this?

Best,
Fabian
Title: Re: Loss of internet access and OPNSense reachability
Post by: Mr.Goodcat on February 26, 2017, 05:14:46 pm
Hi,

I've got some more data on this issue.
First of all there are further, similar reports on this, also for Pfsense:
https://forum.pfsense.org/index.php?topic=80478.0 (https://forum.pfsense.org/index.php?topic=80478.0)
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1551351 (https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1551351)
https://forum.opnsense.org/index.php?topic=1242.0 (https://forum.opnsense.org/index.php?topic=1242.0)

It seems to be an issue that mostly occurs with cable (i.e. DOCSIS) modems for which the WAN IP is being supplied by the ISP DHCP. Fixing the speed of the WAN NIC seemed to have worked for at least one person but  didn't help in my case.

My setup is as follows: cable to ISP <-> Cisco EPC3212 cable modem <-> OPNsense box WAN (realtek NIC)  <-> OPNSense box LAN_bridge (Intel i350-T4, Chelsio T420-CR, Realtek) <-> LAN

Once again the issue came up, only this time I was logging in on the OPNsense box and could watch how it failed to renew the WAN IP after its lease expired. However its not entirely clear to me why this happens. Typically the WAN IP lease is valid for 1h, thus OPNsense tries to renew it after 1800s. Most of the time this works, but an answer from the ISP DHCP server (DHCPACK from 10.207.0.1) is only received roughly 5min ahead of the lease expiration.
Now below you can see how it successfully renews at 14:39:45. Shortly thereafter something fails.

Code: [Select]
Feb 26 15:18:44 dhcpd: DHCPACK on 192.168.0.2 to 00:05:1e:... via bridge0
Feb 26 15:18:44 dhcpd: DHCPREQUEST for 192.168.0.2 from 00:05:1e:... via bridge0
Feb 26 15:16:40 dhcpd: DHCPACK on 192.168.0.83 to bc:79:ad:... via bridge0
Feb 26 15:16:40 dhcpd: DHCPREQUEST for 192.168.0.83 from bc:79:ad:... via bridge0
Feb 26 15:14:52 dhcpd: DHCPACK on 192.168.0.131 to 10:4f:a8:... via bridge0
Feb 26 15:14:52 dhcpd: DHCPREQUEST for 192.168.0.131 from 10:4f:a8:... via bridge0
Feb 26 15:11:54 dhcpd: DHCPACK to 192.168.0.7 (00:1a:4d:...) via bridge0
Feb 26 15:11:54 dhcpd: DHCPINFORM from 192.168.0.7 via bridge0
Feb 26 15:01:53 dhcpd: DHCPACK to 192.168.0.7 (00:1a:4d:...) via bridge0
Feb 26 15:01:53 dhcpd: DHCPINFORM from 192.168.0.7 via bridge0
Feb 26 14:51:52 dhcpd: DHCPACK to 192.168.0.7 (00:1a:4d:...) via bridge0
Feb 26 14:51:52 dhcpd: DHCPINFORM from 192.168.0.7 via bridge0
Feb 26 14:41:51 dhcpd: DHCPACK to 192.168.0.7 (00:1a:4d:...) via bridge0
Feb 26 14:41:51 dhcpd: DHCPINFORM from 192.168.0.7 via bridge0
Feb 26 14:39:50 dhcpd: exiting.
Feb 26 14:39:50 dhcpd:
Feb 26 14:39:50 dhcpd: process and the information we find helpful for debugging..
Feb 26 14:39:50 dhcpd: before submitting a bug. These pages explain the proper
Feb 26 14:39:50 dhcpd: bugs on either our web page at www.isc.org or in the README file
Feb 26 14:39:50 dhcpd: than a configuration issue please read the section on submitting
Feb 26 14:39:50 dhcpd: If you think you have received this message due to a bug rather
Feb 26 14:39:50 dhcpd:
Feb 26 14:39:50 dhcpd: includes a bootp server.
Feb 26 14:39:50 dhcpd: are not running HP JetAdmin software, which
Feb 26 14:39:50 dhcpd: bootp in /etc/inetd.conf. Also make sure you
Feb 26 14:39:50 dhcpd: running and that there's no entry for dhcp or
Feb 26 14:39:50 dhcpd: Please make sure there is no other dhcp server
Feb 26 14:39:50 dhcpd: Can't bind to dhcp address: Address already in use
Feb 26 14:39:50 dhcpd: Sending on BPF/bridge0/02:be:7f:.../192.168.0.0/24
Feb 26 14:39:50 dhcpd: Listening on BPF/bridge0/02:be:7f:.../192.168.0.0/24
Feb 26 14:39:50 dhcpd: Wrote 1 leases to leases file.
Feb 26 14:39:50 dhcpd: Wrote 0 new dynamic host decls to leases file.
Feb 26 14:39:50 dhcpd: Wrote 0 deleted host decls to leases file.
Feb 26 14:39:50 dhcpd: For info, please visit https://www.isc.org/software/dhcp/
Feb 26 14:39:50 dhcpd: All rights reserved.
Feb 26 14:39:50 dhcpd: Copyright 2004-2016 Internet Systems Consortium.
Feb 26 14:39:50 dhcpd: PID file: /var/run/dhcpd.pid
Feb 26 14:39:50 dhcpd: Internet Systems Consortium DHCP Server 4.3.5
Feb 26 14:39:50 dhcpd: Database file: /var/db/dhcpd.leases
Feb 26 14:39:50 dhcpd: Config file: /etc/dhcpd.conf
Feb 26 14:39:50 dhcpd: For info, please visit https://www.isc.org/software/dhcp/
Feb 26 14:39:50 dhcpd: All rights reserved.
Feb 26 14:39:50 dhcpd: Copyright 2004-2016 Internet Systems Consortium.
Feb 26 14:39:50 dhcpd: Internet Systems Consortium DHCP Server 4.3.5
Feb 26 14:39:50 dhcpd: Server starting service.
Feb 26 14:39:50 dhcpd: Sending on Socket/fallback/fallback-net
Feb 26 14:39:50 dhcpd: Sending on BPF/bridge0/02:be:7f:.../192.168.0.0/24
Feb 26 14:39:50 dhcpd: Listening on BPF/bridge0/02:be:7f:.../192.168.0.0/24
Feb 26 14:39:50 dhcpd: Wrote 1 leases to leases file.
Feb 26 14:39:50 dhcpd: Wrote 0 new dynamic host decls to leases file.
Feb 26 14:39:50 dhcpd: Wrote 0 deleted host decls to leases file.
Feb 26 14:39:50 dhcpd: For info, please visit https://www.isc.org/software/dhcp/
Feb 26 14:39:50 dhcpd: All rights reserved.
Feb 26 14:39:50 dhcpd: Copyright 2004-2016 Internet Systems Consortium.
Feb 26 14:39:50 dhcpd: PID file: /var/run/dhcpd.pid
Feb 26 14:39:50 dhcpd: Internet Systems Consortium DHCP Server 4.3.5
Feb 26 14:39:50 dhcpd: Database file: /var/db/dhcpd.leases
Feb 26 14:39:50 dhcpd: Config file: /etc/dhcpd.conf
Feb 26 14:39:50 dhcpd: For info, please visit https://www.isc.org/software/dhcp/
Feb 26 14:39:50 dhcpd: All rights reserved.
Feb 26 14:39:50 dhcpd: Copyright 2004-2016 Internet Systems Consortium.
Feb 26 14:39:50 dhcpd: Internet Systems Consortium DHCP Server 4.3.5
Feb 26 14:39:47 dhclient[56146]: exiting.
Feb 26 14:39:47 dhclient[56146]: short write: wanted 20 got 0 bytes
Feb 26 14:39:46 dhclient[6545]: exiting.
Feb 26 14:39:46 dhclient[6545]: dhclient already running, pid: 43446.
Feb 26 14:39:45 dhclient: Creating resolv.conf
Feb 26 14:39:45 dhclient: /sbin/route add default 37.201.232.1
Feb 26 14:39:45 dhclient: Adding new routes to interface: re0
Feb 26 14:39:45 dhclient: New Routers (re0): 37.201.232.1
Feb 26 14:39:45 dhclient: New Broadcast Address (re0): 255.255.255.255
Feb 26 14:39:45 dhclient: New Subnet Mask (re0): 255.255.248.0
Feb 26 14:39:45 dhclient: New IP Address (re0): 37.201.235.74
Feb 26 14:39:45 dhclient: ifconfig re0 inet 37.201.235.74 netmask 255.255.248.0 broadcast 255.255.255.255
Feb 26 14:39:45 dhclient: Starting add_new_address()
Feb 26 14:39:45 dhclient: Comparing IPs: Old: 37.201.235.74 New: 37.201.235.74
Feb 26 14:39:45 dhclient: Starting delete_old_states()
Feb 26 14:39:45 dhclient: REBOOT
Feb 26 14:39:45 dhclient[43446]: DHCPACK from 10.207.0.1
Feb 26 14:39:44 dhclient[43446]: DHCPREQUEST on re0 to 255.255.255.255 port 67
Feb 26 14:39:44 dhclient: Removing states from old IP '37.201.235.74' (new IP '')
Feb 26 14:39:44 dhclient: Comparing IPs: Old: 37.201.235.74 New:
Feb 26 14:39:44 dhclient: Starting delete_old_states()
Feb 26 14:39:44 dhclient: PREINIT
Feb 26 14:39:39 dhclient[55265]: exiting.
Feb 26 14:39:39 dhclient[55265]: connection closed
Feb 26 14:31:50 dhcpd: DHCPACK to 192.168.0.7 (00:1a:4d:...) via bridge0
Feb 26 14:31:50 dhcpd: DHCPINFORM from 192.168.0.7 via bridge0
Feb 26 14:30:42 dhcpd: DHCPACK on 192.168.0.7 to 00:1a:4d:... via bridge0
Feb 26 14:30:42 dhcpd: DHCPREQUEST for 192.168.0.7 from 00:1a:4d:... via bridge0
Feb 26 14:24:47 dhclient[82719]: bound to 37.201.235.74 -- renewal in 1800 seconds.
Feb 26 14:24:47 dhclient: Creating resolv.conf
Feb 26 14:24:47 dhclient: RENEW
Feb 26 14:24:47 dhclient[82719]: DHCPACK from 10.207.0.1
Feb 26 14:24:47 dhclient[82719]: DHCPREQUEST on re0 to 255.255.255.255 port 67
Feb 26 14:22:15 dhclient[82719]: DHCPREQUEST on re0 to 10.207.0.1 port 67

At about the same time the DHCP log shows the folling:
Code: [Select]
Feb 26 14:39:51 configd.py: [ba6f6539-179b-4f6b-a0d4-195529956dae] Reloading filter
Feb 26 14:39:50 configd.py: [fb4e0fe1-8ce5-4864-a0f7-18707cf67ec8] updating dyndns wan
Feb 26 14:39:50 opnsense: /interfaces.php: The command '/usr/local/sbin/dhcpd -user dhcpd -group dhcpd -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf /var/run/dhcpd.pid bridge0' returned exit code '1', the output was 'Internet Systems Consortium DHCP Server 4.3.5 Copyright 2004-2016 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Config file: /etc/dhcpd.conf Database file: /var/db/dhcpd.leases PID file: /var/run/dhcpd.pid Wrote 0 deleted host decls to leases file. Wrote 0 new dynamic host decls to leases file. Wrote 1 leases to leases file. Listening on BPF/bridge0/02:be:7f:.../192.168.0.0/24 Sending on BPF/bridge0/02:be:7f:.../192.168.0.0/24 Can't bind to dhcp address: Address already in use Please make sure there is no other dhcp server running and that there's no entry for dhcp or bootp in /etc/inetd.conf. Also make sure you are not running HP JetAdmin software, which includes a bootp server. If you think you have received this mess
Feb 26 14:39:50 configd.py: [c08b669f-2633-4c4e-8777-5bb61c34fcb5] updating dyndns wan
Feb 26 14:39:50 opnsense: /interfaces.php: The command '/usr/local/sbin/unbound -c '/var/unbound/unbound.conf'' returned exit code '1', the output was '[1488116390] unbound[25890:0] error: bind: address already in use [1488116390] unbound[25890:0] fatal error: could not open ports'
Feb 26 14:39:50 opnsense: /usr/local/etc/rc.linkup: The command '/usr/local/opnsense/scripts/dns/unbound_dhcpd.py /domain 'lan'' returned exit code '1', the output was 'Unable to lock on the pidfile.'
Feb 26 14:39:46 opnsense: /usr/local/etc/rc.linkup: ROUTING: setting IPv4 default route to 37.201.232.1
Feb 26 14:39:46 opnsense: /usr/local/etc/rc.linkup: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf re0 > /tmp/re0_output 2> /tmp/re0_error_output' returned exit code '1', the output was ''
Feb 26 14:39:46 opnsense: /usr/local/etc/rc.newwanip: ROUTING: setting IPv4 default route to 37.201.232.1
Feb 26 14:39:45 opnsense: /usr/local/etc/rc.newwanip: rc.newwanip: on (IP address: 37.201.235.74) (interface: WAN[wan]) (real interface: re0).
Feb 26 14:39:45 opnsense: /usr/local/etc/rc.newwanip: rc.newwanip: Informational is starting re0.
Feb 26 14:39:45 opnsense: /interfaces.php: ROUTING: setting IPv4 default route to 37.201.232.1
Feb 26 14:39:45 opnsense: /interfaces.php: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf re0 > /tmp/re0_output 2> /tmp/re0_error_output' returned exit code '15', the output was ''
Feb 26 14:39:45 opnsense: /usr/local/etc/rc.linkup: HOTPLUG: Configuring interface wan
Feb 26 14:39:45 opnsense: /usr/local/etc/rc.linkup: DEVD Ethernet attached event for wan
Feb 26 14:39:44 configd.py: [6b1d92bf-71f2-442e-b44e-757e90a10a5f] Linkup starting re0
Feb 26 14:39:44 kernel: re0: link state changed to UP
Feb 26 14:39:41 opnsense: /usr/local/etc/rc.linkup: DEVD Ethernet detached event for wan
Feb 26 14:39:41 configd.py: [9b8cfb08-73a0-4c73-b3cb-93f46887e9ff] Linkup stopping re0
Feb 26 14:39:41 kernel: re0: link state changed to DOWN
Feb 26 14:39:40 opnsense: /interfaces.php: Clearing states to old gateway 37.201.232.1.

1h after these error messages showed up, the WAN lease expired. Thus first the ability to reach the WAN broke at around 15:40 and /var/run/dhclient.re0.pid was gone. I tried to get a new IP by selecting "renew" for the realtek WAN interface from the GUI (which surprisingly was still reachable). Directly after the whole LAN bridge broke down and the box wasn't reachable anymore as well as other deviced on the bridge. Therefore I logged in locally and saw that the ability to go out to WAN was restored and only the LAN bridge didn't work. Next I reloaded all services and everything worked once more. But this was just true for a few minutes, so I issued an reboot directly from the box. As it came back up again only two (!) interfaces showed up so nothing worked. Thus I reloaded the last config and all except one interface (the second realtek NIC, dmesg -> re1: unknown device) came up. The LAN IP had to be reassigned to the realtek WAN interface as it somehow was attached to the wrong device. Another reboot and for the moment the setup is up and running again...

Interestingly the WAN IP has changed altough the modem hasn't been reset. Normally the WAN IP changes perhaps once a month.

If its any help here is the output from /etc/inetd.conf, as this file was mentioned in one of the logs as something that should be checked.
Code: [Select]
# $FreeBSD$
#
# Internet server configuration database
#
# Define *both* IPv4 and IPv6 entries for dual-stack support.
# To disable a service, comment it out by prefixing the line with '#'.
# To enable a service, remove the '#' at the beginning of the line.
#
#ftp    stream  tcp     nowait  root    /usr/libexec/ftpd       ftpd -l
#ftp    stream  tcp6    nowait  root    /usr/libexec/ftpd       ftpd -l
#ssh    stream  tcp     nowait  root    /usr/sbin/sshd          sshd -i -4
#ssh    stream  tcp6    nowait  root    /usr/sbin/sshd          sshd -i -6
#telnet stream  tcp     nowait  root    /usr/libexec/telnetd    telnetd
#telnet stream  tcp6    nowait  root    /usr/libexec/telnetd    telnetd
#shell  stream  tcp     nowait  root    /usr/libexec/rshd       rshd
#shell  stream  tcp6    nowait  root    /usr/libexec/rshd       rshd
#login  stream  tcp     nowait  root    /usr/libexec/rlogind    rlogind
#login  stream  tcp6    nowait  root    /usr/libexec/rlogind    rlogind
#finger stream  tcp     nowait/3/10 nobody /usr/libexec/fingerd fingerd -k -s
#finger stream  tcp6    nowait/3/10 nobody /usr/libexec/fingerd fingerd -k -s
#
# run comsat as root to be able to print partial mailbox contents w/ biff,
# or use the safer tty:tty to just print that new mail has been received.
#comsat dgram   udp     wait    tty:tty /usr/libexec/comsat     comsat
#
# ntalk is required for the 'talk' utility to work correctly
#ntalk  dgram   udp     wait    tty:tty /usr/libexec/ntalkd     ntalkd
#tftp   dgram   udp     wait    root    /usr/libexec/tftpd      tftpd -l -s /tftpboot
#tftp   dgram   udp6    wait    root    /usr/libexec/tftpd      tftpd -l -s /tftpboot
#bootps dgram   udp     wait    root    /usr/libexec/bootpd     bootpd
#
# "Small servers" -- used to be standard on, but we're more conservative
# about things due to Internet security concerns.  Only turn on what you
# need.
#
#daytime stream tcp     nowait  root    internal
#daytime stream tcp6    nowait  root    internal
#daytime dgram  udp     wait    root    internal
#daytime dgram  udp6    wait    root    internal
#time   stream  tcp     nowait  root    internal
#time   stream  tcp6    nowait  root    internal
#time    dgram  udp     wait    root    internal
#time    dgram  udp6    wait    root    internal
#echo   stream  tcp     nowait  root    internal
#echo   stream  tcp6    nowait  root    internal
#echo   dgram   udp     wait    root    internal
#echo   dgram   udp6    wait    root    internal
#discard stream tcp     nowait  root    internal
#discard stream tcp6    nowait  root    internal
#discard dgram  udp     wait    root    internal
#discard dgram  udp6    wait    root    internal
#chargen stream tcp     nowait  root    internal
#chargen stream tcp6    nowait  root    internal
#chargen dgram  udp     wait    root    internal
#chargen dgram  udp6    wait    root    internal
#
# CVS servers - for master CVS repositories only!  You must set the
# --allow-root path correctly or you open a trivial to exploit but
# deadly security hole.
#
#cvspserver     stream  tcp     nowait  root    /usr/local/bin/cvs      cvs --allow-root=/your/cvsroot/here pserver
#cvspserver     stream  tcp     nowait  root    /usr/local/bin/cvs      cvs --allow-root=/your/cvsroot/here kserver
#
# RPC based services (you MUST have rpcbind running to use these)
#
#rstatd/1-3     dgram rpc/udp wait root /usr/libexec/rpc.rstatd  rpc.rstatd
#rusersd/1-2    dgram rpc/udp wait root /usr/libexec/rpc.rusersd rpc.rusersd
#walld/1        dgram rpc/udp wait root /usr/libexec/rpc.rwalld  rpc.rwalld
#pcnfsd/1-2     dgram rpc/udp wait root /usr/local/libexec/rpc.pcnfsd    rpc.pcnfsd
#rquotad/1      dgram rpc/udp wait root /usr/libexec/rpc.rquotad rpc.rquotad
#rquotad/1      dgram rpc/udp6 wait root        /usr/libexec/rpc.rquotad rpc.rquotad
#sprayd/1       dgram rpc/udp wait root /usr/libexec/rpc.sprayd  rpc.sprayd
#
# example entry for the optional pop3 server
#
#pop3   stream  tcp     nowait  root    /usr/local/libexec/popper       popper
#
# example entry for the optional imap4 server
#
#imap4  stream  tcp     nowait  root    /usr/local/libexec/imapd        imapd
#
# example entry for the optional nntp server
#
#nntp   stream  tcp     nowait  news    /usr/local/libexec/nntpd        nntpd
#
# example entry for the optional uucpd server
#
#uucpd  stream  tcp     nowait  root    /usr/local/libexec/uucpd        uucpd
#
# Return error for all "ident" requests
#
#auth   stream  tcp     nowait  root    internal
#auth   stream  tcp6    nowait  root    internal
#
# Provide internally a real "ident" service which provides ~/.fakeid support,
# provides ~/.noident support, reports UNKNOWN as the operating system type
# and times out after 30 seconds.
#
#auth   stream  tcp     nowait  root    internal        auth -r -f -n -o UNKNOWN -t 30
#auth   stream  tcp6    nowait  root    internal        auth -r -f -n -o UNKNOWN -t 30
#
# Example entry for an external ident server
#
#auth   stream  tcp     wait    root    /usr/local/sbin/identd  identd -w -t120
#
# Example entry for the optional qmail MTA
#  NOTE: This is no longer the correct way to handle incoming SMTP
#        connections for qmail.  Use tcpserver (http://cr.yp.to/ucspi-tcp.html)
#        instead.
#
#smtp   stream  tcp     nowait  qmaild  /var/qmail/bin/tcp-env  tcp-env /var/qmail/bin/qmail-smtpd
#
# Enable the following two entries to enable samba startup from inetd
# (from the Samba documentation).  Enable the third entry to enable the swat
# samba configuration tool.
#
#netbios-ssn stream tcp nowait          root    /usr/local/sbin/smbd    smbd
#netbios-ns dgram udp   wait            root    /usr/local/sbin/nmbd    nmbd
#swat   stream  tcp     nowait/400      root    /usr/local/sbin/swat    swat

I'm at a loss regarding what could be done to fix this issue.
Title: Re: Loss of internet access and OPNSense reachability
Post by: Mr.Goodcat on March 03, 2017, 05:09:34 pm
OK, I made an interesting observation:

Normally OPNSense's DHCP client sends multiple DHCPREQUEST messages to the cable modem on port 67. However, the modem only seems to answer to requests made via broadcast:

Code: [Select]
DHCPREQUEST on re0 to 10.207.0.1 port 67
DHCPREQUEST on re0 to 10.207.0.1 port 67
DHCPREQUEST on re0 to 10.207.0.1 port 67
DHCPREQUEST on re0 to 255.255.255.255 port 67
DHCPACK from 10.207.0.1

If no request to 255.255.255.255 is made adhead of the lease's expiry, the renew process is started anew (see last posts). However, DHCPACK is then comming from the current gateway IP (i.e. the current WAN IP) and not 10.207.0.1 which normally hands out the WAN IP. I assume this 10.x.x.x IP is from a DHCP on a kind of ISP cable LAN? At least it is shown by the cable modem as its tftp server:

Cable Modem IP Address:    10.133.20.16
Cable Modem Mask:    255.255.192.0
Cable Modem Gateway:    0.0.0.0
Cable Modem TFTP Server:    10.207.0.1

Any idea?
Title: Re: Loss of internet access and OPNSense reachability
Post by: Mr.Goodcat on March 12, 2017, 11:24:15 am
Another update:

In an attempt to see if my hardware is at fault I installed pfsense 2.3.3_p1 and configured it just like OPNsense - pretty bare bones that is. The machine has now been running for a few days without the lan bridge going down when the WAN lease expires, so this (routing?) problem definitely comes from OPNsense. Perhaps this info helps and I'd like to finally help fix this if any logs are required.

However, another part of the problem also occures with pfsense: Dhclient fails to renew the WAN IP (due to unicasts to the ISP's DHCP relay which aren't answered) and only sometimes succeeds in the rebind phase, i.e. via broadcast to 255.255.255.255. Now I set an entry in dhclient.conf (not directly as pfsense creates this file from an interfaces.inc file) which should force the dhclient to always broadcast its requests:

supersede dhcp-server-identifier 255.255.255.255;

Although the entry can be been in the correct file, it doesn't have any effect. Now I haven't tried this with OPNsense yet, but I assume the results would be similar. Please help me in fixing this. These problems seem to exist for years, yet any thread I found failed to get a solution.
Title: Re: Loss of internet access and OPNSense reachability
Post by: djGrrr on March 12, 2017, 03:24:22 pm
You should be able to add this option by using the advanced configuration mode with Option Modifiers.

If that does not work, it should be possible to use NAT to change the DHCP packet's destination address to 255.255.255.255
Title: Re: Loss of internet access and OPNSense reachability
Post by: Mr.Goodcat on March 12, 2017, 04:37:23 pm
Hi djGrrr,

thanks for your answer! Unfortunately packet captures show that dhclient simply ignores the supersede entry.
As far as I understand https://github.com/opnsense/src/blob/master/sbin/dhclient/dhclient.c (https://github.com/opnsense/src/blob/master/sbin/dhclient/dhclient.c), there is never a check if this option is set.

This should be done after T1 expires and thus would be the responsibility of the state_bound function:

Code: [Select]
void
state_bound(void *ipp)
{
struct interface_info *ip = ipp;

ASSERT_STATE(state, S_BOUND);

/* T1 has expired. */
make_request(ip, ip->client->active);
ip->client->xid = ip->client->packet.xid;

if (ip->client->active->options[DHO_DHCP_SERVER_IDENTIFIER].len == 4) {
memcpy(ip->client->destination.iabuf, ip->client->active->
    options[DHO_DHCP_SERVER_IDENTIFIER].data, 4);
ip->client->destination.len = 4;
} else
ip->client->destination = iaddr_broadcast;

ip->client->first_sending = cur_time;
ip->client->interval = ip->client->config->initial_interval;
ip->client->state = S_RENEWING;

/* Send the first packet immediately. */
send_request(ip);
}

To me it looks like it takes the DHCP server identifier supplied by the DHCP without checking if it is superseded.
The only check (within the dhclient.c) for a supersede option seems to be related to DHO_SUBNET_MASK.

Could the following modifications fix this? (Haven't tried it as I don't have a build environment and firm grasp of the code).

Code: [Select]
if (ip->client->config->default_actions[DHO_DHCP_SERVER_IDENTIFIER] == ACTION_SUPERSEDE) {
memcpy(ip->client->destination.iabuf, ip->client->config->defaults[DHO_DHCP_SERVER_IDENTIFIER].data, 4);
ip->client->config->defaults[DHO_DHCP_SERVER_IDENTIFIER].len = 4;
} else if (ip->client->active->options[DHO_DHCP_SERVER_IDENTIFIER].len == 4) {
memcpy(ip->client->destination.iabuf, ip->client->active->
    options[DHO_DHCP_SERVER_IDENTIFIER].data, 4);
ip->client->destination.len = 4;
} else
ip->client->destination = iaddr_broadcast;


Regarding the NAT option, I set up a virtual IP alias (255.255.255.255/32) on the WAN interface. Next I set "Hybrid Outbound NAT rule generation" in outbound NAT with
source=any port 68
destination= 10/8 port 67

and translation address to the virtual IP alias (i.e. 255.255.255.255) with port=static.

This doesn't work either, altough I'm not entirely sure if this is a valid configuration... :-\
Title: Re: Loss of internet access and OPNSense reachability
Post by: djGrrr on March 12, 2017, 05:05:45 pm
Outbound NAT would not work because it is changing the source address, not the destination address, it would likely require some changes to opnsense to create the correct rule through the UI (assuming pf even supports such a configuration)
Title: Re: Loss of internet access and OPNSense reachability
Post by: Mr.Goodcat on March 12, 2017, 05:17:54 pm
You don't happen to have any idea how to do this with pf instead of iptables?
Title: Re: Loss of internet access and OPNSense reachability
Post by: Mr.Goodcat on March 12, 2017, 09:44:17 pm
OK, fixed it.

As written a few posts ago I made some changes in the dhclient.c file:
https://github.com/opnsense/src/blob/master/sbin/dhclient/dhclient.c (https://github.com/opnsense/src/blob/master/sbin/dhclient/dhclient.c)

Here is the function which contains my change to actually make the "supersede dhcp-server-identifier 255.255.255.255" line work:

Code: [Select]
void
state_bound(void *ipp)
{
struct interface_info *ip = ipp;

ASSERT_STATE(state, S_BOUND);

/* T1 has expired. */
make_request(ip, ip->client->active);
ip->client->xid = ip->client->packet.xid;

/* Start of the updated section. */


if (ip->client->config->default_actions[DHO_DHCP_SERVER_IDENTIFIER] == ACTION_SUPERSEDE) {
memcpy(ip->client->destination.iabuf, ip->client->config->defaults[DHO_DHCP_SERVER_IDENTIFIER].data, 4);
ip->client->config->defaults[DHO_DHCP_SERVER_IDENTIFIER].len = 4;
} else if (ip->client->active->options[DHO_DHCP_SERVER_IDENTIFIER].len == 4) {
memcpy(ip->client->destination.iabuf, ip->client->active->
    options[DHO_DHCP_SERVER_IDENTIFIER].data, 4);
ip->client->destination.len = 4;
} else
ip->client->destination = iaddr_broadcast;


/* End of the updated section */


ip->client->first_sending = cur_time;
ip->client->interval = ip->client->config->initial_interval;
ip->client->state = S_RENEWING;

/* Send the first packet immediately. */
send_request(ip);
}

After compiling dhclient from your git under FreeBSD the option works and dhclient sends renew requests via broadcast as it should. This finally gets the DHCP behind the relay to update the lease ahead of expiry.

@Franco or any other OPN dev: can you please verify this and include it in the next point release? Or even better, submit it to upstream? Thanks!


Now the remaining question is why the connection between WAN and the LAN bridge is lost if the WAN lease expires. If that could also get fixed I'd at last be able to deploy OPN more widely.
Title: Re: Loss of internet access and OPNSense reachability
Post by: franco on March 24, 2017, 08:26:43 am
Hi Fabian,

Working on this now... Is this your patch or was this derived from e.g. OpenBSD's version?

Is there anything else we need to do here or will patching dhclient suffice to fix the use case?


Thanks,
Franco
Title: Re: Loss of internet access and OPNSense reachability
Post by: franco on March 24, 2017, 08:29:26 am
And how relevant is the MTU addition here?

https://github.com/opnsense/src/commit/4af258db


Cheers,
Franco
Title: Re: Loss of internet access and OPNSense reachability
Post by: franco on March 25, 2017, 08:10:49 am
Ok, the MTU parts didn't build and I have aligned the code to this in order to allow the sanity check to be executed in both cases, which matches a similar block found in priv_script_write_params():

https://github.com/opnsense/src/commit/702ff9b912

If you could test this and let me know it works we can merge this. :)


Cheers,
Franco
Title: Re: Loss of internet access and OPNSense reachability
Post by: franco on March 25, 2017, 08:51:17 am
Me again, so this atomic patch would be it: https://github.com/opnsense/src/commit/d5cb0e5.patch
Title: Re: Loss of internet access and OPNSense reachability
Post by: opnsense_user12123 on January 27, 2018, 11:21:20 am
Im using a cable Modem too. In the current version 17.7.12 this problem still exisits.
I have no problems connecting to opnsense or getting internet connection, but after every wan dhcp renew i get broken pipe rdrtool waiting messages in gateway log. These messages disappear only if i hit the save button on reporting settings rdrtool and appear again after dhcp renew on wan interface. Reinstall rdrtool or disable rdrtool doesnt fix the problem.
Will this be fixed in the new Version (18.1) ?

Thx
Title: Re: Loss of internet access and OPNSense reachability
Post by: Mr.Goodcat on February 03, 2018, 12:31:40 pm
Me again, so this atomic patch would be it: https://github.com/opnsense/src/commit/d5cb0e5.patch

Didn't track the development of OPN for a while, sorry. Version 18.1 is running fine and renews the WAN IP as it should, thanks to the patch (supersede dhcp-server-identifier 255.255.255.255). Thank you!
Title: Re: Loss of internet access and OPNSense reachability
Post by: franco on February 03, 2018, 03:35:33 pm
Hey Mr.Goodcat,

We've been carrying your proposal for a while now without issues. FreeBSD has not yet reacted to our upstream offer.

In any case: thanks for the feedback!  :)


Cheers,
Franco