Messages - part_time_nerd

#1
Hi all, at my parents' house I have set up OPNsense as a KVM guest on a small Linux home server. Two Intel Ethernet adapters are directly mapped from the Linux host system to the OPNsense instance. One is the uplink and the other outputs 4 trunked vLANs, which are then distributed throughout the house and to an AP via a switch (separate segments for trusted and untrusted server devices, parents and kids).

The KVM-virtualized firewall was never the fastest, but for the task at hand ("fast enough for Netflix") the vLan-to-vLan throughput was good enough at about 100+ Mbit/s.

However, after updating to 22.1, sending data from one vLan to another through the OPNsense router (e.g. streaming from the file server to a tablet) has slowed to a crawl, maxing out at about 300-500 kilobytes per second, with the sustained rate closer to the lower end of that range. Once I swap in the backup image of 21.1, everything is back to normal.

I have not changed any settings beyond the update. What could have gone wrong? Any ideas where to begin looking? Although I am quite proficient as a Linux user, my BSD experience is limited and outdated.
#2
... and the bug got closed without any response. Duh.
#4
All right... I might have found the culprit.

I had enabled the "Enable Static ARP entries" option in the DHCP section for this interface. Once I unticked this option, everything went back to normal.
This post pointed me in the right direction:
https://moh10ly.wordpress.com/2015/02/14/ping-on-pfsense-gives-invalid-argument/

Of course, all devices on that subnet DO have DHCP Static Mappings registered in the section below that option. But somehow they seem to get lost during the reload procedure shown in the logs above.
Could that be a bug or am I getting something completely wrong here?
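
One way to test this hypothesis from the OPNsense shell is to check whether the unreachable host still has a permanent (static) ARP entry after such a reload. The following is only a minimal sketch: the helper name `has_static_arp` is made up for illustration, and it assumes the usual FreeBSD `arp -an` layout, in which static entries are marked `permanent`.

```shell
#!/bin/sh
# has_static_arp IP: reads `arp -an` style output on stdin and succeeds
# only if IP has an entry marked "permanent" (i.e. a static entry).
# Hypothetical helper for illustration, not part of OPNsense.
has_static_arp() {
    awk -v ip="($1)" '$2 == ip && /permanent/ { found = 1 } END { exit !found }'
}

# On the router one would feed it the live ARP table, for example:
#   arp -an | has_static_arp 192.168.100.15 || echo "no static ARP entry"
# Whether static ARP is active on the interface should show up as the
# STATICARP flag in the ifconfig output:
#   ifconfig vtnet1_vlan5 | grep STATICARP
```

If the entry is missing while static ARP is active on the interface, the router has no way to resolve the host's MAC address, which would fit the symptoms described here.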
#5
Hi again,

I am now at the point where I have disabled all HW offloading options, with no success. I have, however, implemented a logging mechanism that allowed me to get the time in which the problem occurs more precisely.

Using that I found a common pattern in the System log, which looks like this:




Sep 4 21:53:46    opnsense: /usr/local/etc/rc.newwanip: Interface '' is disabled or empty, nothing to do.
Sep 4 21:53:46    opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 'ovpns1'
Sep 4 21:53:46    configd.py: [66637140-dccd-4a86-a069-63ba711697f4] rc.newwanip starting ovpns1
Sep 4 21:53:45    kernel: ovpns1: link state changed to UP
Sep 4 21:53:45    configd.py: [2c8dd3bf-c9d2-4cf3-abe9-dac306ef241a] Reloading filter
Sep 4 21:53:44    configd.py: [5962f6e9-ca80-4081-8a90-d05894fb95fc] Reloading filter
Sep 4 21:53:44    kernel: ovpns1: link state changed to DOWN
Sep 4 21:53:44    opnsense: /usr/local/etc/rc.newwanip: Resyncing OpenVPN instances for interface WAN.
Sep 4 21:53:44    opnsense: /usr/local/etc/rc.newwanip: ROUTING: setting IPv4 default route to 109.XX.XY.1
Sep 4 21:53:44    opnsense: /usr/local/etc/rc.newwanip: On (IP address: 109.XX.YY.47) (interface: WAN[wan]) (real interface: vtnet3).
Sep 4 21:53:43    opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 'vtnet3'




Sep 9 18:00:40    opnsense: /usr/local/etc/rc.newwanip: Interface '' is disabled or empty, nothing to do.
Sep 9 18:00:40    opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 'ovpns1'
Sep 9 18:00:39    configd.py: [deda5b84-e388-495b-884d-c433c864d1aa] rc.newwanip starting ovpns1
Sep 9 18:00:39    kernel: ovpns1: link state changed to UP
Sep 9 18:00:39    configd.py: [bbe682dd-c1c8-4854-9137-8e35d0ea96bf] Reloading filter
Sep 9 18:00:38    configd.py: [1a22f2b2-de2e-4350-8b7b-0dcb9670981a] Reloading filter
Sep 9 18:00:38    kernel: ovpns1: link state changed to DOWN
Sep 9 18:00:38    opnsense: /usr/local/etc/rc.newwanip: Resyncing OpenVPN instances for interface WAN.
Sep 9 18:00:38    opnsense: /usr/local/etc/rc.newwanip: ROUTING: setting IPv4 default route to 109.XX.XY.1
Sep 9 18:00:37    opnsense: /usr/local/etc/rc.newwanip: On (IP address: 109.XX.XX.47) (interface: WAN[wan]) (real interface: vtnet3).
Sep 9 18:00:37    opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 'vtnet3'




So it appears that when this happens (which I presume is a DHCP renewal on the WAN port), my restricted vLan number 5 loses all routing to anywhere.
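
To correlate outages with these events, the renewal timestamps can be pulled out of an exported copy of the system log. This is a minimal sketch, assuming the log has been saved as plain text in the format shown above; the function name `wan_renewals` and the file name `system.log` are illustrative only.

```shell
#!/bin/sh
# wan_renewals IFACE: reads log lines on stdin and prints the timestamp
# (month, day, time) of every "IP renewal is starting" line for IFACE.
wan_renewals() {
    awk -v pat="IP renewal is starting on '$1'" \
        'index($0, pat) { print $1, $2, $3 }'
}

# Example (the file name is an assumption):
#   wan_renewals vtnet3 < system.log
```

Comparing these timestamps with the times at which the .100 subnet became unreachable shows whether every outage really follows a WAN renewal.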

Here are my routing tables:
(I took the liberty of masking parts of the WAN addresses with a few X, Y and Z characters)

root@router:~ #   netstat -rW
Routing tables

Internet:
Destination        Gateway            Flags       Use    Mtu      Netif Expire
default            ip-1X9-91-XX-1.hsiZZ.provider.com UGS  1769411   1500     vtnet3
10.10.42.0/24      10.10.42.2         UGS           0   1500     ovpns1
10.10.42.1         link#9             UHS           0  16384        lo0
10.10.42.2         link#9             UH            0   1500     ovpns1
ip1X.9X.6X.8X.in-addr.arpa 52:54:00:03:6d:fd UHS      549   1500     vtnet3
8X.2XX.1X9.4       52:54:00:03:6d:fd  UHS           0   1500     vtnet3
109.91.52.0/22     link#4             U             0   1500     vtnet3
ip-1X9-91-XX-1.hsiZZ.provider.com 52:54:00:03:6d:fd UHS           0   1500     vtnet3
ip-1X9-91-XY-YY.hsiZZ.provider.com link#4 UHS           0  16384        lo0
localhost          link#6             UH        12164  16384        lo0
192.168.10.0/24    link#13            U             0   1500 vtnet1_vlan4
192.168.10.1       link#13            UHS           0  16384        lo0
192.168.23.0/24    link#11            U        483426   1500 vtnet1_vlan2
192.168.23.1       link#11            UHS           0  16384        lo0
192.168.42.0/24    link#10            U       4827227   1500 vtnet1_vlan1
router             link#10            UHS           0  16384        lo0
192.168.42.88      link#3             UHS           0  16384        lo0
192.168.42.88/32   link#3             U             0   1500     vtnet2
192.168.100.0/24   link#14            U         40610   1500 vtnet1_vlan5
192.168.100.1      link#14            UHS           0  16384        lo0
192.168.123.0/24   link#12            U          1944   1500 vtnet1_vlan3
192.168.123.1      link#12            UHS           0  16384        lo0
192.168.200.0/24   link#1             U         95885   1500     vtnet0
192.168.200.10     link#1             UHS           0  16384        lo0

Internet6:
Destination        Gateway            Flags       Use    Mtu    Netif Expire
::1                link#6             UH            0  16384      lo0
fe80::%vtnet0/64   link#1             U             0   1500   vtnet0
fe80::5054:ff:feXX:4765%vtnet0 link#1 UHS           0  16384      lo0
fe80::%vtnet1/64   link#2             U             0   1500   vtnet1
fe80::5054:ff:feXX:d898%vtnet1 link#2 UHS           0  16384      lo0
fe80::%vtnet2/64   link#3             U             0   1500   vtnet2
fe80::5054:ff:feXX:448b%vtnet2 link#3 UHS           0  16384      lo0
fe80::%vtnet3/64   link#4             U             0   1500   vtnet3
fe80::5054:ff:feXX:6dfd%vtnet3 link#4 UHS           0  16384      lo0
fe80::%lo0/64      link#6             U             0  16384      lo0
fe80::1%lo0        link#6             UHS           0  16384      lo0
fe80::a85e:b6XX:93YY:f37b%ovpns1 link#9 UHS         0  16384      lo0
fe80::%vtnet1_vlan1/64 link#10        U           258   1500 vtnet1_vlan1
fe80::5054:ff:feXX:d898%vtnet1_vlan1 link#10 UHS        0  16384      lo0
fe80::%vtnet1_vlan2/64 link#11        U             0   1500 vtnet1_vlan2
fe80::5054:ff:feXX:d898%vtnet1_vlan2 link#11 UHS        0  16384      lo0
fe80::%vtnet1_vlan3/64 link#12        U             0   1500 vtnet1_vlan3
fe80::5054:ff:feXX:d898%vtnet1_vlan3 link#12 UHS        0  16384      lo0
fe80::%vtnet1_vlan4/64 link#13        U             0   1500 vtnet1_vlan4
fe80::5054:ff:feXX:d898%vtnet1_vlan4 link#13 UHS        0  16384      lo0
fe80::%vtnet1_vlan5/64 link#14        U             0   1500 vtnet1_vlan5
fe80::5054:ff:feXX:d898%vtnet1_vlan5 link#14 UHS        0  16384      lo0


It is clearly visible that routing appears to be identical for all interfaces. But when I try to connect to something in the .100 subnet, I get this:

root@router:~ #   ping 192.168.100.15
PING 192.168.100.15 (192.168.100.15): 56 data bytes
ping: sendto: Invalid argument
ping: sendto: Invalid argument
^C
--- 192.168.100.15 ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss
root@router:~ #   ssh 192.168.100.15
ssh: connect to host 192.168.100.15 port 22: Invalid argument


I am not sure what "Invalid argument" means in this context.

I am not entirely sure how I would sensibly use tcpdump, as suggested in fraenki's post, to find out more about the issue. So I simply tried to capture the above ssh connection attempt using the command:

tcpdump -i vtnet1_vlan5 -vv

But this command does not capture anything related to the commands above, only some unanswered ARP requests, IPv6 router advertisements and some DNS UDP packets sent by clients in the .100 subnet to the router, also without replies.

Any more ideas?
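
A narrower capture may be easier to read than the unfiltered one. The sketch below only prints a suggested command line so it can be reviewed before running. It restricts the capture to ARP and to traffic involving the unreachable host, with `-e` to show link-layer headers and `-n` to skip name resolution. The helper name `capture_cmd` is hypothetical; the interface and address are the ones used in this post.

```shell
#!/bin/sh
# capture_cmd IFACE HOST: print a tcpdump invocation restricted to ARP
# and to packets to or from HOST. Hypothetical helper; it prints the
# command and does not run it.
capture_cmd() {
    printf "tcpdump -n -e -i %s 'arp or host %s'\n" "$1" "$2"
}

capture_cmd vtnet1_vlan5 192.168.100.15
```

Running the printed command in one session while repeating the ping in another shows whether the router sends any ARP request or ICMP packet for the host at all.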
#6
Unfortunately the problem appeared again. I have now disabled hardware VLAN filtering in the interface settings dialogue. Waiting again.
#7
Quote from: fraenki on September 04, 2017, 04:08:39 PM
Quote from: part_time_nerd on September 03, 2017, 08:00:08 PM
but now I get very strange routing problems on vLan 5: after some time running as expected, the subnet becomes inaccessible (from management subnet: "no route to host"). When I reboot opnsense, the routing turns back to normal and the subnet becomes available again.

This sounds quite odd. Just a few suggestions:


  • use tcpdump on the OPNsense CLI to debug traffic on the VLAN 5 interface when it becomes inaccessible
  • test option "Disable reply-to on WAN rules" in Firewall -> Settings -> Advanced (be aware that this may break network connectivity)
  • make sure "Shared forwarding" is disabled in Firewall -> Settings -> Advanced (assuming you don't use CaptivePortal and Traffic Shaper)
  • disable hardware offload features in Interfaces -> Settings

Hello fraenki,

many thanks for your kind reply and suggestions!

I began following your tips and, for a start, did the following:

* firmware upgrade to 17.7.1
* I checked the option "Disable reply-to on WAN rules" as suggested.
* Shared forwarding is not enabled

Now I am waiting to see if the routing failures happen again. It has been stable for 24 hours now, but I had more than 24 hours of stable routing before, so the last word has not been spoken here.

I'd prefer not to disable hardware offloading, on the other hand I'll be willing to give it a try if everything else fails.
#8
Hi all,

a few months ago I started changing my home network for the (hopefully) better, beginning with the installation of opnsense as the core router. My vanilla "one router, one subnet, one SSID" home network was replaced by a set of 5 subnets: one management (vlan 1), one private (2), one for kids (3), one for guests (4) and one for all the IoT crap I like but don't trust (vLan 5). Since this is a private side project, it had to fit my sparse spare time, and since I encountered certain problems creating a proper external WLAN AP solution for vLans 2-5, the router ran for months while we continued to use only vLan 1.

Now the AP is there and I have moved the 5-vLan setup to production. That went quite well so far, but now I get very strange routing problems on vLan 5: after some time running as expected, the subnet becomes inaccessible (from the management subnet: "no route to host"). When I reboot opnsense, the routing returns to normal and the subnet becomes available again. There are no scheduled firewall rules or anything of that sort in my configuration that I am aware of. Moreover, I could not find any pfsense log entries that look suspicious around the time when the subnet becomes unavailable. Unfortunately my BSD knowledge is rather limited, so I didn't look very far under the hood.

Facts that might be worth mentioning:
* vLan5 is trunked on the same LAN port as all other vLans, none of which exhibits this problem.
* It is, however, the only one of the vLans that is not WAN-routed by default; instead, single IPs get their own tailored set of access rules to wherever they need to go.
* Allowed outgoing connections also die when the routing dies.
* Listing the routes in the OPNsense GUI "after the fact" still lists correct routes for the affected subnet.

Here is my version information:

OPNsense 17.7-amd64
FreeBSD 11.0-RELEASE-p11
OpenSSL 1.0.2l 25 May 2017

I'd appreciate any suggestions on how to tackle the problem and find its root cause.
#9
HAHAAA! I think I found the root cause (bug)!

That is quite a funny one!

@fabian, since I just upgraded opnsense today (see above), the package had already been reinstalled, so I skipped that step.

But I decided to look at the network debugger in FF and see if I spot something. Aaaand there, in my FF session with the debugging extension, everything suddenly worked as expected!

This led me to compare the request and response headers of the affected vs. the unaffected instance. Et voilà... since opnsense had just replaced my old router, it had inherited its hostname "router". And for that name, my day-to-day FF session has some stale BASIC AUTH data stored, which it still sends along with every request! Apparently some component in opnsense that is not invoked in every request (probably XHR only because of the dreaded API, which really seems to haunt me in a way!) tries to make (opn)"sense" of that data and fails. These requests are then not properly answered, hence the weird appearance. And that also explains why the interface suddenly stopped working properly once it was actually in production.

The proper fix should probably be to ignore a failed BASIC AUTH in XHR API requests as long as a valid PHP session cookie is also available.
#10
Quote from: franco on March 24, 2017, 06:05:02 PM
none of those are normal. How much RAM and disk space does the VM have? Can you provide a screenshot of a defect of your choosing? :)

Thanks for answering so promptly. Of course, I can provide you with any information you desire.

The machine has 2 dedicated cores, 1 GB of RAM and 10 GB of disk available.

root@router:~ #   df -h
Filesystem           Size    Used   Avail Capacity  Mounted on
/dev/ufs/OPNsense    9.7G    1.1G    7.8G    13%    /
devfs                1.0K    1.0K      0B   100%    /dev
devfs                1.0K    1.0K      0B   100%    /var/dhcpd/dev

#11
Hi all,

I just wanted to share with you the experiences I had trying to use the API in 17.1.

I am a fresh opnsense user and I have my own, homegrown "dyndns" solution which includes a challenge-response authentication that I could not easily integrate with the mechanisms in the opnsense GUI. So I needed to extract the WAN IP from the router for a custom script. I found the API section in the dev wiki and it seemed the perfect tool for the task.

However, after reading the nice examples I began searching for the API documentation ... and did not find any. I could not believe that this was all there was, so I kept searching harder, but an hour later I had realized that the API docs are basically UTSL, so I cloned the source and grepped. Then I made a user for API access.
Since the API sections I had discovered did not intuitively match most of the rights that can be granted in the UI and I got lots of "Authentication error"s, I soon WTFed and granted all rights to it for the sake of exploration. Goodbye security. I did not, however, find a single API call that would allow me to simply extract the currently used WAN IP. After a lot of trying and cursing and at least three hours wasted, I disabled the API user, went to the shell and created a simple cronjob that greps the WAN IP from ifconfig and dumps it into a file named "ip" in the web root. Done in 10 minutes.
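
The cronjob described above might look roughly like the following. This is a reconstruction under assumptions, not the script actually used: the helper name `first_inet` is made up, the interface name `vtnet3` is borrowed from a routing table in another post and may differ on this install, and `/usr/local/www` is assumed to be the web root.

```shell
#!/bin/sh
# first_inet: reads ifconfig output on stdin and prints the first IPv4
# address it finds (lines starting with "inet", not "inet6").
first_inet() {
    awk '$1 == "inet" { print $2; exit }'
}

# Intended use from cron on the router (interface and path are assumptions):
#   ifconfig vtnet3 | first_inet > /usr/local/www/ip
```

Note that anything written to the web root is readable by whoever can reach the GUI, which is acceptable for a WAN address but not for anything sensitive.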

Bottom line:

Trying to use the API turned out to be a very frustrating endeavour for me, mostly because the wiki page made it look a lot more usable than it actually is.
If you have an API but no documentation whatsoever, please mention that in a prominent place. The fact that by no means every part of opnsense has API support should also be mentioned somewhere. If possible, the API should be extended so that authentication errors include information about which rights are missing for a certain call.

Please note that this post is meant to be constructive criticism and not a personal insult. I am aware that this is open source and I am not entitled to demand anything. I thought, however, you might be interested in my experience.
#12

Hi all,

some time ago I decided our home network needed a bit more structure than just a WiFi router with Merlin on it. So I replaced our three switches with vLan-capable units, got a piece of hardware with four Intel GBit ports for a router, and made a plan for putting family, guests, kids, home automation and shady China hardware into separate networks for better manageability. I just had no experience with any OSS router distro, so I went searching. After some reading I decided to try opnsense. I downloaded and installed 17.1 into a KVM guest with 3 of the network interfaces directly attached. It installed flawlessly, I created the required interfaces and routing, put it into the closet, and I am still able to write this little story. So far, it was a success.

Unfortunately, shortly after installing the router, the opnsense GUI started to act strangely. I am using the default root user which I did not alter in any way (except the pw of course).


  • For example, in the dashboard, the GUI shows the IDS (suricata) as running (also, the console mentions it starting on boot). When I go to Services/IDS, it is "off". No rules are shown and no alerts. When I try to switch it to "on", the spinner in the "Apply" button starts spinning forever and that is it.
  • When I go to "Reporting/Insight", there is no graph drawn. The drop-down in the lower graph shows two items: "401" and "Authentication failed". Resetting the RRD data in Settings did not fix it, but caused Reporting/Health to fail with a JS alert "Error while fetching RRD list". Maybe the latter is temporary.
  • When I go to "System/Firmware/Updates" I can click on the "Check for updates" button. It then says "Checking... (may take up to 30 seconds)" ... forever. When I go to the "Packages" tab, I get "No packages were found on your system. Please call for help.". At this point I made a backup of the VM and went to the console, selecting "upgrade from console". It went on and installed roughly 50 packages (I did not notice any suspicious error messages scrolling by \o/), effectively moving opnsense from 17.1 to 17.1.3, and rebooted. However, all the GUI errors mentioned above are still present, which makes me wonder how to deal with this.
  • I found several other flaws which I will report separately.