Messages - sknr

#1
Quote from: CJ on March 02, 2024, 03:39:17 PM
I've never dealt with DHCP option 82 so I can't speak to that, unfortunately, although it sounds like that might be the cause of your issues.  With the old ONT it's handling that for you instead of just passing things directly on in bridge mode.

I would recommend turning on the gateway monitoring as well as using your ISP gateway to test your connectivity as you don't want to add additional complexity when your problem is definitely in the last mile.


My journey into this investigation has finally come to an end. After a few email threads with my ISP's tech support and a few calls with them to confirm some details, it looks like one of the field techs may have assigned my static IP address to another ONT.

So I decided to just power everything down, including the ONT, and "sleep on it", hoping that the 12-hour DHCP lease would expire, both on my end and on whatever other ONT had accidentally got my static IP address bound to it.

Woke up in the morning, did another clean install of OPNsense (just to be sure), powered on the ONT and everything works again.

Thanks to everyone in the thread for the suggestions! Turns out the ISP just had a "whoops" moment.
#2
Just another update on this situation: I've just connected my OPNsense box to my old ISP's home gateway/ONT, which isn't in bridge mode, and my network connection seems to be stable... In hindsight I should probably have tested that earlier, but here we are.

I guess this narrows things down to a point where there is some issue with compatibility between OPNsense and my ISP.

To clarify, this was working for about a week with no issues, and then all of a sudden things broke. My new ISP provides me with a static IP which is not behind CGNAT, and as far as they can see, there should be nothing wrong with my service.

Could it be down to some sort of problem with DHCP option 82 authentication? I.e. it acknowledges my DHCP request but then goes "hang on, you haven't responded with option 82 info" and drops my connection? I'll have to hop on the phone with my ISP to see what more they can investigate.
#3
Quote from: CJ on March 01, 2024, 04:59:03 PM
I wonder if you're getting hit with the Intel 2.5G NIC issues.  Try putting a switch between OPNsense and the ONT.

There's some more info regarding the i225/i226 NICs in this thread. https://forum.opnsense.org/index.php?topic=38055.0

Yeah, I saw that thread a while ago when I was trying to figure out why my i226 NIC wasn't auto-negotiating with my previous ONT, but recently the ISP gave me a "business version" which auto-negotiates fine at 2.5G. I have tried running link speeds all the way down to 10 Mbps and the same issue persists. I've also run a simple L2 switch between the ONT and OPNsense to see if that helped; unfortunately it doesn't.

I've even managed to narrow things down to the point where I can get my internet working (either a ping to 1.1.1.1/8.8.8.8 from OPNsense or even google.com from a client web browser) just by pressing Save and Apply on the WAN interface settings page, so I no longer have to "reload services" via SSH to get momentary internet access.

As another check, I've been running tcpdump -i igc0 -v and looking at the DHCP request and ACK messages from my ISP. Without knowing too much about DHCP option 82, I can't see anything glaringly wrong in the tcpdump output, but funnily enough I can see the ARP request from my WAN gateway showing a Cisco MAC address, which I assume is one of the ISP's routers.

During my testing I also stopped using 1.1.1.1/8.8.8.8 to check whether I had internet access and went back to checking whether I could reach my WAN gateway IP, and that fails in the same way. So I can only assume that something is wrong with my OPNsense, despite multiple re-installs, unless I've got something funky going on with my hardware... I'm not quite sure how to validate that other than installing a simple Linux distro on my appliance and seeing whether the NICs start dropping packets after 20 seconds...
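
For anyone repeating this capture, a narrower filter keeps the output to just the DHCP exchange and the gateway's ARP traffic. The sketch below only builds and prints the command; igc0 is the WAN interface from this thread, and the variable and function names are made up for illustration. Note that under RFC 3046 option 82 is normally inserted by a relay agent upstream of the client, so it may not appear in a capture taken on the WAN port at all.

```shell
IFACE=igc0                                  # WAN interface named in the post
FILTER='udp port 67 or udp port 68 or arp'  # DHCP server/client ports plus ARP

# Print the capture command: -n skips name lookups, -e shows MAC addresses,
# -vvv decodes the DHCP options in full.
capture_cmd() {
    echo "tcpdump -i $IFACE -n -e -vvv '$FILTER'"
}

capture_cmd
# To start the capture for real, run the printed command as root.
```

Leaving the capture running across one "works for 20 seconds, then dies" cycle should show whether any DHCP or ARP traffic coincides with the drop.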
#4
Quote from: cookiemonster on March 01, 2024, 12:37:55 PM
If you speak with ISP tech support, I expect they will eventually be able to confirm what you have, that the problem is only when using OPN or the machine running OPN, so not on their side unless they need some mechanisms to use their network (option 82 you mention) and/or authentication details, vlan ids, etc. that aren't yet set in OPN. Not on their side I mean, if it works with another device, or theirs, they would not normally spend a lot of time helping you diagnose it. Hopefully it will go well though.
I'd start by looking for clues in your wan dhcp logs. Sorry, not much else to suggest if logs aren't helping.

Yeah, I just heard back from the ISP and they have confirmed that there are no flags or issues on their side. From what I can tell DHCP is working fine, as my WAN interface is getting the static IP address that I've confirmed with the ISP is assigned to my service. That is all I can really ask from them, as I am running my own hardware post-ONT. From their side they don't need anything to be configured in terms of VLAN or additional authentication, as DHCP option 82 somehow takes care of all that (I need to dive into what exactly option 82 might need as a valid response).

I've tried reinstalling OPNsense again, this time wiping the M.2 drive to remove any residual issues that might somehow have persisted, but that didn't help either. I'm wondering if there is some sort of hardware fault causing the NICs to just drop packets or stop working. I've tried forcing the NICs to 1 Gbps, 2.5 Gbps and even 100 Mbps just to see if that changes things, but so far the results are the same: after reloading all the services I'm online for about 20 seconds before it dies again  :'(

But I'll dive into DHCP Option 82/RFC 3046 for now and see if that might be the solution to this problem!
#5
Quote from: sknr on March 01, 2024, 12:00:01 AM
Quote from: cookiemonster on February 29, 2024, 10:21:10 PM
You need to be methodical and specific so we can help you. "I get network access.." doesn't help much. From OPN, from a client? Is this a wireless or LAN client, etc. Right now, all being OK until something fails again doesn't sound much like it is an OPN thing.

To clarify, things are definitely not OK. My test has been to use the OPNsense CLI, using option "11" to restart all the services, then press "8" to enter the shell and run "ping 8.8.8.8", and then after about 10-20 successful replies it fails. If I repeat the process of reloading all services, I can successfully get another batch of ping replies before they fail again.

Not sure if it's the best way of testing, but it implies that my OPNsense can access the WAN and get a response from Google for a bit before things get blocked again. Not really sure if it's an OPNsense thing or my ISP somehow blocking my connection after a while. Still waiting for someone from my ISP's tech support to take a look at my ticket.

I guess what seems odd, is that if I swap out my OPNsense box for an old Ubiquiti Edgerouter, my access to the internet seems to work, albeit having to wait for a while for the WAN DHCP address to figure itself out.


Just as a quick update, I can literally SSH into OPNsense from a client on the LAN, type "11" to reload all services, and then my internet works on my client and on OPNsense. After about 15, maybe 25 seconds, my internet connection is lost and my ping to 8.8.8.8 fails. Then I just jump back on the OPNsense SSH shell, type "11" again to reload all services, and my internet/pings to 8.8.8.8 start working again, until it fails again, until I reload all services again, etc...
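
To put a number on each "works, then dies" cycle rather than eyeballing it, the ping output can be summarised. This is a minimal sketch, assuming the usual "64 bytes from ...: icmp_seq=N ..." reply lines; the function name summarize_ping is made up for illustration.

```shell
# Print how many echo replies appear in ping output on stdin, and the
# icmp_seq of the last one (ping sends one probe per second by default,
# so the last sequence number approximates how long the link stayed up).
summarize_ping() {
    awk '
        /bytes from/ {
            n++
            # Pull the number out of the "icmp_seq=N" field.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^icmp_seq=/) { split($i, kv, "="); last = kv[2] }
        }
        END { printf "replies=%d last_seq=%s\n", n, (n ? last : "none") }
    '
}

# Usage, right after reloading services (sends 60 probes, then summarises):
#   ping -c 60 8.8.8.8 | summarize_ping
```

Running it a few times in a row would show whether the connection always dies after roughly the same number of seconds, which points more towards a timer than towards random packet loss.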

I've tried my best to look through as many logs on OPNsense as I can find in debug mode, and nothing is throwing any errors that I can see.

My next step is to spend tomorrow morning trying to get through to my ISP's tech support to see if they can confirm whether there's something funky going on with my service.
#6
Quote from: cookiemonster on February 29, 2024, 10:21:10 PM
You need to be methodical and specific so we can help you. "I get network access.." doesn't help much. From OPN, from a client? Is this a wireless or LAN client, etc. Right now, all being OK until something fails again doesn't sound much like it is an OPN thing.

To clarify, things are definitely not OK. My test has been to use the OPNsense CLI, using option "11" to restart all the services, then press "8" to enter the shell and run "ping 8.8.8.8", and then after about 10-20 successful replies it fails. If I repeat the process of reloading all services, I can successfully get another batch of ping replies before they fail again.

Not sure if it's the best way of testing, but it implies that my OPNsense can access the WAN and get a response from Google for a bit before things get blocked again. Not really sure if it's an OPNsense thing or my ISP somehow blocking my connection after a while. Still waiting for someone from my ISP's tech support to take a look at my ticket.

I guess what seems odd, is that if I swap out my OPNsense box for an old Ubiquiti Edgerouter, my access to the internet seems to work, albeit having to wait for a while for the WAN DHCP address to figure itself out.
#7
Quote from: cookiemonster on February 29, 2024, 03:45:40 PM
I can only think of three reasons to go from a working state to a non-working state after some days.
First is that you had settings uncommitted to config, like not saving a rule or working from a live media.
Second is a hardware problem.
Third is enablement of services that overpower the machine.
Otherwise I can't see this happening.
I lean to the third. I'd suggest disabling one of the two, probably Suricata IPS if you are not protecting something specific with it. The thinking is that those two will use a big chunk of resources; maybe the system started swapping and then killed a core service it needed.

Hello again cookiemonster!

AFAIK I performed the reduction of services as usual, and currently I'm running a clean install from the ISO image and I've not enabled anything other than the bog-standard services. CPU is trickling along at 1-4% utilisation, RAM at 4%, MBUF utilisation at 1%, disk usage at 1%, and temps at 27 degrees. The odd part is that for about 20 seconds after reconnecting the WAN link I get network access again, which baffles me!
#8
I'd like to believe that my ISP isn't blocking me either, but it also seems odd that a fresh install of OPNsense is somehow blocking all of my WAN traffic, especially as it was working fine for 3-4 days. My CPU utilisation hasn't gone above 40% and my memory usage is around 10-15%, so no real "red flags" regarding services pushing my Intel N100 to any limits. CPU temp is also around 27-30 degrees during idle and load. The IP provided by my ISP is 185.96.xxx.xxx, so it's not in the CGNAT range, and I specifically confirmed with my ISP before the install that there would be no CGNAT.
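
The CGNAT check above can be done mechanically: the shared address space reserved for carrier-grade NAT in RFC 6598 is 100.64.0.0/10. A minimal sketch follows, assuming a well-formed dotted-quad IPv4 address as input; is_cgnat is a made-up helper name, and 185.96.0.1 is a placeholder standing in for the masked address.

```shell
# Return 0 if the IPv4 address in $1 lies inside 100.64.0.0/10 (the RFC 6598
# shared address space used for CGNAT), 1 otherwise.
is_cgnat() {
    # Split the dotted quad into its first two octets.
    first=${1%%.*}
    rest=${1#*.}
    second=${rest%%.*}
    # 100.64.0.0/10 covers 100.64.0.0 through 100.127.255.255.
    [ "$first" -eq 100 ] && [ "$second" -ge 64 ] && [ "$second" -le 127 ]
}

# 185.96.0.1 is a placeholder for the masked 185.96.xxx.xxx address.
if is_cgnat 185.96.0.1; then
    echo "CGNAT range"
else
    echo "not CGNAT"
fi
```

Anything starting with 185.96 falls outside that block whatever the masked octets are, which matches what the ISP said.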

My search for an answer continues!
#9
Yeah, I've already raised a ticket with my ISP; they are fairly open to customers running their own equipment after the ONT. They even recently swapped out my "standard" ONT for a Nokia XS-010G-Q, which requires a router to function.

Still waiting for the ISP's tech support to confirm if they see anything unusual on my connection.
#10
Hello,

My OPNsense adventures continue. My previous ask for help ended up being user error, so I hope I'm not doing something dumb again!

I have been running OPNsense for about a week now, and I've got a very odd one today: when I got home from work, my OPNsense seemed to have stopped working. My gateway monitoring was reporting 100% packet loss, and consequently I had no internet access.

My setup is fairly simple, with my WAN on igc0, LAN on igc1 and WLAN on igc2. I've configured basic firewall rules allowing both the LAN and WLAN networks to access the internet but not each other. In addition, I've been running DNS over TLS using Unbound, NATing any port 53 traffic to Unbound and using Cloudflare on port 853 for my outbound DNS queries. I also decided to run IDS/IPS on my WAN port and Zenarmor on my LAN/WLAN ports. This was all working fine for a couple of days, and I even added SQM/QoS to improve my "buffer bloat".

I initially thought that this might be a service interruption from my ISP, but I noticed that if I rebooted my ONT or reconnected my CAT6a cable to the ONT, and left a ping command running on the OPNsense shell, I would get a valid connection to the internet (or at least 8.8.8.8) for about 20-30 seconds before my connection went down again. I then went through the process of disabling extra services like IDS/IPS and Zenarmor, and even disabling/removing my DNS over TLS configuration, followed by multiple reboots and power cycles of both my ONT and OPNsense. To no avail: the same issue persisted. Re-plug the WAN connection, get 15-30 seconds of success, then the gateway monitor reports packet loss slowly building up from 10%, 15%, 20%, etc. to 100%, and then nothing works again.

While checking the firewall live log I could see that all incoming WAN traffic was being blocked by Default deny/state violation, but my firewall was still sending traffic out. The ping requests to 8.8.8.8 were still getting responses on the OPNsense shell, while I could see 8.8.8.8:53 being blocked on "WAN in" in the live log.

As the next step I connected my display and keyboard and did a "reset to factory settings", but that didn't work either. As a last resort I then tried a clean install from the ISO image on a USB stick and assigned the usual WAN/LAN interfaces, and still didn't get a network connection, despite that working when I first started.

Luckily I have an old EdgeRouter-lite-3 sitting around and managed to get that reconfigured and up and running, and it now gives me an internet connection, which leads me to think that either:

a) the ISP has decided to block my OPNsense machine (MAC address)
b) it was a complete fluke that I got OPNsense running last week and now something fundamental is missing which I'm not aware of

The appliance running OPNsense is an Intel N100 based HUNSN box with 16 GB RAM, a 250 GB SSD and 4 x Intel i226-V NICs, and I'm running the latest release of OPNsense.

My ISP provides me with a static IP address which is assigned via DHCP Option 82, and I can see it get assigned on my WAN interface before I lose my connection to packet loss.

Any tips or advice is much appreciated!

#11
So, quick update on this situation: it seems I was foolishly setting the igc interface to 1000baseTX full-duplex instead of 1000baseT full-duplex... I was clicking around in the UI, happened to set it to 1000baseT, and it started working. It's still broken when set to autoselect, but forcing the interface to 1000baseT fixes the negotiation issue.
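
For anyone hitting the same mix-up from the CLI, one way to avoid guessing the media name is to check it against the list that "ifconfig -m" reports before forcing it. This is a minimal sketch, assuming the usual FreeBSD layout where each supported entry is printed as a line starting with "media <name>"; media_supported is a made-up helper name.

```shell
# Succeed if media type $1 is listed in `ifconfig -m <iface>` output
# read from stdin (lines of the form "media 1000baseT mediaopt full-duplex").
media_supported() {
    awk -v want="$1" '
        $1 == "media" && $2 == want { found = 1 }
        END { exit !found }
    '
}

# Usage on the firewall (as root), forcing the speed only if it is listed:
#   ifconfig -m igc0 | media_supported 1000baseT &&
#       ifconfig igc0 media 1000baseT mediaopt full-duplex
```

The match is on the exact name, so 1000baseT and 1000baseTX are treated as different entries, which is the distinction that mattered here.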

I followed this up with a call to my ISP and got them to put my ONT in bridge mode, and everything is now working as expected at 1 Gbps link speed.

wohoo!
#12
Quote from: cookiemonster on February 21, 2024, 11:16:12 PM
it could be. I didn't realise that it was negotiating fine with other devices. Post what you get from:
$ grep igc0 /var/run/dmesg.boot

and
$ grep igc0 /var/run/dmesg.boot



Not sure what the second command you wanted me to run was, but here is what I got from the first command:


root@OPNsense:~ # grep igc0 /var/run/dmesg.boot
igc0: <Intel(R) Ethernet Controller I226-V> mem 0x80a00000-0x80afffff,0x80b00000-0x80b03fff irq 18 at device 0.0 on pci2
igc0: Using 1024 TX descriptors and 1024 RX descriptors
igc0: Using 4 RX queues 4 TX queues
igc0: Using MSI-X interrupts with 5 vectors
igc0: Ethernet address: a8:b8:e0:01:1f:cb
igc0: netmap queues/slots: TX 4/1024, RX 4/1024
#13
Quote from: cookiemonster on February 21, 2024, 03:20:19 PM
then seems that the problem to be that the other end only accepts a link at 100 M. If that's the case, nothing you can do on the other to force it.

It's an interesting one: if I connect another device to the same port with auto-negotiation enabled, the link will auto-negotiate to 1 Gbps, so it might be an odd interop issue between the Intel i226-V and whatever chipset is used in the Nokia XS-2426G-A, where for whatever reason the link-speed negotiation falls back to 100 Mbps.

I was hoping it would just be some sort of configuration change on the FreeBSD side of things, or a certain flag/option in ifconfig, that could solve the issue.
#14
Quote from: cookiemonster on February 20, 2024, 10:47:47 PM
please be sure you are using the right Ethernet cable type for 1Gb or more.

Already checked with a few different cables that are running at a full 1 Gb between other clients. I even tried putting a simple L2 network switch in the path to see if it changed anything, but it was still just running at the lower 100 Mbps speed.
#15
Hello,

I'm running OPNsense 24.1.2 (on FreeBSD 13.2-RELEASE-p10) on an Intel N100 based mini-PC with 4 x Intel i226-V NICs. I'm having an odd issue where, when left on "auto", the link speed is negotiated at 100baseTX full-duplex instead of the expected 1000baseTX full-duplex.

My "WAN" interface is on igc0 and when I force the link-speed to anything other than 100baseTX full-duplex (either via the CLI or webUI) the link seems to go down.

Checking dmesg | grep igc0, all I see is that the interface link state immediately switches to "down" when I force the link speed to 1000baseTX and back to "up" when I revert back to 100baseTX.
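
Besides dmesg, the "media:" line of plain ifconfig output shows what the link actually negotiated. This is a minimal sketch, assuming the standard FreeBSD format "media: Ethernet autoselect (100baseTX <full-duplex>)"; negotiated_media is a made-up helper name.

```shell
# Read `ifconfig <iface>` output on stdin and print the negotiated media,
# i.e. the part in parentheses on the "media:" line. Prints nothing when
# the media is forced, because that line then has no parentheses.
negotiated_media() {
    sed -n 's/^[[:space:]]*media: .*(\(.*\)).*$/\1/p'
}

# Usage on the firewall:
#   ifconfig igc0 | negotiated_media
```

Checking this before and after each change makes it easier to tell a failed negotiation apart from a link that is simply down.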

If it helps at all, the WAN port (igc0) is connected to a Nokia XS-2426G-A, which seems to be reporting the link as running at full-duplex but with the max bit rate set to "100". Due to my ISP's restrictions I'm not able to access the settings page to try to force the link speed on the device.


Any solutions, or troubleshooting tips to try to resolve this would be greatly appreciated!