Huge bogonsv6 list = really long boot time

Started by puithove, April 21, 2021, 11:38:56 PM

April 21, 2021, 11:38:56 PM Last Edit: April 21, 2021, 11:41:01 PM by puithove
Luckily I don't reboot my router very often, usually only when upgrading. However, I recently started noticing a really long startup time, spent waiting at "Configuring Firewall..." with the CPU pegged. We're talking 10+ minutes of waiting like that.

I decided to dig through all of my firewall rules to figure out what was taking so long. In doing so I looked at the number of addresses defined for the bogonsv6 alias, and as soon as I saw it I knew what the holdup was. Tens of thousands of entries are added to the firewall rules through that alias to block bogons (which makes sense given the size of the IPv6 address space).

For now I have turned off bogon blocking, and that makes an instant difference. I do wonder, though, whether any optimizations could be made to how firewall rulesets are loaded to reduce the time this takes at boot. Considering that editing firewall rules and applying changes is very quick even with bogon blocking enabled, it somewhat surprises me that loading the ruleset at boot takes so long (though maybe applying changes doesn't force a full ruleset reload).

Another thought: I've had the Block Bogons feature enabled since the dawn of time and have been running dual-stack IPv6 for the past few years, so why has this only recently become an issue?

Currently on OPNsense 21.1.5-amd64

So the tough question: what hardware are you using? :)


Cheers,
Franco

I definitely should have included that.

VM running under Proxmox
Dual E5-2630v2 - 12 physical cores with hyperthreading enabled
VM given 24 threads (virtual cores)

Ok... can you try giving it fewer cores? About 6 should be enough to see a significant difference if you have multiple VMs running in Proxmox (or any other virtualisation solution).

Over the years I have seen a consistent pattern during support sessions: the VM gets starved of CPU time because it rarely, if ever, gets all of its cores at once, so it cannot be resumed very often, which results in sluggish operation.


Cheers,
Franco

Yeah, you're not wrong about that. Right now, though, it's running on an otherwise unused node, and it was just as slow when I had only 12 vCPUs on it (I turned it up to try to improve boot time, which, as expected, it didn't). I'm away from home right now, so I can't take the router down to change it. I'll play with it a bit more when I'm back home in a few days.

It seems you were right on the money. I dropped the vCPU count to 6, and while the CPU still spins up when applying the firewall rules, it doesn't take long at all: 30 seconds or so vs. 10 minutes. It definitely seems to have been CPU contention.

Thanks!

Yay, happy that worked out fine. 8)


Cheers,
Franco