Messages - isg-ek

#1
Thx a lot for your super fast reply! We've decided on the first option: we configured the persistent CARP maintenance mode and reconnected, and we will bring it back online, but that last part will happen tomorrow :) Then we'll check both machines' config, log files, and firewall rules again.

For debugging, I already checked the switch port config: both machines are on the same switch, and the port config is identical. Unless this switch is somehow broken or buggy, I guess the problem is most likely in the OPNsense firewall rules (?). So once we've checked those, we'll have to move on to the TCP dumps tomorrow, and to be able to do that, I suppose we'd have to end the maintenance mode.

Another thought: as I mentioned in my first post, we knew from the past that a failover to backup was sometimes triggered for node1 on all interfaces; sometimes node1 switched back to master, sometimes not. We've seen this and could resolve it, but we could never quite explain it. I wonder whether, whatever the cause in our network is, we have already had it for a while. Hope we'll be able to catch it.

#2
Dear all,

We have an HA pair of OPNsense firewalls, with a LAN trunk interface carrying around 10 VLANs on both machines, and the WAN and admin interfaces on separate NICs. There are matching CARP interfaces on both nodes for the VLANs, WAN, and admin.
We sync our config from node1 to node2; node1 is normally master. Sync between the pair is done over a dedicated hardware interface with a direct cable connection.

This setup has grown over the last 1 1/2 years, but worked like a charm through firmware updates, reboots, and changes between master/backup mode, everything good - until this morning:
node1 had been in state "BACKUP" for a few days. We've seen this happen before, but in the past everything went back to normal after rebooting the node. So we checked for firmware updates in the morning; it showed one minor update, no reboot required, and we installed it on node1. Then we rebooted to get rid of the "backup" state. The system took quite a long time to come up again, and afterwards stayed in "backup" mode with the GUI telling us "system is booting, not all services started". This lasted for around another 15 minutes, then node1 became "master" for ~7 out of its 15 CARP interfaces. After a few minutes we found that we had connectivity problems in some of the VLANs, some services not available, and a slow or broken internet connection, so we decided to take the "safe" way and shut node1 down completely. Node2 has been master again since then, and everything is "fine" from the connectivity point of view. But of course, it can't stay like this.
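
To spot a node that is master on only some of its CARP interfaces, as described above, one option might be to tally the states from the `ifconfig` output on each node. The following is only a minimal sketch: it assumes the FreeBSD-style line format `carp: MASTER vhid 1 advbase 1 advskew 0`, and the interface names, VHIDs, and the helper names `carp_states` and `summarize` are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample of `ifconfig` output; interface names and VHIDs are invented.
SAMPLE = """\
vlan01: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
\tcarp: MASTER vhid 1 advbase 1 advskew 0
vlan02: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
\tcarp: BACKUP vhid 2 advbase 1 advskew 0
igb1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
\tcarp: MASTER vhid 10 advbase 1 advskew 0
"""

# An interface header starts in column 0; CARP status lines are indented.
IFACE_LINE = re.compile(r"^(\S+?):\s")
CARP_LINE = re.compile(r"^\s+carp:\s+(\w+)\s+vhid\s+(\d+)")


def carp_states(text):
    """Return a list of (interface, vhid, state) tuples found in the text."""
    found = []
    iface = None
    for line in text.splitlines():
        header = IFACE_LINE.match(line)
        if header:
            iface = header.group(1)
            continue
        carp = CARP_LINE.match(line)
        if carp:
            found.append((iface, int(carp.group(2)), carp.group(1)))
    return found


def summarize(states):
    """Count states and flag a node that is not in one single state."""
    counts = Counter(state for _, _, state in states)
    return counts, len(counts) > 1


if __name__ == "__main__":
    states = carp_states(SAMPLE)
    counts, mixed = summarize(states)
    for iface, vhid, state in states:
        print(f"{iface} vhid {vhid}: {state}")
    print(dict(counts), "mixed state!" if mixed else "consistent")
```

In practice the sample text would be replaced by the real `ifconfig` output captured on each node; a mixed result on one node would point to the partial-master situation described here.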

So much for the background, now to our question :) What's the safest way to get node1 back online to check its log files, status, and so on? I suppose there must have been recent changes to our config which cause the pair to behave like this, as it has never done so before. Maybe "force" the backup node2 to stay master, even if node1 comes back online? There is this button "enter persistent CARP Maintenance mode" on the backup node2. I don't want to simply try it, as I've never used it before, and if I understand it right, it should normally be used on the regular master node before a system update/reboot? Any suggestions?

thx a lot & best
Silke