BAD STATE

Started by simonmcn, November 12, 2024, 11:29:04 PM

I'm at a loss trying to understand my issues and could do with some help, please.

I am running OPNsense on Proxmox. I'm passing through a trunk with VLANs 10, 20, 30, 40, 50 and 60. I also have another Ethernet interface that I use for a PPPoE connection.

My internet works, seems reliable and has no issues from my main VLANs, 30 (servers) and 50 (wifi).

For each VLAN I created a firewall rule allowing access to everything. I just wanted it to work; the plan was then to set up specific rules and disable the allow-all rule.

But I'm getting hung SSH, NFS and CIFS connections. Everything to the internet is fine, but inter-VLAN traffic only works intermittently and badly.

Any advice or ideas on how to diagnose this, please?

Working ping but hanging bulkier traffic can be caused by MTU mismatches. Perhaps some inconsistent jumbo frames?
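
One way to test for this is to ping with the don't-fragment bit set and a payload sized to exactly fill the expected MTU. A minimal sketch of the arithmetic, assuming IPv4 without options (20-byte header) and an 8-byte ICMP echo header; the function name is only illustrative:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    if mtu < IPV4_HEADER + ICMP_HEADER:
        raise ValueError("MTU too small for an IPv4 ICMP echo")
    return mtu - IPV4_HEADER - ICMP_HEADER


# Standard Ethernet, a typical PPPoE link, and jumbo frames.
for mtu in (1500, 1492, 9000):
    print(mtu, max_ping_payload(mtu))
```

If a don't-fragment ping with a 1472-byte payload fails between two hosts while a small ping works, something on the path probably has an MTU below 1500.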

Maybe I don't understand correctly, but

Quote
I am running opnsense on Proxmox. I'm passing through a trunk with vlans 10,20,30,40,50,60. I also have another ethernet that I use for a PPPOE connection.

My internet works, seems reliable and has no issue from my main vlans of 30 (servers) and 50 (wifi).

But then you show rules for VLAN50 as well as the live log.

So what exactly has the problem here? VLAN50, or the other VLANs 10, 20, 40, 60?
Can you ping from a device in these VLANs/subnets to the gateways?
Do you have unique subnets for these VLANs on OPNsense?
Do you have a proper netmask configured on the devices in these VLANs?

Regards,
S.
Networking is love. You may hate it, but in the end, you always come back to it.

OPNSense HW
APU2D2 - deceased
N5105 - i226-V | Patriot 2x8G 3200 DDR4 | L 790 512G - VM HA(SOON)
N100   - i226-V | Crucial 16G  4800 DDR5 | S 980 500G - PROD

Quote from: bartjsmit on November 13, 2024, 09:14:03 AM
Working ping but hanging bulkier traffic can be caused by MTU mismatches. Perhaps some inconsistent jumbo frames?

All my MTUs should be 1500 or less. They are certainly lower on the WireGuard and PPPoE interfaces. The rest are configured as 1500.

Quote from: Seimus on November 13, 2024, 11:08:58 AM

I'm having issues with inter-VLAN traffic, going from VLAN 50 to VLAN 30. All 30-to-30 traffic, which doesn't touch OPNsense, is fine. Surprisingly, web traffic from 50 to 30 seems to work fine, but that may just be because the packets are small or the session state differs from CIFS/NFS/SSH.

All my masks are /24 and each VLAN is on a different subnet: 10.150.10.0/24 for VLAN 10, 10.150.30.0/24 for VLAN 30, etc.

All devices can ping both the gateways and the upstream servers.

All the devices can see each other. I think the session state is perhaps being reset by pf.

I've just tested it: if I turn the packet filter off, all my problems go away.

More information:
root@OPNsense:~ # pfctl -si
Status: Enabled for 0 days 02:58:23             Debug: Loud

Interface Stats for vtnet0_vlan30     IPv4             IPv6
  Bytes In                        78388748                0
  Bytes Out                     1494252352                0
  Packets In
    Passed                          828852                0
    Blocked                            162                0
  Packets Out
    Passed                         1374002                0
    Blocked                              0                0

State Table                          Total             Rate
  current entries                     1768               
  searches                        45657736         4265.9/s
  inserts                           653014           61.0/s
  removals                          651256           60.8/s
Counters
  match                             728116           68.0/s
  bad-offset                             0            0.0/s
  fragment                               0            0.0/s
  short                                  0            0.0/s
  normalize                              4            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                              0            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                     37898            3.5/s
  state-insert                          10            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s
  map-failed                             0            0.0/s
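
To pick the non-zero counters out of output like the above, a small parser can help. This is only a sketch: it assumes the `pfctl -si` layout shown in this post (name, total and rate on each line under `Counters`), and `parse_counters` and `nonzero_counters` are hypothetical helper names.

```python
import re

# Excerpt of the counters section from the output above.
SAMPLE = """\
Counters
  match                             728116           68.0/s
  bad-offset                             0            0.0/s
  normalize                              4            0.0/s
  state-mismatch                     37898            3.5/s
  state-insert                          10            0.0/s
  state-limit                            0            0.0/s
"""

# Matches an indented "name   total   rate/s" line.
COUNTER_LINE = re.compile(r"^\s+([\w-]+)\s+(\d+)\s+[\d.]+/s\s*$")


def parse_counters(text):
    """Return {counter name: total} for lines after the 'Counters' heading."""
    counters = {}
    in_section = False
    for line in text.splitlines():
        if line.strip() == "Counters":
            in_section = True
            continue
        if in_section:
            m = COUNTER_LINE.match(line)
            if m:
                counters[m.group(1)] = int(m.group(2))
    return counters


def nonzero_counters(counters):
    """Counters other than 'match' with a non-zero total, largest first."""
    flagged = {k: v for k, v in counters.items() if k != "match" and v > 0}
    return sorted(flagged.items(), key=lambda kv: kv[1], reverse=True)


for name, total in nonzero_counters(parse_counters(SAMPLE)):
    print(f"{name}: {total}")
```

On the excerpt above, `state-mismatch` stands out by several orders of magnitude over the other non-zero counters.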

November 16, 2024, 03:01:07 AM #7 Last Edit: November 16, 2024, 03:04:00 AM by Seimus
You have this in there >

Quote
state-mismatch                     37898            3.5/s

If a firewall sees TCP packets out of order, it will block them. TCP traffic can only pass through a stateful firewall after the handshake has been established:

S > D: TCP SYN
D > S: TCP SYN-ACK
S > D: TCP ACK

Check the live log. Create a filter with the specific source and destination you will test between. Then, if you see a session appear that is blocked, click the magnifying glass and check the TCP flags.

If TCP really is out of order, it means your traffic is leaking somewhere or there is asymmetric routing.
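
To illustrate why asymmetric routing produces state mismatches, here is a toy model of a stateful filter that only tracks the three-way handshake. It is a deliberately simplified sketch, not how pf is implemented (real pf also tracks sequence numbers and windows); the class name and the two addresses are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFirewall:
    """Tracks only the handshake phase per host pair (toy model)."""
    states: dict = field(default_factory=dict)
    mismatches: int = 0

    def see(self, src: str, dst: str, flags: str) -> bool:
        """Return True if the packet is passed. Flags: S=SYN, SA=SYN-ACK, A=ACK."""
        key = frozenset((src, dst))
        phase, initiator = self.states.get(key, (None, None))
        if phase is None and flags == "S":
            self.states[key] = ("SYN_SEEN", src)
            return True
        if phase == "SYN_SEEN" and flags == "SA" and dst == initiator:
            self.states[key] = ("SYNACK_SEEN", initiator)
            return True
        if phase == "SYNACK_SEEN" and flags == "A" and src == initiator:
            self.states[key] = ("ESTABLISHED", initiator)
            return True
        if phase == "ESTABLISHED" and flags == "A":
            return True
        # Anything that does not fit the expected sequence is dropped.
        self.mismatches += 1
        return False


CLIENT = "10.150.50.10"  # hypothetical host in VLAN 50
SERVER = "10.150.30.20"  # hypothetical host in VLAN 30


def symmetric():
    """Both directions cross the firewall."""
    fw = ToyFirewall()
    passed = [
        fw.see(CLIENT, SERVER, "S"),
        fw.see(SERVER, CLIENT, "SA"),
        fw.see(CLIENT, SERVER, "A"),
    ]
    return passed, fw.mismatches


def asymmetric():
    """The SYN-ACK returns by another path, so the firewall never sees it."""
    fw = ToyFirewall()
    passed = [
        fw.see(CLIENT, SERVER, "S"),
        fw.see(CLIENT, SERVER, "A"),
    ]
    return passed, fw.mismatches


print(symmetric())   # ([True, True, True], 0)
print(asymmetric())  # ([True, False], 1)
```

In the asymmetric case the firewall only ever sees one direction of the conversation, so the client's ACK does not match the state it is holding.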

Regards,
S.

Quote from: Seimus on November 16, 2024, 03:01:07 AM

Thank you !!

I found that I had a server with a foot in both VLAN 30 and VLAN 50. Both interfaces had the same default gateway, but the server replied directly on the interface that was already in the destination VLAN rather than responding via the default gateway.

Much appreciated !

That's how it's supposed to work. A host will always prefer a locally connected interface over a static route. Don't connect hosts via more than one interface/network.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

November 18, 2024, 01:58:49 PM #10 Last Edit: November 18, 2024, 02:01:02 PM by bimbar
Quote from: Patrick M. Hausen on November 18, 2024, 01:36:30 PM
That's how it's supposed to work. A host will always prefer a locally connected interface over a static route. Don't connect hosts via more than one interface/network.

Never mind, I didn't read enough of the posts ;)

Just a comment though: more specific routes are always preferred, regardless of connection type. In this case, though, the subnet mask was the same, and then distance / metric / connection type become relevant.
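
A rough sketch of that selection order: longest prefix first, then lowest distance as the tie-breaker. The interface names, gateway address and distance values are assumptions for illustration; host operating systems usually express the tie-breaker as a metric rather than a Cisco-style administrative distance.

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    prefix: ipaddress.IPv4Network
    via: str        # interface or gateway, for display only
    distance: int   # lower is preferred; 0 = connected, 1 = static (assumed)


def select_route(routes, destination: str) -> Optional[Route]:
    """Pick the matching route with the longest prefix, then lowest distance."""
    dst = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dst in r.prefix]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.prefix.prefixlen, -r.distance))


# Hypothetical routing table of a server with one leg in VLAN 30 and one in VLAN 50.
ROUTES = [
    Route(ipaddress.ip_network("10.150.30.0/24"), "vlan30 (connected)", 0),
    Route(ipaddress.ip_network("10.150.50.0/24"), "vlan50 (connected)", 0),
    Route(ipaddress.ip_network("10.150.50.0/24"), "static via 10.150.30.1", 1),
    Route(ipaddress.ip_network("0.0.0.0/0"), "default via 10.150.30.1", 1),
]

# A reply to a VLAN 50 client leaves directly on the VLAN 50 leg.
print(select_route(ROUTES, "10.150.50.10").via)  # vlan50 (connected)
# Anything off-link falls through to the default route.
print(select_route(ROUTES, "192.0.2.1").via)     # default via 10.150.30.1

# A more specific static route beats the connected /24.
more_specific = ROUTES + [
    Route(ipaddress.ip_network("10.150.50.10/32"), "host route via 10.150.30.1", 1)
]
print(select_route(more_specific, "10.150.50.10").via)  # host route via 10.150.30.1
```

The first lookup is the situation in this thread: the request arrives through the firewall on the VLAN 30 leg, but the reply leaves on the VLAN 50 leg and bypasses the firewall.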

Quote from: bimbar on November 18, 2024, 01:58:49 PM
Just a comment though, more specific routes are always preferred regardless of connection, but in this case, the subnet mask was the same, and then, distance / metric / connection are relevant.

Correct, I overgeneralised a bit, sorry. More specifics will take precedence over locally connected.

In hindsight I should have known. I have come across problems like this before.
The issue arose because, when I implemented the VLANs, I wanted to keep connectivity from elsewhere in case I broke a VLAN and needed to get in to fix it. It's not dissimilar to having something visible on a management interface on one VLAN and a service interface on another.

It is sometimes difficult to make the leap of understanding that a directly connected interface takes priority over a default gateway.

I apologise if you think this was a waste of people's time. It was not intended to be.


No worries,

a multi-homed setup is not unusual. You just need to make sure everything is configured properly from a routing perspective.

As mentioned, in your case more specific routes take precedence; if the prefixes are equal, then administrative distance (AD) plays a major role. Directly connected has a better AD than a static route.

See >
Quote
Route Source                                                      Default Distance Value

Connected interface                                               0
Static route                                                      1
Enhanced Interior Gateway Routing Protocol (EIGRP) summary route  5
External Border Gateway Protocol (BGP)                            20
Internal EIGRP                                                    90
Interior Gateway Routing Protocol (IGRP)                          100
Open Shortest Path First (OSPF)                                   110
Intermediate System-to-Intermediate System (IS-IS)                115
Routing Information Protocol (RIP)                                120
Exterior Gateway Protocol (EGP)                                   140
On Demand Routing (ODR)                                           160
External EIGRP                                                    170
Internal BGP                                                      200
Unknown*                                                          255