After 24.4.1 (business) upgrade, FW-initiated traffic is blocked

Started by morik_opnsense, June 28, 2024, 03:11:21 PM

(updated w/ logs - initial post was done via cellphone)
Hello experts,
When I was on the 23.x business edition, life was great. The 24.x upgrade was supposed to make it better, and to a large degree it has. But I have a strange new problem that I'm unable to solve. Two plugins, crowdsec (port 8080) and Telegraf (port 8086 for influx), stopped working. Logs indicate a connection timeout for both services. The destination endpoints (on opt6) are fine and reachable to and from elsewhere, both inside and outside the network; only non-ICMP traffic originating from the firewall itself times out. I made no rule changes at my end.


traceroute to 192.168.100.21 (192.168.100.21), 64 hops max, 40 byte packets
1  crowdsec-lapi (192.168.100.21)  0.656 ms  0.416 ms  0.330 ms


The live log doesn't show any blocked packets. It does show the "let packets from firewall itself" rule passing traffic in the out direction, but nothing in the reverse direction (which should be allowed by default, given the stateful nature of flows).


curl -vi --connect-timeout 10 http://crowdsec-lapi.esco.ghaar:8080
* Host crowdsec-lapi.esco.ghaar:8080 was resolved.
* IPv6: (none)
* IPv4: 192.168.100.21
*   Trying 192.168.100.21:8080...
* ipv4 connect timeout after 9999ms, move on!
* Failed to connect to crowdsec-lapi.esco.ghaar port 8080 after 10006 ms: Timeout was reached
* Closing connection
curl: (28) Failed to connect


An interface capture shows:


Servers
vlan0.100 2024-06-28
07:37:50.442037 f4:90:ea:00:9f:72 00:50:56:82:d8:b4 ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], cksum 0x8070 (correct), seq 445912424, win 65535, options [mss 8960,nop,wscale 12,sackOK,TS val 1292126707 ecr 0], length 0
07:37:50.442400 00:50:56:82:d8:b4 f4:90:ea:00:9f:72 ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xe967 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838080763 ecr 1292126707,nop,wscale 9], length 0
07:37:51.442697 f4:90:ea:00:9f:72 00:50:56:82:d8:b4 ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], cksum 0x7c87 (correct), seq 445912424, win 65535, options [mss 8960,nop,wscale 12,sackOK,TS val 1292127708 ecr 0], length 0
07:37:51.443231 00:50:56:82:d8:b4 f4:90:ea:00:9f:72 ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xe57e (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838081764 ecr 1292126707,nop,wscale 9], length 0
07:37:52.462713 00:50:56:82:d8:b4 f4:90:ea:00:9f:72 ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xe182 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838082784 ecr 1292126707,nop,wscale 9], length 0
07:37:53.642675 f4:90:ea:00:9f:72 00:50:56:82:d8:b4 ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], cksum 0x73ef (correct), seq 445912424, win 65535, options [mss 8960,nop,wscale 12,sackOK,TS val 1292129908 ecr 0], length 0
07:37:53.643161 00:50:56:82:d8:b4 f4:90:ea:00:9f:72 ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xdce6 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838083964 ecr 1292126707,nop,wscale 9], length 0
07:37:55.662758 00:50:56:82:d8:b4 f4:90:ea:00:9f:72 ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xd502 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838085984 ecr 1292126707,nop,wscale 9], length 0
07:37:57.842474 f4:90:ea:00:9f:72 00:50:56:82:d8:b4 ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], cksum 0x6387 (correct), seq 445912424, win 65535, options [mss 8960,nop,wscale 12,sackOK,TS val 1292134108 ecr 0], length 0
07:37:57.842885 00:50:56:82:d8:b4 f4:90:ea:00:9f:72 ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xcc7e (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838088164 ecr 1292126707,nop,wscale 9], length 0
07:38:01.966765 00:50:56:82:d8:b4 f4:90:ea:00:9f:72 ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xbc62 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838092288 ecr 1292126707,nop,wscale 9], length 0


    The repeating seq#s indicate (to me) the following:

    i. .100.1 (opnsense) opens a socket to .100.21:8080 (the server in question) by sending a SYN
    ii. the server responds with a SYN-ACK
    iii. but opnsense doesn't respond with an ACK

    Point iii would mean opnsense is eating up the SYN-ACK? But why?
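To make the reading of the capture less dependent on eyeballing, here is a rough Python sketch that counts repeated TCP segments in tcpdump-style text and reports how far the handshake got. The function names (`count_segments`, `handshake_verdict`) and the regular expression are made up for illustration, and the parser assumes the verbose two-line output format shown in the capture above; the sample lines are copied from it.

```python
import re
from collections import Counter

# Matches the detail line of each record, e.g.
# "192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], ... seq 445912424, ..."
# (assumption: IPv4 addresses with the port appended after a dot, as tcpdump prints them)
PACKET_RE = re.compile(
    r"(?P<src>(?:\d+\.){4}\d+) > (?P<dst>(?:\d+\.){4}\d+): "
    r"Flags \[(?P<flags>[^\]]+)\](?:.*?seq (?P<seq>\d+))?"
)

# Three records from the capture in this post: SYN, SYN-ACK, retransmitted SYN.
SAMPLE = """\
    192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], cksum 0x8070 (correct), seq 445912424, win 65535, options [mss 8960,nop,wscale 12,sackOK,TS val 1292126707 ecr 0], length 0
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xe967 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838080763 ecr 1292126707,nop,wscale 9], length 0
    192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], cksum 0x7c87 (correct), seq 445912424, win 65535, options [mss 8960,nop,wscale 12,sackOK,TS val 1292127708 ecr 0], length 0
"""


def count_segments(capture_text):
    """Count identical TCP segments, keyed by (src, dst, flags, seq)."""
    counts = Counter()
    for m in PACKET_RE.finditer(capture_text):
        counts[(m["src"], m["dst"], m["flags"], m["seq"])] += 1
    return counts


def handshake_verdict(counts):
    """Summarize how far the three-way handshake got, based on flag counts."""
    syn = sum(n for (_, _, flags, _), n in counts.items() if flags == "S")
    syn_ack = sum(n for (_, _, flags, _), n in counts.items() if flags == "S.")
    ack = sum(n for (_, _, flags, _), n in counts.items() if flags == ".")
    if syn and syn_ack and not ack:
        return "SYN-ACK on the wire but never ACKed by the initiator"
    if syn and not syn_ack:
        return "SYN sent but no SYN-ACK seen"
    if ack:
        return "ACK seen, handshake probably completed"
    return "no handshake packets found"


counts = count_segments(SAMPLE)
retransmitted = {key: n for key, n in counts.items() if n > 1}
print(handshake_verdict(counts))
print(retransmitted)
```

A SYN that repeats with the same seq# while SYN-ACKs are visible on the same interface suggests the replies reach the NIC but never reach the firewall's own TCP stack, which is consistent with points i to iii.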

    I've tried enabling various combinations of explicit rules allowing "opt6 address" -> "server net + ports", to no avail. With the entire firewall disabled, the first curl command succeeds, in that I get a 401 Unauthorized. But subsequent connection attempts immediately afterwards end up in a black hole.

    How might I go about troubleshooting this behavior?

    Edit #1: What is strange(r) still is that this behavior occurs on every subnet whenever traffic originates from opnsense: the first few connection attempts succeed, but subsequent attempts time out.


# nc -4znvw 10 192.168.0.58 443
Connection to 192.168.0.58 443 port [tcp/*] succeeded!
# nc -4znvw 10 192.168.0.58 443
nc: connect to 192.168.0.58 port 443 (tcp) failed: Operation timed out
# nc -4znvw 10 192.168.0.58 443
nc: connect to 192.168.0.58 port 443 (tcp) failed: Operation timed out
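
To pin down exactly when the connections start failing, the repeated nc test above could be scripted. This is only a sketch, assuming a Python 3 interpreter is available on the firewall; `probe` and its parameters are hypothetical names, and it does nothing more than a plain TCP connect with a timeout, like `nc -z`.

```python
import socket
import time


def probe(host, port, attempts=5, timeout=10.0, pause=1.0):
    """Attempt a TCP connect several times.

    Returns a list of (succeeded, elapsed_seconds, error_text) tuples,
    one per attempt, in order.
    """
    results = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            # create_connection performs the full three-way handshake;
            # the socket is closed again as soon as the with-block exits.
            with socket.create_connection((host, port), timeout=timeout):
                outcome = (True, time.monotonic() - start, None)
        except OSError as exc:  # covers timeouts and refused connections
            outcome = (False, time.monotonic() - start, str(exc))
        results.append(outcome)
        if i < attempts - 1:
            time.sleep(pause)
    return results


def format_results(results):
    """Render one line per attempt for pasting into a forum post."""
    lines = []
    for n, (ok, secs, err) in enumerate(results, 1):
        status = "succeeded" if ok else f"failed ({err})"
        lines.append(f"attempt {n}: {status} after {secs:.2f}s")
    return "\n".join(lines)
```

For example, `print(format_results(probe("192.168.0.58", 443)))` run from the firewall shell would show at which attempt the timeouts begin, and varying `pause` might reveal whether the failures depend on how quickly the attempts follow each other.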