[BUG] OVN EIP, FIP External Gateway is not configured. #5050

Open
inyongma1 opened this issue Mar 4, 2025 · 18 comments
Labels
bug Something isn't working

Comments

@inyongma1

inyongma1 commented Mar 4, 2025

Kube-OVN Version

v1.13

Kubernetes Version

v1.30

Operation-system/Kernel Version

"Rocky Linux 8.10 (Green Obsidian)"

Description

vnode-103-177 and vnode-117-46 are the external gateway nodes (they are also worker nodes), and vnode-103-176 is the master node, from which I removed the taint so that pods can be scheduled on it.

NAME            STATUS   ROLES           AGE     VERSION
vnode-103-176   Ready    control-plane   3d20h   v1.30.10
vnode-103-177   Ready    <none>          3d20h   v1.30.10
vnode-117-46    Ready    <none>          3d20h   v1.30.10

vnode-103-177 ovn.kubernetes.io/external-gw=true
vnode-117-46 ovn.kubernetes.io/external-gw=true

vpc1 starter-backend-7ff5f85b46-8d9gh 1/1 Running 0 16m 192.168.0.4 vnode-103-176

starter-backend-7ff5f85b46-8d9gh is the pod running on the master node, which is not an external gateway node.

My question: I rebooted the two external gateway nodes (vnode-103-177, vnode-117-46), but starter-backend-7ff5f85b46-8d9gh can still be reached through its EIP. I also checked the OpenFlow flows on the master node (the non-gateway node vnode-103-176) and found a NAT rule there:

cookie=0x8f0de392, duration=1176.940s, table=15, n_packets=64, n_bytes=4994, idle_age=466, priority=100,ip,reg14=0x1,metadata=0x5,nw_dst=10.9.101.9 actions=ct(commit,table=16,zone=NXM_NX_REG11[0..15],nat(dst=192.168.0.4))
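
For reference, a dump like the one above can be produced roughly like this (a sketch, assuming the kubectl-ko plugin is installed; the grep filter on the EIP address is only illustrative):

kubectl ko ofctl vnode-103-176 dump-flows br-int | grep 10.9.101.9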

I expected that if all external gateway nodes are down, the pod should no longer be reachable via its OVN EIP.

This is my configuration:

apiVersion: kubeovn.io/v1
kind: ProviderNetwork
metadata:
  name: external0
spec:
  defaultInterface: eth0
---
apiVersion: kubeovn.io/v1
kind: Vlan
metadata:
  name: vlan0
spec:
  id: 0
  provider: external0
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: external0
spec:
  protocol: IPv4
  cidrBlock: 10.9.0.0/16
  gateway: 10.9.0.1
  vlan: vlan0
  excludeIps:
  - 10.9.0.1..10.9.101.1
---
apiVersion: v1
kind: Namespace
metadata:
  name: vpc1
---
kind: Vpc
apiVersion: kubeovn.io/v1
metadata:
  name: vpc1
spec:
  namespaces:
  - vpc1
  enableExternal: true
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: vpc1-subnet1
spec:
  cidrBlock: 192.168.0.0/24
  default: false
  disableGatewayCheck: false
  disableInterConnection: true
  enableEcmp: true
  gatewayNode: ""
  gatewayType: distributed
  #gatewayType: centralized
  natOutgoing: false
  private: false
  protocol: IPv4
  provider: ovn
  vpc: vpc1
  namespaces:
  - vpc1
---
kind: OvnEip
apiVersion: kubeovn.io/v1
metadata:
  name: eip-static
spec:
  externalSubnet: external0
  type: nat
---
kind: OvnFip
apiVersion: kubeovn.io/v1
metadata:
  name: eip-static
spec:
  ovnEip: eip-static
  ipName: starter-backend-7ff5f85b46-8d9gh.vpc1

Steps To Reproduce

  1. I followed this page; the configuration is the same as https://kubeovn.github.io/docs/stable/en/advance/ovn-eip-fip-snat/
  2. The OVN EIP and FIP are configured (see the check sketched below).
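
A quick way to confirm the resources exist might look like this (a sketch; oeip and ofip are assumed here to be the short names of the OvnEip and OvnFip CRDs):

kubectl get oeip eip-static
kubectl get ofip eip-static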

Current Behavior

Even after both external gateway nodes are rebooted, the pod can still be reached via the OVN EIP:

curl 10.9.101.9
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Expected Behavior

If both external gateway nodes are rebooted (down), the pod should not be reachable via its OVN EIP.

@inyongma1 inyongma1 added the bug Something isn't working label Mar 4, 2025

dosubot bot commented Mar 4, 2025

The continued communication via OEIP even after rebooting both external gateway nodes is likely due to a fix implemented in Kube-OVN. This fix ensures that IP and route information is properly transferred during a node reboot, allowing for uninterrupted communication. The specific fix is mentioned in the changelog entry:

This means that even if the external gateway nodes are rebooted, the system is designed to maintain communication by transferring necessary IP and route information to ensure continuity.


@zbb88888
Collaborator

zbb88888 commented Mar 4, 2025

Expected Behavior
If both external gateway nodes are rebooted (down), the pod should not be reachable via its OVN EIP.

Do you mean the opposite: if the two external gw nodes are rebooted, the EIP should still be reachable?

@inyongma1
Author

inyongma1 commented Mar 4, 2025

I mean that if both external gateway nodes are down, an external client should not be able to reach the pod via the OVN EIP,
because packets from outside should pass through an external gateway node.

@inyongma1
Author

There are 3 nodes in the cluster: one control plane node and two worker nodes. I removed the taint from the control plane node so that pods can run on it. However, if the two worker nodes are set up as external gateways and both of these external gateway nodes go down, then there should no longer be an external gateway in the cluster, meaning that external communication should fail. Yet, in this case, it was confirmed that communication still occurs via the control plane.

@zbb88888
Collaborator

zbb88888 commented Mar 5, 2025

You have 3 nodes, and only two of them are gw nodes. If both gw nodes are down, the EIP should fail to connect.

Do you mean the EIP is still working?

You can use k ko nbctl show to check the gw nodes before your test.
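
For example, something like this could show which chassis back the external router port (a sketch; vpc1-external0 is the router port name implied by the configuration above, and lrp-get-gateway-chassis lists its gateway chassis with priorities):

kubectl ko nbctl show vpc1
kubectl ko nbctl lrp-get-gateway-chassis vpc1-external0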

@inyongma1
Author

inyongma1 commented Mar 5, 2025

Do you mean the EIP is still working?
Answer: yes, it is still working.

Both gateway nodes are down, but the NAT rule still remains:
router 2d09e8e5-856f-4487-9cf5-d01d18a4aa0b (vpc1)
    port vpc1-vpc1-subnet1
        mac: "3a:aa:5f:38:fd:5f"
        networks: ["192.168.0.1/24"]
    port vpc1-external0
        mac: "36:98:75:25:01:a4"
        networks: ["10.9.101.8/16"]
        gateway chassis: [df13bc9c-f6bb-4162-ba49-8c8798746d31 b9e3320b-b025-4b2f-b7ba-0520725de523]
    nat a023aada-62d9-4df1-882a-191dc3fd4d6f
        external ip: "10.9.101.9"
        logical ip: "192.168.0.4"
        type: "dnat_and_snat"

@zbb88888
Collaborator

zbb88888 commented Mar 6, 2025

Where is the client that you use to ping the EIP located?

@inyongma1
Author

inyongma1 commented Mar 6, 2025

Another node that is not in the Kubernetes cluster.
That node's IP is in the same CIDR as the public (EIP) address.

@zbb88888
Collaborator

zbb88888 commented Mar 6, 2025

Does "node down" mean the node is powered off, not just in the middle of a restart?

Could you run tcpdump host 10.9.101.9 on the only remaining (non-gw) node and check the packets?
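
For example, something along these lines (a sketch; the interface name eth0 is an assumption based on the ProviderNetwork defaultInterface above):

tcpdump -i eth0 -nne host 10.9.101.9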

@inyongma1
Author

01:58:51.730161 3c:ec:ef:7f:49:23 > 8a:72:bb:b8:26:3f, ethertype IPv4 (0x0800), length 98: 10.9.20.130 > 10.9.101.9: ICMP echo request, id 10492, seq 8, length 64
01:58:51.730210 8a:72:bb:b8:26:3f > 3c:ec:ef:7f:49:23, ethertype IPv4 (0x0800), length 98: 10.9.101.9 > 10.9.20.130: ICMP echo reply, id 10492, seq 8, length 64
01:58:52.754429 3c:ec:ef:7f:49:23 > 8a:72:bb:b8:26:3f, ethertype IPv4 (0x0800), length 98: 10.9.20.130 > 10.9.101.9: ICMP echo request, id 10492, seq 9, length 64
01:58:52.754496 8a:72:bb:b8:26:3f > 3c:ec:ef:7f:49:23, ethertype IPv4 (0x0800), length 98: 10.9.101.9 > 10.9.20.130: ICMP echo reply, id 10492, seq 9, length 64
01:58:53.778219 3c:ec:ef:7f:49:23 > 8a:72:bb:b8:26:3f, ethertype IPv4 (0x0800), length 98: 10.9.20.130 > 10.9.101.9: ICMP echo request, id 10492, seq 10, length 64
01:58:53.778285 8a:72:bb:b8:26:3f > 3c:ec:ef:7f:49:23, ethertype IPv4 (0x0800), length 98: 10.9.101.9 > 10.9.20.130: ICMP echo reply, id 10492, seq 10, length 64
01:58:54.802328 3c:ec:ef:7f:49:23 > 8a:72:bb:b8:26:3f, ethertype IPv4 (0x0800), length 98: 10.9.20.130 > 10.9.101.9: ICMP echo request, id 10492, seq 11, length 64
01:58:54.802375 8a:72:bb:b8:26:3f > 3c:ec:ef:7f:49:23, ethertype IPv4 (0x0800), length 98: 10.9.101.9 > 10.9.20.130: ICMP echo reply, id 10492, seq 11, length 64
01:58:55.826239 3c:ec:ef:7f:49:23 > 8a:72:bb:b8:26:3f, ethertype IPv4 (0x0800), length 98: 10.9.20.130 > 10.9.101.9: ICMP echo request, id 10492, seq 12, length 64
01:58:55.826295 8a:72:bb:b8:26:3f > 3c:ec:ef:7f:49:23, ethertype IPv4 (0x0800), length 98: 10.9.101.9 > 10.9.20.130: ICMP echo reply, id 10492, seq 12, length 64
01:58:56.850282 3c:ec:ef:7f:49:23 > 8a:72:bb:b8:26:3f, ethertype IPv4 (0x0800), length 98: 10.9.20.130 > 10.9.101.9: ICMP echo request, id 10492, seq 13, length 64
01:58:56.850336 8a:72:bb:b8:26:3f > 3c:ec:ef:7f:49:23, ethertype IPv4 (0x0800), length 98: 10.9.101.9 > 10.9.20.130: ICMP echo reply, id 10492, seq 13, length 64

@zbb88888
Collaborator

zbb88888 commented Mar 6, 2025

router 2d09e8e5-856f-4487-9cf5-d01d18a4aa0b (vpc1)
    port vpc1-vpc1-subnet1
        mac: "3a:aa:5f:38:fd:5f"
        networks: ["192.168.0.1/24"]
    port vpc1-external0
        mac: "36:98:75:25:01:a4"
        networks: ["10.9.101.8/16"]
        gateway chassis: [df13bc9c-f6bb-4162-ba49-8c8798746d31 b9e3320b-b025-4b2f-b7ba-0520725de523]
    nat a023aada-62d9-4df1-882a-191dc3fd4d6f
        external ip: "10.9.101.9"
        logical ip: "192.168.0.4"
        type: "dnat_and_snat"

The NAT rule still existing is not a problem by itself.

Please run k ko sbctl show to check the gateway chassis: [df13bc9c-f6bb-4162-ba49-8c8798746d31 b9e3320b-b025-4b2f-b7ba-0520725de523]

Check each chassis's hostname (node name) and make sure that node really is shut down.

Please also show the output of kubectl get nodes. Is the shut-down node in NotReady status?

@inyongma1
Author

Yes, the nodes are NotReady:
[root@vnode-103-176 ~]# k get nodes
NAME            STATUS     ROLES           AGE    VERSION
vnode-103-176   Ready      control-plane   6d5h   v1.30.10
vnode-103-177   NotReady   <none>          6d5h   v1.30.10
vnode-117-46    NotReady   <none>          6d5h   v1.30.10

[root@vnode-103-176 ~]# k ko sbctl show
Chassis "df13bc9c-f6bb-4162-ba49-8c8798746d31"
hostname: vnode-103-177
Encap geneve
ip: "10.9.103.177"
options: {csum="true"}
Port_Binding kube-ovn-pinger-xqpfk.kube-system
Port_Binding node-vnode-103-177
Chassis "257fcb36-ca3d-4da8-a931-d9e73b2ff76e"
hostname: vnode-103-176
Encap geneve
ip: "10.9.103.176"
options: {csum="true"}
Port_Binding starter-backend-7ff5f85b46-9rhcp.vpc1
Port_Binding kube-ovn-pinger-5mnlm.kube-system
Port_Binding coredns-55cb58b774-tfwwl.kube-system
Port_Binding starter-backend-7ff5f85b46-8d9gh.vpc1
Port_Binding starter-backend-7ff5f85b46-6x8g2.vpc1
Port_Binding starter-backend-7ff5f85b46-tvsxr.vpc1
Port_Binding starter-backend-7ff5f85b46-khc7b.vpc1
Port_Binding starter-backend-7ff5f85b46-5qp27.vpc1
Port_Binding coredns-55cb58b774-m7bp5.kube-system
Port_Binding starter-backend-7ff5f85b46-r7gqg.vpc1
Port_Binding starter-backend-7ff5f85b46-rkz4f.vpc1
Port_Binding starter-backend-7ff5f85b46-jknq8.vpc1
Port_Binding starter-backend-7ff5f85b46-wmw4q.vpc1
Port_Binding node-vnode-103-176
Chassis "b9e3320b-b025-4b2f-b7ba-0520725de523"
hostname: vnode-117-46
Encap geneve
ip: "10.9.117.46"
options: {csum="true"}
Port_Binding cr-ovn-cluster-external0
Port_Binding kube-ovn-pinger-bkj4k.kube-system
Port_Binding cr-vpc1-external0
Port_Binding node-vnode-117-46

@zbb88888
Collaborator

zbb88888 commented Mar 7, 2025

ok, it is an issue.

[root@vnode-103-176 ~]# k get nodes
NAME            STATUS     ROLES           AGE    VERSION
vnode-103-176   Ready      control-plane   6d5h   v1.30.10
vnode-103-177   NotReady   <none>          6d5h   v1.30.10
vnode-117-46    NotReady   <none>          6d5h   v1.30.10

Chassis "257fcb36-ca3d-4da8-a931-d9e73b2ff76e"
hostname: vnode-103-176

Chassis "b9e3320b-b025-4b2f-b7ba-0520725de523"
hostname: vnode-117-46

router 2d09e8e5-856f-4487-9cf5-d01d18a4aa0b (vpc1)
    port vpc1-vpc1-subnet1
        mac: "3a:aa:5f:38:fd:5f"
        networks: ["192.168.0.1/24"]
    port vpc1-external0
        mac: "36:98:75:25:01:a4"
        networks: ["10.9.101.8/16"]
        gateway chassis: [df13bc9c-f6bb-4162-ba49-8c8798746d31 b9e3320b-b025-4b2f-b7ba-0520725de523]
    nat a023aada-62d9-4df1-882a-191dc3fd4d6f
        external ip: "10.9.101.9"
        logical ip: "192.168.0.4"
        type: "dnat_and_snat"

vnode-103-176 is not a gw node, but the EIP traffic goes through it.

@oilbeater could you please help take a look at this issue?

@inyongma1
Author

inyongma1 commented Mar 11, 2025

@oilbeater @zbb88888 please help me.

@oilbeater
Collaborator

Can you check which external IP address the Pod uses when it visits the external network? And it's better to run kubectl ko trace to see the logical flows for the external traffic.
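
A trace invocation might look roughly like this (a sketch; the pod name, destination IP, and protocol are taken from earlier in this thread and are only examples):

kubectl ko trace vpc1/starter-backend-7ff5f85b46-8d9gh 10.9.20.130 icmp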

@zbb88888
Collaborator

Oh, an OVN FIP can be bound to the node where the pod is located, which is the distributed mode.
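
One way to check this (a sketch: in OVN, a dnat_and_snat rule is handled on the pod's own chassis, i.e. distributed, when external_mac and logical_port are set on the NAT entry; otherwise it is centralized on the gateway chassis):

kubectl ko nbctl --columns=external_ip,logical_ip,type,external_mac,logical_port find nat type=dnat_and_snat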

@oilbeater
Collaborator

@zbb88888 can you find the related document about distributed dnat_and_snat? I also remember it's distributed, but I cannot find the document.

@zbb88888
Collaborator
