[BUG] OVN EIP, FIP External Gateway is not configured. #5050
The continued communication via the OVN EIP even after rebooting both external gateway nodes is likely due to a fix implemented in Kube-OVN. That fix ensures that IP and route information is properly transferred during a node reboot, allowing for uninterrupted communication; the specific fix is mentioned in a changelog entry. In other words, even if the external gateway nodes are rebooted, the system is designed to maintain communication by transferring the necessary IP and route information.
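For reference, a quick way to verify which chassis should host (and currently hosts) the VPC's external gateway port, assuming the kubectl-ko plugin is installed and using the port name vpc1-external0 from this issue:

    # List the gateway chassis (with priorities) configured for the external router port
    kubectl ko nbctl lrp-get-gateway-chassis vpc1-external0
    # Show each chassis with its hostname and the ports currently bound to it
    kubectl ko sbctl show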
About the Expected Behavior: do you mean that if the two external gw nodes are rebooted, the Pod should still be reachable via the OVN EIP?
I mean that if the two external gw nodes are down, an external client should not be able to reach the Pod via the OVN EIP.
There are 3 nodes in the cluster: one control-plane node and two worker nodes. I removed the taint from the control-plane node so that pods can run on it. The two worker nodes are set up as the external gateways, so if both external gateway nodes go down there is no external gateway left in the cluster and external communication should fail. Yet in this case, communication was confirmed to still work via the control-plane node.
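A way to double-check this topology from the cluster side (a sketch using plain kubectl; the label is the one shown later in this issue):

    # List the nodes labeled as external gateways
    kubectl get nodes -l ovn.kubernetes.io/external-gw=true
    # Confirm the taint was actually removed from the control-plane node
    kubectl describe node vnode-103-176 | grep -A2 Taints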
You have 3 nodes and only two of them are gw nodes. If both gw nodes are down, the EIP should fail to connect. Do you mean the EIP is still working? You can use
Do you mean the EIP is still working? The two gw nodes are down, but the NAT rule still remains.
Where is the client you use to ping the EIP?
Another node that is not in the Kubernetes cluster.
Could you run tcpdump host 10.9.101.9 on the only remaining (non-gw) node and see whether the packets arrive there?
01:58:51.730161 3c:ec:ef:7f:49:23 > 8a:72:bb:b8:26:3f, ethertype IPv4 (0x0800), length 98: 10.9.20.130 > 10.9.101.9: ICMP echo request, id 10492, seq 8, length 64

router 2d09e8e5-856f-4487-9cf5-d01d18a4aa0b (vpc1)
    port vpc1-vpc1-subnet1
        mac: "3a:aa:5f:38:fd:5f"
        networks: ["192.168.0.1/24"]
    port vpc1-external0
        mac: "36:98:75:25:01:a4"
        networks: ["10.9.101.8/16"]
        gateway chassis: [df13bc9c-f6bb-4162-ba49-8c8798746d31 b9e3320b-b025-4b2f-b7ba-0520725de523]
    nat a023aada-62d9-4df1-882a-191dc3fd4d6f
        external ip: "10.9.101.9"
        logical ip: "192.168.0.4"
        type: "dnat_and_snat"
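The diagnostics above can be reproduced roughly as follows (a sketch; eth0 is a placeholder for the non-gw node's external-facing NIC):

    # On the non-gw node: capture traffic to the EIP, printing link-level headers
    tcpdump -nei eth0 host 10.9.101.9
    # Dump the VPC router with its ports and NAT rules (the output shown above)
    kubectl ko nbctl show vpc1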
The NAT rule still existing is not itself the problem. Please check the chassis's hostname (node name) and make sure that node is really shut down. Please show the kubectl ko sbctl show output.
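To map the chassis UUIDs from the gateway chassis list to node hostnames, something like this should work (generic ovn-sbctl database commands through the kubectl-ko plugin):

    # Print each chassis name (system-id) together with its node hostname
    kubectl ko sbctl --columns=name,hostname list Chassis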
Yes, the node is not ready.

[root@vnode-103-176 ~]# k ko sbctl show
OK, it is an issue.

[root@vnode-103-176 ~]# k get nodes

Chassis "257fcb36-ca3d-4da8-a931-d9e73b2ff76e"
Chassis "b9e3320b-b025-4b2f-b7ba-0520725de523"
router 2d09e8e5-856f-4487-9cf5-d01d18a4aa0b (vpc1)

vnode-103-176 is not a gw node, but EIP traffic goes via it. @oilbeater could you please help take a look at this issue?
@oilbeater @zbb88888 please help me.
Can you check which external IP address the Pod uses when it visits the external network? And it's better to run
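The exact command was cut off in the comment above; a plausible check, using the names from this issue and assuming the Pod image ships ping, is:

    # On the external client (10.9.20.130): watch which source IP the Pod's traffic
    # arrives with (the FIP 10.9.101.9 or the router external port 10.9.101.8)
    tcpdump -ni eth0 'host 10.9.101.9 or host 10.9.101.8'
    # Generate traffic from the Pod toward the external client
    kubectl -n vpc1 exec starter-backend-7ff5f85b46-8d9gh -- ping -c 3 10.9.20.130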
Oh, the OVN FIP can be bound to the node where the Pod is located, which means it is distributed.
@zbb88888 can you find the related document about distributed dnat_and_snat? I also remember it's distributed, but I cannot find the document.
Please refer to this doc: https://kubeovn.github.io/docs/v1.13.x/en/vpc/ovn-eip-fip-snat/?h=ovn+eip
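One way to confirm the FIP is distributed rather than centralized: in OVN, a dnat_and_snat rule is handled on the node where the Pod's logical port resides when external_mac and logical_port are set on the NAT record, and only on the gateway chassis when they are empty. A sketch to inspect the record for this EIP:

    # Show the NAT record for the FIP; non-empty external_mac and logical_port
    # fields indicate distributed dnat_and_snat
    kubectl ko nbctl find NAT type=dnat_and_snat external_ip=10.9.101.9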
Kube-OVN Version
v1.13
Kubernetes Version
v1.30
Operation-system/Kernel Version
"Rocky Linux 8.10 (Green Obsidian)"
Description
vnode-103-177 and vnode-117-46 are the external gw nodes (they are also worker nodes); vnode-103-176 is the master node, from which I deleted the taint.
NAME            STATUS   ROLES           AGE     VERSION
vnode-103-176   Ready    control-plane   3d20h   v1.30.10
vnode-103-177   Ready    <none>          3d20h   v1.30.10
vnode-117-46    Ready    <none>          3d20h   v1.30.10
vnode-103-177 ovn.kubernetes.io/external-gw=true
vnode-117-46 ovn.kubernetes.io/external-gw=true
NAMESPACE   NAME                               READY   STATUS    RESTARTS   AGE   IP            NODE
vpc1        starter-backend-7ff5f85b46-8d9gh   1/1     Running   0          16m   192.168.0.4   vnode-103-176
starter-backend-7ff5f85b46-8d9gh is the pod running on the master node, which is not an external gw node.
My question: I rebooted the two external gw nodes (vnode-103-177 and vnode-117-46), but starter-backend-7ff5f85b46-8d9gh can still communicate externally. I also checked ofctl on the master node (the non-external-gw node vnode-103-176) and found this NAT rule there:
cookie=0x8f0de392, duration=1176.940s, table=15, n_packets=64, n_bytes=4994, idle_age=466, priority=100,ip,reg14=0x1,metadata=0x5,nw_dst=10.9.101.9 actions=ct(commit,table=16,zone=NXM_NX_REG11[0..15],nat(dst=192.168.0.4))
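For reference, a flow like this can be dumped on the non-gw node with the kubectl-ko plugin (a sketch; the table number and match fields are taken from the flow quoted above):

    # Dump the DNAT flows for the FIP on br-int of the non-gw node
    kubectl ko ofctl vnode-103-176 dump-flows br-int 'table=15,ip,nw_dst=10.9.101.9'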
I thought that if all external gw nodes are down, communication via the OVN EIP should fail. This is my configuration.
Steps To Reproduce
Current Behavior
If the two external gw nodes are rebooted, communication via the OVN EIP still works.
Expected Behavior
If the two external gw nodes are rebooted, communication via the OVN EIP should fail.