Custom Resources deleted during Karmada Control Plane Upgrade #6098
Yes, you are right, Karmada wouldn't and shouldn't delete users' resources. In this case, the …
Can you elaborate on what you did? How bad is the image? Does the controller-manager go into a crash loop? The …
This is the first log that shows the resource was not found in the informer cache or the karmada-apiserver. Was any operator or similar running at that time that might have deleted the resource unexpectedly?
I was doing a quick rebase of our internal fork (which includes a custom FederatedResourceQuota controller, which I will discuss in the proposed design for that feature). While resolving merge conflicts I missed including one of my changes, which caused the controller-manager to go into CrashLoopBackOff. Obviously in DEV we have a deployment pipeline that prevents these kinds of bad images from being promoted, but since this was QA I was less careful. Additionally, we'll soon be relying on the platform @jabellard is helping build, so these types of upgrade errors should be minimized.
None that would delete custom resources. I'll try to reproduce this and add more information.
OK. By the way, if you have the karmada-apiserver audit log, you can find when and by whom those resources were deleted. |
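For anyone following up on the audit-log suggestion, here is a hedged sketch of how one might filter a kube-apiserver-style audit log (karmada-apiserver writes the same JSON-lines format when audit logging is enabled) for delete events. The file path, usernames, and resource names below are made-up sample data for illustration, not output from this cluster:

```shell
# Illustrative audit entries in the JSON-lines format produced by
# kube-apiserver-style audit logging (sample data only).
cat > /tmp/audit-sample.log <<'EOF'
{"verb":"delete","user":{"username":"system:serviceaccount:karmada-system:karmada-controller-manager"},"objectRef":{"resource":"flinkdeployments","name":"test-dev-identity"},"stageTimestamp":"2025-01-01T00:00:00Z"}
{"verb":"get","user":{"username":"admin"},"objectRef":{"resource":"pods","name":"foo"},"stageTimestamp":"2025-01-01T00:00:01Z"}
EOF

# Keep only delete events and report who deleted what, and when.
jq -c 'select(.verb == "delete")
       | {who: .user.username,
          what: .objectRef.resource,
          name: .objectRef.name,
          when: .stageTimestamp}' /tmp/audit-sample.log
```

Pointing the same filter at the real audit log (the path depends on the `--audit-log-path` flag passed to karmada-apiserver) should narrow down whether the controller-manager itself issued the deletes or something else did.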
What happened:
Upon upgrading our QA Karmada control plane to update deployment images and some member-cluster kubeconfigs, two things happened:
When the controller-manager came back up, we noticed that all custom resources (including FlinkDeployments) that had been scheduled on the two member clusters with outdated kubeconfig files had been deleted from the Karmada control plane. Below I've included the relevant controller-manager logs for the identity resource test-dev-identity.
My understanding is that even if Karmada is unable to contact one or more clusters, the worst it should do is deschedule work on those clusters. I was not under the impression that Karmada would delete the resources from the control plane altogether.
Did this happen because the controller-manager crashed while those clusters had outdated kubeconfigs? That is, when it came back up and failed to initialize informers for the respective clusters, did it decide to clean up orphaned work?
Is there a way to prevent Karmada from deleting, on its own, resources that users have applied to the control plane?
What you expected to happen:
Karmada should ideally never delete user-applied resources without the user explicitly deleting the resources themselves. Is there a configuration that we could set to mitigate this?
How to reproduce it (as minimally and precisely as possible):
In our QA setup, we pushed a bad image to the controller-manager and also happened to use two kubeconfigs with outdated certs. When the controller-manager was fixed and brought back up, all resources that had been scheduled on the member clusters with bad kubeconfigs were deleted.
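For anyone reproducing this, a quick way to rule kubeconfig cert expiry in or out is to decode the client certificate from the kubeconfig and check its notAfter date. The kubeconfig path and jsonpath in the comment are illustrative; the runnable part mints a throwaway cert so the inspection commands can be tried anywhere:

```shell
# Against a real member-cluster kubeconfig (path is illustrative):
#   kubectl config view --kubeconfig member1.kubeconfig --raw \
#     -o jsonpath='{.users[0].user.client-certificate-data}' \
#     | base64 -d | openssl x509 -noout -enddate
#
# Self-contained demo: mint a short-lived throwaway cert and inspect it.
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo" -days 1 \
  -keyout /tmp/demo.key -out /tmp/demo.crt 2>/dev/null

# Print the expiry date (notAfter=<date>).
openssl x509 -in /tmp/demo.crt -noout -enddate

# Exit status tells you whether the cert is already expired.
openssl x509 -in /tmp/demo.crt -noout -checkend 0 \
  && echo "cert still valid" || echo "cert expired"
```

Running the commented-out pipeline against each member-cluster kubeconfig before an upgrade would catch the outdated-certs half of this reproduction up front.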
Anything else we need to know?:
Environment: