Flow control throttling kangal and sysdig? #333

flah00 · 2024-02-12T22:34:17Z

I recently installed sysdig on a test cluster. As it happens, it's the same cluster I run load tests on. While sysdig was running, I started a load test. Initially, the kangal controller timed out creating Kubernetes resources, so I increased the Kubernetes client timeout.

After that, the kangal controller was unable to create all of the Kubernetes resources on the first pass, but it succeeded on the second attempt. The error and stack trace are included below.

Feb 12 09:30:50.961 kangal-controller E0212 14:30:50.108353 1 loadtest.go:472] there is a conflict with loadtest 'loadtest-coiling-lightningbug' between datastore and cache. it might be because object has been removed or modified in the datastore
Feb 12 09:30:50.961 kangal-controller Created JMeter resources
Feb 12 09:30:40.866 kangal-controller Created pods with test data
Feb 12 09:30:10.769 kangal-controller Remote custom data enabled, creating PVC
Feb 12 09:29:55.762 kangal-controller E0212 14:29:54.895207 1 loadtest.go:309] error syncing 'loadtest-coiling-lightningbug': client rate limiter Wait returned an error: context deadline exceeded, requeuing
Feb 12 09:29:55.762 kangal-controller error syncing loadtest, re-queuing
Feb 12 09:29:55.762 kangal-controller Error on creating new JMeter service
Feb 12 09:29:55.762 kangal-controller Created pods with test data
Feb 12 09:29:15.659 kangal-controller Remote custom data enabled, creating PVC
Feb 12 09:29:00.590 kangal-controller Created new namespace

Stack trace

github.com/hellofresh/kangal/pkg/controller.(*Controller).processNextWorkItem.func1
	/home/runner/work/kangal/kangal/pkg/controller/loadtest.go:299
github.com/hellofresh/kangal/pkg/controller.(*Controller).processNextWorkItem
	/home/runner/work/kangal/kangal/pkg/controller/loadtest.go:307
github.com/hellofresh/kangal/pkg/controller.(*Controller).runWorker
	/home/runner/work/kangal/kangal/pkg/controller/loadtest.go:240
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:157
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:158
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:135
k8s.io/apimachinery/pkg/util/wait.Until
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:92

Workaround

I uninstalled sysdig, and k8s API response times were much peppier. The kangal controller also succeeds on its first pass. I'm already in touch with sysdig support regarding the problem. Clearly they have some work to do, but maybe kangal does as well?

Solution?

I'm not really sure what the expectation of flow control is... Should this be the exclusive province of cluster admins? Should charts offer some guidance for their apps? Should kangal include a priority level configuration and flow schema for its service account?
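
One detail that may matter for this question: the `client rate limiter Wait returned an error` message in the log comes from client-go's client-side rate limiter, which is separate from server-side flow control (API Priority and Fairness). If the calls in a sync pass share one context deadline (an assumption about kangal, not something verified here), a slow API server can use up that deadline while a later call is still waiting on the client side. Below is a minimal Go sketch of that failure mode. It is a simplified stand-in, not client-go's or kangal's actual code: `waitForToken`, `trySync`, and `syncTimeout` are hypothetical names, and the token channel only approximates a token bucket.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// syncTimeout is a hypothetical deadline for one simulated sync pass.
const syncTimeout = 20 * time.Millisecond

// waitForToken blocks until a token is available or ctx is done.
// The error text imitates the message seen in the log above.
func waitForToken(ctx context.Context, tokens <-chan struct{}) error {
	select {
	case <-tokens:
		return nil
	case <-ctx.Done():
		return fmt.Errorf("client rate limiter Wait returned an error: %w", ctx.Err())
	}
}

// trySync simulates one request that needs a single token before syncTimeout
// expires. available is the number of tokens already in the bucket.
func trySync(available int) error {
	tokens := make(chan struct{}, available)
	for i := 0; i < available; i++ {
		tokens <- struct{}{}
	}
	ctx, cancel := context.WithTimeout(context.Background(), syncTimeout)
	defer cancel()
	return waitForToken(ctx, tokens)
}

func main() {
	// No tokens left: the deadline expires while the request is still waiting.
	err := trySync(0)
	fmt.Println(err, errors.Is(err, context.DeadlineExceeded))

	// One token available: the request proceeds immediately.
	fmt.Println(trySync(1))
}
```

If this is what is happening, a FlowSchema on its own might not remove the error; the client's `QPS` and `Burst` settings on `rest.Config`, or the deadline itself, would probably need a look as well.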

What do folks think?


stale · 2024-03-28T09:10:15Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ChrisSmiles · 2025-03-06T13:12:39Z

This seems related, so posting here: we've been noticing an issue where load tests wouldn't start due to this error.

E0306 10:26:24.096471 1 loadtest.go:309] error syncing 'loadtest-ns': Post "https://172.20.0.1:443/api/v1/namespaces?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers), requeuing

We've now found the root cause of the delay in namespace creation: it was the webhook amazon-cloudwatch-observability-mutating-webhook-configuration. Either removing it or reducing its timeout to 3 seconds fixed the loadtest issue.

It would be nice to be able to configure a longer timeout on the kangal side, say 15 seconds rather than 5 for namespace creation, as I'm not aware of any functional requirement for it to complete within 5 seconds. Obviously you'd still want to find the root cause, but I'm not sure kangal load tests need to stop working when the issue does occur.
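
A minimal sketch of what such a setting could look like, assuming the timeout is read from an environment variable. The variable name `EXAMPLE_KUBE_CLIENT_TIMEOUT` and the helper `parseTimeout` are hypothetical and not confirmed kangal settings. The sketch uses a plain `http.Client` so that it runs on its own; with client-go the value would more likely go into `rest.Config.Timeout`.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// defaultTimeout matches the 5s visible in the "?timeout=5s" request above.
const defaultTimeout = 5 * time.Second

// timeoutEnvVar is a hypothetical variable name used only for this example.
const timeoutEnvVar = "EXAMPLE_KUBE_CLIENT_TIMEOUT"

// parseTimeout returns the duration encoded in raw (for example "15s"), or
// fallback when raw is empty, malformed, or not positive.
func parseTimeout(raw string, fallback time.Duration) time.Duration {
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return fallback
	}
	return d
}

func main() {
	timeout := parseTimeout(os.Getenv(timeoutEnvVar), defaultTimeout)

	// The HTTP client gives up on any request that takes longer than timeout.
	client := &http.Client{Timeout: timeout}
	fmt.Println("client timeout:", client.Timeout)
}
```

Falling back to the default on bad input keeps a typo in the variable from disabling the timeout entirely, which seems safer than treating it as "no limit".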

stale bot added the stale label Mar 28, 2024

lucasmdrs added the pinned label (issues that should be kept open) and removed the stale label Mar 28, 2024
