
Kubernetes metadata decoration #223

Merged: 23 commits merged into grafana:main on Aug 4, 2023

Conversation

@mariomac (Contributor) commented on Aug 3, 2023:

Again, the huge number of files is caused by vendored dependencies from the Kubernetes API.

When Beyla runs in a K8s environment, it decorates the OTEL traces and metrics with source and destination pods/services, when available.
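
For illustration, the decoration amounts to something like the following sketch: attaching source/destination Kubernetes metadata to each span's attributes when it is available (the attribute names here are assumptions, not necessarily the ones Beyla emits):

```go
// Hypothetical sketch of the decoration step: attach source/destination
// Kubernetes metadata to a span's attributes when available. The
// attribute names are illustrative assumptions, not Beyla's actual keys.
func decorateSpan(attrs map[string]string, srcPod, srcNamespace, dstService string) {
	if srcPod != "" {
		attrs["k8s.src.pod.name"] = srcPod
		attrs["k8s.src.namespace"] = srcNamespace
	}
	if dstService != "" {
		attrs["k8s.dst.service.name"] = dstService
	}
}
```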

By default it's disabled, so it shouldn't affect the performance or memory of any running instance.

You can enable it explicitly or leave it in "autodetect" mode: the decorator will be enabled if it detects that it's running inside Kubernetes.
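
One plausible way to implement that autodetection (an assumption for illustration; the PR may do it differently) is to probe for the in-cluster configuration that is only available inside a pod:

```go
import "k8s.io/client-go/rest"

// runningInKubernetes is a hypothetical autodetection helper (an assumption,
// not necessarily what the PR does): rest.InClusterConfig only succeeds when
// the process runs inside a Kubernetes pod.
func runningInKubernetes() bool {
	_, err := rest.InClusterConfig()
	return err == nil
}
```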

Prometheus metrics aren't supported yet, as I found many issues related to cardinality and a variable number of labels (metadata of the source or destination pods might not always be available for ingress/egress traffic). I will enable them in a later PR.
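
For context on the cardinality concern: Prometheus client libraries require a fixed label set per metric vector, so labels that are only sometimes available must still be filled in (for example with empty strings), and every distinct value combination creates a new time series. A sketch with client_golang, where the metric and label names are hypothetical:

```go
import "github.com/prometheus/client_golang/prometheus"

// Hypothetical metric: the label set is fixed at declaration time, so
// src_pod/dst_pod would always have to be supplied, even when the pod
// metadata is unknown, and each distinct value combination becomes its
// own time series (cardinality growth).
var requestDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name: "http_request_duration_seconds",
		Help: "Duration of HTTP requests.",
	},
	[]string{"src_pod", "dst_pod"},
)
```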

Documentation is also still missing.

@grcevski (Contributor) left a comment:

LGTM! Wow this is incredible, amazing work!

I left a few questions, some because of my lack of knowledge, some because I thought there might be a bug.

@@ -85,6 +86,10 @@ prometheus_export:
Path: "/internal/metrics",
},
},
Kubernetes: transform.KubernetesDecorator{
Enable: transform.KubeDisabled,
Contributor:

The naming seems confusing here, at least to me. Should it say Enable: !transform.KubeDisabled?

Contributor Author:

In this case, KubeDisabled is a string, as Enable can take three values: "true", "false" and "autodetect", defined by the constants KubeEnabled, KubeDisabled, KubeAutodetect.
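
In other words, the setting could be modeled roughly like this (a sketch based on the constants named above; the exact declarations in the PR may differ, and the type name is assumed):

```go
// Sketch of the three-valued Enable setting described above.
type KubeEnableFlag string

const (
	KubeEnabled    KubeEnableFlag = "true"
	KubeDisabled   KubeEnableFlag = "false"
	KubeAutodetect KubeEnableFlag = "autodetect"
)

type KubernetesDecorator struct {
	Enable KubeEnableFlag
}
```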

return nil, fmt.Errorf("instantiating kubernetes metadata decorator: %w", err)
}
return func(in <-chan []HTTPRequestSpan, out chan<- []HTTPRequestSpan) {
decorator.refreshOwnPodMetadata()
Contributor:

Is it possible for the pod metadata to change after the pod has been launched, or is it immutable, so that changing it requires a pod restart?

Contributor Author:

It's immutable as pods do not migrate from one host to another. Each time Kubernetes needs to change something about a Pod, it destroys it and starts a new pod.

for _, address := range addrs {
// check the address type and, if it is not a loopback, then display it
if ipnet, ok := address.(*net.IPNet); ok && !ipnet.IP.IsLoopback() {
if ipnet.IP.To4() != nil {
Contributor:

Is it possible to have a k8s cluster configured only with IPv6? Will the net interface still give us an IPv4 address in that case?

Contributor Author:

Good question. IP.To4() returns a non-nil 4-byte form only when the address is IPv4 (or an IPv4-mapped IPv6 address); for a plain IPv6 address it returns nil, so this branch effectively selects IPv4 addresses. We don't store the returned IPv4 value; we only use it to check whether the address is IPv4.

In the later ipnet.IP.String(), either the IPv4 or the IPv6 textual form is returned.
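
For clarity, a small runnable illustration of net.IP.To4 semantics (the addresses are arbitrary examples):

```go
package main

import (
	"fmt"
	"net"
)

// To4 returns a 4-byte form for IPv4 and IPv4-mapped addresses, and nil
// for plain IPv6 addresses, so `To4() != nil` tests "is this address
// representable as IPv4?". String() always yields the textual form.
func main() {
	for _, s := range []string{"192.168.1.10", "::ffff:10.0.0.1", "2001:db8::1"} {
		ip := net.ParseIP(s)
		fmt.Printf("%-18s To4() != nil: %-5v String(): %s\n", s, ip.To4() != nil, ip.String())
	}
}
```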

@@ -46,18 +46,22 @@ var defaultConfig = Config{
Path: "/internal/metrics",
},
},
Kubernetes: transform.KubernetesDecorator{
Enable: transform.KubeDisabled,
Contributor:

Same comment about the naming of Enable and KubeDisabled: should it be !KubeDisabled? Can we perhaps use the Enabled() function in k8s.go?

Contributor Author:

We can't, because the Enabled() function in k8s.go itself uses this Enable property to decide whether the node is enabled, disabled, or needs to be autodetected.

From the user's perspective, the current default configuration would read as:

kubernetes:
    enabled: false

I will rename the constants to avoid that confusion.

Contributor Author:

Renamed the constants to EnableTrue, EnableFalse, EnableAutodetect, and added an EnableDefault.
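
A plausible shape for the resolver after the rename; this is a sketch, where both the default value (based on "by default it's disabled" in the description) and the autodetection signal are assumptions rather than the PR's actual logic:

```go
import (
	"os"
	"strings"
)

const (
	EnableTrue       = "true"
	EnableFalse      = "false"
	EnableAutodetect = "autodetect"
	EnableDefault    = EnableFalse // assumed, per "by default it's disabled"
)

type KubernetesDecorator struct {
	Enable string
}

// Enabled resolves the three-valued setting. The autodetect branch uses an
// assumed signal (the kubelet injects KUBERNETES_SERVICE_HOST into every
// pod); the actual check in k8s.go may differ.
func (d KubernetesDecorator) Enabled() bool {
	switch strings.ToLower(d.Enable) {
	case EnableTrue:
		return true
	case EnableAutodetect:
		_, inCluster := os.LookupEnv("KUBERNETES_SERVICE_HOST")
		return inCluster
	default: // EnableFalse, empty, or unrecognized values
		return false
	}
}
```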

@@ -149,6 +155,7 @@ func convertFromHTTPTrace(trace *ebpfcommon.HTTPRequestTrace) HTTPRequestSpan {
hostname = extractIP(trace.Host[:], int(trace.HostLen))
case EventTypeGRPCClient:
hostname, hostPort = extractHostPort(trace.Host[:])
peer = hostname
Contributor:

Nice catch! I'm wondering if we are messing up the peer and host for HTTPClient too. Also, is it legal to set the hostname and peer to the same value for a gRPC client? Perhaps we can get RemoteAddr on the eBPF side?

Contributor Author:

I'm not sure we can get RemoteAddr from the client-side calls. The way I understand it (maybe I'm wrong), the client resolves the DNS name before the connection is requested, so from eBPF we can see the remote address but not the hostname.

Anyway, I realized that this line is not needed with the current version of the code, so I will remove it to avoid confusion, and we can decide later.

@MattFrick (Contributor) left a comment:

Lots of great stuff here. I agree with Nikola regarding the inverted enable logic, otherwise approve. Thanks.

@codecov-commenter commented on Aug 4, 2023:

Codecov Report

Merging #223 (07148ea) into main (b0bb692) will decrease coverage by 43.29%.
Report is 3 commits behind head on main.
The diff coverage is 8.56%.

@@             Coverage Diff             @@
##             main     #223       +/-   ##
===========================================
- Coverage   81.23%   37.95%   -43.29%     
===========================================
  Files          35       35               
  Lines        2809     2951      +142     
===========================================
- Hits         2282     1120     -1162     
- Misses        387     1765     +1378     
+ Partials      140       66       -74     
Flag               Coverage Δ
integration-test   ?
unittests          37.95% <8.56%> (-2.81%) ⬇️

Flags with carried forward coverage won't be shown.

Files Changed                             Coverage Δ
pkg/internal/export/otel/metrics.go       72.04% <0.00%> (-8.03%) ⬇️
pkg/internal/export/otel/traces.go        80.00% <0.00%> (-8.31%) ⬇️
pkg/internal/pipe/config.go               56.41% <ø> (ø)
pkg/internal/transform/spanner.go         85.84% <ø> (-1.77%) ⬇️
pkg/internal/transform/k8s.go              4.39% <4.39%> (ø)
pkg/internal/transform/kube/informer.go   10.12% <10.12%> (ø)
pkg/internal/pipe/instrumenter.go         84.84% <100.00%> (-5.78%) ⬇️

... and 24 files with indirect coverage changes

@mariomac (Contributor Author) commented on Aug 4, 2023:

As you both approved, I'm merging so I can start working on further improvements. Please ping me if any of the recent changes need attention; they mostly fix the integration tests in K8s, which weren't running.

@mariomac mariomac merged commit 21a294c into grafana:main Aug 4, 2023
@mariomac mariomac deleted the k8s-decoration branch August 4, 2023 10:27
mattdurham pushed a commit to mattdurham/beyla that referenced this pull request Jan 22, 2025
* barely tested k8s decorator

* started to properly test k8s decoration

* Tested server-side stuff

* basic testing of client-side

* Pod2Pod & Pod2Service tests

* Tests moved to k8s testenv

* tested Pod2Pod as client

* tested pod 2 external

* tested external 2 pod

* Improved informer unit test

* grpc tests

* Moving templating to its own structure and proper deletion of resources

* OTEL traces decoration and tests

* Barely working prometheus k8s metadata decoration

* Release candidate

* removed leaked println and increased linter timeout

* Addressed code comments

* testing passing true as string to yaml

* Run kubernetes integration tests

* removed unneeded metadata from K8s informers

* Fix integration tests logs for kind && removed unneeded metadata

* increasing integration test timeout to 60m

* increase timeouts