Kubernetes metadata decoration #223
LGTM! Wow this is incredible, amazing work!
I left a few questions, some because of my lack of knowledge, some because I thought there might be a bug.
pkg/internal/pipe/config_test.go
@@ -85,6 +86,10 @@ prometheus_export:
			Path: "/internal/metrics",
		},
	},
	Kubernetes: transform.KubernetesDecorator{
		Enable: transform.KubeDisabled,
The naming seems confusing here, at least to me. Should it say `Enable: !transform.KubeDisabled`?
In this case, `KubeDisabled` is a string, as `Enable` can take three values: `"true"`, `"false"` and `"autodetect"`, defined by the constants `KubeEnabled`, `KubeDisabled`, and `KubeAutodetect`.
	return nil, fmt.Errorf("instantiating kubernetes metadata decorator: %w", err)
}
return func(in <-chan []HTTPRequestSpan, out chan<- []HTTPRequestSpan) {
	decorator.refreshOwnPodMetadata()
Is it possible for the pod metadata to change after the pod has been launched, or is it immutable, requiring the pod to be restarted?
It's immutable as pods do not migrate from one host to another. Each time Kubernetes needs to change something about a Pod, it destroys it and starts a new pod.
for _, address := range addrs {
	// check the address type and, if it is not a loopback, display it
	if ipnet, ok := address.(*net.IPNet); ok && !ipnet.IP.IsLoopback() {
		if ipnet.IP.To4() != nil {
Is it possible to have k8s cluster configured only with IPv6? Will the net interface still give us IPv4 version in that case?
Good question. If this address is IPv6, `IP.To4()` will return the lower 4 bytes of the address, so we can be sure that there is an address there. We don't store the returned IPv4 value; we only use it to check that the address is not nil. In the later `ipnet.IP.String()`, either the IPv4 or IPv6 address is returned.
pkg/internal/pipe/config.go
@@ -46,18 +46,22 @@ var defaultConfig = Config{
			Path: "/internal/metrics",
		},
	},
	Kubernetes: transform.KubernetesDecorator{
		Enable: transform.KubeDisabled,
Same comment about the naming of `Enable` and `KubeDisabled`: should it be `!KubeDisabled`? Can we perhaps use the `Enabled()` function in k8s.go?
We can't, because the `Enabled()` function in k8s.go actually uses this `Enable` property to decide whether the node is enabled, disabled, or needs to be autodetected.
The current configuration, from the user's perspective, would default to:
kubernetes:
  enabled: false
I will rename the constants to avoid that confusion.
Renamed the constants to `EnableTrue`, `EnableFalse`, and `EnableAutodetect`, and added an `EnableDefault`.
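To make the three-valued flag concrete, here is a minimal sketch of how the renamed constants and an `Enabled()` helper could fit together. The names follow the thread, but the `KUBERNETES_SERVICE_HOST` autodetection check and the `EnableDefault = EnableFalse` mapping are assumptions, not necessarily the actual implementation:

```go
package main

import (
	"fmt"
	"os"
)

type KubeEnableFlag string

const (
	EnableTrue       KubeEnableFlag = "true"
	EnableFalse      KubeEnableFlag = "false"
	EnableAutodetect KubeEnableFlag = "autodetect"
	EnableDefault                   = EnableFalse // decorator is off by default
)

type KubernetesDecorator struct {
	Enable KubeEnableFlag
}

// Enabled resolves the three-valued flag into a concrete decision.
func (d KubernetesDecorator) Enabled() bool {
	switch d.Enable {
	case EnableTrue:
		return true
	case EnableAutodetect:
		// Inside a pod, kubelet injects the API server address as an
		// env var, so its presence suggests we run inside Kubernetes.
		return os.Getenv("KUBERNETES_SERVICE_HOST") != ""
	default: // EnableFalse or unset
		return false
	}
}

func main() {
	fmt.Println(KubernetesDecorator{Enable: EnableTrue}.Enabled())    // true
	fmt.Println(KubernetesDecorator{Enable: EnableDefault}.Enabled()) // false
}
```

Keeping the string values in constants avoids the `Enable: KubeDisabled` double-negative reading that started this thread.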
pkg/internal/transform/spanner.go
@@ -149,6 +155,7 @@ func convertFromHTTPTrace(trace *ebpfcommon.HTTPRequestTrace) HTTPRequestSpan {
	hostname = extractIP(trace.Host[:], int(trace.HostLen))
case EventTypeGRPCClient:
	hostname, hostPort = extractHostPort(trace.Host[:])
	peer = hostname
Nice catch! I'm wondering if we are messing up the peer and host for HTTPClient too. Also, is it legal to set hostname and peer to the same value for a gRPC client? Perhaps we can get RemoteAddr on the eBPF side?
I'm not sure if we can get RemoteAddr from the client-side calls. The way I understand it works (maybe I'm wrong) is that the connection is requested from the client side before DNS is resolved, so from eBPF we can see the remote host but not the hostname.
Anyway I realized that this line is not needed with the current version of the code so I will remove it to avoid confusion, and we can decide later.
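For illustration, here is a simplified re-implementation of what a helper like `extractHostPort` could do with the fixed-size, NUL-padded `trace.Host[:]` buffer. The function body is an assumption for the sketch, not the actual Beyla code (for instance, an IPv6 literal without a port would be mis-split here):

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
	"strings"
)

// extractHostPort splits a NUL-padded "host:port" buffer, as captured
// by the eBPF probes, into its parts. Hypothetical sketch only.
func extractHostPort(b []byte) (string, int) {
	s := string(bytes.TrimRight(b, "\x00"))
	// Use the last colon so hostnames never swallow the port digits.
	i := strings.LastIndex(s, ":")
	if i < 0 {
		return s, 0 // no port present
	}
	port, err := strconv.Atoi(s[i+1:])
	if err != nil {
		return s, 0 // trailing part was not a number
	}
	return s[:i], port
}

func main() {
	buf := make([]byte, 64) // fixed-size buffer, rest stays NUL
	copy(buf, "grpc-backend:50051")
	host, port := extractHostPort(buf)
	fmt.Println(host, port)
}
```

The point of the discussion above is that, for a gRPC client, this yields the remote host's name/address rather than a true local peer, which is why the `peer = hostname` line was dropped.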
Lots of great stuff here. I agree with Nikola regarding the inverted enable logic, otherwise approve. Thanks.
Codecov Report
@@ Coverage Diff @@
## main #223 +/- ##
===========================================
- Coverage 81.23% 37.95% -43.29%
===========================================
Files 35 35
Lines 2809 2951 +142
===========================================
- Hits 2282 1120 -1162
- Misses 387 1765 +1378
+ Partials 140 66 -74
As you both approved, merging to start working on further improvements. Please ping me if any of the recent changes need some attention. They are mostly to fix the integration tests in K8s, which weren't running.
* barely tested k8s decorator
* started to properly test k8s decoration
* Tested server-side stuff
* basic testing of client-side
* Pod2Pod & Pod2Service tests
* Tests moved to k8s testenv
* tested Pod2Pod as client
* tested pod 2 external
* tested external 2 pod
* Improved informer unit test
* grpc tests
* Moving templating to its own structure and proper deletion of resources
* OTEL traces decoration and tests
* Barely working prometheus k8s metadata decoration
* Release candidate
* removed leaked println and increased linter timeout
* Addressed code comments
* testing passing true as string to yaml
* Run kubernetes integration tests
* removed unneeded metadata from K8s informers
* Fix integration tests logs for kind && removed unneeded metadata
* increasing integration test timeout to 60m
* increase timeouts
Again, the huge number of files is caused by vendored dependencies from the Kubernetes API.
When Beyla runs in a K8s environment, it decorates the OTEL traces and metrics with source and destination pods/services, when available.
By default it's disabled, so it shouldn't affect the performance or memory of any running instance.
You can enable it explicitly or leave it in "autodetect" mode: the decorator will be enabled if it detects it's running inside Kubernetes.
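Assuming the YAML layout shown earlier in the review thread, the user-facing setting might look like this (the exact key name is an assumption based on the `Enable` field discussed above):

```yaml
kubernetes:
  # accepted values: "true", "false", "autodetect"
  enable: autodetect
```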
The Prometheus metrics aren't yet supported, as I found many issues related to cardinality and a variable number of labels (metadata of source or destination pods might not always be available for ingress/egress traffic). I will enable them in a later PR.
Documentation is also still missing.