Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: prevent duplicate service_name and app_name labels during ingestion #3951

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

marcsanmi
Copy link
Contributor

This PR fixes two issues related to duplicate labels during profile ingestion:

  • Prevents duplicate service_name labels in the createLabels function in the pprof converter
  • Prevents duplicate app_name labels in the pyroscope ingester adapter

This is particularly important because in Alloy, we have logic that ensures profiles always have a service_name label, and adds an app_name label when service_name already exists.

Our current implementation could lead to duplicate labels when this Alloy logic is applied.

Resolves https://github.com/grafana/pyroscope-squad/issues/357

@marcsanmi marcsanmi requested a review from a team as a code owner February 25, 2025 18:10
@marcsanmi marcsanmi force-pushed the marcsanmi/fix-dup-labels-alloy branch from 9b6d842 to 2ca121a Compare February 25, 2025 21:28
Copy link
Collaborator

@kolesnikovae kolesnikovae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for fixing this!

I'm not 100% sure I understand why we add the app_name label; I assume it's for backward compatibility, but I don't fully understand why we would create it if it's absent. This is clearly out of scope of the PR, but it would be nice if could clarify this moment

@korniltsev
Copy link
Collaborator

I think I added an app_name to not drop the app name from flameql if the service_name is also present.
Bsaically if we have flameql appname{service_name=svc}
Honestly I now regret doing this I think this is redundant and this logic can be dropped

@korniltsev
Copy link
Collaborator

I think in may be useful to add an integration test ingesting foo{service.name=bar} . I have suspicuous it may also introduce duplicate labels after sanitization in distributor. Feel free to do this in a followup or just create an issue to do it later for someone else.


if hasDotServiceName {
if hasServiceName && serviceDotNameValue != "" {
labels[serviceNameIndex].Value = serviceDotNameValue
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this? I've decided service.name value to replace service_name if both exists.

Copy link
Contributor

@simonswine simonswine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. I have a few more questions before we merge this

@@ -900,6 +897,64 @@ func extractSampleSeries(
return result, bytesRelabelDropped, profilesRelabelDropped
}

func processServiceNameLabels(labels []*typesv1.LabelPair) []*typesv1.LabelPair {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func processServiceNameLabels(labels []*typesv1.LabelPair) []*typesv1.LabelPair {
func processServiceNameLabels(labels []*typesv1.LabelPair) []*typesv1.LabelPair

I found the implementation quite hard to follow and also the intend of the change is not necessarily covered in the PR description (or at least as far as I understood it 🙂 ).

Generally a little later in the code

if err = validation.ValidateProfile(d.limits, tenantID, p.Profile, decompressedSize, series.Labels, now); err != nil {
, we would validate the labels and ensure:

  • Only valid label names are allowed (no dots, only alpha numeric + _)
  • A label can only exist once

So we are trying to adapt a received profile to later be successfully validated (and not thrown away).

So some clients are sending service.name labels, and also attaches them twice.
Both are invalid per API definition.

I definitely would worry that this causes to hide errors in clients like our SKDs/Alloys, that later will surface somewhere else (maybe worse). I think we would be better to fix those bug at source, esp. if we can change the clients.

I think if we do something in that space it should be more general (but I don't suggest we do that):

  • On duplicate label names we take the last value in the slice.
  • We convert all label names . to _ and if there is a conflict with another label name we resolve it by appending _x to the conflicting label name

What I think we should do is: If there is a service.name and no service_name we change the name to be service_name. Everything else is too invasive.

In the midterm, I also think we would like to work on at least supporting utf8 label names: https://github.com/grafana/pyroscope-squad/issues/343

Copy link
Contributor Author

@marcsanmi marcsanmi Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should control somehow that when converting labels from . to _ there aren't any duplicates. Otherwise, people using otel instrumentation sending service.name as per the semantic conventions, will hit this; we're adding service_name in our otlp ingest handler here.

So, not sure whether this is going to happen, but worth checking

What I think we should do is: If there is a service.name and no service_name we change the name to be service_name. Everything else is too invasive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonswine quick update on this,

I updated it to support "one level" of sanitization-induced duplicates by adding a "_dup" suffix to the sanitized name. For example, if both "service.name" and "service_name" exist, "service.name" will be sanitized to "service_name_dup"; more than the same 2 duplicates exist because of sanitization, we still fail.

I've also added the support to add service_name when only service.name is present.

LMKWYT.

Alloy's related PR

@marcsanmi marcsanmi requested a review from simonswine March 6, 2025 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants