Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Docker tag component detection and comparison #11679

Merged

Conversation

robaiken
Copy link
Contributor

@robaiken robaiken commented Feb 25, 2025

What are you trying to accomplish?

I'm improving how Dependabot selects Docker images for updates by enhancing the tag comparison logic. Currently, we sometimes incorrectly select images with different component patterns - for example, suggesting an update from image:3.3-apache to image:4-apache-alpine when we should only be matching images with identical components (just "apache" in this case).

This refactoring makes tag comparison more intelligent by analyzing patterns in image tags to identify their key components (like OS, packages, etc.) and using those patterns as a basis to filter out images that don't match the original structure. By dynamically identifying these components across all available tags, we can ensure we only suggest updates that maintain the same component structure as the original dependency.

Related issues: #8648, #390

Anything you want to highlight for special attention from reviewers?

I've taken a component-based approach to image selection rather than the previous method of stripping characters. The key strategy involves:

  1. Analyzing all available tags to identify recurring components (like alpine, apache, amd64)
  2. Extracting these components from both the original dependency and candidate updates
  3. Ensuring exact component matching between tags (so apache doesn't match with apache-alpine)

This approach makes more semantic sense for Docker tags and better respects their internal structure. I've also split the logic into smaller, focused methods for better maintainability. The refactoring preserves all existing functionality while adding more precise tag filtering.

How will you know you've accomplished your goal?

We'll see more accurate image selection during updates, particularly:

When updating from owasp/modsecurity-crs:3.3-apache-202209221209 we won't incorrectly select owasp/modsecurity-crs:4-apache-alpine-202502070602 (adding alpine)

Users should experience more predictable and semantically correct Docker dependency updates.

Checklist

  • I have run the complete test suite to ensure all tests and linters pass.
  • I have thoroughly tested my code changes to ensure they work as expected, including adding additional tests for new functionality.
  • I have written clear and descriptive commit messages.
  • I have provided a detailed description of the changes in the pull request, including the problem it addresses, how it fixes the problem, and any relevant details about the implementation.
  • I have ensured that the code is well-documented and easy to understand.

@robaiken robaiken requested a review from a team as a code owner February 25, 2025 18:17
@github-actions github-actions bot added the L: docker Docker containers label Feb 25, 2025
components = []

common_components.each do |component|
components << component if tag_name.match?(/\b#{Regexp.escape(component)}\b/)
Copy link
Contributor

@kbukum1 kbukum1 Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that can be changed into?

components << component if tag_name.include?(component)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to use word boundaries to ensure we only match complete component names, not partial substrings. If we have a component like "ruby" and we use include?, it will incorrectly match in tags like "rubygem" as substrings. I don't know if there is going to be a real world scenario where this would happen, but it's a possibility and I want to play it safe to avoid potentially creating issues.

kbukum1
kbukum1 previously approved these changes Feb 26, 2025
Copy link
Contributor

@kbukum1 kbukum1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robaiken, I’ve added a few suggestions, but overall, this looks good to me! 👍 If you’re considering a feature flag, it could help with a smoother rollout in case any issues arise.

Copy link
Contributor

@markhallen markhallen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Nice idea!

@robaiken robaiken force-pushed the robaiken/improve-Docker-tag-component-detection-and-comparison branch from e54450d to 4692acc Compare February 26, 2025 14:17
@robaiken robaiken merged commit 7323e6c into main Feb 26, 2025
60 checks passed
@robaiken robaiken deleted the robaiken/improve-Docker-tag-component-detection-and-comparison branch February 26, 2025 14:31
dmitris pushed a commit to dmitris/dependabot-core that referenced this pull request Feb 26, 2025
* Improve Docker tag component detection and comparison

* lint

* Adding `:docker_tag_component_comparison` feature flag

* lint

* fixing rubocops catch 22 between explicit and implicit subject

* refactoring code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
L: docker Docker containers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants