Seeing these vulnerability issues on Dogfood every night #25090

Open
getvictor opened this issue Jan 2, 2025 · 5 comments
Assignees
Labels
~backend: Backend-related issue.
bug: Something isn't working as documented
~dogfood: Issue resulted from Fleet's product dogfooding.
#g-software: Software product group
:release: Ready to write code. Scheduled in a release. See "Making changes" in handbook.
~released bug: This bug was found in a stable release.

Comments

@getvictor
Member

getvictor commented Jan 2, 2025

image.png

@lucasmrod: It does seem benign, so we should log the error at debug level and continue the for loop.

if err := fsClient.Delete(d); err != nil {

Repro/QA

  1. Enroll two hosts with different builds of the same version of Windows.
  2. Run the vulnerabilities cron to pull the appropriate MSRC file.
  3. Rename the MSRC file to match yesterday's date.
  4. Run the vulnerabilities cron again.

Repro will show the above error. Fixed version won't.

@getvictor getvictor added #g-software Software product group :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. bug Something isn't working as documented ~dogfood Issue resulted from Fleet's product dogfooding. ~released bug This bug was found in a stable release. labels Jan 2, 2025
@iansltx iansltx added the ~backend Backend-related issue. label Jan 2, 2025
@iansltx iansltx added this to the 4.63.0-tentative milestone Jan 2, 2025
@mostlikelee mostlikelee removed this from the 4.63.0-tentative milestone Jan 2, 2025
@iansltx
Member

iansltx commented Jan 15, 2025

Digging deeper on this: the deletion should only be picking up files that exist, so if we're attempting to delete files that we can't find at deletion time, something else may be at play. Still troubleshooting this.

@mostlikelee mostlikelee added this to the 4.64.0-tentative milestone Jan 15, 2025
@iansltx iansltx modified the milestones: 4.64.0, 4.65.0-tentative Feb 4, 2025
@mostlikelee mostlikelee removed this from the 4.65.0-tentative milestone Feb 5, 2025
@lukeheath lukeheath added :product Product Design department (shows up on 🦢 Drafting board) and removed :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. labels Feb 7, 2025
@mostlikelee mostlikelee added :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. and removed :product Product Design department (shows up on 🦢 Drafting board) labels Feb 20, 2025
@mostlikelee mostlikelee assigned iansltx and unassigned ksykulev Feb 26, 2025
iansltx added a commit that referenced this issue Mar 11, 2025
For #25090.

Not 100% sure why we're seeing this issue but this will drop the error severity, and even if a delete fails and leaves a file we'll pick the correct (latest) MSRC file for each OS anyway, so this is low-risk.
@iansltx
Member

iansltx commented Mar 11, 2025

@rfairburn Why might these files be getting deleted between the time we calculate the delta of files to download/delete and the time we actually delete them? We're consistently seeing this once per day, with one file failing to delete in one hour (~1:22a UTC) and one failing to delete in the next hour (~2:22a UTC). Given that we run the vulns cron hourly, this seems odd. Before downgrading the error (see the associated PR), I want to know why we're seeing this.

@rfairburn
Contributor

Does Dogfood already have the fix that prevents alerts from repeating every time a cron runs until the service is restarted? I'm pretty sure that made it into the RC, but not sure if that version of the RC has been applied to Dogfood yet.

This could have been a one-off thing for any number of reasons (we don't have persistent or shared storage at all for example as containers are intended to be stateless as much as possible), but would alert every cron interval with the same error if the alerting fix has not been deployed yet.

@iansltx
Member

iansltx commented Mar 11, 2025

The info above was from CloudWatch Logs, not alerts, and it only happens 2x per day (but happens 2x every day), so I don't think it's a one-off, nor is it related to the repeating alerts thing...I think?

@iansltx
Member

iansltx commented Mar 12, 2025

@rfairburn Your theory that this was due to multiple matches to the same MSRC file to delete was a sound one. Different builds of the same version of Windows are the culprit here (which is why you didn't see this in every environment). The included PR fixes that issue; thanks for the assist here!

iansltx added a commit that referenced this issue Mar 12, 2025
…ilds of the same version of Windows (#27060)

For #25090.

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

<!-- Note that API documentation changes are now addressed by the
product design team. -->

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/Committing-Changes.md#changes-files)
for more information.
- [x] Input data is properly validated, `SELECT *` is avoided, SQL
injection is prevented (using placeholders for values in statements)
- [x] Added/updated automated tests
- [x] A detailed QA plan exists on the associated ticket (if it isn't
there, work with the product group's QA engineer to add it)
- [x] Manual QA for all new/changed functionality
6 participants