-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Yarn 3+] fallback to inspecting package.json if no git remote origin defined #855
Comments
See also this PR to update the docs with a known pitfall + workaround, until the above is implemented: https://github.com/hermetoproject/cachi2/pull/856/files |
This behavior is consistent across all package managers, since there are different needs for the metadata that is tied to a git repo (such as tags). The suggestion I brought up was that we changed this fundamental proposition, and allowed users to prefetch content from a folder that is not a Git repository. This would result in less accurate data in the SBOM (we wouldn't know the VCS url for the repository packages for instance), but that is probably a not a big issue for local testing, and the fallback behavior is probably better than the strict failure Cachi2 produces today. |
I disagree, the suggested fallback is only a ticking time bomb IMO leading to more errors in the future when we actually try to query metadata which would simply not be present, IOW we'd have to make sure on the global level that we first run a |
I think we should strive to make Cachi2 work with the less amount of hassle from the user. I don't think the argument of less quality SBOM is strong here. Cachi2 should simply work with the data it can gather. If it is a git repo, it potentially will have more relevant data, otherwise, it has less relevant data. It is quite different from having arbitrary code execution and getting unknown components into the output folder. Local use cases are also valid, and one could be using another VCS than git. This is just one of many things I'd like to change regarding UX, but in my mind, the overall goal is to make Cachi2 easy and intuitive to use. I should be able to download Cachi2 and process my local project without having to go over documentation. We should strive to give a hard failure only on things that are really causing security issues. |
If the user is able to do
This is an unachievable goal the more features you pack. Sooner or later a user will be required to go over the documenation and that is expected and encouraged, hence we need to invest much more time and effort into improving our docs such that many pitfalls/gotchas would be covered and most of the issues would ideally be resolved via "self-service". Other than that though, I think this statement is very bold and dangerous at the same time, because you very much need to strike the balance of putting in code just for user convenience for the sake of UX and not turn it into a living maintenance nightmare at the same time. Docs is great and I don't think engineers are afraid of it, they just get frustrated with their lack of quality OR its complete lack of, hence my point.
Intuitive to use doesn't necessarily have to mean going against our mission of providing highly accurate SBOMs which is our top selling point ATM and the user needs to understand that - again, if they don't -> we need to improve the docs to make it clear.
Security issues are not the only types of issues that may impede with our ability to process an input, e.g. repository misconfigurations are typically not security issues, but it definitely does get in our way of processing inputs, requiring a hard failure in order for the user to go and address it before re-running, we simply cannot have a workaround for every combination of output that we could not use/process in a straightforward manner, that itself is a practice which may result into injecting more bugs and security holes on our side, in the worst case scenario leading to compromising the whole Secure Software Supply Chain pipeline, NB we are at the very beginning of it.
Yep, this is a valid point. However, the overall sentiment of the comment being quoted and the workaround discussed in this issue don't IMHO go in the same direction as this very statement alone. I agree that there may be other types of VCS (although not so common nowadays I think) which means that we can simply add a tracking issue to explore other VCS types other than git, keep it at low priority and when the need comes, we can start investigating how we can go about it, because a lot of our current functionality would be impacted by it. |
This is the hassle. I think the project should just work, not make the user bang his head, go over to docs and try to figure out why it is giving a InvalidGitRepositoryError. You may argue it is not too much of a hassle, but I think these small things pile up to make the project a real pain to use.
I don't think dropping the requirement of a folder being a Git repo goes against it. Cachi2 is being as accurate as it can with the data it has access to. I don't want to drag this into a huge discussion. I think the situation here speaks for itself: users are creating workarounds, potentially feeding fake data to generate a Git repo just to make the project work, and I think that is a clear sign of bad UX. Let's hear what other folks have to say, and then we can decide to close or go ahead with this. |
The other option (as suggested in #856 (comment) ) would be to have in bold print somewhere in the statement of what Cachi2 is and what it's for... that you need to refer to a git repo because the whole reason it exists to to pull information from a cached copy of sources for a given repo and commit SHA. If that's the case then the "initial setup / how to use" readme would just explain that you need to either:
or
|
NB the following
would still not be enough unless |
@nickboldt if you as our end user don't mind reading docs and would have appreciated ^this particular bit of info in the docs that would have prevented you from opening the issue in the first place, then I think we have the answer we needed :) . |
If I understand correctly, the project that triggered this issue is not a Git repository at all, it is generated as part of a build process. |
Correct, in our case the build consists of the main application for which we're currently building a cache for and then a number of additional packages that are artifacts that could be loaded at runtime. In this case the artifacts are currently part of the source tree here and each of these wrappers contain a command like this which creates a yarn project (from there our tooling usually does a |
@nickboldt I noticed you linked a private Slack thread in your issue description - this is a public OSS project so we can't reference private/unresolvable resources, please extract the relevant bits of information and put it in the description :), thanks. |
@gashcrumb @nickboldt I'm sorry for my ignorance, I'm not well versed at the Yarn ecosystem at all, so I'll need more help understanding the problem here because what I'm seeing based on the links provided, these plugins are still part of the main git tree, so what is the motivation to prefetch for these from extracted tarballs (i.e. git metadata missing) instead of doing prefetch on over the main repo, but specifying the subpath to these plugins individually which would result in the prefetch working, wouldn't it? |
In the case of Yarn 3+, cachi2 uses
git remote show origin
to define the VCS url of the main package (the one that is being prefetched) and all the other packages that might exist in the repository.This means that you can only run cachi2 locally for a yarn3+ resolution if you're in a git repo.
Without a remote repo configured, you might see:
The workaround (if you're not already in a git repo) is to:
But as suggested in this thread it might be better if a fallback behaviour was implemented where the package.json could be introspected instead, if no git origin was defined.
The text was updated successfully, but these errors were encountered: