Skip to content

Commit ee8c0d8

Browse files
authored
Write and use FETCH_HEAD (#7)
This PR makes a big change to how the RemoteGitIndex performs fetches (and clones) and subsequently reads blobs, to resolve issues discovered while doing obi1kenobi/cargo-semver-checks#506 Previously when performing a fetch, we also updated references so that HEAD pointed at the same commit as the remote HEAD. This works fine in local operation and in tests, but had a drawback, namely ``` Error: Failed to update references to their new position to match their remote locations Caused by: 0: The reflog could not be created or updated 1: reflog messages need a committer which isn't set ``` can occur in CI environments. My attempts to fix this...failed (configuring a committer name + email before editing references). Then I noticed frewsxcv/rust-crates-index#129 which....is just way better than that previous approach since we don't need to edit references any longer just to be able to use `repo.head_commit` for reading blobs, but rather just store the remote HEAD when opening/fetching and retrieving the tree to lookup blobs from that, sidestepping the whole issue with updating references. There is a big difference between that PR though, in that this PR also adds a function (`crate::utils::git::write_fetch_head`) to actually write a FETCH_HEAD _similarly_ to how cargo, via git or libgit2 writes FETCH_HEAD when performing a fetch on an index. This was done for 2 reasons: 1. Performing a fetch of an index via this crate will now update the index repository the same as cargo would, which makes this play nicer with cargo and thus the wider ecosystem 2. I had already done (a slightly worse) version of writing a FETCH_HEAD for [cargo-deny](https://github.com/EmbarkStudios/cargo-deny/blob/main/src/advisories/helpers/db.rs#L451-L457), because, AFAIK, retrieving the timestamp of the FETCH_HEAD file is by far the [most/only reliable](https://stackoverflow.com/questions/2993902/how-do-i-check-the-date-and-time-of-the-latest-git-pull-that-was-executed) way to determine when a fetch was last performed on a repository. The reason this was important for cargo deny is that it fetches advisory databases, but can decide _not_ to fetch them if the user doesn't want to perform network calls, however if they _never_ fetch from the remote advisory db they run the risk of mistakenly thinking everything is fine even if they have crate dependencies that have active advisories that were added/modified since the last time they fetched. So until `gix` adds support for writing FETCH_HEAD (which I think is planned, but couldn't find it), making this function public allows cargo-deny or others to also have a (simple, definitely not git/libgit2 compliant) way to write a FETCH_HEAD with gix.
1 parent 3a335ef commit ee8c0d8

File tree

5 files changed

+287
-159
lines changed

5 files changed

+287
-159
lines changed

CHANGELOG.md

+6
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
99

1010
<!-- next-header -->
1111
## [Unreleased] - ReleaseDate
12+
### Fixed
13+
- [PR#7](https://github.com/EmbarkStudios/tame-index/pull/7) fixed an issue where `RemoteGitIndex::fetch` could fail in environments where the git committer was not configured.
14+
15+
### Changed
16+
- [PR#7](https://github.com/EmbarkStudios/tame-index/pull/7) change how `RemoteGitIndex` looks up blobs. Previously fetching would actually update references, now however we write a `FETCH_HEAD` similarly to git/libgit2, and uses that (or other) reference to find the commit to use, rather than updating the HEAD to point to the same commit as the remote HEAD.
17+
1218
## [0.2.3] - 2023-07-26
1319
### Fixed
1420
- [PR#6](https://github.com/EmbarkStudios/tame-index/pull/6) fixed two bugs with git registries.

src/index/git_remote.rs

+118-124
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@ use std::sync::atomic::AtomicBool;
99
pub struct RemoteGitIndex {
1010
index: GitIndex,
1111
repo: gix::Repository,
12+
head_commit: gix::ObjectId,
1213
}
1314

1415
const DIR: gix::remote::Direction = gix::remote::Direction::Fetch;
@@ -93,28 +94,42 @@ impl RemoteGitIndex {
9394
})
9495
.or_else(|| gix::open_opts(&index.cache.path, open_with_complete_config).ok());
9596

96-
let repo = if let Some(repo) = repo {
97-
repo
97+
let res = if let Some(repo) = repo {
98+
(repo, None)
9899
} else {
99-
let (repo, _out) = gix::prepare_clone_bare(index.url.as_str(), &index.cache.path)
100+
let (repo, out) = gix::prepare_clone_bare(index.url.as_str(), &index.cache.path)
100101
.map_err(Box::new)?
101102
.with_remote_name("origin")?
102103
.configure_remote(|remote| {
103104
Ok(remote.with_refspecs(["+HEAD:refs/remotes/origin/HEAD"], DIR)?)
104105
})
105106
.fetch_only(progress, should_interrupt)?;
106-
repo
107+
108+
(repo, Some(out))
107109
};
108110

109-
Ok(repo)
111+
Ok(res)
110112
};
111113

112-
let mut repo = open_or_clone_repo()?;
114+
let (mut repo, fetch_outcome) = open_or_clone_repo()?;
115+
116+
if let Some(fetch_outcome) = fetch_outcome {
117+
crate::utils::git::write_fetch_head(
118+
&repo,
119+
&fetch_outcome,
120+
&repo.find_remote("origin").unwrap(),
121+
)?;
122+
}
123+
113124
repo.object_cache_size_if_unset(4 * 1024 * 1024);
114125

115-
Self::set_head(&mut index, &repo)?;
126+
let head_commit = Self::set_head(&mut index, &repo)?;
116127

117-
Ok(Self { repo, index })
128+
Ok(Self {
129+
repo,
130+
index,
131+
head_commit,
132+
})
118133
}
119134

120135
/// Gets the local index
@@ -139,15 +154,58 @@ impl RemoteGitIndex {
139154
/// Sets the head commit in the wrapped index so that cache entries can be
140155
/// properly filtered
141156
#[inline]
142-
fn set_head(index: &mut GitIndex, repo: &gix::Repository) -> Result<(), Error> {
143-
let head = repo.head_commit().ok().map(|head| {
144-
let gix::ObjectId::Sha1(sha1) = head.id;
145-
sha1
146-
});
157+
fn set_head(index: &mut GitIndex, repo: &gix::Repository) -> Result<gix::ObjectId, Error> {
158+
let find_remote_head = || -> Result<gix::ObjectId, GitError> {
159+
const CANDIDATE_REFS: &[&str] = &[
160+
"FETCH_HEAD", /* the location with the most-recent updates, as written by git2 */
161+
"origin/HEAD", /* typical refspecs update this symbolic ref to point to the actual remote ref with the fetched commit */
162+
"origin/master", /* for good measure, resolve this branch by hand in case origin/HEAD is broken */
163+
"HEAD",
164+
];
165+
let mut candidates: Vec<_> = CANDIDATE_REFS
166+
.iter()
167+
.enumerate()
168+
.filter_map(|(i, refname)| {
169+
let ref_id = repo
170+
.find_reference(*refname)
171+
.ok()?
172+
.into_fully_peeled_id()
173+
.ok()?;
174+
175+
let commit = ref_id.object().ok()?.try_into_commit().ok()?;
176+
let commit_time = commit.time().ok()?.seconds;
177+
178+
Some((i, commit.id, commit_time))
179+
})
180+
.collect();
181+
182+
// Sort from oldest to newest, the last one will be the best reference
183+
// we could reasonably locate, and since we are on second resolution,
184+
// prefer the ordering of candidates if times are equal.
185+
//
186+
// This allows FETCH_HEAD to be authoritative, unless one of the other
187+
// references is more up to date, which can occur in (at least) 2 scenarios:
188+
//
189+
// 1. The repo is a fresh clone by cargo either via git or libgit2,
190+
// neither of which write FETCH_HEAD during clone
191+
// 2. A fetch was performed by an external crate/program to cargo or
192+
// ourselves that didn't update FETCH_HEAD
193+
candidates.sort_by(|a, b| match a.2.cmp(&b.2) {
194+
std::cmp::Ordering::Equal => b.0.cmp(&a.0),
195+
o => o,
196+
});
197+
198+
// get the most recent commit, the one with most time passed since unix epoch.
199+
Ok(candidates
200+
.last()
201+
.ok_or_else(|| GitError::UnableToFindRemoteHead)?
202+
.1)
203+
};
147204

148-
index.set_head_commit(head);
205+
let gix::ObjectId::Sha1(sha1) = find_remote_head()?;
206+
index.set_head_commit(Some(sha1));
149207

150-
Ok(())
208+
Ok(gix::ObjectId::Sha1(sha1))
151209
}
152210

153211
/// Attempts to read the specified crate's index metadata
@@ -182,7 +240,12 @@ impl RemoteGitIndex {
182240
}
183241

184242
fn read_blob(&self, path: &str) -> Result<Option<gix::ObjectDetached>, GitError> {
185-
let tree = self.repo.head_commit()?.tree()?;
243+
let tree = self
244+
.repo
245+
.find_object(self.head_commit)
246+
.map_err(Box::new)?
247+
.try_into_commit()?
248+
.tree()?;
186249

187250
let mut buf = Vec::new();
188251
let Some(entry) = tree.lookup_entry_by_path(path, &mut buf).map_err(|err| GitError::BlobLookup(Box::new(err)))? else { return Ok(None) };
@@ -253,117 +316,46 @@ impl RemoteGitIndex {
253316
P: gix::Progress,
254317
P::SubProgress: 'static,
255318
{
256-
let mut perform_fetch = || -> Result<(), GitError> {
257-
let mut remote = self.repo.find_remote("origin").ok().unwrap_or_else(|| {
258-
self.repo
259-
.remote_at(self.index.url.as_str())
260-
.expect("owned URL is always valid")
261-
});
262-
263-
let remote_head = "refs/remotes/origin/HEAD";
264-
265-
remote
266-
.replace_refspecs(Some(format!("+HEAD:{remote_head}").as_str()), DIR)
267-
.expect("valid statically known refspec");
268-
269-
// Perform the actual fetch
270-
let fetch_response: gix::remote::fetch::Outcome = remote
271-
.connect(DIR)
272-
.map_err(Box::new)?
273-
.prepare_fetch(&mut progress, Default::default())
274-
.map_err(Box::new)?
275-
.receive(&mut progress, should_interrupt)?;
276-
277-
// Find the commit id of the remote's HEAD
278-
let remote_head_id = fetch_response.ref_map.mappings.iter().find_map(|mapping| {
279-
let gix::remote::fetch::Source::Ref(rref) = &mapping.remote else { return None; };
280-
281-
if mapping.local.as_deref()? != remote_head.as_bytes() {
282-
return None;
283-
}
284-
285-
let gix::protocol::handshake::Ref::Symbolic {
286-
full_ref_name,
287-
object,
288-
..
289-
} = rref else { return None; };
290-
291-
(full_ref_name == "HEAD").then_some(*object)
292-
}).ok_or(GitError::UnableToFindRemoteHead)?;
293-
294-
use gix::refs::{transaction as tx, Target};
295-
296-
// In all (hopefully?) cases HEAD is a symbolic reference to
297-
// refs/heads/<branch> which is a peeled commit id, if that's the case
298-
// we update it to the new commit id, otherwise we just set HEAD
299-
// directly
300-
use gix::head::Kind;
301-
let edit = match self.repo.head().map_err(Box::new)?.kind {
302-
Kind::Symbolic(sref) => {
303-
// Update our local HEAD to the remote HEAD
304-
if let Target::Symbolic(name) = sref.target {
305-
Some(tx::RefEdit {
306-
change: tx::Change::Update {
307-
log: tx::LogChange {
308-
mode: tx::RefLog::AndReference,
309-
force_create_reflog: false,
310-
message: "".into(),
311-
},
312-
expected: tx::PreviousValue::MustExist,
313-
new: gix::refs::Target::Peeled(remote_head_id),
314-
},
315-
name,
316-
deref: true,
317-
})
318-
} else {
319-
None
320-
}
321-
}
322-
Kind::Unborn(_) | Kind::Detached { .. } => None,
323-
};
319+
// We're updating the reflog which requires a committer be set, which might
320+
// not be the case, particular in a CI environment, but also would default
321+
// the the git config for the current directory/global, which on a normal
322+
// user machine would show the user was the one who updated the database which
323+
// is kind of misleading, so we just override the config for this operation
324+
325+
let mut config = self.repo.config_snapshot_mut();
326+
config
327+
.set_raw_value("committer", None, "name", "tame-index")
328+
.map_err(GitError::from)?;
329+
// Note we _have_ to set the email as well, but luckily gix does not actually
330+
// validate if it's a proper email or not :)
331+
config
332+
.set_raw_value("committer", None, "email", "")
333+
.map_err(GitError::from)?;
334+
335+
let repo = config
336+
.commit_auto_rollback()
337+
.map_err(|err| GitError::from(Box::new(err)))?;
338+
339+
let mut remote = repo.find_remote("origin").ok().unwrap_or_else(|| {
340+
repo.remote_at(self.index.url.as_str())
341+
.expect("owned URL is always valid")
342+
});
324343

325-
// We're updating the reflog which requires a committer be set, which might
326-
// not be the case, particular in a CI environment, but also would default
327-
// to the git config for the current directory/global, which on a normal
328-
// user machine would show the user was the one who updated the database which
329-
// is kind of misleading, so we just override the config for this operation
330-
{
331-
let repo = {
332-
let mut config = self.repo.config_snapshot_mut();
333-
config.set_raw_value("committer", None, "name", "tame-index")?;
334-
// Note we _have_ to set the email as well, but luckily gix does not actually
335-
// validate if it's a proper email or not :)
336-
config.set_raw_value("committer", None, "email", "")?;
337-
338-
config.commit_auto_rollback().map_err(Box::new)?
339-
};
340-
341-
repo.edit_reference(edit.unwrap_or_else(|| tx::RefEdit {
342-
change: tx::Change::Update {
343-
log: tx::LogChange {
344-
mode: tx::RefLog::AndReference,
345-
force_create_reflog: false,
346-
message: "".into(),
347-
},
348-
expected: tx::PreviousValue::Any,
349-
new: gix::refs::Target::Peeled(remote_head_id),
350-
},
351-
name: "HEAD".try_into().unwrap(),
352-
deref: true,
353-
}))?;
354-
}
344+
remote
345+
.replace_refspecs(Some("+HEAD:refs/remotes/origin/HEAD"), DIR)
346+
.expect("valid statically known refspec");
355347

356-
// Sanity check that the local HEAD points to the same commit
357-
// as the remote HEAD
358-
if remote_head_id != self.repo.head_commit()?.id {
359-
Err(GitError::UnableToUpdateHead)
360-
} else {
361-
Ok(())
362-
}
363-
};
348+
// Perform the actual fetch
349+
let outcome = remote
350+
.connect(DIR)
351+
.map_err(|err| GitError::from(Box::new(err)))?
352+
.prepare_fetch(&mut progress, Default::default())
353+
.map_err(|err| GitError::from(Box::new(err)))?
354+
.receive(&mut progress, should_interrupt)
355+
.map_err(GitError::from)?;
364356

365-
perform_fetch()?;
366-
Self::set_head(&mut self.index, &self.repo)?;
357+
crate::utils::git::write_fetch_head(&repo, &outcome, &remote)?;
358+
self.head_commit = Self::set_head(&mut self.index, &repo)?;
367359

368360
Ok(())
369361
}
@@ -394,6 +386,8 @@ pub enum GitError {
394386
#[error(transparent)]
395387
Commit(#[from] gix::object::commit::Error),
396388
#[error(transparent)]
389+
InvalidObject(#[from] gix::object::try_into::Error),
390+
#[error(transparent)]
397391
ReferenceLookup(#[from] Box<gix::reference::find::existing::Error>),
398392
#[error(transparent)]
399393
BlobLookup(#[from] Box<gix::odb::find::existing::Error<gix::odb::store::find::Error>>),

src/utils.rs

+3
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,9 @@
33
44
use crate::{Error, InvalidUrl, InvalidUrlError, PathBuf};
55

6+
#[cfg(feature = "git")]
7+
pub mod git;
8+
69
/// Returns the storage directory (in utf-8) used by Cargo, often knowns as
710
/// `.cargo` or `CARGO_HOME`
811
#[inline]

0 commit comments

Comments
 (0)