Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check if runner needs to be installed #84

Merged
merged 3 commits into from
Apr 19, 2023

Conversation

maigl
Copy link
Contributor

@maigl maigl commented Apr 12, 2023

I'm trying your new garm-provider-openstack. So far it's looking good. But we don't have some long time numbers yet.

I wanted to discuss this little idea that could save us some time..

For green IT reasons we've already pre-installed the github runner in our runner images. Hence, no need to install it during boot/cloud-init. This PR shows a quick check if the runner dir is already there. Probably this check should be a little bit more robust.. What do you think?

This saves us thousands of downloads and installs ..

I also added the curl -k as we currently do need this in our test pipeline and I believe in the bash provider you also had the -k added to the curl requests.

Copy link
Member

@gabriel-samfira gabriel-samfira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The insecure flag is only needed if you don't specify a CA bundle in the config. I know it's a bit counter intuitive, but you can add a CA bundle here:

https://github.com/cloudbase/garm/blob/main/testdata/config.toml#L237

The bundle added here will be installed on the runners by cloud-init/cloudbase-init. You can add your own self-signed root certificate as well as any other root certificate you wish.

@gabriel-samfira
Copy link
Member

My concern is that this might lead to outdated runners if the images are not regularly maintained and updated. How about this:

  • Place the downloaded archive on the runner
  • Compare the sha256 sum of the existing archive with the sha256 sum that comes in the tools object
  • If the hashes match, we just unarchive and install as usual
  • if the hashes differ, we download the new runner

Runner versions only change when there is a new release, so it shouldn't be that often.

How does that sound?

@maigl
Copy link
Contributor Author

maigl commented Apr 12, 2023

okay .. this would spare us the download .. but not the unzip and install. Fair enough.

How oft does this version change? Or how would I know that I need to put a newer runner.tar into my runner image?

@MoritzKeppler
Copy link

As long as we don't disable autoupdate, a runner with an outdated image will automatically update, or? see https://github.blog/changelog/2022-02-01-github-actions-self-hosted-runners-can-now-disable-automatic-updates/

We have cases where it's the other way round: the runner already uses a newer image than the GH instance provides. Maybe not officially supported, but quite helpful to get access to new features...

@gabriel-samfira
Copy link
Member

okay .. this would spare us the download .. but not the unzip and install. Fair enough.

The unzip should be quick, and the dependencies for that version can be pre-installed which should make the installation process quick as well. Cloud-init should update any pre-installed packages by default.

How oft does this version change?

It depends. Roughly once a month if no security issues show up. We can see the release frequency on their releases page:

https://github.com/actions/runner/releases

Or how would I know that I need to put a newer runner.tar into my runner image?

We get the file name (which includes the version) in the tools object. This gets downloaded to /home/{{ .RunnerUsername }}/{{ .FileName }}. We can check for the existence of that file and match the sha256 sum. if either of those checks fail, we download the file again. Something like (pseudo code):

function maybeDownloadRunner() {
    if [ ! -f $RUNNER ] || [ $RUNNED_SHA != $ACTUAL_SHA ];then
        downloadRunner()
    fi
}

Or we could just rely on the existence of the file, given that it includes the minor version.

As long as we don't disable autoupdate, a runner with an outdated image will automatically update, or?

Yes, that should happen. We don't disable automatic updates. Although, I am not sure it does that on startup before it becomes available in github.

In general I am inclined to go with the pre-downloaded archive at that pre-defined location. We could also check for an extra file like actions-runner-override.tgz and prefer that if you want to force it to use the binary you put there.

What I would like to have is:

  • Be safe by default. Use the latest official build of the runner.
  • Automatic updates should be enabled. This will keep the existing idle runners updated.
  • Allow caching of the archive to save on bandwith.
  • If we allow overriding the runner version, make sure that it is an explicit decision made by the operator, after they are made fully aware of any risks inherent in running potentially outdated or pre-release versions of the runner.

As a side note, I will be adding the ability to customize these templates in the medium term. So users will be able to add their own userdata templates for each supported OS (Linux and Windows).

@gabriel-samfira
Copy link
Member

I was thinking about this and I think there is a clean way to implement this. We can use a local cache for packages we care about in a way that allows flexibility but that also does the right thing if we don't have a local copy of the version we care about. And also skip unarchiving every time.

  • Create a cache with the desired unarchived action runner version in /opt/cache/[APP_NAME]/[APP_VERSION]. For example: /opt/cache/actions-runner/2.299.1.
  • Fetch the app version number we care about from the tools object. The tools are of the form actions-runner-PLATFORM-CPU_ARCHITECTURE-VERSION.(tar.gz|zip). There is a sample tools object in the docs folder.
  • Check we have an equivalent, unarchived version in /opt/cache
  • If a cache exists, we:
    • Remove /home/runner/actions-runner (if it exists)
    • Create a symlink or copy: ln -s /opt/cache/actions-runner/2.299.1 /home/runner/actions-runner
  • If a cache does not exist, we run the usual installation process that implies downloading and unarchiving.

We could also add a file with an override version number somewhere on disk and use that version instead of the one that comes from the tools object. The format of that override file needs discussion though, because simply adding a version may not be enough. In some cases, to download tools we need a download token. So we either have to always cache the version in the override file, or we need to point the userdata script to a 3rd party URL from where to download it.

But for now, I think a local cache should do. What do you think?

@@ -36,11 +36,11 @@ if [ -z "$METADATA_URL" ];then
echo "no token is available and METADATA_URL is not set"
exit 1
fi
GITHUB_TOKEN=$(curl --fail -s -X GET -H 'Accept: application/json' -H "Authorization: Bearer ${BEARER_TOKEN}" "${METADATA_URL}/runner-registration-token/")
GITHUB_TOKEN=$(curl -k --fail -s -X GET -H 'Accept: application/json' -H "Authorization: Bearer ${BEARER_TOKEN}" "${METADATA_URL}/runner-registration-token/")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the insecure flag still needed if you add your root CA certificate in https://github.com/cloudbase/garm/blob/main/testdata/config.toml#L237 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the -k flag.

But, this problem cannot be fixed by configuring garm, as this request happens inside the runner, so the runner's curl needs to trust garms (maybe self-signed) certificate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config section I linked to allows you to specify a list of concatenated root certificates. You can add any self signed certificate there as well. That blob gets pushed to the runner in the bootstrap params and should get installed by cloud-init as system wide trusted root certificates before the runner starts up or any curl calls are made to garm or any other URL. It should remove the need for the -k flag.

@@ -72,22 +72,26 @@ if [ ! -z $GH_RUNNER_GROUP ];then
RUNNER_GROUP_OPT="--runnergroup=$GH_RUNNER_GROUP"
fi

# test if runner is already installed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try something like this:

# This will echo the version number in the filename. Given a file name like: actions-runner-osx-x64-2.299.1.tar.gz
# this will output: 2.299.1
function getRunnerVersion() {
    FILENAME="{{ .FileName }}"
    [[ $FILENAME =~ ([0-9]+\.[0-9]+\.[0-9+]) ]]
    echo $BASH_REMATCH
}

function getCachedToolsPath() {
    VERSION=$(getRunnerVersion)
    if [ -z "$VERSION" ]; then
        return 0
    fi

    CACHED_RUNNER="/opt/cache/actions-runner/$VERSION"
    if [ -d "$CACHED_RUNNER" ];then
        echo "$CACHED_RUNNER"
        return 0
    fi
    return 0
}

function downloadAndExtractRunner() {
    if [ ! -z "{{ .TempDownloadToken }}" ]; then
	TEMP_TOKEN="Authorization: Bearer {{ .TempDownloadToken }}"
    fi
    curl -L -H "${TEMP_TOKEN}" -o "/home/{{ .RunnerUsername }}/{{ .FileName }}" "{{ .DownloadURL }}" || fail "failed to download tools"
    mkdir -p /home/runner/actions-runner || fail "failed to create actions-runner folder"
    sendStatus "extracting runner"
    tar xf "/home/{{ .RunnerUsername }}/{{ .FileName }}" -C /home/{{ .RunnerUsername }}/actions-runner/ || fail "failed to extract runner"
    chown {{ .RunnerUsername }}:{{ .RunnerGroup }} -R /home/{{ .RunnerUsername }}/actions-runner/ || fail "failed to change owner"
}

CACHED_RUNNER=$(getCachedToolsPath)
if [ -z "$CACHED_RUNNER" ];then
    downloadAndExtractRunner
else
    sudo cp -a "$CACHED_RUNNER"  "/home/{{ .RunnerUsername }}/actions-runner"
    chown {{ .RunnerUsername }}:{{ .RunnerGroup }} -R "/home/{{ .RunnerUsername }}/actions-runner" || fail "failed to change owner"
fi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I have not tested the above 😄 , but something like that should work. You will need to rebuild your image with that cache in place. Let me know if this would be acceptable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm (without having tested it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts on this topic:
For us it's crucial to make the runner boot/setup phase as lean as possible. The simple reason is that we have many many many runs a day. This means we want to avoid any boot-time updates (apt update ..) , downloads and install steps and so on. Our goal is to provide ready-to-use runner images.
Also, as Moritz pointed out, we would like to be able to (pre) install a specific runner version, sometimes even newer versions than our github instance would tell us.

So, we would like to add a override cached_runner, maybe at
/opt/cache/actions-runner/override ?!

But we also need to be able to disable any updates and installs in
cloudconfig/cloudconfig.go. E.g.

func NewDefaultCloudInitConfig() *CloudInit {
        return &CloudInit{
                PackageUpgrade: true,
                Packages: []string{"tar","curl"},

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we would like to add a override cached_runner, maybe at
/opt/cache/actions-runner/override

Let's go with /opt/cache/actions-runner/latest

If it exists we just blindly trust that it is indeed the latest version and prefer it over whichever version is supplied by the instance.

We can also add a config flag to disable system updates, but that should come with a big warning for users to make sure they periodically generate new, updated images

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make the disable_updates flag a config value, as there is little chance to want to enable updates for one pool but not another.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you like to configure the disable_updates flag? I think this is business of the provider not garm, but all code that builds and serializes the userdata is vendored library code from garm.
That means we would either need to add more params to

func GetCloudConfig(bootstrapParams params.BootstrapInstance, tools github.RunnerApplicationDownload, runnerName string) (string, error) {

or extend BootstrapInstance by some new Userdata options?
What do you recommend?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a provider scope change. The new providers reuse code from garm, but they don't have to. There is no reason why we can't have an alternate GetCloudConfig() that composes the cloud config with different values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added cloudbase/garm-provider-openstack#2 .. but it will only really work together if this is merged and the vendored garm is updated in garm-provider-openstack.

Copy link
Member

@gabriel-samfira gabriel-samfira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! thanks!

@gabriel-samfira gabriel-samfira merged commit 690db91 into cloudbase:main Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants