Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search for the CodeQL bundle in multiple places. #123

Merged
merged 10 commits into from Jul 30, 2020
Merged

Conversation

chrisgavin
Copy link
Contributor

The aim of this change is to resolve a number of issues with where the CodeQL bundle is fetched from. In many cases it is either not possible or not desired to fetch the CodeQL bundle from the canonical location at https://github.com/github/codeql-action/releases/download/codeql-bundle-20200630/codeql-bundle.tar.gz. This might be because you are using the Action on a runner that does not have internet access. It may also be because you've copied the CodeQL Action repository to your own organization so that you can control what code is running on your runner machines.

The general idea of this change was inspired by the way the Actions runner attempts to find an Action. We can apply a similar principle to finding the CodeQL bundle. This process works by looking for the CodeQL bundle in a place that's as local as possible and then working outwards. It checks if the bundle exists:

  1. On $our_github_instance at the repository name for the current Action.
  2. On $our_github_instance at github/codeql-action.
  3. On GitHub.com at github/codeql-action.

It downloads the bundle from the first place it finds it. For symmetry I initially considered also checking GitHub.com at the repository name for the current Action, however I decided against that due to the accidental security risk someone might expose themselves to if they incorrectly mirror the Action without copying the bundle.

To check whether the bundle exists in these locations we use the GitHub API. This is required because on GHES you must authenticate to access the API, even for accessing public projects. The one exception to this is when we're finally getting the bundle from the canonical location. In this case we do not use the API. This is because if we access the API from GHES we cannot authenticate with our token, meaning we have to make the request anonymously. You could easily run out of API rate limit making these anonymous requests. Because we know for sure the bundle exists in the canonical location we can safely download it directly rather than using the API.

This change currently relies on two hacks to work around limitations in other parts of Actions. I've opened pull requests to solve these limitations but it's unclear when they might be reviewed. The two pull requests are:

Sorry for the excessive description. This change ended up being a bit more complicated than I expected and I wanted to make it clear why I made the design decision I have.

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Confirm the readme has been updated if necessary.

@chrisgavin chrisgavin force-pushed the search-for-bundle branch 3 times, most recently from e6f00e5 to 5254a0d Compare July 24, 2020 13:12
src/codeql.ts Outdated Show resolved Hide resolved
for (let downloadSource of uniqueDownloadSources) {
let [apiURL, repository] = downloadSource;
// If we've reached the final case, short-circuit the API check since we know the bundle exists and is public.
if (apiURL === GITHUB_DOTCOM_API_URL && repository === CODEQL_DEFAULT_ACTION_REPOSITORY) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue you could simplify the logic by removing this default case from the list of potential download sources, and moving this return statement to after the loop.
This would have a number of knock-on advantages:

  • We get rid of the worrying "could not find an accessible CodeQL bundle" error case too which is actually dead code at the moment.
  • It would also let you simplify the list of potential sources to just the repository name, as the instance URL is no longer needed there as it's always the current instance.
  • We can avoid adding octokit as a explicit dependency and thus keep the size of this PR down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code can definitely be simplified, but I don't think we can easily remove the default case from the list, as we rely on filtering out the non-unique sources to handle the case where INSTANCE_API_URL is the same as GITHUB_DOTCOM_API_URL so we can break out early.

I've restructured the code slightly to achieve some of the advantages you've mentioned while preserving the deduplication , but let me know if I've missed a neater way of doing it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it up to you to decide if it's simpler but I think this function is equivalent to the following which doesn't use a look at all. You'd better check my working though as I found it hard getting my head around the logic to work out if the two implementations are indeed equivalent or not.

async function checkRepoForCodeQLBundle(instanceApiUrl: string, repository: string): Promise<string | undefined> {
  let [repositoryOwner, repositoryName] = repository.split("/");
  try {
    const release = await api.getApiClient().repos.getReleaseByTag({
      owner: repositoryOwner,
      repo: repositoryName,
      tag: CODEQL_BUNDLE_VERSION
    });
    for (let asset of release.data.assets) {
      if (asset.name === CODEQL_BUNDLE_NAME) {
        core.info(`Found CodeQL bundle in ${repository} on ${instanceApiUrl} with URL ${asset.url}.`);
        return asset.url;
      }
    }
  } catch (e) {
    core.info(`Looked for CodeQL bundle in ${repository} on ${instanceApiUrl} but got error ${e}.`);
  }
  return undefined;
}

async function getCodeQLBundleDownloadURL(): Promise<string> {
  const instanceApiUrl = getInstanceAPIURL();
  const codeQLActionRepository = getCodeQLActionRepository();
  if (codeQLActionRepository !== CODEQL_DEFAULT_ACTION_REPOSITORY) {
    const assetUrl = await checkRepoForCodeQLBundle(instanceApiUrl, codeQLActionRepository);
    if (assetUrl !== undefined) {
      return assetUrl;
    }
  }
  if (instanceApiUrl !== GITHUB_DOTCOM_API_URL) {
    const assetUrl = await checkRepoForCodeQLBundle(instanceApiUrl, CODEQL_DEFAULT_ACTION_REPOSITORY);
    if (assetUrl !== undefined) {
      return assetUrl;
    }
  }
  return `https://github.com/${CODEQL_DEFAULT_ACTION_REPOSITORY}/releases/download/${CODEQL_BUNDLE_VERSION}/${CODEQL_BUNDLE_NAME}`;
}

src/codeql.ts Outdated Show resolved Hide resolved
src/codeql.ts Show resolved Hide resolved
src/codeql.ts Outdated Show resolved Hide resolved
src/codeql.ts Outdated Show resolved Hide resolved
@robertbrignull
Copy link
Contributor

For symmetry I initially considered also checking GitHub.com at the repository name for the current Action, however I decided against that due to the accidental security risk someone might expose themselves to if they incorrectly mirror the Action without copying the bundle.

I agree, this seem does seem like a security risk if they pick a non-default repository name. Also I think it's unlikely it would ever usefully trigger.

Sorry for the excessive description. This change ended up being a bit more complicated than I expected and I wanted to make it clear why I made the design decision I have.

Thank you for the description. Also for raising two improvements to the actions core code.

@chrisgavin chrisgavin marked this pull request as ready for review July 27, 2020 09:29
@@ -139,13 +142,32 @@ async function getCodeQLBundleDownloadURL(): Promise<string> {
core.info(`Looked for CodeQL bundle in ${downloadSource[1]} on ${downloadSource[0]} but got error ${e}.`);
}
}
throw new Error("Could not find an accessible CodeQL bundle.");
return `https://github.com/${CODEQL_DEFAULT_ACTION_REPOSITORY}/releases/download/${CODEQL_BUNDLE_VERSION}/${CODEQL_BUNDLE_NAME}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it's possible to run into an error on line 142 above and then still return the default URL for the bundle.

This seems dangerous to me. If a user specified a bundle that can't be found, should we end the action with an error so this can be fixed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, to me the alternative seems more dangerous. People expect to be able to fork the CodeQL Action and have it still work, but if we require that CodeQL exists on the forked repository that won't necessarily be the case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the implementation that's there now is equivalent to how it would work now. Currently the download URL is hardcoded to point to the default action, unless a fork explicitly changes that behaviour. So that makes me lean towards keeping it as it is and always checking the default action.

Note that if a user specifies to use a specific bundle by setting the tools input then all of this implementation is skipped and we just use the value they provided.

Copy link
Contributor

@robertbrignull robertbrignull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a couple more points for potential minor improvements but I'm going to tentatively approve this with the goal of unblocking it. This looks good to go to me assuming it's had enough manual testing.

const CODEQL_DEFAULT_ACTION_REPOSITORY = "github/codeql-action";

function getInstanceAPIURL(): string {
return process.env["GITHUB_API_URL"] || GITHUB_DOTCOM_API_URL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was that this environment variable should always be populated now, so the default is unnecessary. We could use util.getRequiredEnvParam to ensure it exists too. Do you agree?

for (let downloadSource of uniqueDownloadSources) {
let [apiURL, repository] = downloadSource;
// If we've reached the final case, short-circuit the API check since we know the bundle exists and is public.
if (apiURL === GITHUB_DOTCOM_API_URL && repository === CODEQL_DEFAULT_ACTION_REPOSITORY) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it up to you to decide if it's simpler but I think this function is equivalent to the following which doesn't use a look at all. You'd better check my working though as I found it hard getting my head around the logic to work out if the two implementations are indeed equivalent or not.

async function checkRepoForCodeQLBundle(instanceApiUrl: string, repository: string): Promise<string | undefined> {
  let [repositoryOwner, repositoryName] = repository.split("/");
  try {
    const release = await api.getApiClient().repos.getReleaseByTag({
      owner: repositoryOwner,
      repo: repositoryName,
      tag: CODEQL_BUNDLE_VERSION
    });
    for (let asset of release.data.assets) {
      if (asset.name === CODEQL_BUNDLE_NAME) {
        core.info(`Found CodeQL bundle in ${repository} on ${instanceApiUrl} with URL ${asset.url}.`);
        return asset.url;
      }
    }
  } catch (e) {
    core.info(`Looked for CodeQL bundle in ${repository} on ${instanceApiUrl} but got error ${e}.`);
  }
  return undefined;
}

async function getCodeQLBundleDownloadURL(): Promise<string> {
  const instanceApiUrl = getInstanceAPIURL();
  const codeQLActionRepository = getCodeQLActionRepository();
  if (codeQLActionRepository !== CODEQL_DEFAULT_ACTION_REPOSITORY) {
    const assetUrl = await checkRepoForCodeQLBundle(instanceApiUrl, codeQLActionRepository);
    if (assetUrl !== undefined) {
      return assetUrl;
    }
  }
  if (instanceApiUrl !== GITHUB_DOTCOM_API_URL) {
    const assetUrl = await checkRepoForCodeQLBundle(instanceApiUrl, CODEQL_DEFAULT_ACTION_REPOSITORY);
    if (assetUrl !== undefined) {
      return assetUrl;
    }
  }
  return `https://github.com/${CODEQL_DEFAULT_ACTION_REPOSITORY}/releases/download/${CODEQL_BUNDLE_VERSION}/${CODEQL_BUNDLE_NAME}`;
}

@@ -139,13 +142,32 @@ async function getCodeQLBundleDownloadURL(): Promise<string> {
core.info(`Looked for CodeQL bundle in ${downloadSource[1]} on ${downloadSource[0]} but got error ${e}.`);
}
}
throw new Error("Could not find an accessible CodeQL bundle.");
return `https://github.com/${CODEQL_DEFAULT_ACTION_REPOSITORY}/releases/download/${CODEQL_BUNDLE_VERSION}/${CODEQL_BUNDLE_NAME}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the implementation that's there now is equivalent to how it would work now. Currently the download URL is hardcoded to point to the default action, unless a fork explicitly changes that behaviour. So that makes me lean towards keeping it as it is and always checking the default action.

Note that if a user specifies to use a specific bundle by setting the tools input then all of this implementation is skipped and we just use the value they provided.

@chrisgavin chrisgavin merged commit e8896a9 into main Jul 30, 2020
@chrisgavin chrisgavin deleted the search-for-bundle branch July 30, 2020 09:56
@github-actions github-actions bot mentioned this pull request Aug 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants