ProjectConfig and Git Context Info

David Reed edited this page Dec 15, 2022 · 7 revisions
  • The ProjectConfig stores repo context info in an untyped dict property, _repo_info.
  • The property can be supplied to the constructor as a kwarg, where it's not validated.
  • The repo_info property returns the underlying dict if present, but otherwise attempts to lazy-load it.

Possible keys in this dict:

  • ci: Optional[str]
  • branch: Optional[str]
  • commit: Optional[str]
  • root: Optional[Path] (string holding a filesystem path)
  • url: Optional[AnyUrl]
  • name: Optional[str]
  • owner: Optional[str]
  • domain: Optional[str] - used only for GitHub Enterprise?
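
As a rough illustration of the shape described above, the keys can be written down as a TypedDict. This is only a sketch: RepoInfoDict and missing_keys are hypothetical names that do not exist in CumulusCI, the path and URL types are simplified to str, and the sample values are placeholders.

```python
from typing import Any, List, Mapping, Optional, TypedDict


class RepoInfoDict(TypedDict, total=False):
    """Hypothetical description of the untyped _repo_info dict.

    total=False because every key may be absent.
    """

    ci: Optional[str]
    branch: Optional[str]
    commit: Optional[str]
    root: Optional[str]  # string holding a filesystem path
    url: Optional[str]
    name: Optional[str]
    owner: Optional[str]
    domain: Optional[str]


def missing_keys(info: Mapping[str, Any]) -> List[str]:
    """List the known keys that a given repo_info dict does not contain."""
    return [key for key in RepoInfoDict.__annotations__ if key not in info]


# Placeholder values, for illustration only.
example: RepoInfoDict = {
    "root": "/tmp/example-repo",
    "url": "https://github.com/example-owner/example-repo",
    "name": "example-repo",
    "owner": "example-owner",
    "commit": "0123abc",
}
print(missing_keys(example))  # ['ci', 'branch', 'domain']
```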

The sequence of data sources for repo_info inside ProjectConfig runs like this:

  • If data was provided to the initializer, we use that, and no further action is taken.
  • If data was already loaded from a previous call to the repo_info property, no further action is taken.
  • If the environment variable CUMULUSCI_AUTO_DETECT is set to a falsy value, no inference takes place, and the dict may be empty of Git information, holding only ci: None.
  • If HEROKU_TEST_RUN_ID is present:
    • We set branch to HEROKU_TEST_RUN_BRANCH.
    • We set commit to HEROKU_TEST_RUN_COMMIT_VERSION.
    • We set ci to heroku.
    • We set root to /app.
  • If CUMULUSCI_REPO_BRANCH is set, use its value as branch.
  • If CUMULUSCI_REPO_COMMIT is set, use its value as commit.
  • If CUMULUSCI_REPO_ROOT is set, use its value as root.
  • If CUMULUSCI_REPO_URL is set, use its value as url, and also parse out owner and name.
  • If and only if ci is truthy, we validate that the keys branch, commit, name, owner, root, and url are present. Because ci is set only in the Heroku step above, this validation runs only under Heroku CI, and only when initializer overrides were not provided. This code path may never execute.
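
The sequence above can be sketched as a single function. This is a simplified model and not the actual CumulusCI code: load_repo_info, split_url, and FALSY_VALUES are hypothetical names, the set of strings treated as falsy is an assumption, the URL parsing assumes HTTPS-style URLs, and a key holding None is treated as missing. Caching of a previously loaded dict is left to the calling property.

```python
from typing import Any, Dict, Mapping, Optional, Tuple
from urllib.parse import urlparse

REQUIRED_CI_KEYS = ("branch", "commit", "name", "owner", "root", "url")
FALSY_VALUES = {"", "0", "false", "no", "off"}  # assumption, not taken from the source


def split_url(url: str) -> Tuple[str, str]:
    """Return (owner, name) from a URL shaped like https://host/owner/name(.git)."""
    path = urlparse(url).path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    owner, _, name = path.partition("/")
    return owner, name


def load_repo_info(
    env: Mapping[str, str], provided: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    # Data supplied to the initializer wins and is not validated.
    if provided is not None:
        return provided

    info: Dict[str, Any] = {"ci": None}

    # Auto-detection switched off: return without any Git information.
    auto_detect = env.get("CUMULUSCI_AUTO_DETECT")
    if auto_detect is not None and auto_detect.strip().lower() in FALSY_VALUES:
        return info

    # Heroku CI supplies branch and commit through its own variables.
    if env.get("HEROKU_TEST_RUN_ID"):
        info.update(
            branch=env.get("HEROKU_TEST_RUN_BRANCH"),
            commit=env.get("HEROKU_TEST_RUN_COMMIT_VERSION"),
            ci="heroku",
            root="/app",
        )

    # Explicit CUMULUSCI_REPO_* variables are applied after the Heroku values.
    for key, var in (
        ("branch", "CUMULUSCI_REPO_BRANCH"),
        ("commit", "CUMULUSCI_REPO_COMMIT"),
        ("root", "CUMULUSCI_REPO_ROOT"),
    ):
        if env.get(var):
            info[key] = env[var]

    url = env.get("CUMULUSCI_REPO_URL")
    if url:
        info["url"] = url
        info["owner"], info["name"] = split_url(url)

    # Validation happens only when a CI system was detected.
    if info["ci"]:
        missing = [key for key in REQUIRED_CI_KEYS if info.get(key) is None]
        if missing:
            raise ValueError(f"Missing repo info under CI: {', '.join(missing)}")
    return info
```

Passing os.environ as env would model a real run; passing a plain dict keeps the function easy to exercise in tests.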

How ProjectConfig Gets Actual Git Details

The ProjectConfig has properties:

  • repo_root
  • server_domain
  • repo_name
  • repo_url
  • repo_owner
  • repo_branch
  • repo_commit

Each property checks the repo_info property for its related key, and returns it if found. Otherwise, it touches the local filesystem to attempt inference, on the assumption that the local working directory is in fact a Git repo.

Some of the inference strategies go to other properties, while others use code paths like current_branch() or directly examine the filesystem.

Note that accessing the property may return a completely different result than accessing repo_info['some_key']. Property results are not persisted into the _repo_info dict.
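
A minimal sketch of that divergence, using a hypothetical stand-in class rather than the real ProjectConfig. The helper _infer_branch_from_git is invented for illustration and returns a fixed value where the real code would inspect the filesystem.

```python
from typing import Any, Dict, Optional


class SketchProjectConfig:
    """Hypothetical stand-in for ProjectConfig; not the real class."""

    def __init__(self, repo_info: Optional[Dict[str, Any]] = None):
        self._repo_info = repo_info

    @property
    def repo_info(self) -> Dict[str, Any]:
        if self._repo_info is None:
            # Placeholder for the environment-based lazy load.
            self._repo_info = {"ci": None}
        return self._repo_info

    def _infer_branch_from_git(self) -> Optional[str]:
        # Placeholder for filesystem inspection; fixed value for the sketch.
        return "feature/example"

    @property
    def repo_branch(self) -> Optional[str]:
        branch = self.repo_info.get("branch")
        if branch:
            return branch
        # The inferred value is returned but never written back to _repo_info.
        return self._infer_branch_from_git()


config = SketchProjectConfig()
print(config.repo_branch)              # feature/example
print(config.repo_info.get("branch"))  # None
```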

Places We Inject Context Info for the ProjectConfig

  • The GitHubSource class (cumulusci/core/source/github.py). We provide all of the known keys except ci.
  • The LocalFolderSource class, which provides only root.
  • In the MetaDeploy Publish class, we create a ProjectConfig for the commit we've downloaded and want to publish. We provide all of the keys except ci, but branch contains either a tag or a SHA, not an actual branch name.
  • The BaseCumulusCI class, which is the parent class of CliRuntime, passes its own kwargs to the BaseProjectConfig, including any potential repo_info overrides. This doesn't appear to be used inside the cumulusci package, but it is used in MetaDeploy.

Potential Refactor Opportunities

What if we did something like this?

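from pathlib import Path
from typing import Any, Dict, Optional

from pydantic import AnyUrl, BaseModel

# class BaseProjectConfig:
#     context: LocalContextInfo | RepoInfo

class LocalContextInfo(BaseModel):
    root: Path

class RepoInfo(LocalContextInfo):
    ci: Optional[str] = None
    branch: str  # May actually be a ref, not a branch
    commit: str
    url: AnyUrl
    domain: Optional[str] = None  # used only for GitHub Enterprise?

    @property
    def name(self) -> Optional[str]:
        ...

    @property
    def owner(self) -> Optional[str]:
        ...

    # Alternate constructors, one per data source (signatures are placeholders)
    @classmethod
    def from_web_service(cls) -> "RepoInfo":
        ...

    @classmethod
    def from_env(cls) -> "RepoInfo":
        ...

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "RepoInfo":
        ...

    @classmethod
    def from_local_context(cls) -> "RepoInfo":
        ...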

Do we need the results from our properties to actually be pulled dynamically (live) at time of access? Or can we cache them at first access?

We could potentially do a refactor while keeping the current on-the-fly retrieval of the data.
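
To make the trade-off concrete, here is a small sketch contrasting the two options. LiveContext and CachedContext are hypothetical classes, and the read_branch callable stands in for whatever actually inspects the repository.

```python
from functools import cached_property
from typing import Callable


class LiveContext:
    """Re-reads the branch on every access (the current on-the-fly behavior)."""

    def __init__(self, read_branch: Callable[[], str]):
        self._read_branch = read_branch

    @property
    def branch(self) -> str:
        return self._read_branch()


class CachedContext:
    """Reads the branch once, at first access, and keeps that value."""

    def __init__(self, read_branch: Callable[[], str]):
        self._read_branch = read_branch

    @cached_property
    def branch(self) -> str:
        return self._read_branch()


# Simulate a repository whose checked-out branch changes mid-session.
state = {"branch": "main"}
live = LiveContext(lambda: state["branch"])
cached = CachedContext(lambda: state["branch"])

first = (live.branch, cached.branch)   # ('main', 'main')
state["branch"] = "feature/x"
second = (live.branch, cached.branch)  # ('feature/x', 'main')
```

Caching is only safe if nothing relies on picking up a branch switch or new commit during the lifetime of the config object, which is the open question here.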

Web App Uses

MetaDeploy

MetaDeploy has its own BaseProjectConfig subclass, MetadeployProjectConfig. Its initializer handles two cases: it looks for a repo_info kwarg and uses it if present, and it constructs one from the MetaDeploy database for the current installation job.

The MetaDeploy repo_info contains root, url, name, owner, and commit, but not branch, which MetaDeploy doesn't have. The commit may not be a SHA here - it may be a ref (tag/branch).

The codepath of passing the repo_info kwarg to MetaDeployProjectConfig does not appear to be used.

The pathway through kwargs on BaseCumulusCI is used in MetaDeploy, where create_scratch_org() provides repo_info with root, url, name, owner, commit. commit is the Plan's commit_ish, so it's really a ref.

Metecho

Metecho runs CumulusCI in a subprocess (in run_flow()). It does not provide any repo-related env vars in that context.

It also uses the BaseCumulusCI route to override repo_info in multiple places:

  • In dataset_env;
  • In run_retrieve_task;
  • In create_org().

All of these uses provide root, url, name, owner, commit, but not branch.