Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for go workspaces #457

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ejegrova
Copy link
Contributor

@ejegrova ejegrova commented Jan 24, 2024

discussion: #398

checksum validation:

  • go.work.sum + all go.sum files in modules are used to get checksums
  • if checksum is not found, the property cachi2:missing_hash:in_file has value of go.work.sum

JIRA: STONEBLD-2043

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • Docs updated (if applicable)
  • Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

Copy link
Collaborator

@eskultety eskultety left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is still a draft I only had some immediate comments on the proposal but didn't do a detailed in-depth review of the logic fitting the overall design.

cachi2/core/package_managers/gomod.py Outdated Show resolved Hide resolved
cachi2/core/package_managers/gomod.py Outdated Show resolved Hide resolved
cachi2/core/package_managers/gomod.py Outdated Show resolved Hide resolved
cachi2/core/package_managers/gomod.py Outdated Show resolved Hide resolved
cachi2/core/package_managers/gomod.py Outdated Show resolved Hide resolved
cachi2/core/package_managers/gomod.py Outdated Show resolved Hide resolved
cachi2/core/package_managers/gomod.py Outdated Show resolved Hide resolved
@ejegrova ejegrova force-pushed the workspaces branch 4 times, most recently from 63706be to 3be712f Compare March 14, 2024 12:05
@ejegrova ejegrova marked this pull request as ready for review March 14, 2024 12:41
@ejegrova ejegrova marked this pull request as draft April 10, 2024 09:49
@ejegrova ejegrova force-pushed the workspaces branch 2 times, most recently from c67073f to a1cbcdb Compare April 15, 2024 13:00
@ejegrova ejegrova force-pushed the workspaces branch 2 times, most recently from 3c186c7 to a9b5e69 Compare April 17, 2024 08:19
@ejegrova ejegrova marked this pull request as ready for review April 17, 2024 12:29
@ejegrova
Copy link
Contributor Author

changed integration test repo to include checks for missing_hash property

Copy link
Contributor

@taylormadore taylormadore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally, it looks good to me 🚀 Hopefully my knowledge of go workspaces is sufficient for that statement to mean anything :harold:

Left a couple of minor nitpicks. Also, can the packages with missing checksums be noted in the README of the integration test repo+branch?

cachi2/core/package_managers/gomod.py Show resolved Hide resolved
cachi2/core/package_managers/gomod.py Outdated Show resolved Hide resolved

repo_root = RootedPath(tmp_path)

go_work_root = _get_go_work_root(repo_root, Go(), {"cwd": repo_root})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this calling go directly? Should we be mocking that?

Copy link
Collaborator

@eskultety eskultety left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, I learnt something new during the review, e.g. that any repo-local submodule still needs to be declared uniquely within all Go modules and so it is imperative one still uses the replace keyword to denote that the local implementation should be used even though that is the only one!
Anyway, I think we need some documentation update on workspaces, don't we?

tests/unit/package_managers/test_gomod.py Outdated Show resolved Hide resolved
@@ -32,6 +32,8 @@
_create_modules_from_parsed_data,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit fails unit tests with:

FAILED tests/unit/package_managers/test_gomod.py::test_resolve_gomod_vendor_without_flag - FileNotFoundError: [Errno 2] No such file or directory: '/usr/bin/go'
FAILED tests/unit/package_managers/test_gomod.py::test_get_go_work_root - FileNotFoundError: [Errno 2] No such file or directory: '/usr/bin/go'
FAILED tests/unit/package_managers/test_gomod.py::test_get_go_work_root_when_go_work_does_not_exist - FileNotFoundError: [Errno 2] No such file or directory: '/usr/bin/go'
FAILED tests/unit/package_managers/test_gomod.py::test_get_go_work_root_when_go_work_is_outside_of_repo - FileNotFoundError: [Errno 2] No such file or directory: '/usr/bin/go'

so a mock to subprocess.run or maybe even better get_go_work_root needs to be added to the test cases in question, though test_resolve_gomod_vendor_without_flag in particular might be a harder one to mock correctly without affecting anything else, but it might turn out just fine as well, you'll see.

cachi2/core/package_managers/gomod.py Show resolved Hide resolved
else:
module_list.append(module)

# should never happen, since the main module will always be a part of the json stream
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so all the workspace branches in the integration test repo are invalid then, correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can currently see 3 branches that are related to workspaces: nested_main_module, worspace_separate_modules and go_workspaces. I believe the first two ones are leftovers that should be deleted. All of these were created as we evolved the code to try to cover more corner cases.

The go_workspaces branch now covers the most weird scenario: the root repo is not a go project, and the workspaces root is the first nested directory (worspace_modules). I am not sure we need such a complicated case in the integration test, but the bright side is that we're covering the worst we could come up with.


def _process_modules_json_stream(
app_dir: RootedPath, modules_json_stream: str
) -> tuple[ModuleDict, list[ModuleDict]]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: we should return a single list with the main module as the first element, the extraction should be done on the caller's end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, that will make the output a little less clear, since there's no way to tell that the first module is always the main module unless you look at this function's docstring or implementation. By returning a tuple, it is at least implicit that the first tuple element is unique somehow when compared to the elements in the list.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is IMO an inferior approach from design POV. If an element is special it needs to be reflected in the docs instead of trying to figure out a creative way how such a semantics needs to be represented in a data type.

cachi2/core/package_managers/gomod.py Outdated Show resolved Hide resolved
Comment on lines 849 to 857
go_sum_files = _get_go_sum_files(go_work_root, go, run_params)
modules_in_go_sum = _parse_go_sum_files(go_sum_files)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact we need to first get the list of sum files should be transparent to _resolve_gomod as a caller and instead be done in _parse_go_sum_files for enhanced clarity, especially since the variable is not being used anywhere later in the function.
Alternatively, you could keep _get_go_sum_files here as a convenience function call, however, you'd then loop directly over it as _parse_go_sum_files doesn't seem to bring much value really since it's just a simple loop wrapper which is maybe even better than the former suggestion.

Comment on lines 863 to 868
main_module_name = go([*go_list, "-m"], run_params).rstrip()
modules_json_stream = go([*go_list, "-m", "-json"], run_params).rstrip()
main_module_dict, workspace_dict_list = _process_modules_json_stream(
app_dir, modules_json_stream
)

path = main_module_dict["Path"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having recently worked on the toolchain selection which gave me headaches wrt/ unit tests because _resolve_gomod is a beast of a function my impression has become that anything that needs to execute a Go command should go into a separate function to potentially make mocking in test_resolve_gomod much easier.

Also from logical perspective this particular block seems to be open-coded quite a bit. What if we introduced a function, say _parse_modules along the following lines:

def _parse_modules(app_dir, version_resolver) -> list[ParsedModule]:
  run_go_list_json
  process_modules_json_stream

  ...
  return [main_module] + workspace_modules

and then in resolve_gomod we would just pop the main module out of the list for some further processing, but the idea is to make the code in resolve_gomod cleaner, leaner and easier to follow. Do you think the ^above would help achieving that by consolidating the special casing of the main module we're doing here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree this is a better approach, but since this is refactoring work, can we do it as a follow up?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be afraid it would remain TODO forever :), but okay, UNLESS it turns out more substantial changes are needed within this PR.

cachi2/core/package_managers/gomod.py Outdated Show resolved Hide resolved
# Test case checks if cachi2 can process go workspaces properly.
pytest.param(
utils.TestParameters(
repo="https://github.com/cachito-testing/cachi2-gomod.git",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the repo the workspace cases don't seem to include a nuance of a main module i.e. having a go.mod file checked in - shouldn't we change that and make sure the integration test repos are proper Go projects?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case does have a go.mod file, it's just buried very deep within: https://github.com/cachito-testing/cachi2-gomod/tree/go_workspaces/workspace_modules/hello

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the main go.mod module, should a project, even if using workspaces, have a go.mod file right next to the go.work file?

@ejegrova ejegrova force-pushed the workspaces branch 4 times, most recently from 6c17a97 to cc7eb2d Compare May 28, 2024 15:26
ejegrova and others added 3 commits May 28, 2024 18:17
Whenever workspaces are enabled, the "go list -m" command will return a
list of all workspaces modules instead of the usual single module
present in the path being processed by Cachi2.

For this reason, we need to properly parse this extra data so that they
can be included in the resulting SBOM.

Signed-off-by: ejegrova <ejegrova@redhat.com>
All go.sum files and go.work.sum are checked for checksums. If not
found, the property cachi2:missing_hash:in_file has value of
go.work.sum.

Signed-off-by: Bruno Pimentel <bpimente@redhat.com>
Signed-off-by: ejegrova <ejegrova@redhat.com>
@eskultety
Copy link
Collaborator

My question is this: what is the main problem of letting users point cachi2 to the go.work file directly the same way as we expect them to do with go.mod for the main module to be processed? That would lead to a consistent behaviour from UX perspective where we'd have to process all dependent modules. What are the drawbacks of that? @brunoapimentel do you see a use case where ^this wouldn't work and simply fail with cachi2 logic?

@@ -860,13 +880,27 @@ def _resolve_gomod(
# Make Go ignore the vendor dir even if there is one
go_list.extend(["-mod", "readonly"])

main_module_name = go([*go_list, "-m"], run_params).rstrip()
# breakpoint()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a left-over from some debugging.

Comment on lines +991 to +992
else:
go_work_root = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: if go_work_root is initialized at the top of the function, this clause isn't needed.

Comment on lines +282 to +288
run_side_effects.append(
proc_mock(
"go work edit -json",
returncode=0,
stdout=get_mocked_data(data_dir, "workspaces/go_work.json"),
)
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from this particular hunk, this test function is pretty much bit-for-bit identical to the existing test_resolve_gomod function. Given that that's the case, we should better mock _get_go_sum_files and add this as another parametrize case to the existing test case, this is simply a beast of a test testing a beast of a function which leads to enormous code redundancy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants