Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cargo support PoC #157

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

bruno-fs
Copy link

PoC adding support for rust package manager

ATM the usage documentation is more important than code itself (which is quite messy).
I'd like to use this PR to kickoff a discussion on the design for this integration.

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • New code has type annotations
  • Docs updated (if applicable)
  • Docs links in the code are still valid (if docs were updated)

docs/usage.md Outdated
Comment on lines 298 to 301
mkdir -p playground/pure-rust
cd playground/pure-rust
git clone git@github.com:sharkdp/bat.git --branch=v0.22.1
cachi2 fetch-deps --source bat '{"type": "cargo"}'
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling this "pure-rust" because tomorrow I'm going to add a different use case: python with rust dependencies (👀 cryptography 👀 )

@bruno-fs
Copy link
Author

docs/usage.md Outdated
replace-with = "local"

[source.local]
local-registry = "/tmp/cachi2-output"
Copy link
Author

@bruno-fs bruno-fs Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine cachi2 has a way to handle paths like this one (given the replacements made on pip requirements).
Can I use something already built to create a file like this one without hardcoding a path?

Copy link
Contributor

@chmeliik chmeliik May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the RequestOutput returned from fetch_*_source functions, you can return ProjectFiles which can have an ${output_dir} placeholder

It might look something like this

ProjectFile(
    abspath=project_dir / ".cargo/config.toml",
    template=dedent(
        """
        ...

        [source.local]
        local-registry = "${output_dir}/deps/cargo"
        """
    ),
)

Environment variables are always better, but if this is the only option it should work

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added .cargo/config.toml as a project file

docs/usage.md Outdated
COPY cargo_config.toml /app/.cargo/config.toml
COPY bat /app
WORKDIR /app
RUN cargo install --locked --path .
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing cachito env part, since we currently have no env :/

docs/usage.md Outdated Show resolved Hide resolved
docs/usage.md Outdated

[cargo docs on using an alternate registry](https://doc.rust-lang.org/cargo/reference/registries.html#using-an-alternate-registry)

Unfortunately I'm still clueless on how to run an alternative registry (and this don't seem to align with cachi2 ethos), so I'm skipping this option.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd prefer a completely offline install, yeah

Though we will eventually have to figure out how to store the downloaded dependencies permanently, either for the Cachito+OSBS integration or in the future for RHTAP builds

docs/usage.md Outdated
git clone git@github.com:rust-lang/crates.io-index.git --depth 1 cachi2-output/index
```

There's a [cargo local registry package](https://crates.io/crates/cargo-local-registry) that prepares dependencies like this, but it is pretty buggy and fails for some packages I played around (including the one used on this PoC).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some way down the rabbit hole, I came across rust-lang/cargo#2857 (comment). Can cargo vendor perhaps be used to create a local index?

Even if it can't, how would using cargo vendor compare to what we're doing here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played a lot with cargo vendor and I think we can use it. It does not generate a index - instead unpack the cargos (which are essentially just tarballs) in a folder an adds a file with the checksums for all files included in each cargo.
It would be easier to reproduce cargo vendor behavior instead of creating the indexes.

Only limitation for using actual cargo vendor would be on usecases where rust is a indirect dependency (like python+rust) because it requires not only cargo.toml/lock but the whole source code of the project.

IoW, the example usage for python+rust would not be just the following

$ tree dependencies
dependencies
├── bcrypt-rust
│   ├── Cargo.lock
│   └── Cargo.toml
└── cryptography-rust
    ├── Cargo.lock
    └── Cargo.toml

It would have to contain copies of all rust sources :/

Copy link
Contributor

@chmeliik chmeliik May 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an ideal world, I think the python+rust support could get the Cargo.lock, Cargo.toml and all source code directly from the tarball of the python dependency

https://files.pythonhosted.org/packages/f7/80/04cc7637238b78f8e7354900817135c5a23cf66dfb3f3a216c6d630d6833/cryptography-40.0.2.tar.gz

$ tar tf cryptography-40.0.2.tar.gz cryptography-40.0.2/src/rust/
cryptography-40.0.2/src/rust/
cryptography-40.0.2/src/rust/Cargo.lock
cryptography-40.0.2/src/rust/Cargo.toml
cryptography-40.0.2/src/rust/build.rs
cryptography-40.0.2/src/rust/src/
...

But that would be a follow-up effort to the Rust support being discussed here. Until then, to enable the workaround outlined above, not relying on cargo vendor does sound better

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm that's an interesting idea. I was thinking on just grabbing the toml files (cargo.toml+cargo.lock) or parsing them and generating just one file (improving usability for projects with lots of rust subdependencies).

PTAL at the rendered notebook included in this PR https://github.com/containerbuildsystem/cachi2/blob/77a1de3cc3a1f0f6b6b122b530140ded19f37496/playground/python-and-rust/case-study-quipucords.ipynb

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 maybe (something like) the script that looks for Cargo.{lock,toml} in python dependency tarballs could eventually be built directly into cachi2's pip code and python+rust could be supported without any workarounds on the user's part

until that happens, it's good to know that a workaround is feasible

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cargo vendor definitely works when you have the whole source code available, but it's worth pointing out it won't work in the approach adopted in the python+rust PoC (since it is only providing Cargo.toml and Cargo.lock).

❯ tree quipucords/dependencies 
quipucords/dependencies
├── bcrypt-rust
│   ├── Cargo.lock
│   └── Cargo.toml
└── cryptography-rust
    ├── Cargo.lock
    └── Cargo.toml

3 directories, 4 files
❯ cargo vendor --manifest-path quipucords/dependencies/bcrypt-rust/Cargo.toml -s quipucords/dependencies/cryptography-rust/Cargo.toml
error: failed to parse manifest at `/home/bciconel/src/cachi2/playground/python-and-rust/quipucords/dependencies/bcrypt-rust/Cargo.toml`

Caused by:
  can't find library `bcrypt_rust`, rename file to `src/lib.rs` or specify lib.path

A custom file similar to build_requirements.txt (but for rust dependencies) + a script that can generate it smells like a better UX than what I originally proposed here, but that would obviously not work with cargo vendor as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(BTW, I'm finally working on a design doc. I shall share it soon:tm: 😅 )

dep_version = dep["version"]
download_path = Path(output_path.join_within_root(f"{dep_name}-{dep_version}.crate"))
download_path.parent.mkdir(exist_ok=True)
download_url = f"https://crates.io/api/v1/crates/{dep_name}/{dep_version}/download"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thus far we're assuming that all dependencies come from (and only from) crates.io. I imagine cargo supports many different types of dependencies (directly from github, from a local directory, etc.). Which dependency types are we planning to support?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it can be a local path or a git repo.
I believe we can support git repos, but not local paths (unless they are somehow included on the source code).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, that would be consistent with what we do for other package managers

For Go for example, we validate that local paths are subpaths of the repo

def _validate_local_replacements(modules: Iterable[ParsedModule], app_path: RootedPath) -> None:

@chmeliik
Copy link
Contributor

@bruno-fs thanks very much for getting this started!

Left some questions for the overall approach, mainly I'd like to see

  • a comparison of possible approaches to offline installs
  • description of the dependency types supported by cargo and how/if we can support them

It might also be worth looking into whether any cargo subcommands could be used to achieve everything we need. If there's a command that downloads everything and another one that returns the metadata we need to generate the SBOM, it might be preferable over custom parsing and downloading. We'd need to consider the pros and cons of both (maintainability of the code vs. the need to keep cachi2's cargo up to date, etc.).

If this is too many talking points for a PR, maybe a design doc (Google doc) would be better?

@chmeliik
Copy link
Contributor

Another unfortunate circumstance has come up: https://www.bleepingcomputer.com/news/security/rust-devs-push-back-as-serde-project-ships-precompiled-binaries/

We will eventually need to do something about that (at least report the prebuilt binaries). Probably not in the initial implementation though

@chmeliik
Copy link
Contributor

Another unfortunate circumstance has come up: https://www.bleepingcomputer.com/news/security/rust-devs-push-back-as-serde-project-ships-precompiled-binaries/

We will eventually need to do something about that (at least report the prebuilt binaries). Probably not in the initial implementation though

And the good news is that the prebuilt binary was removed in the next release, so only one version of serde has them https://github.com/serde-rs/serde/releases/tag/v1.0.184

@bruno-fs
Copy link
Author

I think this is unsolvable on our side. Crates.io would need to enforce rules to not allow people to push binaries.

@bruno-fs
Copy link
Author

What we could do is have a list of known offenders (pkg name and version) and don't allow using them, but IDK if this is a widespread issue or one edge case.

@chmeliik
Copy link
Contributor

What we could do is have a list of known offenders (pkg name and version) and don't allow using them, but IDK if this is a widespread issue or one edge case.

Or some heuristic to find binaries in the source archives. But yeah, not a concern for the Cargo support design

@bruno-fs bruno-fs force-pushed the cargo-support-poc branch 2 times, most recently from 3aecab7 to f06468c Compare October 18, 2023 20:27
- save crates to the correct deps/cargo path.
- add .cargo/config.toml as project file
- format crates as a folder with sha's for compatibility with cargo
  vendor (and thus removing the need to use crates indexes)
@cgwalters
Copy link

Just linking https://github.com/coreos/cargo-vendor-filterer/ as some related work (it's a different thing than caching individual crates, but accomplishes the same high level goal in a different way).

Was the approach of reusing an existing project like https://github.com/panamax-rs/panamax considered?

@bruno-fs
Copy link
Author

bruno-fs commented Mar 28, 2024

Just linking https://github.com/coreos/cargo-vendor-filterer/ as some related work (it's a different thing than caching individual crates, but accomplishes the same high level goal in a different way).

Neat project! Didn't knew it. Since I didn't knew, I didn't consider it but it is definitely worth a PoC. Filtering dependencies per architecture is an interesting feature.

Was the approach of reusing an existing project like https://github.com/panamax-rs/panamax considered?

I did consider using https://crates.io/crates/cargo-local-registry, but at the time I faced some bugs and decided to try a pure python implementation so we don't have to deal with bugs on other packages. It also didn't help in any way dealing with indirect rust + python dependencies (which were the main pain I wanted to solve).

Panamax doesn't seem to fit cachi2 ethos of storing dependencies locally. Building containers with rust dependencies would require a panamax instance, while strategies like the one implemented here, built in cargo vendor or cargo local-registry mentioned above wouldn't. Panamax seems like good fit for the original cachito.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants