Notes and request for advice on Perl bioconda => conda-forge migration #25110

Open
jvolkening opened this issue Jan 22, 2024 · 11 comments

@jvolkening
Contributor

General comment:

Hello,

I would like to continue and build upon the efforts of others to bulk migrate bioconda Perl packages to conda-forge. I fully realize that general interest in this will fall somewhere in the range of weak to nonexistent, but I'd like to make sure that it is done properly the first time. https://conda-forge.org/docs/orga/guidelines.html#transferring-to-conda-forge provides some guidelines that will be followed wherever possible.

Below is a list of rules/observations/questions that have resulted from my initial exploration of the migration process. Some of them are Perl-specific and some are more general. Any advice on how to proceed with any of them is welcome and appreciated.

  1. (mostly opinion) simple build scripts should be moved from build.sh to the script section of meta.yaml. I don't know that there is any official guidance on this, but it makes sense to me.

  2. Architecture: Packages should be noarch: generic wherever possible. When this is not appropriate, packages should be built for as many platforms as is conveniently possible.

  3. Core modules: There are, according to my tabulation, 74 perl-* recipes (some conda-forge, some bioconda) which are for Perl packages that are part of the "core" distribution for any moderately recent version of Perl, including all of those for which conda packages exist. These should not be needed as external dependencies (i.e. requiring 'perl' should be enough). During migration, I would remove these as dependencies in recipes, and also work on removing them from existing conda-forge recipes. If all reverse dependencies are migrated with these dependencies removed, then the packages themselves can be removed from bioconda without migrating them to conda-forge.

  4. run_exports: The conda skeleton cpan output contains run_exports: weak: {{ name }} ={{ version }}. Many existing recipes do not include run_exports at all. In bioconda, the linter now requires it in all recipes, and I have been using {{ pin_subpackage(name, max_pin="x") }} as a generic entry based on some of the bioconda docs. Personally, I don't want to introduce unnecessary restrictions on the solver. I would appreciate advice on what, if any, run_exports should be specified by default.

  5. Jinja: All migrated recipes will use Jinja variables for, at a minimum, version and sha256. I don't see a real need to use a name variable as some recipes do, as this should never change (right?).

  6. Testing: All migrated recipes will include, at a minimum, imports statements for all of the modules included in the package (based on MetaCPAN API). Additional tests included in existing bioconda recipes will be transferred as well.

  7. Dependency chain: Conda-forge docs state

    As a general rule: all dependencies have to be packaged by conda-forge as well

    The order of migration will therefore need to be chosen so that packages whose dependencies are all already in conda-forge are migrated first, then the packages that depend only on those, and so forth. The table below should help with this ordering.
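
The ordering described in point 7 amounts to a topological sort of the dependency graph. A minimal sketch, assuming the dependencies have already been collected into a plain dict; the helper name `migration_order` and the package names in the sample data are hypothetical and used only for illustration:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def migration_order(deps, already_on_conda_forge):
    """Return packages ordered so that each one follows its dependencies.

    deps maps a package name to the set of packages it depends on.
    Dependencies that are already on conda-forge impose no ordering
    constraint, so they are dropped from the graph.
    """
    pending = {
        pkg: {d for d in requires if d not in already_on_conda_forge}
        for pkg, requires in deps.items()
        if pkg not in already_on_conda_forge
    }
    # static_order() raises graphlib.CycleError on circular dependencies
    return list(TopologicalSorter(pending).static_order())


# Hypothetical dependency data, for illustration only
deps = {
    "perl-app": {"perl-lib-a", "perl-lib-b"},
    "perl-lib-a": {"perl-base"},
    "perl-lib-b": {"perl-base", "perl-dbi"},
    "perl-base": set(),
}
order = migration_order(deps, already_on_conda_forge={"perl-dbi"})
```

Packages early in `order` can be submitted first; anything later has to wait until its dependencies have been merged.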

More points may be added to this comment as issues arise. I hope that another thing to arise out of this is completion and merging of PR conda-forge/conda-forge.github.io#1790 based on any discussions here.

Here conda_perl_packages.20240121.csv is a table with information that I have compiled from conda-forge, bioconda, and MetaCPAN (linked rather than embedded as it is rather long). A few of the entries may be incomplete as in rare cases my code couldn't match the conda recipe names with top-level packages in CPAN. I think that most of the columns are self-explanatory, but I can add details as needed. The "river_bucket" column is a way that MetaCPAN uses to indicate how far up the CPAN dependency chain a package exists -- the higher the bucket the more downstream dependencies a package has. It may be a way to rank importance of migration, although the actual number of conda reverse-dependencies may be a more important criterion.
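
For the alternative criterion mentioned above, the number of conda reverse-dependencies, a rough sketch is shown here. It assumes the dependencies are available as a dict; the function name and sample package names are hypothetical, and only direct reverse-dependencies are counted, not transitive ones:

```python
from collections import Counter


def reverse_dependency_counts(deps):
    """Count how many packages directly depend on each package.

    deps maps a package name to the set of packages it depends on.
    """
    counts = Counter({pkg: 0 for pkg in deps})
    for requires in deps.values():
        counts.update(requires)  # one increment per direct dependent
    return counts


# Hypothetical data, for illustration only
sample_deps = {
    "perl-a": {"perl-c"},
    "perl-b": {"perl-c", "perl-d"},
    "perl-c": set(),
    "perl-d": set(),
}
counts = reverse_dependency_counts(sample_deps)
ranked = counts.most_common()  # most depended-upon packages first
```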

@jakirkham
Member

cc @conda-forge/bioconda-recipes

@jvolkening jvolkening mentioned this issue Jan 23, 2024
10 tasks
@jvolkening
Contributor Author

  8. host/run dependencies: I believe I understand now that build dependencies for most or all perl recipes will only ever contain make and C/C++ compiler Jinja shortcuts, when needed. I believe host needs to contain any other dependencies needed for building and testing the package (e.g. other perl packages, etc). I thought run needed to contain any runtime dependencies. However, the recipe generated e.g. by conda skeleton cpan DBD::SQLite contains:
requirements:
  build:
    - {{ compiler('c') }}
    - make  # [not win]
    - m2-make  # [win]

  # Run exports are used now
  host:
    - perl
    - perl-dbi

  run:
    - perl
    #- perl-dbi

So, although I know perl-dbi to be a run-time dependency, it is not included in run. Is this because dependencies in host are also automatically included in run?

@jvolkening
Contributor Author

9. Platform-specific make? I noticed conda skeleton cpan adds this for recipes which use Makefiles:

  build:
    - make  # [not win]
    - m2-make  # [win]

Is this as it should be? I take it m2-make is a Windows make that is part of MSYS, but I don't see it in conda-forge or default repos, unless it is part of m2-binary-packages (or it is in the msys2 channel). Or is conda skeleton doing something funny -- I've noticed other things that it doesn't handle correctly for perl packages.

@jvolkening
Contributor Author

10. Build number: For bioconda packages where the version is not changing, should build_number be incremented for the conda-forge package? I believe this will tell conda to pull the package from conda-forge, right? Otherwise I'm not sure how this would be resolved.

@jvolkening
Contributor Author

11. No license: Somewhat surprisingly I'm running into packages with no license whatsoever. Can these even be included in conda-forge? If so, what should be used for the license string? In some cases contacting the author may work, but in other cases there is clearly no active development and even outstanding issues about the lack of license.

@BastianZim
Member

I'm not a perl or bioconda reviewer but just to answer your more general comments:

  1. Yes, if it's only a line or two, it can be moved there.
  2. Correct, that eases the strain on our infrastructure.
  5. Usually, we have two jinja variables: name and version. The sha can be defined under source, and the bot will be able to handle it. This is just how we usually handle it so you can adhere to it for uniformity reasons, but you are correct, the name doesn't have to be.
  7. That is correct. Theoretically, you could add all recipes in one PR, but that will make reviewing and submitting very difficult. Ideally, you build a graph and traverse it from the beginning.
  10. I will leave it to a bioconda expert but yes, increasing the build number should make that version the preferred one.
  11. If there is no license at all, we cannot add them as we don't have the rights to distribute it.

@jvolkening
Contributor Author

@BastianZim thanks for the reply.

  • That is correct. Theoretically, you could add all recipes in one PR, but that will make reviewing and submitting very difficult. Ideally, you build a graph and traverse it from the beginning.

Given that there are roughly 450 packages in need of migration, my plan right now would be to start in batches of 10 or so and see how that goes. Much can be automated but not everything (at least not without making more work).

  • I will leave it to a bioconda expert but yes, increasing the build number should make that version the preferred one.

Thanks -- on gitter I just got advice that this was not needed because of strict channel priority, so all migrated packages should start fresh at build: number: 0. I was planning to move ahead as such, but then there is no guarantee that someone has strict channel priority configured -- build numbers I guess are more explicit? Would there be any actual harm in bumping the build number from bioconda rather than resetting it, just to ensure priority?

  • If there is no license at all, we cannot add them as we don't have the rights to distribute it.

That's what I suspected. I will try to contact authors as much as possible when this comes up. Unfortunately at least one I've come across so far has some rather widely-used dependencies in my field.

@BastianZim
Member

Given that there are roughly 450 packages in need of migration, my plan right now would be to start in batches of 10 or so and see how that goes. Much can be automated but not everything (at least not without making more work).

I see; that is substantial. In that case (and depending on how much work you want to invest) it might make sense to make PRs to the respective tools, such as skeleton, Grayskull, etc., and add everything that you find from there. That way, it will be easier for you to create recipes. But that's up to you and how you want to approach it.

Would there be any actual harm in bumping the build number from bioconda rather than resetting it, just to ensure priority?

Yes I saw, that is correct, and newer setups should have strict channel priority enabled. In fact, we explicitly mention it in all of our guides and set up instructions, so there should be few cases where that is not set. To answer your question, no, there is no harm, and if you want to be on the safe side, you can do that. It could be though, that some bioconda recipes already have an increased build number which might have to be handled. Just note, that increased build numbers don't guarantee either that that is chosen. It just means that it is preferred.

That's what I suspected. I will try to contact authors as much as possible when this comes up. Unfortunately at least one I've come across so far has some rather widely-used dependencies in my field.

Hmm that is annoying. Unfortunately, that is the current legal opinion so I'm not sure if we can go around that. But I understand the pain... :)

@jvolkening
Contributor Author

Yes I saw, that is correct, and newer setups should have strict channel priority enabled

increased build numbers don't guarantee either that that is chosen. It just means that it is preferred.

Okay, I will just reset to zero. I had some code to extract the current build number from the bioconda recipe and increment it, but given the advice it makes sense to start fresh.
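
For reference, a minimal sketch of the kind of helper mentioned above (not the actual code used), which reads the build number from a recipe's text and either resets or bumps it. It assumes a literal integer on the number: line; Jinja-templated build numbers or lines with trailing comments are not handled. The function names and the sample recipe are hypothetical:

```python
import re

# Matches a literal "number: N" line; [ \t] is used instead of \s so the
# match never swallows a newline.
BUILD_NUMBER = re.compile(r"^([ \t]*number:[ \t]*)(\d+)[ \t]*$", re.MULTILINE)


def read_build_number(meta_yaml):
    """Return the build number found in the recipe text, or None."""
    match = BUILD_NUMBER.search(meta_yaml)
    return int(match.group(2)) if match else None


def set_build_number(meta_yaml, number):
    """Return the recipe text with the first build number replaced."""
    return BUILD_NUMBER.sub(
        lambda m: f"{m.group(1)}{number}", meta_yaml, count=1
    )


# Hypothetical recipe text, for illustration only
recipe = """\
package:
  name: perl-example
  version: "1.0"

build:
  number: 3
  noarch: generic
"""
fresh = set_build_number(recipe, 0)  # reset, as decided above
bumped = set_build_number(recipe, read_build_number(recipe) + 1)
```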

That's what I suspected. I will try to contact authors as much as possible when this comes up. Unfortunately at least one I've come across so far has some rather widely-used dependencies in my field.

Hmm that is annoying. Unfortunately, that is the current legal opinion so I'm not sure if we can go around that. But I understand the pain... :)

Good news on this front: for the instances encountered so far I've been able to find licensing statements either in comments in the source or else in the README/docs. Not ideal but satisfactory, and the LICENSE file can be added to the recipe. Very commonly for Perl packages the license is Artistic-1.0-Perl OR GPL-1.0-or-later, and the code and/or docs just contain the statement somewhere:

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
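
Checking source files and docs for that statement can be partly automated. A rough sketch, assuming the exact wording "same terms as Perl itself"; variants of the phrase are not covered, and any hit would still need to be confirmed by reading the file. The function name and sample text are hypothetical:

```python
import re

PERL_SPDX = "Artistic-1.0-Perl OR GPL-1.0-or-later"


def guess_license(text):
    """Return the Perl SPDX expression if the usual statement is present."""
    # Collapse punctuation, comment markers and line breaks to single
    # spaces so the phrase is found even when wrapped across comment lines.
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower())
    if "same terms as perl itself" in normalized:
        return PERL_SPDX
    return None


# Hypothetical source comment, for illustration only
sample = """
# This library is free software; you can redistribute it and/or modify
# it under the same
# terms as Perl itself.
"""
found = guess_license(sample)
```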

@lskatz

lskatz commented Apr 9, 2024

Hi, how do I get help on a perl module that would have been in bioconda but is now a PR for conda-forge? If this passes the tests and gets accepted, then this will have been my first conda contribution and so I appreciate any help.

@jvolkening
Contributor Author

Hi Lee, I just commented on your Text::Fuzzy PR.
