ENH(nextalign cli): show default values in --help usage statement #1253
Comments
Thanks for the feedback, much appreciated 😊
We're very soon going to release version 3 which uses a much more robust
seeding algorithm that should have no issues with Norovirus.
We're using clap for the cli, I'm not sure whether it's easy to display
defaults. I agree that it would be good to show what they are with help.
Meanwhile you can find the defaults in parameters.rs, let me look it up.
…On Mon, Sep 11, 2023, 23:10 Angie Hinrichs ***@***.***> wrote:
Hi! I'm trying out nextalign on norovirus genomes (small ssRNA, ~7.5kb,
but highly diverged), and most sequences are unalignable with nextalign's
default settings (Unable to align: low seed matching rate. Details:
number of seeds: 73, number of seed matches: 2, matching rate: 0.027,
required matching rate: 0.300. Note that this sequence will not be included
in the results.).
I'd like to try playing with the seed parameters. nextalign's --help
statement describes the params but not their default values:
--seed-length <SEED_LENGTH>
k-mer length to determine approximate alignments between query and reference and
determine the bandwidth of the banded alignment
--mismatches-allowed <MISMATCHES_ALLOWED>
Maximum number of mismatching nucleotides allowed for a seed to be considered a match
--min-seeds <MIN_SEEDS>
Minimum number of seeds to search for during nucleotide alignment. Relevant for short
sequences. In long sequences, the number of seeds is determined by `--seed-spacing`
--min-match-rate <MIN_MATCH_RATE>
Minimum seed mathing rate (a ratio of seed matches to total number of attempted seeds)
--seed-spacing <SEED_SPACING>
Spacing between seeds during nucleotide alignment
It would be nice to know the default values as a starting point for
exploring the parameter space. I guess I could figure out the seed length
from the 'Unable to align' messages 🙂 but it would be very nice if the
--help told them all. Thanks!
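The figures in the "Unable to align" message quoted above can be checked by hand: 2 matches out of 73 seeds is about 0.027, well below the required 0.300. Below is a minimal sketch of that arithmetic. The function names are hypothetical, and the `>=` comparison is an assumption; this is not Nextalign's actual code.

```rust
/// Ratio of matched seeds to attempted seeds (0.0 when no seeds were attempted).
fn seed_match_rate(num_seeds: u32, num_matches: u32) -> f64 {
    if num_seeds == 0 {
        return 0.0;
    }
    f64::from(num_matches) / f64::from(num_seeds)
}

/// True when the observed rate reaches the required minimum.
/// Assumption: the threshold is inclusive; the real tool may compare differently.
fn passes_seed_check(num_seeds: u32, num_matches: u32, min_match_rate: f64) -> bool {
    seed_match_rate(num_seeds, num_matches) >= min_match_rate
}

fn main() {
    // Numbers taken from the error message: 73 seeds, 2 matches, required rate 0.300.
    let rate = seed_match_rate(73, 2);
    println!("matching rate: {:.3}", rate);
    println!("passes: {}", passes_seed_check(73, 2, 0.3));
}
```

With 73 seeds, at least 22 would have to match to reach a rate of 0.3, so lowering `--min-match-rate` or raising `--mismatches-allowed` are the obvious knobs to explore for diverged sequences.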
The hardcoded defaults for v2 are here (branch v2): nextclade/packages_rs/nextclade/src/align/params.rs Lines 106 to 131 in 242d56f
For v3 (not stable, branch master) the hardcoded defaults are here: nextclade/packages_rs/nextclade/src/align/params.rs Lines 141 to 174 in 119cd4a
There are 2 important changes to consider in the upcoming Nextclade v3:
Because we are removing Nextalign, it does not make sense to add params into its help text anymore, as we are not planning any more releases. Regarding Nextclade: the datasets can (and do) override parameters (using In the meantime, one thing you can try is to add
UPD: This statement is incorrect for v2:
Nextclade/Nextalign v2 prints only the CLI args, before merging in the defaults, which is probably not very useful. This will change in v3.
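To illustrate what "merging in the defaults" means here, below is a minimal sketch using only the Rust standard library. All type and function names are hypothetical and this is not Nextclade's code. The `min_match_rate` default of 0.3 is taken from the "required matching rate: 0.300" in the error message above; the `seed_length` value is a placeholder, and the real defaults are in the params.rs files linked earlier.

```rust
use std::fmt::Display;

/// Fully resolved parameters (hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
struct SeedParams {
    seed_length: usize,
    min_match_rate: f64,
}

/// What the user passed on the command line; None means "not provided".
#[derive(Debug, Default)]
struct SeedParamsOptional {
    seed_length: Option<usize>,
    min_match_rate: Option<f64>,
}

impl Default for SeedParams {
    fn default() -> Self {
        Self {
            seed_length: 21,     // placeholder value, not confirmed by the thread
            min_match_rate: 0.3, // matches "required matching rate: 0.300" above
        }
    }
}

/// CLI values win; anything not provided falls back to the default.
fn merge(defaults: &SeedParams, cli: &SeedParamsOptional) -> SeedParams {
    SeedParams {
        seed_length: cli.seed_length.unwrap_or(defaults.seed_length),
        min_match_rate: cli.min_match_rate.unwrap_or(defaults.min_match_rate),
    }
}

/// Formats one help entry with its default appended.
fn help_line(flag: &str, default: impl Display) -> String {
    format!("--{} [default: {}]", flag, default)
}

fn main() {
    let defaults = SeedParams::default();
    let cli = SeedParamsOptional { seed_length: Some(15), min_match_rate: None };
    // Printing the merged struct shows the values actually used for alignment.
    println!("{:?}", merge(&defaults, &cli));
    println!("{}", help_line("min-match-rate", defaults.min_match_rate));
}
```

Printing the merged struct rather than the raw optional arguments is what makes the output useful for finding a starting point in the parameter space.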
If you want to try Nextclade v3: You can download prebuilt binaries on GitHub Actions:
Or you can build it from source, from the master branch, using our dev guide:
Note that v3 is not released and not stable yet. It's still a bit of a crazy land, and things might break. If they do, you can try a slightly earlier build from the list of GitHub Actions runs. When things calm down a bit, we'll probably release an alpha version, or a few. We appreciate early testing and feedback!
Thanks @ivan-aksamentov! I will give both a try. I see v3 can be run without a dataset if --input-ref is provided, great. 🚀