documentation for sourmash signature kmers is incorrect about exiting with bad kmers #2842

Open
jessicalumian opened this issue Nov 16, 2023 · 1 comment · May be fixed by #2856
@jessicalumian

Hi! I'm using sourmash signature kmers to extract the k-mers and sequences that match a signature of hashes of interest from the original FASTA file.

My command:

sourmash sig kmers --signatures <sig file of matches> --sequences <fasta file> --save-sequences <output name> --save-kmers <output name2>

I got an error when sourmash came across a k-mer containing an N:

Traceback (most recent call last):
  File "/home/jupyter-jessica/.local/bin/sourmash", line 8, in <module>
    sys.exit(main())
  File "/home/jupyter-jessica/.local/lib/python3.8/site-packages/sourmash/__main__.py", line 13, in main
    return mainmethod(args)
  File "/home/jupyter-jessica/.local/lib/python3.8/site-packages/sourmash/cli/sig/kmers.py", line 91, in main
    return sourmash.sig.__main__.kmers(args)
  File "/home/jupyter-jessica/.local/lib/python3.8/site-packages/sourmash/sig/__main__.py", line 1148, in kmers
    for kmer, hashval in kh_iter:
  File "/home/jupyter-jessica/.local/lib/python3.8/site-packages/sourmash/minhash.py", line 387, in kmers_and_hashes
    hashvals = self.seq_to_hashes(sequence,
  File "/home/jupyter-jessica/.local/lib/python3.8/site-packages/sourmash/minhash.py", line 360, in seq_to_hashes
    hashes_ptr = self._methodcall(lib.kmerminhash_seq_to_hashes, to_bytes(sequence), len(sequence), force, bad_kmers_as_zeroes, is_protein, size)
  File "/home/jupyter-jessica/.local/lib/python3.8/site-packages/sourmash/utils.py", line 25, in _methodcall
    return rustcall(func, self._get_objptr(), *args)
  File "/home/jupyter-jessica/.local/lib/python3.8/site-packages/sourmash/utils.py", line 78, in rustcall
    raise exc
ValueError: invalid DNA character in input k-mer: <kmer with an N>
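For context, here is a sketch of why a single N triggers this error: every k-mer window that overlaps the N contains a non-ACGT character, so one N invalidates up to k consecutive k-mers. The helper `bad_kmer_starts` is hypothetical (not part of sourmash), and treating lowercase bases as valid is an assumption.

```python
from typing import List

VALID_DNA = frozenset("ACGT")


def bad_kmer_starts(sequence: str, ksize: int) -> List[int]:
    """Hypothetical helper: return start positions of k-mers containing
    at least one character outside A, C, G, T."""
    seq = sequence.upper()  # assumption: lowercase bases count as valid
    return [
        i
        for i in range(len(seq) - ksize + 1)
        if not set(seq[i:i + ksize]) <= VALID_DNA
    ]


if __name__ == "__main__":
    seq = "ACGTACGTNACGTACGT"  # a single N at index 8
    print(bad_kmer_starts(seq, 5))  # -> [4, 5, 6, 7, 8]
```

With k = 5, the one N makes five overlapping k-mers invalid, any one of which would be enough to stop a run that treats bad k-mers as fatal.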

The documentation for sourmash sig kmers says: "By default, sig kmers ignores bad k-mers (e.g. non-ACGT characters in DNA). If --check-sequence is provided, sig kmers will error exit on the first bad k-mer."

Docs: https://sourmash.readthedocs.io/en/latest/command-line.html#sourmash-signature-kmers-extract-k-mers-and-or-sequences-that-match-to-signatures
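A minimal sketch of the behavior the docs describe, written in plain Python rather than against sourmash's Rust core: bad k-mers are skipped by default, and a hypothetical `check_sequence=True` argument (standing in for `--check-sequence`) raises on the first one. The function name `iter_kmers` is illustrative only, and the error message simply reuses the wording from the traceback above.

```python
from typing import Iterator

VALID_DNA = frozenset("ACGT")


def iter_kmers(sequence: str, ksize: int, check_sequence: bool = False) -> Iterator[str]:
    """Hypothetical sketch: yield valid DNA k-mers. Bad k-mers are skipped
    by default; with check_sequence=True, the first bad k-mer raises."""
    seq = sequence.upper()  # assumption: lowercase bases count as valid
    for i in range(len(seq) - ksize + 1):
        kmer = seq[i:i + ksize]
        if set(kmer) <= VALID_DNA:
            yield kmer
        elif check_sequence:
            raise ValueError(f"invalid DNA character in input k-mer: {kmer}")
        # otherwise the bad k-mer is silently skipped


if __name__ == "__main__":
    seq = "ACGTNACGT"
    print(list(iter_kmers(seq, 4)))  # -> ['ACGT', 'ACGT']
    try:
        list(iter_kmers(seq, 4, check_sequence=True))
    except ValueError as err:
        print(err)  # -> invalid DNA character in input k-mer: CGTN
```

Under the documented default, the command above would have skipped the k-mers containing N and kept going; the traceback shows the strict behavior being applied without the flag.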

So the docs should be updated to say that, by default, non-ACGT characters will cause sig kmers to exit with an error.

@ctb
Contributor

ctb commented Nov 28, 2023

aaaaactually I think the docs describe the correct behavior 😆

Fixed in #2856

ctb linked a pull request on Nov 29, 2023 that will close this issue