Hash name seq and quality, to test losslessness of tools and formats. Simple.

EvanTheB/bam_nsq_hash

Hash a FASTQ or BAM file in an order-independent way. This allows lossy data transformations to be detected: if a tool or format drops or alters a read, the hash changes. Only the read name, sequence, and qualities are hashed, and only the 'primary' reads are included.
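The idea can be illustrated with a minimal Python sketch. It is not the tool's implementation. The per-read hash (SHA-256 truncated to 64 bits), the combination by addition modulo 2**64, and the reading of 'primary' as "neither the secondary (0x100) nor the supplementary (0x800) SAM flag bit is set" are all assumptions made for illustration, and `record_digest` and `order_independent_hash` are hypothetical names.

```python
import hashlib

# SAM flag bits (from the SAM specification).
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800
MASK64 = (1 << 64) - 1


def record_digest(name: str, seq: str, qual: str) -> int:
    """Hash one read to a 64-bit integer (SHA-256 is an illustrative choice)."""
    # NUL separators keep ("AB", "C") distinct from ("A", "BC").
    payload = b"\0".join(s.encode("ascii") for s in (name, seq, qual))
    return int.from_bytes(hashlib.sha256(payload).digest()[:8], "big")


def order_independent_hash(records) -> int:
    """Combine per-read digests by addition mod 2**64, which is commutative."""
    total = 0
    for name, seq, qual, flag in records:
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # assumption: "primary" means neither bit is set
        total = (total + record_digest(name, seq, qual)) & MASK64
    return total


reads = [
    ("r1", "ACGT", "IIII", 0),
    ("r2", "GGCA", "FFFF", 16),
    ("r1", "ACGT", "IIII", 0x100),  # secondary copy, skipped
]
same_order = order_independent_hash(reads)
shuffled = order_independent_hash(list(reversed(reads)))
print(same_order == shuffled)
```

Addition is used here rather than XOR because XOR would cancel out a read that appears twice; either choice gives the same result regardless of read order.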

Raw htslib buffer values are used for hashing, so results may depend on the version of that library. The transform that makes FASTQ data equivalent to the htslib representation may therefore become wrong. Reference to
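As a rough sketch of what such a transform involves, the Python below converts FASTQ text into the layout the BAM specification describes: bases as 4-bit codes packed two per byte, and qualities as raw Phred values without the +33 offset. Whether the tool performs exactly these steps is an assumption. `pack_sequence` and `raw_qualities` are hypothetical names, and raising an error on characters outside the 4-bit alphabet is a choice made for this sketch, not a description of htslib's behaviour.

```python
# 4-bit base codes from the SAM/BAM specification: index in this string is the code.
BAM_BASES = "=ACMGRSVTWYHKDBN"
BASE_TO_CODE = {base: code for code, base in enumerate(BAM_BASES)}


def pack_sequence(seq: str) -> bytes:
    """Pack bases two per byte, first base in the high nibble."""
    codes = []
    for base in seq.upper():
        if base not in BASE_TO_CODE:
            # Sketch-only behaviour: reject anything without a 4-bit code.
            raise ValueError(f"base {base!r} has no 4-bit BAM code")
        codes.append(BASE_TO_CODE[base])
    if len(codes) % 2:
        codes.append(0)  # pad the final low nibble with zero
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


def raw_qualities(qual: str) -> bytes:
    """Convert Phred+33 FASTQ characters to raw Phred values."""
    return bytes(ord(ch) - 33 for ch in qual)


print(pack_sequence("ACGT").hex())  # 1248
print(list(raw_qualities("I!")))    # [40, 0]
```

Characters such as 'U', '.', and '-' have no entry in this 16-letter alphabet, so they cannot be packed by this sketch.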

Problems:

  • It does not handle 'U', '.', or '-'.
  • It does not handle all the non-DNA codes.

Todo:

  • Deal with pairs named with /1 and /2 suffixes.
  • Test with valgrind and debug builds.
  • Test O3 vs debug.
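One possible approach to the /1 and /2 item, sketched in Python: split the suffix off the read name so that a FASTQ name such as `read7/1` can be compared with a BAM record, where the mate is normally recorded in the flag bits 0x40 and 0x80 rather than in the name. `split_mate_suffix` is a hypothetical helper, not part of the tool, and how the mate number should feed into the hash is left open.

```python
def split_mate_suffix(name: str):
    """Return (base_name, mate) where mate is 1, 2, or None."""
    if name.endswith("/1"):
        return name[:-2], 1
    if name.endswith("/2"):
        return name[:-2], 2
    return name, None


print(split_mate_suffix("read7/1"))  # ('read7', 1)
print(split_mate_suffix("read7"))    # ('read7', None)
```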
