Hypothesis VCF #971

Merged: 6 commits into sgkit-dev:main from the hypothesis-vcf branch, Dec 5, 2022

@tomwhite (Collaborator) commented Dec 2, 2022

Fixes #951

Example of a VCF that this produces:

>>> from sgkit.tests.io.vcf.hypothesis_vcf import vcf
>>> print(vcf().example())
##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##source=sgkit-vcf-hypothesis-0.4.1.dev20+g839eb9a9
##contig=<ID=nmq>
##INFO=<ID=YP,Type=Float,Number=.,Description="INFO,Type=Float,Number=.">
##INFO=<ID=A3,Type=Character,Number=1,Description="INFO,Type=Character,Number=1">
##FORMAT=<ID=B1,Type=Float,Number=3,Description="FORMAT,Type=Float,Number=3">
##FORMAT=<ID=C0,Type=Float,Number=A,Description="FORMAT,Type=Float,Number=A">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	6HBVKzs6yX
nmq	59926	.	G	ACGA,NNCCGCACAGCT,TTCCTANGTC	.	.	.	B1:C0	.,0.0,.:1.0,.,.
nmq	123311762	Q	C	.	.	.	YP=282574521892864.0	B1	6760902367903744.0,.,.
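
To make the relationship between the header and the records concrete, here is a minimal sketch (not code from this PR) that checks the first record above against its FORMAT declarations. It assumes the usual VCF meaning of `Number`: a fixed integer count, or `A` for one value per alternate allele. The names `format_numbers`, `expected_count` and `check_sample` are hypothetical helpers written for this illustration only.

```python
import re

# Header lines and the first record, copied from the example output above.
HEADER = [
    '##INFO=<ID=YP,Type=Float,Number=.,Description="INFO,Type=Float,Number=.">',
    '##FORMAT=<ID=B1,Type=Float,Number=3,Description="FORMAT,Type=Float,Number=3">',
    '##FORMAT=<ID=C0,Type=Float,Number=A,Description="FORMAT,Type=Float,Number=A">',
]
RECORD = (
    "nmq\t59926\t.\tG\tACGA,NNCCGCACAGCT,TTCCTANGTC\t.\t.\t.\t"
    "B1:C0\t.,0.0,.:1.0,.,."
)

FORMAT_RE = re.compile(r"##FORMAT=<ID=([^,]+),Type=[^,]+,Number=([^,]+),")


def format_numbers(header_lines):
    """Map each FORMAT key to its declared Number value."""
    numbers = {}
    for line in header_lines:
        match = FORMAT_RE.match(line)
        if match:
            numbers[match.group(1)] = match.group(2)
    return numbers


def expected_count(number, n_alt):
    """Return the value count implied by Number, or None if not fixed here."""
    if number == "A":  # one value per alternate allele
        return n_alt
    if number == "R":  # one value per allele, including REF
        return n_alt + 1
    if number.isdigit():
        return int(number)
    return None  # "." (unbounded) and "G" (per genotype) are not handled


def check_sample(header_lines, record):
    """Return {key: (observed count, expected count)} for the first sample."""
    fields = record.split("\t")
    alt, keys, sample = fields[4], fields[8].split(":"), fields[9].split(":")
    n_alt = 0 if alt == "." else len(alt.split(","))
    numbers = format_numbers(header_lines)
    return {
        key: (len(value.split(",")), expected_count(numbers[key], n_alt))
        for key, value in zip(keys, sample)
    }


print(check_sample(HEADER, RECORD))
```

For the first record, `B1` (Number=3) carries three values and `C0` (Number=A) carries one value for each of the three ALT alleles, so the observed and expected counts agree. The sketch ignores the case where a whole field is written as a single `.`.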

One notable limitation is that it doesn't generate GT fields. This could be added later.

I've already found (and fixed) one problem with the VCF reader. There are more problems to fix with the writer, which I will address in follow-up issues.

@codecov-commenter commented Dec 2, 2022

Codecov Report

Merging #971 (be706bf) into main (3e4e328) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              main      #971    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           41        41            
  Lines         3841      4276   +435     
==========================================
+ Hits          3841      4276   +435     
Impacted Files             Coverage Δ
sgkit/io/vcf/utils.py      100.00% <100.00%> (ø)
sgkit/stats/pedigree.py    100.00% <0.00%> (ø)

@jeromekelleher (Collaborator) left a comment

Very impressive!

    vcf_number: str

    def get_header(self):
        return f'##{self.category}=<ID={self.vcf_key},Type={self.vcf_type},Number={self.vcf_number},Description="{self.category},Type={self.vcf_type},Number={self.vcf_number}">'
Collaborator

I'm surprised Black accepts this - any chance we could split it across multiple lines for the console-based old fogeys?

Collaborator

flake8 line-length errors are disabled in this repo (https://github.com/pystatgen/sgkit/blob/main/setup.cfg#L88), and Black won't split strings unless you use --preview (psf/black#1331).

Collaborator Author

Thanks @benjeffery. So presumably we'll pick that up in a future version of black.
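
For reference, one way the long f-string discussed in this thread could be split by hand is with adjacent string literals inside parentheses. This is only a sketch: the class name `Field` and the `description` local are hypothetical, and only the four attribute names are taken from the line under review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:  # hypothetical name; only the attributes appear in the diff
    category: str
    vcf_key: str
    vcf_type: str
    vcf_number: str

    def get_header(self) -> str:
        # Adjacent string literals are concatenated by the parser, so this
        # builds the same single-line header as the original f-string.
        description = (
            f"{self.category},Type={self.vcf_type},Number={self.vcf_number}"
        )
        return (
            f"##{self.category}=<"
            f"ID={self.vcf_key},"
            f"Type={self.vcf_type},"
            f"Number={self.vcf_number},"
            f'Description="{description}"'
            ">"
        )


print(Field("INFO", "YP", "Float", ".").get_header())
```

Called with the values of the `YP` field from the example in the PR description, this reproduces that `##INFO` header line while keeping every source line short.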

@tomwhite added the auto-merge label (Auto merge label for mergify test flight) on Dec 5, 2022
@mergify bot merged commit 4ce7e55 into sgkit-dev:main on Dec 5, 2022
@tomwhite deleted the hypothesis-vcf branch on December 5, 2022 11:48
Labels
auto-merge Auto merge label for mergify test flight
Development

Successfully merging this pull request may close these issues.

Hypothesis VCF strategies
4 participants