Out-of-memory issues with large tsv-file #19

Closed
3 of 6 tasks
Hoeze opened this issue Nov 18, 2021 · 4 comments

Comments

@Hoeze

Hoeze commented Nov 18, 2021

Check list

  • I have read through the README
  • I have searched through the existing issues

Environment info

  • OS
    • Linux
    • Mac OS X
    • Windows
    • Others:

Version

csview 0.3.6

Problem / Steps to reproduce

First of all, thanks for this very nice and useful tool.
I'd like to view a large TSV-type file but it dies because of out-of-memory.

Steps to reproduce:

  • wget https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.1.vcf.bgz
  • zcat "gnomad.exomes.r2.1.1.sites.1.vcf.bgz" | grep -v "^##" | csview -t | less -S

Would it be possible to limit csview's memory usage and make it play nicely with less?

@wfxr
Owner

wfxr commented Nov 19, 2021

Hi @Hoeze, how big is the file? csview needs to load the entire file into memory to compute column alignment. For now, the easiest way to avoid OOM on very large files is to filter the data down to the part you care about and then feed that into csview:

preview first 10,000 lines:

zcat "gnomad.exomes.r2.1.1.sites.1.vcf.bgz" | grep -v "^##" | head -n 10000 | csview -t | less -S

preview lines between 100 and 200:

zcat "gnomad.exomes.r2.1.1.sites.1.vcf.bgz" | grep -v "^##" | sed -n '100,200p;201q' | csview -t | less -S

@Hoeze
Author

Hoeze commented Nov 19, 2021

I see, thanks for the explanation @wfxr.
Yes, the files can be up to 1 TB compressed.

One solution could be to calculate alignment on the first 10,000 lines.
Another useful feature would be to split cells that are too wide into multiple lines:

+------------------+
| too large column |
+------------------+
| some cell with v |
| ery long text    |
+------------------+
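
The wrapping idea above can be sketched in a few lines of Python. This is only an illustration of the proposal, not csview's code; the helper names `wrap_cell` and `render_row` are hypothetical, and widths are counted in characters with `len`, which ignores wide or combining Unicode characters.

```python
def wrap_cell(text, width):
    """Split one cell into chunks of at most `width` characters (hard wrap)."""
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    # An empty cell still occupies one (blank) physical line.
    return chunks or [""]


def render_row(cells, widths):
    """Render one logical row as one or more physical table lines."""
    wrapped = [wrap_cell(cell, w) for cell, w in zip(cells, widths)]
    height = max(len(chunks) for chunks in wrapped)
    lines = []
    for i in range(height):
        # Columns with fewer chunks than `height` are padded with blanks.
        parts = [
            (chunks[i] if i < len(chunks) else "").ljust(w)
            for chunks, w in zip(wrapped, widths)
        ]
        lines.append("| " + " | ".join(parts) + " |")
    return lines
```

Calling `render_row(["some cell with very long text"], [16])` yields the two body lines of the table drawn above. A real implementation would probably prefer breaking at word boundaries and would need display-width handling for non-ASCII text.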

@wfxr
Owner

wfxr commented Nov 22, 2021

@Hoeze Unfortunately, the prettytable-rs crate, which csview depends on, does not support streaming rendering. Your suggestions will be considered when csview implements its own rendering someday.
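
For readers unfamiliar with the term, here is a minimal Python sketch of what streaming rendering with sampled alignment could look like. It follows the suggestion earlier in this thread (compute widths from the first 10,000 rows) and is an assumption-laden illustration, not the behavior of csview or prettytable-rs. The function name `stream_table` and the `sample_size` parameter are hypothetical, and truncating overlong later cells is just one possible policy.

```python
import itertools


def stream_table(rows, sample_size=10_000):
    """Yield formatted lines, sizing columns from the first `sample_size` rows only."""
    rows = iter(rows)
    # Only the sample is buffered, so memory use does not grow with input size.
    sample = list(itertools.islice(rows, sample_size))
    if not sample:
        return
    ncols = max(len(row) for row in sample)
    widths = [0] * ncols
    for row in sample:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    def fmt(row):
        cells = []
        for i, w in enumerate(widths):
            cell = row[i] if i < len(row) else ""
            # Cells wider than the sampled width are truncated (an assumed policy).
            cells.append(cell[:w].ljust(w))
        return "| " + " | ".join(cells) + " |"

    # Emit the buffered sample first, then format the remaining rows one at a time.
    for row in itertools.chain(sample, rows):
        yield fmt(row)
```

Fed with an iterator such as `(line.rstrip("\n").split("\t") for line in sys.stdin)`, this produces output incrementally, which is what lets a pager like `less` start displaying before the whole input has been read. The trade-off is that alignment is only exact for the sampled rows.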

@wfxr wfxr closed this as completed Nov 22, 2021
@wfxr wfxr reopened this Jan 3, 2022
@wfxr
Owner

wfxr commented Jan 3, 2022

Hi @Hoeze, streaming rendering has been implemented in #27. Thanks for your contribution! You can try the latest release. Please let me know if there are any problems.
