Fix Vamb-jgi-filter memory issues #100

Closed


AroneyS
Collaborator

AroneyS commented Jul 3, 2023

  • Rule vamb_jgi_filter: Switch to polars streaming pipeline
  • Manually test using large coverage file

Fixes #99
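The point of a streaming pipeline is to avoid holding the whole coverage table in memory at once. As a minimal sketch of that principle, using Python's standard library rather than polars, the helper below reads a MetaBAT/JGI-style depth table (columns `contigName`, `contigLen`, `totalAvgDepth`, then per-sample columns) one row at a time and keeps contigs at or above a length cutoff. The helper name, the length-based filter and the `min_length=2000` default are assumptions for illustration; the actual criteria used by `vamb_jgi_filter` are not stated in this thread.

```python
import csv
import io
from typing import TextIO


def stream_filter_jgi(src: TextIO, dst: TextIO, min_length: int = 2000) -> int:
    """Copy the header plus every row with contigLen >= min_length from src to dst.

    Hypothetical helper, not Aviary's code: rows are processed one at a time,
    so memory use does not grow with the size of the coverage file.
    Returns the number of contig rows written (header excluded).
    """
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    header = next(reader)
    writer.writerow(header)
    length_idx = header.index("contigLen")  # JGI/MetaBAT depth-table column
    kept = 0
    for row in reader:
        if int(row[length_idx]) >= min_length:
            writer.writerow(row)
            kept += 1
    return kept


if __name__ == "__main__":
    # Tiny in-memory example; a real run would pass open file handles instead.
    sample = (
        "contigName\tcontigLen\ttotalAvgDepth\n"
        "c1\t1500\t3.2\n"
        "c2\t5000\t10.0\n"
    )
    out = io.StringIO()
    print(stream_filter_jgi(io.StringIO(sample), out))  # 1
    print(out.getvalue(), end="")
```

In polars the equivalent would typically be a lazy `scan_csv(...).filter(...)` query executed in streaming mode; the exact way to enable streaming has changed across polars releases, so it is worth checking against the version pinned in the environment.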

@AroneyS
Collaborator Author

AroneyS commented Jul 17, 2023

I still get errors. If I fix them manually, I then get more segmentation faults in the finalize_stats rule. Both rules are written as `run` directives, so they execute directly in the main Python environment.

@rhysnewell
Owner

Hey Sam,

I think it might be best to set resources on both of these rules so that they pull from the maximum memory supplied by the user, as is done in a few of the other rules:

    resources:
        mem_mb=int(config["max_memory"])*1024

That would prevent Snakemake from auto-generating memory requirements that end up being insufficient, while also making sure the user is in control of how much RAM can be used.
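As a quick check of the units in that snippet: assuming `max_memory` in the config is given in gigabytes (the `* 1024` conversion suggests this, but it is an assumption here), the value Snakemake would see as `mem_mb` works out as below. The helper name is hypothetical.

```python
def mem_mb_from_config(config: dict) -> int:
    """Convert config["max_memory"] (assumed to be in GB) into Snakemake's mem_mb (MB).

    Mirrors the expression int(config["max_memory"]) * 1024 from the rule snippet.
    int() accepts integers and integer strings; a string such as "7.5" raises ValueError.
    """
    return int(config["max_memory"]) * 1024


print(mem_mb_from_config({"max_memory": 250}))  # 256000
```

For local execution, `mem_mb` is used by Snakemake for scheduling (deciding which jobs can run together) rather than as a hard limit on the process.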

@AroneyS
Collaborator Author

AroneyS commented Jul 17, 2023

But doesn't that just affect how much memory Snakemake assumes the rule will use (see https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources)? So it would only matter if there is memory pressure from another rule? For finalize_stats, only singlem_pipe_reads tends to run at the same time in my testing.

@rhysnewell
Owner

It does, but it probably wouldn't hurt in this scenario. The default-resources are also derived from the input file size, and I don't know what happens if the default and requested resources differ by a large margin.

The only things that should kill a Snakemake rule are the scheduler or the system itself. It doesn't look like the scheduler is killing your job, but how large exactly are the coverm files being generated? Are they just too large for memory?

@AroneyS
Collaborator Author

AroneyS commented Jul 17, 2023

The smallest is 1.2 MB and still errored. The rule's resources are `mem_mb=1000, disk_mb=1000, tmpdir=/data1/tmp`, so 1 GB of RAM?
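Those numbers are consistent with Snakemake's built-in defaults: when `--default-resources` is used without explicit values, Snakemake 7 sets both `mem_mb` and `disk_mb` to `max(2*input.size_mb, 1000)`, so any input under roughly 500 MB gets the 1000 MB floor. A sketch of that arithmetic, assuming `input.size_mb` is total input bytes / 1024 / 1024 (the function name is hypothetical, and other Snakemake versions may use a different default expression):

```python
from typing import Sequence


def default_mem_mb(input_sizes_bytes: Sequence[int]) -> float:
    """Approximate Snakemake 7's default mem_mb expression: max(2*input.size_mb, 1000).

    input.size_mb is treated here as total input bytes / 1024 / 1024.
    """
    size_mb = sum(input_sizes_bytes) / 1024 / 1024
    return max(2.0 * size_mb, 1000.0)


coverage_bytes = int(1.2 * 1024 * 1024)  # a ~1.2 MB coverage file
print(default_mem_mb([coverage_bytes]))  # 1000.0 (the floor applies)
print(default_mem_mb([2 * 1024 ** 3]))   # 4096.0 for a 2 GiB input
```

In other words, the default is a scheduling estimate based on input size, not a measurement of how much memory the rule's Python code actually needs.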

@rhysnewell
Owner

Is this only happening in the co-assembly pipeline, or all the time?

Could you send me through a couple of example coverm.cov files that failed?

@AroneyS
Collaborator Author

AroneyS commented Jul 17, 2023

I've sent one in an email. This is happening with both single and co-assembled assemblies, though I'm running recover independently.

@AroneyS
Collaborator Author

AroneyS commented Jul 17, 2023

I'll try adding resources and moving the rules into Snakemake scripts.
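A sketch of what that change could look like, assuming a Snakefile layout similar to Aviary's other rules. The input/output names, the `envs/polars.yaml` environment file and the `scripts/vamb_jgi_filter.py` path are hypothetical placeholders. According to the Snakemake documentation, a rule's `conda:` environment is not used for `run:` blocks, so moving the code into a `script:` is what lets it run in a dedicated environment instead of the main one (when Snakemake is invoked with `--use-conda`).

```python
# Hypothetical Snakefile fragment: file names, env path and input/output keys are
# placeholders, not Aviary's actual rule definition.
rule vamb_jgi_filter:
    input:
        coverage="data/coverm.cov",
    output:
        filtered="data/vamb_bins/coverm.filt.cov",
    resources:
        # Same pattern as other rules: reserve the user-supplied maximum (GB -> MB)
        mem_mb=int(config["max_memory"]) * 1024,
    conda:
        # Unlike a run: block, a script: can execute inside its own conda env
        "envs/polars.yaml"
    script:
        "scripts/vamb_jgi_filter.py"
```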

@rhysnewell
Owner

rhysnewell commented Jul 17, 2023

Okay, the coverm file looks totally fine; it's just from a fairly small assembly. But I looked back at the original error and noticed this warning line:

    python3.10/site-packages/google/protobuf/internal/api_implementation.py:110: UserWarning: Selected implementation cpp is not available.

I think something might be wrong with the python/protobuf/snakemake installations in the root snakemake/aviary environment you are using. Maybe try uninstalling and reinstalling protobuf. Also check out this thread: protocolbuffers/protobuf#9180
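One possible cause of that warning is the `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION` environment variable selecting `cpp` when the C++ extension is not installed. The sketch below is a small diagnostic to run inside the affected environment. It relies on `google.protobuf.internal.api_implementation.Type()`, which lives in an internal protobuf module, so treat it as a debugging aid whose behaviour may change between releases; the function name is hypothetical.

```python
import os


def protobuf_diagnostics() -> dict:
    """Report which protobuf backend is requested and which one is actually active."""
    info = {
        # Env var that can force a backend ("cpp", "python", or "upb" in newer releases)
        "requested": os.environ.get("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"),
        "active": None,
        "version": None,
    }
    try:
        import google.protobuf as pb
        from google.protobuf.internal import api_implementation
    except ImportError:
        return info  # protobuf is not installed in this interpreter
    info["version"] = pb.__version__
    info["active"] = api_implementation.Type()  # internal API; may change between releases
    return info


if __name__ == "__main__":
    print(protobuf_diagnostics())
```

If `requested` is `cpp` but `active` comes back as something else, the environment variable (or the installed protobuf build) is a likely place to look before rebuilding the whole environment.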

Note that this is the original error, so it occurred with pandas rather than polars. Is there a similar error when you use polars?

@AroneyS
Collaborator Author

AroneyS commented Jul 18, 2023

I am finding it difficult to reproduce the error in a simplified run (running just the vamb_jgi_filter rule): there is no error with pandas or polars, with or without the resources directive. But it still errors in the main Aviary v0.6.0 env, so it looks like this whole thing is an environment issue.
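Since the error only shows up in the main Aviary v0.6.0 environment, one way to narrow it down is to compare package versions between that environment and the one used for the simplified run. A sketch using `importlib.metadata` from the standard library (Python 3.8+); the helper names are hypothetical and the package list is an assumption drawn from the libraries mentioned in this thread.

```python
from importlib import metadata


def package_versions(names):
    """Return {name: version or None} for the given distributions in this interpreter."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def diff_versions(a, b):
    """Return {name: (version_in_a, version_in_b)} for packages whose versions differ."""
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


if __name__ == "__main__":
    import sys

    print(sys.version.split()[0])
    print(package_versions(["snakemake", "protobuf", "pandas", "polars"]))
```

Running the `__main__` block in each environment and feeding the two dictionaries to `diff_versions` shows which of these packages differ, which can point at the one to update instead of recreating everything.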

@rhysnewell
Owner

Okay cool. Refer to my previous comment for a potential source of the issue in the environment. You should be able to update either python or protobuf without nuking the entire environment, but if you can't, you'll have to remake it.

@AroneyS AroneyS marked this pull request as ready for review July 21, 2023 00:44
@AroneyS AroneyS closed this Jul 21, 2023
@AroneyS AroneyS deleted the fix-vamb-filter-memory-issues branch July 21, 2023 00:45
Development

Successfully merging this pull request may close these issues.

vamb_jgi_filter segmentation faults
2 participants