Document datasets #3060
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3060      +/-   ##
==========================================
+ Coverage   75.87%   76.35%   +0.47%
==========================================
  Files         110      110
  Lines       12536    12545       +9
==========================================
+ Hits         9512     9579      +67
+ Misses       3024     2966      -58
Especially important is if its .X is logarithmized, normalized, and/or filtered
Are we documenting here which of these have counts vs log vs normalized?
continue to not run the internet tests in CI. A side effect of this PR is that our tests get less flaky by not running the flaky ebi_expression_atlas doctest
What was stopping this before?
run internet tests in CI
add caching to CI
make sure the dataset functions don’t download already-downloaded data
validate cached data instead
run the internet tests (with caching) in CI
Why wouldn't we want to download the data every time? I could see it slowing things down a bit, but not by much. We should at least have some sort of cache timeout that forces a re-download every so often, to ensure that the download path still works.
yeah, I’d like to do that! It’s really not bad:
❯ hatch test --internet-tests scanpy/tests/test_datasets.py::test_doc_shape scanpy/datasets/
[...]
❯ du -a .pytest_cache/d/scanpy-data/ | reject directories files apparent
╭───┬──────────────────────────────────────────────────────────────────────┬──────────╮
│ # │ path │ physical │
├───┼──────────────────────────────────────────────────────────────────────┼──────────┤
│ 0 │ /home/phil/Dev/Python/Single Cell/scanpy/.pytest_cache/d/scanpy-data │ 199.6 MB │
╰───┴──────────────────────────────────────────────────────────────────────┴──────────╯
❯ du -a .pytest_cache/d/scanpy-data/* | reject directories files apparent
╭───┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬──────────╮
│ # │ path │ physical │
├───┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤
│ 0 │ /home/phil/Dev/Python/Single Cell/scanpy/.pytest_cache/d/scanpy-data/E-MTAB-4888 │ 71.1 MB │
│ 1 │ /home/phil/Dev/Python/Single Cell/scanpy/.pytest_cache/d/scanpy-data/Targeted_Visium_Human_Glioblastoma_Pan_Cancer │ 19.7 MB │
│ 2 │ /home/phil/Dev/Python/Single Cell/scanpy/.pytest_cache/d/scanpy-data/V1_Breast_Cancer_Block_A_Section_1 │ 48.3 MB │
│ 3 │ /home/phil/Dev/Python/Single Cell/scanpy/.pytest_cache/d/scanpy-data/burczynski06 │ 16.3 MB │
│ 4 │ /home/phil/Dev/Python/Single Cell/scanpy/.pytest_cache/d/scanpy-data/moignard15 │ 3.4 MB │
│ 5 │ /home/phil/Dev/Python/Single Cell/scanpy/.pytest_cache/d/scanpy-data/paul15 │ 10.3 MB │
│ 6 │ /home/phil/Dev/Python/Single Cell/scanpy/.pytest_cache/d/scanpy-data/pbmc3k_processed.h5ad │ 24.7 MB │
│ 7 │ /home/phil/Dev/Python/Single Cell/scanpy/.pytest_cache/d/scanpy-data/pbmc3k_raw.h5ad │ 5.9 MB │
╰───┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────╯
someone implementing the caching, so nothing much really
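The caching steps discussed above (skip data that is already downloaded, validate the cached copy, and force a re-download once in a while) could be sketched roughly as follows. This is a minimal illustration and not scanpy's implementation: `cached_fetch`, its `download` callback, and the `sha256` and `max_age_s` parameters are hypothetical names chosen for this example.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Optional


def cached_fetch(
    path: Path,
    download: Callable[[Path], None],
    *,
    sha256: Optional[str] = None,
    max_age_s: Optional[float] = None,
) -> bool:
    """Make sure `path` exists and is usable; return True if `download` ran.

    Hypothetical helper: `download` is any callable that writes the file
    to the path it is given.
    """
    if path.is_file():
        # Cache timeout: treat files older than `max_age_s` as stale.
        age = time.time() - path.stat().st_mtime
        fresh = max_age_s is None or age <= max_age_s
        # Validation: compare the cached file against a known checksum.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        valid = sha256 is None or digest == sha256
        if fresh and valid:
            return False  # cache hit, nothing downloaded
    path.parent.mkdir(parents=True, exist_ok=True)
    download(path)
    return True
```

With `max_age_s` left unset the file is downloaded once and reused; setting it makes CI exercise the download path again after the timeout, which addresses the concern about never testing the download itself.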
Failing test seems to be coming from #3068
No blockers
@pytest.mark.internet
def test_visium_datasets_dir_change(tmp_path: Path):
    """Test that changing the dataset dir doesn't break reading."""
    with pytest.warns(UserWarning, match=r"Variable names are not unique"):
Why use the r prefix here (and elsewhere)? It seems unnecessary.
To clarify that match accepts regexes. If the r wasn’t there, it would be easy to accidentally add a backslash escape that’s intended for re and have Python do things with it instead.
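A small illustration of the point about the r prefix, using only the standard library. The `\b` word-boundary pattern is an example chosen here and is not taken from the PR; as far as I know, `match=` in `pytest.warns` is applied with `re.search`, so the same reasoning should carry over.

```python
import re

message = "Variable names are not unique"

# Raw string: `re` receives the two characters `\` and `b`,
# which it interprets as a word boundary.
raw_pattern = r"\bnot\b"

# Plain string: Python converts `\b` into a backspace character (U+0008)
# before `re` ever sees the pattern.
plain_pattern = "\bnot\b"

raw_match = re.search(raw_pattern, message)
plain_match = re.search(plain_pattern, message)

print(len(raw_pattern), len(plain_pattern))  # 7 5
print(raw_match is not None)                 # True
print(plain_match is not None)               # False
```

For a pattern without backslashes, such as "Variable names are not unique", both spellings are identical strings, so the prefix there only signals intent and guards against later edits.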
Co-authored-by: Philipp A <flying-sheep@web.de>
TODO:
Optional: ebi_expression_atlas doctest