Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

push: Assume 0 bytes pushed if git-annex does not provide bytesize #7049

Merged
merged 2 commits into from Sep 21, 2022

Conversation

yarikoptic
Copy link
Member

Otherwise we can get KeyError and underlying code can provide records without bytesize. Reported in https://neurostars.org/t/datalad-push-causes-keyerror-bytesize-keyerror/23570 with

datalad==0.15.4
git==2.35.0
git-annex==8.20210903

first I failed to reproduce with the following script following user report with --fast
#!/bin/bash
# https://neurostars.org/t/datalad-push-causes-keyerror-bytesize-keyerror/23570
set -eu

cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
set -eu

datalad create -f .

git annex addurl --fast --file sample3.csv https://filesamples.com/samples/document/csv/sample3.csv
datalad save sample3.csv

git remote add --fetch gin git@gin.g-node.org:/yarikoptic/test-push.git
datalad -l debug push --to gin
but then reproduced nicely with --relaxed
#!/bin/bash
# https://neurostars.org/t/datalad-push-causes-keyerror-bytesize-keyerror/23570
set -eu

cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
set -eu

datalad create -f .

git annex addurl --relaxed --file sample3.csv https://filesamples.com/samples/document/csv/sample3.csv
datalad save sample3.csv

ls -l 

git remote add --fetch gin git@gin.g-node.org:/yarikoptic/test-push.git
datalad push --to gin --force=all

and verified that with this PR we do not crash. Given that git-annex unlikely to even provide progress report if not enough time passed, I think I would not bother crafting a unittest which would trigger this condition atm.

Otherwise we can get KeyError and underlying code can
provide records without bytesize
@yarikoptic yarikoptic added cmd-push semver-patch Increment the patch version when merged labels Sep 20, 2022
yarikoptic added a commit to yarikoptic/datalad that referenced this pull request Sep 20, 2022
Inspired by a fix we needed for push:
datalad#7049

I looked into status to realize that we do not even consider those files
for which git-annex would not report bytesize.  This commit makes them
considered in count but would assume 0 bytesize for "recorded" etc.

It would not address the issue that we would still not
consider/interrogate sizes for those files even if their content
available:

	$> datalad status --annex=all
	2 annex'd files (0.0 B/0.0 B present/total size)

	$> ls -lL *
	-r-------- 1 yoh yoh   0 Sep 20 15:52 123.dat
	-r-------- 1 yoh yoh 723 Sep 20 15:57 sample3.csv

where sample3.csv is the relaxed one.
@yarikoptic yarikoptic added the CHANGELOG-missing When a PR's description does not contain a changelog item, yet. label Sep 20, 2022
@github-actions github-actions bot removed the CHANGELOG-missing When a PR's description does not contain a changelog item, yet. label Sep 20, 2022
@codecov
Copy link

codecov bot commented Sep 20, 2022

Codecov Report

Base: 45.45% // Head: 88.36% // Increases project coverage by +42.90% 🎉

Coverage data is based on head (69786a6) compared to base (3c220d6).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##            maint    #7049       +/-   ##
===========================================
+ Coverage   45.45%   88.36%   +42.90%     
===========================================
  Files         354      354               
  Lines       58559    46336    -12223     
  Branches     6615        0     -6615     
===========================================
+ Hits        26618    40943    +14325     
+ Misses      31779     5393    -26386     
+ Partials      162        0      -162     
Impacted Files Coverage Δ
datalad/core/distributed/push.py 91.22% <100.00%> (+30.16%) ⬆️
datalad/cmdline/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/interface/run_procedure.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/cmdline/helpers.py 0.00% <0.00%> (-78.27%) ⬇️
datalad/tests/test_tests_utils.py 0.00% <0.00%> (-69.13%) ⬇️
datalad/support/nda_.py 0.00% <0.00%> (-54.55%) ⬇️
datalad/tests/utils.py 20.50% <0.00%> (-36.46%) ⬇️
datalad/distribution/uninstall.py 63.26% <0.00%> (-28.58%) ⬇️
datalad/api.py 75.60% <0.00%> (-17.73%) ⬇️
datalad/__main__.py 39.65% <0.00%> (-13.80%) ⬇️
... and 277 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@yarikoptic
Copy link
Member Author

uff all travis fails seems to be due to unrelated

____________________________ test_run_datalad_help _____________________________
[gw0] linux -- Python 3.8.6 /tmp/dl-miniconda-q9fqns6s/bin/python
    @assert_cwd_unchanged
    def test_run_datalad_help():
        out, err = check_run_and_get_output("datalad --help")
        ok_startswith(out, "Usage: ")
        # There could be a warning from coverage that no data was collected, should be benign
        lines = [l for l in err.split(os.linesep) if ('no-data-collected' not in l) and l]
>       eq_(lines, [])
../datalad/tests/test_installed.py:47: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
first = ["/tmp/dl-miniconda-q9fqns6s/lib/python3.8/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 ...ported version!", '  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn\'t match a supported "']
second = [], msg = None
    def assert_equal(first, second, msg=None):
        if msg is None:
>           assert first == second
E           assert ['/tmp/dl-min... supported "'] == []
E             Left contains 2 more items, first extra item: "/tmp/dl-miniconda-q9fqns6s/lib/python3.8/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.8) or chardet (5.0.0)/charset_normalizer (2.0.4) doesn't match a supported version!"
E             Full diff:
E               [
E             -  ,
E             +  '/tmp/dl-miniconda-q9fqns6s/lib/python3.8/site-packages/requests/__init__.py:102: '
E             +  'RequestsDependencyWarning: urllib3 (1.26.8) or chardet '
E             +  "(5.0.0)/charset_normalizer (2.0.4) doesn't match a supported version!",
E             +  '  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) '
E             +  'doesn\'t match a supported "',
E               ]
../datalad/tests/utils_pytest.py:107: AssertionError

so may be it was too early for 0f27cd5 in #7029 ?

@yarikoptic
Copy link
Member Author

no objections, the fix is trivial, let's proceed and retry releasing again!

@yarikoptic yarikoptic added the release Create a release when this pr is merged label Sep 21, 2022
@yarikoptic
Copy link
Member Author

@jwodder -- note that here as well we didn't have github actions ran and changelog entry was pushed by github-actions bot although I thought we should have switched to that yarikoptic-gitmate token/account... will need to check/keep an eye...

@yarikoptic yarikoptic merged commit 5128733 into datalad:maint Sep 21, 2022
@yarikoptic-gitmate
Copy link
Collaborator

PR released in 0.17.6

@yarikoptic yarikoptic deleted the bf-push-bytesize branch October 14, 2022 12:59
yarikoptic added a commit to yarikoptic/datalad that referenced this pull request Jun 9, 2023
Inspired by a fix we needed for push:
datalad#7049

I looked into status to realize that we do not even consider those files
for which git-annex would not report bytesize.  This commit makes them
considered in count but would assume 0 bytesize for "recorded" etc.

It would not address the issue that we would still not
consider/interrogate sizes for those files even if their content
available:

	$> datalad status --annex=all
	2 annex'd files (0.0 B/0.0 B present/total size)

	$> ls -lL *
	-r-------- 1 yoh yoh   0 Sep 20 15:52 123.dat
	-r-------- 1 yoh yoh 723 Sep 20 15:57 sample3.csv

where sample3.csv is the relaxed one.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cmd-push release Create a release when this pr is merged semver-patch Increment the patch version when merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants