No canonical way to add an existing RIA remote; users following the manual/error message suggestions are led into broken situations #7543

Open
mlell opened this issue Dec 21, 2023 · 0 comments

What is the problem?

Suppose you have a RIA store and clone a dataset A from it. Now clone a dataset B from A. Dataset B does not have the RIA remote enabled, and there is no way to enable it without manually setting various git config options.

Users trying to do this will probably try one of two approaches, both of which lead to a broken situation with multiple annex UUIDs assigned to the same folder. That is a recipe for data loss, because it fools git-annex's numcopies safeguard:

  • Manually using git remote add ... and git annex enableremote. This leads to a git-annex message telling the user that the RIA remote is probably broken and that they should reinitialize it. If they follow this advice, they ruin the format of the RIA store:
Remote origin: This repository is not initialized for use by git-annex, but /test/datalad-ria-addremote/ria/20d/da232-002f-4e09-90b6-62924a994136/annex/objects/ exists, which indicates this repository was used by git-annex before, and may have lost its annex.uuid and annex.version configs. Either set back missing configs, or run git-annex init to initialize with a new uuid.
  • Running datalad create-sibling-ria --existing=reconfigure. This does not just add the RIA remote but also assigns a new annex UUID to the RIA store, which is dangerous. The RIA store then has two annex UUIDs in its config! I do not see any use for that. At the very least, I think this option should be called --reckless reconfigure or similar, as is done for dangerous commands elsewhere.
  • The only approach that works is the manual one: do not follow the error messages, and instead call git config several times to recreate the desired configuration. Only users with in-depth knowledge of git-annex and the RIA format can do this, i.e. only DataLad developers, and certainly not the user base I think DataLad targets (scientists who are not familiar with the DataLad codebase).
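
As a rough illustration of the manual approach, the sketch below re-adds the git remote of an existing RIA store and marks it so that git-annex does not probe it. It rests on an assumption that has not been verified against this setup: that the "ignoreannex" option named under Problem 1 in the script below is git-annex's remote.<name>.annex-ignore setting. The function name add_ria_git_remote is hypothetical. Other settings that show up in b-add-manual.diff may still need to be restored by hand.

```shell
#!/usr/bin/env bash
# Sketch only, not a DataLad-provided command.
# add_ria_git_remote is a hypothetical helper name.
add_ria_git_remote() {
    local name="$1" store_repo="$2"
    # Re-add the plain git remote pointing at the dataset inside the RIA store
    git remote add "$name" "$store_repo"
    # Assumption: this is the setting that a proper RIA clone has and the
    # manual setup lacks. It tells git-annex not to treat the remote as an annex.
    git config "remote.${name}.annex-ignore" true
}

# Hypothetical usage, run inside the clone (variables as in the script below):
#   add_ria_git_remote origin "$(realpath "../ria/$riadir")"
#   git annex enableremote ria-storage
```

Setting the option before calling git annex enableremote is meant to keep git-annex from inspecting the store and suggesting a re-initialization.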

What steps will reproduce the problem?

Follow this script:

#!/usr/bin/env bash
# bash is required for the ${id::3} substring syntax used below
set -ue

# Example repository
datalad create a
cd a
echo "hello" > file
datalad save
id="$(git config -f .datalad/config datalad.dataset.id)"
riadir="${id::3}/${id:3}"

# Set up and sync the RIA store
datalad create-sibling-ria --new-store-ok -s ria ria+file://$PWD/../ria
datalad push --to ria
cd ..
datalad clone ria+file://$PWD/ria#$id b
datalad clone ria+file://$PWD/ria#$id c

# Dump the configurations for comparison
git -C a config --get-regexp remote | sort > a-create-ria.config
git -C b config --get-regexp remote | sort > b-clone-ria.config
git -C ria/$riadir config --list --local | sort > ria1.config

# Now simulate a clone that does not have the RIA remotes configured.
# Either we removed them, or we cloned from another copy of the repo
# and not directly from the RIA store
cd b
git remote remove origin
git remote remove ria-storage


# First attempt: manually add
git remote add origin "$(realpath ../ria/$riadir)"
git annex enableremote ria-storage
git config --get-regexp remote | sort > ../b-add-manual.config

# The resulting config is erroneous: compare it with that of a proper RIA clone
cd ..
diff -ud b-clone-ria.config b-add-manual.config > b-add-manual.diff || true

# Second attempt: try --existing=reconfigure
cd c
git remote remove origin
git remote remove ria-storage

datalad create-sibling-ria --existing=reconfigure -s ria-new ria+file://$PWD/../ria

cd ..
git -C c config --get-regexp remote | sort > c-after-reconfig.config
diff -ud b-clone-ria.config c-after-reconfig.config > c-after-reconfig.diff || true
pwd
git -C ria/$riadir config --list --local | sort > ria-after-reconfig.config
diff -ud ria1.config ria-after-reconfig.config > ria-after-reconfig.diff || true



####### Problem 1. Manual config does not set ignoreannex config option of RIA git remote,
####### causing a git-annex warning for many datalad operations. But following that warning
####### would change the remote annex UUID which is bad.
cat b-add-manual.diff

####### Problem 2. create-sibling-ria --existing=reconfigure leads
####### to duplicate annex IDs on the RIA remote, potentially creating
####### chaos
cat ria-after-reconfig.diff
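
To make Problem 2 easier to spot than by reading the diff, a small check along the following lines could count the annex.uuid entries in a config dump. It is a sketch that assumes the dump has the key=value format written by git config --list, as ria-after-reconfig.config does in the script above. The function name count_annex_uuids is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: count annex.uuid entries in a file produced by
# `git config --list --local`. count_annex_uuids is a hypothetical name.
count_annex_uuids() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are none, so "|| true" keeps scripts running under "set -e".
    grep -c '^annex\.uuid=' "$1" || true
}

# Hypothetical usage after running the reproduction script:
#   count_annex_uuids ria-after-reconfig.config
```

A result greater than 1 would indicate the duplicate annex UUID situation described above.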

DataLad information

No response

Additional context

No response

Have you had any success using DataLad before?

No response

@mlell changed the title from "No canonical way to add an existing RIA remote, users reading the manual/help pages are led into broken situations" to "No canonical way to add an existing RIA remote, users following the manual/error message suggestions are led into broken situations" on Dec 21, 2023