Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autocreate Zenodo DOIs and create shared access links #5879

Merged
merged 41 commits into from
Jun 3, 2024

Conversation

kathy-t
Copy link
Contributor

@kathy-t kathy-t commented May 6, 2024

Description

Updated PR Description Zenodo swagger PR: https://github.com/dockstore/swagger-java-zenodo-client/pull/21

This PR has been updated with the feedback from the story time discussion.

This PR requires the creation of a Dockstore-owned personal access token on Zenodo as well as a Dockstore community on Zenodo. This personal access token is used to automatically create DOIs.

Automatic DOI Creation Behaviour

  • DOIs are automatically created for valid published tags for Workflow-based entries on refresh, GitHub App release, and on publish. During automatic DOI creation, we skip creating a snapshot.
  • The DOI is only created the first time it is invoked. If tags are recreated, we do not create a new DOI. This can be an enhancement if we want to go down that route.
    • I originally didn't do it on publish but after manual testing, the user experience to automatically create DOIs for a new workflows that aren't originally published felt clunky. For example, a user creates a new GitHub app workflow that isn't published so no DOIs are created. They publish it. They must now generate push events for DOIs to be automatically created.
  • There is no option to opt-out.
  • A new Doi entity is added so that we can track multiple DOIs from multiple sources (user, GitHub, Dockstore). This means that users can have both user-generated and Dockstore-generated DOIs, along with GitHub DOIs.
    • I have deprecated the old DOI columns instead of removing them so that there is no down time during the release. There is no trigger to keep the new and old columns in sync. Is it worth it to add a trigger?
  • Note that we don't allow the deletion of entries that were previously published. Since the auto DOI creation only occurs for published tags, this means that DOIs in our database won't be orphaned.
  • Tags with only Dockstore-generated DOIs can be deleted since they are not frozen...this can result in an orphaned DOI.

Dockstore Zenodo Community

  • New DOIs, whether user or Dockstore generated, are added to the Dockstore Zenodo community. We do not add old records to the community retro-actively.

Shared Access Links

  • Three endpoints are added to manage shared access links that allow users to edit the Dockstore-generated DOIs. The access link returns an ID and a token. When this token is appended to the Zenodo DOI URL, the user is able to edit the record if they are logged into Zenodo. I did not store this token in the database because I feared that we could accidentally expose it. Only the link ID is stored and when requested, we can retrieve the token from Zenodo. This editing feature will likely not be used often.
  • The link has edit permissions and no expiration date. There is an endpoint to delete the access link.
  • The access link ID is stored in the concept DOI object because this link is applicable to all versions of the DOI

Hoverfly Tests

  • Created a few Zenodo tests using Hoverfly. These tests aren't super complex due to the limitations of mocked responses, but it's something 🙂
  • Created a new testing profile for tests that use Hoverfly because it needs the localtruststore in CircleCI

Follow-up

Draft PR Description Currently in draft mode to get feedback as I figure out some Hoverfly test weirdness in CircleCI.

Zenodo swagger PR: dockstore/swagger-java-zenodo-client#21

This PR adds a enableAutomaticDoiCreation boolean to Entry that is set to true by default.

When set to true, it automatically creates DOIs using Dockstore's own Zenodo personal access token, specified in the web.yml. The auto-creation occurs during refresh and GitHub App release for valid, published tags and involves snapshotting the version prior to creating the DOI. This boolean can be updated using a new endpoint, allowing users to opt-out in the UI. For GitHub app entries, they will need to specify it in their .dockstore.yml.

  • Should this feature include hosted entries?

This boolean can be updated using a new endpoint, allowing users to opt-out in the UI. For GitHub app entries, they will need to specify it in their .dockstore.yml.

This PR also adds an endpoint to create a shared access link with edit permissions so entry owners can modify their DOIs if they want. There is an option to set an expiration date, but I set it to never expire. The idea is that in the UI, the entry owner can click a button that creates this shared access link. After the first time, the UI can display this link somewhere so the user can modify their DOI. UI PR will be in follow-up ticket.

  • Should we allow the user to delete this access link?

This PR also adds DOIs to the Dockstore community, using the community ID specified in the web.yml. This is for all DOIs, whether Dockstore-created or end-user-created.

Some questions and thoughts
There is a potential exception if an end user has already requested a DOI for an entry version and our Dockstore user tries to create a DOI for a different version because our Dockstore user does not have permissions to modify some one else's DOI. Due to this, there is a check to see if the entry has 0 DOIs or if all DOIs are Dockstore-owned to ensure that no exception occurs. I think there's a bigger conversation here about how we want to handle this situation so this PR takes the simplest route for now, i.e., not allowing the mixing of end-user-created and Dockstore-created DOIs. Below are some thoughts about it, but I think we should have a follow-up ticket to decide what to do.

  • Do we solve this with communities? Using a one-time endpoint, we could try to add all DOIs to the Dockstore community. This would give the Dockstore Zenodo user permission to edit the DOI and create versions for DOIs that are owned by end users.
  • Do we allow multiple users to create DOIs and we store multiple DOIs? (similar ticket https://ucsc-cgl.atlassian.net/browse/DOCK-1922)
  • How about the reverse where the Dockstore Zenodo user creates the DOI first and the end user wants to create their own DOI?
    • Shared access links would allow them to create new versions of the DOI, but they would not "own" the DOI

Review Instructions
Note that the UI portion of this feature is not implemented yet, so these review instructions will require looking at the API responses for the DOIs.

To find the relevant DOI fields for the review steps below:

  • Use the getWorkflow endpoint on Swagger, specifying your workflow ID of interest for the id and versions as your include string.
  • To see the concept DOIs, view the workflow's conceptDois map in the response.
  • To see the version DOIs, look for the version of interest in the workflowVersions array and view the version's dois map.

On QA:

  1. First, verify that you can request a DOI for a workflow version the normal way, i.e. link your Zenodo account and request a DOI. Verify that the concept DOI and version DOI has the initiator USER (since it is user-generated)
  2. For the same workflow, unpublish and publish it. The act of publishing will automatically create DOIs for valid, published tags. Verify that there are now two concept DOIs and two version DOIs for the version in step 1. The new DOIs should have DOCKSTORE as the initiator.
  3. For the same workflow, if it is a manually registered workflow, create a new tag that is valid then refresh the workflow. If it is a GitHub App workflow, create a new tag and wait for the changes to be registered by Dockstore. Verify that the new tag has a DOI with the initiator as DOCKSTORE.
  4. For the same workflow, request an edit link for it using this endpoint. Confirm that the call was successful and that the response includes a token.
  5. Call this endpoint for the same workflow to get the existing DOI edit link. Confirm that the call was successful and that the response includes a token.
  6. Delete the edit link using this endpoint. Confirm that it was deleted by calling the getDOIEditLink endpoint (it should fail because there is no link for it).

Issue
https://ucsc-cgl.atlassian.net/browse/SEAB-5929
https://ucsc-cgl.atlassian.net/browse/SEAB-6371

Security and Privacy

If there are any concerns that require extra attention from the security team, highlight them here and check the box when complete.

  • Security and Privacy assessed

e.g. Does this change...

  • Any user data we collect, or data location?
  • Access control, authentication or authorization?
  • Encryption features?

Please make sure that you've checked the following before submitting your pull request. Thanks!

  • Check that you pass the basic style checks and unit tests by running mvn clean install
  • Ensure that the PR targets the correct branch. Check the milestone or fix version of the ticket.
  • Follow the existing JPA patterns for queries, using named parameters, to avoid SQL injection
  • If you are changing dependencies, check the Snyk status check or the dashboard to ensure you are not introducing new high/critical vulnerabilities
  • Assume that inputs to the API can be malicious, and sanitize and/or check for Denial of Service type values, e.g., massive sizes
  • Do not serve user-uploaded binary images through the Dockstore API
  • Ensure that endpoints that only allow privileged access enforce that with the @RolesAllowed annotation
  • Do not create cookies, although this may change in the future
  • If this PR is for a user-facing feature, create and link a documentation ticket for this feature (usually in the same milestone as the linked issue). Style points if you create a documentation PR directly and link that instead.

@kathy-t kathy-t self-assigned this May 6, 2024
@@ -87,33 +261,25 @@ public static ZenodoDoiResult registerZenodoDOI(ApiClient zenodoClient, Workflow
returnDeposit = depositApi.createDeposit(deposit);
depositionID = returnDeposit.getId();
depositMetadata = returnDeposit.getMetadata();
// Set the attribute that will reserve a DOI before publishing
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this because the DOI is automatically reserved the first time a deposit is created eblondel/zen4R#37 (comment)

// Retrieve the DOI so we can use it to create a Dockstore alias
// to the workflow; we will add that alias as a Zenodo related identifier
// TODO clean this after https://github.com/dockstore/swagger-java-zenodo-client/pull/20/files is merged and released
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tackled this TODO comment due to updating the zenodo swagger version

@denis-yuen
Copy link
Member

Due to this, there is a check to see if the entry has 0 DOIs or if all DOIs are Dockstore-owned to ensure that no exception occurs. I think there's a bigger conversation here about how we want to handle this situation so this PR takes the simplest route for now, i.e., not allowing the mixing of end-user-created and Dockstore-created DOIs.

I think this makes sense, basically allowing users to go from automatic to user-created but not vice versa

Do we allow multiple users to create DOIs and we store multiple DOIs?

A good observation, I guess we basically had this problem in some way already.

How about the reverse where the Dockstore Zenodo user creates the DOI first and the end user wants to create their own DOI?

I think we should allow this, there's a good chance the user may not know about the automatic doi feature until after it issues one, so we need to allow for this direction (automatic to end-user created)

Copy link
Member

@denis-yuen denis-yuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will double-back, but starting the conversation

.circleci/config.yml Show resolved Hide resolved
@@ -30,6 +30,8 @@ public class Service12 extends AbstractYamlService implements Workflowish {

private DescriptorLanguageSubclass subclass;

private boolean enableAutomaticDoiCreation = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick, since this is the default, maybe this should be flipped to disableAutomaticDoiCreation or enableEndUserDoiCreation

@@ -108,7 +108,7 @@
<dependency>
<groupId>io.dockstore</groupId>
<artifactId>swagger-java-zenodo-client</artifactId>
<version>2.0.2</version>
<version>2.0.3-zeta.0</version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just setting reminder to fix before merging

@@ -78,6 +78,14 @@ public class VersionMetadata {
@Enumerated(EnumType.STRING)
protected Version.DOIStatus doiStatus;

@Column()
@Schema(description = "The token for the Zenodo access link to edit the Dockstore-owned DOI")
protected String doiEditAccessLinkToken;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a relationship between these two columns that can be enforced through a check constraint

@svonworl
Copy link
Contributor

svonworl commented May 7, 2024

This PR also adds an endpoint to create a shared access link with edit permissions so entry owners can modify their DOIs if they want. There is an option to set an expiration date, but I set it to never expire. The idea is that in the UI, the entry owner can click a button that creates this shared access link. After the first time, the UI can display this link somewhere so the user can modify their DOI. UI PR will be in follow-up ticket.

  • Should we allow the user to delete this access link?

Whatever the editing mechanism, if it "leaks" (as the above link sounds like it could), it'd be great if there was a way to neutralize it.

@svonworl
Copy link
Contributor

svonworl commented May 7, 2024

Regarding the concept of auto-snapshotting before generating the DOI, some thoughts:

Will automatically snapshotting all tags mess up anyone's release flow? Once the version is snapshotted, it's frozen. Our assumption is that a github tag is a permanent fixture, but does everyone use it that way? For example, imagine a repo where the stable tag is attached to the latest stable release and moves around. The first stable tag would be snapshotted, and then Dockstore would reject subsequent retaggings. Do people do stuff like that, and if so, should we support it?

If the answer to the last question above is yes, we might need to figure out what to do if a user accidentally/inadvertently freezes a version before they opt-out.

@@ -667,6 +680,10 @@ private boolean createWorkflowsAndVersionsFromDockstoreYml(List<? extends Workfl
Workflow workflow = createOrGetWorkflow(workflowType, repository, user, workflowName, wf, gitHubSourceCodeRepo);
WorkflowVersion version = addDockstoreYmlVersionToWorkflow(repository, gitReference, dockstoreYml, gitHubSourceCodeRepo, workflow, latestTagAsDefault, yamlAuthors);

// Automatically register a DOI if it's enabled and the workflow is published and the version is a valid tag
if (canAutomaticallyCreateDockstoreOwnedDoi(workflow, version)) {
automaticallyRegisterDockstoreOwnedZenodoDOI(workflow, version, user, this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new DOI creation code appears to run in the same database transaction that will create the version. So, if the transaction is rolled back or fails, a DOI will be created that references a version that doesn't exist in the Dockstore database. Recommend that the DOI creation code is executed after successful version creation.

@svonworl
Copy link
Contributor

svonworl commented May 7, 2024

Regarding multiple DOIs, we've been discussing at least three sources of DOIs:

  • github-generated
  • user-generated Zenodo
  • dockstore-generated Zenodo

If you count "generate our own" custom DOIs, four sources. :)

Would be great if we could track all of them. Time for a new "DOI" database entity?

protected String doiURL;

@ManyToMany(fetch = FetchType.EAGER, cascade = CascadeType.ALL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate on https://github.com/dockstore/dockstore/pull/5879/files/6333c924f9561b0443c2134ca3993a16c58adaa1#r1612139110 with a concrete example maybe? Want to make sure I understand what we're trying to prevent...

WorkflowVersion version = addDockstoreYmlVersionToWorkflow(repository, gitReference, dockstoreYml, gitHubSourceCodeRepo, workflow, latestTagAsDefault, yamlAuthors);

workflowVersionId.set(version.getId());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of https://ucsc-cgl.atlassian.net/browse/SEAB-6449, I can modify TransactionHelper to include a method that allows us to run a method in a transaction that returns a result. Then, we don't need to track the ids with the AtomicLongs if we opt not to.

@@ -304,7 +307,6 @@ public void setDirtyBit(boolean dirtyBit) {
void updateByUser(final Version<?> version) {
this.getVersionMetadata().hidden = version.isHidden();
this.setDoiStatus(version.getDoiStatus());
this.setDoiURL(version.getDoiURL());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code allows user to modify the DOI URL. I don't believe this is used and we probably shouldn't allow the user to edit DOI data so I removed this line

Copy link
Member

@denis-yuen denis-yuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good as a first past, will be good to iterate/extend

@@ -398,6 +374,10 @@ private Workflow refreshWorkflow(User user, Long workflowId, Optional<String> ve
// Update file formats in each version and then the entry
FileFormatHelper.updateFileFormats(existingWorkflow, newWorkflow.getWorkflowVersions(), fileFormatDAO, true);

for (WorkflowVersion workflowVersion: existingWorkflow.getWorkflowVersions()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little doubtful about the utility of automatically creating a DOI for all versions of a workflow without limit.
Thinking things like a hard limit of the most recent 30 versions, or 20 tags (since only tags should be eligible), etc.
Partly a performance thing, can be a follow-up ticket. Think manually testing with a fork of the gatk repo for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it so it only looks at the 10 most recent tags

@@ -807,6 +757,7 @@ public Workflow publish(@ApiParam(hidden = true) @Parameter(hidden = true, name
checkNotArchived(workflow);

Workflow publishedWorkflow = publishWorkflow(workflow, request.getPublish(), userDAO.findById(user.getId()));
publishedWorkflow.getWorkflowVersions().forEach(version -> automaticallyRegisterDockstoreOwnedZenodoDOI(workflow, version, user, this));
Hibernate.initialize(publishedWorkflow.getWorkflowVersions());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto with above comment

assertTrue(workflow.getConceptDois().isEmpty());
assertTrue(tagVersion.getDois().isEmpty());

// Publish workflow, should automatically create DOIs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment elsewhere, can be follow-up ticket

registerZenodoDOI(zenodoClient, workflow, workflowVersion, workflowOwner, authenticatedResourceInterface,
DoiCreator.DOCKSTORE);
} catch (CustomWebApplicationException e) {
LOG.error("Could not automatically register DOI for {}", workflowNameAndVersion(workflow, workflowVersion), e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to fail silently for the end user and we probably won't notice it either. I wonder if it's worth capturing this somehow?

  • A CloudWatch Alarm based on this message?
  • Add it as a warning to validations, if possible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this could be a followup ticket; otherwise we won't know when/if things start failing (unless there's already another way we would notice).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pom.xml Outdated Show resolved Hide resolved

public enum DoiType {
CONCEPT,
VERSION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, "concept" and "version" are Zenodo concepts. The DOI standard has no such abstractions, you can point a DOI at whatever you want. No change necessary here.

@MapKeyEnumerated(EnumType.STRING)
@Size(max = MAX_NUMBER_OF_DOI_CREATORS)
@Schema(description = "The Digital Object Identifier (DOI) representing all of the versions of your workflow")
private Map<DoiCreator, Doi> conceptDois = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a maintainability/extensibility point of view, I'd probably make this a Set, because the above Map implementation forecloses supporting a future scheme where an entry/version could have multiple DOIs from the same creator (could happen someday, maybe an "UNKNOWN" creator). But, it's not wired into the database representation, which wouldn't make it super hard to change. I'm registering this thought here more to document it, less to press for a change...

Copy link

sonarcloud bot commented May 31, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
79.4% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

@@ -839,6 +844,11 @@ public static void checkCanRegisterDoi(Workflow workflow, WorkflowVersion workfl
throw new CustomWebApplicationException(String.format("Could not generate DOI for %s. %s", workflowNameAndVersion, FROZEN_VERSION_REQUIRED), HttpStatus.SC_BAD_REQUEST);
}

if (workflowVersion.isHidden()) {
LOG.error("{}: Could not generate DOI for {}. {}", user.getUsername(), workflowNameAndVersion, UNHIDDEN_VERSION_REQUIRED);
throw new CustomWebApplicationException(String.format("Could not generate DOI for %s. %s", workflowNameAndVersion, UNHIDDEN_VERSION_REQUIRED), HttpStatus.SC_BAD_REQUEST);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add the explanation for the user message though we'll have it in the logs. UI requires a snapshot anyway, so not big deal

registerZenodoDOI(zenodoClient, workflow, workflowVersion, workflowOwner, authenticatedResourceInterface,
DoiCreator.DOCKSTORE);
} catch (CustomWebApplicationException e) {
LOG.error("Could not automatically register DOI for {}", workflowNameAndVersion(workflow, workflowVersion), e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this could be a followup ticket; otherwise we won't know when/if things start failing (unless there's already another way we would notice).

@kathy-t kathy-t merged commit e6eea04 into develop Jun 3, 2024
17 of 19 checks passed
@kathy-t kathy-t deleted the feature/zenodo-prototype branch June 3, 2024 15:11
@kathy-t kathy-t mentioned this pull request Jun 3, 2024
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants