Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display DOIs in Archived Histories #18134

Merged
merged 15 commits into from
May 16, 2024
Merged

Conversation

davelopez
Copy link
Contributor

@davelopez davelopez commented May 14, 2024

When a history has been archived preserving the data in an external RDM repository like Zenodo, we can display its Document Object Identifier on it.

image

This PR also allows FilesSource plugins to serialize extra properties defined in the serialize_extra_props class variable. This adds flexibility to customize the information we want to serialize for the file source. In this particular case, we serialize to the extra fields the repository URL associated with the RDM file source so it can be used to fetch the DOI information. We decided to replace the extra field with directly serializing the url for simplicity even if not all file sources might use it.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    • Setup an RDM file source like in Add Zenodo integration #18022
    • Archive a history creating an export record in that file source.
    • [Important] make sure the record is published and has a DOI (a fake one using the Zenodo Sandbox is fine but links won't resolve).
    • The DOI should appear in your Archived Histories.

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@dannon
Copy link
Member

dannon commented May 14, 2024

Nice, great idea!

@davelopez
Copy link
Contributor Author

Thanks! Obviously not my idea 😆

@davelopez davelopez marked this pull request as ready for review May 14, 2024 16:25
@github-actions github-actions bot added this to the 24.1 milestone May 14, 2024
@hexylena hexylena added the highlight Included in user-facing release notes at the top label May 15, 2024
Copy link
Member

@jmchilton jmchilton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be to just add URL to the serialized object instead of the extra handling. It seems fine to me that 95% of file sources wouldn't use it and I think I like that it would be documented in the schema. I get the handling it this way though - I understand the impulse and we've done things like this all over Galaxy many times.

@davelopez
Copy link
Contributor Author

My preference would be to just add URL to the serialized object instead of the extra handling.

That was my initial approach 3e375ca#diff-644fe18408381b506c4abc27ffe07d42b8a8dc55a5a833f705662f65fe60c6d6L382 😅
But mypy complained about it because rval was a TypedDict and it doesn't like arbitrary strings coming from serialize_extra_props as keys. I could have ignored mypy or better yet overridden the to_dict to handle the custom serialization. But then we still need to keep model_config = ConfigDict(extra="allow") and this causes the client to ignore type checking for BrowsableFilesSourcePlugin and this has bitten us already in other models as any attribute will be valid and typos or wrong attributes will pass.

But I'm happy to revert that if the extra field is too ugly. In my mind, it feels like a container object that must be handled carefully in a case-by-case scenario.

@jmchilton
Copy link
Member

I think what I was suggesting was just adding url to the TypedDict as a NotRequired string. And then rather than the serialize_extra_props option we just do something like:

--- a/lib/galaxy/files/sources/__init__.py
+++ b/lib/galaxy/files/sources/__init__.py
@@ -98,6 +98,7 @@ class FilesSourceProperties(TypedDict):
     uri_root: NotRequired[str]
     type: NotRequired[str]
     browsable: NotRequired[bool]
+    url: NotRequired[str]
 
 
 @dataclass
@@ -319,6 +320,9 @@ class BaseFilesSource(FilesSource):
     def get_writable(self) -> bool:
         return self.writable
 
+    def get_url(self) -> Optional[str]:
+        return None
+
     def user_has_access(self, user_context: "OptionalUserContext") -> bool:
         if user_context is None and self.user_context_required:
             return False
@@ -381,6 +385,9 @@ class BaseFilesSource(FilesSource):
             "disable_templating": self.disable_templating,
             "scheme": self.get_scheme(),
         }
+        url = self.get_url()
+        if url is not None:
+            rval["url"] = url
         if self.get_browsable():
             rval["uri_root"] = self.get_uri_root()
         if for_serialization:

But what URL means is going to change a lot from plugin to plugin if any others ever implement it. So I am totally fine with your approach. This just feels a little more structured.

@davelopez
Copy link
Contributor Author

Ahh I see your point now. This is certainly simpler, the whole serialize_extra_props was a bit of a yagni in case other plugins wanted to expose other options.

Thank you for explaining in detail! I will remove the extra and do it like this!

@davelopez davelopez marked this pull request as draft May 15, 2024 15:40
For the sake of simplicity
@davelopez davelopez marked this pull request as ready for review May 15, 2024 16:03
@davelopez davelopez requested a review from jmchilton May 15, 2024 17:46
@davelopez
Copy link
Contributor Author

davelopez commented May 16, 2024

The integration test failure is somewhat "persistent" but unrelated:

FAILED test/integration/objectstore/test_swift_objectstore.py::test_tools[composite_output] - galaxy.tool_util.verify.interactor.JobOutputsError: Output output: Composite file (Sequences) of History item cbde53fe322113de different than expected, difference:
History item cbde53fe322113de different than expected, difference (using diff):
...

@dannon dannon merged commit d0f15e8 into galaxyproject:dev May 16, 2024
55 of 56 checks passed
@davelopez davelopez deleted the display_doi branch May 17, 2024 07:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/UI-UX highlight Included in user-facing release notes at the top kind/enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants