Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using uri_file_path to properly check for file existence #3972

Merged
merged 3 commits into from
Dec 22, 2021

Conversation

douglowe
Copy link
Contributor

@douglowe douglowe commented Dec 21, 2021

Correcting logical checks inside upload_directory and
upload_file, to use schema_salad.ref_resolver.uri_file_path
for resolving file:/// paths. This ensures that encoded
special characters within the path are properly decoded again.

Will fix #3973

Changelog Entry

To be copied to the draft changelog by merger:

  • CWL input uploading to the file store: decode file paths that were previously URL encoded

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passes tests.
  • Make sure the PR has been reviewed since its last modification. If not, review it.
  • Merge with the Github "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

Correcting logical checks inside upload_directory and
upload_file, to use schema_salad.ref_resolver.uri_file_path
for resolving file:/// paths. This ensures that encoded
special characters within the path are properly decoded again.
@mr-c
Copy link
Contributor

mr-c commented Dec 21, 2021

Thanks @douglowe ! Is there an example path that this fixes?

@douglowe
Copy link
Contributor Author

Thanks @douglowe ! Is there an example path that this fixes?

My problem path was an output directory we were naming after a mutation (A:Gln2Cys) - the file string for this became: "file:///Users/douglowe/cwl-mutations/A%3AGln2Cys".

I could zip up an example script and input molecular structure to send over, if that would help?

@mr-c
Copy link
Contributor

mr-c commented Dec 21, 2021

file:///Users/douglowe/cwl-mutations/A%3AGln2Cys

Thanks! I think we can write a fast test that uses A:Gln2Cys as an input name to test this PR

@mr-c
Copy link
Contributor

mr-c commented Dec 21, 2021

I've pushed your branch to our repo as https://github.com/DataBiosphere/toil/tree/issues/3973-decode-uri-filepaths to enable CI testing while I write a targeted test

@mr-c
Copy link
Contributor

mr-c commented Dec 22, 2021

@douglowe Are you running with --relax-path-checks?

@douglowe
Copy link
Contributor Author

@douglowe Are you running with --relax-path-checks?

No - I was running without that option. I've tried enabling it just now, and it failed the same test in upload_file as before.

From the help text it looks like this option only deals with allowing spaces and hashes in paths. I've not run any tests to see if my change helps with those or not.

@mr-c
Copy link
Contributor

mr-c commented Dec 22, 2021

Even with this change, I think cwltool would need common-workflow-language/cwltool#1579

@mr-c
Copy link
Contributor

mr-c commented Dec 22, 2021

Here is my test setup; does it cover the same use case for you? common-workflow-language/cwl-v1.2#144

@douglowe
Copy link
Contributor Author

Even with this change, I think cwltool would need common-workflow-language/cwltool#1579

Ahh - I had not checked --relax-path-checks for cwltool.

I had actually tested this workflow with cwltool - and found that it coped fine with these paths (which made me look more into the Toil code, to see where it might be doing things differently).

@mr-c
Copy link
Contributor

mr-c commented Dec 22, 2021

I had actually tested this workflow with cwltool - and found that it coped fine with these paths (which made me look more into the Toil code, to see where it might be doing things differently).

Can you share your workflow with me?

@douglowe
Copy link
Contributor Author

I had actually tested this workflow with cwltool - and found that it coped fine with these paths (which made me look more into the Toil code, to see where it might be doing things differently).

Can you share your workflow with me?

Yep - you can get a copy here: https://github.com/douglowe/cwl-mutation-example

@mr-c
Copy link
Contributor

mr-c commented Dec 22, 2021

Copy link
Member

@DailyDreaming DailyDreaming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@DailyDreaming
Copy link
Member

Merging since all of the CWL tests have passed and completed and this only touches the CWL code. Thank you @douglowe .

@DailyDreaming DailyDreaming merged commit bdb4884 into DataBiosphere:master Dec 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CWL: some inputs paths get percent encoded but not decoded
3 participants