Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow additional pathname characters in strict mode #1579

Merged
merged 3 commits into from Jan 4, 2022

Conversation

mr-c
Copy link
Member

@mr-c mr-c commented Dec 22, 2021

specifically , : @ ] ^
(comma, colon, at-sign, end-square-bracket, and caret)

In response to common-workflow-language/cwl-v1.2#144

@mr-c mr-c requested a review from tetron December 22, 2021 10:07
@codecov
Copy link

codecov bot commented Dec 22, 2021

Codecov Report

Merging #1579 (95b2bb6) into main (cf00d97) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1579      +/-   ##
==========================================
- Coverage   65.92%   65.90%   -0.02%     
==========================================
  Files          93       93              
  Lines       16447    16447              
  Branches     4358     4358              
==========================================
- Hits        10842    10840       -2     
- Misses       4453     4455       +2     
  Partials     1152     1152              
Impacted Files Coverage Δ
cwltool/command_line_tool.py 76.78% <100.00%> (ø)
cwltool/job.py 76.37% <0.00%> (-0.40%) ⬇️
command_line_tool.py 66.43% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf00d97...95b2bb6. Read the comment docs.

specifically , : @ ] ^
(comma, colon, at-sign, end-square-bracket, and caret)
Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The colon may give some trouble to users that either copy the files to Windows, or that run cwltool on Windows linux subsystem. I think in Windows : is reserved for file system drive names, and cannot (or could not last I checked) be used in file names, unless we escape it.

I had a quick test manually, and spaces are not allowed ✔️ , commas accepted ✔️ , colon accepted ✔️ . Accents in words used in Portuguese (ã, é, etc) passed the regex too ✔️ .Then tested with a couple Japanese characters ✔️ , and found an issue with common symbols used in Japanese, but not sure if important.

The \w in the regex appears to match words in Japanese in any of its three writing systems (hiragana, katakana, kanji), not sure about other Chinese characters, but shouldn't be a problem ✔️

Other Japanese symbols that are valid in file names, and are not spaces, are not supported, such as the parentheses "「」" or the Japanese version for comma/dot "、。".

These symbols can be used in normal file names in Windows/Linux, but I think we cannot guarantee every symbol of every language will be supported, so that should be fine I think.

Thanks! :neckbeard:

# POSIX metacharacters
# | & ; < > ( ) $ ` " ' <space> <tab> <newline>
# (In accordance with
# https://www.commonwl.org/v1.0/CommandLineTool.html#File under "path"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing )?

@mr-c
Copy link
Member Author

mr-c commented Dec 23, 2021

@kinow Thanks for the thoughtful review! I also tested using https://regex101.com/r/J8mA2g/1

The colon may give some trouble to users that either copy the files to Windows, or that run cwltool on Windows linux subsystem. I think in Windows : is reserved for file system drive names, and cannot (or could not last I checked) be used in file names, unless we escape it.

Sure, but for cwltool we only support WSL 2 + Docker, so we only have to care about POSIX. Users can rename results before copying them to other systems if there is an issue.

Other Japanese symbols that are valid in file names, and are not spaces, are not supported, such as the parentheses "「」" or the Japanese version for comma/dot "、。".

These symbols can be used in normal file names in Windows/Linux, but I think we cannot guarantee every symbol of every language will be supported, so that should be fine I think.

Good to know. If we switch to https://pypi.org/project/regex/ then we can easily include all unicode characters classified as symbols and explicitly exclude Separator and Other characters

https://en.wikipedia.org/wiki/Unicode_character_property
https://github.com/mrabarnett/mrab-regex#unicode-codepoint-properties-including-scripts-and-blocks

@mr-c mr-c enabled auto-merge (squash) January 4, 2022 17:52
@mr-c mr-c merged commit a1e3449 into main Jan 4, 2022
@mr-c mr-c deleted the path_checking_strict_allow_more_chars branch January 4, 2022 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants