allow additional pathname characters in strict mode #1579
Codecov Report
@@            Coverage Diff             @@
##             main    #1579      +/-   ##
==========================================
- Coverage   65.92%   65.90%   -0.02%
==========================================
  Files          93       93
  Lines       16447    16447
  Branches     4358     4358
==========================================
- Hits        10842    10840       -2
- Misses       4453     4455       +2
  Partials     1152     1152
specifically , : @ ] ^ (comma, colon, at-sign, end-square-bracket, and caret)
b0d9977 to b8de9fb
The colon may cause trouble for users who either copy the files to Windows or run cwltool on the Windows Subsystem for Linux. I think `:` is reserved on Windows for file system drive names and cannot (or could not, last I checked) be used in file names unless we escape it.
I ran a quick manual test: spaces are not allowed ✔️, commas are accepted ✔️, and colons are accepted ✔️. Accented characters used in Portuguese (ã, é, etc.) passed the regex too ✔️. I then tested a couple of Japanese characters ✔️ and found an issue with common symbols used in Japanese, though I am not sure it is important.
The `\w` in the regex appears to match Japanese words in any of the three writing systems (hiragana, katakana, kanji). I am not sure about other Chinese characters, but that shouldn't be a problem ✔️
Other Japanese symbols that are valid in file names and are not spaces are not supported, such as the corner brackets "「」" or the Japanese versions of the comma and full stop, "、。".
These symbols can be used in normal file names on Windows and Linux, but I think we cannot guarantee that every symbol of every language will be supported, so that should be fine.
Thanks!
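The manual checks described above can be reproduced with a short script. This is only a sketch: `STRICT_NAME_RE` and `is_accepted` are hypothetical names, and the pattern is an assumed approximation (`\w` plus `. + - , : @ ] ^`), not necessarily the exact expression in `cwltool/command_line_tool.py`.

```python
import re

# Hypothetical strict-mode pattern, for illustration only; the real
# expression in cwltool may differ. In Python 3, \w on str patterns
# matches Unicode word characters, not just ASCII.
STRICT_NAME_RE = re.compile(r"[\w.+\-,:@\]^]+")


def is_accepted(name: str) -> bool:
    """Return True if every character of ``name`` is in the accepted set."""
    return STRICT_NAME_RE.fullmatch(name) is not None


samples = [
    "my file.txt",                 # contains a space
    "a,b:c@d]^e.txt",              # the newly allowed characters
    "a\u00e7\u00e3o.txt",          # "ação" with precomposed accents
    "ひらがなカタカナ漢字.txt",      # hiragana, katakana, kanji
    "「メモ」.txt",                 # Japanese corner brackets
    "一、二。.txt",                 # Japanese comma and full stop
]

for name in samples:
    print(f"{name!r}: {is_accepted(name)}")
```

Under these assumptions the bracket and Japanese punctuation samples are rejected because `\w` covers letters, digits, and underscore only, while the hiragana, katakana, and kanji samples are accepted.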
cwltool/command_line_tool.py
# POSIX metacharacters
# | & ; < > ( ) $ ` " ' <space> <tab> <newline>
# (In accordance with
# https://www.commonwl.org/v1.0/CommandLineTool.html#File under "path"
missing `)`?
@kinow Thanks for the thoughtful review! I also tested using https://regex101.com/r/J8mA2g/1
Sure, but for
Good to know. If we switch to https://pypi.org/project/regex/ then we can easily include all Unicode characters classified as https://en.wikipedia.org/wiki/Unicode_character_property
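To make the Unicode-property idea concrete, here is a rough stdlib-only sketch. It does not use the `regex` package (which supports `\p{...}` property classes); instead it looks up each character's general category with `unicodedata`. The helper `is_punctuation` is a hypothetical name, and treating every category starting with `P` as acceptable is an assumption for illustration, not a proposal from this thread.

```python
import unicodedata


def is_punctuation(ch: str) -> bool:
    """True if the character's Unicode general category starts with 'P'."""
    return unicodedata.category(ch).startswith("P")


# Print the general category of the Japanese symbols mentioned above
# and of the ASCII characters this PR adds.
for ch in "「」、。,:@]^":
    print(ch, unicodedata.category(ch), is_punctuation(ch))
```

Note that a punctuation class alone would not cover every character in this PR: the caret is classified as a symbol rather than punctuation, so it would still need to be listed explicitly.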
In response to common-workflow-language/cwl-v1.2#144