Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use cases for value_of/values_of #751

Closed
BurntSushi opened this issue Nov 14, 2016 · 17 comments · Fixed by #2677
Closed

use cases for value_of/values_of #751

BurntSushi opened this issue Nov 14, 2016 · 17 comments · Fixed by #2677
Labels
A-docs Area: documentation, including docs.rs, readme, examples, etc... A-parsing Area: Parser's logic and needs it changed somehow. E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Experience needed to fix: Medium / intermediate M-breaking-change Meta: Implementing or merging this will introduce a breaking change.
Milestone

Comments

@BurntSushi
Copy link
Contributor

BurntSushi commented Nov 14, 2016

Sorry to keep opening issues, and this may indeed be more of a question because I've missed something, but what are the intended use cases of value_of/values_of? In particular, I notice that this is part of its contract:

This method will panic! if the value contains invalid UTF-8 code points.

(Pedant: I think this should say "invalid UTF-8." There is no such thing as a "UTF-8 codepoint.")

Since the value is typed by the user, this means that the user can cause your program to panic if you use value_of/values_of when the user gives a value that isn't valid UTF-8. In my view, if an end user sees a panic, then it ought to be considered a bug. If my interpretation is right, then that means I should never use value_of/values_of, right?

I do see one possible use case: if a caller disables the AllowInvalidUtf8 setting, then I believe clap will return a nice error if it detects invalid UTF-8, and therefore, value_of/values_of will never panic. However, according to the docs, AllowInvalidUtf8 is enabled by default, so this seems like a potential footgun.

(This is a good instance of the pot calling the kettle black because docopt.rs, for example, can't handle invalid UTF-8 at all! :-) So clap is already leagues better in this regard.)

@kbknapp
Copy link
Member

kbknapp commented Nov 14, 2016

No worries, so far all the issues you've written have been superb and are pushing great things! Keep them coming!

I think you're absolutely correct in that reasoning, and I actually agree the StrictUtf8 should be the default, where the users actually opt-in to allowing invalid UTF-8, at which point they're also conscious that they should be using the value[s]_of_os instead.

I can't remember exactly, but I believe my reasoning behind making invalid UTF-8 allowed by default was....whelp no I can't remember. I know I had a reason though! Maybe it was prior to adding the value[s]_of_os? Oh well...

Changing that default is technically a breaking change, although like Rust I do believe some breaking changes should be allowed if they fix [logic] bugs, prevent security holes, or otherwise correct other unsound behavior. It should be simple to at least find github users who have used the value[s]_of_os and know to what extent this change would cause. I know that's not the whole sample, but should at least give an idea.

Thoughts?

Changing the docs is a quick fix too 😉

@kbknapp kbknapp added A-docs Area: documentation, including docs.rs, readme, examples, etc... A-parsing Area: Parser's logic and needs it changed somehow. D: easy M-breaking-change Meta: Implementing or merging this will introduce a breaking change. labels Nov 14, 2016
@kbknapp
Copy link
Member

kbknapp commented Nov 14, 2016

Ok, so on Github there are actually more than I thought using the value[s]_of_os methods. Probably more than I'd feel comfortable switching the default. It's roughly ~30 uses, which is totally possible to contact all of them and warn them of the change. What scares me is that there's ~30 on github, but I have no idea how many in all the other places with no way to warn them. This may just have to wait for 3.x 😢

@BurntSushi
Copy link
Contributor Author

BurntSushi commented Nov 14, 2016

I think it's perfectly fine to wait for 3.x FWIW. It does feel like a bit of a sneaky breaking change to try to squeeze it in into a minor bump. I also think making StrictUtf8 the default is a good solution given the current API too.

@BurntSushi
Copy link
Contributor Author

BurntSushi commented Nov 14, 2016

One thing to consider though is this fact: if I'm writing a CLI tool on Unix and my tool accepts file paths as arguments, then I probably always want to allow invalid UTF-8 because I'm otherwise preventing my tool from working on all file paths. This is kind of a bigger picture question to answer that's probably out-of-scope for a small footgun, because it means putting UTF-8 handling front and center, which obviously sacrifices a bit of ergonomics. It's a hard line to walk and probably requiring them to explicitly use the os methods is a fine answer.

@kbknapp
Copy link
Member

kbknapp commented Nov 14, 2016

Exactly, I wouldn't make a change like that unless I could reasonably sure it either breaks no code or I was able to contact the majority (if not all) users of said feature to get their approval first. And in this case I don't think that's reasonable.

I have been thinking of a 3.x for a while now, so hopefully the wait won't be too long! :)

@kbknapp kbknapp added W: 3.x and removed W: 2.x labels Nov 14, 2016
@kbknapp kbknapp added this to the 3.0 milestone Dec 27, 2016
@kbknapp kbknapp modified the milestones: 3.0, v3-alpha1 Feb 2, 2018
@kbknapp kbknapp added E-medium Call for participation: Experience needed to fix: Medium / intermediate E-easy Call for participation: Experience needed to fix: Easy / not much labels Jul 22, 2018
@kornelski
Copy link
Contributor

kornelski commented Jun 3, 2019

I've tried to fix panics in Cargo (w/clap 2.x), but Cargo frequently uses such pattern (and its match equivalent):

if matches.value_of("arg") == Some("value")

Changing it to value_of_os/value_of_lossy prevents direct comparison, and requires mapping/unwrapping which makes it longer and less readable.

Without breaking changes, it could be improved by adding matches.value_equals(arg_name, &str) which could simply return false for non-UTF-8 args (since they can never match a str).

Perhaps it could be fixed at the source, by changing definition of arguments? Currently app.arg() doesn't define the data type it expects.

How about having app.arg_os() or app.arg_path() that would define the arg is/isn't a path, and force use of value_of_os/value_of_pathbuf? And app.arg() (or app.arg_str()?) would reject non-UTF-8 arg.

@pksunkara pksunkara modified the milestones: v3-alpha.2, v3.0 Feb 1, 2020
elasticdog added a commit to elasticdog/taxonate that referenced this issue Apr 28, 2020
This properly implements the behavior I was expecting where:

- the list of paths will not contain duplicate entries

- it will append entries from STDIN if explicitly requested (I couldn't
  figure out the implicit logic to work in an intuitive way)

- it will default to just the current working directory if no other
  paths have been specified

I believe that paths are not guaranteed to be valid UTF-8, so the proper
way to handle that is to use OsString values instead of String values.
Note that I've also dropped the external bstr crate dependency, since
I'm not 100% confident on how to convert between the two, or if it will
even be beneficial/necessary. I can revist that in the future once the
real parsing functionality is implemented.

See:
- clap-rs/clap#751
@CreepySkeleton
Copy link
Contributor

It does seem like the vast majority of non-utf8 offenders in CLI args are filesystem paths, and otherwise there's little sense to be on guard against invalid utf-8, correct? Maybe introducing some sort of path-specific getter (as described in #1723 for instance) is the way to mitigating the issue?

How about having app.arg_os() or app.arg_path() that would define the arg is/isn't a path, and force use of value_of_os/value_of_pathbuf? And app.arg() (or app.arg_str()?) would reject non-UTF-8 arg.

That might not be a bad idea, but it's not really feasible implementation-wise. That's just very far image from how the current implementation works; we'd have to rewrite a good part of clap for it. Maybe in next major version 🤷

@CreepySkeleton CreepySkeleton added C: matches E-help-wanted Call for participation: Help is requested to fix this issue. and removed D: easy E-easy Call for participation: Experience needed to fix: Easy / not much labels Jul 7, 2020
@ldm0
Copy link
Member

ldm0 commented Mar 7, 2021

Opinion: Using value_of do introduce possiblility of panic, but the current api design is the same as std::env::args() and std::env::args_os. So removing value_of() is unneeded. And I don't think adding new convenient functions is good for api surface's cleaness.

@epage
Copy link
Member

epage commented Jul 19, 2021

clap_derive is using value_of behind the scenes. We should either switch away from value_of for the derive or fix this because this is a case where the user has no idea this is happening and don't have a choice to move away from it.

@BurntSushi
Copy link
Contributor Author

Is it planned to fix this for clap 3? It looks like the current API as of clap 3 beta 2 still panics on invalid UTF-8: https://docs.rs/clap/3.0.0-beta.2/clap/struct.ArgMatches.html#method.value_of

@epage
Copy link
Member

epage commented Jul 21, 2021

I assume so since its tagged with the 3.0 milestone. At minimum, I'd like to see hidden callers use a non-panicing API (derive, value_of_t, etc), whether that involves fixing this or changing the call sites.

Options

  • StrictUtf8 being the default
    • Downside: This can't discriminate on how the arg will be used, so breaking all uses of accepting PathBuf / OsString. Enabling support for that would then re-enable the panic
  • ArgSettings for invalid/strict like the similar AppSettings
    • Looks like we look at StrictUtf8 after we know which are we are processing, so this looks feasible
    • clap_derive can auto-set this
    • We can make this the default with a comment on value_of_os that they should unset the flag.
    • The question is how to have this interact with the AppSetting, if we keep it.
      • If someone sets StrictUtf8 at the App level, there isn't a way to opt-out at the Arg level. We'd need a tri-state (yes, no, default).
      • We could keep both StrictUtf8 and AllowInvalidUtf8 to mimic the tri-state. What to do if someone sets both on an Arg or on the App? Unsetting a default would be different from all other settings (instead of unset, you set something different)
      • Its also used for external subcommands, so we'd need something for that.
      • This is similar to the "How about having app.arg_os() or app.arg_path()" idea before mentioned but is tying into the way clap is currently doing things
  • Change value_ofs signature to return an error

I lean towards

  • ArgSetting::StrictUtf8 being added and being the default
    • value_of_os would document this
    • clap_derive would "do the right thing"
  • AppSetting::StrictUtf8 would be changed to AppSetting::StrictUtf8ExternalSubcommand and would be on by default just to keep the documentation simple
    • value_of_os would document this
    • clap_derive would "do the right thing"

@pksunkara
Copy link
Member

There's a little bit of holistic clean up that needs to be done regarding these API methods and how they are used in derive. This is related to a few others issues too. Definitely need to be done before 3.0

@epage
Copy link
Member

epage commented Jul 26, 2021

@pksunkara how does my proposal sound for a path to move forward?

@epage
Copy link
Member

epage commented Jul 28, 2021

Based on another thread (#2627), By making StrictUtf8 the default, we'd need to invert its name again back to AllowInvalidUtf8

@pksunkara
Copy link
Member

Yup. So, we basically need to change the StrictUtf8 setting in favor of AllowInvalidUtf8.

@epage
Copy link
Member

epage commented Aug 9, 2021

@pksunkara just want to make sure you read what my comment was associated with; its not just inverting it but also moving it from an app setting to an arg setting

@pksunkara
Copy link
Member

Missed that, but still sounds good. This change might be a bit big for @hosseind88 though in #2623

epage added a commit to epage/clap that referenced this issue Aug 10, 2021
Before, validating UTF-8 was all-or-nothing and would cause a `panic` if
someone used the right API with non-UTF-8 input.

Now, all arguments are validated for UTF-8, unless opted-out.  This
ensures a non-panicing path forward at the cost of people using the
builder API that previously did `value_of_os` need to now set this flag.

Fixes clap-rs#751
epage added a commit to epage/clap that referenced this issue Aug 11, 2021
Before, validating UTF-8 was all-or-nothing and would cause a `panic` if
someone used the right API with non-UTF-8 input.

Now, all arguments are validated for UTF-8, unless opted-out.  This
ensures a non-panicing path forward at the cost of people using the
builder API that previously did `value_of_os` need to now set this flag.

Fixes clap-rs#751
epage added a commit to epage/clap that referenced this issue Aug 11, 2021
Before, validating UTF-8 was all-or-nothing and would cause a `panic` if
someone used the right API with non-UTF-8 input.

Now, all arguments are validated for UTF-8, unless opted-out.  This
ensures a non-panicing path forward at the cost of people using the
builder API that previously did `value_of_os` need to now set this flag.

Fixes clap-rs#751
epage added a commit to epage/clap that referenced this issue Aug 11, 2021
Before, validating UTF-8 was all-or-nothing and would cause a `panic` if
someone used the right API with non-UTF-8 input.

Now, all arguments are validated for UTF-8, unless opted-out.  This
ensures a non-panicing path forward at the cost of people using the
builder API that previously did `value_of_os` need to now set this flag.

Fixes clap-rs#751
epage added a commit to epage/clap that referenced this issue Aug 17, 2021
Before, validating UTF-8 was all-or-nothing and would cause a `panic` if
someone used the right API with non-UTF-8 input.

Now, all arguments are validated for UTF-8, unless opted-out.  This
ensures a non-panicing path forward at the cost of people using the
builder API that previously did `value_of_os` need to now set this flag.

Fixes clap-rs#751
epage added a commit to epage/clap that referenced this issue Aug 18, 2021
Before, validating UTF-8 was all-or-nothing and would cause a `panic` if
someone used the right API with non-UTF-8 input.

Now, all arguments are validated for UTF-8, unless opted-out.  This
ensures a non-panicing path forward at the cost of people using the
builder API that previously did `value_of_os` need to now set this flag.

Fixes clap-rs#751
epage added a commit to epage/clap that referenced this issue Aug 18, 2021
Before, validating UTF-8 was all-or-nothing and would cause a `panic` if
someone used the right API with non-UTF-8 input.

Now, all arguments are validated for UTF-8, unless opted-out.  This
ensures a non-panicing path forward at the cost of people using the
builder API that previously did `value_of_os` need to now set this flag.

Fixes clap-rs#751
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: documentation, including docs.rs, readme, examples, etc... A-parsing Area: Parser's logic and needs it changed somehow. E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Experience needed to fix: Medium / intermediate M-breaking-change Meta: Implementing or merging this will introduce a breaking change.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants