Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

number_of_values with and without Option is weird/bad #364

Closed
cecton opened this issue Mar 25, 2020 · 5 comments
Closed

number_of_values with and without Option is weird/bad #364

cecton opened this issue Mar 25, 2020 · 5 comments

Comments

@cecton
Copy link
Contributor

cecton commented Mar 25, 2020

Before you read: if someone has a real practical use case where they need this to be consistent somehow, it's best to re-open this ticket and explain the use case (or open a new one but link this one).

There are 2 cases here that I find counter intuitive. The first one is when you use number_of_values with a Vec: it allows the user to not provide the parameter at all despite the field is not an Option. The second one is when you use number_of_values with an Option<Vec>: it allows the parameter to be passed with no value at all.

According to the doc of clap, number_of_values:

Specifies how many values are required to satisfy this argument. For example, if you had a -f argument where you wanted exactly 3 'files' you would set .number_of_values(3), and this argument wouldn't be satisfied unless the user provided 3 and only 3 values.

Therefore:

  1. accepting 0 value like in Option<Vec> makes no sense. It shouldn't behave that way.
  2. it doesn't tell if the argument is required or not but in structopt, fields are required unless they are Option (it's the opposite in clap)

Related to #349

Case 1

Code sample

#[derive(StructOpt, Debug)]
struct Cli {
    #[structopt(long, number_of_values = 2)]
    values: Vec<String>,
}

Expected behavior

values is required and is valid ONLY if 2 values are provided. Example:

command --values 1 2 

Or:

command --values 1 --values 2

Actual behavior

values is optional and is valid IF 2 values are provided OR the parameter is not used at all. Example:

command 

Or:

command --values 1 2

Or:

command --values 1 --values 2

Case 2

Code sample

#[derive(StructOpt, Debug)]
struct Cli {
    #[structopt(long, number_of_values = 2)]
    values: Option<Vec<String>>,
}

Expected behavior

values is optional and is valid IF 2 values are provided OR the parameter is not used at all. Example:

command

Or:

command --values 1 2

Or:

command --values 1 --values 2

Actual behavior

values is optional and is valid IF 2 values are provided OR 0 values are provided OR the parameter is not used at all. Example:

command 

Or:

command --values

Or:

command --values 1 2

Or:

command --values 1 --values 2

Analyze

I checked both cases with cargo-expand to see what code is generated. And this is what I found:

Case 1

            let app = app.arg(
                ::structopt::clap::Arg::with_name("values")
                    .takes_value(true)  // definitely correct, we want a value
                    .multiple(true) // we never asked for that but why not
                    .validator(|s| {
                        ::std::str::FromStr::from_str(s.as_str())
                            .map(|_: String| ())
                            .map_err(|e| e.to_string())
                    })
                    .long("values")
                    .number_of_values(2), // correct
                    // missing .required(true) despite the field is not an Option
            );

Case 2

            let app = app.arg(
                ::structopt::clap::Arg::with_name("values")
                    .takes_value(true) // correct
                    .multiple(true) // why not
                    .min_values(0) // incorrect! the field is not required but it needs to have 2 values, not 0
                    .validator(|s| {
                        ::std::str::FromStr::from_str(s.as_str())
                            .map(|_: String| ())
                            .map_err(|e| e.to_string())
                    })
                    .long("values")
                    .number_of_values(2), // correct
            );

Conclusion

I believe it would be best to change these behaviors to make it more intuitive but this will break everybody's code so I suggest to upgrade the MAJOR.

@CreepySkeleton
Copy link
Collaborator

CreepySkeleton commented Mar 25, 2020

This is fine analysis. It is also mostly correct, so be all your code blessed by God of Computing. Amen.

Small nit:

we never asked for that but why not

You did ask for it. You see, the very essence of Vec is about multiple values. It's obvious that without it --values 1 2 wouldn't work at all. Vec implies .multiple(true).

A hint: you didn't really need to check the code on your own because we have this spectacular table which describes what methods structopt generates and when it does so.

Option<Vec<T>> vs Vec<T>

To understand why this code works the way you have discovered, we need to understand the problem we tried to resolve by introducing this behavior.

(Well, I joined structopt a little bit later than it was introduced, but I believe I'm well qualified for this explanation regardless).

structopt had had Vec<T> fields from almost the very first version; it had worked just like today - .takes_value(true).multiple(true). This had been allowing people to express the following conditions:

  • app --foo non empty list of vals
    OR
  • app // no --foo at all

Once, on a wonderful day, we received the request to also support the third, "optional option with optional value", i.e:

  • app --foo non empty list of vals
    OR
  • app // no --foo at all
    OR
  • app --foo

The recommended way to do it was found to be .min_values(0) which seem to confused you. Might have been better, I agree, but we have what we have.

The question was - "How to distinguish <--foo with no vals> from <no --foo at all>?" They would have been both mapped to an empty Vec!

This is where Option<Vec<T>> comes in:

invocation result
app --foo a b Some(["a", "b"])
app --foo Some([])
app None

That's about it. Yes, it kinda does stand out from other usages of Option in structopt, but so does Vec. I think this is a liable exception.

Changing it now - I mean, making Vec required by default - would be a breaking change for the vast majority of our users, and it would be for little or no benefit. Even if we were to consider a major version bump, like some dude on twitter said - "Dozens of thousands lines of code won't rewrite themselves". I don't think it's a big burden for you to add required = true where appropriate, is it?

P.S. There is a proposal which you may like, here. If you do, please, go there and give it some love (a comment or emoji). The more attention it gets, the more I am inclined to stop procrastinating and just fucking do it.

@cecton
Copy link
Contributor Author

cecton commented Mar 25, 2020

First of all I must apologize because my first message has been misinterpreted by GitHub's Markdown syntax and all the Option<Vec> got transformed to simply Option. This must not have helped the understanding of its content. I will fix it in a few min.

You did ask for it. You see, the very essence of Vec is about multiple values. It's obvious that without it --values 1 2 wouldn't work at all. Vec implies .multiple(true).

Dear Lord Creepy Skeleton: I think I know something about clap that you don't 😅 My apologies, I don't want to sound like a smart ass!

In fact multiple is not use to allow multiple values explicitly: it's used to allow multiple occurrences of the same parameter. Like in -v -v -v (to increase verbosity).

From the doc:

Specifies that the argument may appear more than once. For flags, this results in the number of occurrences of the flag being recorded. For example -ddd or -d -d -d would count as three occurrences. For options there is a distinct difference in multiple occurrences vs multiple values.

In fact, with clap there are 3 ways to allow multiple values:

  1. by allowing multiple parameters with multiple=true
  2. by defining min_values
  3. by defining number_of_values

It works with either of these.

#[derive(Debug, StructOpt)]
struct Cli {
    #[structopt(long, multiple = false, number_of_values = 2)]
    values: Vec<String>,
}

You can see in the example above that it works perfectly fine. It just won't allow to insert multiple time the same argument anymore:

[0] [20:06:45] /t/a HEAD > ./target/debug/a --values 1 --values 2
Hello, world!
error: The argument '--values <values> <values>' was provided more than once, but cannot be used multiple times

USAGE:
    a --values <values> <values>

For more information try --help
[1] [20:06:47] /t/a HEAD > ./target/debug/a --values 1 2
Hello, world!
thread 'main' panicked at 'Cli { values: ["1", "2"] }', src/main.rs:12:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

It seems to me that in its current state structopt is neither consistent with its own rules neither with clap's ones:

  1. if I set number_of_values to 2 with Option<Vec>, I get .min_values(0) despite I didn't write that at all in the definition.

    This is inconsistent with clap because when I read -as a user- the documentation of clap, I don't expect that number_of_values would allow the parameter without value.

  2. if I set number_of_values to 2 with Vec, I do not get .required(true) despite I expect that structopt would add it for me.

    This is inconsistent with structopt where I do expect every field to be required unless I explicitly say so (using an Option).

Once, on a wonderful day, we received the request to also support the third, "optional option with optional value"

Like you I'm interested in the history and I wouldn't want to break people's stuff. That is why I recommended to update the MAJOR if we would change this behavior.

This is the case mentioned by the PR you mentioned:

#[derive(Debug, StructOpt)]
struct Cli {
    #[structopt(long, min_values = 0)]
    values: Option<Option<String>>,
}

But here I don't see any relation between "optional option with optional value" and "optional vec with maybe no value". These are all different things. Does changing the behavior with Vec would break the behavior of Option?

In the case of Vec, someone who would really want to distinguish an empty set of value with no parameter provided at all, should probably do this instead:

struct Cli {
    #[structopt(long, min_values = 0)]
    values: Option<Vec<String>>,
}

Here I can clearly read:

  • Option: which means the parameter is optional (None if the parameter is not provided at all)
  • min_values = 0: which means the parameter accept no values at all (Some(vec![]) if the parameter is provided without values)

So the question is: why it hasn't been done that way?? If I look at the PR of May 2019 (< 0.3.0), I read:

In order to achieve this we can use Arg::min_values(0) in clap (as described here: clap-rs/clap#892), but there is no way to directly specify it in structopt.

I'm puzzled because you can definitely specify the min_values nowadays with structopt. So I tested with the latest version they are probably using in the PR: 0.2.17 and I found out that it is actually very hard (maybe impossible?) to declare an Option<Vec> (the trait std::str::FromStr is not implemented for std::vec::Vec<std::string::String>). Though you can do:

#[derive(Debug, StructOpt)]
struct Cli {
    #[structopt(long, min_values = 0)]
    values: Vec<String>,
}

But that won't help you much because --values and no parameter at all result to the same empty Vec.

@cecton
Copy link
Contributor Author

cecton commented Mar 25, 2020

But here I don't see any relation between "optional option with optional value" and "optional vec with maybe no value". These are all different things. Does changing the behavior with Vec would break the behavior of Option?

I think the misunderstandings come from my original post where the Option<Vec> have been transformed by GitHub to simply Option (as I just explained in the beginning of my second message).

@CreepySkeleton
Copy link
Collaborator

I think the misunderstandings come from my original post where the Option have been transformed by GitHub to simply Option (as I just explained in the beginning of my second message).

Nope, I believe I got it right from your first explanation. It did confused me at first, but I correctly derived your real meaning after some thought. The misunderstanding originates from somewhere else, and I assume I know where it's from.

I think I know something about clap that you don't 😅 My apologies, I don't want to sound like a smart ass!

You do, and I like smart 😎 ! Keep it going 👌. And I am not Lord, not yet; I am a humble servant whose talents are many and various, and whose work is superb and extraordinary.

I'll be answering your points (as I figured them out) one by one because I seemingly have some mental issues with laying my thoughts out whenever I think about different things at once. A tittle is your point as I see it.

Offtopic: I would really like to hear you opinion on clap-rs/clap#1682 , I highly suspect it would solve your use case charmingly.

Multiple occurrences plus multiple values is not a good default for Vec. It should have been only "multiple values" (I mostly agree, but am not sure what to do).

In fact multiple is not use to allow multiple values explicitly: it's used to allow multiple occurrences of the same parameter. Like in -v -v -v (to increase verbosity).

Yes, you are right. But it is also used to allow multiple values (clap docs):

Setting multiple(true) for an option with no other details, allows multiple values and multiple occurrences because it isn't possible to have more occurrences than values for options.

They come as a package deal when it comes to options that take value(s), and Vec is explicitly optimized for such options (it stores values). But yeah, I see you point. You think that "multiple occurrences and values" (multiple(true)) should not be enabled by default and only "multiple values" (min_values(1)) should be.

Note that I used min_values(1) instead of min_values(0) so that it matches multiple(true) which allows multiple values but disallows zero values.

Whether it was a mistake to use it instead of .min_values(1) is arguable (some people like it); I'm leaning to yes, it was.

(I actually knew about it when I was writing the answer, I just apparently thought it was obvious what I meant. You got me there!)

But I am wary of introducing this change because min_values(1) would be impossible to amend on user's side while multiple(true) can be amended with multiple(false). As an example, you wouldn't have been able to do

#[derive(Debug, StructOpt)]
struct Cli {
    #[structopt(long, multiple = false, number_of_values = 2)]
    values: Vec<String>,
}

in that case, i.e number_of_values would have been impossible to use with Vec.

The fact that structopt inserts methods' calls that weren't mentioned by the user at all is surprising and perhaps should be avoided. (I disagree)

if I set number_of_values to 2 with Option, I get .min_values(0) despite I didn't write that at all in the definition.

This is inconsistent with clap because when I read -as a user- the documentation of clap, I don't expect that number_of_values would allow the parameter without value.

But - as a user of structopt - you should expect the type of a field to be allowed adding it's own methods. For example, bool results in .takes_value(false).multiple(false), Vec<T> results in .takes_value(true).multiple(true). Additional methods are to be expected; we are very explicit about it and we even have the table - that is a part of public API by the way - that describes which methods are get used with the special types. Option<Vec<T>> is one of those specials, and min_values(0) is mentioned.

Vec<T> is not required by default whereas all the other non-Option types are. This is confusing. (Well, I agree, but I don't think it can be helped)

Oh, I apparently linked the wrong issue, and this is where the misunderstanding comes from. Remember, kids, never post technical info if it's past midnight. The issue, the PR.

In the case of Vec, someone who would really want to distinguish an empty set of value with no parameter provided at all, should probably do this instead:

Yes, except the min_values(0) is builtin. This misunderstanding was caused by my messing the links. Oops.

image

But here I don't see any relation between "optional option with optional value" and "optional vec with maybe no value". These are all different things.

Indeed. The same as above.

image

if I set number_of_values to 2 with Vec, I do not get .required(true) despite I expect that structopt would add it for me.

This is inconsistent with structopt where I do expect every field to be required unless I explicitly say so (using an Option).

I don't recall the later statement being present anywhere in our documentation. It's just what you have extrapolated from the other use cases. (I hope it doesn't reads rude, I have certain problems with the tone of my writing sometimes.)

I do see you point though, I agree this is surprising and kind of stands out. But Option<Vec<T>> is already in use for entirely different purpose; this is where it comes to the aforementioned issue/PR .

Let me repost the table:

invocation result
app --foo a b Some(["a", "b"])
app --foo Some([])
app None

Option<Vec<T>> is in use for distinguishing between "no option" and "option but no args". Changing it to requred = false would be breaking and it would have little benefit for users (see the next section below).

So the question is: why it hasn't been done that way??

Well, if my after-the-fact flavor of mindreading is to be trusted, they just wanted it to look out-of-box, screw the consistency. We would need a time machine to figure for sure; do you have one?

Breaking changes are fine as long as we bump MAJOR version (0.4 in our case, but whatever). (Yes and... kind of not-yes)

It's not that easy.

You see, people are inert. Once they have written some code, they expect it to work until the Second Coming. This is also applicable to people's habits: if they got used to some API, they will also expect that the future versions will work similarly. And don't forget that the transition to a new backward incompatible version means rewriting tons of old code, the deed that worth the most generous reward.

Of course, it's not that bad. If people are promised some significant benefits - read features/bugfixes - they could overcome their sloth, lack of time, lack of enthusiasm (billion excuses exist). It means that the benefits must worth the effort required to perform the transition.

With all due respect, these changes you have proposed - despite that I mostly agree with them! - they bring almost no benefits for existing users. They are totally cosmetic and do not offer anything that couldn't be done with the current version. They would help newcomers though.

I'm not absolutely against it, no; I'm just afraid we will end up with two versions of structopt to maintain - one for old users and one for newcomers. In the light of upcoming clap_derive (it does progress, mind you, as slow as it might be), I urge you to redirect you efforts there.


OK, @cecton can I call it a successful mindreading test?

@cecton
Copy link
Contributor Author

cecton commented Mar 30, 2020

So. I had to think about it because there are many points in this discussion. I will try to focus on the one that is actually the subject of the conversation: number_of_values with and without Option is weird/bad

  1. Why is it bad (see the description of the issue), we both agree it could be better
  2. Breaking change for that? Ok, maybe not worth it because of point 3.
  3. Why do I personally need it? Because indeed I need to parse to an Option((_, _)) (Option of tuple).

So to refocus the discussion: do we want to change this behavior?

From a personal point of view I don't need it. To be honest, I'm not even sure I would have seen this if I didn't have to parse a tuple. So I guess this is fine. What you propose in #1682 would work for me.

From a general point of view:

I hope it doesn't reads rude, I have certain problems with the tone of my writing sometimes.

No problem at all! Thanks :)

So I guess we can close this ticket from my point of view. Even if nothing changed it is good to have a ticket that talk about it so someone who will bump into the same issue will know what's going on.

If someone has a real practical use case where they need this to be consistent somehow, it's best to re-open this ticket and explain the use case (or open a new one but link this one).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants