Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide data in the application domain rather than the CLI domain #2683

Closed
2 of 5 tasks
pksunkara opened this issue Aug 13, 2021 · 13 comments · Fixed by #3732
Closed
2 of 5 tasks

Provide data in the application domain rather than the CLI domain #2683

pksunkara opened this issue Aug 13, 2021 · 13 comments · Fixed by #3732
Labels
A-builder Area: Builder API A-derive Area: #[derive]` macro API A-parsing Area: Parser's logic and needs it changed somehow. C-enhancement Category: Raise on the bar on expectations M-breaking-change Meta: Implementing or merging this will introduce a breaking change. S-waiting-on-mentor Status: Needs elaboration on the details before doing a 'Call for participation'
Milestone

Comments

@pksunkara
Copy link
Member

pksunkara commented Aug 13, 2021

Currently, clap's API is focused on the CLI domain, with data being tracked as OsString with specialized helpers for String. At the tale-end of the builder API, we provide the value_of_t API. We do gloss over this in the derive API.

Problems with the current approach

See also #2832, #1185

Roll out

  1. Change the data structure for ArgMatches to store Box<Any + Send + Sync + 'static>
  2. Add a generic getter, gated behind a feature flag
  3. Add a builder function for setting a custom value, gated behind a feature flag
  4. Change the default parser from OsString to String, deprecating AllowInvalidUtf8, gated behind a feature flag
  5. Mark old API deprecated, gated behind a feature flag
  6. On next breaking release, remove deprecated APIs and remove feature gates

Other considerations / open questions

  • For multiple_values, users also have to set default_values, value_names, and value_hints, should we follow possible_values and avoid users passing in parallel arrays?
  • How do we handle default values?
    • Users will want to set their defaults according to their application data type
    • To show a default in help, we need it to be Display
    • clap_derive currently punts on this by providing both a default_value and default_value_t where the latter has to support Display and it gets forwarded to default_value, see feat(derive): Specify defaults by native expressions #2635
  • How many feature gates and when do we remove them?
    • The sooner its public, the better, for collecting feedback
    • The more people using it, the more impacted by subtle behavior changes (default parser)
      • We could put in hacks to make the old API work along side the new API and set the default parser according to AllowInvalidUtf8
  • How do we handle optional vs required, especially in light of value_of_t() is difficult to use with optional arguments  #2505?
    • We have to deal with present / not-present, invalid cast of Any, etc.
  • For the parser, how do we balance flexibility and performance
    • We could make the parser a trait ValueParser and provide a struct BoxedValueParser that contains an enum for common function-pointer signatures (wrapped with converters for the error types) and people can explicitly construct a BoxedValueParser for their custom type
  • In designing the API, we should keep in mind that people are also wanting access to what authored a value (env, default, CLI) (TODO Find Issue)
    • Unsure how this feature would even fit into clap_derive
  • Long term, we will want to morph the ArgMatches API to be more map-like
  • Long term, we need to consider other type-related features besides default_value:
    • ValueHint: if we can detect and set a default for Path that would provide a better default experience
    • value_name for named arguments: the default (of --name <name>) is redundant. If we could provide type related defaults (e.g. PATH for PathBuf, NUM for numbers), it would also provide a better default experience
    • See Inferring more meaning from types for derives #2637 for prior discussion

Original

Basically, there are two things that need to be solved here.

  • We should be keeping the parsed values during validation stage of parsing instead of throwing them away
  • We should be storing the types directly in ArgMatches

I have to dig into the code to write a longer description of issues with the current behaviour and design.

See also #2832

@pksunkara pksunkara added C: matches A-validators Area: ArgMatches validation logi labels Aug 13, 2021
@pksunkara pksunkara added this to the 3.0 milestone Aug 13, 2021
@pksunkara pksunkara changed the title Validator & ArgMatches issues epic Validator & ArgMatches need to be reworked? Aug 13, 2021
@epage epage modified the milestones: 3.0, 4.0 Oct 16, 2021
@epage
Copy link
Member

epage commented Oct 16, 2021

Sounds like this is related to #2832

epage added a commit to epage/clap that referenced this issue Oct 18, 2021
In considering potential work for clap-rs#2683, I realized we might need a type to carry data for
each of the `multiple_values`.  `ArgValue` works both for that and for
possible values, so we need to come up with a better name for one or
both.  Changing `ArgValue`s name now would be ideal since its new in
clap3 and by renaming it, we can reduce churn for users.

While thinking about this, I realized I regularly get these mixed
up, so renaming `ArgValue` to `PossibleValue` I think will help clear
things up, regardless of clap-rs#2683.
@epage epage changed the title Validator & ArgMatches need to be reworked? Provide data in the application domain rather than the CLI comain Oct 18, 2021
@epage epage changed the title Provide data in the application domain rather than the CLI comain Provide data in the application domain rather than the CLI domain Oct 18, 2021
@epage epage added the M-breaking-change Meta: Implementing or merging this will introduce a breaking change. label Oct 18, 2021
@epage
Copy link
Member

epage commented Oct 18, 2021

@pksunkara just captured in the Issue my thoughts from not being able to sleep last night :) I'm sure you also have some and would love to hear and see how we can synthesize the two.

The only linked issue I'm seeing this directly help with is #2206. I thought it would help with #2505 but it sounds like we just repeat the same problem after exploring the idea.

@pksunkara
Copy link
Member Author

pksunkara commented Oct 18, 2021

Looks good to me. I would actually make this the main product objective for 4.0. I will add more if needed once I get into this context later (which I still have in my todo)

epage added a commit to epage/clap that referenced this issue Oct 19, 2021
In considering potential work for clap-rs#2683, I realized we might need a type to carry data for
each of the `multiple_values`.  `ArgValue` works both for that and for
possible values, so we need to come up with a better name for one or
both.  Changing `ArgValue`s name now would be ideal since its new in
clap3 and by renaming it, we can reduce churn for users.

While thinking about this, I realized I regularly get these mixed
up, so renaming `ArgValue` to `PossibleValue` I think will help clear
things up, regardless of clap-rs#2683.
@epage
Copy link
Member

epage commented Nov 5, 2021

From #2832

@ssokolow

Hmm. My main concern there is that this line looks like it could easily become a footgun if not handled carefully:

Change the default parser from OsString to String, deprecating AllowInvalidUtf8, gated behind a feature flag.

Windows may make it difficult to get paths with un-paired surrogates, but I've found mojibake'd filenames to be easy on POSIX platforms... to the point where it led me to report a bug in Serde because a mojibake'd Japanese ID3 tag had put mojibake'd characters in media file names during auto-renaming and Serde doesn't complain if you want to put a PathBuf into a JSON file without explicitly specifying how to handle un-UTF8-able contents. (Was it ID3v1 that left the tag's encoding to be communicated out-of-band? I think Audacious Media Player actually has a browser-style encoding detector for that.)

@epage
Copy link
Member

epage commented Nov 5, 2021

Hmm. My main concern there is that this line looks like it could easily become a footgun if not handled carefully:

With the proposed API, if you want a path, you'd be specifying it as a PathBuf and we would parse from OsStr. Just if you didn't specify any custom type, we'd be choosing String by default which seems to be what people want in most cases.

@ssokolow
Copy link

ssokolow commented Nov 5, 2021

Ahh. By "if not handled carefully", I meant "this could turn into a footgun for end users if clap doesn't handle it carefully", and that should fit the definition of "clap handling it carefully" well enough.

@epage epage added A-builder Area: Builder API A-derive Area: #[derive]` macro API A-parsing Area: Parser's logic and needs it changed somehow. and removed A-validators Area: ArgMatches validation logi labels Dec 8, 2021
@epage epage added S-waiting-on-mentor Status: Needs elaboration on the details before doing a 'Call for participation' C-enhancement Category: Raise on the bar on expectations labels Dec 13, 2021
@AndreasBackx
Copy link
Contributor

@epage requested I share our usecases after a short chat.

We are logging various aspects of CLI usage across the company. This is to give CLI owners an understanding of how their CLIs are used. This discoverability gives various teams the insights to improve their CLIs and see where they could improve. One example of this is the ability to see which subcommands are most often used and how long they take. This can help a team prioritise making improvements to a CLI. Saying "We saved 1000 engineers a total of 50 hours a week." is more impactful and provides incentives towards CLI improvements.

Additionally this information can be useful when debugging errors as well. We are currently working towards providing automated error logging. This means that when their CLI crashes, the user doesn't have to do anything and it'll be appropriately logged and can be followed up on. Having a better understanding of the state of the CLI can be beneficial. Maybe your parsing of a CLI argument lead to the eventual error? We'd know if that data was included as metadata while logging the error.

Either of these usecases would require data access in the application layer. Let me know if you have any questions.

@epage
Copy link
Member

epage commented May 11, 2022

I had been wondering if we should offer look up functions that panic instead of returning bad type errors.

To allow what @AndreasBackx brought up, we either need

  • Allow people to query every known data type for an argument without panicing
  • Store the "raw" version of arguments
    • This might get messy if we are able to support untyped defaults as there won't be a raw version.

@epage
Copy link
Member

epage commented May 13, 2022

So my current implementation offers a get_raw (on top of get_one and get_many). At times I am tempted to remove it but at least for the current implementation, its actually a help to track the raw values.

Progress

  • Replace allow_invalid_utf8 with ValueParser, internally
  • Allow users to set ValueParser on an Arg and look up the typed and raw values
  • Provide alternative boolean ValueParsers that handle env-like truthiness
  • Support for an ArgEnum ValueParser
  • Generalize possible values so bools can specify them to help with completion support (will require native bool support)
  • Document why types are natively supported
  • Support a possible-values ValueParser
  • Explore replacing forbid_empty_value with a ValueParser
  • Numeric range support (maybe use this as the basis for native numbers?).
  • Deprecate possible_values in favor of value_parser
  • Deprecate allow_invalid_utf8 in favor of ValueParser
  • Deprecate arg value validators in favor of ValueParser
  • Deprecate forbid_empty_values
  • Add default_value_raw / default_values_raw etc and deprecate the current forms

Deferred work

  • Switch subcommands to ValueParser and away from allow_invalid_utf8_for_external_subcommands
  • clap_derive support (fixing try_from_str and try_from_os_str functions are called twice #3589) because we need a way to get values out of ArgMatches and it'd be a breaking change to require Clone and a breaking change for FromArgMatches to accept &mut ArgMatches so we can remove the already-allocated items
    • I took another approach of having _mut variants of the FromArgMatches functions that have inherent implementations to maintain compatibility but the update version with subcommands processes subcommands then if it fails, it falls back to from_arg_matches on the original matcher, so removing the sub matches breaks it

@epage
Copy link
Member

epage commented May 13, 2022

I was considering moving ignore_case into the ValueParser as each type that it matters for, like ArgEnumValueParser could implement it and it could be left off all the others.

However, required_if_eq and friends complicate this because they use ignore_case and I don't want to move it from Arg to every ValueParser, so for now, I'm leaving it on the Arg. Maybe as we move special required rules out, we can re-evaluate this.

@epage
Copy link
Member

epage commented May 16, 2022

Currently, clap allows you accessing all of the values for a group. The new implementation limits that to groups where all present values are of the same type. For more background on why this was added, see #283

@AndreasBackx
Copy link
Contributor

@epage I like the new API, do you think this would allow for some way to implement #1206? Happy to make a PR as well, let me know if you had something in mind.

@epage
Copy link
Member

epage commented May 18, 2022

That is still blocked on another issue; I've updated the issue description to summarize what it is blocked on

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-builder Area: Builder API A-derive Area: #[derive]` macro API A-parsing Area: Parser's logic and needs it changed somehow. C-enhancement Category: Raise on the bar on expectations M-breaking-change Meta: Implementing or merging this will introduce a breaking change. S-waiting-on-mentor Status: Needs elaboration on the details before doing a 'Call for participation'
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants