Research using zod for validating options #5639

waldekmastykarz · 2023-11-05T16:15:56Z

I'd like to propose that we research the viability of using zod for validating options. While originally we only had required and optional options, later we introduced option sets. Yet, it seems like we're missing some scenarios (#5636 and #5637). What's more, our validation of options is split into a few parts: required/optional, optionsets, and the actual values and types.

Using a package like zod would allow us centralize all validation rules in a single schema, covering all aspects for each option. Let's do a PoC to see if it would help us. Refactoring would mean a significant effort so let's give it a good though before we commit to it.

martinlingstuyl · 2023-11-05T20:16:19Z

I went through the documentation of Zod. Certainly looks interesting.

There are some areas where I immediately see how we can apply it, for example: defining different option object shapes, depending on a certain key. (For example the authType of the login command, which could solve the optional required thing)

Other areas do not immediately seem obvious, like optionSets.

martinlingstuyl · 2023-11-05T20:49:38Z

It would solve the flag problem which I noticed the other day. Using zod and coercion of primitives.

waldekmastykarz · 2023-11-05T20:55:39Z

It would be great to do a PoC to see it in action and explore its limitations if any in our scenario. There was also an alternative to zod which was way smaller which we could also look into, but I can't recall its name. Will need to try and find it back

waldekmastykarz · 2023-11-05T20:56:08Z

Might've been valibot

martinlingstuyl · 2023-11-05T21:27:16Z

There's a list of competitors on their site. Valibot isn't one of 'em...

Adam-it · 2023-11-18T02:06:04Z

seems like another huge refactor is on the way 😅.
good suggestion. Let's do some research 📚💪

waldekmastykarz · 2024-02-24T09:01:44Z

We agreed that I'll do a POC so that we have something to look at and see how it meets our needs

waldekmastykarz · 2024-03-02T10:27:54Z

OK, I've done some work, and here's what I've got.

Using zod, we can define a schema that allows us to replace initOptions, initValidators and initOptionSets. Here's how a sample schema looks like for the login command:

const globalOptions = z.object({
  query: z.string().optional(),
  output: z.enum(['csv', 'json', 'md', 'text', 'none']).optional(),
  debug: z.boolean().optional().default(false),
  verbose: z.boolean().optional().default(false)
});

const options = globalOptions.extend({
  authType: (z.enum(['certificate', 'deviceCode', 'password', 'identity', 'browser', 'secret']).optional().default('deviceCode') as unknown as ZodAliasType).alias('t'),
  cloud: z.nativeEnum(CloudType).optional().default(CloudType.Public),
  userName: z.string().optional(),
  password: z.string().optional(),
  certificateFile: z.string().optional()
    // single property validator
    .refine(filePath => !filePath || fs.existsSync(filePath), filePath => ({
      message: `Certificate file ${filePath} does not exist`
    })),
  certificateBase64Encoded: z.string().optional(),
  thumbprint: z.string().optional(),
  appId: z.string().optional(),
  tenant: z.string().optional(),
  secret: z.string().optional()
})
  // option sets
  .refine(options => options.authType !== 'password' || options.userName, {
    message: 'Username is required when using password authentication'
  })
  .refine(options => options.authType !== 'password' || options.password, {
    message: 'Password is required when using password authentication'
  })
  .refine(options => options.authType !== 'certificate' || !(options.certificateFile && options.certificateBase64Encoded), {
    message: 'Specify either certificateFile or certificateBase64Encoded, but not both.'
  })
  .refine(options => options.authType !== 'certificate' || options.certificateFile || options.certificateBase64Encoded, {
    message: 'Specify either certificateFile or certificateBase64Encoded'
  })
  .refine(options => options.authType !== 'secret' || options.secret, {
    message: 'Secret is required when using secret authentication'
  });

zod implements basic type validation that you can extend using the refine function. By adding refine to a property definition, you define a validator for a single value. If you want to validate multiple values, like in option sets, you add refine to the parent object.

For defining a short option, I added a custom function named alias. zod doesn't support it natively, and as far as I can see, there's no way to extend a class definition with an extra function in TypeScript, hence the extra cast.

What's cool about this approach, is that we define everything in a single place. We've got one schema that defines options, aliases, autocomplete and whether they're required or not. When the validation succeeds, we get a strongly-typed object.

zod runs its validation on an object. To convert user-provided command-line args into an object that we can validate with zod, we can use yargs-parser (we use minimist for this now). yargs-parser is very similar to minimist but offers some more control about parsing numbers, removing aliases, etc.

Next, I'll update one of the commands to see how it would look like inside the CLI. Looking forward to hearing your thoughts @pnp/cli-for-microsoft-365-maintainers

waldekmastykarz · 2024-03-02T11:13:07Z

One more benefit that we get with zod, is that we can work with enums more reliably by getting design-time typesafety, eg:

// define output enum type
const OutputType = z.enum(['csv', 'json', 'md', 'text', 'none']);
type OutputType = z.infer<typeof OutputType>;

// add output to global options, make optional
export const globalOptions = z.object({
  query: z.string().optional(),
  output: OutputType.optional(),
  debug: z.boolean().optional().default(false),
  verbose: z.boolean().optional().default(false)
});

// parse user-provided command line args using yargs-parser
const output = ["--output", "json"];
const argv = parse(output);

// check what output was provided
switch (argv.output) {
  // cool! design-time typesafe enum without fiddling with strings
  case OutputType.enum.csv:
    console.log('csv');
    break;
  case OutputType.enum.json:
    console.log('json');
    break;
  case OutputType.enum.md:
    console.log('md');
    break;
  case OutputType.enum.text:
    console.log('text');
    break;
  case OutputType.enum.none:
    console.log('none');
    break;
}

waldekmastykarz · 2024-03-02T11:47:45Z

For commands that allow unknown options, we extend the schema with .and(z.any()):

const options = globalOptions.extend({
  authType: (z.enum(['certificate', 'deviceCode', 'password', 'identity', 'browser', 'secret']).optional().default('deviceCode') as unknown as ZodAliasType).alias('t'),
  cloud: z.nativeEnum(CloudType).optional().default(CloudType.Public),
  userName: z.string().optional(),
  password: z.string().optional(),
  certificateFile: z.string().optional()
    .refine(filePath => !filePath || fs.existsSync(filePath), filePath => ({
      message: `Certificate file ${filePath} does not exist`
    })),
  certificateBase64Encoded: z.string().optional(),
  thumbprint: z.string().optional(),
  appId: z.string().optional(),
  tenant: z.string().optional(),
  secret: z.string().optional(),
  dummyNumber: z.number().optional(),
  dummyBoolean: z.boolean().optional(),
})
  .refine(options => options.authType !== 'password' || options.userName, {
    message: 'Username is required when using password authentication'
  })
  .refine(options => options.authType !== 'password' || options.password, {
    message: 'Password is required when using password authentication'
  })
  .refine(options => options.authType !== 'certificate' || !(options.certificateFile && options.certificateBase64Encoded), {
    message: 'Specify either certificateFile or certificateBase64Encoded, but not both.'
  })
  .refine(options => options.authType !== 'certificate' || options.certificateFile || options.certificateBase64Encoded, {
    message: 'Specify either certificateFile or certificateBase64Encoded'
  })
  .refine(options => options.authType !== 'secret' || options.secret, {
    message: 'Secret is required when using secret authentication'
  })
  // allow unknown options
  .and(z.any());

waldekmastykarz · 2024-03-02T13:47:35Z

After fiddling with it some more, I found out that most likely we can't expose the whole schema incl. option set validation as one definition. globalOptions.extend() returns a ZodObject. You need a reference to it, if you want to extend the schema (ie. add more properties). As soon as you call refine() on the schema, the return type changes to ZodEffects. So, if we're to combine the schema in validation in one definition, we wouldn't be able to extend it in child commands that inherit from the definition. I'm suspect this would be blocking us because we're defining validators on different levels. I think we'll get more clarity when we try to add it to CLI codebase.

martinlingstuyl · 2024-03-02T15:48:51Z

Interesting work @waldekmastykarz 👍

waldekmastykarz · 2024-03-02T17:54:27Z

I've got a working setup with options definition and single-option validation being in one part and the option sets being in another. Had to change alias into a function to retain typesafety. Next stop, implementing it to one of our commands to see how it would fit our design.

One thing I like so far a lot is fewer strings and implicit definitions. Options and their types are expressed in code which gives us more robustness and design time feedback.

Here's btw the experimental code I've been fiddling with: https://github.com/waldekmastykarz/clim365-zod/blob/main/src/index.ts. It shows the option definition, but also how we can parse schema to get the necessary information (aliases and value types) to parse args with yargs-parser.

milanholemans · 2024-03-02T18:48:48Z

Nice research @waldekmastykarz! 👏 Looks like something we could definitely use.

A few remarks from my side:

Are optional option sets (runsWhen) supported?
In Minimist, we can't type something as a number, is Zod able to do this?

waldekmastykarz · 2024-03-03T17:25:22Z

Are optional option sets (runsWhen) supported?

Yes, this is basically the condition you specify on a schema-wide refiner, eg.

.refine(options => options.authType !== 'password' || options.userName, {
    message: 'Username is required when using password authentication'
  })

Runs only when authType is set to password. Reading the condition takes getting used to, because you specify the condition you want to be true, but then specify the message for when the condition failed.

In Minimist, we can't type something as a number, is Zod able to do this?

We'd do this in yargs-parser, which supports it. If we wait for zod, it's too late because the data is lost by then by whoever parsed command line args.

milanholemans · 2024-03-08T22:05:04Z

@waldekmastykarz another question. We have very few options that take 2 types as value. Take planner task set for example.

Is this supported by ZOD? This will also be quite annoying with yargs-parser right? Worst case we always have to parse it as string.

waldekmastykarz · 2024-03-09T14:30:21Z

zod allows you to define union types so in this can it could be number or enum. The tricky part would be parsing it properly using yargs-parser. Ideally, we'd exclude it from explicit types so that yargs-parser can figure out where the value is a string or a number itself (which it should do just fine). The problem is, that the solution we're looking at, we're automatically mapping zod schema to yargs-parser types and have no idea to exclude an option, so perhaps treating the value as a string ourselves, and maybe adding a transform to zod schema is the second best choice.

Jwaegebaert · 2024-03-13T12:32:23Z

Awesome work on this one @waldekmastykarz! Looks very promising to use.

waldekmastykarz · 2024-03-15T19:37:20Z

OK, I've got one command (login) done. Check out: waldekmastykarz@390cfda. I've also replaced minimist with yargs-parser. The current setup is so that both old and new zod-based validations work side by side. That way we can gradually replace it across all commands over time. zod-based validation is btw 3x slower than our: 0.6ms vs. 0.2ms, still the difference is imperceptible.

This is just the first step, so looking forward to hearing what you think.

waldekmastykarz · 2024-03-23T11:34:36Z

I've added support for dynamically building the telemetry: waldekmastykarz@fabf164. It turns out, that we don't need to parse the zod schema. Instead, we can use the command options info, which we gather while building the project. We have this information when loading the command and to get access to it, we need to pass it from the runtime into the command. Then it's just a matter of iterating over the options and correctly recording the different properties.

Jwaegebaert · 2024-03-23T14:24:11Z

Nice! That way it simplifies the command codebase even more. Looks very clean!

waldekmastykarz · 2024-03-23T16:58:59Z

Just pushed one more commit to include the necessary tests for 100% coverage. I suggest that we:

Have a final critical look at the work so far
Decide if we need to do some other commands to validate the concept before we commit to it

waldekmastykarz · 2024-03-23T17:04:50Z

Nice! That way it simplifies the command codebase even more. Looks very clean!

It's pretty cool to see how much we can simplify building commands. Building the schema will take getting used to, but with a few examples, we'll have the most cases covered. Looking forward to seeing it adopted across our codebase and finally removing the obsolete code. 🚀

Jwaegebaert · 2024-03-26T07:54:30Z

I'll try to make some time to go through it more thoroughly. But, like mentioned before, big fan of the fluent writing style. It'll indeed take a bit of time to get used to but the read-ability and simplicity in the end will make it worth it.

Jwaegebaert · 2024-04-15T18:48:55Z

I just pulled the code locally and took a deeper look at the setup. It seems like this will be a substantial refactor, especially since we'll need to update the tests too. However, the tests won’t be asynchronous anymore, which is a benefit. Here are a few observations:

In zod.ts, we're using any for several parameter types, for example in the getParseFn function. Is there a specific reason for this?
There's no error handling in the parseDef function. If getParseFn returns undefined, this could lead to problems.
The names of functions like parseString, parseNumber, etc., imply they parse values, but they actually update the currentOption object. We might want to reconsider the naming here.

Just a few minor points, as the implementation overall seems very well done. Once again, great job on this!

waldekmastykarz · 2024-04-16T13:33:16Z

In zod.ts, we're using any for several parameter types, for example in the getParseFn function. Is there a specific reason for this?

Yes! This is a factory which is responsible for retrieving invoking specific parsing functions. In each function, we want to have access to the exact type it supports. But when getting the function, we need to support every type and as far as I can tell, there's no shared type across all types that we could, hence any.

There's no error handling in the parseDef function. If getParseFn returns undefined, this could lead to problems.

If getParseFn returns undefined then the line that follows if (!parseDef) will be true and will break the loop, so it should be fine, unless you're referring to a different scenario?

The names of functions like parseString, parseNumber, etc., imply they parse values, but they actually update the currentOption object. We might want to reconsider the naming here.

I chose for parse because these functions are parsing the schema. I'm open to suggestions if we feel there are better alternatives that more accurately reflect what these functions do.

Jwaegebaert · 2024-04-18T06:37:45Z

But when getting the function, we need to support every type and as far as I can tell, there's no shared type across all types that we could, hence any.

I figured this might be the reason. We could create a typed alias for this scenario, but I'm not sure it's worth the effort.

If getParseFn returns undefined then the line that follows if (!parseDef) will be true and will break the loop, so it should be fine, unless you're referring to a different scenario?

Well, appeared to overlook that one 😅

I chose for parse because these functions are parsing the schema. I'm open to suggestions if we feel there are better alternatives that more accurately reflect what these functions do.

Good point. I considered using configureAs... but it does make the name longer, so I'm not sure if it actually makes things clearer.

waldekmastykarz · 2024-04-21T11:26:52Z

With that, @pnp/cli-for-microsoft-365-maintainers are we ready to accept this PR or is there anything else we should research first?

Adam-it · 2024-04-26T00:18:57Z

Had one more check (I hope final) on the implementation:

I like the fact we moved the telemetry outside of the command it seems a huge plus. BUT does it also mean now all options will be added to telemetry? Previously we decided which options were included and usually not all were there
as for the zod implementation: I don't have a strong opinion on the fluent writing style. Maybe I am just old or good used to what we have 😉. It seems a bit similar to like we handled callbacks which we refactored to async/await 😜. Anyway, It seems to be less code so fine for me 🫡👍
I don't mind it is a 'bit' slower. I think it is not noticeable 😉
it seems to be clean especially the way we extend the options and define all the 'rules' in a single place

My opinion: If we want to make this step (and we want) and we see benefit in it (why reinvent the wheel right?) then lets do it! if zod implementation may work side by side to what we already have I don't see why we do such a hold up and I would open a PR with it 👍.
So to sum up:
Awesome work @waldekmastykarz, you Rock 🤩

fine for me as of now
let's open a PR and stop waiting
IMO the PR should be approved by at least 2 maintainers
let's go

waldekmastykarz · 2024-04-26T06:17:12Z

I like the fact we moved the telemetry outside of the command it seems a huge plus. BUT does it also mean now all options will be added to telemetry? Previously we decided which options were included and usually not all were there

The schema decides which properties are tracked and we're using the same logic which we're using now:

required options are not tracked
optional options with string and numeric values are tracked if they're set, but not what the value is
optional enums are tracked with values

I'll get on submitting the PR. I appreciate your feedback!

Adam-it · 2024-04-26T06:31:51Z

A ok, perfect than 🤩😍

waldekmastykarz · 2024-04-27T12:07:16Z

I've opened the PR. After we merge it, let's create an epic where we can track upgrading all commands to use zod and finally decommissioning the obsolete code.

waldekmastykarz added enhancement needs discussion labels Nov 5, 2023

waldekmastykarz self-assigned this Feb 24, 2024

waldekmastykarz added needs research and removed needs discussion labels Feb 24, 2024

waldekmastykarz added needs discussion and removed needs research labels Apr 21, 2024

waldekmastykarz added a commit to waldekmastykarz/cli-microsoft365 that referenced this issue Apr 27, 2024

Introduces zod validation. Closes pnp#5639

9fb89b0

waldekmastykarz added a commit to waldekmastykarz/cli-microsoft365 that referenced this issue Apr 27, 2024

Introduces zod validation. Closes pnp#5639

da7d68d

waldekmastykarz linked a pull request Apr 27, 2024 that will close this issue

Introduces zod validation. Closes #5639 #6000

Open

waldekmastykarz added work in progress and removed needs discussion labels Apr 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Research using zod for validating options #5639

Research using zod for validating options #5639

waldekmastykarz commented Nov 5, 2023

martinlingstuyl commented Nov 5, 2023

martinlingstuyl commented Nov 5, 2023 •

edited

waldekmastykarz commented Nov 5, 2023

waldekmastykarz commented Nov 5, 2023

martinlingstuyl commented Nov 5, 2023

Adam-it commented Nov 18, 2023

waldekmastykarz commented Feb 24, 2024

waldekmastykarz commented Mar 2, 2024 •

edited

waldekmastykarz commented Mar 2, 2024

waldekmastykarz commented Mar 2, 2024

waldekmastykarz commented Mar 2, 2024

martinlingstuyl commented Mar 2, 2024

waldekmastykarz commented Mar 2, 2024

milanholemans commented Mar 2, 2024

waldekmastykarz commented Mar 3, 2024

milanholemans commented Mar 8, 2024

waldekmastykarz commented Mar 9, 2024

Jwaegebaert commented Mar 13, 2024

waldekmastykarz commented Mar 15, 2024

waldekmastykarz commented Mar 23, 2024

Jwaegebaert commented Mar 23, 2024

waldekmastykarz commented Mar 23, 2024

waldekmastykarz commented Mar 23, 2024

Jwaegebaert commented Mar 26, 2024

Jwaegebaert commented Apr 15, 2024

waldekmastykarz commented Apr 16, 2024

Jwaegebaert commented Apr 18, 2024

waldekmastykarz commented Apr 21, 2024

Adam-it commented Apr 26, 2024 •

edited

waldekmastykarz commented Apr 26, 2024

Adam-it commented Apr 26, 2024

waldekmastykarz commented Apr 27, 2024

Research using zod for validating options #5639

Research using zod for validating options #5639

Comments

waldekmastykarz commented Nov 5, 2023

martinlingstuyl commented Nov 5, 2023

martinlingstuyl commented Nov 5, 2023 • edited

waldekmastykarz commented Nov 5, 2023

waldekmastykarz commented Nov 5, 2023

martinlingstuyl commented Nov 5, 2023

Adam-it commented Nov 18, 2023

waldekmastykarz commented Feb 24, 2024

waldekmastykarz commented Mar 2, 2024 • edited

waldekmastykarz commented Mar 2, 2024

waldekmastykarz commented Mar 2, 2024

waldekmastykarz commented Mar 2, 2024

martinlingstuyl commented Mar 2, 2024

waldekmastykarz commented Mar 2, 2024

milanholemans commented Mar 2, 2024

waldekmastykarz commented Mar 3, 2024

milanholemans commented Mar 8, 2024

waldekmastykarz commented Mar 9, 2024

Jwaegebaert commented Mar 13, 2024

waldekmastykarz commented Mar 15, 2024

waldekmastykarz commented Mar 23, 2024

Jwaegebaert commented Mar 23, 2024

waldekmastykarz commented Mar 23, 2024

waldekmastykarz commented Mar 23, 2024

Jwaegebaert commented Mar 26, 2024

Jwaegebaert commented Apr 15, 2024

waldekmastykarz commented Apr 16, 2024

Jwaegebaert commented Apr 18, 2024

waldekmastykarz commented Apr 21, 2024

Adam-it commented Apr 26, 2024 • edited

waldekmastykarz commented Apr 26, 2024

Adam-it commented Apr 26, 2024

waldekmastykarz commented Apr 27, 2024

martinlingstuyl commented Nov 5, 2023 •

edited

waldekmastykarz commented Mar 2, 2024 •

edited

Adam-it commented Apr 26, 2024 •

edited