Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CLI Settings Source #214

Open
wants to merge 50 commits into
base: main
Choose a base branch
from
Open

Conversation

kschwab
Copy link
Contributor

@kschwab kschwab commented Jan 22, 2024

Resolves #209.

This PR implements a CliSettingsSource with the following main features outlined below. Refer to docs/index.md for more details.

Current Examples

Fields

  • Same expectations as setting environment variables
  • Proposed _cli_parse_args can accept arg array or bool set to True for sys.argv
from pydantic import BaseModel

from pydantic_settings import BaseSettings


class DeepSubModel(BaseModel):
    v4: str


class SubModel(BaseModel):
    v1: str
    v2: bytes
    v3: int
    deep: DeepSubModel


class Settings(BaseSettings):
    v0: str
    sub_model: SubModel


print(Settings(_cli_parse_args=[
    '--v0=0',
    '--sub_model={"v1": "json-1", "v2": "json-2"}',
    '--sub_model.v2', 'nested-2',
    '--sub_model.v3', '3',
    '--sub_model.deep.v4', 'v4'
]).model_dump())
'''
{'v0': '0', 'sub_model': {'v1': 'json-1', 'v2': b'nested-2', 'v3': 3, 'deep': {'v4': 'v4'}}}
'''

Lists

  • Proposed intermixing of three list styles:
    • JSON style --field='[1,2,3]'
    • Argparse style --field 1 --field 2 --field 3
    • Lazy style --field=1,2,3
from pydantic import BaseModel
from typing import List
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    my_list: List[int]

print(Settings(_cli_parse_args=[
    '--my_list', '[1,2]',
    '--my_list', '3,4',
    '--my_list', '5', '--my_list', '6'
]).model_dump())
'''
{'my_list': [1, 2, 3, 4, 5, 6]}
'''

Dictionaries

  • Proposed intermixing of two dictionary styles:
    • JSON style --field='{"k1": 1, "k2": 2}'
    • Env Var style --field k1=1 --field k2=2
  • These styles can be used in conjunction with list forms as well:
    • --field k1=1,k2=2 --field k3=3 --field '{"k4: 4}' etc.
from pydantic import BaseModel
from typing import Dict
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    my_dict: Dict[str, str]


print(Settings(_cli_parse_args=[
    '--my_dict', '{"k1": "a", "k2": "b"}',
    '--my_dict', '{"k3": "c"}, {"k4": "d"}',
    '--my_dict', '{"k5": "e"}', '--my_dict', '{"k6": "f"}',
    '--my_dict', '[k7=g,k8=h]',
    '--my_dict', 'k9=i,k10=j',
    '--my_dict', 'k11=k', '--my_dict', 'k12=l',
]).model_dump())
'''
{'my_dict': {
    'k1': 'a',
    'k2': 'b',
    'k3': 'c',
    'k4': 'd',
    'k5': 'e',
    'k6': 'f',
    'k7': 'g',
    'k8': 'h',
    'k9': 'i',
    'k10': 'j',
    'k11': 'k',
    'k12': 'l',
}}
'''

Subcommands and Positionals

  • Proposed help text generation based off of field descriptions and class documentation
  • Proposed _cli_prog_name has same meaning as in argparse
  • Proposed CliSubCommand annotation for marking subcommands
  • Proposed CliPositionalArg annotation for marking positional arguments
from pydantic import BaseModel
from typing import Tuple, Type
from pydantic_settings import BaseSettings, CliSubCommand, CliPositionalArg, PydanticBaseSettingsSource


class FooPlugin(BaseModel, use_attribute_docstrings=True):
    '''git-plugins-foo - Extra deep foo plugin command'''

    my_feature: bool = False
    '''Enable my feature on foo plugin'''


class BarPlugin(BaseModel, use_attribute_docstrings=True):
    '''git-plugins-bar - Extra deep bar plugin command'''

    my_feature: bool = False
    '''Enable my feature on bar plugin'''


class Plugins(BaseModel, use_attribute_docstrings=True):
    '''git-plugins - Fake plugins for GIT'''

    foo: CliSubCommand[FooPlugin]
    '''Foo is fake plugin'''

    bar: CliSubCommand[BarPlugin]
    '''Bar is also a fake plugin'''


class Clone(BaseModel, use_attribute_docstrings=True):
    '''git-clone - Clone a repository into a new directory'''

    repository: CliPositionalArg[str]
    '''The repository to clone'''

    directory: CliPositionalArg[str]
    '''The directory to clone into'''

    local: bool = False
    '''When the resposity to clone from is on a local machine, bypass ...'''

    shared: bool = False
    '''Force the clone process from a reposity on a local filesystem ...'''


class Init(BaseModel, use_attribute_docstrings=True):
    '''git-init - Create an empty Git repository or reinitialize an existing one'''

    directory: CliPositionalArg[str]
    '''The directory to clone into'''

    quiet: bool = False
    '''Only print error and warning messages; all other input will be suppressed'''

    bare: bool = False
    '''Create a bare repository. If GIT_DIR environment is not set, it is set ...'''


class Git(BaseSettings, use_attribute_docstrings=True):
    '''git - The stupid content tracker'''

    clone: CliSubCommand[Clone]
    '''Clone a repository into a new directory'''

    init: CliSubCommand[Init]
    '''Create an empty Git repository or reinitialize an existing one'''

    plugins: CliSubCommand[Plugins]
    '''Fake GIT plugin commands'''


print(Git(_cli_prog_name='git', _cli_parse_args=['--help']))
'''
usage: git [-h] {clone,init,plugins} ...

git - The stupid content tracker

options:
  -h, --help            show this help message and exit

subcommands:
  {clone,init,plugins}
    clone               Clone a repository into a new directory
    init                Create an empty Git repository or reinitialize an existing one
    plugins             Fake GIT plugin commands
'''


print(Git(_cli_prog_name='git', _cli_parse_args=['clone', '--help']))
'''
usage: git clone [-h] [--local bool] [--shared bool] REPOSITORY DIRECTORY

git-clone - Clone a repository into a new directory

positional arguments:
  REPOSITORY     The repository to clone
  DIRECTORY      The directory to clone into

options:
  -h, --help     show this help message and exit
  --local bool   When the resposity to clone from is on a local machine, bypass ...
  --shared bool  Force the clone process from a reposity on a local filesystem ...
'''


print(Git(_cli_prog_name='git', _cli_parse_args=['plugins', 'bar', '--help']))
'''
usage: git plugins bar [-h] [--my_feature bool]

git-plugins-bar - Extra deep bar plugin command

options:
  -h, --help         show this help message and exit
  --my_feature bool  Enable my feature on bar plugin
'''

Will update to be individualized. This will also help improve generated
help text, which was a concern when using class doc strings.
@kschwab
Copy link
Contributor Author

kschwab commented Jan 22, 2024

@hramezani and @samuelcolvin

I am most of the way through but hit this limitation in the underlying environment variable parsing:

JSON is only parsed in top-level fields, if you need to parse JSON in sub-models, you will need to implement validators on those models.

Obviously, this would be very nice to have from a CLI perspective (e.g., nested lists etc.). I didn't know this was the case until I started writing nested tests. Do you have suggestions on what the best course of action is here? Ideally, I would love for it to be handled in environment parsing. If not, from CLI perspective the next easiest path would be nested dictionaries of strings. Is that something I can throw to Pydantic and have a reasonable expectation it will sort it out?

Copy link

codecov bot commented Jan 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (40230ab) 97.69% compared to head (cb9c1c3) 98.65%.
Report is 5 commits behind head on main.

❗ Current head cb9c1c3 differs from pull request most recent head a984f32. Consider uploading reports for the commit a984f32 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #214      +/-   ##
==========================================
+ Coverage   97.69%   98.65%   +0.96%     
==========================================
  Files           5        5              
  Lines         347      597     +250     
  Branches       76      152      +76     
==========================================
+ Hits          339      589     +250     
  Misses          6        6              
  Partials        2        2              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hramezani
Copy link
Member

Thanks @kschwab for this patch 🙏

Regarding JSON parsing issue please take the easiest path.

Also, you need to add documentation for this and show how to use this new feature.
I can see you've added different configs, please add some examples for them in the doc to show what is the use case of them.

@kschwab
Copy link
Contributor Author

kschwab commented Jan 22, 2024

Sorry @hramezani, I should have been more specific with my question. I am at a fork and am looking for guidance on which direction to take.

Is there any background on why nested JSON isn't supported in EnvSettingsSource? Was there something that scared us away from adding it? Is it a massive pile of work? etc.

My preference would be to solve it there so that the CLI and ENV are consistent. Just trying to ascertain if that is a reasonable direction or not.

@hramezani
Copy link
Member

I think the comment that you brought from the doc is an old one and related to the first time that nested env support added to Pydantic V1.

Here is my understanding from the comment:
if you have a settings class like

class SubValue(BaseModel):
    v4: str
    v5: int

class TopValue(BaseModel):
    v1: str
    v2: str
    v3: str
    sub: SubValue

class Cfg(BaseSettings):
    v0: str
    top: TopValue
    model_config = SettingsConfigDict(env_nested_delimiter='__')

you only can provide JSON values for Cfg.top. e.g.

V0=test TOP='{"v1": "json-1", "v2": "nested-2", "v3": "3", "sub": {"v4": "v4", "v5": 5}}'

and it is not possible to provide JSON values for sub fields like:

V0=test TOP='{"v1": "json-1", "v2": "nested-2", "v3": "3"}' TOP__SUB='{"v4": "v4", "v5": 5}'

But, you can see that passing JSON values for sub fields also works.
So, Probably we can remove it.

@kschwab
Copy link
Contributor Author

kschwab commented Jan 22, 2024

Got it. So it's either a bug or something invalid in the test case. Below is an example of the issue I am seeing:

import os
from pydantic import BaseModel
from typing import Optional, List
from pydantic_settings import BaseSettings


class Child(BaseModel):
    num_list: Optional[List[int]] = None


class Cfg(BaseSettings, env_nested_delimiter='__'):
    child: Optional[Child] = None


os.environ['CHILD__NUM_LIST'] = '[1,2,3]'
print(Cfg().model_dump())
'''
Traceback (most recent call last):
  File "/home/kschwab/pydantic-settings/example.py", line 23, in <module>
    print(Cfg().model_dump())
  File "/home/kschwab/pydantic-settings/pydantic_settings/main.py", line 93, in __init__
    super().__init__(
  File "/home/kschwab/.local/lib/python3.10/site-packages/pydantic/main.py", line 171, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for Cfg
child.num_list
  Input should be a valid list [type=list_type, input_value='[1,2,3]', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/list_type
'''

If you note the input_value='[1,2,3]' in the validation error it got the correct input, it just did not convert it into a list. This is what led me to believe that nested JSON must not be supported.

@hramezani
Copy link
Member

It's a bug. Here is the related issue

@kschwab
Copy link
Contributor Author

kschwab commented Jan 22, 2024

Well, not great that it's a bug, but good in the context of this PR since it allows us to keep using EnvSettingsSource as the backbone. We'll have to address that issue first before this PR goes in.

I'll continue on with rest of documentation, cleanup, etc. and loop back to it if it's still there.

pydantic_settings/sources.py Outdated Show resolved Hide resolved
@kschwab
Copy link
Contributor Author

kschwab commented Jan 24, 2024

The bulk of the functionality is now complete. Will continue to work on cleanup, comments, docs etc. over next few days. The only key remaining feedback items needed for any adjustments are:

  • Subcommand and Positional Argument Implementation (refer to GIT CLI example for reference)
    • Does the CliSubCommand and CliPositionalArg annotations for denoting their respective meanings look OK?
      • There are a few ways to accomplish this. I took the stance of a more explicit annotation at the user level for denoting subcommands and positional arguments. Overall, it is quite clean and makes intent clear when reading the model. The alternative approach is implicit coercion, with union fields into subcommands and required fields into positionals. This takes flexibility away from user, who may want a similar experience to that as environment variable overrides. Therefore, the default experience is in line with how Pydantic is already structured, and deviation from the expectation is expressed through these annotations.
    • Is single subparser per model OK?
      • I took an opinionated stance of only supporting a single subcommand group (single subparser) per model. If we want to support multiple subparsers, we would likely need to go back to an earlier version I had here. I changed from that direction however because:
        • Better control of generated help text. With a union implementation, the help text for the subcommand gets pulled from the class doc, which works so long as it is not long. However, in our application, we work with larger models that do have verbose class documentation so it can flood the help text. Simple workaround would be parsing the first sentence of the class doc etc. but not worth it IMO. With current implementation, the subcommand help text is the field doc string making it very straightforward. Then, the class doc itself is displayed when using help on the subcommand itself. e.g. git --help shows condensed help for the clone subcommand, whereas git clone --help shows the expanded class doc help text there.
        • Explicit naming of subcommands. In union implementation, the subcommand name must be coerced from the class name or some other mechanism. With current implementation, it is whatever the subcommand field name is.
        • Avoids the need for Discriminated Unions. Absolutely an option but requires more boilerplate, which takes away from the clean aesthetic. This was actually the forcing issue that caused me to change direction. The normal union selection was not strong enough so users would have needed to add discriminated unions for any subcommand.
    • Positional arguments always placed before subcommands?
      • Another opinionated stance, although of smaller concern. As is today, there is no requirement with regards to field ordering within the definition of a model. However, the resultant CLI will reorder positional args before subcommands. This is mostly to prevent users from shooting themselves in the foot. Perhaps this is best to just change to a flag, i.e., turn the safety off?

Outside of the above, I think everything else is pretty straightforward and aligns with precedence already set by environment variables.

@kschwab
Copy link
Contributor Author

kschwab commented Jan 28, 2024

Hi @hramezani, need some help with resolving the below:

  1. Many of the doc examples use CLI --help to view generated help text etc. However, this results in an exit, which pytest does not like. How to update pytest such that exit is allowed in these expected cases?
  2. Many of the doc examples use the recently merged use_attribute_docstrings flag, which does not appear available in tests. How to update the tests such that the pydantic version includes this commit?
  3. Is there a way to generate the docs locally for viewing?

@hramezani
Copy link
Member

Many of the doc examples use CLI --help to view generated help text etc. However, this results in an exit, which pytest does not like. How to update pytest such that exit is allowed in these expected cases?

You can use

```py test="skip"

to skip running on those blocks

Many of the doc examples use the pydantic/pydantic#6563 use_attribute_docstrings flag, which does not appear available in tests. How to update the tests such that the pydantic version includes this commit?

This will be available on Pydantic 2.6. It is in beta and will be released soon. We are planing to release the next pydantic-settings after Pydantic 2.6. So, in the end, we can use use_attribute_docstrings flag.

Is there a way to generate the docs locally for viewing?

Not right now. pydantic-settings docs will be rendered under pydantic docs. But I will work on it to make it possible to render docs in pydantic-settings

Co-authored-by: Samuel Colvin <s@muelcolvin.com>
@kschwab
Copy link
Contributor Author

kschwab commented Feb 17, 2024

@hramezani @samuelcolvin Thank you for the feedback, really appreciate it and will make updates accordingly. We found a couple of changes we'll also bring in after playing with it internally:

  1. The enum support should be moved deeper into the environment variable parsing. Will open a separate issue and address it there.
  2. We plan to integrate this with other existing parsers (one of which is pytest), so will be providing some support to "export" the internally generated parser and likely a pytest specific example. Still working on how to best accomplish this but will push once ready.

@kschwab
Copy link
Contributor Author

kschwab commented Mar 5, 2024

  1. The enum support should be moved deeper into the environment variable parsing. Will open a separate issue and address it there.

Opened #252 to address this.

@kschwab kschwab removed their assignment Mar 13, 2024
@kschwab
Copy link
Contributor Author

kschwab commented Mar 24, 2024

@hramezani @samuelcolvin @dmontagu updates are complete and ready for review.

@kschwab kschwab requested a review from hramezani March 24, 2024 19:40
@hramezani
Copy link
Member

@hramezani @samuelcolvin @dmontagu updates are complete and ready for review.

Thanks @kschwab for the update.
We will review it later and probably this will be included in the next release

tests/test_settings.py Outdated Show resolved Hide resolved
@hramezani
Copy link
Member

@kschwab Sorry for the late response. Finally, I found time to work on this and test it.
The implementation is good and the doc is very well 👍

There are something that need to be fixed:

  • If you enable SettingsConfigDict(cli_parse_args=True), then the CLI source will be the first priority and there is no way to change the priority. This is not like other sources that we've added before like toml and yaml. you include them and change the priority as well. So, this needs to be changed and we should have the same behavior as other new sources.

  • It doesn't work with aliases. with the following model:

class Settings(BaseSettings):
    apple: str = Field(alias='alias')

    model_config = SettingsConfigDict(cli_parse_args=True)

we can't use --alias foo.
As alias support is included in other sources, we have to have it in CLI as well.

  • It doesn't work with case_sensitive config and always expects the field name. With the following model:
class Settings(BaseSettings):
    foo: str

    model_config = SettingsConfigDict(cli_parse_args=True, case_sensitive=False)  # default is `False`

Then --FOO bar doesn't work. in env source, we convert all the env names to lowercase in when case_sensitive=False

  • It doesn't work well with dataclasses in sub model. for example with:
@pydantic_dataclasses.dataclass
class MyDataclass:
    foo: int

class Settings(BaseSettings):
    n: MyDataclass

You have to pass --n '{"foo": 123} but --n.foo 123 doesn't work. but it works for normal basemodel submodels

Thanks for your great work here ❤️

@kschwab
Copy link
Contributor Author

kschwab commented May 14, 2024

Thanks @hramezani for the review, we'll address the feedback items soon.

With respect to the CliSettingsSouce prioritization, originally, we had prioritization that could be changed but removed it after our discussion here. The conclusion was to use fixed prioritization so as to not break backwards compatibility, and then revisit dynamic prioritization once this issue is fixed in V3.

We can change it back to the original if you would prefer, let me know your thoughts.

@hramezani
Copy link
Member

Thanks @hramezani for the review, we'll address the feedback items soon.

With respect to the CliSettingsSouce prioritization, originally, we had prioritization that could be changed but removed it after our discussion here. The conclusion was to use fixed prioritization so as to not break backwards compatibility, and then revisit dynamic prioritization once this issue is fixed in V3.

We can change it back to the original if you would prefer, let me know your thoughts.

Yeah. that was my first thought. but after talking with the team we decided to have the CLI source like other sources.
Can't we implement the new CLI source like other new sources(toml, yaml) and include it like them? it would be good to
Sorry about that.

@kschwab
Copy link
Contributor Author

kschwab commented May 15, 2024

No worries!

Can't we implement the new CLI source like other new sources(toml, yaml) and include it like them?

I haven't looked at the latest toml, yaml source changes and how it was integrated, but will try and align with it if possible. I don't think there should be any issues. The only thing that comes to mind is the external parser support, but I think it should be fine. We'll make the updates and see where we land 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: Add CLI argument parsing for model initialization
3 participants