
feature request: loading the config should log the load source at debug level #63

Open
faker-bert opened this issue Jun 1, 2023 · 20 comments
Labels
enhancement New feature or request

Comments

@faker-bert

When pydantic-settings loads the configuration, it should record which data source each value was read from and provide debug-level log output. I think this library would work better if this feature were added.

From reading the source code, I found that it currently cannot record the configuration source. This could be achieved by wrapping a ChainMap, or by adding an extra dict to track it. Given my limited experience, I am afraid that modifying the source code directly would not be fully covered by tests and would introduce more bugs, so I hope the development team can add this feature.

@hramezani
Member

pydantic-settings loads settings from all data sources defined in settings_customise_sources. The value for a field can be loaded from multiple sources, but one of them is selected based on the priority defined in settings_customise_sources.

What is your request:

  • Logging the data loaded from different sources?
  • Or logging which source the final value for a field was selected from?
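
(For context on the priority mentioned above, a minimal sketch of overriding settings_customise_sources; the field name is hypothetical, and the return order decides which source wins:)

from pydantic_settings import BaseSettings, PydanticBaseSettingsSource

class Settings(BaseSettings):
    timeout: int = 10  # hypothetical field, for illustration only

    @classmethod
    def settings_customise_sources(
        cls,
        settings_cls: type[BaseSettings],
        init_settings: PydanticBaseSettingsSource,
        env_settings: PydanticBaseSettingsSource,
        dotenv_settings: PydanticBaseSettingsSource,
        file_secret_settings: PydanticBaseSettingsSource,
    ) -> tuple[PydanticBaseSettingsSource, ...]:
        # earlier entries take priority: env vars now win over init kwargs
        return env_settings, init_settings, dotenv_settings, file_secret_settings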

@faker-bert
Author

Logging which source the final value for a field was selected from.

I tried to submit code against pydantic v1, and later found that the development team had split settings out of pydantic into a sub-project. My implementation needs to modify deep_update, and according to the source code this function is imported from the _utils file of v2. During my initial PR I found that this function was only used in BaseSettings, so I modified it directly, but in v2 the function stays in the pydantic project. I don't know whether this causes problems, since no other functions use deep_update.

@faker-bert
Author

To keep code changes as small as possible while introducing the feature, the only idea I have is to change deep_update. But since that code is not in this project, I can't modify it and submit a PR here.

@hramezani
Member

I tried to submit code against pydantic v1, and later found that the development team had split settings out of pydantic into a sub-project

Yeah. pydantic-settings is a separate package after V2

Yeah, deep_update is in the Pydantic package and pydantic-settings uses it. It will be deprecated in V2 and we might need to move it to pydantic-settings.

I think even by changing deep_update it is hard to say which source a field value comes from. Consider the following field:

a: int = Field(validation_alias='a_alias')

and two sources with these values:

s1 = {'a': 1}
s2 = {'a_alias': 2}

deep_update returns a dict like {'a': 1, 'a_alias': 2}. Do you know at this point which value has to be selected for the field? You may be able to find it, but I am afraid it would be complicated.

At the end Pydantic validates the data and selects data for fields (based on field_name, alias, alias_generator, and some config).
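
(As a rough illustration of the problem, here is a hand-rolled merge standing in for deep_update; it is not the actual pydantic code, but it shows how the merged dict loses track of where each value came from:)

def merge(*mappings: dict) -> dict:
    # stand-in for deep_update: later mappings override earlier ones
    merged: dict = {}
    for mapping in mappings:
        merged.update(mapping)
    return merged

s1 = {'a': 1}        # values from one source
s2 = {'a_alias': 2}  # values from another source

print(merge(s1, s2))  # {'a': 1, 'a_alias': 2} -- which one feeds field `a`?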

@faker-bert
Author

@hramezani

My understanding is that, based on the configured customise_sources, deep_update will return a unified dict.

This dict will contain two types of keys: field names and aliases. Each field must have its own key and may also have an alias. When a specific key is obtained for the first time, a debug-level log entry could be emitted.

Of course, this form is only convenient for debugging the settings. It would also be possible to log the source of the alias and the source of the field's own key separately.

Maybe I could try migrating deep_update from v2's _utils.py file into this project's utils and open a small pull request?
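
(A loose sketch of the idea above, with a hypothetical helper rather than the actual deep_update change: log whenever a key is set or overridden, so the last message for a key names the source whose value survives the merge:)

import logging

log = logging.getLogger('settings_debug')

def merge_with_provenance(named_sources: list[tuple[str, dict]]) -> dict:
    # named_sources is ordered lowest priority first, like deep_update's input
    merged: dict = {}
    for name, mapping in named_sources:
        for key, value in mapping.items():
            log.debug('%r set by source %s', key, name)
            merged[key] = value  # later (higher priority) sources override
    return merged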

@faker-bert
Author

@hramezani

Perhaps we can dive deeper into this feature and determine how this idea should be implemented

@hramezani
Member

I've checked your old PR in pydantic, especially the change you made in deep_update. In the end, it logs which source each value is loaded from. It has some problems:

  • It may log multiple times for a value if the key exists in more than one source
  • The implementation is complex and it makes the function hard to understand
  • Performance-wise it's not good, because you loop over the fields for each value
  • In the end, Pydantic will select the value for the field based on alias, config, etc. So even if we log that value 'x' was loaded from 'source 1', it might not be the final value for the field (as I explained in my previous comment)

Maybe I could try migrating deep_update from v2's _utils.py file into this project's utils and open a small pull request?

At some point we might need to move this function into pydantic-settings, but I would prefer to do that when we actually remove the function from Pydantic.

@faker-bert
Author

@hramezani

Actually, I initially wanted to add a sentinel class specifically to detect sources, but with my current knowledge it seems difficult to add this feature with minimal code changes, and I am afraid that adding new classes would introduce other bugs.

So I chose to modify deep_update, because in theory that function is where all the keys are aggregated.

The loop in deep_update currently only uses two generators. We could copy the list of field keys of the settings instance and remove a key from the list once its value has been obtained. As for aliased settings fields, I had not considered that situation before, and I still need to think about how to resolve the ambiguity based on the specific code.

Other ideas, such as using Python's ChainMap to replace the plain dict, could also be considered, but the impact of that change would be relatively large.
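
(A rough sketch of that ChainMap idea, with hypothetical source names rather than the library's implementation: keeping each source's dict separate means the winning source for a key can still be identified afterwards:)

from collections import ChainMap

env_source = {'foo': 'from_env'}
dotenv_source = {'foo': 'from_dotenv', 'bar': 'from_dotenv'}

# in a ChainMap the first mapping wins, so list sources by priority
values = ChainMap(env_source, dotenv_source)

def source_of(key: str) -> str:
    for name, mapping in (('env', env_source), ('dotenv', dotenv_source)):
        if key in mapping:
            return name
    raise KeyError(key)

print(values['foo'], source_of('foo'))  # from_env env
print(values['bar'], source_of('bar'))  # from_dotenv dotenv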

@faker-bert
Author

Since there are actually multiple configuration sources, once an earlier source has already provided a field we could delete that key from the list, so the list keeps shrinking (or delete after sorting); maybe that recovers part of the performance cost?

As for how keys map in the alias case, that needs to be determined by pydantic's specific logic.

To improve performance, we could skip these field checks when the incoming settings value is None, which would also reduce the cost of some runs.

@hramezani
Member

We could log the values loaded from each source and, at the end, log the final values that will be passed to pydantic. Something like:

DEBUG    pydantic_settings.main:main.py:120 {'foo': 'foo_secret_value_str'} loaded from SecretsSettingsSource
DEBUG    pydantic_settings.main:main.py:120 {'foo': 'foo_env_value_str'} loaded from DotEnvSettingsSource
DEBUG    pydantic_settings.main:main.py:120 {} loaded from EnvSettingsSource
DEBUG    pydantic_settings.main:main.py:120 {} loaded from InitSettingsSource
DEBUG    pydantic_settings.main:main.py:123 {'foo': 'foo_env_value_str'} is the final settings values

We can easily get this with the following diff:

diff --git a/pydantic_settings/main.py b/pydantic_settings/main.py
index 88478f1..05a7f5a 100644
--- a/pydantic_settings/main.py
+++ b/pydantic_settings/main.py
@@ -1,5 +1,6 @@
 from __future__ import annotations as _annotations

+import logging
 from pathlib import Path
 from typing import Any

@@ -18,6 +19,8 @@ from .sources import (

 env_file_sentinel: DotenvType = Path('')

+log = logging.getLogger(__name__)
+

 class SettingsConfigDict(ConfigDict):
     case_sensitive: bool
@@ -111,7 +114,14 @@ class BaseSettings(BaseModel):
             file_secret_settings=file_secret_settings,
         )
         if sources:
-            return deep_update(*reversed([source() for source in sources]))
+            sources_values = []
+            for source in reversed(sources):
+                source_values = source()
+                log.debug(f'{source_values} loaded from {source.__class__.__name__}')
+                sources_values.append(source_values)
+            values = deep_update(*sources_values)
+            log.debug(f'{values} is the final settings values')
+            return values
         else:
             # no one should mean to do this, but I think returning an empty dict is marginally preferable
             # to an informative error and much better than a confusing error

But let's wait for @samuelcolvin's opinion here
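
(If something like this lands, a user could surface the messages with standard logging configuration, for example:)

import logging

# opt in to the proposed debug output; basicConfig attaches a handler to the
# root logger, and DEBUG makes the pydantic_settings messages visible
logging.basicConfig(level=logging.DEBUG)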

@faker-bert
Author

@hramezani

Thanks for your idea, but I think it should say specifically which data source each value is obtained from. For example, foo may exist in several data sources.

According to the priority, the final value of foo comes from a single data source, and that source is not easy to locate. Knowing it makes configuration-loading errors much easier to track down. Simply outputting the results of the different sources is easier to implement, but it does not help as much.

What do you think?

@faker-bert
Author

We could log the values loaded from each source and, at the end, log the final values that will be passed to pydantic.

It seems this still can't solve the alias situation. For a configuration field with an alias, which source is the value loaded from?

@faker-bert
Author

We could log the values loaded from each source and, at the end, log the final values that will be passed to pydantic.

In fact, I think it is only necessary to use the logging module to record the specific data source of each field of the settings class; it is not necessary to log all the data sources. For most users, at least for me, logging everything does not improve the configuration debugging experience much, since users can already call these configuration sources elsewhere to see the values they provide. What matters is which of the sources a specific field was actually taken from, and that still does not seem to be well covered.

@oslijunw

oslijunw commented Jun 2, 2023

Like what I did in my PR.

@oslijunw

oslijunw commented Jun 2, 2023

Aliases may also need to be considered.

@hramezani
Member

Considering aliases is complicated. The alias is handled in the environment variable sources, but not in other sources.

For example, if we have a settings model like:

class Settings(BaseSettings):
    x: int = Field(validation_alias='alias_x')

and env variables:

os.environ['x'] = '1'
os.environ['alias_x'] = '2'

and call the settings like:

Settings(**{'x': 3, 'alias_x': 4})

Then InitSettingsSource returns {'x': 3, 'alias_x': 4} and EnvSettingsSource returns {'alias_x': '2'}. You can see that EnvSettingsSource only returns the value for the alias.

BTW, I still think the logging shouldn't be too complicated and we only need to log values loaded from sources based on priority.

@oslijunw

oslijunw commented Jun 2, 2023

Considering aliases is complicated. The alias is handled in the environment variable sources, but not in other sources.

When it comes to aliases, we only need to test them briefly. If x has an alias alias_x, consider the following situations:

  • x and alias_x appear at the same time, and the field is only assigned when both values are equal;
  • similar to the first case, but if the values are not equal, is one of them preferred?
  • can the field be assigned as long as either x or alias_x appears in one of the data sources?

Haha, as for whether the logging is too complicated: my PR is actually not much different from your code above. Just take the intersection of the two lists and you can know which keys appear in which data sources without introducing overly complicated methods, though maybe some case handling is required?
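
(A rough sketch of the intersection idea, with hypothetical names rather than the code from the PR: compare each source's keys against the model's field names to see which fields that source can supply:)

field_names = {'x', 'y'}                  # e.g. set(Settings.model_fields)
source_values = {'x': 1, 'unrelated': 2}  # e.g. what one source returned

provided = field_names & source_values.keys()
print(provided)  # {'x'} -- fields this source can supply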

@oslijunw

oslijunw commented Jun 2, 2023

#68 is my PR.

@hramezani
Member

After talking with our team (the Pydantic team), we decided to go for the simple logging approach, the one I suggested here.

Please open a PR with the proposed diff and provide some tests for it if you want to work on it.

@hramezani hramezani added the enhancement New feature or request label Jun 5, 2023
@faker-bert
Author

After talking with our team (the Pydantic team), we decided to go for the simple logging approach, the one I suggested here.

Please open a PR with the proposed diff and provide some tests for it if you want to work on it.

OK, it depends on what your team thinks; simply logging the data source is fine.
