New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce parameters configuration #1650
Conversation
8398c6b
to
b146236
Compare
75556dd
to
3f7cb3c
Compare
a06f53d
to
3f4ee15
Compare
if not all(c.isalpha() or c == "_" for c in name): | ||
raise ValueError( | ||
"Names are expected to only contain characters and `_`, was: {name}" | ||
) | ||
|
||
return name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bit too early to be concerned with what characters are allowed, but I'm fine with it.
Normally I would expect the name to be whatever, but if the name was to be used in URLs or file names, then it would be slugified (https://docs.djangoproject.com/en/dev/glossary/#term-slug) by the application as an implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree! However, there is already a concept of referencing variables from either storage or the workspace in ensemble.yml
that uses .
as delimiter. I think the format should be up for discussion, but by not allowing them to be included in the variable names we avoid some odd pitfalls in the imminent future. Slugging is not really an option when the user is referencing these variables later in the configuration file (storage.my_record
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good, it's sane to move this stuff out of engine (if it is to live up to its name, then the fuel needs to be provided, not created inside it)—good tests! 👏
ert3/config/_parameters_config.py
Outdated
type: Literal["gaussian", "uniform"] | ||
input: _DistributionInput | ||
|
||
@root_validator(pre=True) | ||
def _ensure_gaussian_required_fields(cls, values): # type: ignore | ||
dist_type = values["type"] | ||
input_fields = values["input"].keys() | ||
if dist_type == "uniform": | ||
expected = {"lower_bound", "upper_bound"} | ||
actual = set(input_fields) | ||
if expected != actual: | ||
raise AssertionError( | ||
f"Expected distribution inputs: {expected}, was: {actual}" | ||
) | ||
elif values["type"] == "gaussian": | ||
expected = {"mean", "std"} | ||
actual = set(input_fields) | ||
if expected != actual: | ||
raise AssertionError( | ||
f"Expected distribution inputs: {expected}, was: {actual}" | ||
) | ||
else: | ||
raise ValueError(f"Validating unknown distribution: {dist_type}") | ||
|
||
return values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to do this the same way as suggested in this issue? 🤔 #1571
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably yes! I'll give it a try...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the changes now and my conclusion is that:
- it is definitively neater within the Pydantic framework. It is probably usage as intended and that is always good 👍🏻
- the error messages becomes completely useless when one relies on
Union
, as Pydantic tries to match against the different options. But, this is something we can investigate and see whether there are fixes for later...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was afraid that would be the case for the last part yes 😕 But as you say, that is something we can investigate later!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if this issue is resolved, this would be a lot simpler (and faster parsing): pydantic/pydantic#619
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is also of interest: pydantic/pydantic#503 (comment)
390a66a
to
4e70ec3
Compare
4e70ec3
to
8142133
Compare
Issue
Since the early days of ert3 it has remained the engines responsibility to parse and load the available parameters and their distributions. All other elements of the configuration have been loaded further up in the call-stack using Pydantic. This PR changes this so that also the parameters are loaded early, using Pydantic and later passed to the engine.
This has multiple benefits! The first is that
ert3.engine
no longer has any knowledge of the layout of a workspace. Due to this we are also able to remove quite some file operations in the corresponding tests.Furthermore, this PR is necessary preparation for #1340.
Consider this PR payback ™️
Approach
The first two commits is prep work in the stats module. Afterwards, the Pydantic schema and the corresponding tests for the parameter configuration is introduced. Afterwards, a debateable decision is made in that a configured parameter group has the capability to spawn the corresponding distribution. I don't see a better place to put this responsibility as of now, but it might very well be that this is moved in the future ™️ The remaining commits besides the last one propagates the config down into the engine, all the way down to where the parameters are currently loaded, swaps the implementation such that it is loaded both from disk and config and compared, then removing the dependency on the
utils
before last theutils
is completely removed. The very last commits just removes the necessary file toggling that is currently done in the engine tests, such that the config is only present in-memory.This is not how I typically structure my commits, but in this particular case I belive it is relevant to display "how it was done" in a step by step transformation and not only the "polished" history for the future reader. In particular iterating between changing iterface and tests and swapping underlying implementation. As a consequence, some of these commits should be squashed before merging...
New issues to be made
lower_bound == upper_bound
). I think this is a bad way of communicating the intent of constant data and hence think (as the original implementation in this PR was doing) that we should requirelower_bound < upper_bound
. However, that has consequences beyond the scope of this PR and hence I think it should be turned into an issue for later.