Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce parameters configuration #1650

Merged
merged 3 commits into from May 13, 2021

Conversation

markusdregi
Copy link
Contributor

@markusdregi markusdregi commented May 9, 2021

Issue
Since the early days of ert3 it has remained the engines responsibility to parse and load the available parameters and their distributions. All other elements of the configuration have been loaded further up in the call-stack using Pydantic. This PR changes this so that also the parameters are loaded early, using Pydantic and later passed to the engine.

This has multiple benefits! The first is that ert3.engine no longer has any knowledge of the layout of a workspace. Due to this we are also able to remove quite some file operations in the corresponding tests.

Furthermore, this PR is necessary preparation for #1340.

Consider this PR payback ™️

Approach
The first two commits is prep work in the stats module. Afterwards, the Pydantic schema and the corresponding tests for the parameter configuration is introduced. Afterwards, a debateable decision is made in that a configured parameter group has the capability to spawn the corresponding distribution. I don't see a better place to put this responsibility as of now, but it might very well be that this is moved in the future ™️ The remaining commits besides the last one propagates the config down into the engine, all the way down to where the parameters are currently loaded, swaps the implementation such that it is loaded both from disk and config and compared, then removing the dependency on the utils before last the utils is completely removed. The very last commits just removes the necessary file toggling that is currently done in the engine tests, such that the config is only present in-memory.

This is not how I typically structure my commits, but in this particular case I belive it is relevant to display "how it was done" in a step by step transformation and not only the "polished" history for the future reader. In particular iterating between changing iterface and tests and swapping underlying implementation. As a consequence, some of these commits should be squashed before merging...

New issues to be made

  • Make record index public and enforce uniform datatype within an index (currently the index type of a record is present both in the parameter configuration, the distribution implementation and in the actual record implementation. There should be unified).
  • Introduce constant distribution (the SPE1 setup is leveraging that a uniform distribution can have lower_bound == upper_bound). I think this is a bad way of communicating the intent of constant data and hence think (as the original implementation in this PR was doing) that we should require lower_bound < upper_bound. However, that has consequences beyond the scope of this PR and hence I think it should be turned into an issue for later.

@markusdregi markusdregi self-assigned this May 9, 2021
@markusdregi markusdregi force-pushed the parameters-config branch 2 times, most recently from 8398c6b to b146236 Compare May 9, 2021 15:15
@markusdregi markusdregi marked this pull request as draft May 9, 2021 15:17
@markusdregi markusdregi force-pushed the parameters-config branch 4 times, most recently from a06f53d to 3f4ee15 Compare May 11, 2021 09:07
@markusdregi markusdregi marked this pull request as ready for review May 11, 2021 09:45
Comment on lines +36 to +41
if not all(c.isalpha() or c == "_" for c in name):
raise ValueError(
"Names are expected to only contain characters and `_`, was: {name}"
)

return name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bit too early to be concerned with what characters are allowed, but I'm fine with it.

Normally I would expect the name to be whatever, but if the name was to be used in URLs or file names, then it would be slugified (https://docs.djangoproject.com/en/dev/glossary/#term-slug) by the application as an implementation detail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree! However, there is already a concept of referencing variables from either storage or the workspace in ensemble.yml that uses . as delimiter. I think the format should be up for discussion, but by not allowing them to be included in the variable names we avoid some odd pitfalls in the imminent future. Slugging is not really an option when the user is referencing these variables later in the configuration file (storage.my_record).

Copy link
Contributor

@jondequinor jondequinor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, it's sane to move this stuff out of engine (if it is to live up to its name, then the fuel needs to be provided, not created inside it)—good tests! 👏

Comment on lines 76 to 100
type: Literal["gaussian", "uniform"]
input: _DistributionInput

@root_validator(pre=True)
def _ensure_gaussian_required_fields(cls, values): # type: ignore
dist_type = values["type"]
input_fields = values["input"].keys()
if dist_type == "uniform":
expected = {"lower_bound", "upper_bound"}
actual = set(input_fields)
if expected != actual:
raise AssertionError(
f"Expected distribution inputs: {expected}, was: {actual}"
)
elif values["type"] == "gaussian":
expected = {"mean", "std"}
actual = set(input_fields)
if expected != actual:
raise AssertionError(
f"Expected distribution inputs: {expected}, was: {actual}"
)
else:
raise ValueError(f"Validating unknown distribution: {dist_type}")

return values
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to do this the same way as suggested in this issue? 🤔 #1571

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yes! I'll give it a try...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the changes now and my conclusion is that:

  • it is definitively neater within the Pydantic framework. It is probably usage as intended and that is always good 👍🏻
  • the error messages becomes completely useless when one relies on Union, as Pydantic tries to match against the different options. But, this is something we can investigate and see whether there are fixes for later...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was afraid that would be the case for the last part yes 😕 But as you say, that is something we can investigate later!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if this issue is resolved, this would be a lot simpler (and faster parsing): pydantic/pydantic#619

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is also of interest: pydantic/pydantic#503 (comment)

@markusdregi markusdregi merged commit 0e41a8a into equinor:master May 13, 2021
@markusdregi markusdregi deleted the parameters-config branch May 13, 2021 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants