Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce RecordTransmitter #1469

Merged
merged 1 commit into from
Apr 14, 2021
Merged

Introduce RecordTransmitter #1469

merged 1 commit into from
Apr 14, 2021

Conversation

sondreso
Copy link
Collaborator

@sondreso sondreso commented Mar 3, 2021

Issue
Resolves #1334
Resolves #1447
Resolves #1328
Resolves #1502

Approach
Introduce RecordTransmitter concept, along with two concrete implementations (shared-disk and in-memory). This allows records to be used as inputs and outputs to steps in the workflow manager, while the actual transportation method is abstracted away. This allows the transmitters (references to the records) to be transported as lightweight objects over the network from the client to the compute side, while the actual data is only pulled from storage to the compute node.

Assumptions and limitations

  • No RecordTransmitter that uses the ert3 storage and/or the new^2 ert-storage yet
  • We assume the record transmitter can be cloudpickled
  • Validation in ert3 that all commands are specified as transportable commands is removed, as you can now use arbitrary UNIX commands
  • The Record as-is does not contain all information that you need to work with it. Specifically, the name and mime of the record is almost always bundled together with the record, since you need this information to interact with the record outside pure data access.
  • Currently we do: ert3 config -> ee-config -> PrefectEnsemble. We should cut the middle layer, and use the ert3 config the construct the PrefectEnsemble directly using the builders.
  • This PR makes it possible to get the result from the evaluation using the monitor. Currently, this will give you a transmitter for every output in the ensemble. This is accomplished by the ensemble sending all transmitters to the evaluator as part of the ensemble finished message, and then clients can connect to a separate endpoint to get the results after a STOPPED event is received.

@sondreso sondreso force-pushed the record-transmitting branch 3 times, most recently from df70fea to 5e40698 Compare March 5, 2021 17:35
@jondequinor
Copy link
Contributor

Terminology:

  • record key: e.g. coeffs. Represents all coeffs from a user's perspective.
  • concrete record: e.g. coeffs_7788cf9c-7dec-11eb-9439-0242ac130002, is a sampled coeff e.g. {"a": 2.4, … } but with a globally unique name

I'm talking about ert3/ee interfacing here:

  1. Parameters and other locally sampled records are at some point Record in ert3
  2. User provided records (e.g. large files/blobs in azure/storage) are also Record in ert3.
  3. All Record will get an accompanying RecordTransmitter and will be transmitted() in ert3.
  4. After transmit(), already existing azure/storage things will remain in storage, but locally sampled records will be saved in storage.
  5. A DAG is created from the record key IO mapping
  6. An accompanying io map from record key -> iens -> (RecordTransmitter) is provided
  7. A RecordTransmitter will contain configuration options (location, ???)
  8. The Prefect Evaluator creates an ensemble from DAG and io mapping
  9. In the ensemble, input/output information is indicating that it is file-based IO for UnixSteps, memory based for FunctionStep. Possibly Stream-based for some specially configured FunctionStep.

EE/Evaluation of steps:

  1. A UnixStep, for all inputs, gets some information about the IO (e.g. FileIO) and a RecordTransmitter. Usually .dump(file_io.location)
  2. ???
  3. Profit

@markusdregi
Copy link
Contributor

Sounds like a good plan! 👏 I did not understand this one:

A RecordTransmitter will contain configuration options (location, ???)

Do you mean that the RecordTransmitter will contain configuration options for where the record should be transmitted? It is probably the location that made me think this was possibly the disk location of data produced in the unix step? Because that knowledge I think should reside within the unix step directly :)

ert_shared/record.py Outdated Show resolved Hide resolved
@jondequinor
Copy link
Contributor

Do you mean that the RecordTransmitter will contain configuration options for where the record should be transmitted? It is probably the location that made me think this was possibly the disk location of data produced in the unix step? Because that knowledge I think should reside within the unix step directly :)

You're right, it will be in the ensemble and the step.

@markusdregi
Copy link
Contributor

Thanks for the clarification 👍

@sondreso sondreso force-pushed the record-transmitting branch 3 times, most recently from 88836e1 to bbd2785 Compare March 24, 2021 09:15
@verveerpj verveerpj mentioned this pull request Mar 24, 2021
@sondreso sondreso force-pushed the record-transmitting branch 2 times, most recently from daa2d7f to b82af7b Compare March 24, 2021 13:49
@sondreso
Copy link
Collaborator Author

We should also fix: #1502

@sondreso sondreso mentioned this pull request Mar 25, 2021
@sondreso sondreso force-pushed the record-transmitting branch 3 times, most recently from b45d4ec to 54ac075 Compare March 25, 2021 20:14
@sondreso sondreso marked this pull request as ready for review March 26, 2021 09:14
@sondreso sondreso changed the title Record transmitting Introduce RecordTransmitter Mar 26, 2021
@markusdregi
Copy link
Contributor

I've gone through most of this PR except the actual implementation in ert_shared. That part will have to wait until later today or tomorrow morning, but in the meantime feel free to address and resolve my comments 👍🏻

@sondreso
Copy link
Collaborator Author

https://github.com/equinor/komodo-releases/pull/1259 Needs to be merged.

@jondequinor
Copy link
Contributor

@markusdregi addressed recent batch of comments. Thanks for reviewing!

@markusdregi
Copy link
Contributor

Thanks for reviewing!

Happy to do so 👍🏻 I have no further comments... Good job all of you 🚀 I suggest you rebase, squash and do a last internal review among the authors and then merge it.

Introduce RecordTransmitter concept, along with two
concrete implementations (shared-disk and in-memory).
This allows records to be used as inputs and outputs
to steps in the workflow manager, while the actual
transportation method is abstracted away. This allows
the transmitters (references to the records) to be
transported as lightweight objects over the network
from the client to the compute side, while the actual
data is only pulled from storage to the compute node.

Co-authored-by: Dan Sava <dsav@equinor.com>
Co-authored-by: Jonas G. Drange <jond@equinor.com>
Co-authored-by: Sondre Sortland <sonso@equinor.com>
Copy link
Contributor

@jondequinor jondequinor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@sondreso sondreso merged commit 1dd180f into master Apr 14, 2021
@sondreso sondreso deleted the record-transmitting branch April 14, 2021 10:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants