Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mailbox async internal process communication (was intents) #3568

Merged
merged 39 commits into from Aug 17, 2022

Conversation

raubitsj
Copy link
Member

@raubitsj raubitsj commented Apr 25, 2022

WB-11186

Description

Add an async communication path utilizing mailbox delivery of responses.

Problems to solve:

  • PROGRESS: Progress info during long running operations
    Currently the user does not have any indication that an operation is long running
  • IDEMPOTENT: Need idempotent messages
    If we need to resend a message, the messages can back up on each other and the responses
    come in only for the requests that issued them even though those requestors have timed out.
    We should coalesce requests in the handler, and coalesce responses in the mailbox
  • TIMEOUTS: Dynamic timeouts based on multiple factors
    Currently we choose a timeout at request time. There are circumstances where we might want a more dynamic
    timeout depending on the conditions. For example if certain network errors are happening, we might want a longer
    timeout
  • CANCEL: would be nice to be able to cancel operations after timeouts
    If we want to timeout an operation and do something different, we will need to cancel the previous operation. For
    example if we want to try to use the network but then fall back to offline mode.

Phases:

  • Phase1: introduce mailbox, use for wandb.init()
    Solves PROGRESS (static)
    Solves IDEMPOTENT (for the reply only)
    Solves TIMEOUT (there will probably me more advanced versions)
  • Phase2: use for run.summary and run.finish(), improve messaging
    Solve: IDEMPOTENT (for the request)
    Solve: PROGRESS (more dynamic using another mailbox and a status message)
  • Phase3: get rid of communicate for all operations?
  • Phase4: support timing out hanging network operations (need to mess with retry.py)

Solution:

sequenceDiagram
  participant user
  participant mailbox
  participant router
  participant handler
  participant sender
  user ->> handler: wandb.init() -- deliver_run()
  handler ->> sender: dispatch
  sender ->> router: run_result
  router ->> mailbox: mailbox.deliver()
  user -->> mailbox: handle.wait()
  mailbox ->> user: run_result

Introduce mailbox and new deliver verb.

Deliver sends a message asynchronously to the internal process. Deliveries are marked with a mailbox slot where responses will eventually be placed.

The user can wait on message delivery in a mailbox using wait(). Normally the mailbox will be released when the message is received by the user.

Tasks:

  • Implement gRPC compatibility
  • Add some mailbox tests
  • Update PR description

@codecov
Copy link

codecov bot commented Apr 25, 2022

Codecov Report

Merging #3568 (d951b62) into master (ccd51c9) will increase coverage by 0.07%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3568      +/-   ##
==========================================
+ Coverage   82.60%   82.68%   +0.07%     
==========================================
  Files         255      256       +1     
  Lines       32350    32508     +158     
==========================================
+ Hits        26723    26879     +156     
- Misses       5627     5629       +2     
Flag Coverage Δ
functest 55.00% <98.32%> (+0.40%) ⬆️
unittest 73.42% <88.26%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
wandb/sdk/backend/backend.py 95.91% <100.00%> (+0.08%) ⬆️
wandb/sdk/interface/interface.py 96.73% <100.00%> (+0.03%) ⬆️
wandb/sdk/interface/interface_grpc.py 65.09% <100.00%> (+0.80%) ⬆️
wandb/sdk/interface/interface_queue.py 96.87% <100.00%> (+0.20%) ⬆️
wandb/sdk/interface/interface_shared.py 85.92% <100.00%> (+0.75%) ⬆️
wandb/sdk/interface/interface_sock.py 89.79% <100.00%> (+0.43%) ⬆️
wandb/sdk/interface/router.py 94.36% <100.00%> (+0.42%) ⬆️
wandb/sdk/interface/router_queue.py 100.00% <100.00%> (ø)
wandb/sdk/interface/router_sock.py 100.00% <100.00%> (ø)
wandb/sdk/internal/sender.py 90.97% <100.00%> (ø)
... and 9 more

@raubitsj raubitsj changed the title POC: Add "intents" capability PoC: Add "intents" capability May 19, 2022
@raubitsj raubitsj self-assigned this Aug 5, 2022
@raubitsj raubitsj changed the title PoC: Add "intents" capability PoC: Add "intents" capability (phase 1) Aug 10, 2022
@raubitsj raubitsj changed the title PoC: Add "intents" capability (phase 1) Add mailbox async internal process communication (was intents) Aug 12, 2022
@raubitsj raubitsj marked this pull request as ready for review August 12, 2022 15:54
@raubitsj raubitsj requested a review from a team August 12, 2022 15:54
@dmitryduev
Copy link
Member

dmitryduev commented Aug 12, 2022

@raubitsj
Copy link
Member Author

Amusing error in https://app.circleci.com/pipelines/github/wandb/wandb/14813/workflows/22f79a23-3efd-4b9e-be42-7763c87ec87a/jobs/265497 :) Mind looking into it please?

yep. that was in the TODO.. i just added a quick change to emulate async grpc handling with the mailbox:
b48652d

@raubitsj raubitsj added this to the sdk-2022-09.1 milestone Aug 13, 2022
Copy link
Contributor

@kptkin kptkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

@dmitryduev dmitryduev enabled auto-merge (squash) August 17, 2022 08:06
Copy link
Member

@dmitryduev dmitryduev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 🔥 🔥

wandb/sdk/wandb_init.py Show resolved Hide resolved
wandb/sdk/interface/router.py Show resolved Hide resolved
@dmitryduev dmitryduev merged commit 9a94062 into master Aug 17, 2022
@dmitryduev dmitryduev deleted the jhr-poc-intents-1 branch August 17, 2022 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants