
Prototype for a shared memory mask container #1005

Draft: wants to merge 1 commit into master
Conversation

@uellue (Member) commented Mar 26, 2021

This prototype demonstrates the core functionality of a shared memory MaskContainer.
At this stage, it just implements a method to share computed_masks.

This is not included in MaskContainer yet, but implemented as stand-alone testing code with many
tracing `print()`s. Sharing tile slices should work analogously to computed_masks.

  • The name used to address objects is a function of mask_factories
  • Determine the likely availability of pre-computed masks through a canary object
  • Fall back to local masks after a timeout -- to be adjusted
  • Supports both dense and sparse computed_masks
  • Supports additional metadata as JSON
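The first two points might be sketched like this. The naming scheme and function names are illustrative, not taken from the prototype; the idea is only that every worker derives the same shared memory name from the same key, and that attaching to a canary block signals that pre-computed masks are probably available:

```python
import hashlib
from multiprocessing import shared_memory

def shm_name(factories_key: bytes, prefix: str = "ltm") -> str:
    # Derive a deterministic shared memory name from a key that
    # identifies the mask_factories, so every worker computes the
    # same address without any coordination.
    return prefix + "-" + hashlib.sha256(factories_key).hexdigest()[:16]

def canary_exists(canary_name: str) -> bool:
    # If the canary block can be attached, another worker has likely
    # already published the pre-computed masks.
    try:
        shm = shared_memory.SharedMemory(name=canary_name, create=False)
    except FileNotFoundError:
        return False
    shm.close()
    return True
```

Deriving the name from the key rather than broadcasting it is what makes the scheme self-organizing: no worker needs to be told where to look.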

Lessons learned:

Lifetime management is a bit tricky: as soon as the reference count for the buffer drops to zero,
the shared memory can apparently be purged immediately. Balancing several interdependent objects
requires making sure that objects will most likely stay alive, and having a fallback
in case a race condition occurs. In this prototype, a separate process is started whose only
job is to keep references around so that objects are not purged between processing
"partitions" (run_for_partition() emulated with multiprocessing).
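A minimal sketch of such a keep-alive loop, assuming the names of published buffers arrive on a JoinableQueue and a None sentinel ends the loop. This is not the prototype's actual code, only an illustration of the reference-holding idea:

```python
from multiprocessing import JoinableQueue, shared_memory

def keep_alive(queue):
    # Hold references to published shared memory blocks so their
    # reference count never drops to zero between partitions. In the
    # prototype a dedicated process runs a loop like this; a None
    # sentinel ends it and releases everything.
    refs = []
    while True:
        name = queue.get()
        try:
            if name is None:
                break
            refs.append(shared_memory.SharedMemory(name=name, create=False))
        finally:
            queue.task_done()
    for shm in refs:
        shm.close()
```

Calling task_done() even on the sentinel keeps the queue's join() accounting consistent, so producers blocked in queue.join() are always released.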

A race condition between the "keep-alive" process and object creation can occur.
join()ing the queue after the data has been stored ensures that the buffers are safely
referenced before variables fall out of scope or worker processes terminate.
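The producer side of that handshake might look like this; publish() is a hypothetical helper, not from the prototype:

```python
def publish(queue, shm):
    # Announce the buffer to the keep-alive process, then block until
    # it has acknowledged (via task_done()) that it holds a reference.
    # Only after queue.join() returns is it safe to let the local
    # handle fall out of scope.
    queue.put(shm.name)
    queue.join()
    shm.close()
```

Without the join(), the producer could exit and drop the last reference before the keeper ever attaches, which is exactly the race described above.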

The only IPC/orchestration in this example is a queue to keep track of the buffer names so
that a reference can be kept. Other than that, this is self-organizing and seems to handle a
"thundering herd" quite well: one process is first and creates the canary, the others wait
for it to finish the remaining items and then pick them up. For a real MaskContainer this
could be improved by making different workers request different tiles from the MaskContainer
first, so that the tile slice calculation is parallelized.
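Combined with the timeout fallback from the feature list, a waiting worker could be sketched like this. The helper name, timeout, and polling interval are illustrative:

```python
import time
from multiprocessing import shared_memory

def attach_or_compute(name, compute_locally, timeout=5.0, poll=0.01):
    # Poll for the shared buffer until the deadline. If no other worker
    # has published it by then, fall back to computing the masks
    # locally, so a lost race never deadlocks a worker.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            return shared_memory.SharedMemory(name=name, create=False)
        except FileNotFoundError:
            time.sleep(poll)
    return compute_locally()
```

The fallback makes the shared memory path purely an optimization: correctness never depends on another process winning the race.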

Contributor Checklist:

Reviewer Checklist:

  • /azp run libertem.libertem-data passed

@uellue added the enhancement (New feature or request) and WIP labels on Mar 26, 2021
@uellue (Member, Author) commented Mar 26, 2021

Refs #335

@codecov bot commented Mar 26, 2021

Codecov Report

Merging #1005 (aa26840) into master (6ebfe4c) will decrease coverage by 7.22%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #1005      +/-   ##
==========================================
- Coverage   69.02%   61.79%   -7.23%     
==========================================
  Files         262      262              
  Lines       12063    12063              
  Branches     1655     1655              
==========================================
- Hits         8326     7454     -872     
- Misses       3417     4329     +912     
+ Partials      320      280      -40     
Impacted Files Coverage Δ
src/libertem/io/dataset/dm.py 27.86% <0.00%> (-61.48%) ⬇️
src/libertem/io/dataset/mrc.py 38.46% <0.00%> (-48.08%) ⬇️
src/libertem/io/dataset/ser.py 33.12% <0.00%> (-47.86%) ⬇️
src/libertem/io/dataset/frms6.py 26.21% <0.00%> (-46.16%) ⬇️
src/libertem/io/dataset/empad.py 39.55% <0.00%> (-44.78%) ⬇️
src/libertem/io/dataset/k2is.py 32.55% <0.00%> (-43.76%) ⬇️
src/libertem/io/dataset/mib.py 24.10% <0.00%> (-41.10%) ⬇️
src/libertem/io/dataset/seq.py 45.73% <0.00%> (-29.27%) ⬇️
src/libertem/io/dataset/blo.py 54.05% <0.00%> (-28.83%) ⬇️
src/libertem/utils/async_utils.py 65.00% <0.00%> (-25.00%) ⬇️
... and 12 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6ebfe4c...aa26840.

@uellue (Member, Author) commented Mar 26, 2021

@sk1p Do you have your "plasma" code somewhere as a prototype? I've approached the issue bottom-up here, starting with the requirements of MaskContainer and how to store and create the necessary data in byte buffers. This uses Python shared memory for now, which works on Windows. The API seems to be reasonably similar to Plasma, so one can probably support both.

@sk1p (Member) commented Mar 26, 2021

> @sk1p Do you have your "plasma" code somewhere as a prototype? I've approached the issue bottom-up here, starting with the requirements of MaskContainer and how to store and create the necessary data in byte buffers. This uses Python shared memory for now, which works on Windows. The API seems to be reasonably similar to Plasma, so one can probably support both.

See #1006

@uellue (Member, Author) commented Mar 26, 2021

> > @sk1p Do you have your "plasma" code somewhere as a prototype? I've approached the issue bottom-up here, starting with the requirements of MaskContainer and how to store and create the necessary data in byte buffers. This uses Python shared memory for now, which works on Windows. The API seems to be reasonably similar to Plasma, so one can probably support both.
>
> See #1006

Thx! :-) These look like they will complement each other nicely. Probably after Easter. :-)
