
Prototype for a shared memory mask container #1005

Draft: wants to merge 1 commit into master
Conversation

@uellue (Member) commented Mar 26, 2021

This prototype demonstrates the core functionality of a shared memory MaskContainer.
At this stage, it just implements a method to share computed_masks.

This is not included in MaskContainer yet, but implemented as stand-alone testing code with many
tracing `print()`s. Sharing tile slices should work analogously to computed_masks.

  • The name used to address objects is a function of mask_factories
  • Determine the likely availability of pre-computed masks through a canary object
  • Fall back to local masks after a timeout -- to be adjusted
  • Supports both dense and sparse computed_masks
  • Supports additional metadata as JSON
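The first two points might be sketched like this. The naming scheme and function names are illustrative, not taken from the prototype; the idea is only that every worker derives the same shared memory name from the same key, and that attaching to a canary block signals that pre-computed masks are probably available:

```python
import hashlib
from multiprocessing import shared_memory

def shm_name(factories_key: bytes, prefix: str = "ltm") -> str:
    # Derive a deterministic shared memory name from a key that
    # identifies the mask_factories, so every worker computes the
    # same address without any coordination.
    return prefix + "-" + hashlib.sha256(factories_key).hexdigest()[:16]

def canary_exists(canary_name: str) -> bool:
    # If the canary block can be attached, another worker has likely
    # already published the pre-computed masks.
    try:
        shm = shared_memory.SharedMemory(name=canary_name, create=False)
    except FileNotFoundError:
        return False
    shm.close()
    return True
```

Deriving the name from the key rather than broadcasting it is what makes the scheme self-organizing: no worker needs to be told where to look.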

Lessons learned:

Lifetime management is a bit tricky: as soon as the reference count for the buffer drops to zero,
the shared memory can apparently be purged immediately. Balancing several interdependent objects
requires making sure that objects will most likely stay alive, and having a fallback
in case a race condition occurs. In this prototype, a separate process is started whose only
job is to keep references around so that objects are not purged between processing
"partitions" (run_for_partition() emulated with multiprocessing).
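A minimal sketch of such a keep-alive loop, assuming the names of published buffers arrive on a JoinableQueue and a None sentinel ends the loop. This is not the prototype's actual code, only an illustration of the reference-holding idea:

```python
from multiprocessing import JoinableQueue, shared_memory

def keep_alive(queue):
    # Hold references to published shared memory blocks so their
    # reference count never drops to zero between partitions. In the
    # prototype a dedicated process runs a loop like this; a None
    # sentinel ends it and releases everything.
    refs = []
    while True:
        name = queue.get()
        try:
            if name is None:
                break
            refs.append(shared_memory.SharedMemory(name=name, create=False))
        finally:
            queue.task_done()
    for shm in refs:
        shm.close()
```

Calling task_done() even on the sentinel keeps the queue's join() accounting consistent, so producers blocked in queue.join() are always released.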

A race condition between the "keep-alive" process and object creation can occur.
join()ing the queue after the data has been stored ensures that the buffers are safely
referenced before variables fall out of scope or worker processes terminate.
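The producer side of that handshake might look like this; publish() is a hypothetical helper, not from the prototype:

```python
def publish(queue, shm):
    # Announce the buffer to the keep-alive process, then block until
    # it has acknowledged (via task_done()) that it holds a reference.
    # Only after queue.join() returns is it safe to let the local
    # handle fall out of scope.
    queue.put(shm.name)
    queue.join()
    shm.close()
```

Without the join(), the producer could exit and drop the last reference before the keeper ever attaches, which is exactly the race described above.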

The only IPC/orchestration in this example is a queue to keep track of the buffer names so
that a reference can be kept. Other than that, this is self-organizing and seems to handle a
"thundering herd" quite well: one process is first and creates the canary, the others wait
for it to finish the remaining items and then pick them up. For a real MaskContainer this
could be improved by making different workers request different tiles from the MaskContainer
first, so that the tile slice calculation is parallelized.
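Combined with the timeout fallback from the feature list, a waiting worker could be sketched like this. The helper name, timeout, and polling interval are illustrative:

```python
import time
from multiprocessing import shared_memory

def attach_or_compute(name, compute_locally, timeout=5.0, poll=0.01):
    # Poll for the shared buffer until the deadline. If no other worker
    # has published it by then, fall back to computing the masks
    # locally, so a lost race never deadlocks a worker.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            return shared_memory.SharedMemory(name=name, create=False)
        except FileNotFoundError:
            time.sleep(poll)
    return compute_locally()
```

The fallback makes the shared memory path purely an optimization: correctness never depends on another process winning the race.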

Contributor Checklist:

Reviewer Checklist:

  • /azp run libertem.libertem-data passed

@uellue added the enhancement (New feature or request) and WIP labels on Mar 26, 2021
@uellue (Member, Author) commented Mar 26, 2021

Refs #335

@codecov bot commented Mar 26, 2021

Codecov Report

Merging #1005 (aa26840) into master (6ebfe4c) will decrease coverage by 7.22%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #1005      +/-   ##
==========================================
- Coverage   69.02%   61.79%   -7.23%     
==========================================
  Files         262      262              
  Lines       12063    12063              
  Branches     1655     1655              
==========================================
- Hits         8326     7454     -872     
- Misses       3417     4329     +912     
+ Partials      320      280      -40     
Impacted Files Coverage Δ
src/libertem/io/dataset/dm.py 27.86% <0.00%> (-61.48%) ⬇️
src/libertem/io/dataset/mrc.py 38.46% <0.00%> (-48.08%) ⬇️
src/libertem/io/dataset/ser.py 33.12% <0.00%> (-47.86%) ⬇️
src/libertem/io/dataset/frms6.py 26.21% <0.00%> (-46.16%) ⬇️
src/libertem/io/dataset/empad.py 39.55% <0.00%> (-44.78%) ⬇️
src/libertem/io/dataset/k2is.py 32.55% <0.00%> (-43.76%) ⬇️
src/libertem/io/dataset/mib.py 24.10% <0.00%> (-41.10%) ⬇️
src/libertem/io/dataset/seq.py 45.73% <0.00%> (-29.27%) ⬇️
src/libertem/io/dataset/blo.py 54.05% <0.00%> (-28.83%) ⬇️
src/libertem/utils/async_utils.py 65.00% <0.00%> (-25.00%) ⬇️
... and 12 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6ebfe4c...aa26840.

@uellue (Member, Author) commented Mar 26, 2021

@sk1p Do you have your "plasma" code somewhere as a prototype? I've approached the issue bottom-up here, starting with the requirements of MaskContainer and how to store and create the necessary data in byte buffers. This uses Python shared memory for now, which works on Windows. The API seems to be reasonably similar to Plasma, so one can probably support both.

@sk1p (Member) commented Mar 26, 2021

> @sk1p Do you have your "plasma" code somewhere as a prototype? I've approached the issue bottom-up here, starting with the requirements of MaskContainer and how to store and create the necessary data in byte buffers. This uses Python shared memory for now, which works on Windows. The API seems to be reasonably similar to Plasma, so one can probably support both.

See #1006

@uellue (Member, Author) commented Mar 26, 2021

> > @sk1p Do you have your "plasma" code somewhere as a prototype? I've approached the issue bottom-up here, starting with the requirements of MaskContainer and how to store and create the necessary data in byte buffers. This uses Python shared memory for now, which works on Windows. The API seems to be reasonably similar to Plasma, so one can probably support both.
>
> See #1006

Thx! :-) These look like they will complement each other nicely. Probably after Easter. :-)
