-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Single threaded alternative processor #773
Open
JelleAalbers
wants to merge
10
commits into
AxFoundation:master
Choose a base branch
from
JelleAalbers:single_thread
base: master
Could not load branches
Branch not found: {{ refName }}
Could not load tags
Nothing to show
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
JelleAalbers
changed the title
Single threaded alterantive processor
Single threaded alternative processor
Nov 2, 2023
JelleAalbers
force-pushed
the
single_thread
branch
from
November 20, 2023 12:23
bc4731a
to
c40755e
Compare
JelleAalbers
force-pushed
the
single_thread
branch
from
November 20, 2023 13:39
1bfb967
to
02f599f
Compare
@JelleAalbers big thanks for the deep deep PR! We will review before next round of heavy low-level processing, for which I think this PR will mostly benefit about. |
for more information, see https://pre-commit.ci
JelleAalbers
force-pushed
the
single_thread
branch
from
January 30, 2024 14:59
02f599f
to
fbc34d2
Compare
Hey @JelleAalbers , can I resolve the conflicts and push to your forked repo? |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What is the problem / what does the code in this PR do
This adds a single-threaded alternative processor backend that avoids the mailbox system and uses less memory.
Strax has a custom-built concurrency backend (the 'mailbox system'). It works, but it has problems and we hope to eventually replace it. @jmosbacher has done much work towards this; see also the discussion in #81.
A single-threaded processor won't work for the DAQ, but it could help reprocessing, analysis, and debugging:
Can you briefly describe how it works?
I started from a commit from Yossi's #410 to allow processors to be selected at runtime by the context. Currently
BaseProcessor
is a bit empty, we can see if there is more we can generalize to there.The
SingleThreadProcessor
is an alternative to theThreadedMailboxProcessor
. To keep the processor classes comparable, I put most of the mailbox-replacing logic into aPostOffice
class. There is only onePostOffice
instance per processor, it handles all the data types ('topics'), and it is much simpler than a single mailbox since it needs no threading or locking. We could eventually split this off into a separate package, maybe together with mailboxes.The trickiest part was dealing with rechunking in savers. The current code assumed it had its own thread available to independently pull on an iterator in
save_from
. In single-threaded processing we instead have to send chunks to_save_chunk
individually (as we do in multiprocessing), so then the rechunking logic is skipped. I thus factored it out into a separateRechunker
class and wrapped that around the Saver when we are single-threading. Might be useful to have the rechunking logic separate anyway.For testing, I let some tests in test_core run on both processors, and there are some asserts. Maybe you'd like to see some dedicated unit tests and docs as well, let me know.
The default processor is not changed, you have to add
processor='single_thread'
to your get_array/get_df/etc call to switch to the single-threaded processor. I would propose we test it in some reprocessing jobs first, if it works, we could then make it the default formax_workers=1
.Can you give a minimal working example (or illustrate with a figure)?
This shows the mprof output for the full processing of a tiny (~90 second) background run, starting from lz4 raw records on my laptop, with all needed online resources already downloaded. First for the current mailbox processing:
st.make('026195', 'event_basics')
and for single-threaded processing:
st.make('026195', 'event_basics', processor='single_thread')
For reviewing, note the number of lines changed is deceptively large. The stuff in
strax.processor
basically just got moved tostrax.processors.threaded_mailbox
. I keptstrax.mailbox
in place since mailboxes also used directly in the rechunker script. (Andstrax.processor
is now present only to keep an old straxen test running that imports from it directly.)