backend: implement event file replacement detection #5529

nfelt · 2022-01-25T00:11:01Z

This implements an optional mode in our Python EventFileLoader that attempts to detect and handle the case where an event file is replaced entirely by a new version containing additional data, rather than having the new data appended to the existing file (the normal behavior we expect).

This situation occurs, for example, when using rsync as described in #349. When run without the --inplace option, rsync transfers data to a temporary file and then swaps it in for the final file, which means that TensorBoard only ever sees the first version of that file and never shows any updates to it until it's restarted. A similar issue happens for internal users - Googlers, please see b/201113906 for context.
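To illustrate why a reader holding an open handle misses the swapped-in file, here is a minimal sketch. It assumes POSIX semantics, where an open handle keeps pointing at the original inode after the path is replaced. The function name and file names are hypothetical, and `os.replace` stands in for rsync's temporary-file swap; this is not TensorBoard code.

```python
import os
import tempfile


def demo_replaced_file_is_invisible_to_open_handle():
    """Returns (bytes seen before swap, bytes seen after swap, size on disk)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "events.out")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"first")

        reader = open(path, "rb")
        try:
            seen_before = reader.read()

            # Simulate rsync without --inplace: write the full new version
            # to a temporary file, then atomically swap it into place.
            tmp_path = path + ".tmp"
            with open(tmp_path, "wb") as f:
                f.write(b"first+second")
            os.replace(tmp_path, path)

            # The old handle still refers to the original (now unlinked)
            # file, so it sees no new bytes even though the path grew.
            seen_after = reader.read()
            size_on_disk = os.stat(path).st_size
        finally:
            reader.close()
        return seen_before, seen_after, size_on_disk
```

The open handle reports no new data, while `stat()` on the path reports the larger size of the replacement file; that mismatch is the signal the detection mode relies on.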

The new logic uses stat() to monitor the size of the file; if it grows, we re-open the underlying iterator for that filename, resuming at the prior read offset. Care has been taken to avoid adding too much filesystem I/O overhead, but this new mode may be more expensive (since it potentially re-opens the file many times), which is why it is left off by default, at least for now.
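A rough sketch of the stat-and-reopen idea follows, operating on raw bytes rather than TFRecords. The class `ReplacementAwareReader`, its methods, and the helper `simulate_replacement` are hypothetical names invented for illustration; the actual EventFileLoader change in this PR may differ in structure and in when it calls stat().

```python
import os
import tempfile


class ReplacementAwareReader:
    """Hypothetical reader that reopens its path when stat() shows growth
    that the currently open handle cannot see."""

    def __init__(self, path):
        self._path = path
        self._file = open(path, "rb")
        self._offset = 0  # number of bytes consumed so far
        self.reopen_count = 0

    def read_new(self):
        data = self._file.read()
        if not data:
            # Only stat() once the open handle is exhausted, to limit
            # the extra filesystem I/O.
            size = os.stat(self._path).st_size
            if size > self._offset:
                # The path now holds more data than we have read, so the
                # file was probably replaced. Reopen and resume at the
                # prior read offset.
                self._file.close()
                self._file = open(self._path, "rb")
                self._file.seek(self._offset)
                self.reopen_count += 1
                data = self._file.read()
        self._offset += len(data)
        return data

    def close(self):
        self._file.close()


def simulate_replacement():
    """Replaces a file wholesale and reports what the reader observes."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "events.out")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"abc")
        reader = ReplacementAwareReader(path)
        try:
            first = reader.read_new()
            tmp_path = path + ".tmp"
            with open(tmp_path, "wb") as f:
                f.write(b"abcdef")  # old contents plus additional data
            os.replace(tmp_path, path)
            second = reader.read_new()  # triggers a reopen
            third = reader.read_new()   # no growth, so no reopen
            return first, second, third, reader.reopen_count
        finally:
            reader.close()
```

Checking the size only after the open handle runs dry keeps the common append-only case free of extra stat() calls, at the cost of one stat() per empty read. Note that this sketch assumes the replacement file has the old contents as a prefix; it does not verify that.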

This functionality requires that the underlying iterator supports re-opening, which is under development for tf.compat.v1.summary.tf_record_iterator() (Googlers, please see cl/423871645). It is not currently supported for the no-TF mode stub iterator implementation. I suspect, but have not confirmed, that the stub implementation is not affected by the original issue, since it has a simpler (albeit likely less efficient) implementation that just re-opens files each time it reads. Also, since this code change is currently only in our Python event loading code, it has no effect on RustBoard (our Rust data server). If there's demand, however, it should be possible to port this logic.

Note that there is currently no way to activate this mode - #5530 is the followup PR with the CLI flag and plumbing.

Tested: ran TensorBoard with these changes enabled (see followup PR) and confirmed that it now picks up new data when replacing an event file entirely rather than appending to it, writes the appropriate log messages, etc.

nfelt · 2022-01-25T00:29:53Z

FYI @bileschi - the CI test failures are expected until tf-nightly contains the changes to tf_record_iterator (see cl/423871645), but I thought I'd send it now anyway in case you have questions about the approach. I can re-ping when CI is fully passing.

bileschi

Very nice tests

backend: implement event file replacement detection

b3965bc

nfelt added the core:backend label Jan 25, 2022

nfelt marked this pull request as draft January 25, 2022 00:11

nfelt mentioned this pull request Jan 25, 2022

core: add --detect_file_replacement flag and plumbing #5530

Merged

nfelt requested a review from bileschi January 25, 2022 00:27

nfelt marked this pull request as ready for review January 25, 2022 00:27

bileschi approved these changes Jan 25, 2022

nfelt merged commit fe572f6 into master Jan 31, 2022

nfelt deleted the nfelt-filereplace-3 branch January 31, 2022 23:47

nfelt mentioned this pull request Feb 1, 2022

core: add --detect_file_replacement flag and plumbing (redo) #5546

Merged

nfelt mentioned this pull request Mar 4, 2022

Fails to show new data when event files are replaced (e.g. w/ rsync) #349

Open

yatbear pushed a commit to yatbear/tensorboard that referenced this pull request Mar 27, 2023

backend: implement event file replacement detection (tensorflow#5529)

05935bd

dna2github pushed a commit to dna2fork/tensorboard that referenced this pull request May 1, 2023

backend: implement event file replacement detection (tensorflow#5529)

a15d045
