QC reports require locking `index.html` which doesn't function correctly on NFS-mounted drives #4423

joshuacwnewton · 2024-04-05T15:20:54Z

Background

Our QC reports contain a table of QC entries:

Each entry in the table is stored as simple JSON data listing the subject, date, CLI script, etc. We generate individual .json files for each entry via our qc.py module.

However, when it comes to loading the data into a table in index.html, we store this JSON data directly within a JavaScript variable inside the index.html file. This means that every time we create a new QC entry using our Python script, we need to completely rewrite the index.html file, substituting the JSON data into a placeholder in the HTML template:

spinalcordtoolbox/spinalcordtoolbox/reports/qc.py

Line 613 in 1af24e3

output = template.substitute(sct_json_data=json.dumps(json_data))

spinalcordtoolbox/spinalcordtoolbox/reports/assets/index.html

Line 159 in 1af24e3

Problem

Because all new QC entries have the same destination file, concurrent writes can cause issues with corrupted/missing entries. So, we explicitly lock the index.html before writing to it.

However, in sct-pipeline/spine-park#15 (comment), we ran into some system errors when trying to lock and write to the index.html file while it was on a mounted drive.

Proposed solution

Initially, I can think of two ways we could fix this issue:

Lock and write to the index.html file in a tmpdir on a non-mounted drive, then copy the file to the destination directory when sct_run_batch is complete.
- This may make the QC report folder difficult to access during processing, as pointed out by @jcohenadad in this comment.
~~Try to eliminate the need for locking entirely.~~ EDIT: See this comment for details on why loading the JSON data using client-side JavaScript probably isn't feasible.

The text was updated successfully, but these errors were encountered:

… a newer version of SCT (#4427) When we generate QC reports, we universally update `index.html` with the copy stored in the repo, due to the behavior described in #4423. This can cause problems when, for example, `index.html` and `main.js` are both updated simultaneously between SCT versions (e.g. #4317). In such a case, if QC reports are regenerated using the newer SCT version, `index.html` will be updated with the changes, but `main.js` will not. And, since `index.html` and `main.js` are so interdependent, this can often cause broken behavior in the QC report. So, this PR ensures that the assets are updated alongside `index.html`, but only if the contents differ.

joshuacwnewton · 2024-04-10T19:07:57Z

Ah! It may be that we're dealing with an NFS mount in this situation, in which case wolph/portalocker#92 may apply here:

Locking with NFS is rather problematic, while it's not entirely impossible to do it, it absolutely kills your performance in my experience.
If at all possible I would highly recommend using Redis Locks instead: https://github.com/wolph/portalocker#redis-locks

I'll check in the other issue regarding this. :)

EDIT: They were indeed using NFS mounts!

joshuacwnewton · 2024-04-12T16:49:46Z

Assuming the issue here is NFS mounts, then maybe all we need to do is avoid locking on the NFS-mounted drive.

Actually, wait a minute here...

spinalcordtoolbox/spinalcordtoolbox/reports/qc.py

Lines 576 to 579 in a7e538c

    
           # We lock `index.html` at the start of this function so that we halt any other processes *before* they have a 
        
           # chance to generate or read any .json files. This ensues that the last process to write to index.html has read 
        
           # in all of the available .json files, preventing: 
        
           # https://github.com/spinalcordtoolbox/spinalcordtoolbox/pull/3701#discussion_r816300380

There are two distinct steps here:

Generating a new .json entry in the QC report's _json folder
Reading in all of the .json files on disk, dumping all of it into the index.html template, and writing the data to a file.

But, all that really matters is that the "reading" step is done sequentially, right? (In other words, generating each individual JSON file concurrently is OK, as long as each process waits to read in all of the JSON files to generate the new index.html file.)

In that case, do we even need to write-lock index.html? Couldn't we just use portalocker.BoundedSemaphore as a mutex?

This is very similar to threading.BoundedSemaphore but works across multiple processes and across multiple operating systems.

joshuacwnewton · 2024-04-15T15:06:41Z

I think I might have a solution that could kill 2 birds with one stone:

Reading in all of the .json files on disk, dumping all of it into the index.html template, and writing the data to a file.

What if we were to isolate this logic inside of a separate, standalone Python script (e.g. refresh_qc_entries.py)? Then, distribute it as an asset inside of the QC folder?

This would have two benefits:

Users could re-run this script when they want to refresh the QC entries, which could solve Make it possible to remove lines in a QC report #4438. (The user can delete one or more json files, then re-run the script to copy the new json data to the html file.)
When generating the QC report, we could use a mutex to only ever run one instance of refresh_qc_entries.py at a time. (This would mitigate the NFS mount issues, since the mutex isn't tied to the location of the report. Plus, the encapsulation of this separate script would ensure that we're locking the entire "read-JSON-write-HTML" cycle, without the need for open(mode='r+") tricks.)

joshuacwnewton added bug category: fixes an error in the code sct_qc context: labels Apr 5, 2024

joshuacwnewton mentioned this issue Apr 5, 2024

First pass and correct T2w segmentations sct-pipeline/spine-park#15

Open

This comment was marked as outdated.

Sign in to view

joshuacwnewton mentioned this issue Apr 9, 2024

Keep CSS/JS/etc. assets up to date when QC report is regenerated with a newer version of SCT #4427

Merged

This comment was marked as resolved.

Sign in to view

joshuacwnewton changed the title ~~QC reports require locking index.html, which can cause issues with sct_run_batch concurrent writes~~ QC reports require locking index.html which doesn't function correctly on NFS-mounted drives Apr 12, 2024

joshuacwnewton mentioned this issue Apr 12, 2024

Is this 2016 StackOverflow answer still accurate? (fh = open(p); Lock(fh) vs. with Lock(p) as fh) wolph/portalocker#94

Closed

joshuacwnewton self-assigned this Apr 15, 2024

joshuacwnewton mentioned this issue Apr 15, 2024

Isolate index.html-writing code into reusable script + replace Lock with Semaphore-based mutex #4439

Merged

joshuacwnewton added this to the 6.3 milestone Apr 16, 2024

mguaypaq closed this as completed in 21031b2 Apr 23, 2024

mguaypaq closed this as completed in #4439 Apr 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

QC reports require locking `index.html` which doesn't function correctly on NFS-mounted drives #4423

QC reports require locking `index.html` which doesn't function correctly on NFS-mounted drives #4423

joshuacwnewton commented Apr 5, 2024 •

edited

This comment was marked as outdated.

This comment was marked as resolved.

This comment was marked as resolved.

joshuacwnewton commented Apr 10, 2024 •

edited

This comment was marked as resolved.

joshuacwnewton commented Apr 12, 2024 •

edited

joshuacwnewton commented Apr 15, 2024 •

edited

QC reports require locking index.html which doesn't function correctly on NFS-mounted drives #4423

QC reports require locking index.html which doesn't function correctly on NFS-mounted drives #4423

Comments

joshuacwnewton commented Apr 5, 2024 • edited

Background

Problem

Proposed solution

This comment was marked as outdated.

This comment was marked as resolved.

This comment was marked as resolved.

joshuacwnewton commented Apr 10, 2024 • edited

This comment was marked as resolved.

joshuacwnewton commented Apr 12, 2024 • edited

joshuacwnewton commented Apr 15, 2024 • edited

QC reports require locking `index.html` which doesn't function correctly on NFS-mounted drives #4423

QC reports require locking `index.html` which doesn't function correctly on NFS-mounted drives #4423

joshuacwnewton commented Apr 5, 2024 •

edited

joshuacwnewton commented Apr 10, 2024 •

edited

joshuacwnewton commented Apr 12, 2024 •

edited

joshuacwnewton commented Apr 15, 2024 •

edited