different hashing in release_feedback and fetch_feedback #1712

ronligt · 2022-12-14T13:38:26Z

Operating system

Ubuntu 20.04.5 LTS

`nbgrader --version`

0.8.0

`jupyterhub --version` (if used with JupyterHub)

2.3.1

`jupyter notebook --version`

6.4.12

`jupyter lab --version`

3.4.6

Expected behavior

same result from notebook_hash in release_feedback.py and fetch_feedback.py

Actual behavior

different results, resulting in [FetchFeedbackApp | WARNING] Could not find feedback for ... during nbgrader fetch_feedback --debug

Steps to reproduce the behavior

As instructor do nbgrader release_feedback --debug <assignment>. Then as student do nbgradere fetch_feedback --debug <assignment>.

Explanation

We are using nbgrader with jupyterhub and jupyter lab for one of our courses much to our liking. For several months all was working well but very recently we stumbled on the following incident. Two students (of 20) are not able to fetch the feedback of one particular assignment.

After some debugging I found that the hash in release_feedback.py was generated with different values than in fetch_feedback.py:

nbgrader release_feedback --debug 'Assignment 3 Wavefront sensing

[ReleaseFeedbackApp | DEBUG] nbfile is: /home/grader-ap3122/ap3122/submitted/<username>/Assignment 3 Wavefront sensing/Assignment Wavefront aberrations.ipynb
[ReleaseFeedbackApp | DEBUG] Unique key is: ap3122+Assignment 3 Wavefront sensing+Assignment Wavefront aberrations+<username>+2022-11-11 15:17:36.574853 UTC
[ReleaseFeedbackApp | DEBUG] checksum is: f9990f2f5f029af5091cbc417c5a591e
[ReleaseFeedbackApp | DEBUG] dest is: /usr/local/share/nbgrader/exchange/ap3122/feedback/f9990f2f5f029af5091cbc417c5a591e.html

nbgrader fetch_feedback --debug 'Assignment 3 Wavefront sensing

[FetchFeedbackApp | DEBUG] notebook is: /data/jupyterhub/home/<username>/.local/share/jupyter/nbgrader_cache/ap3122/<username>+Assignment 3 Wavefront sensing+2022-11-11 15:17:36.574853 UTC/Assignment Wavefront aberrations.ipynb
[FetchFeedbackApp | DEBUG] Unique key is: ap3122+Assignment 3 Wavefront sensing+Assignment Wavefront aberrations+<username>+2022-11-11 15:17:36.574853 UTC
[FetchFeedbackApp | DEBUG] nb_hash is: 209d0cfb1a1a05ba8d64273f2fd04271
[FetchFeedbackApp | DEBUG] feedbackpath is: /usr/local/share/nbgrader/exchange/ap3122/feedback/209d0cfb1a1a05ba8d64273f2fd04271.html

the student name has been replaced with <username>

I've added some debugging code to the two python-scripts logging the values of the appropriate variables

Clearly the nbfile and notebook have very different value resulting in different hash values. This is reflected in the different values of dest and feedbackpath which should be the same.

The text was updated successfully, but these errors were encountered:

perllaghu · 2022-12-14T14:12:20Z

Are the two users on the same machine, and are they using the same versions of libraries?

pip list will generate a list of libraries & their versions.

ronligt · 2022-12-14T14:32:32Z

All students are using our jupyterhub set up with nbgrader (https://nbgrader.readthedocs.io/en/stable/configuration/jupyterhub_config.html#example-use-case-one-class-multiple-graders). So in a way they're on the same machine using the same versions of libraries. As far as I can tell they have not installed own pip packages (pip install --user).

It is strange that these two users did not have this problem with earlier assignments and were able to fetch the feedback. Also it is strange that this problem does not occur with other users with the same assignment.

perllaghu · 2022-12-14T14:45:35Z

Yeah.... we have 1,000's of users [though we're running docker images, so there's much less chance of variation], and have been running for several years now.

It's interesting that it's only some students, and only for one assignment - which implies it's not a general problem.

The code that creates the checksum is here: https://github.com/jupyter/nbgrader/blob/main/nbgrader/exchange/default/release_feedback.py#L78 (in the master branch...) which is actually

nbgrader/nbgrader/utils.py

Lines 554 to 559 in eb1c19a

    
           def notebook_hash(path, unique_key=None): 
        
               m = hashlib.md5() 
        
               m.update(open(path, 'rb').read()) 
        
               if unique_key: 
        
                   m.update(to_bytes(unique_key)) 
        
               return m.hexdigest()

.... so it's an MD5 hashlib value - hashlib is a core python library, so should be consistent - for a given version of python :)

ronligt · 2022-12-14T14:55:46Z

Thank you very much for your help @perllaghu !

I also agree that hashlib is consistent. I think the problem lies in the input of notebook_hash:

for release_feedback.py this is

checksum = notebook_hash(nbfile, unique_key)

for fetch_feedback.py this is

nb_hash = notebook_hash(notebook, unique_key)

I've checked the values for both python-script and the unique_key is the same, but nbfile and notebook have different values:

nbfile: /home/grader-ap3122/ap3122/submitted/<username>/Assignment 3 Wavefront sensing/Assignment Wavefront aberrations.ipynb
notebook: /data/jupyterhub/home/<username>/.local/share/jupyter/nbgrader_cache/ap3122/<username>+Assignment 3 Wavefront sensing+2022-11-11 15:17:36.574853 UTC/Assignment Wavefront aberrations.ipynb

therefor the notebook_hash return values are different. Hence the fetch_feedback.py script cannot find the right feedback file with the filename generated in the release_feedback.py script.

perllaghu · 2022-12-14T15:34:52Z

Ah... my apologies - I miss-read the original.

Let me have a look-see - I would be surprised if this was not just two different names for the same string... but I'll check

perllaghu · 2022-12-15T09:32:24Z

OK... I can figure why they are different

nbfile is the path to the file the instructor collected (they can't "see" the student directory)
notebook is the path to the file the student submitted (they can't "see" the instructor directory)

The reason I've not seen any behaviour like this is I use an external exchange, and the plugins don't use hashes.

I think we [collectively] need to figure out if the full path is needed, or if the components for the string the hash is based on can be assembled on specific components (eg <username>/<assignment>/<.ipynb_file>).... and/or why the inbound path couldn't be used.

ronligt · 2022-12-20T12:16:28Z

Dear @perllaghu, I will have to put this issue on hold. Your last comment made me look better at the code in utils.py and I realised the hashing has nothing to do with the filenames that are passed to notebook_hash() (or at least I think so...). As it turns out it's more likely someone edited manually the submitted notebook. Hence the hashes differed. I will check further now. If needed I will close the issue.

Again thank you very much for your help!

ykazakov · 2022-12-21T16:05:57Z

@ronligt We have experienced a similar problem, which could be traced to the following situation:

students produce corrupted notebooks, e.g., by deleting or duplicating answer cells
nbgrader refuses to auto-grade such submissions, so instructors "fix" the submitted versions of the students (copied from the exchange)
the feedback is generated against the modified submission, so students could not fetch feedback because of the different hash value, which is a part of the feedback file name (students are not permitted to list files in the exchange directory, but if they "know" the file name of the feedback, they can copy it)

LukasMueller187 · 2023-05-23T12:13:01Z

We are also running in this problem several times. With our latest assignment we did introduce a canvas that will break the feedback fetching completly due to file changes during autograding.

Solving this issue like @tuncbkose would be appreciated. However I do think that brute-forceing the timestamp to get a foreign student's feedback seems unlikely, so even the submission_secret variant of him is maybe overengineered (due to the context nbgrader will be used in).

tuncbkose linked a pull request Apr 7, 2023 that will close this issue

Make hashing independent of notebook content #1774

Open

brichet added the bug label Mar 21, 2024

brichet mentioned this issue Apr 9, 2024

student not able to fetch feedback #1868

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

different hashing in release_feedback and fetch_feedback #1712

different hashing in release_feedback and fetch_feedback #1712

ronligt commented Dec 14, 2022

perllaghu commented Dec 14, 2022

ronligt commented Dec 14, 2022

perllaghu commented Dec 14, 2022 •

edited

ronligt commented Dec 14, 2022 •

edited

perllaghu commented Dec 14, 2022

perllaghu commented Dec 15, 2022

ronligt commented Dec 20, 2022

ykazakov commented Dec 21, 2022

LukasMueller187 commented May 23, 2023

different hashing in release_feedback and fetch_feedback #1712

different hashing in release_feedback and fetch_feedback #1712

Comments

ronligt commented Dec 14, 2022

Operating system

nbgrader --version

jupyterhub --version (if used with JupyterHub)

jupyter notebook --version

jupyter lab --version

Expected behavior

Actual behavior

Steps to reproduce the behavior

Explanation

perllaghu commented Dec 14, 2022

ronligt commented Dec 14, 2022

perllaghu commented Dec 14, 2022 • edited

ronligt commented Dec 14, 2022 • edited

perllaghu commented Dec 14, 2022

perllaghu commented Dec 15, 2022

ronligt commented Dec 20, 2022

ykazakov commented Dec 21, 2022

LukasMueller187 commented May 23, 2023

`nbgrader --version`

`jupyterhub --version` (if used with JupyterHub)

`jupyter notebook --version`

`jupyter lab --version`

perllaghu commented Dec 14, 2022 •

edited

ronligt commented Dec 14, 2022 •

edited