Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

different hashing in release_feedback and fetch_feedback #1712

Open
ronligt opened this issue Dec 14, 2022 · 9 comments · May be fixed by #1774
Open

different hashing in release_feedback and fetch_feedback #1712

ronligt opened this issue Dec 14, 2022 · 9 comments · May be fixed by #1774
Labels

Comments

@ronligt
Copy link

ronligt commented Dec 14, 2022

Operating system

Ubuntu 20.04.5 LTS

nbgrader --version

0.8.0

jupyterhub --version (if used with JupyterHub)

2.3.1

jupyter notebook --version

6.4.12

jupyter lab --version

3.4.6

Expected behavior

same result from notebook_hash in release_feedback.py and fetch_feedback.py

Actual behavior

different results, resulting in [FetchFeedbackApp | WARNING] Could not find feedback for ... during nbgrader fetch_feedback --debug

Steps to reproduce the behavior

As instructor do nbgrader release_feedback --debug <assignment>. Then as student do nbgradere fetch_feedback --debug <assignment>.

Explanation

We are using nbgrader with jupyterhub and jupyter lab for one of our courses much to our liking. For several months all was working well but very recently we stumbled on the following incident. Two students (of 20) are not able to fetch the feedback of one particular assignment.

After some debugging I found that the hash in release_feedback.py was generated with different values than in fetch_feedback.py:

nbgrader release_feedback --debug 'Assignment 3 Wavefront sensing

[ReleaseFeedbackApp | DEBUG] nbfile is: /home/grader-ap3122/ap3122/submitted/<username>/Assignment 3 Wavefront sensing/Assignment Wavefront aberrations.ipynb
[ReleaseFeedbackApp | DEBUG] Unique key is: ap3122+Assignment 3 Wavefront sensing+Assignment Wavefront aberrations+<username>+2022-11-11 15:17:36.574853 UTC
[ReleaseFeedbackApp | DEBUG] checksum is: f9990f2f5f029af5091cbc417c5a591e
[ReleaseFeedbackApp | DEBUG] dest is: /usr/local/share/nbgrader/exchange/ap3122/feedback/f9990f2f5f029af5091cbc417c5a591e.html

nbgrader fetch_feedback --debug 'Assignment 3 Wavefront sensing

[FetchFeedbackApp | DEBUG] notebook is: /data/jupyterhub/home/<username>/.local/share/jupyter/nbgrader_cache/ap3122/<username>+Assignment 3 Wavefront sensing+2022-11-11 15:17:36.574853 UTC/Assignment Wavefront aberrations.ipynb
[FetchFeedbackApp | DEBUG] Unique key is: ap3122+Assignment 3 Wavefront sensing+Assignment Wavefront aberrations+<username>+2022-11-11 15:17:36.574853 UTC
[FetchFeedbackApp | DEBUG] nb_hash is: 209d0cfb1a1a05ba8d64273f2fd04271
[FetchFeedbackApp | DEBUG] feedbackpath is: /usr/local/share/nbgrader/exchange/ap3122/feedback/209d0cfb1a1a05ba8d64273f2fd04271.html

the student name has been replaced with <username>

I've added some debugging code to the two python-scripts logging the values of the appropriate variables

Clearly the nbfile and notebook have very different value resulting in different hash values. This is reflected in the different values of dest and feedbackpath which should be the same.

@perllaghu
Copy link
Contributor

Are the two users on the same machine, and are they using the same versions of libraries?

pip list will generate a list of libraries & their versions.

@ronligt
Copy link
Author

ronligt commented Dec 14, 2022

All students are using our jupyterhub set up with nbgrader (https://nbgrader.readthedocs.io/en/stable/configuration/jupyterhub_config.html#example-use-case-one-class-multiple-graders). So in a way they're on the same machine using the same versions of libraries. As far as I can tell they have not installed own pip packages (pip install --user).

It is strange that these two users did not have this problem with earlier assignments and were able to fetch the feedback. Also it is strange that this problem does not occur with other users with the same assignment.

@perllaghu
Copy link
Contributor

perllaghu commented Dec 14, 2022

Yeah.... we have 1,000's of users [though we're running docker images, so there's much less chance of variation], and have been running for several years now.

It's interesting that it's only some students, and only for one assignment - which implies it's not a general problem.

The code that creates the checksum is here: https://github.com/jupyter/nbgrader/blob/main/nbgrader/exchange/default/release_feedback.py#L78 (in the master branch...) which is actually

nbgrader/nbgrader/utils.py

Lines 554 to 559 in eb1c19a

def notebook_hash(path, unique_key=None):
m = hashlib.md5()
m.update(open(path, 'rb').read())
if unique_key:
m.update(to_bytes(unique_key))
return m.hexdigest()

.... so it's an MD5 hashlib value - hashlib is a core python library, so should be consistent - for a given version of python :)

@ronligt
Copy link
Author

ronligt commented Dec 14, 2022

Thank you very much for your help @perllaghu !

I also agree that hashlib is consistent. I think the problem lies in the input of notebook_hash:

  • for release_feedback.py this is

    checksum = notebook_hash(nbfile, unique_key)
  • for fetch_feedback.py this is

    nb_hash = notebook_hash(notebook, unique_key)

I've checked the values for both python-script and the unique_key is the same, but nbfile and notebook have different values:

  • nbfile: /home/grader-ap3122/ap3122/submitted/<username>/Assignment 3 Wavefront sensing/Assignment Wavefront aberrations.ipynb
  • notebook: /data/jupyterhub/home/<username>/.local/share/jupyter/nbgrader_cache/ap3122/<username>+Assignment 3 Wavefront sensing+2022-11-11 15:17:36.574853 UTC/Assignment Wavefront aberrations.ipynb

therefor the notebook_hash return values are different. Hence the fetch_feedback.py script cannot find the right feedback file with the filename generated in the release_feedback.py script.

@perllaghu
Copy link
Contributor

Ah... my apologies - I miss-read the original.

Let me have a look-see - I would be surprised if this was not just two different names for the same string... but I'll check

@perllaghu
Copy link
Contributor

OK... I can figure why they are different

  • nbfile is the path to the file the instructor collected (they can't "see" the student directory)
  • notebook is the path to the file the student submitted (they can't "see" the instructor directory)

The reason I've not seen any behaviour like this is I use an external exchange, and the plugins don't use hashes.

I think we [collectively] need to figure out if the full path is needed, or if the components for the string the hash is based on can be assembled on specific components (eg <username>/<assignment>/<.ipynb_file>).... and/or why the inbound path couldn't be used.

@ronligt
Copy link
Author

ronligt commented Dec 20, 2022

Dear @perllaghu, I will have to put this issue on hold. Your last comment made me look better at the code in utils.py and I realised the hashing has nothing to do with the filenames that are passed to notebook_hash() (or at least I think so...). As it turns out it's more likely someone edited manually the submitted notebook. Hence the hashes differed. I will check further now. If needed I will close the issue.

Again thank you very much for your help!

@ykazakov
Copy link
Contributor

@ronligt We have experienced a similar problem, which could be traced to the following situation:

  • students produce corrupted notebooks, e.g., by deleting or duplicating answer cells
  • nbgrader refuses to auto-grade such submissions, so instructors "fix" the submitted versions of the students (copied from the exchange)
  • the feedback is generated against the modified submission, so students could not fetch feedback because of the different hash value, which is a part of the feedback file name (students are not permitted to list files in the exchange directory, but if they "know" the file name of the feedback, they can copy it)

@tuncbkose tuncbkose linked a pull request Apr 7, 2023 that will close this issue
@LukasMueller187
Copy link

We are also running in this problem several times. With our latest assignment we did introduce a canvas that will break the feedback fetching completly due to file changes during autograding.

Solving this issue like @tuncbkose would be appreciated. However I do think that brute-forceing the timestamp to get a foreign student's feedback seems unlikely, so even the submission_secret variant of him is maybe overengineered (due to the context nbgrader will be used in).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants