fix: Switch anonymous user ID hash from md5 to shake #26198

jinder1s · 2021-01-27T17:21:50Z (description edited by timmc-edx)

Now that we always return an existing value from the DB rather than trusting that ID generation is deterministic and constant over time, we're free to change the generation algorithm.
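
To illustrate why a stored value makes the algorithm swappable, here is a minimal sketch. It is not the actual `anonymous_id_for_user` code. The in-memory `stored_ids` dict stands in for the DB table, `md5_id`/`shake_id`/`anonymous_id` are hypothetical names, and the secret-key salt is left out for brevity. An ID generated under MD5 and stored is still returned after the generator is switched to SHAKE-128.

```python
import hashlib
from typing import Callable, Dict, Tuple

# (user_id, course_id): a hypothetical key shape for this sketch.
Key = Tuple[int, str]

# Hypothetical in-memory stand-in for the DB table that persists anonymous IDs.
stored_ids: Dict[Key, str] = {}


def md5_id(key: Key) -> str:
    """Old-style generator (secret-key salt omitted for brevity)."""
    return hashlib.md5(f"{key[0]}:{key[1]}".encode("utf8")).hexdigest()


def shake_id(key: Key) -> str:
    """New-style generator: SHAKE-128 output truncated to 16 bytes (32 hex chars)."""
    return hashlib.shake_128(f"{key[0]}:{key[1]}".encode("utf8")).hexdigest(16)


def anonymous_id(key: Key, generate: Callable[[Key], str], save: bool = True) -> str:
    """Return the stored ID if one exists; otherwise generate one and optionally store it."""
    existing = stored_ids.get(key)
    if existing is not None:
        # A stored value always wins, whichever algorithm originally produced it.
        return existing
    new_id = generate(key)
    if save:
        stored_ids[key] = new_id
    return new_id


key = (42, "course-v1:Org+Course+Run")
before = anonymous_id(key, md5_id)   # generated with MD5 and stored
after = anonymous_id(key, shake_id)  # generator swapped, but the stored value is returned
assert before == after
```

Only a caller that finds no stored row ever sees the new algorithm's output, which is the property the description relies on.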

Our long-term goal is to switch to random IDs, but we first need to investigate the uses of save=False. In the meantime, this is a good opportunity to move away from MD5, which has a number of cryptographic weaknesses. None of the known vulnerabilities are considered exploitable in this location, given the limited ability to control the input to the hash, but we should generally be moving away from it everywhere for consistency.
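
The diff below only shows the constructor changing from `hashlib.md5()` to `hashlib.shake_128()`. One practical difference worth noting: SHAKE-128 is an extendable-output function, so its `hexdigest()` requires an explicit length, and calling it with no argument fails. The sketch below assumes the ID is salted with the secret key and built from the user ID and course ID, and that 16 bytes of output are taken so the result keeps the 32-hex-character length of an MD5 digest. The helper name `derive_anonymous_id` and its parameters are hypothetical.

```python
import hashlib
from typing import Optional


def derive_anonymous_id(secret_key: str, user_id: int, course_id: Optional[str] = None) -> str:
    """Hypothetical helper: derive a 32-hex-char anonymous ID with SHAKE-128."""
    hasher = hashlib.shake_128()
    # The secret key acts as a salt and keeps IDs distinct across installs.
    hasher.update(secret_key.encode("utf8"))
    hasher.update(str(user_id).encode("utf8"))
    if course_id:
        hasher.update(course_id.encode("utf8"))
    # SHAKE-128 is an XOF: hexdigest() needs a length in bytes.
    # 16 bytes -> 32 hex chars, the same length as an MD5 hexdigest.
    return hasher.hexdigest(16)


anon = derive_anonymous_id("example-secret", 42, "course-v1:Org+Course+Run")
print(len(anon))  # 32
```

Keeping the output at 32 hex characters means any column or consumer sized for MD5 hex digests should not need to change, though that is an assumption about downstream storage rather than something stated in this PR.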

This change should not be breaking even for save=False callers, since those calls are extremely rare (1 in 100,000 calls) and should only occur after a save=True call, at which point they'll use the stored value. Even if this were not true, for a save=False/True pair of calls to produce mismatched output, the first of the calls would have to occur around the time this code is deployed.

BREAKING CHANGE: since the hashing function is different, it will produce different results for the same (user, course_id)
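
A quick way to see the incompatibility: the same input bytes give a different ID under each algorithm, even though both IDs have the same 32-character hex format. The input here is a hypothetical stand-in for the salted (user, course_id) data, and truncating SHAKE-128 to 16 bytes is an assumption about how the new digest is sized.

```python
import hashlib

# Hypothetical input standing in for secret key + user ID + course ID.
data = b"example-secret" + b"42" + b"course-v1:Org+Course+Run"

md5_hex = hashlib.md5(data).hexdigest()
shake_hex = hashlib.shake_128(data).hexdigest(16)

print(len(md5_hex), len(shake_hex))  # 32 32
print(md5_hex == shake_hex)          # False
```

Because the formats match, a mismatch would not show up as a validation error; it would only show up as a different ID for a (user, course_id) pair that had no stored value.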

common/djangoapps/student/models.py

timmc-edx · 2021-01-27T17:38:43Z

@@ -200,12 +200,18 @@ def anonymous_id_for_user(user, course_id, save=True):
        monitoring.increment('temp_anon_uid_v2.fetched_existing')
    else:
        # include the secret key as a salt, and to make the ids unique across different LMS installs.
-        hasher = hashlib.md5()
+        hasher = hashlib.shake_128()


No test changes?

Incidentally, I'd call this a fix in the nomenclature of Conventional Commits.

Oh no, yep, there will be.

Hmm, I can kind of see it, but I still think it doesn't really fit. We are not fixing a bug; we are just moving to a better algorithm.

Honestly, I'm not real thrilled by the options suggested in that spec. :-) This isn't a feature, and it's not exactly a bug fix, it's... an "improvement".

lms/djangoapps/courseware/tests/test_module_render.py

nedbat · 2021-01-27T18:01:13Z

Can you include the reason for the breaking change?

timmc-edx · 2021-01-27T18:38:24Z

Updated PR message after discussion -- we had thought to call it a breaking change to be conservative, even though that was an unlikely scenario, but now we consider it so unlikely as to not be worth flagging that way.

common/djangoapps/student/models.py

Co-authored-by: Tim McCormack <tmccormack@edx.org>

edx-status-bot · 2021-02-09T19:09:03Z

Your PR has finished running tests. There were no failures.

edx-pipeline-bot · 2021-02-10T13:33:50Z

EdX Release Notice: This PR has been deployed to the staging environment in preparation for a release to production.

edx-pipeline-bot · 2021-02-10T14:28:23Z

EdX Release Notice: This PR has been deployed to the production environment.

refactor: switching hashing algorithm for md5 to shake

63240b1

BREAKING CHANGE: since function for hashing is different, it will produce different results for the same (user, course_id)

timmc-edx reviewed Jan 27, 2021

common/djangoapps/student/models.py (outdated, resolved)

fix: quality

d5d30ec

timmc-edx reviewed Jan 27, 2021

jinder1s changed the title ~~refactor: switching hashing algorithm for md5 to shake~~ fix: switching hashing algorithm for md5 to shake Jan 27, 2021

fix: tests hardcoded for shake algorithm

874acdd

timmc-edx reviewed Jan 27, 2021

lms/djangoapps/courseware/tests/test_module_render.py (resolved)

timmc-edx changed the title ~~fix: switching hashing algorithm for md5 to shake~~ fix: Switch anonymous user ID hash from md5 to shake Jan 27, 2021

timmc-edx reviewed Jan 28, 2021

common/djangoapps/student/models.py (outdated, resolved)

Update common/djangoapps/student/models.py

e80c93d

Co-authored-by: Tim McCormack <tmccormack@edx.org>

feanil approved these changes Feb 2, 2021

Merge branch 'master' into msingh/refactor/hashing/md5/shake

19d1d40

jinder1s merged commit cd60646 into master Feb 10, 2021

jinder1s deleted the msingh/refactor/hashing/md5/shake branch February 10, 2021 12:37

kaustavb12 mentioned this pull request Aug 8, 2022

feat: add feature flag to enable legacy md5 hash for anonymous user id #30832

Closed
