fix: Switch anonymous user ID hash from md5 to shake #26198

Merged: 5 commits merged into master on Feb 10, 2021

jinder1s
Contributor

@jinder1s jinder1s commented Jan 27, 2021

Now that we always return an existing value from the DB rather than trusting that ID generation is deterministic and constant over time, we're free to change the generation algorithm.

Our long-term goal is to switch to random IDs, but we first need to investigate the uses of save=False. In the meantime, this is a good opportunity to move away from MD5, which has a number of cryptographic weaknesses. None of the known vulnerabilities are considered exploitable here, given the limited ability to control the input to the hash, but we should be moving away from MD5 everywhere for consistency.

This change should not be breaking even for save=False callers, since those calls are extremely rare (1 in 100,000) and should only occur after a save=True call, at which point they'll use the stored value. Even if that were not true, a save=False call followed by a save=True call could only produce mismatched output if the first call happened around the time this code was deployed.

BREAKING CHANGE: because the hash function is different, newly generated IDs will differ from the old ones for the same (user, course_id).
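A minimal sketch of why the outputs differ, assuming a salt-plus-identifiers input. The function names, the way the inputs are fed to the hasher, and the 16-byte digest length are assumptions for illustration, not the code in this PR. The one API difference that is certain: SHAKE is an extendable-output function, so `hexdigest()` requires an explicit length in bytes.

```python
import hashlib


def anon_id_md5(secret_key, user_id, course_id):
    """Hypothetical old-style ID: MD5 over salt + identifiers."""
    hasher = hashlib.md5()
    hasher.update(secret_key.encode("utf-8"))
    hasher.update(str(user_id).encode("utf-8"))
    hasher.update(course_id.encode("utf-8"))
    return hasher.hexdigest()  # MD5 has a fixed 16-byte digest


def anon_id_shake(secret_key, user_id, course_id, digest_bytes=16):
    """Hypothetical new-style ID: SHAKE-128 over the same input."""
    hasher = hashlib.shake_128()
    hasher.update(secret_key.encode("utf-8"))
    hasher.update(str(user_id).encode("utf-8"))
    hasher.update(course_id.encode("utf-8"))
    # SHAKE has no fixed output size, so the length must be passed here.
    return hasher.hexdigest(digest_bytes)


old_id = anon_id_md5("hypothetical-secret", 42, "course-v1:Org+Course+Run")
new_id = anon_id_shake("hypothetical-secret", 42, "course-v1:Org+Course+Run")
```

With a 16-byte length, both IDs are 32 hex characters, so the stored format stays the same width even though the values differ.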
@@ -200,12 +200,18 @@ def anonymous_id_for_user(user, course_id, save=True):
         monitoring.increment('temp_anon_uid_v2.fetched_existing')
     else:
         # include the secret key as a salt, and to make the ids unique across different LMS installs.
-        hasher = hashlib.md5()
+        hasher = hashlib.shake_128()
Contributor

No test changes?

Incidentally, I'd call this a fix in the nomenclature of Conventional Commits.

Contributor Author

Oh no, yep, there will be.

Contributor Author

hmmm, I can kinda see, but still think it doesn't really work. We are not fixing a bug, we are just moving to a better algorithm.

Contributor

Honestly, I'm not real thrilled by the options suggested in that spec. :-) This isn't a feature, and it's not exactly a bug fix, it's... an "improvement".

@jinder1s jinder1s changed the title from "refactor: switching hashing algorithm for md5 to shake" to "fix: switching hashing algorithm for md5 to shake" Jan 27, 2021
@nedbat
Contributor

nedbat commented Jan 27, 2021

Can you include the reason for the breaking change?

@timmc-edx timmc-edx changed the title from "fix: switching hashing algorithm for md5 to shake" to "fix: Switch anonymous user ID hash from md5 to shake" Jan 27, 2021
@timmc-edx
Contributor

Updated PR message after discussion -- we had thought to call it a breaking change to be conservative, even though that was an unlikely scenario, but now we consider it so unlikely as to not be worth flagging that way.

Co-authored-by: Tim McCormack <tmccormack@edx.org>
@edx-status-bot

Your PR has finished running tests. There were no failures.

@jinder1s jinder1s merged commit cd60646 into master Feb 10, 2021
@jinder1s jinder1s deleted the msingh/refactor/hashing/md5/shake branch February 10, 2021 12:37
@edx-pipeline-bot
Contributor

EdX Release Notice: This PR has been deployed to the staging environment in preparation for a release to production.

@edx-pipeline-bot
Contributor

EdX Release Notice: This PR has been deployed to the production environment.

6 participants