fix: Switch anonymous user ID hash from md5 to shake #26198

Merged
merged 5 commits into from
Feb 10, 2021
Changes from 4 commits
12 changes: 10 additions & 2 deletions common/djangoapps/student/models.py
@@ -200,12 +200,20 @@ def anonymous_id_for_user(user, course_id, save=True):
monitoring.increment('temp_anon_uid_v2.fetched_existing')
else:
# include the secret key as a salt, and to make the ids unique across different LMS installs.
- hasher = hashlib.md5()
+ hasher = hashlib.shake_128()
Contributor


No test changes?

Incidentally, I'd call this a fix in the nomenclature of Conventional Commits.

Contributor Author


Oh no, yep, there will be.

Contributor Author


Hmm, I can kind of see that, but I still don't think it really fits. We are not fixing a bug; we are just moving to a better algorithm.

Contributor


Honestly, I'm not real thrilled by the options suggested in that spec. :-) This isn't a feature, and it's not exactly a bug fix, it's... an "improvement".

+ # This is one of several uses of SECRET_KEY.
+ #
+ # Impact of exposure: If a person has the SECRET_KEY and a user's `id`
+ # they can correlate the users anonymous user IDs across any courses they have participated in.
+ #
+ # Rotation process: Can be rotated at will. There is a small chance (on the order of 1%) that
+ # any given new user will be assigned multiple anonymous user IDs during the period in which
+ # servers are configured with a mix of old and new keys.
hasher.update(settings.SECRET_KEY.encode('utf8'))
hasher.update(text_type(user.id).encode('utf8'))
if course_id:
hasher.update(text_type(course_id).encode('utf-8'))
- anonymous_user_id = hasher.hexdigest()
+ anonymous_user_id = hasher.hexdigest(16) # pylint: disable=too-many-function-args

if save is True:
try:
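The hashing step in this hunk can be illustrated with a minimal standalone sketch. It assumes plain strings stand in for Django's `settings.SECRET_KEY`, the user object, and the course key; the function name `sketch_anonymous_id` and all sample values are hypothetical and not part of this PR. It only shows that `shake_128` with `hexdigest(16)` yields a 32-character hex string, and that the secret key and the course both change the result.

```python
import hashlib


def sketch_anonymous_id(secret_key, user_id, course_id=None):
    """Hypothetical stand-in for the hashing step in anonymous_id_for_user."""
    hasher = hashlib.shake_128()
    # The secret key acts as a salt, so IDs differ between installs.
    hasher.update(secret_key.encode('utf8'))
    hasher.update(str(user_id).encode('utf8'))
    if course_id:
        hasher.update(str(course_id).encode('utf8'))
    # SHAKE is an extendable-output function: the digest length in bytes
    # must be passed explicitly. 16 bytes gives 32 hex characters.
    return hasher.hexdigest(16)


# Illustrative inputs only; none of these values come from the PR.
id_fall = sketch_anonymous_id('old-secret', 42, 'MITx/6.00x/2012_Fall')
id_spring = sketch_anonymous_id('old-secret', 42, 'MITx/6.00x/2013_Spring')
id_rotated = sketch_anonymous_id('new-secret', 42, 'MITx/6.00x/2012_Fall')

print(len(id_fall))           # 32
print(id_fall != id_spring)   # True: a different course gives a different ID
print(id_fall != id_rotated)  # True: a different key gives a different ID
```

The last comparison is the situation the rotation note in the code comment describes: servers running with different keys compute different IDs for the same user and course.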
6 changes: 3 additions & 3 deletions lms/djangoapps/courseware/tests/test_module_render.py
@@ -1991,7 +1991,7 @@ def test_per_student_anonymized_id(self, descriptor_class):
self.assertEqual(
# This value is set by observation, so that later changes to the student
# id computation don't break old data
- '5afe5d9bb03796557ee2614f5c9611fb',
+ 'de619ab51c7f4e9c7216b4644c24f3b5',
jinder1s marked this conversation as resolved.
self._get_anonymous_id(CourseKey.from_string(course_id), descriptor_class)
)

@@ -2000,14 +2000,14 @@ def test_per_course_anonymized_id(self, descriptor_class):
self.assertEqual(
# This value is set by observation, so that later changes to the student
# id computation don't break old data
- 'e3b0b940318df9c14be59acb08e78af5',
+ '0c706d119cad686d28067412b9178454',
self._get_anonymous_id(CourseKey.from_string('MITx/6.00x/2012_Fall'), descriptor_class)
)

self.assertEqual(
# This value is set by observation, so that later changes to the student
# id computation don't break old data
- 'f82b5416c9f54b5ce33989511bb5ef2e',
+ 'e9969c28c12c8efa6e987d6dbeedeb0b',
self._get_anonymous_id(CourseKey.from_string('MITx/6.00x/2013_Spring'), descriptor_class)
)

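The expected values in these tests change because the same inputs hash to a different digest under the new algorithm, while the length of the ID stays the same. The sketch below illustrates this under stated assumptions: the helper `feed` and the sample inputs are hypothetical, and the real tests derive their IDs through the LMS test fixtures rather than from literal strings like these.

```python
import hashlib


def feed(hasher, secret_key, user_id, course_id):
    """Hypothetical helper: feed the same three inputs into any hashlib object."""
    for part in (secret_key, str(user_id), str(course_id)):
        hasher.update(part.encode('utf8'))
    return hasher


# Illustrative inputs only; they are not the fixtures used by the real tests.
inputs = ('example-secret', 7, 'MITx/6.00x/2012_Fall')

old_style_id = feed(hashlib.md5(), *inputs).hexdigest()
new_style_id = feed(hashlib.shake_128(), *inputs).hexdigest(16)

# Both IDs are 32 hex characters, so the stored width is unchanged,
# but the values differ, which is why the pinned constants were updated.
print(len(old_style_id), len(new_style_id))
print(old_style_id != new_style_id)
```

Pinning the observed value, as the test comments describe, means any later change to the ID computation fails the test instead of silently altering the IDs.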