Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(telemetry): Unique User IDs in kedro-telemetry - merge only for kedro-telemetry release 0.4.0 #596

Merged
merged 18 commits into from Mar 18, 2024

Conversation

DimedS
Copy link
Contributor

@DimedS DimedS commented Mar 5, 2024

Description

This PR introduces a unique identifier (UUID) for each user of the kedro-telemetry.

  • Instead of relying on hashed usernames, we now generate a unique UUID for each user. This UUID is generated using uuid.uuid4() function.
  • Generated UUID is stored in a user-specific configuration file, leveraging the appdirs library to determine the appropriate OS-specific path for this file (~/Library/Application Support/kedro/telemetry.conf on MAC)
  • If a UUID does not already exist, the function generates a new one, stores it, and then uses this UUID in subsequent telemetry events.

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>
@DimedS DimedS linked an issue Mar 5, 2024 that may be closed by this pull request
@DimedS DimedS requested a review from lrcouto March 5, 2024 13:04
Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>
@DimedS DimedS changed the title Unique User IDs in kedro-telemetry fix: Unique User IDs in kedro-telemetry Mar 5, 2024
@DimedS DimedS changed the title fix: Unique User IDs in kedro-telemetry telemetry: Unique User IDs in kedro-telemetry Mar 5, 2024
@DimedS DimedS changed the title telemetry: Unique User IDs in kedro-telemetry feat(telemetry): Unique User IDs in kedro-telemetry Mar 5, 2024
Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>
DimedS and others added 4 commits March 5, 2024 17:33
Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
@lrcouto lrcouto marked this pull request as ready for review March 6, 2024 16:28
Copy link
Contributor

@lrcouto lrcouto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some small changes to the structure of the function to leave only the part that actually handles file operations inside the try block.

Signed-off-by: lrcouto <laurarccouto@gmail.com>
@@ -15,6 +16,7 @@
import requests
import toml
import yaml
from appdirs import user_config_dir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a new dependency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a new dependency?

yes, we added it to the dependencies list

@@ -41,6 +43,7 @@
"BUILDKITE", # https://buildkite.com/docs/pipelines/environment-variables
}
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"
CONFIG_FILENAME = "telemetry.conf"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it expected to be commited into the repo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it expected to be commited into the repo?

You mean this file? It will be stored in the user's configuration directory, not in project directory, so looks like it will not be possible to commit it.

Comment on lines 55 to 86
def _get_or_create_uuid():
"""
Reads a UUID from a configuration file or generates and saves a new one if not present.
"""
config_path = user_config_dir("kedro")
full_path = os.path.join(config_path, CONFIG_FILENAME)
config = ConfigParser()

try:
username = getpass.getuser()
return _hash(username)
except Exception as exc:
logger.warning(
"Something went wrong with getting the username. Exception: %s",
exc,
)
if os.path.exists(full_path):
config.read(full_path)

if config.has_section("telemetry") and "uuid" in config["telemetry"]:
try:
return uuid.UUID(config["telemetry"]["uuid"]).hex
except ValueError:
pass # Invalid UUID found, will generate a new one

# Generate a new UUID and save it to the config file
if not config.has_section("telemetry"):
config.add_section("telemetry")
new_uuid = uuid.uuid4().hex
config.set("telemetry", "uuid", new_uuid)

os.makedirs(config_path, exist_ok=True)
with open(full_path, "w") as configfile:
config.write(configfile)

return new_uuid

except Exception as e:
logging.error(f"Failed to get or create UUID: {e}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be splitted into two function? doing both read and write in a function is a bit too much and hard to test.

For example, how are you testing for the case when config exist but telemetry section does not exist?

Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, thanks @DimedS and @lrcouto !

kedro-telemetry/RELEASE.md Show resolved Hide resolved
kedro-telemetry/kedro_telemetry/plugin.py Show resolved Hide resolved
kedro-telemetry/kedro_telemetry/plugin.py Outdated Show resolved Hide resolved
kedro-telemetry/kedro_telemetry/plugin.py Outdated Show resolved Hide resolved
Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
config = toml.load(f)

if "telemetry" in config and "uuid" in config["telemetry"]:
try:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we don't need additional try more, because we are returning the same for any error on level upper

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, I'm gonna change it

Signed-off-by: lrcouto <laurarccouto@gmail.com>
Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minor comments, but otherwise looks good! 👍 Also tested locally and seems like it's all working.

return ""


def _generate_new_uuid(full_path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed we're not doing any typing in kedro-telemetry, but can we maybe start adding that for new functions like this? 🙂

@@ -90,7 +122,7 @@ def before_command_run(
main_command = masked_command_args[0] if masked_command_args else "kedro"

logger.debug("You have opted into product usage analytics.")
hashed_username = _get_hashed_username()
hashed_username = _get_or_create_uuid()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the variable be renamed to user_uuid or something like that now it's not a hashed username anymore?


mocked_heap_call = mocker.patch("kedro_telemetry.plugin._send_heap_event")
telemetry_hook = KedroTelemetryCLIHooks()
command_args = []
telemetry_hook.before_command_run(fake_metadata, command_args)
expected_properties = {
"username": "digested",
"username": "hashed_username",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: This is the uuid now right? I know it's just test data, but maybe for clarity it would be helpful to specify the actual data so uuid in this case.

Signed-off-by: lrcouto <laurarccouto@gmail.com>
Copy link
Contributor

@ankatiyar ankatiyar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments but looks good to me! Thanks @lrcouto and @DimedS!

@@ -1,4 +1,5 @@
# Upcoming release
* Updated the plugin to generate an unique UUID for each user of `kedro-telemetry`. (On release 0.4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Updated the plugin to generate an unique UUID for each user of `kedro-telemetry`. (On release 0.4)
* Updated the plugin to generate an unique UUID for each user of `kedro-telemetry`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The heading will say the release number anyway

@@ -90,17 +122,17 @@ def before_command_run(
main_command = masked_command_args[0] if masked_command_args else "kedro"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't comment on the line above since it hasn't been changed but could line 117 be changed to -
cli = KedroCLI(project_path=metadata.project_path)
Following the changes in kedro-org/kedro#3683, I think this will allow us to capture commands called from within subdirectories also and mask them properly

Signed-off-by: lrcouto <laurarccouto@gmail.com>
@lrcouto lrcouto changed the title feat(telemetry): Unique User IDs in kedro-telemetry feat(telemetry): Unique User IDs in kedro-telemetry - merge only for kedro-telemetry release 0.4.0 Mar 13, 2024
Signed-off-by: lrcouto <laurarccouto@gmail.com>
Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! ⭐

@lrcouto lrcouto merged commit 846db2b into main Mar 18, 2024
13 checks passed
@lrcouto lrcouto deleted the unique-user-is-telemetry branch March 18, 2024 14:32
@astrojuanlu
Copy link
Member

Tested this and tried to break it in various ways, didn't find anything weird ⭐ Thanks @DimedS and @lrcouto!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

User identity for telemetry events is not unique
6 participants