-
Notifications
You must be signed in to change notification settings - Fork 417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: upgrade mysql charset and collation to utf8mb4 #1065
base: redwood
Are you sure you want to change the base?
feat!: upgrade mysql charset and collation to utf8mb4 #1065
Conversation
b203c94
to
938abca
Compare
0e17056
to
80796da
Compare
47f01f9
to
438b312
Compare
changelog.d/20240520_150131_danyal.faheem_upgrade_mysql_utf8mb4.md
Outdated
Show resolved
Hide resolved
docs/local.rst
Outdated
|
||
Your database's charset and collation might not support specific characters or emojis. To upgrade your charset and collation of all the tables in your database to utf8mb4 and utf8mb4_unicode_ci respectively, run:: | ||
|
||
tutor local do change-charset-collation --all-tables --charset=utf8mb4 --collation=utf8mb4_unicode_ci | ||
|
||
Alternatively, if you only want to upgrade certain tables or exclude certain tables, you can use the `status` option. To upgrade the `courseware_studentmodule` and `courseware_studentmodulehistory` tables, run:: | ||
|
||
tutor local do change-charset-collation --status=include courseware_studentmodule courseware_studentmodulehistory | ||
|
||
Tutor performs pattern matching from the start of the table name so you can just enter the name of the app to include/exclude all the tables under that app. To upgrade all the tables in the database except the ones under the student and wiki apps, run:: | ||
|
||
tutor local do change-charset-collation --status=exclude student wiki |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should specify that the change is not reversible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a warning note at the top.
tutor/commands/upgrade/common.py
Outdated
@@ -66,3 +66,55 @@ def get_mongo_upgrade_parameters( | |||
mv '/openedx/data/ora2/SET-ME-PLEASE (ex. bucket-name)' /openedx/data/ora2/openedxuploads | |||
fi | |||
""" | |||
|
|||
def get_mysql_change_charset_query( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we should move this util to another file as we are not using this in upgrade anymore.
docs/local.rst
Outdated
|
||
Your database's charset and collation might not support specific characters or emojis. To upgrade your charset and collation of all the tables in your database to utf8mb4 and utf8mb4_unicode_ci respectively, run:: | ||
|
||
tutor local do change-charset-collation --all-tables --charset=utf8mb4 --collation=utf8mb4_unicode_ci |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which situation(s) will users have to run this command? Is it for users upgrading from Quince? Is it optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they can run it after upgrading from Quince. It is optional as tutor runs fine without it unless you add specific characters such as emojis that will throw DB errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That explanation should be present in the docs :)
docs/local.rst
Outdated
|
||
tutor local do change-charset-collation --all-tables --charset=utf8mb4 --collation=utf8mb4_unicode_ci | ||
|
||
Alternatively, if you only want to upgrade certain tables or exclude certain tables, you can use the `status` option. To upgrade the `courseware_studentmodule` and `courseware_studentmodulehistory` tables, run:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All verbatim comments need double backticks. E.g: ``status``
.
tutor/commands/jobs.py
Outdated
context_settings={"ignore_unknown_options": True}, | ||
) | ||
@click.option("--all-tables", is_flag=True, help="change all tables in the openedx database") | ||
@click.option("-s", "--status", type=click.Choice(['include', 'exclude'], case_sensitive=False), help="Whether to include or exclude certain tables/apps") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This combination of options is not very intuitive. It means that the end users needs to specify at least one option -- and that is not a great pattern, as options are considered... optional :)
Instead, I suggest that:
- all tables are process by default.
- If
--include
is specified, then only those tables are processed. - If
--exclude
is specified, then those tables are not processed.
Note that with this approach, --include
and --exclude
are mutually compatible (--exclude
would override --include
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all tables are process by default
I did not want to make this the default scenario in case someone accidentally executes the command while trying out the different options provided and they did not want to upgrade all the tables since this is a potentially irreversible change. Upgrading all tables at once can also take a long time.
Maybe we can prompt the user if they wish to proceed?
Note that with this approach, --include and --exclude are mutually compatible (--exclude would override --include).
I think this can be achieved and sounds more plausible. I'll update the options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can prompt the user if they wish to proceed?
Yes, this sounds perfect. Please make sure to include a -I/--non-interactive
option for headless scripts.
tutor/commands/jobs.py
Outdated
@click.command( | ||
short_help="Upgrade mysql to a specific charset and collation", | ||
help=( | ||
"Upgrade mysql to a specific charset and collation. You can either upgrade all tables, specify only certain tables to upgrade or specify certain tables to exclude from the upgrade process" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This docstring is misleading. It implies that it allows the user to upgrade to just any charset/collation. Instead, I suggest that we rename this command to something like convert-mysql-utf8mb4-charset
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original command was meant to allow users to upgrade to any charset/collation, however that caused issues with the connection string. However, since we have removed that option, I think the convert-mysql-utf8mb4-charset
makes more sense. I'll update it.
tutor/commands/upgrade/common.py
Outdated
|
||
SET FOREIGN_KEY_CHECKS = 0; | ||
Select _table_name; | ||
SET @statement = CONCAT('ALTER TABLE ', _table_name, ' CONVERT TO CHARACTER SET {charset} COLLATE {collation};'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this change columns that currently use utf8mb4_bin
collation to be utf8mb4_unicode_ci
? If so, that's a problem. The relatively small number of columns that use utf8mb4_bin
are usually doing so because they explicitly want the values to be case sensitive, because case insensitivity on things like course keys has given us a lot of grief in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify that last statement, an example was that MongoDB and the Python layer of CourseKeys are case sensitive, but the key stored in our CourseOverview model is not case sensitive. So at one point we had two course runs in production that had keys that differed only by case (because someone had misspelled when doing the initial creation), and updates to one could clobber the values for the other because they were fighting over the same row in CourseOverview for updates. (The inactive course run was still getting automated push updates from an external system.)
I realize that fixing CourseOverview is well outside the scope of this work, and I'm not trying to suggest we tackle that here. I only put it out there as an example of why apps might want to explicitly specify a sensitive field for storing things like identifier keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this change columns that currently use utf8mb4_bin collation to be utf8mb4_unicode_ci?
It seems this is the case. I think we can add a condition that columns with the utf8mb4_bin
collation should not be upgraded. Any other specific charsets or collations that we should look out for here?
Or we can just upgrade all the columns that are using utf8mb3_general_ci
so that if some other column has their charset/collation fixed by the model in the future, we don't have to update the check for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on your findings, it seems like the only forced collations that are not utf8mb4_unicode_ci
are:
utf8mb3_bin
: This should be mapped intoutf8mb4_bin
, since that's the logically equivalent case-sensitive collation.utf8mb4_bin
: This should be preserved as-is, as discussed.utf8mb4_general_ci
: This would be mapped toutf8mb4_unicode_ci
, but it looks like it only exists for thebundles
app, and that app is unused in Redwood and removed entirely just after the Redwood cut, so it's fine to ignore or update it, depending on what's more convenient in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the command to perform the following mappings instead:
utf8mb3_general_ci
toutf8mb4_unicode_ci
utf8mb3_*
toutf8mb4_*
meaning that theutf8mb3_bin
collations are converted toutf8mb4_bin
now.
utf8mb4_bin: This should be preserved as-is, as discussed.
We ignore any columns that already have the utf8mb4
charset.
utf8mb4_general_ci: This would be mapped to utf8mb4_unicode_ci, but it looks like it only exists for the bundles app, and that app is unused in Redwood and removed entirely just after the Redwood cut, so it's fine to ignore or update it, depending on what's more convenient in the code.
Yup, trying it out on the nightly release, the bundles app was not there. Therefore, I've ignored it.
tutor/commands/upgrade/common.py
Outdated
LEAVE tables_loop; | ||
END IF; | ||
|
||
SET FOREIGN_KEY_CHECKS = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this cause any issue when trying to do a partial migration that includes the course_overviews_courseoverview
table? That's the only table I'm aware of that has a varchar primary key and foreign keys from other tables.
fixes #938.
Changes
utf8mb4
and the default collation toutf8mb4_unicode_ci
.dosqlshell
command to useutf8mb4
as the default character set.tutor local do change-charset-collation
to allow users to upgrade the charset and collation of their tutor installationsTesting
tutor [dev|k8s] launch