Unpin alembic #5249

Merged
merged 19 commits into from Jan 13, 2022
8 changes: 4 additions & 4 deletions .github/workflows/master.yml
@@ -161,11 +161,11 @@ jobs:
          ./dev/run-large-python-tests.sh
      # Separate build and run to make it easier to explore logs
      - name: Run database initialization tests - build
        working-directory: tests/db
        run: |
          python setup.py bdist_wheel
          cp -r dist tests/db
          cd tests/db
          ./build_wheel.sh
          docker-compose pull
          docker image ls | grep -E '(REPOSITORY|postgres|mysql|mssql)'
Member Author

Show database image versions for debugging.

          docker-compose build
      - name: Run database initialization tests - run
        working-directory: tests/db
@@ -175,7 +175,7 @@ jobs:
          docker-compose run mlflow-mysql python run_checks.py --schema-output schemas/mysql.sql
          docker-compose run mlflow-mssql ./init-mssql-db.sh
          docker-compose run mlflow-mssql python run_checks.py --schema-output schemas/mssql.sql
          docker-compose down --rmi all --volumes --remove-orphans
          docker-compose down --volumes --remove-orphans --rmi all
      - name: Run anaconda compatibility tests
        run: |
          ./dev/test-anaconda-compatibility.sh "anaconda3:2020.11"
@@ -31,7 +31,8 @@ def upgrade():
    # operation is expected to fail under certain circumstances, we execute `drop_constraint()`
    # outside of the batch operation context.
    try:
        op.drop_constraint(constraint_name="status", table_name="runs", type_="check")
        # op.drop_constraint(constraint_name="status", table_name="runs", type_="check")
Collaborator

Why drop this?

Member Author @harupy, Jan 11, 2022

It's just temporarily commented out to see what the table definition looks like without this line.

Member Author @harupy, Jan 11, 2022

What the runs table definition looks like for each database:

sqlite:

CREATE TABLE runs (
	run_uuid VARCHAR(32) NOT NULL,
	name VARCHAR(250),
	source_type VARCHAR(20),
	source_name VARCHAR(500),
	entry_point_name VARCHAR(50),
	user_id VARCHAR(256),
	status VARCHAR(9),
	start_time BIGINT,
	end_time BIGINT,
	source_version VARCHAR(50),
	lifecycle_stage VARCHAR(20),
	artifact_uri VARCHAR(200),
	experiment_id INTEGER,
	CONSTRAINT run_pk PRIMARY KEY (run_uuid),
	FOREIGN KEY(experiment_id) REFERENCES experiments (experiment_id),
	CONSTRAINT runs_lifecycle_stage CHECK (lifecycle_stage IN ('active', 'deleted')),
	CONSTRAINT source_type CHECK (source_type IN ('NOTEBOOK', 'JOB', 'LOCAL', 'UNKNOWN', 'PROJECT')),
        -- 👇 Unnamed check constraint, expression looks correct
        -- the reason it's unnamed is probably because we don't specify `name`
        -- when constructing `Enum`:
        -- https://github.com/mlflow/mlflow/pull/5249/files#diff-3492d101d4bd194139919dcac84b713b0ee4526b79d32e45c44db3655f95e838R46
	CHECK (status IN ('SCHEDULED', 'FAILED', 'FINISHED', 'RUNNING', 'KILLED'))
)

postgres:

CREATE TABLE runs (
	run_uuid VARCHAR(32) NOT NULL,
	name VARCHAR(250),
	source_type VARCHAR(20),
	source_name VARCHAR(500),
	entry_point_name VARCHAR(50),
	user_id VARCHAR(256),
	status VARCHAR(9),
	start_time BIGINT,
	end_time BIGINT,
	source_version VARCHAR(50),
	lifecycle_stage VARCHAR(20),
	artifact_uri VARCHAR(200),
	experiment_id INTEGER,
	CONSTRAINT run_pk PRIMARY KEY (run_uuid),
	CONSTRAINT runs_experiment_id_fkey FOREIGN KEY(experiment_id) REFERENCES experiments (experiment_id),
	CONSTRAINT source_type CHECK ((source_type)::text = ANY ((ARRAY['NOTEBOOK'::character varying, 'JOB'::character varying, 'LOCAL'::character varying, 'UNKNOWN'::character varying, 'PROJECT'::character varying])::text[])),
	CONSTRAINT runs_lifecycle_stage CHECK ((lifecycle_stage)::text = ANY ((ARRAY['active'::character varying, 'deleted'::character varying])::text[])),
        -- 👇 Named check constraint, expression looks correct
	CONSTRAINT runs_status_check CHECK ((status)::text = ANY ((ARRAY['SCHEDULED'::character varying, 'FAILED'::character varying, 'FINISHED'::character varying, 'RUNNING'::character varying, 'KILLED'::character varying])::text[]))
)

mysql:

CREATE TABLE runs (
	run_uuid VARCHAR(32) NOT NULL,
	name VARCHAR(250),
	source_type VARCHAR(20),
	source_name VARCHAR(500),
	entry_point_name VARCHAR(50),
	user_id VARCHAR(256),
	status VARCHAR(9),
	start_time BIGINT,
	end_time BIGINT,
	source_version VARCHAR(50),
	lifecycle_stage VARCHAR(20),
	artifact_uri VARCHAR(200),
	experiment_id INTEGER,
	PRIMARY KEY (run_uuid),
	CONSTRAINT runs_ibfk_1 FOREIGN KEY(experiment_id) REFERENCES experiments (experiment_id),
        -- 👇 Duplicate
	CONSTRAINT runs_chk_1 CHECK ((`status` in (_utf8mb4'SCHEDULED',_utf8mb4'FAILED',_utf8mb4'FINISHED',_utf8mb4'RUNNING',_utf8mb4'KILLED'))),
	CONSTRAINT runs_lifecycle_stage CHECK ((`lifecycle_stage` in (_utf8mb4'active',_utf8mb4'deleted'))),
	CONSTRAINT source_type CHECK ((`source_type` in (_utf8mb4'NOTEBOOK',_utf8mb4'JOB',_utf8mb4'LOCAL',_utf8mb4'UNKNOWN',_utf8mb4'PROJECT'))),
        -- 👇 Duplicate
	CONSTRAINT status CHECK ((`status` in (_utf8mb4'SCHEDULED',_utf8mb4'FAILED',_utf8mb4'FINISHED',_utf8mb4'RUNNING')))
)

Member Author @harupy, Jan 11, 2022

Looks like we need this drop_constraint operation for mysql.
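
To illustrate why the old constraint has to go: when two CHECK constraints cover the same column, a row must satisfy both, so a leftover constraint that omits 'KILLED' keeps rejecting that status even after the new one is added. The following is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for MySQL. The helper `can_insert_killed` and the throwaway single-column table are hypothetical, and MySQL's actual error type and message will differ.

```python
import sqlite3

# Value lists copied from the two MySQL constraints shown above.
OLD = "('SCHEDULED', 'FAILED', 'FINISHED', 'RUNNING')"
NEW = "('SCHEDULED', 'FAILED', 'FINISHED', 'RUNNING', 'KILLED')"


def can_insert_killed(*check_clauses):
    """Create a throwaway table with the given CHECK clauses and try to insert 'KILLED'."""
    conn = sqlite3.connect(":memory:")
    checks = ", ".join(check_clauses)
    conn.execute(f"CREATE TABLE runs (status VARCHAR(9), {checks})")
    try:
        conn.execute("INSERT INTO runs (status) VALUES ('KILLED')")
        return True
    except sqlite3.IntegrityError:
        # Raised when any CHECK constraint on the table rejects the row.
        return False
    finally:
        conn.close()


# Both constraints present: the stale one still blocks 'KILLED'.
both = can_insert_killed(
    f"CONSTRAINT runs_chk_1 CHECK (status IN {NEW})",
    f"CONSTRAINT status CHECK (status IN {OLD})",
)
# Only the new constraint remains once the old one is dropped.
only_new = can_insert_killed(f"CONSTRAINT runs_chk_1 CHECK (status IN {NEW})")
print(both, only_new)
```

In this sketch the insert fails while both constraints exist and succeeds once only the newer one is left, which matches the need to keep the drop_constraint call for MySQL.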

        pass
    except Exception as e:
        _logger.warning(
            "Failed to drop check constraint. Dropping check constraints may not be supported"
@@ -5,26 +5,29 @@
Create Date: 2019-10-11 15:55:10.853449

"""
import alembic
from alembic import op
from mlflow.entities import RunStatus, ViewType
from mlflow.entities.lifecycle_stage import LifecycleStage
from mlflow.store.tracking.dbmodels.models import SqlRun, SourceTypes
from sqlalchemy import CheckConstraint, Enum
from packaging.version import Version

# revision identifiers, used by Alembic.
revision = "cfd24bdc0731"
down_revision = "2b4d017a5e9b"
branch_labels = None
depends_on = None

new_run_statuses = [
old_run_statuses = [
    RunStatus.to_string(RunStatus.SCHEDULED),
    RunStatus.to_string(RunStatus.FAILED),
    RunStatus.to_string(RunStatus.FINISHED),
    RunStatus.to_string(RunStatus.RUNNING),
    RunStatus.to_string(RunStatus.KILLED),
]

new_run_statuses = [*old_run_statuses, RunStatus.to_string(RunStatus.KILLED)]

# Certain SQL backends (e.g., SQLite) do not preserve CHECK constraints during migrations.
# For these backends, CHECK constraints must be specified as table arguments. Here, we define
# the collection of CHECK constraints that should be preserved when performing the migration.
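
The code comment above can be illustrated with a hand-rolled table rebuild. This is only a sketch of the general copy-and-move pattern on SQLite, not alembic's actual implementation: a constraint that is not declared again on the replacement table is lost. The `rebuild` and `accepts` helpers, the temporary table name, and the two-column table are hypothetical, and the `table_args` string parameter merely plays the role that `table_args` plays in `op.batch_alter_table`.

```python
import sqlite3

CHECK = "CONSTRAINT runs_lifecycle_stage CHECK (lifecycle_stage IN ('active', 'deleted'))"
COLUMNS = "run_uuid VARCHAR(32) PRIMARY KEY, lifecycle_stage VARCHAR(20)"

# Autocommit mode keeps the DDL statements below free of implicit transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(f"CREATE TABLE runs ({COLUMNS}, {CHECK})")
conn.execute("INSERT INTO runs VALUES ('a', 'active')")


def rebuild(conn, table_args=""):
    """Recreate `runs`: new table, copy rows, drop the old table, rename."""
    extra = f", {table_args}" if table_args else ""
    conn.execute(f"CREATE TABLE _tmp_runs ({COLUMNS}{extra})")
    conn.execute("INSERT INTO _tmp_runs SELECT run_uuid, lifecycle_stage FROM runs")
    conn.execute("DROP TABLE runs")
    conn.execute("ALTER TABLE _tmp_runs RENAME TO runs")


def accepts(conn, run_uuid, stage):
    """Return True if a row with the given lifecycle stage can be inserted."""
    try:
        conn.execute("INSERT INTO runs VALUES (?, ?)", (run_uuid, stage))
        return True
    except sqlite3.IntegrityError:
        return False


before = accepts(conn, "b", "bogus")          # original table still has the constraint
rebuild(conn)                                  # constraint not declared again
after_plain = accepts(conn, "c", "bogus")     # the constraint was lost in the rebuild
conn.execute("DELETE FROM runs WHERE lifecycle_stage = 'bogus'")
rebuild(conn, table_args=CHECK)                # constraint passed explicitly
after_args = accepts(conn, "d", "bogus")
print(before, after_plain, after_args)
```

Under these assumptions, the invalid value is rejected before the rebuild, accepted after a rebuild that omits the constraint, and rejected again once the constraint is passed explicitly.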
@@ -40,14 +43,27 @@


def upgrade():
    with op.batch_alter_table("runs", table_args=check_constraint_table_args) as batch_op:
        # Transform the "status" column to an `Enum` and define a new check constraint. Specify
        # `native_enum=False` to create a check constraint rather than a
        # database-backend-dependent enum (see https://docs.sqlalchemy.org/en/13/core/
        # type_basics.html#sqlalchemy.types.Enum.params.native_enum)
        batch_op.alter_column(
            "status", type_=Enum(*new_run_statuses, create_constraint=True, native_enum=False)
        )
    new_type = Enum(*new_run_statuses, create_constraint=True, native_enum=False)
    if Version(alembic.__version__) < Version("1.7.0"):
        with op.batch_alter_table("runs", table_args=check_constraint_table_args) as batch_op:
            # Transform the "status" column to an `Enum` and define a new check constraint. Specify
            # `native_enum=False` to create a check constraint rather than a
            # database-backend-dependent enum (see https://docs.sqlalchemy.org/en/13/core/
            # type_basics.html#sqlalchemy.types.Enum.params.native_enum)
            batch_op.alter_column("status", type_=new_type)
    else:
        # In alembic >= 1.7.0, `table_args` can be removed since CHECK constraints are preserved.
        with op.batch_alter_table("runs") as batch_op:
            existing_type = Enum(
                *old_run_statuses, create_constraint=True, native_enum=False, name="status"
            )
            batch_op.alter_column(
                "status",
                type_=new_type,
                # In alembic >= 1.7.0, `existing_type` is required to drop the existing CHECK
                # constraint on the status column.
                existing_type=existing_type,
            )


def downgrade():
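
A note on the version gate in `upgrade()`: it compares parsed versions via `packaging.version.Version` rather than raw strings, and the sketch below shows why that matters. It uses a hypothetical stdlib-only helper, `parse_version`, as a simplified stand-in for `Version`. It assumes plain `X.Y.Z` release strings and does not handle pre-releases or other suffixes, which `Version` does.

```python
def parse_version(version):
    """Turn a plain 'X.Y.Z' release string into a tuple of ints (no pre-release handling)."""
    return tuple(int(part) for part in version.split("."))


def needs_table_args(alembic_version):
    """Mirror the branch in upgrade(): releases before 1.7.0 take the table_args path."""
    return parse_version(alembic_version) < parse_version("1.7.0")


# Plain string comparison misorders multi-digit components.
string_compare = "1.10.0" < "1.7.0"
print(string_compare)                # True, although 1.10.0 is the newer release
print(needs_table_args("1.10.0"))    # False
print(needs_table_args("1.4.1"))     # True
```

The previously pinned `alembic<=1.4.1` falls on the `table_args` side of the gate, while any release from 1.7.0 onward takes the `existing_type` branch.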
2 changes: 1 addition & 1 deletion setup.py
@@ -65,7 +65,7 @@ def package_files(directory):
other capabilities.
"""
CORE_REQUIREMENTS = SKINNY_REQUIREMENTS + [
    "alembic<=1.4.1",
    "alembic",
Member Author

🎉

    # Required
    "docker>=4.0.0",
    "Flask",
9 changes: 9 additions & 0 deletions tests/db/build_wheel.sh
@@ -0,0 +1,9 @@
#!/usr/bin/env bash

set -ex

rm -rf dist
prefix=$(git rev-parse --show-prefix)
pushd $(git rev-parse --show-cdup)
python setup.py bdist_wheel --dist-dir $prefix/dist
popd
3 changes: 2 additions & 1 deletion tests/db/docker-compose.yml
@@ -19,13 +19,14 @@ services:
      MLFLOW_TRACKING_URI: postgresql://mlflowuser:mlflowpassword@postgres:5432/mlflowdb

  mysql:
    image: mysql:5.7
    image: mysql
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: root-password
      MYSQL_DATABASE: mlflowdb
      MYSQL_USER: mlflowuser
      MYSQL_PASSWORD: mlflowpassword
    command: mysqld --default-authentication-plugin=mysql_native_password
Member Author

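MySQL >= 8.0.4 uses caching_sha2_password as the default authentication plugin. This command switches it back to mysql_native_password, which is required here to log in using a password.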


  mlflow-mysql:
    depends_on:
6 changes: 6 additions & 0 deletions tests/db/run_checks.py
@@ -35,6 +35,12 @@ def run_logging_operations():
        )
        print(mlflow.get_run(run.info.run_id))

    # Ensure the following migration scripts are applied correctly:
    # - cfd24bdc0731_update_run_status_constraint_with_killed.py
    # - versions/0a8213491aaa_drop_duplicate_killed_constraint.py
    client = mlflow.tracking.MlflowClient()
    client.set_terminated(run_id=run.info.run_id, status="KILLED")


def get_db_schema():
    engine = sqlalchemy.create_engine(mlflow.get_tracking_uri())