[BUG] Unexpected error when URI is quoted in "mlflow run URI" #5114

Closed
2 of 23 tasks
dinaldoap opened this issue Nov 28, 2021 · 5 comments
Labels
area/projects MLproject format, project running backends bug Something isn't working

Comments

@dinaldoap
Contributor

Thank you for submitting an issue. Please refer to our issue policy for additional information about bug reports. For help with debugging your code, please refer to Stack Overflow.

Please fill in this bug report template to ensure a timely and thorough response.

Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
  • No. I cannot contribute a bug fix at this time.

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04
  • MLflow installed from (source or binary): pip installed
  • MLflow version (run mlflow --version): 1.21.0
  • Python version: 3.6.13
  • npm version, if running the dev UI: -
  • Exact command to reproduce: mlflow run "'https://github.com/mlflow/mlflow.git#examples/sklearn_elasticnet_wine'" --version af8460f4f5f8bd407a597e1e52e2ff77d646a3cdD

Describe the problem

Describe the problem clearly here. Include descriptions of the expected behavior and the actual behavior.

When one submits a Kubernetes job with the function mlflow.projects.kubernetes.run_kubernetes_job, the API adds quotes to any command string containing '#', since this is the behavior of shlex.quote, which is called by mlflow.projects.kubernetes._get_run_command. However, when the URI contains quotes, the function mlflow.projects._parse_subdirectory doesn't work properly. I expected mlflow.projects._parse_subdirectory to work the same way regardless of the presence of quotes in the URI.
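The quoting behavior described above can be checked with the standard library alone. This minimal sketch only calls `shlex.quote`; it does not call MLflow, and the variable names are illustrative.

```python
import shlex

# URI taken from the report; '#' is not among the characters shlex.quote
# treats as safe, so the whole string gets wrapped in single quotes.
uri = "https://github.com/mlflow/mlflow.git#examples/sklearn_elasticnet_wine"
quoted_uri = shlex.quote(uri)

# Arguments made only of safe characters are returned unchanged.
plain_arg = shlex.quote("main")
param_arg = shlex.quote("alpha=.1")

print(quoted_uri)
print(plain_arg, param_arg)
```

Quoting of this kind is meant for strings that a shell will later parse; if the argument is handed to a process directly, the quotes remain part of the value.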

The following description was taken from a submitted job that didn't run properly:
...
Containers:
example:
Command:
mlflow
run
'https://github.com/mlflow.git#examples/sklearn_elasticnet_wine'
-e
main
--run-id
2ca2d29eb2684a488a3dd64a5a3d4ec6
...

Code to reproduce issue

Provide a reproducible test case that is the bare minimum necessary to generate the problem.

Despite its association with mlflow.projects.kubernetes, this error can be reproduced without it. The following command ensures that the single quotes around the URI reach the mlflow.projects._parse_subdirectory function:

mlflow run "'https://github.com/mlflow/mlflow.git#examples/sklearn_elasticnet_wine'" --version af8460f4f5f8bd407a597e1e52e2ff77d646a3cdD
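To see why the outer double quotes matter, the sketch below uses `shlex.split` as a stand-in for POSIX shell word splitting (an approximation, not the shell itself). The shortened `--version` value is the one that appears in the log of this report.

```python
import shlex

# Command line as typed in the shell: double quotes wrapping single quotes.
command_line = (
    'mlflow run '
    '"\'https://github.com/mlflow/mlflow.git#examples/sklearn_elasticnet_wine\'" '
    '--version af8460f'
)

# shlex.split approximates POSIX shell tokenization: the double quotes are
# consumed, while the single quotes inside them stay part of the argument.
argv = shlex.split(command_line)
uri_argument = argv[2]

print(uri_argument)
```

The third token still begins and ends with a single quote, which is the value the `mlflow` process receives as its URI.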

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

Log from the command to reproduce the bug:
(mlflow-dev-env) miniconda@8af2d3c356c3:/workspace$ mlflow run "'https://github.com/mlflow/mlflow.git#examples/sklearn_elasticnet_wine'" --version af8460f
/workspace/mlflow/server/handlers.py:119: UserWarning: Failure attempting to register store for scheme "file-plugin": No module named 'mlflow_test_plugin.sqlalchemy_store'
self.register_entrypoints()
2021/11/28 11:00:25 INFO mlflow.projects.utils: === Fetching project from 'https://github.com/mlflow/mlflow.git#examples/sklearn_elasticnet_wine' into /tmp/tmp9noqut9_ ===
Traceback (most recent call last):
  File "/workspace/.conda/envs/mlflow-dev-env/bin/mlflow", line 33, in <module>
    sys.exit(load_entry_point('mlflow', 'console_scripts', 'mlflow')())
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/workspace/mlflow/cli.py", line 181, in run
    run_id=run_id,
  File "/workspace/mlflow/projects/__init__.py", line 304, in run
    synchronous=synchronous,
  File "/workspace/mlflow/projects/__init__.py", line 99, in _run
    experiment_id,
  File "/workspace/mlflow/projects/backend/local.py", line 45, in run
    work_dir = fetch_and_validate_project(project_uri, version, entry_point, params)
  File "/workspace/mlflow/projects/utils.py", line 125, in fetch_and_validate_project
    work_dir = _fetch_project(uri=uri, version=version)
  File "/workspace/mlflow/projects/utils.py", line 159, in _fetch_project
    _fetch_git_repo(parsed_uri, version, dst_dir)
  File "/workspace/mlflow/projects/utils.py", line 186, in _fetch_git_repo
    origin.fetch(depth=GIT_FETCH_DEPTH)
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/git/remote.py", line 828, in fetch
    res = self._get_fetch_info_from_stderr(proc, progress)
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/git/remote.py", line 702, in _get_fetch_info_from_stderr
    proc.wait(stderr=stderr_text)
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/git/cmd.py", line 447, in wait
    raise GitCommandError(remove_password_if_present(self.args), status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git fetch -v --depth=1 origin
  stderr: 'fatal: protocol ''https' is not supported'
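
The git error is consistent with the quote characters being treated as part of the URI. The sketch below is not MLflow's implementation; `split_uri_and_subdirectory` is a hypothetical helper that splits on the first '#', assumed here only to illustrate how a leading quote ends up in front of the scheme.

```python
from urllib.parse import urlparse


def split_uri_and_subdirectory(uri):
    """Hypothetical helper: split 'repo#subdir' on the first '#'."""
    repo, _, subdirectory = uri.partition("#")
    return repo, subdirectory


quoted = "'https://github.com/mlflow/mlflow.git#examples/sklearn_elasticnet_wine'"
repo, subdirectory = split_uri_and_subdirectory(quoted)

# The opening quote stays on the repository part, the closing quote on the
# subdirectory part.
print(repo)
print(subdirectory)

# With the stray quote in front, urlparse no longer recognizes a scheme;
# once the quote is stripped, the scheme is found again.
print(repr(urlparse(repo).scheme))
print(repr(urlparse(repo.lstrip("'")).scheme))
```

Under this assumption the repository part passed to git would start with `'https`, which matches the protocol name quoted in the stderr line above.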

What component(s), interfaces, languages, and integrations does this bug affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations
@dinaldoap dinaldoap added the bug Something isn't working label Nov 28, 2021
@github-actions github-actions bot added the area/projects MLproject format, project running backends label Nov 28, 2021
@dinaldoap
Contributor Author

Related Pull Request #5117

@harupy
Member

harupy commented Nov 29, 2021

@dinaldoap Do you have a repository for your kubernetes project? Just curious how URI ends up looking like "'https://github.com/mlflow/mlflow.git#examples/sklearn_elasticnet_wine'"

@dinaldoap
Contributor Author

@harupy Currently, my code is in my employer's private repository, alongside additional content that I can't share. However, I can share the code snippet that makes the URI end up with quotes:

elif backend == "mybackend":
    from mlflow.projects import kubernetes as kb
    kube_config = _parse_kubernetes_config(backend_config)
    image_tag = kube_config["image-tag"]
    image_digest = kube_config["image-digest"]
    command = _build_mlflow_run_cmd(uri, entry_point, storage_dir, use_conda, active_run.info.run_id, parameters)
    env_vars = _get_run_env_vars(
        run_id=active_run.info.run_uuid,
        experiment_id=active_run.info.experiment_id
    )
    env_vars.update(_get_pam_env_vars())
    submitted_run = kb.run_kubernetes_job(
        project.name,
        active_run,
        image_tag,
        image_digest,
        command,
        env_vars,
        kube_config.get('kube-context', None),
        kube_config['kube-job-template']
    )
    return submitted_run

Currently, this code is embedded in my private fork of MLflow, in the module mlflow.projects.__init__, function _run. However, my plan is to create a backend plugin that runs the command "mlflow run URI" in a Kubernetes container on a pre-built image (e.g. continuumio/miniconda3:4.10.3). That behavior differs from the Kubernetes backend, which builds the image dynamically and runs a command extracted from the MLproject's entry point. I needed this approach because my environment doesn't have Docker to build images dynamically. Besides that, I see an opportunity for runtime optimization by using a project-specific pre-built image with the command "mlflow run URI --no-conda".

@harupy
Member

harupy commented Nov 29, 2021

@dinaldoap Thanks for the clarification. What does the command passed to run_kubernetes_job look like?

@dinaldoap
Contributor Author

@harupy Thanks for the attention. I've captured the command and the kubernetes_job_definition from mybackend execution. They are as follows:

command = [
    "mlflow",
    "run",
    "https://github.com/mlflow/mlflow.git#examples/docker",
    "-e",
    "main",
    "--run-id",
    "5c1296cd350d4cc689570dd77b80f4d2",
    "-P",
    "alpha=.1",
]
kubernetes_job_definition = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "docker-example-2021-11-30-11-47-14-471167", "namespace": "mlflow"},
    "spec": {
        "ttlSecondsAfterFinished": 100,
        "backoffLimit": 0,
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "docker-example",
                        "image": "continuumio/miniconda3:4.10.3@sha256:a137c7da98c8680467490e15ac3c54e25db77495be1737076b053a6790ad6082",
                        "command": [
                            "mlflow",
                            "run",
                            "'https://github.com/mlflow/mlflow.git#examples/docker'",
                            "-e",
                            "main",
                            "--run-id",
                            "5c1296cd350d4cc689570dd77b80f4d2",
                            "-P",
                            "alpha=.1",
                        ],
                        "resources": {
                            "limits": {"memory": "512Mi"},
                            "requests": {"memory": "256Mi"},
                        },
                        "env": [
                            {"name": "MLFLOW_RUN_ID", "value": "5c1296cd350d4cc689570dd77b80f4d2"},
                            {"name": "MLFLOW_TRACKING_URI", "value": "file:///workspace/mlruns"},
                            {"name": "MLFLOW_EXPERIMENT_ID", "value": "0"},
                        ],
                    }
                ],
                "restartPolicy": "Never",
            }
        },
    },
}

The initial command doesn't have quotes in uri, but mlflow.projects.kubernetes.run_kubernetes_job calls _get_run_command, and that function adds quotes to the uri.
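
The difference between the two lists above can be reproduced by applying `shlex.quote` to each element. This is a sketch based on the description in this thread, not a copy of `_get_run_command`; `quote_each` and `unquote_argument` are hypothetical names, and the un-quoting step is only one possible way to tolerate such input.

```python
import shlex

command = [
    "mlflow", "run",
    "https://github.com/mlflow/mlflow.git#examples/docker",
    "-e", "main",
    "--run-id", "5c1296cd350d4cc689570dd77b80f4d2",
    "-P", "alpha=.1",
]


def quote_each(args):
    """Hypothetical stand-in for the quoting step described in the thread."""
    return [shlex.quote(arg) for arg in args]


def unquote_argument(arg):
    """Hypothetical inverse: undo shell quoting on a single argument."""
    parts = shlex.split(arg)
    # Leave the argument untouched unless it parses to exactly one token.
    return parts[0] if len(parts) == 1 else arg


container_command = quote_each(command)
print(container_command[2])

# Only the URI changed; un-quoting each element restores the original list.
restored = [unquote_argument(arg) for arg in container_command]
print(restored == command)
```

Because a container `command` list is executed without a shell, nothing strips these quotes before `mlflow run` receives the argument.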
