[BUG] Unexpected error when URI is quoted in "mlflow run URI" #5114
Comments
Related pull request: #5117
@dinaldoap Do you have a repository for your Kubernetes project? Just curious what the URI ends up looking like.
@harupy Currently, my code is in my employer's private repository, along with additional content that I can't share. However, I can share the code snippet that causes the URI to end up with quotes:
For now, this code is embedded in my private fork of mlflow, in the _run function of the mlflow.projects.__init__ module. My plan, however, is to create a backend plugin that runs the command "mlflow run URI" in a Kubernetes container based on a pre-built image (e.g., continuumio/miniconda3:4.10.3). That behavior differs from the Kubernetes backend, which builds the image dynamically and runs a command extracted from the MLproject entry point. My approach is required because my environment doesn't have Docker to build images dynamically. Besides that, I see an opportunity for runtime optimization by using a project-specific pre-built image with the command "mlflow run URI --no-conda".
@dinaldoap Thanks for the clarification, how does the
@harupy Thanks for the attention. I've captured the
The initial
System information
MLflow version (run mlflow --version): 1.21.0
Describe the problem
When one submits a Kubernetes job with mlflow.kubernetes.run_kubernetes_job, the API wraps any command argument that contains '#' in single quotes, because mlflow.kubernetes._get_run_command passes each argument through shlex.quote. However, when the URI is quoted, mlflow.projects._parse_subdirectory does not work properly, and the leading quote ends up in the repository URI handed to git. I expected mlflow.projects._parse_subdirectory to behave the same way whether or not the URI is quoted.
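To illustrate the mechanism, the sketch below quotes a job's arguments with shlex.quote and then splits the URI on the first '#'. The split_subdirectory helper is a hypothetical stand-in written for this example, not MLflow's actual _parse_subdirectory, although it follows the same '#' delimiter convention described in this issue.

```python
import shlex


def split_subdirectory(uri):
    """Hypothetical stand-in: split a project URI on the first '#'.

    Returns (repository_uri, subdirectory); subdirectory is "" when absent.
    """
    if "#" in uri:
        index = uri.find("#")
        return uri[:index], uri[index + 1:]
    return uri, ""


uri = "https://github.com/mlflow/mlflow.git#examples/sklearn_elasticnet_wine"

# shlex.quote leaves shell-safe strings such as "mlflow" or "-e" untouched,
# but wraps any string containing '#' in single quotes.
args = ["mlflow", "run", uri, "-e", "main"]
quoted = [shlex.quote(a) for a in args]

# Splitting the quoted URI leaves a quote character on each half:
# a leading quote on the repository URI, a trailing one on the subdirectory.
repo, subdir = split_subdirectory(quoted[2])
print(repo)
print(subdir)
```

Only the URI argument changes; everything else in the command passes through shlex.quote unmodified, which is why the problem shows up solely for URIs with a subdirectory fragment.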
The following description was taken from a submitted job that didn't run properly:
...
Containers:
  example:
    Command:
      mlflow
      run
      'https://github.com/mlflow.git#examples/sklearn_elasticnet_wine'
      -e
      main
      --run-id
      2ca2d29eb2684a488a3dd64a5a3d4ec6
...
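The quotes in this excerpt are never removed because Kubernetes hands a container's command array to the runtime as-is, without running it through a shell. The sketch below contrasts the two cases with the standard library only: the list itself is what the container process would receive as argv, while shlex.split on the joined string approximates the tokenization a POSIX shell would perform. The URI and run id are copied from the excerpt above.

```python
import shlex

uri = "https://github.com/mlflow.git#examples/sklearn_elasticnet_wine"
command = ["mlflow", "run", shlex.quote(uri), "-e", "main",
           "--run-id", "2ca2d29eb2684a488a3dd64a5a3d4ec6"]

# Exec form (a Kubernetes container "command"): each list element becomes
# one argv entry verbatim, so the URI keeps its literal single quotes.
exec_argv = command

# Shell form: a shell would tokenize the joined string and consume the
# quotes; shlex.split approximates that tokenization.
shell_argv = shlex.split(" ".join(command))

print(exec_argv[2])
print(shell_argv[2])
```

In other words, shlex.quote is appropriate only when the resulting string is later interpreted by a shell; a caller that builds an exec-form command list can pass the URI unquoted.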
Code to reproduce issue
Although the problem surfaced through mlflow.kubernetes, it can be reproduced without it. The following command ensures that the single quotes around the URI reach the mlflow.projects._parse_subdirectory function:
mlflow run "'https://github.com/mlflow/mlflow.git#examples/sklearn_elasticnet_wine'" --version af8460f4f5f8bd407a597e1e52e2ff77d646a3cd
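To check that this command delivers the same argument as the Kubernetes job, the sketch below tokenizes the command line with shlex.split, which approximates how a POSIX shell splits words. The outer double quotes are consumed by the shell, while the inner single quotes survive, so the third argument is identical to shlex.quote(uri). The --version flag is omitted because it does not affect the URI argument.

```python
import shlex

uri = "https://github.com/mlflow/mlflow.git#examples/sklearn_elasticnet_wine"

# The reproduction wraps the single-quoted URI in double quotes.
command_line = "mlflow run \"'" + uri + "'\""

# Inside double quotes, single quotes are ordinary characters, so they
# remain part of the argument that mlflow receives.
argv = shlex.split(command_line)
print(argv[2])
print(argv[2] == shlex.quote(uri))
```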
Other info / logs
Log from the command to reproduce the bug:
(mlflow-dev-env) miniconda@8af2d3c356c3:/workspace$ mlflow run "'https://github.com/mlflow/mlflow.git#examples/sklearn_elasticnet_wine'" --version af8460f
/workspace/mlflow/server/handlers.py:119: UserWarning: Failure attempting to register store for scheme "file-plugin": No module named 'mlflow_test_plugin.sqlalchemy_store'
self.register_entrypoints()
2021/11/28 11:00:25 INFO mlflow.projects.utils: === Fetching project from 'https://github.com/mlflow/mlflow.git#examples/sklearn_elasticnet_wine' into /tmp/tmp9noqut9_ ===
Traceback (most recent call last):
  File "/workspace/.conda/envs/mlflow-dev-env/bin/mlflow", line 33, in <module>
    sys.exit(load_entry_point('mlflow', 'console_scripts', 'mlflow')())
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/workspace/mlflow/cli.py", line 181, in run
    run_id=run_id,
  File "/workspace/mlflow/projects/__init__.py", line 304, in run
    synchronous=synchronous,
  File "/workspace/mlflow/projects/__init__.py", line 99, in _run
    experiment_id,
  File "/workspace/mlflow/projects/backend/local.py", line 45, in run
    work_dir = fetch_and_validate_project(project_uri, version, entry_point, params)
  File "/workspace/mlflow/projects/utils.py", line 125, in fetch_and_validate_project
    work_dir = _fetch_project(uri=uri, version=version)
  File "/workspace/mlflow/projects/utils.py", line 159, in _fetch_project
    _fetch_git_repo(parsed_uri, version, dst_dir)
  File "/workspace/mlflow/projects/utils.py", line 186, in _fetch_git_repo
    origin.fetch(depth=GIT_FETCH_DEPTH)
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/git/remote.py", line 828, in fetch
    res = self._get_fetch_info_from_stderr(proc, progress)
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/git/remote.py", line 702, in _get_fetch_info_from_stderr
    proc.wait(stderr=stderr_text)
  File "/workspace/.conda/envs/mlflow-dev-env/lib/python3.6/site-packages/git/cmd.py", line 447, in wait
    raise GitCommandError(remove_password_if_present(self.args), status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git fetch -v --depth=1 origin
  stderr: 'fatal: protocol ''https' is not supported'
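The stderr line shows git receiving a remote URL that starts with a quote, so it reads the protocol as 'https (quote included). One possible defensive approach, sketched below under that assumption, is to remove a single pair of matching surrounding quotes before splitting on '#'. The names strip_matching_quotes and parse_project_uri are hypothetical, and this is not necessarily how the related pull request #5117 resolves the issue.

```python
def strip_matching_quotes(value):
    """Hypothetical helper: drop one pair of identical surrounding quotes."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
        return value[1:-1]
    return value


def parse_project_uri(uri):
    """Hypothetical parser: normalize quoting, then split on the first '#'.

    Returns (repository_uri, subdirectory); subdirectory is "" when absent.
    """
    uri = strip_matching_quotes(uri)
    repo, _, subdir = uri.partition("#")
    return repo, subdir


quoted = "'https://github.com/mlflow/mlflow.git#examples/sklearn_elasticnet_wine'"
print(parse_project_uri(quoted))
```

Stripping only a matching pair leaves URIs that legitimately contain a lone quote alone, but it does not undo the escaping shlex.quote applies to embedded single quotes, so avoiding the quoting on the caller side remains the more robust option.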