Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically update metric plots for in-progress runs #2099 #5017

Merged
merged 21 commits into from Dec 2, 2021

Conversation

cedkoffeto
Copy link
Contributor

@cedkoffeto cedkoffeto commented Nov 7, 2021

Signed-off-by: Cedric Koffeto cedkoffeto@gmail.com

What changes are proposed in this pull request?

setInterval function added in metricsPlotPanel component which retrieves execution status and metrics history from API every 8 seconds to update the plot and automatically stops when all executions are complete. closes #2099

How is this patch tested?

  • Unit tests
  • Manually verified the proposed feature works correctly using this script:
import time
import numpy as np
import mlflow
import multiprocessing as mp


def log(run_id, slope, repeat):
    sleep = 10
    with mlflow.start_run(run_id=run_id):
        for epoch in range(1, repeat + 1):
            print(epoch)
            mlflow.log_metric(key="metric1", value=slope * epoch * np.log(epoch), step=epoch)
            mlflow.log_metric(key="metric2", value=slope * (1 / epoch) * np.log(epoch), step=epoch)
            time.sleep(sleep)


client = mlflow.tracking.MlflowClient()
run_uuids = [client.create_run("0").info.run_id for _ in range(2)]
runs_param = "[" + ",".join(map(lambda s: f"%22{s}%22", run_uuids)) + "]"

print(
    "URL:",
    r"http://localhost:3000/#/metric/metric1?runs=<<< runs_param >>>&experiment=0&plot_metric_keys=[%22metric1%22]&plot_layout={%22autosize%22:true,%22xaxis%22:{},%22yaxis%22:{}}&x_axis=step&y_axis_scale=linear&line_smoothness=1&show_point=true&deselected_curves=[]&last_linear_y_axis_range=[]".replace(
        "<<< runs_param >>>", runs_param
    ),
)

args_list = [(run_uuid, idx + 1, 5 + idx * 3) for idx, run_uuid in enumerate(run_uuids)]

with mp.Pool() as pool:
    pool.starmap(log, args_list)

It can be tested with a unit test to check if the plot is actually updated

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Automatically update metric plots for in-progress runs

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

@github-actions
Copy link

github-actions bot commented Nov 7, 2021

@cedkoffeto Thanks for the contribution! The DCO check failed. Please sign off your commits by following the instructions here: https://github.com/mlflow/mlflow/runs/4132478673. See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.rst#sign-your-work for more details.

@github-actions github-actions bot added area/uiux Front-end, user experience, plotting, JavaScript, JavaScript dev server rn/feature Mention under Features in Changelogs. labels Nov 7, 2021
Signed-off-by: Cedric Koffeto
cedkoffeto@gmail.com
Signed-off-by: cedric koffeto <cedkoffeto@gmail.com>
Signed-off-by: cedric koffeto <cedkoffeto@gmail.com>
Signed-off-by: cedric koffeto <cedkoffeto@gmail.com>
@harupy
Copy link
Member

harupy commented Nov 8, 2021

Hi @cedkoffeto, thanks for the PR! I'll review it soon :)

@cedkoffeto
Copy link
Contributor Author

Hi @cedkoffeto, thanks for the PR! I'll review it soon :)

@harupy Glad to be able to help 😉

@harupy
Copy link
Member

harupy commented Nov 8, 2021

@cedkoffeto btw could you take a screen record of how the metric plot automatically gets updated?

Signed-off-by: cedric koffeto <cedkoffeto@gmail.com>
@cedkoffeto
Copy link
Contributor Author

@cedkoffeto btw could you take a screen record of how the metric plot automatically gets updated?

Here it is @harupy

Enregistrement.de.l.ecran.2021-11-08.a.03.09.24.mp4

@harupy
Copy link
Member

harupy commented Nov 8, 2021

Thanks for the screen recording. It looks like the entire plot gets rendered.

I'm investigating how we can only update the lines like this:

only-update-lines.mov

@harupy
Copy link
Member

harupy commented Nov 8, 2021

Here's my attempt: https://github.com/harupy/mlflow/tree/5017-harupy. The implementation is almost the same as yours. In my implementation, I don't call this.setState((prevState) => ({historyRequestIds: [...]})). I'm investigating whether this is required or not.

          this.setState((prevState) => ({
            historyRequestIds: [...prevState.historyRequestIds, ...requestIds],
          }));

The message showing up at the top is just for demo purposes.

auto-plot-update.mov

Python script I used:

import time
import numpy as np
import mlflow


with mlflow.start_run() as run:
    print(
        "URL:",
        r"http://localhost:3000/#/metric/metric1?runs=[%22<<< RUN_ID >>>%22]&experiment=0&plot_metric_keys=[%22metric1%22]&plot_layout={%22autosize%22:true,%22xaxis%22:{},%22yaxis%22:{}}&x_axis=relative&y_axis_scale=linear&line_smoothness=1&show_point=true&deselected_curves=[]&last_linear_y_axis_range=[]".replace(
            "<<< RUN_ID >>>", run.info.run_id
        ),
    )
    for epoch in range(1, 10):
        print(epoch)
        mlflow.log_metric(key="metric1", value=epoch * np.log(epoch), step=epoch)
        mlflow.log_metric(key="metric2", value=(1 / epoch) * np.log(epoch), step=epoch)
        time.sleep(3)

@dbczumar
Copy link
Collaborator

dbczumar commented Nov 8, 2021

@cedkoffeto @harupy Awesome stuff! Does the proposal from https://github.com/harupy/mlflow/tree/5017-harupy also preserve plot customizations and zoom?

@cedkoffeto
Copy link
Contributor Author

this.setState((prevState) => ({historyRequestIds: [...]}))

Hi @harupy,
In fact, I also think that saving requests seems unnecessary in our case.

@cedkoffeto
Copy link
Contributor Author

cedkoffeto commented Nov 8, 2021

@cedkoffeto @harupy Awesome stuff! Does the proposal from https://github.com/harupy/mlflow/tree/5017-harupy also preserve plot customizations and zoom?

Thanks @dbczumar
I think it does preserve plot customizations and zoom but let @harupy confirm.

@harupy
Copy link
Member

harupy commented Nov 9, 2021

@dbczumar Yep, it does. Here's a quick demo.

auto-plot-update-customization.mov

Signed-off-by: harupy <hkawamura0130@gmail.com>
@cedkoffeto
Copy link
Contributor Author

Hi @harupy, any update?

@harupy
Copy link
Member

harupy commented Nov 26, 2021

Hi @cedkoffeto, sorry for the late reply. We internally discussed this feature. Here's our latest prototype:

automatic-metric-plot-update.mov

code: https://github.com/harupy/mlflow/pull/28/files

@cedkoffeto
Copy link
Contributor Author

Hi @cedkoffeto, sorry for the late reply. We internally discussed this feature. Here's our latest prototype:

automatic-metric-plot-update.mov
code: https://github.com/harupy/mlflow/pull/28/files

Hi @harupy, that's great!

Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
@harupy
Copy link
Member

harupy commented Nov 29, 2021

@cedkoffeto I pushed some commits to update the PR.

Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Comment on lines 27 to 28
// Stop polling when the polling duration exceeds this value
export const METRICS_PLOT_POLLING_DURATION_MS = 3600 * 1000; // 1 hour
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should only stop polling when there's no new data.

Copy link
Member

@harupy harupy Nov 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbczumar I remember we discussed that we should set an appropriate polling threshold when a run never ends, but not setting such a threshold sounds ok to me because runs end in most cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offline discussion: check the timestamp of the last metric, and if it's more than 1 week, then we won't refresh.

@@ -22,10 +22,16 @@ import { getUUID } from '../../common/utils/ActionUtils';
export const CHART_TYPE_LINE = 'line';
export const CHART_TYPE_BAR = 'bar';

// Polling interval
export const METRICS_PLOT_POLLING_INTERVAL_MS = 5000;
Copy link
Collaborator

@dbczumar dbczumar Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we increase this to 10 seconds? 5 seems aggressive.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, what happens if the refresh fails? Does the page crash?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we increase this to 10 seconds? 5 seems aggressive.

Sure!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, what happens if the refresh fails? Does the page crash?

Let me test.

Copy link
Member

@harupy harupy Nov 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when-request-fails.mov
  • The page doesn't crash.
  • The page keeps polling.

@cedkoffeto
Copy link
Contributor Author

@cedkoffeto I pushed some commits to update the PR.
👍🏽

Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
export const METRICS_PLOT_POLLING_INTERVAL_MS = 10 * 1000; // 10 seconds
// A run is considered as 'hanging' if its status is 'RUNNING' but its latest metric was logged
// prior to this threshold. The metrics plot doesn't automatically update hanging runs.
export const METRICS_PLOT_HANGING_RUN_THRESHOLD_MS = 3600 * 24 * 7 * 1000; // 1 week
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "hanging" make sense?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

@@ -611,6 +611,10 @@
"defaultMessage": "Registered Models",
"description": "Text for registered model link in the title for model comparison page"
},
"UEDu0c": {
"defaultMessage": "MLflow UI automatically fetches metric histories for active runs and updates the metrics plot with a {interval} second interval.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Included interval so a user doesn't need to guess or measure how long the interval is.

Comment on lines -40 to +67
{ key: 'metric_1', value: 100, step: 2, timestamp: 1556662044000 },
{ key: 'metric_1', value: 50, step: 1, timestamp: 1556662043000 },
{ key: 'metric_1', value: 100, step: 2, timestamp: now },
{ key: 'metric_1', value: 50, step: 1, timestamp: now - 1 },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced hardcoded timestamps with now to prevent the metrics plot from considering these runs as hanging.

Copy link
Collaborator

@dbczumar dbczumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Awesome work, @cedkoffeto, @harupy ! Thank you so much for this contribution, @cedkoffeto!

Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
@harupy harupy merged commit 1b4bbb6 into mlflow:master Dec 2, 2021
@cedkoffeto
Copy link
Contributor Author

much

Thanks! It was a pleasure :)
Thanks also to @harupy for your great help 🙏🏽

@sim-san sim-san mentioned this pull request Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/uiux Front-end, user experience, plotting, JavaScript, JavaScript dev server rn/feature Mention under Features in Changelogs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FR] Automatically update metric plots for in-progress runs
3 participants