Automatically update metric plots for in-progress runs #2099 #5017

cedkoffeto · 2021-11-07T20:13:58Z

Signed-off-by: Cedric Koffeto cedkoffeto@gmail.com

What changes are proposed in this pull request?

Adds a setInterval call to the MetricsPlotPanel component that fetches run status and metric history from the API every 8 seconds to update the plot, and stops automatically once all runs are complete. Closes #2099.
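As a rough illustration of the polling behaviour described above (not the PR's actual JavaScript), the loop can be sketched in Python. The function name `poll_runs`, its callback parameters, and the set of terminal statuses are assumptions made for this sketch; only the 8-second default comes from the description.

```python
import time

# Assumption: statuses after which a run no longer logs metrics.
TERMINAL_STATUSES = {"FINISHED", "FAILED", "KILLED"}


def poll_runs(run_ids, fetch_status, fetch_history, on_update,
              interval_s=8, sleep=time.sleep):
    """Refresh metric histories until every run reaches a terminal status.

    fetch_status(run_id) -> str and fetch_history(run_id) -> list are
    hypothetical callables standing in for the tracking API calls.
    Returns the number of polling rounds performed.
    """
    rounds = 0
    while True:
        rounds += 1
        # Fetch the latest history for every run and hand it to the plot.
        on_update({run_id: fetch_history(run_id) for run_id in run_ids})
        statuses = [fetch_status(run_id) for run_id in run_ids]
        if all(status in TERMINAL_STATUSES for status in statuses):
            return rounds
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; in the browser the same role is played by the timer itself.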

How is this patch tested?

Unit tests
Manually verified the proposed feature works correctly using this script:

import time
import numpy as np
import mlflow
import multiprocessing as mp


def log(run_id, slope, repeat):
    sleep = 10
    with mlflow.start_run(run_id=run_id):
        for epoch in range(1, repeat + 1):
            print(epoch)
            mlflow.log_metric(key="metric1", value=slope * epoch * np.log(epoch), step=epoch)
            mlflow.log_metric(key="metric2", value=slope * (1 / epoch) * np.log(epoch), step=epoch)
            time.sleep(sleep)


def main():
    client = mlflow.tracking.MlflowClient()
    # Create two runs in the default experiment ("0").
    run_uuids = [client.create_run("0").info.run_id for _ in range(2)]
    runs_param = "[" + ",".join(f"%22{s}%22" for s in run_uuids) + "]"

    print(
        "URL:",
        r"http://localhost:3000/#/metric/metric1?runs=<<< runs_param >>>&experiment=0&plot_metric_keys=[%22metric1%22]&plot_layout={%22autosize%22:true,%22xaxis%22:{},%22yaxis%22:{}}&x_axis=step&y_axis_scale=linear&line_smoothness=1&show_point=true&deselected_curves=[]&last_linear_y_axis_range=[]".replace(
            "<<< runs_param >>>", runs_param
        ),
    )

    # (run_id, slope, number of epochs) for each run
    args_list = [(run_uuid, idx + 1, 5 + idx * 3) for idx, run_uuid in enumerate(run_uuids)]

    with mp.Pool() as pool:
        pool.starmap(log, args_list)


# Guard the entry point so that worker processes started with the "spawn"
# method do not re-run the script body when they import this module.
if __name__ == "__main__":
    main()

It can be tested with a unit test to check if the plot is actually updated
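The script above assembles the `runs` query parameter by hand with `%22` escapes. A minimal sketch of the same encoding using the standard library follows; the helper name `build_runs_param` is hypothetical and not part of MLflow.

```python
import json
from urllib.parse import quote


def build_runs_param(run_ids):
    """Encode run IDs as the percent-encoded JSON list used in the `runs` query parameter."""
    # Compact JSON such as ["a","b"]; brackets and commas stay literal,
    # while double quotes are percent-encoded as %22.
    return quote(json.dumps(list(run_ids), separators=(",", ":")), safe="[],")


print(build_runs_param(["abc", "def"]))  # prints [%22abc%22,%22def%22]
```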

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

Automatically update metric plots for in-progress runs

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

github-actions · 2021-11-07T20:14:15Z

@cedkoffeto Thanks for the contribution! The DCO check failed. Please sign off your commits by following the instructions here: https://github.com/mlflow/mlflow/runs/4132478673. See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.rst#sign-your-work for more details.

harupy · 2021-11-08T00:11:08Z

Hi @cedkoffeto, thanks for the PR! I'll review it soon :)

cedkoffeto · 2021-11-08T00:19:51Z

> Hi @cedkoffeto, thanks for the PR! I'll review it soon :)

@harupy Glad to be able to help 😉

harupy · 2021-11-08T00:44:58Z

@cedkoffeto btw, could you take a screen recording of how the metric plot automatically gets updated?

cedkoffeto · 2021-11-08T02:23:21Z

> @cedkoffeto btw, could you take a screen recording of how the metric plot automatically gets updated?

Here it is @harupy

Enregistrement.de.l.ecran.2021-11-08.a.03.09.24.mp4

harupy · 2021-11-08T05:49:24Z

Thanks for the screen recording. It looks like the entire plot gets re-rendered on every update.

I'm investigating how we can update only the lines, like this:

only-update-lines.mov

harupy · 2021-11-08T09:05:21Z

Here's my attempt: https://github.com/harupy/mlflow/tree/5017-harupy. The implementation is almost the same as yours. In my implementation, I don't call this.setState((prevState) => ({historyRequestIds: [...]})). I'm investigating whether this is required or not.

this.setState((prevState) => ({
  historyRequestIds: [...prevState.historyRequestIds, ...requestIds],
}));

The message showing up at the top is just for demo purposes.

auto-plot-update.mov

Python script I used:

import time
import numpy as np
import mlflow


with mlflow.start_run() as run:
    print(
        "URL:",
        r"http://localhost:3000/#/metric/metric1?runs=[%22<<< RUN_ID >>>%22]&experiment=0&plot_metric_keys=[%22metric1%22]&plot_layout={%22autosize%22:true,%22xaxis%22:{},%22yaxis%22:{}}&x_axis=relative&y_axis_scale=linear&line_smoothness=1&show_point=true&deselected_curves=[]&last_linear_y_axis_range=[]".replace(
            "<<< RUN_ID >>>", run.info.run_id
        ),
    )
    for epoch in range(1, 10):
        print(epoch)
        mlflow.log_metric(key="metric1", value=epoch * np.log(epoch), step=epoch)
        mlflow.log_metric(key="metric2", value=(1 / epoch) * np.log(epoch), step=epoch)
        time.sleep(3)

dbczumar · 2021-11-08T19:20:47Z

@cedkoffeto @harupy Awesome stuff! Does the proposal from https://github.com/harupy/mlflow/tree/5017-harupy also preserve plot customizations and zoom?

cedkoffeto · 2021-11-08T20:06:04Z

> this.setState((prevState) => ({historyRequestIds: [...]}))

Hi @harupy,
In fact, I also think that saving requests seems unnecessary in our case.

cedkoffeto · 2021-11-08T20:12:31Z

> @cedkoffeto @harupy Awesome stuff! Does the proposal from https://github.com/harupy/mlflow/tree/5017-harupy also preserve plot customizations and zoom?

Thanks @dbczumar
I think it does preserve plot customizations and zoom, but I'll let @harupy confirm.

harupy · 2021-11-09T00:31:17Z

@dbczumar Yep, it does. Here's a quick demo.

auto-plot-update-customization.mov

cedkoffeto · 2021-11-19T17:59:59Z

Hi @harupy, any update?

harupy · 2021-11-26T08:32:17Z

Hi @cedkoffeto, sorry for the late reply. We internally discussed this feature. Here's our latest prototype:

automatic-metric-plot-update.mov

code: https://github.com/harupy/mlflow/pull/28/files

cedkoffeto · 2021-11-26T21:09:09Z

> Hi @cedkoffeto, sorry for the late reply. We internally discussed this feature. Here's our latest prototype:
>
> automatic-metric-plot-update.mov
> code: https://github.com/harupy/mlflow/pull/28/files

Hi @harupy, that's great!

harupy · 2021-11-29T06:55:39Z

@cedkoffeto I pushed some commits to update the PR.

dbczumar · 2021-11-29T21:42:18Z

mlflow/server/js/src/experiment-tracking/components/MetricsPlotPanel.js

+// Stop polling when the polling duration exceeds this value
+export const METRICS_PLOT_POLLING_DURATION_MS = 3600 * 1000; // 1 hour


I think we should only stop polling when there's no new data.

harupy · 2021-11-30

@dbczumar I remember we discussed setting an appropriate polling threshold for runs that never end, but not setting such a threshold sounds OK to me because runs end in most cases.

harupy · 2021-11-30

Offline discussion: check the timestamp of the latest metric, and if it's more than 1 week old, don't refresh.

dbczumar · 2021-11-29T21:43:15Z

mlflow/server/js/src/experiment-tracking/components/MetricsPlotPanel.js

@@ -22,10 +22,16 @@ import { getUUID } from '../../common/utils/ActionUtils';
 export const CHART_TYPE_LINE = 'line';
 export const CHART_TYPE_BAR = 'bar';

+// Polling interval
+export const METRICS_PLOT_POLLING_INTERVAL_MS = 5000;


Can we increase this to 10 seconds? 5 seems aggressive.

dbczumar · 2021-11-29

In general, what happens if the refresh fails? Does the page crash?

harupy · 2021-11-29

> Can we increase this to 10 seconds? 5 seems aggressive.

Sure!

harupy · 2021-11-29

> In general, what happens if the refresh fails? Does the page crash?

Let me test.

harupy · 2021-11-30

when-request-fails.mov

The page doesn't crash, and it keeps polling.
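The behaviour observed above (a failed refresh neither crashes the page nor stops polling) can be sketched as a refresh step that tolerates per-run failures. This is an illustrative Python sketch, not the component's code; `refresh_histories` and `fetch_history` are hypothetical names.

```python
def refresh_histories(run_ids, fetch_history, previous):
    """Return updated histories, keeping the previous data for runs whose fetch fails."""
    updated = dict(previous)
    for run_id in run_ids:
        try:
            updated[run_id] = fetch_history(run_id)
        except Exception as exc:
            # Deliberately broad: a failed refresh is reported and skipped so
            # that the next polling round can try again.
            print(f"refresh failed for {run_id}: {exc}")
    return updated
```

Because the function returns a new dictionary and never raises, the caller can keep scheduling it at a fixed interval regardless of transient request errors.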

cedkoffeto · 2021-11-29T22:52:14Z

> @cedkoffeto I pushed some commits to update the PR.

👍🏽

harupy · 2021-11-30T11:56:13Z

mlflow/server/js/src/experiment-tracking/components/MetricsPlotPanel.js

+export const METRICS_PLOT_POLLING_INTERVAL_MS = 10 * 1000; // 10 seconds
+// A run is considered as 'hanging' if its status is 'RUNNING' but its latest metric was logged
+// prior to this threshold. The metrics plot doesn't automatically update hanging runs.
+export const METRICS_PLOT_HANGING_RUN_THRESHOLD_MS = 3600 * 24 * 7 * 1000; // 1 week


Does "hanging" make sense?
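To make the "hanging" definition in the diff above concrete, here is a small Python sketch of the check. The threshold value is taken from the diff; the function name `is_hanging`, the strict comparison, and the treatment of runs with no metrics are assumptions for illustration.

```python
# One week in milliseconds, as in METRICS_PLOT_HANGING_RUN_THRESHOLD_MS above.
HANGING_RUN_THRESHOLD_MS = 3600 * 24 * 7 * 1000


def is_hanging(status, latest_metric_timestamp_ms, now_ms):
    """Return True if a run is still RUNNING but has not logged a metric for over a week."""
    if status != "RUNNING":
        return False
    if latest_metric_timestamp_ms is None:
        # Assumption: a run that has not logged anything yet is not treated as hanging.
        return False
    return now_ms - latest_metric_timestamp_ms > HANGING_RUN_THRESHOLD_MS
```

Under this sketch, a finished run is never hanging, and a running run only becomes hanging once its latest metric is more than one week old.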

harupy · 2021-11-30T11:56:46Z

mlflow/server/js/src/i18n/default/en.json

@@ -611,6 +611,10 @@
    "defaultMessage": "Registered Models",
    "description": "Text for registered model link in the title for model comparison page"
  },
+  "UEDu0c": {
+    "defaultMessage": "MLflow UI automatically fetches metric histories for active runs and updates the metrics plot with a {interval} second interval.",


Included interval so a user doesn't need to guess or measure how long the interval is.
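For illustration, the substitution of the polling interval into the tooltip message can be sketched in Python. The UI uses its own i18n machinery for this; `str.format` is only a stand-in here, and `render_tooltip` is a hypothetical helper. The interval value is the 10-second constant from the earlier diff.

```python
METRICS_PLOT_POLLING_INTERVAL_MS = 10 * 1000  # 10 seconds

TOOLTIP_TEMPLATE = (
    "MLflow UI automatically fetches metric histories for active runs "
    "and updates the metrics plot with a {interval} second interval."
)


def render_tooltip(interval_ms):
    # Convert milliseconds to whole seconds before filling the placeholder.
    return TOOLTIP_TEMPLATE.format(interval=interval_ms // 1000)


print(render_tooltip(METRICS_PLOT_POLLING_INTERVAL_MS))
```

Deriving the displayed number from the constant keeps the tooltip accurate if the interval is changed again.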

harupy · 2021-12-02T00:46:37Z

mlflow/server/js/src/experiment-tracking/components/MetricsPlotPanel.test.js

-            { key: 'metric_1', value: 100, step: 2, timestamp: 1556662044000 },
-            { key: 'metric_1', value: 50, step: 1, timestamp: 1556662043000 },
+            { key: 'metric_1', value: 100, step: 2, timestamp: now },
+            { key: 'metric_1', value: 50, step: 1, timestamp: now - 1 },


Replaced hardcoded timestamps with now to prevent the metrics plot from considering these runs as hanging.
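A short Python sketch of why the fixture change matters: the old hardcoded timestamp dates from 2019, so it falls outside the one-week window, while timestamps derived from the current time do not. The helper names `make_history` and `is_stale` are hypothetical; the threshold and fixture values come from the diffs in this review.

```python
import time

HANGING_RUN_THRESHOLD_MS = 3600 * 24 * 7 * 1000  # one week


def make_history(now_ms):
    """Build the two-point metric history used in the test, relative to now_ms."""
    return [
        {"key": "metric_1", "value": 100, "step": 2, "timestamp": now_ms},
        {"key": "metric_1", "value": 50, "step": 1, "timestamp": now_ms - 1},
    ]


def is_stale(history, now_ms):
    """Return True if the latest metric is older than the hanging-run threshold."""
    latest = max(metric["timestamp"] for metric in history)
    return now_ms - latest > HANGING_RUN_THRESHOLD_MS


now_ms = int(time.time() * 1000)
old_history = [{"key": "metric_1", "value": 100, "step": 2, "timestamp": 1556662044000}]
print(is_stale(old_history, now_ms))           # hardcoded 2019 timestamp
print(is_stale(make_history(now_ms), now_ms))  # timestamps relative to now
```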

dbczumar · 2021-12-02 (approved these changes)

LGTM! Awesome work, @cedkoffeto, @harupy ! Thank you so much for this contribution, @cedkoffeto!

cedkoffeto · 2021-12-03T16:02:00Z

Thanks! It was a pleasure :)
Thanks also to @harupy for your great help 🙏🏽

github-actions bot added the area/uiux and rn/feature labels Nov 7, 2021

cedkoffeto added 3 commits November 7, 2021 21:27

Automatically update metric plots for in-progress runs mlflow#2099

4552061

Signed-off-by: Cedric Koffeto cedkoffeto@gmail.com Signed-off-by: cedric koffeto <cedkoffeto@gmail.com>

eslint corrections

42bb25e

Signed-off-by: cedric koffeto <cedkoffeto@gmail.com>

eslint

8af2942

Signed-off-by: cedric koffeto <cedkoffeto@gmail.com>

cedkoffeto force-pushed the master branch from 662bb4a to 8af2942 Compare November 7, 2021 20:28

harupy self-requested a review November 8, 2021 00:10

bug fix

18b9fac

Signed-off-by: cedric koffeto <cedkoffeto@gmail.com>

This was referenced Nov 10, 2021

Use ... when comparing diff between master and PR branch for cross version tests #5040

Closed

Obtain changed files using GitHub /pulls/{ pr_number }/files API in cross version tests #5041

Merged

harupy reviewed Nov 10, 2021

mlflow/server/js/src/experiment-tracking/components/MetricsPlotPanel.js (outdated)

commit

e4a22cb

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy reviewed Nov 10, 2021

mlflow/server/js/src/experiment-tracking/components/MetricsPlotPanel.js (outdated)

harupy reviewed Nov 11, 2021

mlflow/server/js/src/experiment-tracking/components/MetricsPlotPanel.js (outdated)

checkOnRunUnfinished() replaced

a91c53c

Merge branch 'master' into pr/cedkoffeto/5017

dbc980a

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy added 3 commits November 29, 2021 11:49

cherry pick

7777420

Signed-off-by: harupy <hkawamura0130@gmail.com>

add tests

70408f2

Signed-off-by: harupy <hkawamura0130@gmail.com>

i18n

164eb69

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy added 3 commits November 29, 2021 15:57

check state

90c976b

Signed-off-by: harupy <hkawamura0130@gmail.com>

refactor

df7ad32

Signed-off-by: harupy <hkawamura0130@gmail.com>

add rendering test

c0895fe

Signed-off-by: harupy <hkawamura0130@gmail.com>

dbczumar reviewed Nov 29, 2021

harupy added 6 commits November 30, 2021 09:51

increase polling duration

c41d94f

Signed-off-by: harupy <hkawamura0130@gmail.com>

ignore hanging runs

712f9a7

Signed-off-by: harupy <hkawamura0130@gmail.com>

show interval in tooltip

037cefb

Signed-off-by: harupy <hkawamura0130@gmail.com>

rename test

2bb4b62

Signed-off-by: harupy <hkawamura0130@gmail.com>

i18n

a5ed069

Signed-off-by: harupy <hkawamura0130@gmail.com>

lint

ebad40f

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy reviewed Nov 30, 2021

harupy reviewed Dec 2, 2021

dbczumar approved these changes Dec 2, 2021

harupy added 2 commits December 2, 2021 17:29

refactor

643e2ce

Signed-off-by: harupy <hkawamura0130@gmail.com>

fix flaky test

f246b63

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy merged commit 1b4bbb6 into mlflow:master Dec 2, 2021

sim-san mentioned this pull request Jan 19, 2022

[FR] Automatic reload #1849

Closed
