Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinct page for each Trial in the UI #1783

Merged
merged 1 commit into from
Jun 29, 2022

Conversation

d-gol
Copy link
Contributor

@d-gol d-gol commented Jan 19, 2022

What this PR does / why we need it:

This PR implements a distinct page for each Trial, as discussed in #1763.
It adds an OVERVIEW tab to the Trial page to show Trial name, status, performance, and conditions.

The plan is to add additional tabs, such as LOGS (#971, #1764), DETAILS and YAML to show full information about a trial.

Which issue(s) this PR fixes

Fixes #1763

@aws-kf-ci-bot
Copy link
Contributor

Hi @d-gol. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kimwnasptd
Copy link
Member

That's awesome @d-gol, thank you for your work! I'll give it a first pass today. We might be able to squish this first step in for the KF 1.5 release. But let's discuss this more after a first round of reviews

/ok-to-test

@andreyvelich
Copy link
Member

Thank you for this great work @d-gol!
I think it's better to not include this change in Katib 0.13.
We are planing to make the first RC today/tomorrow (ref #1774) and this PR requires some time to review.

We can continue our work on this change and include them in the future Katib releases (0.14).
Does it sound good @d-gol @kimwnasptd @kubeflow/wg-automl-leads ?

@d-gol
Copy link
Contributor Author

d-gol commented Jan 25, 2022

Thank you @kimwnasptd and @andreyvelich!

Both options are fine for me, whichever you prefer. If there is no time to review it, we can leave it for the next release. And there will be time to implement the LOGS tab as well.

@stale
Copy link

stale bot commented Jun 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@johnugeorge
Copy link
Member

@d-gol Sorry for the whole delay. Can you rebase the changes?

@stale stale bot removed the lifecycle/stale label Jun 13, 2022
@d-gol
Copy link
Contributor Author

d-gol commented Jun 15, 2022

@johnugeorge no worries! Sure, I'll rebase by the end of next week.

@d-gol
Copy link
Contributor Author

d-gol commented Jun 15, 2022

@johnugeorge rebased

@coveralls
Copy link

coveralls commented Jun 15, 2022

Coverage Status

Coverage decreased (-0.02%) to 72.951% when pulling 793d8c8 on d-gol:trial-page into 9ee8fda on kubeflow:master.

@kimwnasptd
Copy link
Member

@d-gol apologies for the huge delay on this one, I've started taking a look.

First of this looks awesome! @johnugeorge here's also an image of the new page, to help you give us feedback as well.
(The reason the graph doesn't have any lines is because of the specific Trial)
image

I'm also including the YAML of the above Trial

YAML
apiVersion: kubeflow.org/v1beta1
kind: Trial
metadata:
  annotations:
    kubeflow-kale.org/kfp-run-uuid: d556db46-4c36-4509-9ebe-a6f56d21bdcc
  creationTimestamp: "2022-05-12T08:42:28Z"
  finalizers:
  - clean-metrics-in-db
  generation: 1
  labels:
    katib.kubeflow.org/experiment: kale-automl-y1mw41
  name: kale-automl-y1mw41-ab5a9937
  namespace: kubeflow-user
  ownerReferences:
  - apiVersion: kubeflow.org/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Experiment
    name: kale-automl-y1mw41
    uid: 72ac6842-7e26-41e9-8da9-8a028d19d5ab
  resourceVersion: "8583745"
  selfLink: /apis/kubeflow.org/v1beta1/namespaces/kubeflow-user/trials/kale-automl-y1mw41-ab5a9937
  uid: 463d789e-8521-4f6c-99a8-e6604e629eb8
spec:
  failureCondition: status.conditions.#(type=="Failed")#|#(status=="True")#
  metricsCollector:
    collector:
      kind: StdOut
  objective:
    metricStrategies:
    - name: mean_squared_log_error
      value: max
    objectiveMetricName: mean_squared_log_error
    type: maximize
  parameterAssignments:
  - name: bootstrap
    value: "True"
  - name: criterion
    value: friedman_mse
  - name: max_features
    value: "0.1"
  - name: min_samples_leaf
    value: "1"
  - name: min_samples_split
    value: "2"
  primaryContainerName: main
  retainRun: true
  runSpec:
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: kale-automl-y1mw41-ab5a9937
      namespace: kubeflow-user
    spec:
      backoffLimit: 0
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"
          labels:
            access-ml-pipeline: "true"
        spec:
          containers:
          - command:
            - python3 -u -c "from kale.common.katibutils import create_and_wait_kfp_run;               create_and_wait_kfp_run(pipeline_id='5f031196-9fa0-4300-b1ef-d7e9d7486082',
              version_id='a63681cf-2dd5-4bc8-8bde-d6fa44f76136', run_name='kale-automl-y1mw41-ab5a9937',
              experiment_name='kale-automl-y1mw41', api_version='v1beta1', pipeline_parameters={'bootstrap':'True',
              'criterion':'friedman_mse', 'max_features':'0.1', 'min_samples_leaf':'1',
              'min_samples_split':'2'})"
            image: gcr.io/arrikto/kale-py38:release-1.5-l0-release-1.5-rc1-17-gef3a04d6a
            name: main
          restartPolicy: Never
          serviceAccountName: pipeline-runner
  successCondition: status.conditions.#(type=="Complete")#|#(status=="True")#
status:
  completionTime: "2022-05-12T08:54:49Z"
  conditions:
  - lastTransitionTime: "2022-05-12T08:42:28Z"
    lastUpdateTime: "2022-05-12T08:42:28Z"
    message: Trial is created
    reason: TrialCreated
    status: "True"
    type: Created
  - lastTransitionTime: "2022-05-12T08:54:49Z"
    lastUpdateTime: "2022-05-12T08:54:49Z"
    message: Trial is running
    reason: TrialRunning
    status: "False"
    type: Running
  - lastTransitionTime: "2022-05-12T08:54:49Z"
    lastUpdateTime: "2022-05-12T08:54:49Z"
    message: Trial has succeeded
    reason: TrialSucceeded
    status: "True"
    type: Succeeded
  observation:
    metrics:
    - latest: "0.3525149502461068"
      max: "0.3525149502461068"
      min: "0.3525149502461068"
      name: mean_squared_log_error
  startTime: "2022-05-12T08:42:28Z"

I have the following initial product/UX suggestions, and then I'll move over to the code review:

  1. Let's also show the .status.completionTime at the OVERVIEW tab as well
  2. Since the .status.metrics can be a list I propose the following:
    • Let's have some sub-sections, like we do in KServe web app, for each metric
    • The title of each section can be Metric: {{ .status.observation.metrics[i].name }}
    • The columns inside that sub-section will be the other fields, max: ..., min: ... etc

This is what I meant by sub-sections in the KServe web app
image

@d-gol @johnugeorge wdyt?

@d-gol
Copy link
Contributor Author

d-gol commented Jun 20, 2022

Thank you, @kimwnasptd! Your suggestions seem good to me, I can add .status.completionTime in the OVERVIEW tab, and .status.metrics as it is done in the KServe web app. Then I will update the PR.

If you have additional suggestions, please let me know!

@johnugeorge
Copy link
Member

johnugeorge commented Jun 25, 2022

@d-gol Feature freeze for this release is on June 29th. We plan to cut rc release by this week. It would be great if we can take this in the rc release.

@d-gol
Copy link
Contributor Author

d-gol commented Jun 26, 2022

@johnugeorge thanks for the info! Edited the code and resubmitted it. Now it includes the changes that @kimwnasptd suggested: showing completionTime and metrics in the OVERVIEW.

We can edit the details, please let me know if you have any other suggestions.

Screenshot 2022-06-26 at 12 23 04

@johnugeorge
Copy link
Member

johnugeorge commented Jun 28, 2022

@kimwnasptd Can you please review this and lgtm as this has to go in v0.14.0 release?

Related: #1908

@kimwnasptd
Copy link
Member

@d-gol it looks awesome, thanks!

I don't have any further comments, even after a look in the code. The only future work item I'd add is to add some simple unit tests, but currently even the existing ones are broken.

@elenzio9 has some work underway for fixing these tests, so we can do a pass then.

/lgtm
/approve

@johnugeorge
Copy link
Member

/approve

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: d-gol, johnugeorge, kimwnasptd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 2d35224 into kubeflow:master Jun 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[katib-trials-page] Have a distinct page for each Trial in the UI
6 participants