Dedicated logs tab for Trials #1764

Closed
kimwnasptd opened this issue Dec 22, 2021 · 11 comments

@kimwnasptd
Member

/kind feature

Describe the solution you'd like
As part of #1763 and #1745, let's use this issue to discuss how to expose the logs for a Trial.

Looking at the docs (https://www.kubeflow.org/docs/components/katib/trial-template/#custom-resource), there are many different types of workers for a Trial. The K8s clients allow the backend to fetch logs from a Pod. Given this, the main question I have is: how can the backend find the Pod for a specific Trial worker type?

  • Will there always be an annotation/label that reflects the Trial that the Pod belongs to?
  • Is this the case for all worker types?
  • What could we show when we have an Argo Workflow?

I suggest that we handle each worker type separately. We can start with K8s Jobs and/or TFJobs and start adding more later on.

@d-gol @andreyvelich @johnugeorge @gaocegege


Love this feature? Give it a 👍 We prioritize the features with the most 👍

@d-gol
Contributor

d-gol commented Feb 4, 2022

@kimwnasptd thank you for creating this issue and organizing the effort!

Let me try to answer your questions to the best of my knowledge:

Will there always be an annotation/label that reflects the Trial that the Pod belongs to?

Yes. There is a label job-name in every pod which belongs to a specific trial - https://github.com/kubeflow/common/blob/v0.4.1/pkg/apis/common/v1/interface.go#L30

job-name is equal to the trial-name - https://github.com/kubeflow/katib/blob/release-0.13/pkg/webhook/v1beta1/pod/inject_webhook.go#L120

In the case of an MPI job, the label is mpi-job-name - https://github.com/kubeflow/training-operator/blob/v1.4-branch/pkg/controller.v1/mpi/mpijob.go#L46

This allows us to obtain all pods belonging to various Trial types (Job, TFJob, PyTorchJob...)
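As a minimal sketch of the lookup described above, the label selector for a Trial's Pods could be built like this. The label keys `job-name` and `mpi-job-name` come from the links above; the function name `trialPodSelector`, the kind strings, and the example Trial name are hypothetical and only serve the illustration.

```go
package main

import "fmt"

// trialPodSelector returns a Kubernetes label selector string that matches
// the Pods of a Trial. The label keys are the ones referenced in this
// comment; the function and the kind strings are illustrative assumptions.
func trialPodSelector(workerKind, trialName string) string {
	key := "job-name"
	if workerKind == "MPIJob" {
		// MPI jobs use a different label key for the owning job.
		key = "mpi-job-name"
	}
	return fmt.Sprintf("%s=%s", key, trialName)
}

func main() {
	// Hypothetical Trial name, used only for demonstration.
	fmt.Println(trialPodSelector("TFJob", "random-example-abc123"))
	fmt.Println(trialPodSelector("MPIJob", "random-example-abc123"))
}
```

The resulting string could then be passed as the label selector when listing Pods in the Trial's namespace.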

Is this the case for all worker types?

Yes for Job, TFJob, PyTorchJob, MXJob, XGBoostJob and MPIJob.
For Pipelines, I would expect the same, but I would first focus on getting the logs from the other CRDs since Pipeline logs can already be accessed on the Run page.

What could we show when we have an Argo Workflow?

We could show a link (button) that would route to the Pipelines Run page with logs, as in the trials table - https://github.com/kubeflow/katib/blob/release-0.13/pkg/new-ui/v1beta1/frontend/src/app/pages/experiment-details/trials-table/trials-table.component.html#L34
Later, we could technically parse responses from the pipeline Run page and show them in Katib UI as well.
For now, since the logs already exist elsewhere, I think that a link would be enough.

Please correct me if I'm wrong and share your ideas.

@elenzio9
Contributor

I would like to work on this, if no one else is working on it right now!

@johnugeorge
Member

Thanks @elenzio9

/assign @elenzio9

Fixes: #971 as well

@elenzio9
Contributor

We realized with @kimwnasptd that we need to extend the backend by adding a new route for the LOGS tab which:

  1. Will get a Trial name/namespace
  2. Fetch the underlying Pod name based on the job-name label
  3. Return the logs of that Pod

Also, we saw that if the Trial does not have retain: true, the underlying CRs will not be persisted, and thus there won't be any Pods to gather logs from.

We can make the frontend show a message for this, though, to help users understand why they don't see logs for a Trial.

Unfortunately I'm not very familiar with Golang and the backend, so I can't help much there. But would really like to help with the frontend work once we have such a route for the Trial logs!

@d-gol
Contributor

d-gol commented Nov 14, 2022

Hi @elenzio9, thank you for your efforts with this!

I did start some work on it a while ago, but didn't finish. Here you can see the functions for extracting logs from all pods from a specific trial: https://github.com/d-gol/katib/blob/64ac7034d81faeeb7a554417f47a0c7c445c3d72/pkg/new-ui/v1beta1/backend.go#L421 and https://github.com/d-gol/katib/blob/cf7106dfae69e66f39582f1ef981ed65a208732b/pkg/util/v1beta1/katibclient/katib_client.go#L230

It checks multiple pods because the trial can also be a TFJob, PyTorchJob, or any other training operator CR. So we need to fetch logs from each worker (pod), and also find a way to show them nicely in the UI.

Is this something that would be useful for you? If so, I can rebase my changes and take care of the backend. We can even submit separate PRs later. If you can take care of the frontend part, that would be amazing!

@johnugeorge
Member

That would be great @d-gol

Btw, do you need logs from each pod? We just need logs from the Master, right?

@d-gol
Contributor

d-gol commented Nov 15, 2022

@johnugeorge sure, we can get logs only from the master. I can implement that, submit a PR, and then later if needed we can obtain logs from all workers.
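A rough sketch of picking only the master Pod, under stated assumptions: the role label key `job-role` with value `master` is an assumption and should be checked against the training operator version in use. The `Pod` struct is a trimmed-down stand-in for the Kubernetes type, and falling back to the first Pod for plain Jobs is a design choice of this example, not something decided in this thread.

```go
package main

import "fmt"

// Pod is a trimmed-down stand-in for a Kubernetes Pod: only name and labels.
type Pod struct {
	Name   string
	Labels map[string]string
}

const (
	// roleLabelKey is an ASSUMED label key; verify it for your operator version.
	roleLabelKey = "job-role"
	masterRole   = "master"
)

// pickLogPod returns the Pod whose logs should be shown for a Trial.
// It prefers a Pod labelled as master and otherwise falls back to the
// first Pod, which covers a plain Job that has no role label.
func pickLogPod(pods []Pod) (Pod, bool) {
	if len(pods) == 0 {
		return Pod{}, false
	}
	for _, p := range pods {
		if p.Labels[roleLabelKey] == masterRole {
			return p, true
		}
	}
	return pods[0], true
}

func main() {
	pods := []Pod{
		{Name: "trial-1-worker-0", Labels: map[string]string{roleLabelKey: "worker"}},
		{Name: "trial-1-master-0", Labels: map[string]string{roleLabelKey: masterRole}},
	}
	if p, ok := pickLogPod(pods); ok {
		fmt.Println("fetch logs from:", p.Name)
	}
}
```

Extending this later to all workers would mean returning every Pod in the slice instead of a single one.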

@andreyvelich
Member

@kimwnasptd @elenzio9 Since the backend changes for Trial logs have been merged (#2039), are we going to make changes in the frontend to show the logs?
Do we know if we have the bandwidth to implement it before the Katib 0.15 release?

@johnugeorge
Member

@elenzio9 Are you planning this for this release?

@elenzio9
Contributor

@johnugeorge @andreyvelich I'm working on it right now, and I'll send the PR as soon as possible.

@andreyvelich
Member

This was implemented, thank you for your contributions!
