Backend for getting logs of a trial #2039

Merged
merged 9 commits into from Dec 24, 2022

d-gol
Contributor

@d-gol d-gol commented Nov 25, 2022

Implements the backend for fetching trial logs, as part of #1764.

Adds a route, /katib/fetch_trial_logs/, that returns the logs of a specific trial.
The logs are read from the trial's master pod.

Member

@andreyvelich andreyvelich left a comment

Thank you for implementing this @d-gol!
I left a few comments.

pkg/new-ui/v1beta1/backend.go: 8 review threads (outdated, resolved)
@d-gol
Contributor Author

d-gol commented Nov 30, 2022

> Thank you for implementing this @d-gol! I left few comments.

Hey @andreyvelich, thanks a lot for checking! I will modify the PR according to your suggestions. I left some comments for clarification.

@tenzen-y
Member

tenzen-y commented Dec 2, 2022

@d-gol Thanks for your effort.
We merged #2047 into the master branch; it mutates pods to add the trial name label. Can you rebase this PR?

@d-gol d-gol force-pushed the ui-logs-backend branch 2 times, most recently from bc8200e to 4b3ecd1 on December 3, 2022 14:00
@d-gol
Contributor Author

d-gol commented Dec 3, 2022

Thank you @tenzen-y and @andreyvelich, rebased it now.

@johnugeorge
Member

Thanks, Dejan!

/lgtm

trialName := r.URL.Query()["trialName"][0]
namespace := r.URL.Query()["namespace"][0]

user, err := IsAuthorized(consts.ActionTypeGet, namespace, consts.PluralTrial, "", trialName, trialsv1beta1.SchemeGroupVersion, k.katibClient.GetClient(), r)
Member

Is it enough to check whether the user can get the Trial?
Should we also verify that the user can view logs from pods, @d-gol @kimwnasptd @apo-ger?

Contributor Author

Yes, I agree. I added authorization checks for listing the pods and getting the logs. I had to reorganize the code a bit to fit in the additional checks, but the logic is the same. I'm adding this as a separate commit; we can squash the commits at the end.

Member

@tenzen-y tenzen-y left a comment

@d-gol Thanks for driving this!

/lgtm
/assign @andreyvelich

Member

@kimwnasptd kimwnasptd left a comment

@d-gol @andreyvelich, apologies for the late reply. I mostly have some questions about the return type and the structure of the response data.

trialName := trialNames[0]
namespace := namespaces[0]

user, err := IsAuthorized(consts.ActionTypeGet, namespace, consts.PluralTrial, "", trialName, trialsv1beta1.SchemeGroupVersion, k.katibClient.GetClient(), r)
Member

Slightly orthogonal to the PR, but I think the function signature user, err = IsAuthorized(...) is not ideal. We end up retrieving the same user information on every call and duplicating the error checks that depend on the user value.

Why not have a separate function that gets the current user, so this check happens only once? Then IsAuthorized(user, ...) would only be responsible for the SubjectAccessReview check.

Or, at least for this PR, we could check that the returned user is not "" only once, the first time we call this function.

Contributor Author

Agreed, we are checking the user twice. Could we fix the auth after merging this PR, in a separate PR that improves authentication across the entire file? Or would you prefer to do it the other way around?

pkg/new-ui/v1beta1/backend.go (outdated, resolved)
pkg/new-ui/v1beta1/backend.go (resolved)
Comment on lines +679 to +677
if err != nil {
	log.Printf("Marshal logs failed: %v", err)
	http.Error(w, err.Error(), http.StatusInternalServerError)
	return
}
if _, err = w.Write(response); err != nil {
	log.Printf("Write logs failed: %v", err)
	http.Error(w, err.Error(), http.StatusInternalServerError)
	return
}
Member

Question 2: Do we expect to return logs from other worker pods in the future?

If so, I'd propose that the backend return a JSON response like:

logs: {
    master: "..."
}

Contributor Author

This is a good idea if we want to extend the response in the future. @johnugeorge @andreyvelich, what do you think?

Member

I like @kimwnasptd's idea; let's add the Primary Pod Label to the JSON response.

Contributor Author

Hey @andreyvelich, did you mean the result to be in the form:

logs: {
    master: "..."
}

or something else?
There can be multiple primary pod labels; did you mean to add them as key-value pairs as well?

Member

master is a somewhat confusing term. For example, there can be a job with only workers, where worker0 acts as the master. If we really need to add pod info, the pod name might be better.

Member

@d-gol The pod we get logs from doesn't always have a "master" label.
For example, for Argo we select the pod with the katib.kubeflow.org/model-training: true label.
Including the pod name in the response might make sense, as @johnugeorge mentioned.

Member

I'm OK with creating a separate issue to track improvements to this API response (e.g. adding the trial name to the response).
That way we can merge this PR and unblock the UI team, so they can start on the UI changes and ship this feature in the next release.
What do you think, @d-gol @kimwnasptd @johnugeorge?

Member

I agree. +1 to merge

Contributor Author

I agree with merging it; we can enrich the API response later if needed. To clarify: do we want to merge this PR with a plain string response (the current implementation), or with a JSON response like the one below?

{
    "pod_name": logs
}

@tenzen-y
Member

@d-gol Can you rebase this? We have merged #2064 into the master branch to fix CI.

@johnugeorge
Member

@d-gol We can merge this. Can you create an issue to track the JSON response discussion? Also, please rebase.

@d-gol
Contributor Author

d-gol commented Dec 23, 2022

@andreyvelich great, thank you!

@johnugeorge
Member

@d-gol Can you try running the failing e2e test locally?

@d-gol
Contributor Author

d-gol commented Dec 23, 2022

@johnugeorge sure, checking it.

Member

@tenzen-y tenzen-y left a comment

@d-gol Thanks for implementing this powerful feature!

LGTM
Although I wonder why our E2E tests failed.

pkg/new-ui/v1beta1/backend.go (outdated, resolved)
@tenzen-y
Member

@johnugeorge @andreyvelich @d-gol The same errors seem to occur in the "E2E Test with Katib UI, random search, and postgres / e2e" and "E2E Test with mxnet-mnist / e2e" jobs in #2060 and #2067.

@google-oss-prow google-oss-prow bot removed the lgtm label Dec 23, 2022
@tenzen-y
Member

It seems that the errors are caused by the mxnet-mnist image. You can reproduce them with ytenzen/mxnet-mnist:debug-error.

@tenzen-y
Member

I will create a PR to fix this issue as soon as possible.

@tenzen-y
Member

Blocked by: #2070

@tenzen-y
Member

@d-gol Can you rebase now that we have fixed CI?

@d-gol
Contributor Author

d-gol commented Dec 24, 2022

@tenzen-y done, thank you for fixing the CI!

Member

@tenzen-y tenzen-y left a comment

@d-gol Thank you for the update!
/lgtm
/approve

@google-oss-prow google-oss-prow bot added the lgtm label Dec 24, 2022
@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: d-gol, tenzen-y


@google-oss-prow google-oss-prow bot merged commit c9dd1b4 into kubeflow:master Dec 24, 2022
@d-gol
Contributor Author

d-gol commented Dec 24, 2022

@tenzen-y great, thanks a lot!
And thanks everyone for all the help with this @andreyvelich @johnugeorge @kimwnasptd @apo-ger

6 participants