Create notebooks page and get [500] error #6419

Closed
kindomLee opened this issue Mar 28, 2022 · 8 comments

@kindomLee

/kind bug

What steps did you take and what happened:
When I open the Notebooks page, I get this error:

[500] An error occured in the backend. https://xxx.endpoints.xxx.cloud.goog/jupyter/api/gpus


What did you expect to happen:
The page loads and successfully retrieves the GPU vendors.

Anything else you would like to add:

Environment:

  • Kubeflow version: (version number can be found at the bottom left corner of the Kubeflow dashboard): gcp-blueprints 1.5.0
  • Kubernetes platform: (e.g. minikube): gke
  • Kubernetes version: (use kubectl version): GitVersion:"v1.20.15"

Error log from jupyter-web-app-deployment:

2022-03-28 09:40:09,681 | kubeflow.kubeflow.crud_backend.errors.handlers | ERROR | Caught and unhandled Exception!
2022-03-28 09:40:09,681 | kubeflow.kubeflow.crud_backend.errors.handlers | ERROR | Invalid value for `type` (KernelDeadlock), must be one of ['DiskPressure', 'MemoryPressure', 'NetworkUnavailable', 'PIDPressure', 'Ready']
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/src/apps/common/routes/get.py", line 66, in get_gpu_vendors
    nodes = api.list_nodes().items
  File "/usr/local/lib/python3.7/site-packages/kubeflow/kubeflow/crud_backend/api/node.py", line 6, in list_nodes
    return v1_core.list_node()
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 16844, in list_node
    return self.list_node_with_http_info(**kwargs)  # noqa: E501
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 16965, in list_node_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 192, in __call_api
    return_data = self.deserialize(response_data, response_type)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 264, in deserialize
    return self.__deserialize(data, response_type)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 303, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 639, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 281, in __deserialize
    for sub_data in data]
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 281, in <listcomp>
    for sub_data in data]
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 303, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 639, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 303, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 639, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 281, in __deserialize
    for sub_data in data]
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 281, in <listcomp>
    for sub_data in data]
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 303, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 641, in __deserialize_model
    instance = klass(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/models/v1_node_condition.py", line 76, in __init__
    self.type = type
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/models/v1_node_condition.py", line 221, in type
    .format(type, allowed_values)
ValueError: Invalid value for `type` (KernelDeadlock), must be one of ['DiskPressure', 'MemoryPressure', 'NetworkUnavailable', 'PIDPressure', 'Ready']
10.120.2.3 - - [28/Mar/2022:09:40:09 +0000] "GET /api/gpus HTTP/1.1" 500 132 "https://xxx.endpoints.xxx.cloud.goog/jupyter/new" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.3
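To make the failure mode easier to see, here is a minimal, stdlib-only Python sketch of the strict enum check the traceback ends in. NodeCondition, ALLOWED_TYPES, deserialize_conditions and the sample data are hypothetical stand-ins modeled on the error message, not the kubernetes client's actual code. The idea it illustrates: a single unrecognized condition type on a single node (here KernelDeadlock, which is commonly reported by node-problem-detector; the thread does not confirm which component added it) makes deserialization of the whole node list raise, so list_nodes() and the /api/gpus route fail with a 500.

```python
# Hypothetical stand-in for a generated client model; NOT the kubernetes client's code.
ALLOWED_TYPES = ["DiskPressure", "MemoryPressure", "NetworkUnavailable", "PIDPressure", "Ready"]


class NodeCondition:
    def __init__(self, type, status):
        self.type = type  # routed through the validating setter below
        self.status = status

    @property
    def type(self):
        return self._type

    @type.setter
    def type(self, type):
        # Strict enum check: any condition type unknown to this model is rejected.
        if type not in ALLOWED_TYPES:
            raise ValueError(
                "Invalid value for `type` ({0}), must be one of {1}".format(type, ALLOWED_TYPES)
            )
        self._type = type


def deserialize_conditions(raw_conditions):
    """Build model objects from raw condition dicts, as a strict deserializer would."""
    return [NodeCondition(c["type"], c["status"]) for c in raw_conditions]


if __name__ == "__main__":
    # Hypothetical node conditions; one unknown type is enough to fail the whole list.
    raw = [
        {"type": "Ready", "status": "True"},
        {"type": "KernelDeadlock", "status": "False"},
    ]
    try:
        deserialize_conditions(raw)
    except ValueError as err:
        print(err)  # same message shape as the log above
```

Because the check lives in the model's setter, the caller never gets a partial result: the exception surfaces from deep inside list_node(), which matches the call chain in the traceback.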
@kindomLee
Author

Anything else you would like to add:

-> There was no error message when I first opened the page.
The error started only after GKE reported that a specific GPU type was out of stock. I then adjusted the GKE GPU limits [1] and added some available GPU types.
When I came back to this page, the error message appeared, and I have not been able to recover from it.

[1] https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-provisioning#gpu_limits

This was referenced Mar 29, 2022
@kindomLee
Author

@benjamintanweihao It looks like this will be fixed after the 1.23 update, but the fix does not appear to have been merged yet: kubernetes/kubernetes#108740

@mokpolar

@kindomLee Hello! I'm having the same problem. Did you solve it? Could you tell me what you did?

@mokpolar

This issue is solved: use the 1.6.0 image.

@saumilsdk

@mokpolar Do you mean the Kubeflow 1.6.0 image? Can we not fix it with KF 1.5?

@mokpolar

mokpolar commented Aug 2, 2022

@saumilsdk Or you could build your own Kubeflow container image based on 1.5.0.
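If you rebuild a 1.5.0 image yourself, one possible approach (an assumption; the thread does not say what changed in 1.6.0) is to avoid the client's model deserialization for the GPU-vendor lookup. With the official Python client, generated methods such as CoreV1Api.list_node accept _preload_content=False and then return the raw HTTP response, whose .data holds the JSON body. The sketch below only shows the parsing step on that raw body. GPU_VENDOR_KEYS and gpu_vendors_from_node_list are hypothetical names, and the key list is assumed rather than taken from the web app's configuration.

```python
import json

# Hypothetical GPU resource keys. In the real Jupyter web app the vendor list
# comes from its spawner configuration, which this thread does not show.
GPU_VENDOR_KEYS = ("nvidia.com/gpu", "amd.com/gpu")


def gpu_vendors_from_node_list(node_list_json, vendor_keys=GPU_VENDOR_KEYS):
    """Return the configured vendor keys found in any node's status.capacity.

    Works on the raw NodeList JSON body (str or bytes), so node conditions such
    as KernelDeadlock are never checked against a fixed enum and cannot raise.
    """
    node_list = json.loads(node_list_json)
    found = set()
    for node in node_list.get("items") or []:
        capacity = (node.get("status") or {}).get("capacity") or {}
        found.update(key for key in vendor_keys if key in capacity)
    # Preserve the configured order so the API response is stable.
    return [key for key in vendor_keys if key in found]


if __name__ == "__main__":
    # Assumed way to obtain the raw body with the official client (not run here):
    #   raw = client.CoreV1Api().list_node(_preload_content=False).data
    raw = json.dumps({
        "items": [
            {"status": {"capacity": {"cpu": "4", "nvidia.com/gpu": "1"},
                        "conditions": [{"type": "KernelDeadlock", "status": "False"}]}},
            {"status": {"capacity": {"cpu": "8"}}},
        ]
    })
    print(gpu_vendors_from_node_list(raw))  # ['nvidia.com/gpu']
```

Working on plain dicts trades the client's type checking for tolerance of condition types the client does not yet know about; only the capacity keys this endpoint needs are read.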

@juliusvonkohout
Member

/close

There has been no activity for a long time. Please reopen if necessary.
Please also consult the Kubeflow slack channel for support questions.

@google-oss-prow

@juliusvonkohout: Closing this issue.


Needs Triage automation moved this from To Do to Closed Aug 25, 2023