backend.ai manager server does not start properly after upgrading python version from 3.9 to 3.10 #389

Closed
jungyh0218 opened this issue Mar 15, 2022 · 2 comments
Labels
type:bug Reports about that are not working

jungyh0218 commented Mar 15, 2022

Describe the bug

After upgrading Python from 3.9 to 3.10, the manager server does not start properly and prints error messages instead.

To Reproduce
Steps to reproduce the behavior:

  1. Install Backend.AI using the scripts/install-dev.sh script.
  2. Wait until the backend.ai_dev directory is created.
  3. Change into the backend.ai_dev/manager folder.
  4. Run the manager server: python -m ai.backend.manager.server --debug
  5. See the error.

Expected behavior

The manager server starts and runs normally.

Screenshots

2022-03-15 10:54:18.422 ERROR __main__ [42700] Error initializing cleanup_contexts: storage_manager_ctx
Traceback (most recent call last):
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/backend.ai_manager-22.3.0a1-py3.10.egg/ai/backend/manager/server.py", line 527, in _cleanup_context_wrapper
    async with cctx_instance:
  File "/home/yoohee/.pyenv/versions/3.10.2/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/backend.ai_manager-22.3.0a1-py3.10.egg/ai/backend/manager/server.py", line 347, in storage_manager_ctx
    config = volume_config_iv.check(raw_vol_config)
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/trafaret/base.py", line 110, in check
    return self.transform(value, context=context)
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/trafaret/base.py", line 1150, in transform
    self._failure(error=errors, code=codes.SOME_ELEMENTS_DID_NOT_MATCH)
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/trafaret/base.py", line 140, in _failure
    raise DataError(error=error, value=value, trafaret=self, code=code)
trafaret.dataerror.DataError: {'default_host': DataError('is required'), 'proxies': DataError('is required')}
Traceback (most recent call last):
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/backend.ai_manager-22.3.0a1-py3.10.egg/ai/backend/manager/server.py", line 613, in server_main
    await runner.setup()
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/aiohttp/web_runner.py", line 279, in setup
    self._server = await self._make_server()
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/aiohttp/web_runner.py", line 375, in _make_server
    await self._app.startup()
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/aiohttp/web_app.py", line 417, in startup
    await self.on_startup.send(self)
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/aiosignal/__init__.py", line 36, in send
    await receiver(*args, **kwargs)  # type: ignore
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/aiohttp/web_app.py", line 539, in _on_startup
    await it.__anext__()
StopAsyncIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/backend.ai_manager-22.3.0a1-py3.10.egg/ai/backend/manager/server.py", line 655, in server_main_logwrapper
    async with server_main(loop, pidx, _args):
  File "/home/yoohee/.pyenv/versions/3.10.2/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
RuntimeError: async generator raised StopAsyncIteration
2022-03-15 10:54:18.851 ERROR aiotools.server [42700] Worker 0: Error during context manager initialization
Traceback (most recent call last):
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/aiotools/server.py", line 319, in _wrapped_worker
    async with ctx:
  File "/home/yoohee/.pyenv/versions/3.10.2/lib/python3.10/contextlib.py", line 201, in __aenter__
    raise RuntimeError("generator didn't yield") from None
RuntimeError: generator didn't yield
2022-03-15 10:54:18.853 ERROR __main__ [42700] Error initializing cleanup_contexts: database_ctx
Traceback (most recent call last):
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/backend.ai_manager-22.3.0a1-py3.10.egg/ai/backend/manager/server.py", line 527, in _cleanup_context_wrapper
    async with cctx_instance:
  File "/home/yoohee/.pyenv/versions/3.10.2/lib/python3.10/contextlib.py", line 217, in __aexit__
    await self.gen.athrow(typ, value, traceback)
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/backend.ai_manager-22.3.0a1-py3.10.egg/ai/backend/manager/server.py", line 308, in database_ctx
    async with connect_database(root_ctx.local_config) as db:
  File "/home/yoohee/.pyenv/versions/3.10.2/lib/python3.10/contextlib.py", line 249, in __aexit__
    raise RuntimeError("generator didn't stop after athrow()")
RuntimeError: generator didn't stop after athrow()
2022-03-15 10:54:18.854 ERROR __main__ [42700] Error initializing cleanup_contexts: redis_ctx
Traceback (most recent call last):
  File "/home/yoohee/.pyenv/versions/venv-ywvfkji0-manager/lib/python3.10/site-packages/backend.ai_manager-22.3.0a1-py3.10.egg/ai/backend/manager/server.py", line 527, in _cleanup_context_wrapper
    async with cctx_instance:
  File "/home/yoohee/.pyenv/versions/3.10.2/lib/python3.10/contextlib.py", line 249, in __aexit__
    raise RuntimeError("generator didn't stop after athrow()")
RuntimeError: generator didn't stop after athrow()
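In the log above, the first error (the trafaret DataError saying 'default_host' and 'proxies' are required while storage_manager_ctx validates the volume configuration) looks like the actual failure; the RuntimeErrors that follow appear to be knock-on effects. A minimal stdlib sketch of the mechanism, under the assumption that the cleanup-context wrapper logs its error and then returns without yielding: advancing that generator raises StopAsyncIteration, and when that exception escapes into another async generator (here a stand-in for server_main), Python re-raises it as "RuntimeError: async generator raised StopAsyncIteration". All function names below are hypothetical stand-ins, and on_startup only imitates the shape of the aiohttp frames in the trace.

```python
import asyncio
import contextlib


async def cleanup_ctx_that_fails():
    # Hypothetical stand-in for a cleanup-context generator that catches
    # (and would log) its setup error, then returns without ever yielding.
    try:
        raise ValueError("volume config is missing 'default_host' and 'proxies'")
    except ValueError:
        return
    yield  # unreachable, but makes this function an async generator


async def on_startup():
    # Simplified imitation of a startup hook that advances each
    # cleanup-context generator to its first yield.
    it = cleanup_ctx_that_fails().__aiter__()
    await it.__anext__()  # the generator never yields -> StopAsyncIteration


@contextlib.asynccontextmanager
async def server_main():
    # StopAsyncIteration escaping into this async generator body is
    # converted by the interpreter into a RuntimeError.
    await on_startup()
    yield


async def reproduce() -> RuntimeError:
    try:
        async with server_main():
            pass
    except RuntimeError as exc:
        return exc
    raise AssertionError("expected a RuntimeError")


if __name__ == "__main__":
    err = asyncio.run(reproduce())
    print(err)                           # async generator raised StopAsyncIteration
    print(type(err.__cause__).__name__)  # StopAsyncIteration
```

The original StopAsyncIteration is kept as __cause__, which is why the real log prints "The above exception was the direct cause of the following exception". Reading the log bottom-up is therefore less useful than starting from the first DataError.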

Server:

  • OS: Ubuntu 20.04 on Google Cloud VM.
  • Backend.AI version: Latest (main branch)
  • Python version: 3.10.2 (installed with pyenv)
  • Installation method: On Google Cloud VM, using scripts/install-dev.sh file

Additional context

The backend.ai_dev directory was created, but some files are missing:

  • There is no run-with-halfstack.sh file in the manager/scripts folder.
  • There are no authentication scripts (e.g. env-local-user-session.sh) in the client-py folder.
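To re-check an install for the files mentioned above, a minimal sketch like the following lists which ones are absent. EXPECTED_FILES contains only the two files named in this report (paths assumed to be relative to the backend.ai_dev directory), and missing_files is a hypothetical helper, not part of Backend.AI.

```python
from pathlib import Path
from typing import List

# Only the files named in this report; paths are assumed to be relative
# to the backend.ai_dev directory created by install-dev.sh.
EXPECTED_FILES = [
    "manager/scripts/run-with-halfstack.sh",
    "client-py/env-local-user-session.sh",
]


def missing_files(install_root: Path) -> List[str]:
    """Return the expected files that do not exist as regular files."""
    return [rel for rel in EXPECTED_FILES if not (install_root / rel).is_file()]


if __name__ == "__main__":
    print(missing_files(Path("backend.ai_dev")))
```

An empty list means both files are present; anything else suggests the install script stopped before finishing its setup steps.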

Update:
I also tried installing Backend.AI on macOS 11, and it still fails. While the automated install script was running, an error occurred while configuring Lablup's official Docker registry. The stack trace follows.

[69080] An error occurred.
Traceback (most recent call last):
  File "/Users/yoohee/projects/lablup/backend.ai-dev/manager/src/ai/backend/manager/cli/image_impl.py", line 123, in rescan_images
    await rescan_images_func(etcd, db, registry=registry)
  File "/Users/yoohee/projects/lablup/backend.ai-dev/manager/src/ai/backend/manager/models/image.py", line 101, in rescan_images
    async with aiotools.TaskGroup() as tg:
  File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/aiotools/taskgroup/base_compat.py", line 189, in __aexit__
    raise me from None
aiotools.taskgroup.types.TaskGroupError: ('unhandled errors in a TaskGroup; 1 sub errors: (ProgrammingError)\n + ProgrammingError: (sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError) <class \'asyncpg.exceptions.UndefinedTableError\'>: relation "images" does not exist\n[SQL: SELECT images.id, images.name, images.image, images.created_at, images.tag, images.registry, images.architecture, images.config_digest, images.size_bytes, images.type, images.accelerators, images.labels, images.resources \nFROM images \nWHERE ROW(images.name, images.architecture) IN ((%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s), (%s, %s))]\n[parameters: (\'cr.backend.ai/community/spectravis:19.10\', \'x86_64\', \'cr.backend.ai/stable/julia:1.6-ubuntu20.04\', \'x86_64\', \'cr.backend.ai/stable/julia:1.5-ubuntu20.04\', \'x86_64\', \'cr.backend.ai/stable/swift:5.3-ubuntu20.04\', \'x86_64\', \'cr.backend.ai/stable/ubuntu-linux:18.04-xfce\', \'x86_64\', \'cr.backend.ai/stable/r-base:3.6\', \'x86_64\', \'cr.backend.ai/stable/r-base:4.0\', \'x86_64\', \'cr.backend.ai/stable/r-base:4.1\', \'x86_64\', \'cr.backend.ai/stable/python:3.10-ubuntu20.04\', \'x86_64\', \'cr.backend.ai/stable/python:3.6-ubuntu18.04\', \'x86_64\', \'cr.backend.ai/stable/python:3.7-ubuntu18.04\', \'x86_64\', \'cr.backend.ai/stable/python:2.7-ubuntu18.04\', \'x86_64\', \'cr.backend.ai/stable/python:3.8-ubuntu18.04\', \'x86_64\', \'cr.backend.ai/stable/python:3.7-ubuntu20.04\', \'x86_64\', \'cr.backend.ai/stable/python:3.8-ubuntu20.04-arm64\', \'aarch64\', \'cr.backend.ai/stable/python:3.8-ubuntu20.04\', \'x86_64\', 
\'cr.backend.ai/stable/python-pytorch:1.0-py36-cuda10\', \'x86_64\', \'cr.backend.ai/stable/python-ff:21.01-py38-cuda11.1\', \'x86_64\', \'cr.backend.ai/stable/python-pytorch:1.1-py36-cuda10\', \'x86_64\', \'cr.backend.ai/stable/python-pytorch:1.2-py36-cuda10\', \'x86_64\', \'cr.backend.ai/stable/python:3.9-ubuntu20.04\', \'x86_64\', \'cr.backend.ai/stable/python-pytorch:1.3-py36-cuda10.1\', \'x86_64\', \'cr.backend.ai/stable/python-pytorch:1.4-py36-cuda10.1\', \'x86_64\', \'cr.backend.ai/stable/python-pytorch:1.5-py36-cuda10.1\', \'x86_64\', \'cr.backend.ai/stable/python-pytorch:1.7-py36-cuda10.1\', \'x86_64\', \'cr.backend.ai/stable/python-pytorch:1.6-py36-cuda10.1\', \'x86_64\', \'cr.backend.ai/stable/python-pytorch:1.7-py38-cuda11.1\', \'x86_64\', \'cr.backend.ai/stable/python-pytorch:1.8-py38-cuda11.1\', \'x86_64\', \'cr.backend.ai/stable/python-tensorflow:2.3-py36-cuda10.1\', \'x86_64\', \'cr.backend.ai/stable/python-tensorflow:1.15-py36-cuda10\', \'x86_64\', \'cr.backend.ai/stable/python-tensorflow:2.4-py38-cuda11.1\', \'x86_64\', \'cr.backend.ai/stable/python-tensorflow:2.5-py38-cuda11.3\', \'x86_64\', \'cr.backend.ai/stable/python-tensorflow:2.6-py38-cuda11.3\', \'x86_64\', \'cr.backend.ai/stable/filebrowser:21.02-ubuntu20.04\', \'x86_64\', \'cr.backend.ai/stable/python-tensorflow:2.7-py38-cuda11.3\', \'x86_64\', \'cr.backend.ai/stable/julia-flux:0.11-ji15\', \'x86_64\', \'cr.backend.ai/stable/julia-flux:0.11-ubuntu20.04\', \'x86_64\', \'cr.backend.ai/stable/julia-flux:0.12-ji16\', \'x86_64\', \'cr.backend.ai/stable/julia-flux:0.12-ji16-cuda11.3\', \'x86_64\', \'cr.backend.ai/stable/julia-flux:0.12-ubuntu20.04\', \'x86_64\', \'cr.backend.ai/community/afni:ubuntu18.04\', \'x86_64\', \'cr.backend.ai/stable/fortran-caf:8.3\', \'x86_64\', \'cr.backend.ai/stable/python-torch:1.5-py36-cuda10.1\', \'x86_64\')]\n(Background on this error at: https://sqlalche.me/e/14/f405)\n |   File 
"/Users/yoohee/projects/lablup/backend.ai-dev/manager/src/ai/backend/manager/container_registry/base.py", line 111, in rescan_single_registry\n |     existing_images = await session.scalars(\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/ext/asyncio/session.py", line 269, in scalars\n |     result = await self.execute(\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/ext/asyncio/session.py", line 212, in execute\n |     result = await greenlet_spawn(\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 134, in greenlet_spawn\n |     result = context.throw(*sys.exc_info())\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1692, in execute\n |     result = conn._execute_20(statement, params or {}, execution_options)\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1620, in _execute_20\n |     return meth(self, args_10style, kwargs_10style, execution_options)\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection\n |     return connection._execute_clauseelement(\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1487, in _execute_clauseelement\n |     ret = self._execute_context(\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1851, in _execute_context\n |     self._handle_dbapi_exception(\n |   File 
"/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2032, in _handle_dbapi_exception\n |     util.raise_(\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 207, in raise_\n |     raise exception\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1808, in _execute_context\n |     self.dialect.do_execute(\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute\n |     cursor.execute(statement, parameters)\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 479, in execute\n |     self._adapt_connection.await_(\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 76, in await_only\n |     return current.driver.switch(awaitable)\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 129, in greenlet_spawn\n |     value = await result\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 454, in _prepare_and_execute\n |     self._handle_exception(error)\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 389, in _handle_exception\n |     self._adapt_connection._handle_exception(error)\n |   File "/Users/yoohee/.pyenv/versions/3.10.2/envs/venv-mbxbopf0-manager/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 
682, in _handle_exception\n |     raise translated_error from error\n\n',)
Usage: python -m ai.backend.manager.cli etcd alias [OPTIONS] ALIAS TARGET
                                                   ARCHITECTURE
Try 'python -m ai.backend.manager.cli etcd alias -h' for help.

Error: Missing argument 'ARCHITECTURE'.
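The usage message above shows that etcd alias takes three positional arguments (ALIAS, TARGET, ARCHITECTURE), so the install step that called it apparently passed only two. A minimal sketch for building a complete invocation, assuming the architecture names used in the rescan output above ("x86_64", "aarch64") and assuming a mapping from platform.machine() values (macOS on Apple Silicon reports "arm64"). etcd_alias_argv and _ARCH_ALIASES are hypothetical helpers; the alias/target pair in the example is illustrative, with the image name taken from the log.

```python
import platform
import sys
from typing import List, Optional

# Assumed mapping from platform.machine() values to the architecture
# names that appear in the image rescan output above.
_ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "AMD64": "x86_64",
    "arm64": "aarch64",
    "aarch64": "aarch64",
}


def etcd_alias_argv(alias: str, target: str, machine: Optional[str] = None) -> List[str]:
    """Build argv for 'etcd alias' including the ARCHITECTURE argument."""
    raw = machine if machine is not None else platform.machine()
    try:
        arch = _ARCH_ALIASES[raw]
    except KeyError:
        raise ValueError(f"unsupported machine type: {raw!r}") from None
    return [
        sys.executable, "-m", "ai.backend.manager.cli",
        "etcd", "alias", alias, target, arch,
    ]


if __name__ == "__main__":
    # Illustrative alias/target pair; the command is only printed, not run.
    print(etcd_alias_argv("python", "cr.backend.ai/stable/python:3.9-ubuntu20.04"))
```

The returned list can be passed to subprocess.run(argv, check=True) from inside the manager's virtualenv. Building argv as a list avoids shell quoting issues with image names that contain colons.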

jungyh0218 added the type:bug label on Mar 15, 2022

jungyh0218 commented Mar 16, 2022

I installed the Backend.AI server with the following command, and it works fine:

meta/scripts/install-dev.sh --server-branch 21.09 --client-branch 21.09.3

I suspect that upgrading Python from 3.9.x to 3.10.2 is the cause of this issue.

achimnol (Member) commented

We have now migrated to the mono-repo (#417), so this issue should be revisited in light of that migration and #300. Closing as no longer valid. Feel free to open a new issue if you still run into similar problems with the new architecture.
