
[Feature] [OSPP2023] Support for deploying Cloudpods on Ubuntu 22.04(支持将 Cloudpods 融合云平台部署在 Ubuntu 22.04 发行版) #17543

Closed
niconical opened this issue Jul 17, 2023 · 12 comments · Fixed by yunionio/ocboot#935

niconical commented Jul 17, 2023

Support deploying the Cloudpods converged cloud platform on the Ubuntu 22.04 distribution

Mentor: Zexi Li
Selected student: Cuih (niconical)

Project summary

Cloudpods currently uses the ocboot deployment tool to deploy in a distributed, multi-node fashion on Linux distributions such as CentOS 7, Debian 10, and Kylin. The distributed multi-node service architecture relies on Kubernetes as its foundation, so when ocboot invokes ansible to deploy the platform it first deploys a Kubernetes cluster and then runs each service on top of Kubernetes in containers; this way the underlying nodes can be different distributions on different architectures.

Ubuntu is a very widely used Linux distribution. This project needs to adapt the deployment in the ocboot tool for Ubuntu, so that the Cloudpods services can run on the Ubuntu 22.04 LTS distribution.

Services are deployed mainly by using python to read a configuration file, generate an ansible inventory, and finally invoke ansible to run the relevant playbooks to complete the deployment.
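
For illustration only, a generated inventory could look roughly like the following YAML; the group name, host, and variables here are hypothetical and are not ocboot's actual output format:

# Hypothetical sketch of a generated inventory; not ocboot's real format.
all:
  children:
    primary_master_node:
      hosts:
        10.0.0.10:
          ansible_user: root
          ansible_python_interpreter: /usr/bin/python3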

Project deliverables

  • Deploy and run the Cloudpods converged cloud platform on the Ubuntu 22.04 LTS distribution
  • Implement the feature and submit the code as PRs to the upstream ocboot repository

References

niconical (Author) commented:

I will try to submit my proposal under this issue in both Chinese and English. @zexi


niconical commented Jul 18, 2023

Project development plan

1.1 Source code to be modified

  • onecloud/roles/common/tasks/ubuntu_22_aarch64.yml
  • onecloud/roles/common/tasks/ubuntu_22_x86_64.yml
  • onecloud/roles/mariadb/tasks/ubuntu-aarch64.yml
  • onecloud/roles/mariadb/tasks/ubuntu-x86_64.yml
  • onecloud/roles/utils/detect-os/vars/debian-10.aarch64.yml
  • onecloud/roles/utils/detect-os/vars/debian.yml
  • onecloud/roles/utils/detect-os/vars/ubuntu-22.aarch64.yml
  • onecloud/roles/utils/detect-os/vars/ubuntu-22.x86_64.yml
  • onecloud/roles/utils/detect-os/vars/uniontech_os_server_20_enterprise-20.aarch64.yml
  • onecloud/roles/utils/detect-os/vars/uniontech_os_server_20_enterprise-20.x86_64.yml
  • onecloud/roles/utils/misc-check/tasks/os.yml
  • run.py

1.1.1 run.py

To be compatible with Ubuntu 22.04 LTS, run.py needs the following changes to the functions below (shown in a git-diff-like form):

  1. Modify the install_packages() method
def install_packages(pkgs):
    if os.system('grep -Pq "Kylin Linux Advanced Server|CentOS Linux|openEuler" /etc/os-release') == 0:
        return os.system("yum install -y %s" % (" ".join(pkgs)))
-   elif os.system('grep -wq "Debian GNU/Linux" /etc/os-release') == 0:
+   elif os.system('grep -Pq "Debian GNU/Linux|Ubuntu" /etc/os-release') == 0:
        return os.system("apt install -y %s" % (" ".join(pkgs)))
    else:
        print("Unsupported OS")
        return 255

  2. Drop the python2-pyyaml / PyYAML packages on Ubuntu 22.04 LTS

Several places in the code currently install the python2-pyyaml package when setting up the environment, but there is evidence that this package does not need to be installed on Ubuntu 22.04 LTS machines.

The following uses install_ansible() as an example of the change:

def install_ansible():
-   for pkg in ['python2-pyyaml', 'PyYAML']:
-       install_packages([pkg])
+   if os.system('grep -wq "Ubuntu" /etc/os-release') == 0:
+       install_packages(["python3-yaml"])
+   else:
+       for pkg in ['python2-pyyaml', 'PyYAML']:
+           install_packages([pkg])
    ...

1.1.2 Modify misc-check/tasks/os.yml

The PR "strictly restrict the supported OSes (centos/kylin/debian/uos/euler) and their version ranges" (#842) added the os.yml file to enforce stricter operating-system version checks, so to support Ubuntu the corresponding version needs to be added to that file.

+     ubuntu:
+       ansible_distribution_name: "Ubuntu"
+       conditions:
+         - "'{{ ansible_distribution_version }}' is version('22.04', '>=')"
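
For reference, the same version constraint could also be expressed as a standalone check; the task below is a sketch only and does not reflect the actual structure of misc-check/tasks/os.yml:

# Sketch only: an equivalent standalone version check for Ubuntu hosts.
- name: assert supported Ubuntu version
  assert:
    that:
      - ansible_distribution_version is version('22.04', '>=')
    fail_msg: "Only Ubuntu 22.04 LTS and newer are supported"
  when: ansible_distribution == "Ubuntu"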

1.1.3 Modify roles

  1. utils/detect-os

Because hosts on different operating systems need different packages installed, and ocboot installs some of its own packages (e.g. yunion-climc, yunion-executor), supporting the Ubuntu 22.04 LTS operating system requires Cloudpods to provide an official package repository. According to the official documentation, Cloudpods currently provides repositories for:

  • CentOS 7: aarch64, x86_64
  • Kylin V10: aarch64, x86_64
  • Debian 10: aarch64, x86_64

Since UOS V20 is based on Debian 10.x, ocboot uses the same repository for it as for Debian 10. Therefore, to support Ubuntu 22.04 LTS, an official Ubuntu 22.04 LTS (jammy) repository needs to be provided.
The package dependencies on Debian 10 are similar to those on Ubuntu 22.04 LTS, so this plan creates files named ubuntu-22.x86_64.yml and ubuntu-22.aarch64.yml under utils/detect-os/vars, whose content is the list of packages to install, modeled on the UOS and Debian 10 files.
Below is an example ubuntu-22.x86_64.yml:

is_debian_based: true
is_ubuntu_based: true
is_ubuntu_x86: true

common_packages:
  - apt-transport-https
  - bash-completion
  - bridge-utils
  - ceph-common
  - chntpw
  - conntrack
  - conntrackd
  - curl
  - dkms
  - docker-ce         
  - git
  - glusterfs-common
  - gnupg-agent
  - gnupg2
  - ipset
  - ipvsadm
  - jq
  - kubeadm=1.15.12-00 
  - kubectl=1.15.12-00 
  - kubelet=1.15.12-00 
  - libusb-1.0-0
  - libusbredirparser1
  - librbd1
  - libspice-server1
  - nfs-common
  - ntp
  - openvswitch-switch
  - ovmf
  - parallel
  - python3-selinux
  - software-properties-common
  - usbutils
  - wget
  - "{{ yunion_qemu_package }}"

common_services:
    - ntp
    - yunion-executor

latest_packages:
    - ca-certificates
    - yunion-climc
    - yunion-executor
    - yunion-fetcherfs
    - yunion-ocadm

This file defines the following variables for Ubuntu 22.04 LTS x86_64:

  • is_debian_based: true
  • is_ubuntu_based: true
  • is_ubuntu_x86: true

In ocboot, tasks are generally filtered with when clauses based on these operating-system variables, so for operations that are identical on Debian and Ubuntu, such as the make cache for debian like os task, the when clause needs no change.

- name: make cache for debian like os
  shell: "apt-get update"
  args:
    warn: no
  when:
  - is_debian_based is defined

For tasks such as init apt cache for debian, where Debian and Ubuntu differ, the when clause or the task content needs to be changed so that is_debian_based only covers what is truly common to Debian-based systems (a sketch of one such change follows the tasks below).

- name: init apt cache for debian
  get_url:
    url: https://iso.yunion.cn/uos/buster/{{ debian_based_arch }}/3.8/yunion.gpg-key.asc
    dest: /tmp/yunion.gpg-key.asc
    validate_certs: no
  become: yes
  when:
  - is_debian_based is defined

- name: apply debian sig key
  shell: |
    apt-key add /tmp/yunion.gpg-key.asc;
    apt-get update -y;
    rm -f /tmp/yunion.gpg-key.asc
  args:
    executable: /bin/bash
  when:
  - is_debian_based is defined
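
One possible way to keep such Debian-only tasks from running on Ubuntu hosts, assuming the is_ubuntu_based variable introduced above, is to extend the when clause; this is a sketch, not necessarily the final ocboot change:

# Sketch only: skip the Debian-specific repository key download on Ubuntu hosts.
- name: init apt cache for debian
  get_url:
    url: https://iso.yunion.cn/uos/buster/{{ debian_based_arch }}/3.8/yunion.gpg-key.asc
    dest: /tmp/yunion.gpg-key.asc
    validate_certs: no
  become: yes
  when:
  - is_debian_based is defined
  - is_ubuntu_based is not defined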

  2. common

Two new files, onecloud/roles/common/tasks/ubuntu_22_x86_64.yml and onecloud/roles/common/tasks/ubuntu_22_aarch64.yml, can be created, with content modeled on Debian 10. In addition, the Cloudpods component overview documentation describes the components that need to be installed in detail.
Example:

---
# TODO: add ubuntu iso.yunion repo
- name: set var
  set_fact:
    onecloud_version_abbr: "{{ onecloud_version | regex_replace('[^0-9.]+') | regex_findall('^[0-9]+\\.[0-9]+') | join('')}}"

- name: config iptables for ubuntu 
  shell: |
    if iptables -V |grep -wq nf_tables && ls -l /usr/sbin/iptables |grep -wq alternatives; then
      update-alternatives --set iptables /usr/sbin/iptables-legacy
      update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
      update-alternatives --set arptables /usr/sbin/arptables-legacy
      update-alternatives --set ebtables /usr/sbin/ebtables-legacy
      if [ -x /usr/bin/aptitude ] && [ ! -x /usr/sbin/nft ]; then
          aptitude install nftables -y
          nft flush ruleset
      fi
    fi
  args:
    executable: /bin/bash

- name: init apt cache for ubuntu 
  get_url:
    url: https://iso.yunion.cn/ubuntu/{{ ansible_distribution_major_version }}/{{ onecloud_version_abbr }}/{{ ansible_architecture }}/yunion.gpg-key.asc
    dest: /tmp/yunion.gpg-key.asc
    validate_certs: no

- name: apply ubuntu sig key
  shell: |
    echo "deb [trusted=yes] https://iso.yunion.cn/ubuntu/{{ ansible_distribution_major_version }}/{{ onecloud_version_abbr }}/{{ ansible_architecture }}/ ./" > /etc/apt/sources.list.d/yunion.list;
    apt-key add /tmp/yunion.gpg-key.asc;
    apt-get update -y;
    rm -f /tmp/yunion.gpg-key.asc
  args:
    executable: /bin/bash

- name: install common packages via loop
...

- name: install latest packages via loop
...

- name: Check that if selinux config exists
...

- name: Turn off selinux
...
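
The elided package-installation tasks could, for example, loop over the variables defined in detect-os; the sketch below assumes the common_packages list shown earlier and may differ from the final role:

# Sketch only: install the packages listed in common_packages one by one.
- name: install common packages via loop
  apt:
    name: "{{ item }}"
    state: present
  loop: "{{ common_packages }}"
  become: yes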

  3. MariaDB

The tasks can follow the Debian 10 files for the corresponding architecture; taking the MariaDB installation on Ubuntu 22.04 LTS x86_64 as an example:

- name: Install mariadb
  package:
    name: "{{ item }}"
    state: "present"
  with_items:
    - mariadb-server

- name: pips for mysql/mariadb
  pip:
    name: PyMySQL
  vars:
    ansible_python_interpreter: /usr/bin/python3

- name: Allow remote hosts to connect (Ubuntu)
  lineinfile:
    path: /etc/mysql/mariadb.conf.d/50-server.cnf
    backrefs: yes
    regexp: '^bind-address'
    line: 'bind-address            = 0.0.0.0'
    state: present

- name: make conf dir
  file:
    name: /etc/my.cnf.d
    state: directory

# restart the mariadb service asap to allow remote access
- name: Restart Mariadb
  systemd:
    name: mariadb
    state: restarted
    enabled: yes

- name: set fact for socket
  set_fact:
    login_unix_socket: /var/run/mysqld/mysqld.sock
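
For illustration, the login_unix_socket fact set above could later be consumed by a user-management task such as the following sketch; the db_password variable is hypothetical and the real role may differ:

# Sketch only: grant remote access using the socket path recorded above.
- name: example - allow root to connect from remote hosts
  mysql_user:
    name: root
    host: "%"
    password: "{{ db_password }}"
    login_unix_socket: "{{ login_unix_socket }}"
  vars:
    ansible_python_interpreter: /usr/bin/python3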

  4. Other changes

This includes updating related documentation after the changes, for example README.md and the Cloudpods installation docs.

Ref

The following PRs were referenced:

  • UOS
  • Debian
  • CentOS
  • OpenEuler
  • Kylin


niconical commented Jul 18, 2023

After forking ocboot, I am working on my local branch patch_add_ubuntu_22_04_lts_support. If needed, I can submit a draft PR on GitHub.



niconical commented Jul 18, 2023

Currently, I have made some progress with Ubuntu 22.04 by modifying run.py and the related roles, but I encountered a failure in the task TASK [common : init apt cache for ubuntu]. This is because there is currently no software repository specifically tailored for Ubuntu.


Output:


current ansible version: 2.15.1. PASS
ch-OptiPlex-9020
loading path:
reuse current yaml: /home/ch/ocboot/config-allinone-current.yml
ansible-playbook -e @/tmp/oc_vars.yml -i /tmp/host_inventory.yml ./onecloud/install-cluster.yml

PLAY [all] *********************************************************************
...

TASK [common : init apt cache for ubuntu] **************************************
fatal: [127.0.0.1]: FAILED! => {"changed": false, "dest": "/tmp/yunion.gpg-key.asc", "elapsed": 0, "msg": "Request failed", "response": "HTTP Error 404: Not Found", "status_code": 404, "url": "https://iso.yunion.cn/debian/22/3.10/x86_64/yunion.gpg-key.asc"}

PLAY RECAP *********************************************************************
127.0.0.1                  : ok=30   changed=2    unreachable=0    failed=1    skipped=12   rescued=0    ignored=0


zexi commented Jul 18, 2023

TASK [common : init apt cache for ubuntu] **************************************
fatal: [127.0.0.1]: FAILED! => {"changed": false, "dest": "/tmp/yunion.gpg-key.asc", "elapsed": 0, "msg": "Request failed", "response": "HTTP Error 404: Not Found", "status_code": 404, "url": "https://iso.yunion.cn/debian/22/3.10/x86_64/yunion.gpg-key.asc"}

@niconical
We'll add the ubuntu repositories ASAP, and will reply here when we're done.



zexi commented Jul 18, 2023

@niconical We have added the ubuntu repository and the link is https://iso.yunion.cn/ubuntu/22/3.10/x86_64/ , please check it.

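
With that repository in place, the init apt cache for ubuntu task should resolve correctly once its URL template expands to the new path; for example, a sketch with the variables expanded for v3.10 on x86_64:

# Sketch only: key download against the newly published Ubuntu repository.
- name: init apt cache for ubuntu
  get_url:
    url: https://iso.yunion.cn/ubuntu/22/3.10/x86_64/yunion.gpg-key.asc
    dest: /tmp/yunion.gpg-key.asc
    validate_certs: no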

niconical (Author) commented:

TASK [primary-master-node/setup_k8s : Use ocadm init first master node] ********
fatal: [192.168.1.111]: FAILED! => {
   "changed":true,
   "cmd":"/opt/yunion/bin/ocadm init --control-plane-endpoint 192.168.1.111:6443 --mysql-host 192.168.1.111 --mysql-user root --mysql-password 5h3TKBU4rP6P --mysql-port 3306 --image-repository registry.cn-beijing.aliyuncs.com/yunion --apiserver-advertise-address 192.168.1.111  --node-ip 192.168.1.111 --enable-hugepage --onecloud-version v3.10.3 --operator-version v3.10.3 --pod-network-cidr 10.40.0.0/16 --service-cidr 10.96.0.0/12 --service-dns-domain cluster.local --addon-calico-ip-autodetection-method 'can-reach=192.168.1.111' --enable-host-agent\n",
   "delta":"0:02:25.985363",
   "end":"2023-07-26 10:05:04.675952",
   "msg":"non-zero return code",
   "rc":1,
   "start":"2023-07-26 10:02:38.690589",
   "stderr":"\t[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 24.0.5. Latest validated version: 18.09\nerror execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition",
   "stderr_lines":[
      "\t[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 24.0.5. Latest validated version: 18.09",
      "error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition"
   ],
   "stdout":"[init] Using Kubernetes and Onecloud version: v1.15.8 & v3.10.3\n[preflight] Running pre-flight checks\n[preflight] Pulling images required for setting up a OneCloud on Kubernetes cluster\n[preflight] This might take a minute or two, depending on the speed of your internet connection\n[preflight] You can also perform this action in beforehand using 'ocadm config images pull'\nvip is empty. no need to install keepalived.\n[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"\n[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"\n[kubelet-start] Activating the kubelet service\n[certs] Using certificateDir folder \"/etc/kubernetes/pki\"\n[certs] Generating \"ca\" certificate and key\n[certs] Generating \"apiserver\" certificate and key\n[certs] apiserver serving cert is signed for DNS names [ch-optiplex-9020 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.111 192.168.1.111]\n[certs] Generating \"apiserver-kubelet-client\" certificate and key\n[certs] Generating \"front-proxy-ca\" certificate and key\n[certs] Generating \"front-proxy-client\" certificate and key\n[certs] Generating \"etcd/ca\" certificate and key\n[certs] Generating \"etcd/server\" certificate and key\n[certs] etcd/server serving cert is signed for DNS names [ch-optiplex-9020 localhost] and IPs [192.168.1.111 127.0.0.1 ::1]\n[certs] Generating \"etcd/healthcheck-client\" certificate and key\n[certs] Generating \"etcd/peer\" certificate and key\n[certs] etcd/peer serving cert is signed for DNS names [ch-optiplex-9020 localhost] and IPs [192.168.1.111 127.0.0.1 ::1]\n[certs] Generating \"apiserver-etcd-client\" certificate and key\n[certs] Generating \"sa\" key and public key\n[kubeconfig] Using kubeconfig folder \"/etc/kubernetes\"\n[kubeconfig] Writing \"admin.conf\" kubeconfig file\n[kubeconfig] Writing \"kubelet.conf\" kubeconfig file\n[kubeconfig] Writing \"controller-manager.conf\" kubeconfig file\n[kubeconfig] Writing \"scheduler.conf\" kubeconfig file\n[control-plane] Using manifest folder \"/etc/kubernetes/manifests\"\n[control-plane] Creating static Pod manifest for \"kube-apiserver\"\n[control-plane] Creating static Pod manifest for \"kube-controller-manager\"\n[control-plane] Creating static Pod manifest for \"kube-scheduler\"\n[etcd] Creating static Pod manifest for local etcd in \"/etc/kubernetes/manifests\"\n[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory \"/etc/kubernetes/manifests\". This can take up to 4m0s\n[apiclient] All control plane components are healthy after 21.501670 seconds\n[upload-config] Storing the configuration used in ConfigMap \"kubeadm-config\" in the \"kube-system\" Namespace\n[kubelet] Creating a ConfigMap \"kubelet-config-1.15\" in namespace kube-system with the configuration for the kubelets in the cluster\n[kubelet-check] Initial timeout of 40s passed.",
   "stdout_lines":[
      "[init] Using Kubernetes and Onecloud version: v1.15.8 & v3.10.3",
      "[preflight] Running pre-flight checks",
      "[preflight] Pulling images required for setting up a OneCloud on Kubernetes cluster",
      "[preflight] This might take a minute or two, depending on the speed of your internet connection",
      "[preflight] You can also perform this action in beforehand using 'ocadm config images pull'",
      "vip is empty. no need to install keepalived.",
      "[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"",
      "[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"",
      "[kubelet-start] Activating the kubelet service",
      "[certs] Using certificateDir folder \"/etc/kubernetes/pki\"",
      "[certs] Generating \"ca\" certificate and key",
      "[certs] Generating \"apiserver\" certificate and key",
      "[certs] apiserver serving cert is signed for DNS names [ch-optiplex-9020 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.111 192.168.1.111]",
      "[certs] Generating \"apiserver-kubelet-client\" certificate and key",
      "[certs] Generating \"front-proxy-ca\" certificate and key",
      "[certs] Generating \"front-proxy-client\" certificate and key",
      "[certs] Generating \"etcd/ca\" certificate and key",
      "[certs] Generating \"etcd/server\" certificate and key",
      "[certs] etcd/server serving cert is signed for DNS names [ch-optiplex-9020 localhost] and IPs [192.168.1.111 127.0.0.1 ::1]",
      "[certs] Generating \"etcd/healthcheck-client\" certificate and key",
      "[certs] Generating \"etcd/peer\" certificate and key",
      "[certs] etcd/peer serving cert is signed for DNS names [ch-optiplex-9020 localhost] and IPs [192.168.1.111 127.0.0.1 ::1]",
      "[certs] Generating \"apiserver-etcd-client\" certificate and key",
      "[certs] Generating \"sa\" key and public key",
      "[kubeconfig] Using kubeconfig folder \"/etc/kubernetes\"",
      "[kubeconfig] Writing \"admin.conf\" kubeconfig file",
      "[kubeconfig] Writing \"kubelet.conf\" kubeconfig file",
      "[kubeconfig] Writing \"controller-manager.conf\" kubeconfig file",
      "[kubeconfig] Writing \"scheduler.conf\" kubeconfig file",
      "[control-plane] Using manifest folder \"/etc/kubernetes/manifests\"",
      "[control-plane] Creating static Pod manifest for \"kube-apiserver\"",
      "[control-plane] Creating static Pod manifest for \"kube-controller-manager\"",
      "[control-plane] Creating static Pod manifest for \"kube-scheduler\"",
      "[etcd] Creating static Pod manifest for local etcd in \"/etc/kubernetes/manifests\"",
      "[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory \"/etc/kubernetes/manifests\". This can take up to 4m0s",
      "[apiclient] All control plane components are healthy after 21.501670 seconds",
      "[upload-config] Storing the configuration used in ConfigMap \"kubeadm-config\" in the \"kube-system\" Namespace",
      "[kubelet] Creating a ConfigMap \"kubelet-config-1.15\" in namespace kube-system with the configuration for the kubelets in the cluster",
      "[kubelet-check] Initial timeout of 40s passed."
   ]
}


zexi commented Jul 26, 2023

TASK [primary-master-node/setup_k8s : Use ocadm init first master node] ********
fatal: [192.168.1.111]: FAILED! => {
   ...
   "stderr_lines":[
      "\t[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 24.0.5. Latest validated version: 18.09",
      "error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition"
   ],
   ...
}

@niconical Please check the logs of the kubelet service with the following command:

journalctl -u kubelet --no-pager


niconical commented Jul 28, 2023

Currently, I can access the Cloudpods Dashboard via http://ip/dashboard. However, some of the Cloudpods Kubernetes pods are failing to pull images and report ImagePullBackOff or ErrImagePull. Additionally, I have noticed a DNS issue on Ubuntu 22.04 after deploying Cloudpods: for example, when I try to ping baidu.com, I get the error Temporary failure in name resolution.

During the process of porting Cloudpods to the Ubuntu 22.04 platform, I skipped a failing task: Remove immutable flag on /etc/resolv.conf. This is because directly modifying /etc/resolv.conf is not recommended on Ubuntu 22.04. I suspect that the DNS issue might be related to skipping this particular task, but I'm not entirely sure of the exact cause.
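
For reference, a guarded variant of that task could remove the immutable flag only when /etc/resolv.conf is a regular file rather than the systemd-resolved symlink; this is a sketch of one option, not the change that was actually made:

# Sketch only: chattr only applies to regular files, so skip the symlink case.
- name: Remove immutable flag on /etc/resolv.conf
  shell: |
    if [ -f /etc/resolv.conf ] && [ ! -L /etc/resolv.conf ]; then
      chattr -i /etc/resolv.conf || true
    fi
  args:
    executable: /bin/bash
  become: yes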



zexi commented Jul 30, 2023

@niconical
Can you open a PR for this feature? Then I can test the problem you're having with the Ubuntu 22.04 OS.


niconical (Author) commented:

@niconical Can you open a PR for this feature? Then I can test the problem you're having with the Ubuntu 22.04 OS.


I'll be submitting a PR later this evening.

niconical (Author) commented:

Due to changes in DNS management in Ubuntu 22.04 LTS, /etc/resolv.conf becomes a symlink pointing to /run/systemd/resolve/stub-resolv.conf, and systemd-resolved maintains the DNS configuration. I found the following issues during deployment:

  • Port 53 is occupied by systemd-resolved, causing default-region-dns-xxxxx to fail to start
  • After referring to this article to solve the port-occupation problem, default-host-image-xxxxx still failed to start: the Pod tries to mount /etc/resolv.conf, and because /etc/resolv.conf is a symlink the mount fails (see the sketch below)
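
A possible workaround for both points, assuming Cloudpods should own DNS on the node, is to disable the systemd-resolved stub listener and replace the /etc/resolv.conf symlink with a regular copy; the tasks below are a sketch, not necessarily the fix that was merged:

# Sketch only: free port 53 and turn /etc/resolv.conf into a mountable regular file.
- name: disable the systemd-resolved stub listener on port 53
  lineinfile:
    path: /etc/systemd/resolved.conf
    regexp: '^#?DNSStubListener='
    line: 'DNSStubListener=no'
  become: yes

- name: restart systemd-resolved to release port 53
  systemd:
    name: systemd-resolved
    state: restarted
  become: yes

- name: replace the /etc/resolv.conf symlink with a regular file
  shell: |
    rm -f /etc/resolv.conf
    cp /run/systemd/resolve/resolv.conf /etc/resolv.conf
  args:
    executable: /bin/bash
  become: yes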


