kolla-mitaka-eol: problems encountered deploying OpenStack Mitaka with kolla

An experiment required deploying an OpenStack Mitaka environment with kolla. Since this is a two-year-old release, I ran into a number of pitfalls along the way; they are recorded below.

System environment

OS: CentOS Linux release 7.2.1511 (Core)
Kernel: 3.10.0-327.28.3.el7.x86_64
kolla version: mitaka-eol
Docker version: Docker version 1.13.1, build 092cba3
Docker images: official tag 2.0.2 (corresponding to OpenStack Mitaka)

Problem 1: the openvswitch_db container fails to run

Problem description

Running kolla-ansible deploy kept failing because the openvswitch_db service would not start:

TASK: [neutron | Waiting the openvswitch_db service to be ready] ************** 
failed: [localhost] => {"attempts": 30, "changed": false, "cmd": ["docker", "exec", "openvswitch_db", "ovs-vsctl", "--no-wait", "show"], "delta": "0:00:00.032518", "end": "2018-07-09 07:33:12.680647", "failed": true, "rc": 1, "start": "2018-07-09 07:33:12.648129", "stdout_lines": [], "warnings": []}
stderr: Error response from daemon: Container 0cec739aabe06805aa0e1624318ac052d9f8fb176078df3d20a13c4df304fa7a is restarting, wait until the container is running
msg: Task failed as maximum retries was encountered

FATAL: all hosts have already failed -- aborting

The container log shows the following error:

INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Copying service configuration files
INFO:__main__:Writing out command to execute
Running command: '/usr/sbin/ovsdb-server /var/lib/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/run/openvswitch/db.sock --log-file=/var/log/openvswitch/ovsdb-server.log'
ovsdb-server: I/O error: open: /var/lib/openvswitch/conf.db failed (No such file or directory)

Troubleshooting

1. Start an openvswitch_db container manually and enter an interactive shell

docker run -it kolla/centos-source-openvswitch-db-server:2.0.2 /bin/bash

2. Inspect the startup command

# cd /usr/local/bin
# vi kolla_start

#!/bin/bash
set -o errexit

# Wait for the log socket
if [[ ! "${!SKIP_LOG_SETUP[@]}" && -e /var/lib/kolla/heka ]]; then
    while [[ ! -S /var/lib/kolla/heka/log ]]; do
        sleep 1
    done
fi

# Processing /var/lib/kolla/config_files/config.json as root. This is necessary
# to permit certain files to be controlled by the root user which should
# not be writable by the dropped-privileged user, especially /run_command
sudo -E kolla_set_configs
CMD=$(cat /run_command)
ARGS=""

if [[ ! "${!KOLLA_SKIP_EXTEND_START[@]}" ]]; then
    # Run additional commands if present
    source kolla_extend_start
fi

echo "Running command: '${CMD}${ARGS:+ $ARGS}'"
exec ${CMD} ${ARGS}

Note the kolla_extend_start script.

3. Inspect kolla_extend_start

# vi kolla_extend_start

#!/bin/bash

mkdir -p "/run/openvswitch"
if [[ ! -e "/etc/openvswitch/conf.db" ]]; then
    ovsdb-tool create "/etc/openvswitch/conf.db"
fi

On startup the container first runs the kolla_start script; when the KOLLA_SKIP_EXTEND_START variable is not set, it goes on to run kolla_extend_start for some initialization work.
The problem lies in the creation of conf.db: kolla_extend_start creates it at /etc/openvswitch/conf.db, while the startup command passes in /var/lib/openvswitch/conf.db, so the service fails because /var/lib/openvswitch/conf.db cannot be found.
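The mismatch is easy to reproduce outside the container with plain shell; the sketch below uses a throwaway temp directory in place of the container's filesystem, so it touches nothing real:

```shell
#!/bin/bash
# Stand-in for the container root.
root=$(mktemp -d)
mkdir -p "$root/etc/openvswitch" "$root/var/lib/openvswitch"

# kolla_extend_start creates the database at /etc/openvswitch/conf.db ...
touch "$root/etc/openvswitch/conf.db"

# ... but run_command points ovsdb-server at /var/lib/openvswitch/conf.db.
db="$root/var/lib/openvswitch/conf.db"
if [ -e "$db" ]; then
    result="present"
else
    result="missing"
fi
echo "ovsdb-server's conf.db is $result"
rm -rf "$root"
```

Which is exactly the "No such file or directory" that ovsdb-server reports.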

Fix

There are two possible fixes: change where the kolla_extend_start script inside the openvswitch_db image creates conf.db, or change the startup command arguments of the openvswitch_db container. To keep the image untouched, and for convenience, I chose the second.
Edit /etc/kolla/openvswitch-db-server/config.json as follows:

{
    "command": "/usr/sbin/ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/run/openvswitch/db.sock --log-file=/var/log/openvswitch/ovsdb-server.log",
    "config_files": []
}

A file changed this way will be wiped by the cleanup-host command, so we keep looking for where /etc/kolla/openvswitch-db-server/config.json is generated.
It turns out to be /usr/share/kolla/ansible/roles/neutron/templates/openvswitch-db-server.json.j2: when deploy runs, openvswitch-db-server.json.j2 is rendered into openvswitch-db-server.json and copied to the matching location under /etc/kolla/, so it is enough to modify openvswitch-db-server.json.j2.
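Templates like this can be tracked down with a recursive grep. To keep the sketch self-contained it builds a tiny mock of the roles tree first; on a real deploy host you would simply grep under /usr/share/kolla/ansible/roles/:

```shell
#!/bin/bash
# Mock of /usr/share/kolla/ansible/roles/ with one neutron template.
roles=$(mktemp -d)
mkdir -p "$roles/neutron/templates"
printf '{ "command": "/usr/sbin/ovsdb-server ..." }\n' \
    > "$roles/neutron/templates/openvswitch-db-server.json.j2"

# List every template that mentions ovsdb-server.
hits=$(grep -rl "ovsdb-server" "$roles")
echo "$hits"
rm -rf "$roles"
```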

# vim /usr/share/kolla/ansible/roles/neutron/templates/openvswitch-db-server.json.j2

{
    "command": "/usr/sbin/ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/run/openvswitch/db.sock --log-file=/var/log/openvswitch/ovsdb-server.log",
    "config_files": []
}

Verification

# /root/kolla/tools/cleanup-containers
# /root/kolla/tools/cleanup-host
# kolla-ansible deploy

Problem 2: the ansible version inside the kolla_toolbox image does not match

Problem description

With problem 1 fixed, deploy still failed near the end:

TASK: [horizon | Creating the _member_ role] ********************************** 
failed: [localhost] => {"attempts": 10, "changed": false, "cmd": ["docker", "exec", "-t", "kolla_toolbox", "/usr/bin/ansible", "localhost", "-m", "os_keystone_role", "-a", "name=_member_ auth={# openstack_horizon_auth #}", "-e", "{'openstack_horizon_auth':{'username': 'admin', 'project_name': 'admin', 'password': 'admin', 'auth_url': 'http://172.16.15.115:35357'}}"], "delta": "0:00:01.223878", "end": "2018-07-09 09:18:24.813123", "failed": true, "rc": 2, "start": "2018-07-09 09:18:23.589245", "stdout_lines": ["localhost | FAILED! => {", " \"failed\": true, ", " \"msg\": \"The module os_keystone_role was not found in configured module paths. Additionally, core modules are missing. If this is a checkout, run 'git submodule update --init --recursive' to correct this problem.\"", "}"], "warnings": []}
stdout: localhost | FAILED! => {
"failed": true,
"msg": "The module os_keystone_role was not found in configured module paths. Additionally, core modules are missing. If this is a checkout, run 'git submodule update --init --recursive' to correct this problem."
}
msg: Task failed as maximum retries was encountered

FATAL: all hosts have already failed -- aborting

Running git submodule update --init --recursive as suggested produced the same error.

Troubleshooting

Some searching shows this is a known bug; for details see: Hitting ansible error "The module os_keystone_role was not found in configured module paths"

1. Check the ansible version

# docker exec -ti kolla_toolbox /usr/bin/ansible --version

ansible 2.1.0
config file = /home/ansible/.ansible.cfg
configured module search path = /usr/share/ansible

2. Upgrade ansible

# docker exec -ti kolla_toolbox sudo pip install ansible==2.1.1.0

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

#1) Respect the privacy of others.
#2) Think before you type.
#3) With great power comes great responsibility.

[sudo] password for ansible:

Root privileges are needed, but the container runs as the ansible user, which has no password, so the upgrade cannot be done in place. Next I tried building the kolla_toolbox image by hand.

3. Build the image manually from the kolla_toolbox Dockerfile

While building, the various preconfigured package sources turned out to no longer exist or be unreachable, so I decided to drop the officially pulled images, point the sources at working mirrors by hand, and rebuild every image, which also resolves problem 2.

Problem 3: rebuilding the kolla images

Problem description

The OpenStack Mitaka images need to be rebuilt, but with the passage of time many of the source URLs inside them have gone stale; replace the sources with reachable mirrors and rebuild.

Steps

1. Update the package source URLs used by kolla

The following files are COPYed into the containers, so they can be edited directly:

  • kibana.yum.repo

    Switch to the latest 6.x release; the 4.x repo is no longer reachable
    vi /usr/share/kolla/docker/base/kibana.yum.repo

    -[kibana-4.4]
    -name=Kibana repository for 4.4.x packages
    -baseurl=http://packages.elastic.co/kibana/4.4/centos
    -gpgcheck=1
    -gpgkey=http://packages.elastic.co/GPG-KEY-elasticsearch
    -enabled=1

    +[kibana-6.x]
    +name=Kibana repository for 6.x packages
    +baseurl=https://artifacts.elastic.co/packages/6.x/yum
    +gpgcheck=0
    +enabled=1
  • elasticsearch.repo

    For compatibility with kibana, switch the elasticsearch repo to 6.x as well

    # vi /usr/share/kolla/docker/base/elasticsearch.repo

    -[elasticsearch-2.x]
    -name=Elasticsearch repository for 2.x packages
    -baseurl=http://packages.elastic.co/elasticsearch/2.x/centos
    -gpgcheck=1
    -gpgkey=http://packages.elastic.co/GPG-KEY-elasticsearch
    -enabled=1

    +[elasticsearch-6.x]
    +name=Elasticsearch repository for 6.x packages
    +baseurl=https://artifacts.elastic.co/packages/6.x/yum
    +gpgcheck=0
    +enabled=1 # the elasticsearch repo is hosted on AWS, which is not very stable to reach from China

2. Update the ceph, openstack, and QEMU-EV repo URLs

The ceph, openstack, and QEMU-EV repos are generated automatically when centos-release-ceph-hammer, centos-release-openstack-mitaka, and centos-release-qemu-ev are installed, so the corresponding Dockerfile has to be modified to rewrite the repo URLs after installation.

# vim /usr/share/kolla/docker/base/Dockerfile.j2

RUN rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7 \
&& yum install -y \
epel-release \
- centos-release-openstack-mitaka \ # the extras repo no longer carries the centos-release-openstack-mitaka package; install the rpm directly
+ http://mirror.neu.edu.cn/centos/7/cloud/x86_64/openstack-mitaka/centos-release-openstack-mitaka-1-5.el7.noarch.rpm \
yum-plugin-priorities \
centos-release-ceph-hammer \
centos-release-qemu-ev \
+ && sed -i s/mirror.centos.org/mirror.neu.edu.cn/g /etc/yum.repos.d/CentOS-Ceph-Hammer.repo \
+ && sed -i s/mirror.centos.org/mirror.neu.edu.cn/g /etc/yum.repos.d/CentOS-OpenStack-mitaka.repo \
+ && sed -i s#mirror.centos.org/\$contentdir/#mirror.neu.edu.cn/centos/#g /etc/yum.repos.d/CentOS-QEMU-EV.repo \
+ && rm -rf /etc/yum.repos.d/CentOS-Base.repo && curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo \
&& rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7 \
&& rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Storage \
&& rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Virtualization \
&& yum clean all
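The sed substitutions can be dry-run against a sample repo file before baking them into the image (the baseurl below is a stand-in for illustration, not copied from the real repo file):

```shell
#!/bin/bash
# Sample repo file pointing at the upstream mirror.
repo=$(mktemp)
cat > "$repo" <<'EOF'
[centos-openstack-mitaka]
baseurl=http://mirror.centos.org/centos/7/cloud/$basearch/openstack-mitaka/
EOF

# The same substitution the Dockerfile performs.
sed -i s/mirror.centos.org/mirror.neu.edu.cn/g "$repo"
new_baseurl=$(grep '^baseurl=' "$repo")
echo "$new_baseurl"
rm -f "$repo"
```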

3. Speed things up with the Aliyun and Douban mirrors

# vim /usr/share/kolla/docker/base/Dockerfile.j2

RUN rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7 \
&& yum install -y \
epel-release \
http://mirror.neu.edu.cn/centos/7/cloud/x86_64/openstack-mitaka/centos-release-openstack-mitaka-1-5.el7.noarch.rpm \
yum-plugin-priorities \
centos-release-ceph-hammer \
centos-release-qemu-ev \
&& sed -i s/mirror.centos.org/mirror.neu.edu.cn/g /etc/yum.repos.d/CentOS-Ceph-Hammer.repo \
&& sed -i s/mirror.centos.org/mirror.neu.edu.cn/g /etc/yum.repos.d/CentOS-OpenStack-mitaka.repo \
&& sed -i s#mirror.centos.org/\$contentdir/#mirror.neu.edu.cn/centos/#g /etc/yum.repos.d/CentOS-QEMU-EV.repo \
+ && rm -rf /etc/yum.repos.d/CentOS-Base.repo && curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo \
+ && rm -rf /etc/yum.repos.d/epel.* && curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo \
&& rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7 \
&& rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Storage \
&& rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Virtualization \
&& yum clean all

+#use douban source
+RUN mkdir ~/.pip \
+ && > ~/.pip/pip.conf \
+ && echo "[global]" > ~/.pip/pip.conf \
+ && echo "index-url = http://pypi.douban.com/simple" >> ~/.pip/pip.conf \
+ && echo "[install]" >> ~/.pip/pip.conf \
+ && echo "trusted-host = pypi.douban.com" >> ~/.pip/pip.conf

4. Modify kolla_toolbox

Testing showed that while building the kolla_toolbox image, pip installed the latest OpenStack client packages, which require requests>=2.14.2; that dependency would in turn upgrade chardet, but chardet is a system dependency that cannot be upgraded inside the container. So the OpenStack client versions are pinned by hand, with the versions taken from the OpenStack Projects Release Notes.

# vim /usr/share/kolla/docker/kolla_toolbox/Dockerfile.j2

RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py \
&& python get-pip.py \
&& rm get-pip.py \
&& pip --no-cache-dir install \
+ openstacksdk==0.9.0 \
+ osc-lib==0.4.0 \
+ oslo.config==3.13.0 \
+ oslo.i18n==3.8.0 \
+ oslo.serialization==2.11.0 \
+ oslo.utils==3.16.0 \
+ python-cinderclient==1.8.0 \
+ python-glanceclient==2.2.0 \
+ python-heatclient==1.3.0 \
+ python-ironicclient==1.5.0 \
+ python-keystoneclient==3.2.0 \
+ python-neutronclient==4.2.0 \
+ python-novaclient==5.0.0 \
+ python-openstackclient==2.6.0 \
+ python-swiftclient==3.0.0 \
+ python-troveclient==2.3.0 \
+ stevedore==1.16.0 \
+ debtcollector==1.6.0 \
+ keystoneauth1==2.9.0 \
+ cliff==2.1.0 \
+ cmd2==0.6.8 \
+ pbr==1.10.0 \
+ requests==2.10.0 \
ansible==2.1.1.0 \
MySQL-python \
os-client-config==1.16.0 \
pyudev \
shade==1.4.0
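An alternative way to keep pins like these maintainable (a sketch of my own, not how kolla itself is wired): collect them in a constraints file and hand it to pip with -c, so the install line only names the packages it actually wants:

```shell
#!/bin/bash
# A few of the pins from above, kept in one place.
constraints=$(mktemp)
cat > "$constraints" <<'EOF'
requests==2.10.0
ansible==2.1.1.0
python-openstackclient==2.6.0
shade==1.4.0
EOF

# In the Dockerfile this would become:
#   pip --no-cache-dir install -c /constraints.txt ansible shade MySQL-python ...
pin_count=$(wc -l < "$constraints")
echo "pinned $pin_count packages"
rm -f "$constraints"
```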

5. Build the images manually

# build only the essential images
# kolla-build --base centos -t binary horizon cinder heat nova neutron glance keystone ironic rabbitmq keepalived haproxy heka kolla_toolbox mariadb memcached cron openvswitch

6. Inspect the built images

# docker images

REPOSITORY TAG IMAGE ID CREATED SIZE
kolla/centos-binary-openvswitch-db-server 2.0.4 58e8a1cdc387 10 minutes ago 379 MB
kolla/centos-binary-openvswitch-vswitchd 2.0.4 cb85d198f02c 10 minutes ago 379 MB
kolla/centos-binary-openvswitch-base 2.0.4 0baee06b57a4 10 minutes ago 379 MB
kolla/centos-binary-kolla-toolbox 2.0.4 d3f9e86e1292 43 minutes ago 631 MB
kolla/centos-binary-base 2.0.4 4481fe643afa About an hour ago 344 MB
kolla/centos-binary-openvswitch-base 2.0.4 09ab40d1a684 6 hours ago 380 MB
kolla/centos-binary-ironic-inspector 2.0.4 0a692e9b679e 10 hours ago 641 MB
kolla/centos-binary-ironic-api 2.0.4 087a20a24a84 10 hours ago 635 MB
kolla/centos-binary-ironic-conductor 2.0.4 35450cd6b73b 10 hours ago 662 MB
kolla/centos-binary-nova-compute-ironic 2.0.4 d0a5a0a85ab7 10 hours ago 1.07 GB
kolla/centos-binary-ironic-pxe 2.0.4 d505622b6982 10 hours ago 639 MB
kolla/centos-binary-elasticsearch 2.0.4 63064a9d79d1 10 hours ago 692 MB
kolla/centos-binary-kibana 2.0.4 d52426e4d06f 10 hours ago 874 MB
kolla/centos-binary-ironic-base 2.0.4 09a4902a06be 10 hours ago 612 MB
kolla/centos-binary-nova-libvirt 2.0.4 81f523f3d656 11 hours ago 1.11 GB
kolla/centos-binary-nova-compute 2.0.4 5b70975fe56d 11 hours ago 1.11 GB
kolla/centos-binary-cinder-volume 2.0.4 acc66141a640 11 hours ago 859 MB
kolla/centos-binary-cinder-api 2.0.4 5ff44bfe4063 11 hours ago 850 MB
kolla/centos-binary-cinder-rpcbind 2.0.4 d241518b407c 11 hours ago 838 MB
kolla/centos-binary-cinder-backup 2.0.4 70e279b6adc7 11 hours ago 808 MB
kolla/centos-binary-cinder-scheduler 2.0.4 d4c8e5140be4 11 hours ago 808 MB
kolla/centos-binary-glance-api 2.0.4 92ed32bb6344 11 hours ago 732 MB
kolla/centos-binary-nova-conductor 2.0.4 c7f752689dfc 11 hours ago 671 MB
kolla/centos-binary-nova-consoleauth 2.0.4 5bb4f725b42d 11 hours ago 671 MB
kolla/centos-binary-nova-scheduler 2.0.4 00ddedda23c4 11 hours ago 671 MB
kolla/centos-binary-glance-registry 2.0.4 e59b0948281e 11 hours ago 732 MB
kolla/centos-binary-nova-ssh 2.0.4 01c404afdc8b 11 hours ago 672 MB
kolla/centos-binary-nova-api 2.0.4 d29b3451c045 11 hours ago 671 MB
kolla/centos-binary-nova-network 2.0.4 a5cead8aee0b 11 hours ago 672 MB
kolla/centos-binary-neutron-openvswitch-agent 2.0.4 36226e14b4d7 11 hours ago 670 MB
kolla/centos-binary-neutron-linuxbridge-agent 2.0.4 e9ad4cb7c6cc 11 hours ago 670 MB
kolla/centos-binary-nova-novncproxy 2.0.4 3689a51a5db8 11 hours ago 672 MB
kolla/centos-binary-nova-spicehtml5proxy 2.0.4 55fc9d8a62b5 11 hours ago 672 MB
kolla/centos-binary-neutron-metadata-agent 2.0.4 45f0f090cf38 11 hours ago 646 MB
kolla/centos-binary-cinder-base 2.0.4 dd0c5b78af7b 11 hours ago 808 MB
kolla/centos-binary-neutron-server 2.0.4 171b7bab73ab 11 hours ago 646 MB
kolla/centos-binary-heat-api 2.0.4 f334caf10d5a 11 hours ago 633 MB
kolla/centos-binary-horizon 2.0.4 88ceecbc8cf8 11 hours ago 763 MB
kolla/centos-binary-heat-engine 2.0.4 a53651463235 11 hours ago 633 MB
kolla/centos-binary-heat-api-cfn 2.0.4 7ec6cdd4b04b 11 hours ago 633 MB
kolla/centos-binary-neutron-l3-agent 2.0.4 d4744d180b09 11 hours ago 646 MB
kolla/centos-binary-neutron-dhcp-agent 2.0.4 d4744d180b09 11 hours ago 646 MB
kolla/centos-binary-glance-base 2.0.4 f78fa9de5c7b 11 hours ago 732 MB
kolla/centos-binary-nova-base 2.0.4 aa7ae5ae4818 11 hours ago 648 MB
kolla/centos-binary-neutron-base 2.0.4 ed5f4f60a6f4 11 hours ago 646 MB
kolla/centos-binary-heat-base 2.0.4 4798734eb0d4 11 hours ago 610 MB
kolla/centos-binary-keystone 2.0.4 55cf2686b33a 11 hours ago 644 MB
kolla/centos-binary-openstack-base 2.0.4 9d511be689b7 22 hours ago 572 MB
kolla/centos-binary-mariadb 2.0.4 cb4c65a6a637 22 hours ago 682 MB
kolla/centos-binary-openvswitch-vswitchd 2.0.4 0179076733aa 22 hours ago 380 MB
kolla/centos-binary-openvswitch-db-server 2.0.4 a9e0e1bd0968 22 hours ago 380 MB
kolla/centos-binary-rabbitmq 2.0.4 cbaeb0b64930 22 hours ago 438 MB
kolla/centos-binary-memcached 2.0.4 11aa506130a6 22 hours ago 403 MB
kolla/centos-binary-heka 2.0.4 89c723045d40 22 hours ago 420 MB
kolla/centos-binary-cron 2.0.4 4b0b36b058a1 22 hours ago 366 MB
kolla/centos-binary-keepalived 2.0.4 7fbc06505ddb 23 hours ago 411 MB
kolla/centos-binary-haproxy 2.0.4 7f65eba43909 23 hours ago 367 MB
centos latest 49f7960eb7e4 5 weeks ago 200 MB

Problem 4: the openvswitch_db container fails to start

Problem description

After rebuilding the images, deploy failed yet again with the following error:

TASK: [neutron | Waiting the openvswitch_db service to be ready] ************** 
failed: [localhost] => {"attempts": 30, "changed": false, "cmd": ["docker", "exec", "openvswitch_db", "ovs-vsctl", "--no-wait", "show"], "delta": "0:00:00.057827", "end": "2018-07-12 08:04:09.663118", "failed": true, "rc": 1, "start": "2018-07-12 08:04:09.605291", "stdout_lines": [], "warnings": []}
stderr: Error response from daemon: Container 03426314b560db08c762a8f9aebdb4423571a29ba1c22862e3415ac913289c21 is restarting, wait until the container is running
msg: Task failed as maximum retries was encountered

FATAL: all hosts have already failed -- aborting

The error matches problem 1; check the container log:

INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Copying service configuration files
INFO:__main__:Writing out command to execute
Running command: '/usr/sbin/ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/run/openvswitch/db.sock --log-file=/var/log/openvswitch/ovsdb-server.log'
ovsdb-server: I/O error: open: /etc/openvswitch/conf.db failed (No such file or directory)

Troubleshooting

The images used earlier were the official ones pulled straight from Docker Hub, tagged 2.0.2, whereas my kolla checkout is version 2.0.4. Comparing the openvswitch_db portion of the code between the two tags makes the problem obvious.

tag:2.0.2

kolla/docker/openvswitch/openvswitch-db-server/extend_start.sh

#!/bin/bash

mkdir -p "/run/openvswitch"
if [[ ! -e "/etc/openvswitch/conf.db" ]]; then
    ovsdb-tool create "/etc/openvswitch/conf.db"
fi

tag:mitaka-eol

kolla/docker/openvswitch/openvswitch-db-server/extend_start.sh

mkdir -p "/run/openvswitch"
if [[ ! -e "/var/lib/openvswitch/conf.db" ]]; then
    ovsdb-tool create "/var/lib/openvswitch/conf.db"
fi
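Reproducing the two script bodies as local files and diffing them makes the divergence explicit (on a kolla checkout, git diff 2.0.2 mitaka-eol -- docker/openvswitch/openvswitch-db-server/extend_start.sh would show the same thing):

```shell
#!/bin/bash
old=$(mktemp) new=$(mktemp)
# extend_start.sh as shipped in tag 2.0.2.
cat > "$old" <<'EOF'
mkdir -p "/run/openvswitch"
if [[ ! -e "/etc/openvswitch/conf.db" ]]; then
    ovsdb-tool create "/etc/openvswitch/conf.db"
fi
EOF
# extend_start.sh as shipped in tag mitaka-eol.
cat > "$new" <<'EOF'
mkdir -p "/run/openvswitch"
if [[ ! -e "/var/lib/openvswitch/conf.db" ]]; then
    ovsdb-tool create "/var/lib/openvswitch/conf.db"
fi
EOF
changed=$(diff "$old" "$new" || true)
echo "$changed"
rm -f "$old" "$new"
```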

The 2.0.2 image creates /etc/openvswitch/conf.db, while under 2.0.4 the startup command is:

{
    "command": "/usr/sbin/ovsdb-server /var/lib/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/run/openvswitch/db.sock --log-file=/var/log/openvswitch/ovsdb-server.log",
    "config_files": []
}

That combination is what caused problem 1. Now that the images are rebuilt, the container creates /var/lib/openvswitch/conf.db at startup, but I had changed the startup command, so /etc/openvswitch/conf.db cannot be found. The solution is to revert the change made for problem 1:

# vim /usr/share/kolla/ansible/roles/neutron/templates/openvswitch-db-server.json.j2

{
    "command": "/usr/sbin/ovsdb-server /var/lib/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/run/openvswitch/db.sock --log-file=/var/log/openvswitch/ovsdb-server.log",
    "config_files": []
}

Verification

After redeploying, the deployment completes successfully:

# kolla-ansible deploy

……
TASK: [manila | Creating Manila database] *************************************
skipping: [localhost]

TASK: [manila | Reading json from variable] ***********************************
skipping: [localhost]

TASK: [manila | Creating Manila database user and setting permissions] ********
skipping: [localhost]

TASK: [manila | Running Manila bootstrap container] ***************************
skipping: [localhost]

TASK: [manila | Starting manila-api container] ********************************
skipping: [localhost]

TASK: [manila | Starting manila-scheduler container] **************************
skipping: [localhost]

TASK: [manila | Starting manila-share container] ******************************
skipping: [localhost]

PLAY RECAP ********************************************************************
localhost : ok=311 changed=123 unreachable=0 failed=0

Problem 5: the nova_compute and nova_libvirt containers fail to start

Problem description

Following on from problem 4, the deploy succeeded, but checking the container status shows the two nova containers stuck restarting:

# docker ps

648c226f0980 kolla/centos-binary-nova-compute:2.0.4 "kolla_start" 3 days ago Restarting (0) 21 hours ago nova_compute
c492de413c81 kolla/centos-binary-nova-libvirt:2.0.4 "kolla_start" 3 days ago Restarting (6) 21 hours ago nova_libvirt

Check the logs of the two containers.
The nova_compute container:

# docker logs 648c226f0980
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Copying service configuration files
INFO:__main__:Removing existing destination: /etc/nova/nova.conf
INFO:__main__:Copying /var/lib/kolla/config_files/nova.conf to /etc/nova/nova.conf
INFO:__main__:Setting permissions for /etc/nova/nova.conf
INFO:__main__:Writing out command to execute
Running command: 'nova-compute'
/usr/lib/python2.7/site-packages/pkg_resources/__init__.py:187: RuntimeWarning: You have iterated over the result of pkg_resources.parse_version. This is a legacy behavior which is inconsistent with the new version class introduced in setuptools 8.0. In most cases, conversion to a tuple is unnecessary. For comparison of versions, sort the Version instances directly. If you have another use case requiring the tuple, please file a bug with the setuptools project describing that need.
stacklevel=1,
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/eventlet/queue.py", line 118, in switch
self.greenlet.switch(value)
File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
result = function(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 683, in run_service
raise SystemExit(1)
SystemExit: 1

The nova_libvirt container:

# docker logs c492de413c81

INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Copying service configuration files
INFO:__main__:Removing existing destination: /etc/libvirt/libvirtd.conf
INFO:__main__:Copying /var/lib/kolla/config_files/libvirtd.conf to /etc/libvirt/libvirtd.conf
INFO:__main__:Setting permissions for /etc/libvirt/libvirtd.conf
INFO:__main__:Removing existing destination: /etc/libvirt/qemu.conf
INFO:__main__:Copying /var/lib/kolla/config_files/qemu.conf to /etc/libvirt/qemu.conf
INFO:__main__:Setting permissions for /etc/libvirt/qemu.conf
INFO:__main__:Writing out command to execute
Running command: '/usr/sbin/libvirtd --listen'

Troubleshooting

1. Search the web

This turns out to be a kolla-ansible bug; for details see Fix nova-libvirt and nova-compute fails to deploy

2. Apply the fix described there (with --listen, libvirtd defaults to listen_tls=1 and exits when no TLS certificates are configured, so listen_tls must be set to 0 explicitly)

# vim /usr/share/kolla/ansible/roles/nova/templates/libvirtd.conf.j2

+listen_tls = 0
listen_tcp = 1
auth_tcp = "none"
ca_file = ""
log_level = 3
log_outputs = "3:file:/var/log/kolla/libvirt/libvirtd.log"
listen_addr = "{{ hostvars[inventory_hostname]['ansible_' + api_interface]['ipv4']['address'] }}"

Verification

After cleaning up the old containers and redeploying, nova_compute and nova_libvirt come up normally:

# /root/kolla/tools/cleanup-containers
# /root/kolla/tools/cleanup-host
# kolla-ansible deploy

# docker ps

......

cffc9fc2774b kolla/centos-binary-nova-compute:2.0.4 "kolla_start" 5 minutes ago Up 5 minutes nova_compute
5f27c8052238 kolla/centos-binary-nova-libvirt:2.0.4 "kolla_start" 5 minutes ago Up 5 minutes nova_libvirt

Problem 6: the dashboard cannot be reached

Problem description

With all of the above done, the dashboard turned out to be unreachable: the port is listening, but the browser gets a "504 Gateway Time-out".

Troubleshooting

1. Check the dashboard log

# docker exec -it heka bash
(heka)[heka@allinone /]$ tail -50f /var/log/kolla/horizon/horizon.log
[Mon Jul 16 05:25:13.065432 2018] [core:error] [pid 41] [client 172.16.15.246:59248] End of script output before headers: django.wsgi
[Mon Jul 16 05:31:23.408902 2018] [core:error] [pid 43] [client 172.16.15.227:57733] End of script output before headers: django.wsgi
[Mon Jul 16 05:31:33.443843 2018] [core:error] [pid 40] [client 172.16.15.246:36708] End of script output before headers: django.wsgi

2. The dashboard inaccessibility issue

I hit the same dashboard problem once before when installing the Mitaka release with packstack. For details see Openstack Mitaka: can not access dashboard(internal server 500)

3. Edit the dashboard config and add the following line (WSGIApplicationGroup %{GLOBAL} runs the application in the main Python interpreter, which avoids hangs caused by C extensions that are not safe in mod_wsgi sub-interpreters):

# vim /etc/kolla/horizon/horizon.conf

WSGIScriptReloading On
WSGIDaemonProcess horizon-http processes=5 threads=1 user=horizon group=horizon display-name=%{GROUP} python-path=/usr/lib/python2.7/site-packages
WSGIProcessGroup horizon-http
+ WSGIApplicationGroup %{GLOBAL}

To keep the fix in place after a cleanup, the following also needs to be modified:

# vim /usr/share/kolla/ansible/roles/horizon/templates/horizon.conf.j2

WSGIScriptReloading On
WSGIDaemonProcess horizon-http processes=5 threads=1 user=horizon group=horizon display-name=%{GROUP} python-path={{ python_path }}
WSGIProcessGroup horizon-http
+ WSGIApplicationGroup %{GLOBAL}

Verification

Restart the container and check that the dashboard opens:

# docker restart horizon

Problem 7: dashboard errors after login

Problem description

The dashboard now opens normally, but after logging in the following error appears.

Troubleshooting

1. Check the dashboard log

# docker exec -it heka bash

(heka)[heka@allinone /]$ tail -50f /var/log/kolla/horizon/horizon.log
……
[Mon Jul 16 09:29:07.368340 2018] [:error] [pid 19] File "/usr/lib/python2.7/site-packages/django/template/base.py", line 905, in render
[Mon Jul 16 09:29:07.368353 2018] [:error] [pid 19] bit = self.render_node(node, context)
[Mon Jul 16 09:29:07.368373 2018] [:error] [pid 19] File "/usr/lib/python2.7/site-packages/django/template/base.py", line 919, in render_node
[Mon Jul 16 09:29:07.368387 2018] [:error] [pid 19] return node.render(context)
[Mon Jul 16 09:29:07.368399 2018] [:error] [pid 19] File "/usr/lib/python2.7/site-packages/django/templatetags/i18n.py", line 145, in render
[Mon Jul 16 09:29:07.368412 2018] [:error] [pid 19] result = translation.ungettext(singular, plural, count)
[Mon Jul 16 09:29:07.368425 2018] [:error] [pid 19] File "/usr/lib/python2.7/site-packages/django/utils/translation/__init__.py", line 88, in ungettext
[Mon Jul 16 09:29:07.368438 2018] [:error] [pid 19] return _trans.ungettext(singular, plural, number)
[Mon Jul 16 09:29:07.368451 2018] [:error] [pid 19] File "/usr/lib/python2.7/site-packages/django/utils/translation/trans_real.py", line 381, in ungettext
[Mon Jul 16 09:29:07.368464 2018] [:error] [pid 19] return do_ntranslate(singular, plural, number, 'ungettext')
[Mon Jul 16 09:29:07.368506 2018] [:error] [pid 19] File "/usr/lib/python2.7/site-packages/django/utils/translation/trans_real.py", line 358, in do_ntranslate
[Mon Jul 16 09:29:07.368550 2018] [:error] [pid 19] return getattr(t, translation_function)(singular, plural, number)
[Mon Jul 16 09:29:07.368571 2018] [:error] [pid 19] File "/usr/lib64/python2.7/gettext.py", line 411, in ungettext
[Mon Jul 16 09:29:07.368585 2018] [:error] [pid 19] tmsg = self._catalog[(msgid1, self.plural(n))]
[Mon Jul 16 09:29:07.368597 2018] [:error] [pid 19] AttributeError: DjangoTranslation instance has no attribute 'plural'

2. Check the dashboard's locale directories

The Chinese (and other) locale directories turn out to be empty; comparing against a healthy environment shows each should contain an LC_MESSAGES directory holding the compiled translation catalogs.

# docker exec -it horizon bash

(horizon)[root@allinone ~]# ls /usr/lib/python2.7/site-packages/openstack_dashboard/locale/zh_CN
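Locale directories missing their LC_MESSAGES subdirectory can be listed with a short loop. The sketch below runs against a mock locale tree so it can execute anywhere; inside the container you would point it at /usr/lib/python2.7/site-packages/openstack_dashboard/locale instead:

```shell
#!/bin/bash
# Mock locale tree: en is healthy, zh_CN and ja are empty.
locale_root=$(mktemp -d)
mkdir -p "$locale_root/en/LC_MESSAGES" "$locale_root/zh_CN" "$locale_root/ja"

# Collect every locale with no LC_MESSAGES directory.
broken=""
for d in "$locale_root"/*/; do
    [ -d "${d}LC_MESSAGES" ] || broken="$broken $(basename "$d")"
done
echo "missing LC_MESSAGES:$broken"
rm -rf "$locale_root"
```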

3. Start a container to replay how the horizon image is built

Create a container from the kolla/centos-binary-openstack-base:2.0.4 image:

# docker run -it kolla/centos-binary-openstack-base:2.0.4 /bin/bash

Inside it, run the install steps that produce the horizon image:

# vim /usr/share/kolla/docker/horizon/Dockerfile.j2

RUN yum -y install \
openstack-dashboard \
httpd \
mod_wsgi \
gettext \
&& yum clean all \
&& useradd --user-group horizon \
&& sed -i -r 's,^(Listen 80),#\1,' /etc/httpd/conf/httpd.conf \
&& ln -s /usr/share/openstack-dashboard/openstack_dashboard /usr/lib/python2.7/site-packages/openstack_dashboard \
&& ln -s /usr/share/openstack-dashboard/static /usr/lib/python2.7/site-packages/static \
&& chown -R horizon: /etc/openstack-dashboard /usr/share/openstack-dashboard \
&& chown -R apache: /usr/share/openstack-dashboard/static
……

It turns out that after the very first step, installing openstack-dashboard with yum, no LC_MESSAGES content is generated under /usr/share/openstack-dashboard/openstack_dashboard/locale.
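A plausible cause, and this is my own assumption rather than anything verified in the post: the CentOS base image ships a /etc/yum.conf containing override_install_langs=en_US.utf8, which makes yum strip non-English translation files at install time, while a direct rpm -ivh bypasses that filter. Whether the knob is set can be checked like this (a sample yum.conf is written locally so the sketch is self-contained):

```shell
#!/bin/bash
# Sample of what the CentOS docker image puts in /etc/yum.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[main]
override_install_langs=en_US.utf8
tsflags=nodocs
EOF

# Inside the container, run this grep against the real /etc/yum.conf.
langs=$(grep '^override_install_langs=' "$conf" || true)
if [ -n "$langs" ]; then
    echo "yum will strip locales: $langs"
fi
rm -f "$conf"
```

If that is indeed the cause, removing the line would be another way to keep the locale files.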

4. Install with the rpm command instead

Installing directly with rpm does generate the corresponding localization files:

()[root@e06f6d94adba ~]# yum remove openstack-dashboard -y
()[root@e06f6d94adba ~]# rpm -ivh http://mirror.neu.edu.cn/centos/7/cloud/x86_64/openstack-mitaka/openstack-dashboard-9.0.1-1.el7.noarch.rpm
()[root@e06f6d94adba ~]# ls /usr/share/openstack-dashboard/openstack_dashboard/locale/zh_CN/
LC_MESSAGES

5. Modify the horizon Dockerfile

First install openstack-dashboard with yum to resolve its dependencies, then remove it and reinstall openstack-dashboard with rpm:

# vim /usr/share/kolla/docker/horizon/Dockerfile.j2

RUN yum -y install \
openstack-dashboard \
httpd \
mod_wsgi \
gettext \
+ && rpm -e openstack-dashboard && rm -rf /etc/openstack-dashboard/local_settings.rpmsave \
+ && rpm -ivh http://mirror.neu.edu.cn/centos/7/cloud/x86_64/openstack-mitaka/openstack-dashboard-9.0.1-1.el7.noarch.rpm \
&& yum clean all \

6. Rebuild the horizon image

# kolla-build --base centos -t binary horizon

Verification

After redeploying, the dashboard opens normally and displays correctly after login.
