Quickly deploying Ceph with Docker

System environment

  • At least three virtual machines or physical machines are required; VMs are used here
  • Each VM needs at least two disks (one system disk and one for the OSDs); in this example there are three disks per VM
  1. Deployment workflow (the blog's markdown parser does not support flowcharts, so an image is used instead)
    (flowchart image: liuchengtu.png)

  2. Host plan
    (table image: biaoge.png)

Install Docker

Log in to https://cr.console.aliyun.com/#/accelerator to get your own Aliyun Docker registry mirror (accelerator) address.

  1. Install or upgrade the Docker engine
# curl -sSL http://acs-public-mirror.oss-cn-hangzhou.aliyuncs.com/docker-engine/internet | sh -
  2. Configure the registry mirror
    Enable the accelerator by editing the daemon configuration file /etc/docker/daemon.json; be sure to replace the mirror address with your own.
# mkdir -p /etc/docker
# tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://******.mirror.aliyuncs.com"]
}
EOF
# systemctl daemon-reload
# systemctl restart docker
# systemctl enable docker
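
To confirm the accelerator is actually being used, the configured mirror should show up in the daemon information (a quick check; the exact output layout depends on the Docker version):

# docker info | grep -A 1 'Registry Mirrors'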

Start the MONs

  1. Pull the ceph/daemon image
# docker pull ceph/daemon
  2. Start the first mon
    Start the first mon on node01; remember to set MON_IP accordingly.
# docker run -d \
        --net=host \
        -v /etc/ceph:/etc/ceph \
        -v /var/lib/ceph/:/var/lib/ceph/ \
        -e MON_IP=192.168.3.123 \
        -e CEPH_PUBLIC_NETWORK=192.168.3.0/24 \
        ceph/daemon mon

Check the container

# docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED              STATUS              PORTS               NAMES
b79a02c40296        ceph/daemon         "/entrypoint.sh mon"   About a minute ago   Up About a minute                       sad_shannon

Check the cluster status

# docker exec b79a02 ceph -s
    cluster 96ae62d2-2249-4173-9dee-3a7215cba51c
     health HEALTH_ERR
            no osds
     monmap e2: 1 mons at {node01=192.168.3.123:6789/0}
            election epoch 4, quorum 0 node01
        mgr no daemons active 
     osdmap e1: 0 osds: 0 up, 0 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v2: 64 pgs, 1 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                  64 creating
  3. Copy the configuration files
    Copy the configuration from node01 to node02 and node03; this includes everything under /etc/ceph and /var/lib/ceph/bootstrap-*.
# ssh root@node2 mkdir -p /var/lib/ceph
# scp -r /etc/ceph root@node2:/etc
# scp -r /var/lib/ceph/bootstrap* root@node2:/var/lib/ceph

# ssh root@node3 mkdir -p /var/lib/ceph
# scp -r /etc/ceph root@node3:/etc
# scp -r /var/lib/ceph/bootstrap* root@node3:/var/lib/ceph
  4. Start the second and third mon
    Run the following on node02 to start a mon; remember to change MON_IP.
# docker run -d \
        --net=host \
        -v /etc/ceph:/etc/ceph \
        -v /var/lib/ceph/:/var/lib/ceph/ \
        -e MON_IP=192.168.3.124 \
        -e CEPH_PUBLIC_NETWORK=192.168.3.0/24 \
        ceph/daemon mon

Run the following on node03 to start a mon; again, change MON_IP.

# docker run -d \
        --net=host \
        -v /etc/ceph:/etc/ceph \
        -v /var/lib/ceph/:/var/lib/ceph/ \
        -e MON_IP=192.168.3.125 \
        -e CEPH_PUBLIC_NETWORK=192.168.3.0/24 \
        ceph/daemon mon

Check the cluster status from node01:

# docker exec b79a02 ceph -s
    cluster 96ae62d2-2249-4173-9dee-3a7215cba51c
     health HEALTH_ERR
            64 pgs are stuck inactive for more than 300 seconds
            64 pgs stuck inactive
            64 pgs stuck unclean
            no osds
     monmap e4: 3 mons at {node01=192.168.3.123:6789/0,node02=192.168.3.124:6789/0,node03=192.168.3.125:6789/0}
            election epoch 12, quorum 0,1,2 node01,node02,node03
        mgr no daemons active 
     osdmap e1: 0 osds: 0 up, 0 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v2: 64 pgs, 1 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                  64 creating

All three mons are now up and running correctly.

Start the OSDs

Each VM has two disks prepared as OSDs; add them to the cluster one by one, changing OSD_DEVICE to match the disk. On node01:

# docker run -d \
        --net=host \
        -v /etc/ceph:/etc/ceph \
        -v /var/lib/ceph/:/var/lib/ceph/ \
        -v /dev/:/dev/ \
        --privileged=true \
        -e OSD_FORCE_ZAP=1 \
        -e OSD_DEVICE=/dev/sdb \
        ceph/daemon osd_ceph_disk
# docker run -d \
        --net=host \
        -v /etc/ceph:/etc/ceph \
        -v /var/lib/ceph/:/var/lib/ceph/ \
        -v /dev/:/dev/ \
        --privileged=true \
        -e OSD_FORCE_ZAP=1 \
        -e OSD_DEVICE=/dev/sdc \
        ceph/daemon osd_ceph_disk

Add sdb and sdc on node02 and node03 to the cluster in the same way; a loop sketch is shown below.
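
For reference, the remaining four OSDs can be created from node01 with a loop like the following (just a sketch: it assumes root SSH access from node01 to node02 and node03, and that both nodes use /dev/sdb and /dev/sdc for OSDs):

# for node in node02 node03; do
    for disk in /dev/sdb /dev/sdc; do
      ssh root@$node docker run -d \
          --net=host \
          -v /etc/ceph:/etc/ceph \
          -v /var/lib/ceph/:/var/lib/ceph/ \
          -v /dev/:/dev/ \
          --privileged=true \
          -e OSD_FORCE_ZAP=1 \
          -e OSD_DEVICE=$disk \
          ceph/daemon osd_ceph_disk
    done
  done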

Check the cluster status

# docker exec b79a ceph -s
    cluster 96ae62d2-2249-4173-9dee-3a7215cba51c
     health HEALTH_OK
     monmap e4: 3 mons at {node01=192.168.3.123:6789/0,node02=192.168.3.124:6789/0,node03=192.168.3.125:6789/0}
            election epoch 12, quorum 0,1,2 node01,node02,node03
        mgr no daemons active 
     osdmap e63: 6 osds: 6 up, 6 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v157: 64 pgs, 1 pools, 0 bytes data, 0 objects
            212 MB used, 598 GB / 599 GB avail
                  64 active+clean

The mons and OSDs are now configured correctly, and the cluster status is HEALTH_OK.
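
To see how the OSDs are distributed across the three hosts, the CRUSH tree can also be checked from the mon container (a quick check):

# docker exec b79a ceph osd tree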

Create the MDS

Start an mds on node01 with the following command:

# docker run -d \
        --net=host \
        -v /etc/ceph:/etc/ceph \
        -v /var/lib/ceph/:/var/lib/ceph/ \
        -e CEPHFS_CREATE=1 \
        ceph/daemon mds
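
Once the mds container is up, the newly created filesystem and the active mds can be verified from the mon container (a quick check; b79a02 is the mon container from earlier):

# docker exec b79a02 ceph mds stat
# docker exec b79a02 ceph fs ls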

Start the RGW and map port 80

Start the rgw on node01 and bind it to port 80:

# docker run -d \
        -p 80:80 \
        -v /etc/ceph:/etc/ceph \
        -v /var/lib/ceph/:/var/lib/ceph/ \
        ceph/daemon rgw
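
A simple way to confirm the gateway is answering (assuming node01 resolves to 192.168.3.123): an anonymous request to port 80 should come back with an S3-style XML listing.

# curl http://node01/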

Final cluster status

# docker exec b79a02 ceph -s
    cluster 96ae62d2-2249-4173-9dee-3a7215cba51c
     health HEALTH_OK
     monmap e4: 3 mons at {node01=192.168.3.123:6789/0,node02=192.168.3.124:6789/0,node03=192.168.3.125:6789/0}
            election epoch 12, quorum 0,1,2 node01,node02,node03
      fsmap e5: 1/1/1 up {0=mds-node01=up:active}
        mgr no daemons active 
     osdmap e136: 6 osds: 6 up, 6 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v1460: 136 pgs, 10 pools, 3829 bytes data, 223 objects
            254 MB used, 598 GB / 599 GB avail
                 136 active+clean

References:
使用Docker部署Ceph
Demo: running Ceph in Docker containers

Enabling developer options and the OpenStack Profiler in the devstack dashboard

The Ocata release introduced a new "OpenStack Profiler" panel. With the profiler enabled it is easy to see which API calls are made when a Horizon page is loaded, as shown below:
(screenshot: OpenStack Profiler panel)
This post describes how to enable the OpenStack Profiler. It assumes an already working devstack environment; the steps are as follows.

Install MongoDB

Horizon stores the data for profiled API calls in MongoDB. MongoDB can be installed on the local machine or on any host reachable from it.

  1. Install the packages

    # yum install mongodb-server mongodb -y
  2. Edit /etc/mongod.conf and make the following changes:
    • Set bind_ip to the local IP address, or to 0.0.0.0.
      bind_ip = 192.168.3.222
    • By default, MongoDB creates several 1 GB journal files under /var/lib/mongodb/journal. To shrink each journal file to 128 MB and limit the total journal space to 512 MB, set smallfiles:
      smallfiles = true
  3. Start MongoDB and enable it at boot (a quick connectivity check follows this list)
    # systemctl enable mongod.service
    # systemctl start mongod.service
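
A quick way to confirm that MongoDB is reachable from the Horizon host (a sketch; replace the IP with your own mongod address):

# mongo --host 192.168.3.222 --eval 'db.runCommand({ ping: 1 })'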

Configure Horizon

  1. Copy the files
    $ cd /opt/stack/horizon
    $ cp openstack_dashboard/contrib/developer/enabled/_9001_developer.py openstack_dashboard/local/enabled/
    $ cp openstack_dashboard/contrib/developer/enabled/_9030_profiler.py openstack_dashboard/local/enabled/
    $ cp openstack_dashboard/contrib/developer/enabled/_9010_preview.py openstack_dashboard/local/enabled/
    $ cp openstack_dashboard/local/local_settings.d/_9030_profiler_settings.py.example openstack_dashboard/local/local_settings.d/_9030_profiler_settings.py
  2. Edit _9030_profiler_settings.py and update the MongoDB-related settings
    Point the notifier and receiver connection strings at the host where MongoDB is running.

    $ vim openstack_dashboard/local/local_settings.d/_9030_profiler_settings.py
    
    OPENSTACK_PROFILER.update({
      'enabled': True,
      'keys': ['SECRET_KEY'],
      'notifier_connection_string': 'mongodb://192.168.3.222:27017',
      'receiver_connection_string': 'mongodb://192.168.3.222:27017'
    })
  3. Restart Horizon (see the restart sketch after this list) and log in to the dashboard again. A Profile drop-down menu now appears in the top-right corner:
    (screenshot: Profile drop-down menu)
    To capture the API calls for the current page, click Profile Current Page; the page reloads, and once it has finished loading, the Developer > OpenStack Profiler page shows the detailed data collected during the page load.
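
How Horizon is restarted depends on how devstack deployed it; on a CentOS-based devstack where Horizon is served by Apache, the restart might simply be (a sketch, not part of the original steps):

# systemctl restart httpd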

References:
孔令贤-OpenStack Horizon Profiling
OpenStack Installation Guide for Red Hat Enterprise Linux and CentOS

SSH passwordless access problem

Passwordless SSH login fails

Resizing instances requires passwordless SSH for the nova user between compute nodes, but during setup one host consistently refused key-based login. Below is a comparison of the ssh debug output from a host where passwordless login works and from the failing one.

# Successful login
debug2: we did not send a packet, disable method
debug3: authmethod_lookup publickey
debug3: remaining preferred: keyboard-interactive,password
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /var/lib/nova/.ssh/id_rsa
debug3: send_pubkey_test
debug2: we sent a publickey packet, wait for reply
debug1: Server accepts key: pkalg ssh-rsa blen 279
# Failing login
debug2: we did not send a packet, disable method
debug3: authmethod_lookup publickey
debug3: remaining preferred: keyboard-interactive,password
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /var/lib/nova/.ssh/id_rsa
debug3: send_pubkey_test
debug2: we sent a publickey packet, wait for reply
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password
debug1: Trying private key: /var/lib/nova/.ssh/id_dsa
debug3: no such identity: /var/lib/nova/.ssh/id_dsa: No such file or directory
debug1: Trying private key: /var/lib/nova/.ssh/id_ecdsa
debug3: no such identity: /var/lib/nova/.ssh/id_ecdsa: No such file or directory
debug1: Trying private key: /var/lib/nova/.ssh/id_ed25519
debug3: no such identity: /var/lib/nova/.ssh/id_ed25519: No such file or directory
debug2: we did not send a packet, disable method
debug3: authmethod_lookup password
debug3: remaining preferred: ,password
debug3: authmethod_is_enabled password
debug1: Next authentication method: password

Analysis

  1. I found a similar report, CentOS SSH公钥登录问题, where the cause turned out to be SELinux. SELinux was already disabled on my host, so that did not apply here.

  2. Checking the logs with journalctl _COMM=sshd revealed the following permission problem:
May 10 17:11:11 compute01 sshd[26498]: pam_systemd(sshd:session): Failed to release session: Interrupted system call
May 10 17:11:11 compute01 sshd[26498]: pam_unix(sshd:session): session closed for user root
May 10 17:12:28 compute01 sshd[2297]: Authentication refused: bad ownership or modes for directory /var/lib/nova
May 10 17:13:09 compute01 sshd[2297]: Connection closed by 192.168.101.105 [preauth]
May 10 17:13:33 compute01 sshd[4103]: Authentication refused: bad ownership or modes for directory /var/lib/nova
May 10 17:25:21 compute01 sshd[23157]: Authentication refused: bad ownership or modes for directory /var/lib/nova
May 10 17:25:25 compute01 sshd[23157]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=compute02  user=nova
  3. Compare the /var/lib/nova permissions with a working host
Working host
drwxr-xr-x   8 nova    nova     118 May 10 16:59 nova
Problem host
drwxrwxrwx. 11 nova           nova            4096 May 10 17:07 nova
  4. Solution
    After changing the permissions of /var/lib/nova to 755, passwordless login works again; sshd's StrictModes check (on by default) refuses public-key authentication when the user's home directory is group- or world-writable, which is exactly what the 777 mode above triggered. A quick way to inspect the relevant modes is shown after this list.
# chmod -R 755 /var/lib/nova/
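
To check at a glance whether the ownership and modes meet sshd's expectations, something like the following can be used (a sketch; the paths are the ones involved here):

# stat -c '%a %U:%G %n' /var/lib/nova /var/lib/nova/.ssh /var/lib/nova/.ssh/authorized_keys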

Console inaccessible when OpenStack runs in HA mode

The console cannot be opened, or only loads after several refreshes, and nova logs the following errors:

2017-02-09 17:09:51.311 57467 INFO nova.console.websocketproxy [-] 192.168.170.41 - - [09/Feb/2017 17:09:51] "GET /websockify HTTP/1.1" 101 -
2017-02-09 17:09:51.312 57467 INFO nova.console.websocketproxy [-] 192.168.170.41 - - [09/Feb/2017 17:09:51] 192.168.170.41: Plain non-SSL (ws://) WebSocket connection
2017-02-09 17:09:51.313 57467 INFO nova.console.websocketproxy [-] 192.168.170.41 - - [09/Feb/2017 17:09:51] 192.168.170.41: Version hybi-13, base64: 'False'
2017-02-09 17:09:51.313 57467 INFO nova.console.websocketproxy [-] 192.168.170.41 - - [09/Feb/2017 17:09:51] 192.168.170.41: Path: '/websockify'
2017-02-09 17:09:51.382 57467 INFO nova.console.websocketproxy [req-f51929d9-8c9b-4df0-abeb-247ce6ef5d65 - - - - -] handler exception: The token '1dfc9af9-8a49-44b3-a955-5196197bc8f7' is invalid or has expired

Root cause

When running a multi-node environment with HA between two or more controller nodes (or control-plane service nodes), the nova consoleauth service must be configured with memcached.
If not, no more than one consoleauth service can be running in the active state, since it needs to save the state of the sessions. When memcached is not used, you will notice that refreshing the page only connects to the VNC console some of the time: it works only when the request happens to be handled by the consoleauth service that issued the token.
To solve the issue, configure memcached as the backend for the nova-consoleauth service by adding this line to nova.conf:
memcached_servers = 192.168.100.2:11211,192.168.100.3:11211

Solution

On Mitaka, add the memcached_servers option:

# vim /etc/nova/nova.conf

[DEFAULT]
# "memcached_servers" opt is deprecated in Mitaka. In Newton release oslo.cache
# config options should be used as this option will be removed. Please add a
# [cache] group in your nova.conf file and add "enable" and "memcache_servers"
# option in this section. (list value)
memcached_servers=controller01:11211,controller02:11211,controller03:11211

On Newton, memcached_servers has been deprecated; configure the [cache] section instead:

[cache]
enabled=true
backend=oslo_cache.memcache_pool
memcache_servers=controller01:11211,controller02:11211,controller03:11211
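
For the new cache backend to take effect, the console-related nova services need to be restarted on each controller; on an RDO/CentOS style deployment that might look like this (a sketch; service names differ between distributions):

# systemctl restart openstack-nova-consoleauth openstack-nova-novncproxy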

Expanding a partition in VMware

During testing, the sdb disk added to a VMware VM was running out of free space and needed to be enlarged. The procedure is recorded below.

  1. Grow the disk in VMware
    In the VM settings page in VMware, increase the disk size (500 GB -> 800 GB).
    (screenshot: VMware disk settings)

  2. Reboot the VM
    After the reboot, check that the new disk size is recognized; parted now reports the larger disk:

    # parted /dev/sdb
    GNU Parted 3.1
    Using /dev/sdb
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) p
    Model: VMware Virtual disk (scsi)
    Disk /dev/sdb: 859GB
    Sector size (logical/physical): 512B/512B
    Partition Table: msdos
    Disk Flags: 
    
    Number  Start   End    Size   Type     File system  Flags
    1      1049kB  537GB  537GB  primary  xfs
    
    (parted) quit
  3. Re-partition the disk
    Delete the existing partition and recreate it so that it spans the whole disk:

    # fdisk /dev/sdb
    Welcome to fdisk (util-linux 2.23.2).
    
    Changes will remain in memory only, until you decide to write them.
    Be careful before using the write command.
    
    Command (m for help): p
    
    Disk /dev/sdb: 859.0 GB, 858993459200 bytes, 1677721600 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk label type: dos
    Disk identifier: 0x634e8675
    
    Device Boot      Start         End      Blocks   Id  System
    /dev/sdb1            2048  1048575999   524286976   83  Linux
    
    Command (m for help): d                     # delete the existing partition
    Selected partition 1
    Partition 1 is deleted
    
    Command (m for help): n                     # create a new partition
    Partition type:
     p   primary (0 primary, 0 extended, 4 free)
     e   extended
    Select (default p): 
    Using default response p
    Partition number (1-4, default 1): 
    First sector (2048-1677721599, default 2048): 
    Using default value 2048
    Last sector, +sectors or +size{K,M,G} (2048-1677721599, default 1677721599): 
    Using default value 1677721599
    Partition 1 of type Linux and of size 800 GiB is set
    
    Command (m for help): w
    The partition table has been altered!
    
    Calling ioctl() to re-read partition table.
    
    WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
    The kernel still uses the old table. The new table will be used at
    the next reboot or after you run partprobe(8) or kpartx(8)
    Syncing disks.
  4. Reboot the system (or run partprobe /dev/sdb) so that the new partition table is re-read

  5. Grow the filesystem
    Mount the partition first, then grow the XFS filesystem with xfs_growfs:

    # mount /dev/sdb1 /opt/yum/sample
    # xfs_growfs /dev/sdb1
    meta-data=/dev/sdb1              isize=512    agcount=4, agsize=32767936 blks
           =                       sectsz=512   attr=2, projid32bit=1
           =                       crc=1        finobt=0 spinodes=0
    data     =                       bsize=4096   blocks=131071744, imaxpct=25
           =                       sunit=0      swidth=0 blks
    naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
    log      =internal               bsize=4096   blocks=63999, version=2
           =                       sectsz=512   sunit=0 blks, lazy-count=1
    realtime =none                   extsz=4096   blocks=0, rtextents=0
    data blocks changed from 131071744 to 209714944
    
    # if the filesystem is ext4, use resize2fs instead
  6. Verify
    The partition is now 800 GB and the original files are still there:

    # df -h
    Filesystem               Size  Used Avail Use% Mounted on
    ……
    /dev/sdb1                800G  433G  368G  55% /opt/yum/sample