Cephadm on Ubuntu 22.04

Intro

Mission: deploy a new production-ready Ceph cluster on 4 new hardware nodes.

Overview

I had zero experience with cephadm, the current orchestrator for Ceph. But time flies and ceph-ansible is heading further into deprecation, so let the old dog learn new tricks.

Requirements

Nothing new here:

  • JBOD for osds
  • raid1 for 2 OS disks (mdraid is fine)
  • minimum 3 nodes (and probably no more than about 15, to keep a possible rebalance manageable)
  • at least 2 network cards with 2×10 Gbps ports (better 25 Gbps) on each node
  • LACP configured on the network switches
  • at least 2 vCPUs and 4 GB of RAM per OSD on each OSD node
  • a proxy Docker registry for quay.io, accessible from all nodes
  • 2 separate networks – ceph-internal / ceph-cluster (with jumbo frames, MTU 9000)
  • accessible corporate ntp/dns servers, IPMI access if something goes wrong
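
As a rough illustration of the CPU and RAM rule in the list above, here is a minimal sketch that turns an OSD count into per-node minimums. It assumes the rule means 2 vCPUs and 4 GB of RAM per OSD; the function name osd_node_minimums is made up for this example.

```shell
# Hypothetical helper: minimum vCPU and RAM for a node hosting N OSDs,
# assuming 2 vCPUs and 4 GB of RAM per OSD.
osd_node_minimums() {
    osds=$1
    echo "vcpu=$((osds * 2)) ram_gb=$((osds * 4))"
}
```

For a node with 12 OSDs, osd_node_minimums 12 gives 24 vCPUs and 48 GB of RAM, before counting anything else that runs on the host (mons, mgrs, the OS itself).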

Preparation

Step 1.

  • provision all hosts, install Ubuntu 22.04
  • configure hostnames, /etc/hosts localhost records
  • configure chronyd and systemd-resolved. Here we point resolv.conf at the file that follows changes from netplan:
    sudo rm -f /etc/resolv.conf
    sudo ln -s /run/resolvconf/resolv.conf /etc/resolv.conf
  • configure netplan. Pay attention to dhcp-dns, mtu and other configuration options:
    network:
      bonds:
        bond0:
          interfaces:
          - enp180s0f0np0
          - enp180s0f1np1
          parameters:
            lacp-rate: fast
            mode: 802.3ad
            transmit-hash-policy: layer2+3
        bond1:
          interfaces:
          - enp179s0f0np0
          - enp179s0f1np1
          parameters:
            lacp-rate: fast
            mode: 802.3ad
            transmit-hash-policy: layer2+3
      ethernets:
        enp179s0f0np0:
          dhcp4: true
          mtu: 9000
        enp179s0f1np1:
          dhcp4: true
          mtu: 9000
        enp180s0f0np0:
          dhcp4: true
          dhcp4-overrides:
            use-dns: false
        enp180s0f1np1:
          dhcp4: true
          dhcp4-overrides:
            use-dns: false
      version: 2
      vlans:
        bond0.10:
          id: 10
          link: bond0
          addresses:
          - 10.10.10.10/24
          dhcp4-overrides:
            use-dns: false
          nameservers:
            addresses:
            - 10.10.10.200
            - 10.10.10.220
            search:
            - "mycompany.cloud"
          routes:
          - to: default
            via: 10.10.10.1
        bond1.20:
          id: 20
          link: bond1
          addresses:
          - 10.10.20.11/24
          mtu: 9000
  • apply the netplan configuration and verify that everything is fine:
    netplan apply
    ip -4 a; ip l
    timedatectl status
    dig ceph-02.mycompany.cloud
    OR
    nslookup ceph-02
  • generate an SSH key pair (id_rsa) on the first host and copy the public key to all the others:
    ssh ceph-01
    ssh-keygen
    echo "ssh-rsa ....." >> ~/.ssh/authorized_keys
    ssh ceph-02 'echo "ssh-rsa ....." >> ~/.ssh/authorized_keys'
    ssh ceph-03 'echo "ssh-rsa ....." >> ~/.ssh/authorized_keys'
  • configure the correct apt repositories on all nodes:
    vi /etc/apt/sources.list
    apt update
  • configure the correct Docker apt repository (add the Docker GPG key even if you use a proxy repo):
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
    sudo chmod a+r /etc/apt/keyrings/docker.gpg
  • install docker.io and other dependencies:
    apt install -y docker.io lvm2 python3
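
Since the cluster network relies on jumbo frames, it can be worth checking that MTU 9000 really works end to end before bootstrapping. A minimal sketch, assuming Linux iputils ping and IPv4: the payload is the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header. The function names and the peer address in the note below are made up for illustration.

```shell
# Largest ICMP payload that fits in one frame:
# MTU minus 20 bytes (IPv4 header) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Print (not run) a ping command that forbids fragmentation, so it only
# succeeds if every hop on the path carries the full MTU.
jumbo_ping_cmd() {
    peer=$1
    mtu=${2:-9000}
    echo "ping -M do -c 3 -s $(icmp_payload_for_mtu "$mtu") $peer"
}
```

For example, jumbo_ping_cmd 10.10.20.12 prints a ping with an 8972-byte payload toward a (hypothetical) neighbour on the cluster network. If that ping fails while a plain ping works, some switch port or bond on the path is probably still at MTU 1500.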

Bootstrapping cluster

We are going to bootstrap a new cluster starting from the first node. This node will be our “admin” node. So on the first ceph node do:
apt install -y cephadm ceph-common
cephadm bootstrap --mon-ip 10.10.10.50 --log-to-file --registry-url 10.10.10.70:5002 --registry-username docker --registry-password password --cluster-network 10.10.20.0/24
Wait for some time and check the status of the cluster:
ceph -s
ceph orch host ls
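
Instead of guessing how long "some time" is, you can poll. Below is a minimal sketch of a generic retry loop plus a check built on ceph health; wait_until and cluster_responding are names I made up, and treating HEALTH_WARN as good enough right after bootstrap is my assumption.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
#   $1 = number of attempts, $2 = seconds between attempts, rest = command
wait_until() {
    attempts=$1; delay=$2; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then return 0; fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical check: the cluster answers and reports HEALTH_OK or HEALTH_WARN.
cluster_responding() {
    ceph health 2>/dev/null | grep -Eq '^HEALTH_(OK|WARN)'
}
```

On the admin node, wait_until 30 10 cluster_responding would then try for about five minutes before giving up.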

Adding new hosts

First of all, let’s make the mons unmanaged. I prefer to know on which nodes my monitors are located:
ssh ceph-01
ceph orch apply mon --unmanaged
Next, let’s add the other hosts.
First, copy the SSH key of our admin node to them:
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-01
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-02
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-03
Next, add the new hosts via the orchestrator. I recommend waiting a little before proceeding to the next host if you use a single proxy registry. Note: it’s crucial to use the exact existing hostnames of the new nodes here, so they may be in uppercase, contain special charact$rs, etc.:
ceph orch host add ceph-02 10.10.10.51
ceph orch host add ceph-03 10.10.10.52
And now add the monitors:
ceph orch daemon add mon ceph-02:10.10.10.51
ceph orch daemon add mon ceph-03:10.10.10.52
Check the status:
ceph -s
ceph orch host ls
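
With more than a couple of nodes, typing these commands per host gets tedious. Here is a minimal sketch that reads "hostname ip" pairs and prints the key copy and host add commands, with a pause between hosts to go easy on a single proxy registry. The function name and the 60-second default are arbitrary choices of mine; nothing is executed until you pipe the output to sh yourself.

```shell
# Read "hostname ip" pairs from stdin and print the commands to run on the
# admin node. Nothing is executed here; review the output first.
gen_host_add_cmds() {
    pause=${1:-60}   # seconds to wait between hosts (arbitrary default)
    while read -r host ip; do
        [ -n "$host" ] || continue   # skip blank lines
        echo "ssh-copy-id -f -i /etc/ceph/ceph.pub root@$host"
        echo "ceph orch host add $host $ip"
        echo "sleep $pause"
    done
}
```

Usage could look like: printf 'ceph-02 10.10.10.51\nceph-03 10.10.10.52\n' | gen_host_add_cmds 30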

Adding OSDs

The recommended easy path is to add all available devices as is. It works, sure, especially with a more or less homogeneous setup.
All your future OSDs should be listed here as available:
ceph orch device ls
Do a dry run first, then apply:
ceph orch apply osd --all-available-devices --dry-run
ceph orch apply osd --all-available-devices

The other way around:
You can add devices manually, with some advanced configuration:
ceph orch daemon add osd ceph-01:data_devices=/dev/sda,/dev/sdb,db_devices=/dev/sdc,osds_per_device=2
Or even use a YAML configuration for that purpose: docs.ceph.com/drivegroups
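
For the YAML route, a spec file could look roughly like the one this sketch writes out. Everything in it is illustrative: the service_id, the host pattern and the rotational filters (spinning disks for data, non-rotational devices for DB) are my assumptions, not the setup described in this post.

```shell
# Write an example OSD service spec to the given file.
# All names and filters below are illustrative, adjust them to your hardware.
write_osd_spec() {
    cat > "$1" <<'EOF'
service_type: osd
service_id: example_hdd_with_ssd_db
placement:
  host_pattern: 'ceph-*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
EOF
}
# Then, on the admin node:
#   write_osd_spec osd_spec.yml
#   ceph orch apply -i osd_spec.yml --dry-run
#   ceph orch apply -i osd_spec.yml
```

The dry run shows which OSDs the orchestrator would create from the spec, which is the cheapest way to find out that a filter matches more disks than you intended.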

Disable auto-provisioning of OSDs

Same story as with monitors: the Ceph orchestrator keeps creating new OSDs on every occasion. You wipe a disk, it creates a new OSD; you add a new drive to a host, it creates a new OSD.
I don’t know why, but I believe quite a few system administrators are NOT comfortable with this behavior. So let’s disable it:
ceph orch apply osd --all-available-devices --unmanaged=true
After that, if you want to set up new OSDs, you will need to do:
ceph orch daemon add osd <host>:<path-to-device>

Removing an OSD

To remove an OSD, issue these commands:
ceph orch osd rm <osd_id(s)> [--replace] --force --zap
ceph orch osd rm status
You can also zap the device manually if you forgot to provide the --zap flag:

  • determine the LVs/VGs of the drive to zap:
    cephadm shell --fsid <fsid> -- ceph-volume lvm list
  • zap the device via orch:
    ceph orch device zap my_hostname /dev/sdx --force
  • OR via ceph-volume:
    cephadm shell --fsid <fsid> -- ceph-volume lvm zap \
    ceph-vgid/osd-block-lvid --destroy

You may also need to delete the OSD manually:

  • check whether the OSD is still there:
    ceph node ls
  • remove the OSD:
    ceph osd rm osd.ID
  • if that’s not sufficient, you can try to delete it from the CRUSH map manually:
    ceph osd crush rm osd.31
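
If you do this cleanup more than once, a tiny helper saves you from typos in the OSD name. A minimal sketch: it accepts either 31 or osd.31, rejects anything else, and prints the two manual commands from the list above in the same order. Both function names are made up, and nothing is executed.

```shell
# Accept "31" or "osd.31" and print the numeric id; fail on anything else.
osd_num() {
    id=${1#osd.}
    case $id in
        ''|*[!0-9]*) echo "not an OSD id: $1" >&2; return 1 ;;
    esac
    echo "$id"
}

# Print (not run) the manual cleanup commands for one OSD.
osd_manual_cleanup_cmds() {
    n=$(osd_num "$1") || return 1
    echo "ceph osd rm osd.$n"
    echo "ceph osd crush rm osd.$n"
}
```

Printing instead of executing is deliberate: removing the wrong OSD is an expensive mistake, so it is worth reading the commands once before running them.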

What else

Straying daemons

Sometimes ceph orch gets stuck. I’m not sure why: some stray daemons that it can’t find, etc. The only solution I’ve found is:
ceph orch restart mgr
If you cannot perform it because there is only 1 mgr, deploy another one (even temporarily) through ceph orch daemon add mgr HOST:IP. After restarting the mgrs, all duplicates should be gone.

Auto-memory tuning

By default, cephadm sets osd_memory_target_autotune=true, which is highly unsuitable for heterogeneous or hyperconverged infrastructures. You can check current memory consumption and limits with:
ceph orch ps
You can either put a label on the node to prevent memory autotuning, or set the config options:
ceph orch host label add HOSTNAME _no_autotune_memory
OR
ceph config set osd.123 osd_memory_target_autotune false
ceph config set osd.123 osd_memory_target 16G
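
To see why autotuning fits poorly on hyperconverged hosts, it helps to look at the arithmetic. As I understand the cephadm documentation, it takes a fraction of the host's total RAM (mgr/cephadm/autotune_memory_target_ratio, 0.7 by default), subtracts memory for Ceph daemons that are not autotuned, and divides the rest among the OSDs. The sketch below is only a rough estimate in whole GiB; the function name is made up and the reserved figure is whatever you estimate yourself.

```shell
# Rough estimate of the per-OSD memory target autotuning would pick, in GiB.
#   $1 total RAM on the host (GiB)
#   $2 number of OSDs on the host
#   $3 RAM set aside for non-autotuned daemons (GiB, your own estimate)
#   $4 autotune_memory_target_ratio as a percentage (default 70)
autotune_estimate_gib() {
    total=$1; osds=$2; reserved=${3:-0}; ratio=${4:-70}
    echo $(( (total * ratio / 100 - reserved) / osds ))
}
```

For a 256 GiB host with 12 OSDs and 8 GiB reserved this gives about 14 GiB per OSD, i.e. roughly 70% of the box goes to Ceph. VMs or other workloads on the same host are not part of that calculation, which is the whole problem.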

Getting logs

Get logs from daemons:
cephadm logs --name osd.34
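
The daemons run as systemd units, so plain journalctl works too if you know the unit name. A minimal sketch, assuming the usual cephadm naming scheme ceph-<fsid>@<daemon>.service; the function name and the fsid in the comment are placeholders I made up.

```shell
# Build the systemd unit name cephadm uses for a daemon.
#   $1 = cluster fsid, $2 = daemon name such as osd.34
ceph_unit_name() {
    fsid=$1; daemon=$2
    echo "ceph-${fsid}@${daemon}.service"
}
# Example (the fsid below is a made-up placeholder):
#   journalctl -u "$(ceph_unit_name 11111111-2222-3333-4444-555555555555 osd.34)" -n 100
```

This comes in handy when the orchestrator itself is stuck and you still want to read what a daemon logged on its host.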

Removing crash messages

ceph crash ls
ceph crash archive-all

Further reading: Red Hat operations guide for cephadm