Intro
Mission: deploy a new production-ready Ceph cluster on 4 new hardware nodes.
Overview
I had zero experience with the current cephadm orchestrator for Ceph. But time flies and ceph-ansible is being deprecated, so let the old dog learn new tricks.
Requirements
Nothing new here:
- JBOD for OSDs
- RAID1 for 2 OS disks (mdraid is fine)
- minimum 3 nodes (and maybe around 15 maximum for the sake of possible rebalance)
- at least 2 network cards with 2×10 Gbps ports (better 25 Gbps) on each node
- configured LACP on the network switches
- at least 2 vCPUs and 4 GB of RAM per OSD on each OSD node
- a proxy Docker registry to quay.io accessible from all nodes
- 2 separate networks – ceph-internal / ceph-cluster (the cluster network with jumbo frames and MTU 9000)
- accessible corporate NTP/DNS servers, IPMI access if something goes wrong
Preparation
Step 1.
- provision all hosts, install Ubuntu 22.04
- configure hostnames and local /etc/hosts records
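A minimal sketch of such records, reusing the example hostnames and IPs that appear later in this post:
cat <<EOF | sudo tee -a /etc/hosts
10.10.10.50 ceph-01.mycompany.cloud ceph-01
10.10.10.51 ceph-02.mycompany.cloud ceph-02
10.10.10.52 ceph-03.mycompany.cloud ceph-03
EOF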
- configure chronyd and systemd-resolved. Here we configure resolv.conf to follow changes from netplan:
sudo rm -f /etc/resolv.conf
sudo ln -s /run/resolvconf/resolv.conf /etc/resolv.conf
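For chrony, a minimal sketch is to append the corporate NTP servers and restart the service (the server names below are placeholders):
echo "server ntp1.mycompany.cloud iburst" | sudo tee -a /etc/chrony/chrony.conf
echo "server ntp2.mycompany.cloud iburst" | sudo tee -a /etc/chrony/chrony.conf
sudo systemctl restart chrony
chronyc sources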
- configure netplan. Pay attention to the DHCP DNS overrides, MTU and other configuration options:
network:
  bonds:
    bond0:
      interfaces:
        - enp180s0f0np0
        - enp180s0f1np1
      parameters:
        lacp-rate: fast
        mode: 802.3ad
        transmit-hash-policy: layer2+3
    bond1:
      interfaces:
        - enp179s0f0np0
        - enp179s0f1np1
      parameters:
        lacp-rate: fast
        mode: 802.3ad
        transmit-hash-policy: layer2+3
  ethernets:
    enp179s0f0np0:
      dhcp4: true
      mtu: 9000
    enp179s0f1np1:
      dhcp4: true
      mtu: 9000
    enp180s0f0np0:
      dhcp4: true
      dhcp4-overrides:
        use-dns: false
    enp180s0f1np1:
      dhcp4: true
      dhcp4-overrides:
        use-dns: false
  version: 2
  vlans:
    bond0.10:
      id: 10
      link: bond0
      addresses:
        - 10.10.10.10/24
      dhcp4-overrides:
        use-dns: false
      nameservers:
        addresses:
          - 10.10.10.200
          - 10.10.10.220
        search:
          - "mycompany.cloud"
      routes:
        - to: default
          via: 10.10.10.1
    bond1.20:
      id: 20
      link: bond1
      addresses:
        - 10.10.20.11/24
      mtu: 9000
- apply netplan configuration, verify that everything is fine
netplan apply
ip -4 a; ip l;
timedatectl status
dig ceph-02.mycompany.cloud
OR
nslookup ceph-02
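Since the cluster network relies on jumbo frames, it is also worth verifying that MTU 9000 really passes between nodes. A quick check (the target is another node's cluster-network address, as an example):
# 8972 bytes of payload = 9000 MTU minus 20 (IP header) and 8 (ICMP header); -M do forbids fragmentation
ping -M do -s 8972 -c 3 10.10.20.12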
- generate an ssh key (id_rsa) on the first host and append its public part to authorized_keys on all the others
ssh ceph-01
ssh-keygen
echo "ssh-rsa ....." >> ~/.ssh/authorized_keys
ssh ceph-02 'echo "ssh-rsa ....." >> ~/.ssh/authorized_keys'
ssh ceph-03 'echo "ssh-rsa ....." >> ~/.ssh/authorized_keys'
- configure correct apt repositories on all nodes
vi /etc/apt/sources.list
apt update
- configure the correct Docker apt repository (add the Docker GPG key even if you use a proxy repo)
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
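If you point apt at the upstream Docker repository (or a proxy of it) rather than relying only on Ubuntu's own packages, the repository entry could look like this sketch:
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu jammy stable" | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update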
- install docker.io and other dependencies
apt install -y docker.io lvm2 python3
Bootstrapping the cluster
We are going to bootstrap a new cluster starting from the first node. This node will be our “admin” node. So on the first Ceph node do:
apt install -y cephadm ceph-common
cephadm bootstrap --mon-ip 10.10.10.50 --log-to-file --registry-url 10.10.10.70:5002 --registry-username docker --registry-password password --cluster-network 10.10.20.0/24
Wait for some time and check the status of the cluster:
ceph -s
ceph orch host ls
Adding new hosts
First of all, let’s make new mons unmanaged. I prefer to know on which nodes my mons are located.
ssh ceph-01
ceph orch apply mon --unmanaged
Next, let’s add the other hosts.
First, copy the ssh key of our admin node to them:
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-01
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-02
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-03
Then add the new hosts via the orchestrator. I recommend waiting a little before proceeding to the next host if you use a single proxy registry. Note: it’s crucial to use the correct, existing hostnames of the new nodes here – they could be in uppercase, contain special characters, etc.
ceph orch host add ceph-02 10.10.10.51
ceph orch host add ceph-03 10.10.10.52
And now add the monitors:
ceph orch daemon add mon ceph-02:10.10.10.51
ceph orch daemon add mon ceph-03:10.10.10.52
Check the status:
ceph -s
ceph orch host ls
Adding OSDs
It’s recommended to take the easy path – just add all available devices as is. It works, sure, especially with a more or less homogeneous setup.
All your future OSDs should be listed here as available:
ceph orch device ls
Apply with --dry-run first:
ceph orch apply osd --all-available-devices --dry-run
ceph orch apply osd --all-available-devices
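Once the apply has gone through, a quick check that the OSDs actually came up:
ceph osd tree
ceph -s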
Alternatively, you can add all devices manually, with some advanced configuration:
ceph orch daemon add osd ceph-01:data_devices=/dev/sda,/dev/sdb,db_devices=/dev/sdc,osds_per_device=2
Or even use a YAML configuration for that purpose: docs.ceph.com/drivegroups
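A minimal sketch of such a drive-group spec, assuming rotational data disks with flash DB devices (the service_id, host pattern and file name are illustrative):
cat > osd_spec.yml <<EOF
service_type: osd
service_id: default_drives
placement:
  host_pattern: 'ceph-*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
EOF
ceph orch apply -i osd_spec.yml --dry-run
ceph orch apply -i osd_spec.yml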
Disable auto-provisioning of OSDs
The same thing as with the monitors – the Ceph orchestrator keeps creating new OSDs on every occasion: you wipe a disk – it creates a new OSD; you add a new drive to a host – it creates a new OSD.
I don’t know why, but I believe that quite a few system administrators are NOT comfortable with this behavior. So let’s disable it:
ceph orch apply osd --all-available-devices --unmanaged=true
After that, if you want to set up new OSDs, you will need to do:
ceph orch daemon add osd <host>:<path-to-device>
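For example (the hostname and device path are illustrative):
ceph orch daemon add osd ceph-02:/dev/sdd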
Removing an OSD
To remove an OSD, issue these commands:
ceph orch osd rm <osd_id(s)> [--replace] --force --zap
ceph orch osd rm status
Also, you can manually zap the device if you forgot to provide the --zap flag:
- determine the LVs/VGs of the drive to zap
cephadm shell --fsid <fsid> -- ceph-volume lvm list
- zap device via orch
ceph orch device zap my_hostname /dev/sdx --force
- OR via ceph-volume
cephadm shell --fsid <fsid> -- ceph-volume lvm zap \
ceph-vgid/osd-block-lvid --destroy
It’s possible that you’ll also need to delete the OSD manually:
- check if the osd is still there
ceph node ls
- remove osd
ceph osd rm osd.ID
- if that’s not sufficient – you can try to delete it from the crush map manually
ceph osd crush rm osd.31
What else
Stray daemons
Sometimes ceph orch gets stuck – not sure why; some stray daemons that it couldn’t find, etc. The only solution that I’ve found is:
ceph orch restart mgr
If you cannot perform it because you have only 1 mgr, deploy another one (even temporarily) through ceph orch daemon add mgr HOST:IP. After restarting the mgrs, all duplicates should be gone.
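As a sketch, using the example hostnames and IPs from the cluster above:
# deploy a temporary second mgr, then bounce all mgrs
ceph orch daemon add mgr ceph-02:10.10.10.51
ceph orch restart mgr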
Auto-memory tuning
By default, cephadm sets osd_memory_target_autotune=true, which is highly unsuitable for heterogeneous or hyperconverged infrastructures. You can check current memory consumption and limits with:
ceph orch ps
You can either place a label on the node to prevent memory autotuning, or set the config options per OSD:
ceph orch host label add HOSTNAME _no_autotune_memory
OR
ceph config set osd.123 osd_memory_target_autotune false
ceph config set osd.123 osd_memory_target 16G
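To verify what actually ended up in the config (the OSD id is an example):
ceph config get osd.123 osd_memory_target_autotune
ceph config get osd.123 osd_memory_target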
Getting logs
Get logs from daemons:
cephadm logs --name osd.34
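Under the hood this is just journalctl against the daemon's systemd unit, so you can also follow the logs live (replace <fsid> with your cluster fsid):
journalctl -f -u ceph-<fsid>@osd.34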
Removing crash messages
ceph crash ls
ceph crash archive-all
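If you want to inspect a particular crash before archiving, or archive just one (the crash id comes from the ls output):
ceph crash info <crash_id>
ceph crash archive <crash_id>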