---
title: Kubernetes Persistent Volumes and RBD
date: 2017-12-03 15:00:00
---

Since the amount of stuff I'm deploying on my small Kubernetes cluster keeps increasing and manually managing the volumes is becoming a pain, I decided to start learning about Storage Classes, Persistent Volumes and Volume Claims. Even though it seems intimidating at first, it was really easy to integrate them with the small Ceph cluster I also play with.

# Ceph

On the Ceph side, the configuration consists of creating a new pool and a new user that will be used by our Kubernetes cluster.

* First, create a new pool: `ceph osd pool create kubernetes 64 64`
* Then, to reduce compatibility problems, I decided to reduce the features to the bare minimum: `rbd feature disable --pool kubernetes exclusive-lock object-map fast-diff deep-flatten`
* Once the pool is created, I created a new client key that will be used to provision and claim the volumes stored in this pool: `ceph auth get-or-create-key client.kubernetes`
* We need to add the correct capabilities to this new client so that it can create new images, handle the locks and retrieve the images. The `rbd` profile automatically allows these operations: `ceph auth caps client.kubernetes mon "profile rbd" osd "profile rbd pool=kubernetes"`
* Then, we export the key in base64 so it can be inserted shortly in the Kubernetes storage class configuration: `ceph auth get client.kubernetes | grep key | awk '{print $3}' | base64`

That's all for the Ceph part of the storage configuration. Easy so far, no?

# Storage class

In Kubernetes, a Storage Class is a way to configure the storage that is available and can be used by Persistent Volumes. It's really an easy way to describe the storage so that you don't have to worry about it when creating new pods.

I created a new file that contains everything needed for the configuration of a new `rbd` storage class in my cluster. I will describe it part by part, but you can merge everything into one file and apply it with `kubectl`.

```yaml
kind: ServiceAccount
apiVersion: v1
metadata:
  name: rbd-provisioner
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rbd-provisioner
subjects:
  - kind: ServiceAccount
    name: rbd-provisioner
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:controller:persistent-volume-binder
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: rbd-provisioner
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: rbd-provisioner
    spec:
      containers:
        - name: rbd-provisioner
          image: "quay.io/external_storage/rbd-provisioner:v0.1.0"
      serviceAccountName: rbd-provisioner
```

An rbd provisioner pod and its related service account and role binding, based on the [RBD Volume Provisioner for Kubernetes 1.5+ incubator project](https://github.com/kubernetes-incubator/external-storage/tree/master/ceph/rbd/deploy/rbac). Not much to add for now on this part.
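The binding above reuses the built-in `system:controller:persistent-volume-binder` ClusterRole instead of defining a dedicated one. If you are curious about which permissions the provisioner inherits through that binding, you can inspect the role directly (a read-only check, nothing specific to my setup):

```sh
# Show the rules granted by the built-in persistent-volume-binder ClusterRole
kubectl describe clusterrole system:controller:persistent-volume-binder
```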
Let's look into the storage class configuration.

```yaml
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: rbd
provisioner: ceph.com/rbd
parameters:
  monitors: 10.42.100.1:6789,10.42.100.2:6789,10.42.100.3:6789
  adminId: kubernetes
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kubernetes
  userId: kubernetes
  userSecretName: ceph-secret-user
reclaimPolicy: Retain
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
  namespace: kube-system
type: kubernetes.io/rbd
data:
  key: QV[...]QPo=
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret-user
  namespace: default
type: kubernetes.io/rbd
data:
  key: QV[...]QPo=
```

This is the part where the storage itself is described. Update the `monitors` to match your Ceph configuration, and the secrets to match the key you got earlier from `ceph auth get client.kubernetes | grep key | awk '{print $3}' | base64`. Here, I cheated a little and used the same client for both the administration and the user part of the storage, partly because I didn't want to bother with the capabilities needed for each.

Once everything looks correct, you can save the file or files and apply the configuration on the Kubernetes cluster with `kubectl apply -f ceph-rbd.yaml` (or the name of your file). And that's all for the configuration... We can check that everything is working with `kubectl get sc,deploy,po -n kube-system`:

```sh
NAME                 PROVISIONER
storageclasses/rbd   ceph.com/rbd

NAME                      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
[...]
deploy/rbd-provisioner    1         1         1            1           5m

NAME                                  READY     STATUS    RESTARTS   AGE
[...]
po/rbd-provisioner-5cc5947c77-xdcn5   1/1       Running   0          5m
```

There should be a `rbd-provisioner` deployment with everything as desired, a `rbd-provisioner-...-...` pod running and a `storageclasses/rbd` storage class with the correct provisioner.

# PersistentVolumeClaim and Volumes

Now, on to using them in deployments:

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myservice-data-claim
spec:
  storageClassName: rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: myservice-pod
spec:
  volumes:
    - name: myservice-data
      persistentVolumeClaim:
        claimName: myservice-data-claim
  containers:
    - name: myservice-cont
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: myservice-data
```

If the volume does not exist yet, a new image will automatically be created on Ceph, formatted (by default in `ext4`) and mounted. If it already exists, it will simply be mounted.

All in all, two hours were enough to migrate from manually created and managed volumes to a storage class and volume claims. I learnt that even though Kubernetes can look really hard and scary at first, everything is there to help you with your stuff.

# Possible errors

## Filesystem error - Access denied

By default, the pods will have access to the newly generated filesystems. If you start them with `securityContext` parameters, you can put them in a state where the user the container runs as does not have access to the filesystem content, either for reading or writing.
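In that case, adding an `fsGroup` to the pod's `securityContext` is usually enough to make the freshly formatted filesystem writable by the container user. A minimal sketch based on the pod from the previous section; the numeric IDs are placeholders, adjust them to whatever user your image actually runs as:

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: myservice-pod
spec:
  securityContext:
    runAsUser: 1000   # placeholder UID the container runs as
    fsGroup: 1000     # the volume is made group-accessible to this GID before the container starts
  volumes:
    - name: myservice-data
      persistentVolumeClaim:
        claimName: myservice-data-claim
  containers:
    - name: myservice-cont
      image: nginx
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: myservice-data
```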
## Image is locked by other nodes

If, like me, you battle with `rbd: image is locked by other nodes` errors when a pod is migrated between nodes, it usually means that the client you created doesn't have the capabilities to remove locks after detaching. I fixed that simply by setting the caps to the `rbd` profile instead of manually configuring the `rwx` operations: `ceph auth caps client.myclient mon "profile rbd" osd "profile rbd pool=mypool"`
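If a stale lock is already stuck on an image after a failed migration, it can also be listed and removed by hand from a machine with the Ceph admin keyring. A rough sketch; the image name is a placeholder, use the one referenced in the volume or pod events:

```sh
# List the locks currently held on the image backing the stuck volume
rbd lock list kubernetes/<image-name>

# Remove the stale lock; take the lock ID and locker from the previous output
rbd lock remove kubernetes/<image-name> "<lock-id>" <locker>
```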