---
title: Kubernetes Persistent Volumes and RBD
date: 2017-12-03 15:00:00
---

Since the amount of stuff I'm deploying on my small Kubernetes cluster keeps increasing and manually managing the volumes is becoming a pain, I decided to start learning about Storage Classes, Persistent Volumes and Volume Claims. Even though it seems intimidating at first, it was really easy to integrate them with the small Ceph cluster I also play with.

# Ceph

On the Ceph side, the configuration consists of creating a new pool and a new user that will be used by our Kubernetes cluster.

* First, create a new pool: `ceph osd pool create kubernetes 64 64`
* Then, to reduce compatibility problems, I decided to reduce the features to the bare minimum: `rbd feature disable --pool kubernetes exclusive-lock object-map fast-diff deep-flatten`
* Once the pool is created, I created a new client key that will be used to provision and claim the volumes stored in this pool: `ceph auth get-or-create-key client.kubernetes`
* We need to add the correct capabilities to this new client so that it can create new images, handle the locks and retrieve the images. The `rbd` profile automatically allows these operations: `ceph auth caps client.kubernetes mon "profile rbd" osd "profile rbd pool=kubernetes"`
* Then, we export the key in base64 so it can be inserted shortly in the Kubernetes storage class configuration: `ceph auth get client.kubernetes | grep key | awk '{print $3}' | base64`

That's all for the Ceph part of the storage configuration. Easy so far, no?

# Storage class

In Kubernetes, a Storage Class is a way to configure the storage that is available and can be used by Persistent Volumes. It's really an easy way to describe the storage so that you don't have to worry about it when creating new pods.

I created a new file that contains everything needed for the configuration of a new `rbd` storage class in my cluster. I will describe it part by part, but you can merge everything into one file and apply it with `kubectl`.

```yaml
kind: ServiceAccount
apiVersion: v1
metadata:
  name: rbd-provisioner
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rbd-provisioner
subjects:
  - kind: ServiceAccount
    name: rbd-provisioner
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:controller:persistent-volume-binder
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: rbd-provisioner
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: rbd-provisioner
    spec:
      containers:
        - name: rbd-provisioner
          image: "quay.io/external_storage/rbd-provisioner:v0.1.0"
      serviceAccountName: rbd-provisioner
```

An rbd provisioner pod and its related service account and role binding, based on the [RBD Volume Provisioner for Kubernetes 1.5+ incubator project](https://github.com/kubernetes-incubator/external-storage/tree/master/ceph/rbd/deploy/rbac). Not much to add for now on this part.
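The binding above reuses the built-in `system:controller:persistent-volume-binder` ClusterRole instead of defining a dedicated one. If you are curious about which permissions the provisioner inherits through that binding, you can inspect the role directly (a read-only check, nothing specific to my setup):

```sh
# Show the rules granted by the built-in persistent-volume-binder ClusterRole
kubectl describe clusterrole system:controller:persistent-volume-binder
```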
Let's look into the storage class configuration.

```yaml
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: rbd
provisioner: ceph.com/rbd
parameters:
  monitors: 10.42.100.1:6789,10.42.100.2:6789,10.42.100.3:6789
  adminId: kubernetes
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kubernetes
  userId: kubernetes
  userSecretName: ceph-secret-user
reclaimPolicy: Retain
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
  namespace: kube-system
type: kubernetes.io/rbd
data:
  key: QV[...]QPo=
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret-user
  namespace: default
type: kubernetes.io/rbd
data:
  key: QV[...]QPo=
```

This is the part where the storage itself is described. Update the `monitors` to match your Ceph configuration, and the secrets to match the key you got earlier from `ceph auth get client.kubernetes | grep key | awk '{print $3}' | base64`. Here, I cheated a little and used the same client for both the administration and the user part of the storage, partly because I didn't want to bother with the capabilities needed for each.

Once everything looks correct, you can save the file or files and apply the configuration on the Kubernetes cluster with `kubectl apply -f ceph-rbd.yaml` (or the name of your file). And that's all for the configuration... We can check that everything is working with `kubectl get sc,deploy,po -n kube-system`:

```sh
NAME                 PROVISIONER
storageclasses/rbd   ceph.com/rbd

NAME                      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
[...]
deploy/rbd-provisioner    1         1         1            1           5m

NAME                                  READY     STATUS    RESTARTS   AGE
[...]
po/rbd-provisioner-5cc5947c77-xdcn5   1/1       Running   0          5m
```

There should be a `rbd-provisioner` deployment with everything as desired, a `rbd-provisioner-...-...` pod running and a `storageclasses/rbd` storage class with the correct provisioner.

# PersistentVolumeClaim and Volumes

Now, on to using them in deployments:

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myservice-data-claim
spec:
  storageClassName: rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: myservice-pod
spec:
  volumes:
    - name: myservice-data
      persistentVolumeClaim:
        claimName: myservice-data-claim
  containers:
    - name: myservice-cont
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: myservice-data
```

If the volume does not exist yet, a new image will automatically be created on Ceph, formatted (by default in `ext4`) and mounted. If it already exists, it will simply be mounted.

All in all, two hours were enough to migrate from manually created and managed volumes to a storage class and volume claims. I learnt that even though Kubernetes can look really hard and scary at first, everything is there to help you with your stuff.

# Possible errors

## Filesystem error - Access denied

By default, the pods will have access to the newly generated filesystems. If you start them with `securityContext` parameters, you can put them in a state where the user the container runs as does not have access to the filesystem content, either for reading or writing.
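In that case, adding an `fsGroup` to the pod's `securityContext` is usually enough to make the freshly formatted filesystem writable by the container user. A minimal sketch based on the pod from the previous section; the numeric IDs are placeholders, adjust them to whatever user your image actually runs as:

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: myservice-pod
spec:
  securityContext:
    runAsUser: 1000   # placeholder UID the container runs as
    fsGroup: 1000     # the volume is made group-accessible to this GID before the container starts
  volumes:
    - name: myservice-data
      persistentVolumeClaim:
        claimName: myservice-data-claim
  containers:
    - name: myservice-cont
      image: nginx
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: myservice-data
```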
## Image is locked by other nodes

If, like me, you battle with `rbd: image is locked by other nodes` errors when a pod is migrated between nodes, it usually means that the client you created doesn't have the capabilities to remove locks after detaching. I fixed that simply by setting the caps to the `rbd` profile instead of manually configuring the `rwx` operations: `ceph auth caps client.myclient mon "profile rbd" osd "profile rbd pool=mypool"`
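If a stale lock is already stuck on an image after a failed migration, it can also be listed and removed by hand from a machine with the Ceph admin keyring. A rough sketch; the image name is a placeholder, use the one referenced in the volume or pod events:

```sh
# List the locks currently held on the image backing the stuck volume
rbd lock list kubernetes/<image-name>

# Remove the stale lock; take the lock ID and locker from the previous output
rbd lock remove kubernetes/<image-name> "<lock-id>" <locker>
```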