One of the first steps when you work with data which must be stored is to estimate the disk size. This is really hard, especially when your application is new.
Working in a cloud environment gives us the flexibility of using the resources that we need in almost any moment without wasting our money in resources that we will use in the future, if everything goes right.
If your application is working in a cloud virtual machine and it needs to store some data, the approach would be to attach a new disk for this data, instead of using the default disk of the virtual machine. This new disk could be used by other virtual machine if the current one suffers any problem, without loosing the data.
Attaching new disks to our cloud virtual machines is very common but, what would happen if we work with containers?
Some new concepts are introduced when we work with containers, using for example Kubernetes, in a cloud environment. The application will run, directly, in containers, although the containers will be in the virtual machines, and the persistent data will be stored in persistent volumes, although this volumes will be in attached disks.
It is possible to resize a persistent volume in Kubernetes since version 1.11, but it is not supported by all the cloud providers. Therefore, what should we do if we need a bigger volume?
It is not possible in Azure AKS at this moment.
The first idea that comes up is to attach a new volume to the container, like we would do with a disk and a virtual machine. But it is not possible to attach new volumes to running containers.
So we think in stopping the application for a while and migrate the data between volumes. How can we do it?
The examples are based on Azure AKS.
Delete the Pod
The first step should be to stop the pod in order to avoid writes in the middle of the migration which could cause problems.
$ kubectl delete -f myapp.yml
I guess that the definition of your current deployment/replicaset/pod is in myapp.yml
Bigger Volume
The second step is to provide a bigger volume.
pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: my-claim
spec:
accessModes:
- ReadWriteOnce
storageClassName: managed-premium
resources:
requests:
storage: 100Gi
---
+ apiVersion: v1
+ kind: PersistentVolumeClaim
+ metadata:
+ name: my-new-claim
+ spec:
+ accessModes:
+ - ReadWriteOnce
+ storageClassName: managed-premium
+ resources:
+ requests:
+ storage: 150Gi
A new disk will be provisioned by the cloud provider, Azure in this case.
$ kubectl apply -f pvc.yml
Start a helper Pod
The next step is to run a helper pod where we will attach the two volumes and it will be used to execute the migration of the data.
helper.yml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: helper
spec:
replicas: 1
template:
metadata:
name: helper
labels:
app: helper
spec:
containers:
- name: helper
image: ubuntu
command:
- "/bin/sleep"
- "3600"
volumeMounts:
- name: my-new-pv
mountPath: /data/new
- name: my-pv
mountPath: /data/old
volumes:
- name: my-pv
persistentVolumeClaim:
claimName: my-claim
- name: my-new-pv
persistentVolumeClaim:
claimName: my-new-claim
A ubuntu container will be started and two volumes will be attached to it. The current data will be found in /data/old and the new one, /data/new will be empty. The main process is a sleep command which will take one hour. If you need more time to execute your tasks, increase it.
$ kubectl apply -f helper.yml
Once the container is up and running, we will connect to it:
$ kubectl exec -it <name of the pod> bash
And we will execute the migration, for example, copy all the data from old volume to the new one:
$ cp -r /data/old /data/new
Finally, we can delete the helper:
$ kubectl delete -f helper.yml
Run the application with the new volume
Once the migration has finished, we can start the application pointing to the new volume.
myapp.yml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: my-app
spec:
replicas: 1
template:
metadata:
name: my-app
labels:
app: my-app
spec:
containers:
- name: my-app
image: myapp
env:
- name: DATA
value: /var/data
volumeMounts:
+ - name: my-new-pv
+ mountPath: /var/data
- - name: my-pv
- mountPath: /var/data
ports:
- containerPort: 8888
name: my-port
volumes:
+ - name: my-new-pv
+ persistentVolumeClaim:
+ claimName: my-new-claim
- - name: my-pv
- persistentVolumeClaim:
- claimName: my-claim
$ kubectl apply -f myapp.yml
Remove the old Persistent Volume
The persistent volume claim requests a disk to the cloud provider and it costs money so, if we are not going to use the old volume anymore, we should delete it:
$ kubectl delete pvc my-claim
Conclusion
Working in the cloud is very flexible but we always have to take into account the limitations or the future problems that we can have, besides working with containers increases the flexibility but also increases the complexity of the environment.