Scaling Stateful Applications using Kubernetes Pet Sets and FlexVolumes with Datera Elastic Data Fabric

Editor’s note: today’s guest post is by Shailesh Mittal, Software Architect, and Ashok Rajagopalan, Sr Director Product at Datera Inc, talking about stateful application provisioning with Kubernetes on Datera Elastic Data Fabric.

Introduction

Persistent volumes in Kubernetes are foundational as customers move beyond stateless workloads to run stateful applications. While Kubernetes has supported stateful applications such as MySQL, Kafka, Cassandra, and Couchbase for a while, the introduction of Pet Sets has significantly improved this support. In particular, Pet Sets sequence provisioning and startup, and they durably associate each instance with its identity and storage as the set scales. This makes it possible to automate the scaling of “Pets” (applications that require consistent handling and durable placement).

Datera, elastic block storage for cloud deployments, integrates with Kubernetes through the FlexVolume framework. Based on the first principles of containers, Datera allows application resource provisioning to be decoupled from the underlying physical infrastructure. This brings clean contracts (that is, no dependency on or direct knowledge of the underlying physical infrastructure), declarative formats, and eventually portability to stateful applications.

While Kubernetes allows for great flexibility to define the underlying application infrastructure through YAML configurations, Datera allows that configuration to be passed to the storage infrastructure to provide persistence. Through Datera AppTemplates, stateful applications in a Kubernetes environment can be scaled automatically.

Deploying Persistent Storage

Persistent storage is defined using the Kubernetes PersistentVolume subsystem. PersistentVolumes are volume plugins that define volumes whose lifecycle is independent of the pod that uses them. They are implemented as NFS, iSCSI, or a cloud-provider-specific storage system.
Datera has developed a volume plugin for PersistentVolumes that can provision iSCSI block storage on the Datera Data Fabric for Kubernetes pods.

The Datera volume plugin is invoked by kubelets on minion nodes and relays the calls to the Datera Data Fabric over its REST API. Below is a sample deployment of a PersistentVolume with the Datera plugin:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-datera-0
    spec:
      capacity:
        storage: 100Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      flexVolume:
        driver: "datera/iscsi"
        fsType: "xfs"
        options:
          volumeID: "kube-pv-datera-0"
          size: "100"
          replica: "3"
          backstoreServer: "tlx170.tlx.daterainc.com:7717"

This manifest defines a PersistentVolume of 100 GB to be provisioned in the Datera Data Fabric, should a pod request the persistent storage.

    [root@tlx241 /]# kubectl get pv
    NAME          CAPACITY   ACCESSMODES   STATUS      CLAIM     REASON    AGE
    pv-datera-0   100Gi      RWO           Available                       8s
    pv-datera-1   100Gi      RWO           Available                       2s
    pv-datera-2   100Gi      RWO           Available                       7s
    pv-datera-3   100Gi      RWO           Available                       4s

Configuration

The Datera PersistentVolume plugin is installed on all minion nodes. When a pod lands on a minion node with a valid claim bound to the persistent storage provisioned earlier, the Datera plugin forwards the request to create the volume on the Datera Data Fabric. All the options specified in the PersistentVolume manifest are sent to the plugin with the provisioning request.

Once a volume is provisioned in the Datera Data Fabric, it is presented as an iSCSI block device to the minion node, and kubelet mounts this device for the containers (in the pod) to access it.

Using Persistent Storage

Kubernetes PersistentVolumes are used with a pod through PersistentVolume Claims.
Once a claim is defined, it is bound to a PersistentVolume matching the claim’s specification. A typical claim for the PersistentVolume defined above looks like this:

    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: pv-claim-test-petset-0
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi

Once this claim is defined and bound to a PersistentVolume, its resources can be used in a pod specification:

    [root@tlx241 /]# kubectl get pv
    NAME          CAPACITY   ACCESSMODES   STATUS      CLAIM                            REASON    AGE
    pv-datera-0   100Gi      RWO           Bound       default/pv-claim-test-petset-0             6m
    pv-datera-1   100Gi      RWO           Bound       default/pv-claim-test-petset-1             6m
    pv-datera-2   100Gi      RWO           Available                                              7s
    pv-datera-3   100Gi      RWO           Available                                              4s

    [root@tlx241 /]# kubectl get pvc
    NAME                     STATUS    VOLUME        CAPACITY   ACCESSMODES   AGE
    pv-claim-test-petset-0   Bound     pv-datera-0   0                        3m
    pv-claim-test-petset-1   Bound     pv-datera-1   0                        3m

A pod can use a PersistentVolume Claim as follows:

    apiVersion: v1
    kind: Pod
    metadata:
      name: kube-pv-demo
    spec:
      containers:
      - name: data-pv-demo
        image: nginx
        volumeMounts:
        - name: test-kube-pv1
          mountPath: /data
        ports:
        - containerPort: 80
      volumes:
      - name: test-kube-pv1
        persistentVolumeClaim:
          claimName: pv-claim-test-petset-0

The result is a pod using a PersistentVolume Claim as a volume.
It in turn sends the request to the Datera volume plugin to provision storage in the Datera Data Fabric.

    [root@tlx241 /]# kubectl describe pods kube-pv-demo
    Name:       kube-pv-demo
    Namespace:  default
    Node:       tlx243/172.19.1.243
    Start Time: Sun, 14 Aug 2016 19:17:31 -0700
    Labels:     <none>
    Status:     Running
    IP:         10.40.0.3
    Controllers: <none>
    Containers:
      data-pv-demo:
        Container ID: docker://ae2a50c25e03143d0dd721cafdcc6543fac85a301531110e938a8e0433f74447
        Image:   nginx
        Image ID: docker://sha256:0d409d33b27e47423b049f7f863faa08655a8c901749c2b25b93ca67d01a470d
        Port:    80/TCP
        State:   Running
          Started:  Sun, 14 Aug 2016 19:17:34 -0700
        Ready:   True
        Restart Count:  0
        Environment Variables:  <none>
    Conditions:
      Type           Status
      Initialized    True
      Ready          True
      PodScheduled   True
    Volumes:
      test-kube-pv1:
        Type:  PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
        ClaimName:   pv-claim-test-petset-0
        ReadOnly:    false
      default-token-q3eva:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-q3eva
    QoS Tier:  BestEffort
    Events:
      FirstSeen LastSeen Count From SubobjectPath Type Reason Message
      --------- -------- ----- ---- ------------- -------- ------ -------
      43s 43s 1 {default-scheduler } Normal Scheduled Successfully assigned kube-pv-demo to tlx243
      42s 42s 1 {kubelet tlx243} spec.containers{data-pv-demo} Normal Pulling pulling image "nginx"
      40s 40s 1 {kubelet tlx243} spec.containers{data-pv-demo} Normal Pulled Successfully pulled image "nginx"
      40s 40s 1 {kubelet tlx243} spec.containers{data-pv-demo} Normal Created Created container with docker id ae2a50c25e03
      40s 40s 1 {kubelet tlx243} spec.containers{data-pv-demo} Normal Started Started container with docker id ae2a50c25e03

The persistent volume is presented as an iSCSI device on the minion node (tlx243 in this case):

    [root@tlx243 ~]# lsscsi
    [0:2:0:0]    disk    SMC      SMC2208          3.24  /dev/sda
    [11:0:0:0]   disk    DATERA   IBLOCK           4.0   /dev/sdb

    [root@tlx243 datera~iscsi]# mount | grep sdb
    /dev/sdb on /var/lib/kubelet/pods/6b99bd2a-628e-11e6-8463-0cc47ab41442/volumes/datera~iscsi/pv-datera-0 type xfs (rw,relatime,attr2,inode64,noquota)

Containers running in the pod see this device mounted at /data, as specified in the manifest:

    [root@tlx241 /]# kubectl exec kube-pv-demo -c data-pv-demo -it bash
    root@kube-pv-demo:/# mount | grep data
    /dev/sdb on /data type xfs (rw,relatime,attr2,inode64,noquota)

Using Pet Sets

Typically, pods are treated as stateless units, so if one of them is unhealthy or gets superseded, Kubernetes just disposes of it. In contrast, a PetSet is a group of stateful pods that has a stronger notion of identity. The goal of a PetSet is to decouple stateful applications from their dependence on specific infrastructure by assigning identities to individual instances of an application that are not anchored to the underlying physical infrastructure.

A PetSet requires {0..n-1} Pets. Each Pet has a deterministic name, PetSetName-Ordinal, and a unique identity. Each Pet has at most one pod, and each PetSet has at most one Pet with a given identity. A PetSet ensures that a specified number of “pets” with unique identities are running at any given time.
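
As a rough illustration of the naming rule above, the following sketch enumerates the identities a PetSet with n replicas would hold. The function and type names are hypothetical, and the DNS pattern (pet name, then the governing service, then the namespace, under `svc.cluster.local`) assumes the common default cluster domain; it is not taken from this post.

```python
from typing import List, NamedTuple


class PetIdentity(NamedTuple):
    ordinal: int
    name: str      # deterministic name: PetSetName-Ordinal
    dns_name: str  # assumed DNS pattern under the governing service


def pet_identities(petset_name: str, service_name: str, replicas: int,
                   namespace: str = "default",
                   cluster_domain: str = "cluster.local") -> List[PetIdentity]:
    """Enumerate the identities {0..n-1} of a PetSet with `replicas` Pets."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    identities = []
    for ordinal in range(replicas):
        name = f"{petset_name}-{ordinal}"
        # cluster_domain is an assumption; clusters may be configured differently.
        dns_name = f"{name}.{service_name}.{namespace}.svc.{cluster_domain}"
        identities.append(PetIdentity(ordinal, name, dns_name))
    return identities


if __name__ == "__main__":
    for pet in pet_identities("test-petset", "test-service", replicas=2):
        print(pet.ordinal, pet.name, pet.dns_name)
```

Because the names depend only on the PetSet name and the ordinal, a Pet that is rescheduled to another node keeps the same identity.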
The identity of a Pet comprises:

- a stable hostname, available in DNS
- an ordinal index
- stable storage, linked to the ordinal and hostname

A typical PetSet definition using a PersistentVolume Claim looks like this:

    # A headless service to create DNS records
    apiVersion: v1
    kind: Service
    metadata:
      name: test-service
      labels:
        app: nginx
    spec:
      ports:
      - port: 80
        name: web
      clusterIP: None
      selector:
        app: nginx
    ---
    apiVersion: apps/v1alpha1
    kind: PetSet
    metadata:
      name: test-petset
    spec:
      serviceName: "test-service"
      replicas: 2
      template:
        metadata:
          labels:
            app: nginx
          annotations:
            pod.alpha.kubernetes.io/initialized: "true"
        spec:
          terminationGracePeriodSeconds: 0
          containers:
          - name: nginx
            image: gcr.io/google_containers/nginx-slim:0.8
            ports:
            - containerPort: 80
              name: web
            volumeMounts:
            - name: pv-claim
              mountPath: /data
      volumeClaimTemplates:
      - metadata:
          name: pv-claim
          annotations:
            volume.alpha.kubernetes.io/storage-class: anything
        spec:
          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 100Gi

We have the following PersistentVolume Claims available:

    [root@tlx241 /]# kubectl get pvc
    NAME                     STATUS    VOLUME        CAPACITY   ACCESSMODES   AGE
    pv-claim-test-petset-0   Bound     pv-datera-0   0                        41m
    pv-claim-test-petset-1   Bound     pv-datera-1   0                        41m
    pv-claim-test-petset-2   Bound     pv-datera-2   0                        5s
    pv-claim-test-petset-3   Bound     pv-datera-3   0                        2s

When this PetSet is provisioned, two pods are instantiated:

    [root@tlx241 /]# kubectl get pods
    NAMESPACE     NAME                        READY     STATUS    RESTARTS   AGE
    default       test-petset-0               1/1       Running   0          7s
    default       test-petset-1               1/1       Running   0          3s

Here is what the PetSet test-petset instantiated earlier looks like:

    [root@tlx241 /]# kubectl describe petset test-petset
    Name: test-petset
    Namespace: default
    Image(s): gcr.io/google_containers/nginx-slim:0.8
    Selector: app=nginx
    Labels: app=nginx
    Replicas: 2 current / 2 desired
    Annotations: <none>
    CreationTimestamp: Sun, 14 Aug 2016 19:46:30 -0700
    Pods Status: 2 Running / 0 Waiting / 0 Succeeded / 0 Failed
    No volumes.
    No events.

Once a PetSet such as test-petset is instantiated, increasing the number of replicas (i.e. the number of pods started with that PetSet) instantiates more pods and binds more PersistentVolume Claims to the new pods:

    [root@tlx241 /]# kubectl patch petset test-petset -p'{"spec":{"replicas":"3"}}'
    "test-petset" patched

    [root@tlx241 /]# kubectl describe petset test-petset
    Name: test-petset
    Namespace: default
    Image(s): gcr.io/google_containers/nginx-slim:0.8
    Selector: app=nginx
    Labels: app=nginx
    Replicas: 3 current / 3 desired
    Annotations: <none>
    CreationTimestamp: Sun, 14 Aug 2016 19:46:30 -0700
    Pods Status: 3 Running / 0 Waiting / 0 Succeeded / 0 Failed
    No volumes.
    No events.

    [root@tlx241 /]# kubectl get pods
    NAME                        READY     STATUS    RESTARTS   AGE
    test-petset-0               1/1       Running   0          29m
    test-petset-1               1/1       Running   0          28m
    test-petset-2               1/1       Running   0          9s

After the patch is applied, the PetSet is running 3 pods.

When the above PetSet definition is patched to have one more replica, it introduces one more pod in the system. This in turn results in one more volume being provisioned on the Datera Data Fabric.
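
The scale-up above can be sketched in a few lines. This is only an illustration, with hypothetical function names: it assumes claims are named by joining the volumeClaimTemplate name, the PetSet name, and the Pet ordinal, which is the pattern visible in the `kubectl get pvc` listing (pv-claim-test-petset-0 and so on), and it lists which claims a change in replica count would newly put to use.

```python
from typing import List


def claim_name(template_name: str, petset_name: str, ordinal: int) -> str:
    """Claim name for one Pet, following the pattern seen in the listing above."""
    return f"{template_name}-{petset_name}-{ordinal}"


def claims_added_by_scaling(template_name: str, petset_name: str,
                            current: int, desired: int) -> List[str]:
    """Claims newly used when replicas grow from `current` to `desired`.

    Returns an empty list when the PetSet is not scaling up.
    """
    if current < 0 or desired < 0:
        raise ValueError("replica counts must be non-negative")
    return [claim_name(template_name, petset_name, ordinal)
            for ordinal in range(current, desired)]


if __name__ == "__main__":
    # Mirrors the patch in the post: replicas go from 2 to 3.
    print(claims_added_by_scaling("pv-claim", "test-petset", current=2, desired=3))
```

Each new claim corresponds to one more pod and, through the plugin, one more volume on the storage backend.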
In this way, volumes are dynamically provisioned and attached to pods as the PetSet scales up.

To support durability and consistency, if a pod moves from one minion to another, its volumes are attached (mounted) to the new minion node and detached (unmounted) from the old one, so that access to the data persists.

Conclusion

This demonstrates Kubernetes with Pet Sets orchestrating stateful and stateless workloads. While the Kubernetes community is working on expanding the FlexVolume framework’s capabilities, we are excited that this solution makes it possible for Kubernetes to be run more widely in datacenters.

Join and contribute: Kubernetes Storage SIG.

- Download Kubernetes
- Get involved with the Kubernetes project on GitHub
- Post questions (or answer questions) on Stack Overflow
- Connect with the community on the k8s Slack
- Follow us on Twitter @Kubernetesio for latest updates
Source: kubernetes
