OK ... so that makes a little more sense. It really struck me that I just didn't know how to trust the persistence in a world where the application could run on any virtual cluster, and even across different sets of physical hardware. (Looking at CoreOS, for example, as a system designed to appear as "one machine" despite spanning multiple hardware instances.)
So when I specify the VOLUME (in a docker file/kubernetes-pod-spec) this is managed by the system.
And with respect to ensuring uninterrupted service on the database layer, how are ongoing changes synced? Or is that something I must design for by adding a queueing layer?
To make that concrete: I have one pod with MySQL, and the associated VOLUME is configured on the KUBERNETES instance.
I tell K to scale MySQL to 2 instances.
It points the second instance to the same physical files and starts up?
So VOLUMES live outside of KUBERNETES and do not spin up and down transiently?
Can 2 instances of a database server use the same files without stepping on each other?
How does the VOLUME scale without becoming the new single point of failure?
Maybe that is the point where you use massive RAID striping and replication to scale storage? (That's far from a new concept and is a pretty stable tech).
If I'm on the right track here I might almost be ready to say I understand and trust persistence!
Your limitation is your database. Getting MySQL or Postgres to run on multiple machines is not trivial. You can do Master/Slave with failover (manual or automated), or try to find a clustered SQL database (Galera, or what I was working on at FathomDB).
If you have a single node database, Kubernetes can make sure that only a single pod is running at any one time, attached to your volume. It will automatically recover if e.g. a node crashes.
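As a sketch of that single-node case: a replication controller pinned to one replica, mounting a persistent disk, so Kubernetes restarts the lone MySQL pod elsewhere and reattaches the same volume if a node dies. Names, the password, and the GCE disk are hypothetical illustrations, not something from the thread.

```yaml
# Hypothetical single-node MySQL: Kubernetes keeps exactly one pod alive,
# always attached to the same pre-created disk (GCE used for illustration).
apiVersion: v1
kind: ReplicationController
metadata:
  name: mysql
spec:
  replicas: 1          # never scale this above 1 for a single-node database
  selector:
    app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:5.6
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: changeme        # placeholder
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-data
        gcePersistentDisk:
          pdName: mysql-disk    # disk created outside Kubernetes; outlives any pod
          fsType: ext4
```

If the node crashes, the controller schedules a replacement pod and the disk is detached and reattached on the new node; the data survives because the disk, not the pod, owns it.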
If you have a distributed SQL database, Kubernetes can make sure the right volumes get attached to your pods. (The syntax is a little awkward right now, because you have to create N replication controllers, but PetSets will fix this in 1.3). Each pod will automatically be recovered by Kubernetes, but it is up to your distributed database to remain available as the pods crash and restart.
In short: Kubernetes does the right thing here, but Kubernetes can't magically make a single-machine database into a distributed one.
Kubernetes is integrated with a lot of IaaS providers to get you a block storage volume to persist your data on. Once you request a persistent volume in a container, it will provision the volume and attach it to the node where the container is scheduled. It is then mounted (formatted if empty) into the container. When the container is killed and restarted on another node, the volume moves with the container to that node.
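That request/provision/attach flow looks roughly like this: the pod only names a claim, and the cluster (via the IaaS integration) puts a real block device behind it. The names and size here are made-up examples.

```yaml
# Hypothetical claim: ask the cluster for 10Gi of block storage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
  - ReadWriteOnce        # attachable to one node at a time
  resources:
    requests:
      storage: 10Gi
---
# The pod references the claim, not the underlying disk; if the pod is
# rescheduled, the volume is detached and reattached on the new node.
apiVersion: v1
kind: Pod
metadata:
  name: db
spec:
  containers:
  - name: mysql
    image: mysql:5.6
    volumeMounts:
    - name: data
      mountPath: /var/lib/mysql
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim
```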
Now, when you want clustering of things like MySQL/Mongo/Elasticsearch/RabbitMQ/etc., it's a bit more complex, because they bring their own sharding/clustering concepts, which you have to implement on top of Kubernetes. So you won't be able to simply scale MySQL up via "kubectl scale rc --replicas=5"; you will have to implement a specific clustering solution, with five unique MySQL pods, each with its own volume. For MySQL there is Vitess, which is an attempt to build such an abstraction on top of Kubernetes.
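Concretely, instead of one scaled-up controller, the clustering pattern ends up being N controllers, each pinned to its own volume. A condensed sketch of one member (names hypothetical; the actual MySQL replication config is omitted, since that part is the database's problem, not Kubernetes'):

```yaml
# Hypothetical: one controller per cluster member, each with a unique claim.
# Repeat for mysql-1, mysql-2, ... — this is the boilerplate PetSets is meant to reduce.
apiVersion: v1
kind: ReplicationController
metadata:
  name: mysql-0
spec:
  replicas: 1
  selector:
    app: mysql
    instance: "0"
  template:
    metadata:
      labels:
        app: mysql
        instance: "0"
    spec:
      containers:
      - name: mysql
        image: mysql:5.6
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: mysql-data-0   # unique claim per instance, never shared
```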