Kubernetes: The Future of Deployment (bashton.com)
159 points by bashtoni on June 3, 2015 | hide | past | favorite | 76 comments


Worth remembering: Kubernetes was built for Google's needs, and Google runs with a shared network space on any given VM, assigning an entire /24 to the VM running docker. Each container gets one of those addresses. [1] This probably won't work for everyone - be sure to dig into the fine-grained details before drinking the Kool-Aid.
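For a concrete picture of that model (the subnet is made up for illustration): each node owns a whole /24, and every container draws its own routable address from it, so you get roughly 254 containers per node before needing another block.

```python
import ipaddress

# Toy sketch of the Google-style model described above: the node owns a
# whole /24 and each container gets its own routable IP out of it.
# The subnet here is arbitrary, just for the example.
node_subnet = ipaddress.ip_network("10.244.3.0/24")
container_ips = list(node_subnet.hosts())
print(len(container_ips), container_ips[0])
```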

They're also at least two build versions behind Docker.[2]

[1] https://github.com/GoogleCloudPlatform/kubernetes/blob/maste...

[2] https://github.com/GoogleCloudPlatform/kubernetes/blob/maste...


Agreed. I have more specifics of Google-centralism in my comment here:

https://news.ycombinator.com/item?id=9330049


In that post, you asked "Do you see any other providers here? https://github.com/GoogleCloudPlatform/kubernetes/tree/maste..."

Your implication was, at the time, there was only a GCE Persistent Disk provider.

Today, there's that, plus AWS EBS volume, git repo, GlusterFS, NFS, Ceph block device, iSCSI, as well as host path and empty directory.

Sounds like the product has evolved with a broad spectrum of support to me!


Yes, it has! Again, I don't think the design decisions were made to lock out other vendors. It's very reasonable to me that the first platform to be supported by Google engineers is Google's platform.

Side note: I should be a better citizen and link to specific commits next time.


Indeed the networking requirements of Kubernetes are very different from that of Docker.

However, it isn't too difficult to set up a unifying Layer 2 overlay network. I've had success with flannel[1]; it's pretty easy to set up and supports VxLAN.

[1] https://github.com/coreos/flannel
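For reference, flannel's configuration is just a JSON document written to etcd (by default under /coreos.com/network/config); a VxLAN setup looks roughly like this. The 10.1.0.0/16 range is an arbitrary example.

```python
import json

# Sketch of a flannel network config for the VxLAN backend mentioned
# above. flannel carves a per-host subnet out of the Network range.
config = {
    "Network": "10.1.0.0/16",        # overlay address space for the cluster
    "Backend": {"Type": "vxlan"},    # use VxLAN encapsulation between hosts
}
print(json.dumps(config))
```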


yes, flannel works very well in the VxLAN mode on AWS


I see Docker 1.6 in your [2] link, where do you see they are two versions behind Docker?


Current is Docker 1.6.2, released about 20 days ago. There also was a 1.6.1. That makes it two versions. Also, v1.6.0 was released almost two months ago. [0]

[0] https://github.com/docker/docker/blob/master/CHANGELOG.md


Understood, I was referring to major point releases vs minor. Thanks for clarifying.


Google runs docker in a VM? Why would you do that?

I understand you might want to play with it in a lab environment, but running it in production at scale (especially Google-scale) sounds very strange.


Users of the Google Cloud run Docker in VMs, since VMs are what the Google Cloud Platform sells.

(as does every public cloud provider [e.g. AWS])

For now, VMs are required to ensure a security barrier between different users' containers on the same physical machine. See some of Dan Walsh's posts on the subject (e.g. https://opensource.com/business/14/9/security-for-docker) for more context.


Google Container Engine runs containers that are in Docker format. The user does not have to deal with Docker or a VM.

https://cloud.google.com/container-engine/

There's also Amazon EC2 Container Service

http://aws.amazon.com/ecs/details/

So Google and Amazon don't just sell VMs. They sell "CMs" as well (Container Machines).


It's most likely that even the "CM"s from both providers are actually Virtual Machines running on a hypervisor running on bare metal. You just can't tell and don't need to care (for most workloads).


Yup, you can even SSH to them and poke around yourself.


How does that prove it's a VM? How do you know it's not cgroup isolation with a chroot jail? Also known as containers?


Because you're the one setting them up. Basically you run an Amazon-provided agent on an EC2 instance and ECS will see that instance as a host.

Also Amazon bills you for that EC2 instance as any other instance.

Personally I have a hard time understanding the benefits of running docker in a public cloud; you still run a VM and you still pay for that VM. It's just one extra abstraction layer which increases the complexity of your infrastructure and also reduces performance.

I do understand the benefits of using containers in your own data center, when you run them on bare hosts. There's simplicity and lower cost (because you don't have VMs), and with more resources free you can run more containers on a host than you could VMs.


> Personally I have a hard time understanding the benefits of running docker in a public cloud; you still run a VM and you still pay for that VM. It's just one extra abstraction layer which increases the complexity of your infrastructure and also reduces performance.

Simpler deployment and basically forcing "12-factor", as well as easier development environment setups. Nothing you can't achieve with other tooling, but it's nice to be able to guarantee that your dev environment is identical to your prod.


People use Docker in a public cloud (VM), primarily to simplify the deployment pipeline, not for LXC.

Given this, it actually makes sense to combine VM with Docker, check out www.hyper.sh


My problem is that I don't believe you can use docker without using containers. And if you want to simplify the pipeline, why not just use rpm-maven-plugin[1]? You can easily deploy including dependencies, it is fast, and you can easily upgrade or downgrade. And there's no need to figure out the complexities imposed by involving LXC.

[1] http://mojo.codehaus.org/rpm-maven-plugin/ (the website does not seem to be available at this moment due to recent CodeHaus shutdown)


Only on their public cloud. For internal workloads, they run everything in containers. Mostly cgroups containers.


I'd love for someone to explain how Kubernetes compares to Mesos. Every article I find on the subject says they are mutually beneficial, not competitors — that you would typically run Kubernetes as a Mesos framework — yet Kubernetes also seems like it duplicates much of Mesos' functionality on its own.


Kubernetes (k8s) makes for an amazing developer story. Mesos is much more bare metal, but the scheduler scales a loooot better than the still relatively immature k8s scheduling component. One of the original authors of mesos wrote a paper on scheduling: https://www.cs.berkeley.edu/~alig/papers/drf.pdf. Mesos is one of the first "two level" schedulers. I very highly recommend that you also read this article for an idea of why this is a good idea: http://www.umbrant.com/blog/2015/mesos_omega_borg_survey.htm...
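The core idea of that DRF paper fits in a few lines: repeatedly hand the next task to the user whose dominant share (their largest fraction of any single resource) is smallest. The numbers below are the paper's running example; the code itself is just an illustrative sketch, not Mesos's implementation.

```python
# Dominant Resource Fairness (DRF) sketch. Cluster: <9 CPU, 18 GB>.
TOTAL = {"cpu": 9.0, "mem": 18.0}

def dominant_share(usage):
    # the largest fraction of any single resource this user holds
    return max(usage[r] / TOTAL[r] for r in TOTAL)

def drf(demands, tasks_to_place):
    usage = {u: {r: 0.0 for r in TOTAL} for u in demands}
    tasks = {u: 0 for u in demands}
    for _ in range(tasks_to_place):
        # give the next task to the user with the smallest dominant share
        user = min(usage, key=lambda u: dominant_share(usage[u]))
        for r in TOTAL:
            usage[user][r] += demands[user][r]
        tasks[user] += 1
    return tasks

# Paper's example: user A's tasks need <1 CPU, 4 GB>, user B's <3 CPU, 1 GB>.
tasks = drf({"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}, tasks_to_place=5)
print(tasks)
```

DRF ends up giving A three tasks (memory-dominant) and B two tasks (CPU-dominant), equalizing their dominant shares.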

The k8s upstream was forward-thinking enough to make the scheduler parts of it pluggable, which allows the (imo) holy grail of something like this https://github.com/mesosphere/kubernetes-mesos. This gives you the nice logical "pod" description for services, running on the battle-tested mesos scheduler.

There are many 10k+ node bare metal mesos deployments (apple, twitter, etc). There aren't yet many kubernetes deployments of that scale. They truly are mutually beneficial. Mesos makes ops happy, and k8s makes devs happy. Together you have a relatively easy-to-set-up internal PaaS (your own heroku without a ton of work), more or less.

Disclaimer: I'm a heavy mesos and apache aurora user.


Thanks for the explanation. Sounds like Kubernetes should work just fine for small (<20 nodes) clusters, though.

I'm still not quite understanding what utility Kubernetes brings to the table if you can also use it with Mesos. If you use Mesos, why involve Kubernetes at all, and not some Mesos-specific framework like Marathon or Aurora? Is Kubernetes simply a competitor to those frameworks?

My concern about Mesos is mainly footprint and complexity. You need to run ZooKeeper, the master, the slaves, and then each framework. Only Mesos itself is written in C++; everything else is JVM, which is a pretty significant memory hog. By installing Mesos you just increased the complexity of the deployment/ops stack by a huge margin; you reap many benefits, of course, but Mesos is a lot more opaque and complex than a few daemons and some SSH-based scripts.


Kubernetes supports 100 nodes with ease, and we expect to handle much more than that very quickly. We just had to pick some target to start with.


Zookeeper is a hog, no one will disagree. But etcd is a very awesome yet very new technology. Even the protocol it implements, raft (which is awesome), is very new as a distributed consensus protocol. I'm not by any means throwing cold water on k8s; it is fantastic stuff. I only know that there are very very large production mesos clusters today, and the same cannot be said (yet) for k8s. Read those two links I posted in the parent though if you have the time. It will make a ton more sense.

That being said, k8s is sexy stuff; it just ties you to docker and, I believe, soon to rocket as well. When I first started evaluating both (around docker 1.2.x), docker was not super viable and was pretty buggy. With 1.6.x and newer, most of my original concerns cease to matter. They are both excellent technologies; use whatever works for your environment.


> (your own heroku with not a ton of work)

I've worked on Cloud Foundry, which is ostensibly a Heroku competitor.

The idea that you can replicate Heroku's full functionality "easily" is just silly.

Full-feature PaaSes do a lot of things, including a whole bunch of tedious nitty-gritty details.

We're well into the days of early maturity on PaaS products. You can install Cloud Foundry or OpenShift, or host on Heroku. Writing your own PaaS at this point is a bit like writing a custom operating system circa 1995. Unless you have a compelling reason to do so, you'd be utterly crazy to.


Can you suggest best resources (text/video) for learning about Kubernetes & Mesos? I use Docker & CoreOS all the time (love it) and I'm always trying to improve/learn something new. Can you tell how do you use Apache Aurora? What other interesting projects are worth learning about?


I recently spent some time playing with Mesos, Marathon (the web ui + api for scheduling long-running jobs / services) and Chronos (the web ui + api for scheduling cron / batch jobs).

I did this on my Macbook, using Virtualbox, Vagrant, and this:

https://github.com/mesosphere/playa-mesos

^ I started with that, and then installed Chronos with apt-get in addition.

Specifically, for launching Docker containers, this was useful:

https://docs.mesosphere.com/tutorials/launch-docker-containe...

I didn't try Aurora but it seems it'd be an alternative to Marathon + Chronos (Mesos calls all of these "frameworks").


how do you currently do service discovery w/ Docker and CoreOS without MESOS or Kubernetes?


Honest answer: I don't. In short: most of the things have fixed config that is loaded into an etcd cluster, and different services in Docker containers use it to communicate with other containers/services (something like {rabbitmq_host: "host address"}). In the project I'm working on right now I have just 10 boxes, which will probably grow to 20-30 in the coming months. It's nothing, I know, and as you can tell from the hacky nature of my setup I'm learning as I go about this, but I'm trying to incrementally improve different parts. Something like Kubernetes/Mesos seems like a next step.


In case you're interested, the next version of Docker (1.7) supports multi-host networking and dynamic service discovery out of the box.

The whole thing is pluggable and can use various distributed state backends (etcd, zookeeper etc) or IP connectivity backends (veth, macvlan, vxlan, openvpn etc) without changing your application. Service discovery uses DNS so you don't need to modify your application to take advantage of it. It's probably the most significant change to Docker in the last year.

This will make the integration with Kubernetes smoother. Currently Google is forced to rip out Docker's native networking stack because it is not flexible enough for their opinionated networking model. This causes many Docker applications to break in Kubernetes today. That problem should go away with Docker 1.7+ because Google-style networking can be expressed as a Docker plugin, which Kubernetes can load programmatically as part of its orchestration. An added benefit is that you can augment Docker with Google-style networking even if you use Kubernetes competitors like Mesos, Swarm, Cloudfoundry etc.

(EDIT added details more relevant to Kubernetes)


I haven't seen any examples of apps that broke because of the kubernetes network model - can you point me at them? I want to understand.


Hi Solomon, do you have an ETA for Docker 1.7? I see that it's currently in RC1. How soon will we be able to take advantage of libnetwork via Compose? Is there any documentation yet on how it will be done in Compose? Is there currently an easy way to try it via Boot2Docker?

Sorry about the barrage of questions. As you can probably guess, I'm very interested in trying this out.


As far as an ETA goes, it looks like they're shooting for 06/16/2015, as found on their project page here: https://github.com/docker/distribution/wiki/docker-1.7-Proje...


(Not the same person)

On a past project I did service discovery with Docker and CoreOS using SkyDNS with etcd. Services would register their network location in etcd and SkyDNS would translate those entries into DNS records. SkyDNS ran on every host and the app-level containers linked to the SkyDNS container.

If a container was moved across hosts, the etcd entry would be updated automatically and eventually the updated DNS entry for that service would propagate across the cluster.
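The registration step looks roughly like this in SkyDNS2's etcd layout: a service writes its location as JSON under a path that reverses the DNS name. The host, port, TTL, and domain below are made up for the example.

```python
import json

# Hypothetical SkyDNS-style registration: the etcd key path mirrors the
# DNS name in reverse, and the value carries the record plus a TTL so
# stale entries expire if the service dies.
key = "/skydns/local/cluster/web/api"              # served as api.web.cluster.local
record = {"host": "10.1.5.3", "port": 8080, "ttl": 30}
value = json.dumps(record)
print(key, "->", value)
```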


> the battle tested mesos scheduler.

Well, let's not get ahead of ourselves.


Thanks for sharing all that terrific information!

The only thing I'll add is that k8s isn't targeting the same scale as Mesos. Their current goal is to support up to 400-500 nodes, max.

Source: One of the core k8s developers I met at a CoreOS meetup in SF earlier this year. They said if I needed to go beyond 500 nodes that I should probably look at something else.


This is categorically incorrect. K8s will ultimately scale to any number of nodes; within 2015, it will scale to 1K+ nodes, as per the roadmap. It is modeled after Google's Borg system, and I encourage curious/interested folks to read the recent Borg paper [0], which also outlines lessons learned in running Borg at Google for nearly 15 years and managing many millions of machines.

[0] http://research.google.com/pubs/pub43438.html

Disclaimer: I work for Kismatic.


Calling that categorically incorrect is pretty disingenuous when we have Google engineers who are working on the project saying that it currently supports 100 nodes with ease, and that they /expect/ it to handle more in the (very near) future.

It might not be correct for much longer, but if it is the case now, how can you say it's categorically incorrect?


To clarify, this entire statement is categorically incorrect: "The only thing I'll add is that k8s isn't targeting the same scale as Mesos. Their current goal is to support up to 400-500 nodes, max."


My apologies -- I should have phrased as "as recently as 2-3 months ago".


> mesos and apache aurora

ohh i bet we're in the same building right now


(disclaimer: i work at Google and was one of the founders of the project)

when we were looking at building k8s our mission was to help the world move forwards to a more cloud native approach to development. by cloud native i mean container packaged, dynamically scheduled, micro-services oriented. we figured that in the end our data centers are going to be well suited to run cloud native apps, since they were designed from the ground up for this approach to management, and will offer performance and efficiency advantages over the alternatives. we also however recognized that no matter how cheap, fast and reliable the hosting offering is, most folks don't want to be locked into a single provider and Google in particular. we needed to do what we were doing in the open, and the thing that we built needed to be pattern compatible with our approach to management and quite frankly address some of the mistakes we had in previous frameworks (Borg mostly as a first system).

we looked really closely at Apache Mesos and liked a lot of what we saw, but there were a couple of things that stopped us just jumping on it. (1) it was written in C++ and the containers world was moving to Go -- we knew we planned to make a sustained and considerable investment in this and knew first hand that Go was more productive (2) we wanted something incredibly simple to showcase the critical constructs (pods, labels, label selectors, replication controllers, etc) and to build it directly with the communities support and mesos was pretty large and somewhat monolithic (3) we needed what Joe Beda dubbed 'over-modularity' because we wanted a whole ecosystem to emerge, (4) we wanted 'cluster environment' to be lightweight and something you could easily turn up or turn down, kinda like a VM; the systems integrators i knew who worked with mesos felt that it was powerful but heavy and hard to setup (though i will note our friends at Mesosphere are helping to change this).

so we figured we'd do something simple to create a first-class cluster environment for cloud native app management, 'but this time done right' as Tim Hockin likes to say every day.

now we really like the guys at Mesosphere and we respect the fact that Mesos runs the vast majority of existing data processing frameworks. by adding k8s on mesos you get the next-generation cloud native scheduler and the ability to run existing workloads. by running k8s by itself you get a lightweight cluster environment for running next gen cloud native apps.

-- craig


Thanks for this, it cleared up some confusion in my mind. A blogpost capturing these thoughts would be great.


"Adding k8s on mesos you get the next-generation cloud native scheduler and the ability to run existing workloads. by running k8s by itself you get a lightweight cluster environment for running next gen cloud native apps." @cmcluck

More references:

[1] https://mesosphere.com/blog/2015/04/22/making-kubernetes-a-f...

[2] http://blog.kubernetes.io/2015/04/kubernetes-and-mesosphere-...

[3] http://thenewstack.io/mesosphere-now-includes-kubernetes-for...


They are competitors, but life isn't simple. The reality is many frameworks currently only run on Mesos or YARN, and Kubernetes has not reached V1, so larger installations typically need multiple frameworks. Mesos is a proven way to run multiple frameworks side-by-side. eBay's YARN on Mesos is another example. But where this all leads remains to be seen.


Does anyone have resources about security/isolation best practices for running multiple applications on Kubernetes (or Mesos or similar)?

For instance in a non cloud-native app that runs in VM's, you might have one app per VM and have firewalls between different VM's that don't need to talk to each other. Then if a non-critical app got compromised and an attacker got remote execution or SQL injection or something they can't get to your other app servers or databases.

If all your apps are in a cluster, the non-critical compromised app might be running on the same host as a critical app, in which case the only thing keeping the attacker from your database credentials or other secrets is the docker container isolation which if I understand correctly is not assumed to be secure the way VM isolation is.

What are people doing to address this? Or are my assumptions wrong and it's not actually a problem to worry about? My initial impression with mesos was that you'd only use it if you're at big enough scale that you're running a huge number of instances of the same app or you're running a lot of different data processing tasks that all access the same data so no isolation is needed between them. Now I feel like I see Kubernetes being discussed frequently as a great way to run all your different microservices at any scale (e.g. "The Future of Deployment"), but I've never seen this aspect of security discussed.


You might prefer Cloud Foundry, which is switching its underlying container scheduling fabric to Lattice[1].

In particular, Cloud Foundry has more advanced security groups features, because it's mostly being marketed to enterprise customers.

Disclaimer: I have worked on CF and I work for a company which is a major contributor to CF.

[1] http://lattice.cf/


Lattice looks interesting, looking forward to checking it out more


Please check out the Secrets object in Kubernetes:

https://github.com/GoogleCloudPlatform/kubernetes/blob/maste...

which is designed to address some of this.


A slightly off-topic comment, but being an early-stage PhD in theoretical CS with my thesis topic on approximation algorithms for scheduling, I would like to know whether there are some theoretical problems related to these VM schedulers used in practice. If there is somebody knowledgeable about what is theoretically open (unknown tight approximation ratio, for instance) AND very useful to people building Kubernetes et al, I would be really happy to learn more.

(The natural advice is to "hit the books", actually read the papers related to Kubernetes and find out what is both theoretical and useful to this area. I intend to do that soon, but sifting through "practical papers" and looking for something interesting in theory is a lot of work, and I just hoped there might be somebody who could provide a shortcut.)


This may not be the level of specificity that you're looking for, but http://research.google.com/pubs/pub43438.html and http://research.google.com/pubs/pub41684.html are what I know google has published about their cluster management systems.


Lattice[1] "cheats" by turning the bin packing problem into an auction. Execution agents perform a simple local calculation and send in their bids. Then jobs are placed according to those bids.

This is because while you can try to produce an optimal distribution, it's a fool's errand in a distributed system. A perfect solver, given imperfect data, produces nonsense. GIGO, as they used to say.
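A toy version of that auction (the cell names and scoring rule are invented here, not Lattice's actual formula): each cell computes a bid from purely local state, and the job simply goes to the best bidder, with no global solver in the loop.

```python
# Each cell bids using only its own free capacity; the auctioneer picks
# the highest bid. A stale or imperfect view just produces a slightly
# worse placement, not nonsense.
cells = {"cell-a": 512, "cell-b": 2048, "cell-c": 1024}   # free memory, MB

def bid(free_mem, job_mem):
    # a cell that can't fit the job abstains
    return free_mem - job_mem if free_mem >= job_mem else float("-inf")

def place(job_mem):
    winner = max(cells, key=lambda c: bid(cells[c], job_mem))
    cells[winner] -= job_mem
    return winner

placements = [place(256) for _ in range(3)]
print(placements)
```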

Amit Gupta wrote some notes last year about it[2].

Disclaimer: I work for Pivotal.

[1] http://lattice.cf/

[2] http://blog.pivotal.io/pivotal-cloud-foundry/products/app-pl...


It was more of an amusement, but I used integer programming at a company hackathon to build a better image scheduler:

https://engineering.opendns.com/2015/05/06/docker-container-...

With a large, fairly homogenous environment it didn't outperform random assignment that well, though. It worked best with small, inhomogeneous loads.
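That "small, inhomogeneous loads" observation can be seen with a toy bin-packing comparison (capacities and sizes invented): first-fit decreasing stands in for a smarter solver, versus random placement.

```python
import random

CAP = 10                      # host capacity, arbitrary units
loads = [9, 8, 2, 1, 1, 7, 3]  # an inhomogeneous set of container loads

def ffd(loads):
    # first-fit decreasing: place biggest loads first into the first host they fit
    bins = []
    for l in sorted(loads, reverse=True):
        for b in bins:
            if sum(b) + l <= CAP:
                b.append(l)
                break
        else:
            bins.append([l])
    return len(bins)

def random_fit(loads, seed=0):
    # random order, random choice among hosts with room
    rnd = random.Random(seed)
    bins = []
    for l in rnd.sample(loads, len(loads)):
        fitting = [b for b in bins if sum(b) + l <= CAP]
        if fitting:
            rnd.choice(fitting).append(l)
        else:
            bins.append([l])
    return len(bins)

print(ffd(loads), random_fit(loads))
```

On homogeneous loads the two converge, which matches the parent's experience.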


Right, ILP is a great tool for solving NP-complete problems relatively fast (depending on the solver, but there are some very good ones out there). However, as a theoretical tool it probably is not that exciting unless you're ready to tackle P vs. NP this way. (Unless you move to semidefinite programming and the SDP hierarchies, where the progress is very exciting but not yet that applicable to scheduling, to the best of my knowledge.)

> With a large, fairly homogenous environment it didn't outperform random assignment that well, though. It worked best with small, inhomogeneous loads.

Yes, that's probably a piece of the puzzle that I don't have yet -- to know a theoretical model that is both useful in practice and at the same time greedy/randomized assignment is not "good enough" for practical uses.


Interesting, I didn't know "inhomogeneous" was a word: http://english.stackexchange.com/questions/194906/heterogene...


An interesting sidenote. The Meteor development group is contributing to Kubernetes and will be using it to help scale Meteor with their upcoming paid service Galaxy.


This is great to see independent software companies like Meteor embrace Kubernetes as the platform on which to build their next-generation services [0].

[0] http://info.meteor.com/blog/meteor-and-a-galaxy-of-container...


What's the deal with the name "Kubernetes"? Does it mean anything, or have some tech significance, or is it really just because it basically means "ruler" in Greek?


It means "Helmsman" in ancient Greek. Similarly it's related to the word "Governor"

e.g., "kubernan" in ancient Greek means "to steer"; "kubernetes" is "helmsman".

"gubernare" means "to steer" or "to govern" in Latin; "gubernator" is "governor" in Latin.

Which then leads into the modern word "Gubernatorial", et al.


It's also a pun on Borg Cubes.


Um... in modern Greek too, not just ancient Greek :)


It is also related to the source of the word "cybernetics".


Correct, which is how it ties into the "Borg cube" pun. Cybernetics was the term chosen by Norbert Wiener in the book "Cybernetics," and he traced the word's origin to the Greek "kubernetes"; it related to his first example of a cybernetic system, the self-correcting steam-controlled rudder on a ship [http://en.wikipedia.org/wiki/Steering_engine].

(Why the pun? Kubernetes was heavily inspired / guided by Google's internal scheduling tool, which was named Borg (http://blog.kubernetes.io/2015/04/borg-predecessor-to-kubern...).)


According to the IO 2014 talk about docker/kubernetes, it was chosen because it means something like pilot/helmsman.


"κυβερνήτης" (kubernetes) is Greek for "pilot" or "helmsman of ship".


Pretty sure it's due to it meaning in a literal sense, "Helmsman".


(disclosure: i work at Google and picked the name)

comments above are right -- we wanted to stick to the nautical theme that was emerging in containers and 'kubernetes' (greek for helmsman) seemed about right. the fact that the word has strong roots in modern control theory was nice also.

fun fact: we actually wanted to call it 'seven' after seven-of-nine (a more attractive borg) but for obvious reasons that didn't work out. :)


The GFS cell used back in 2004 for staging Borg binaries to production was /gfs/seven/, for the same reason :-)


bundling with (unholily immature) SDN is the most damning thing for its adoption. It is thought to be needed for "live migration", but I don't see myself needing that anytime soon because we run on virtual machines anyway.

IaaS providers are not going away; paying the cost of SDN now for features that don't even exist yet is insane.


(kubernetes contributor here)

SDN isn't required for k8s. What is required is that each Pod (group of containers) gets its own IP address, and that the IP address is routable in the cluster. In many cases, the easiest way to achieve this is via an SDN, but it is also achievable by programming traditional routers.

The reason for wanting an IP address per pod is that it eliminates the need for port mangling, which dramatically simplifies wiring applications together.


All applications were already designed to be port-based. I don't see how this would drastically change that.


the problem with port mangling is that your application starts running on random ports, so in addition to requiring discovery for IP addresses, you now also have to do discovery for ports, which pretty much requires custom code and infrastructure linked into your binaries (how do you convince nginx/redis/... to use your lookup service for ports?)

And ports are different between different replicas of your service, since they're chosen at random during scheduling.

It also makes ACLs and QoS harder to define for the network, since you don't have a clean network identity (e.g. an IP address) for each application.
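A toy contrast of the two models (addresses invented): with host-port mapping a client must discover an (IP, port) pair per replica, while with IP-per-pod every replica serves on the service's well-known port, so a name-to-IP lookup is enough.

```python
import random

# Host-port mangling: each replica of a service lands on a random high
# port on its host, so discovery must hand out (ip, port) pairs.
mangled = [("192.168.1.%d" % n, random.randint(32768, 61000)) for n in (10, 11)]

# IP-per-pod: each replica owns an IP and listens on the service's
# well-known port, so DNS (name -> IP) alone suffices.
per_pod = [("10.244.%d.7" % n, 6379) for n in (1, 2)]

ports_in_use = {port for _, port in per_pod}
print(ports_in_use)
```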


Wow, you could totally s/kubernetes/juju/ in this article and still be 100% correct. (https://jujucharms.com for those not aware of juju)


Not cool. At least disclose that you're one of the devs behind Juju.


Sorry, I didn't think of it. I've mentioned it multiple times on HN, but you're right, I should have added a disclosure.

Seriously didn't mean to be pushing Juju, I was just surprised at how similar it was to Juju. I had always sort of assumed it was Google cloud only, and/or containers only, etc.



