Friday, June 27, 2014

How is a multi-host container service different from a multi-host VM service?

Warning:  I am writing this blog post without knowing the answer to the question I am asking in the title.  I am writing this post to force myself to articulate a question I've personally been struggling with as we move towards what we all want - containers with standard formats changing how we handle many cases in the cloud.  Also, I know there are folks that have thought about this for FAR longer than myself and I hope they comment or write alternative blogs so we can all learn together.

That said, in the time leading up to Dockercon and since, I have seen what seem to be divergent schools of thought that, when I step back, aren't so divergent.  Or maybe they are?  Let's see.

On one hand, we have existing systems on IaaS clouds using virtual machines that have everything controlled by APIs, with cloud infrastructure services that help build up an IaaS++ environment.  I have specifically avoided using the word PaaS, as I define PaaS as something that tends to abstract IaaS to the point where IaaS concepts can't be directly seen and controlled.  I know that not everyone accepts such a definition of PaaS, but I use it as a means to help explain my thoughts (please don't comment exclusively on this definition, as it's not the main point of this blog post).  By IaaS++ I mean an environment that adds to IaaS, offering services like continuous delivery workflows, high availability fault domains/automatic recovery, cross instance networking with software defined networking security, and operational visibility through monitoring.  And by not calling it PaaS, I suggest that the level of visibility into this environment includes IaaS concepts such as (VM) instances reachable through ssh or other commonly used *nix tools, full TCP network stack access, and full OSes with process and file system control.

On the other hand, we have systems growing around resource management systems and schedulers that treat "The Datacenter as a Computer" and are predominantly tied to containers.  I'll admit that I'm only partially through the book on the subject (now in its 2nd edition).  Some of the open source systems implementing this datacenter-as-a-computer / warehouse-scale-machine approach are YARN (for Hadoop), CoreOS/Fleet, Mesos/Marathon, and Google Kubernetes.

At Dockercon, IBM (and yours truly) demoed a Docker container deployment option for the IBM SoftLayer cloud.  We used our cloud services fabric (partially powered by NetflixOSS technologies) on top of this deployment option as the IaaS++ layer.  Given that IBM SoftLayer and its current API don't support containers as a deployment option, we worked to implement some of the ties to the IaaS technologies as part of the demo by reusing the Docker API.  Specifically, we showcased an autoscaling service for automatic recovery, cross availability zone placement, and SLA based scaling.  Next, we used the Docker private registry alongside the Docker Hub public index for image management.  Finally, we did specific work to natively integrate the networking of containers into the SoftLayer network.  Doing this networking work was important, as it allowed us to leverage existing IaaS provided networking constructs such as load balancers and firewalls.
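To make the "reusing the Docker API" part more concrete, here is a rough sketch (not the actual demo code) of how an autoscaler-like component could create and start containers across hosts through the standard Docker remote API endpoints.  The host addresses, image name, and the zone-spreading loop are all made up for illustration:

```python
# Hypothetical sketch: driving container placement through the Docker remote API.
# The endpoints (/containers/create, /containers/<id>/start) are the standard
# Docker remote API; the hosts and image are invented for this example.
import requests

DOCKER_HOSTS = ["http://10.0.1.10:2375", "http://10.0.2.10:2375"]  # one host per availability zone

def launch(host, image):
    # Create a container, then start it.
    created = requests.post(host + "/containers/create",
                            json={"Image": image}).json()
    requests.post(host + "/containers/%s/start" % created["Id"])
    return created["Id"]

# Spread instances across zones, much like the autoscaler's placement policy.
for host in DOCKER_HOSTS:
    print(launch(host, "acmeair/webapp:latest"))
```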

Last night I watched the Kubernetes demo at Google I/O by Brendan Burns and Craig McLuckie.  The talk kicks off with an overview of the container-optimized Google Compute Engine VM and then covers the Kubernetes container cluster management open source project, which includes a scheduler for long running processes, a labeling system that is important for operational management, a replication controller to scale and auto recover labeled processes, and a service abstraction across labeled processes.

I encourage you to watch the two demo videos before proceeding, as I don't want to force you into thinking only from my conclusions.  Ok, so now that you've watched the videos yourself, let me use them to look at some use case comparison points (each link below jumps to the comparable place in each video):

Fast development and deployment at scale



Brendan demonstrated rolling updates on the cloud.  In the IBM demo, we showed the same, but as an initial deployment on a laptop.  As you see later in the demo, due to the use of Docker, running on the cloud is exactly the same as on the laptop.  The IBM cloud services fabric devops console, NetflixOSS Asgard, also has the concept of rolling updates as well as the demonstrated initial deployment.  Due to Docker, both demos use essentially the same approach to image creation/baking.
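For a sense of what a rolling update loop looks like in either system, here is a minimal, purely illustrative sketch - not Asgard's or Kubernetes' implementation - that replaces instances one at a time and waits for health before moving on.  All names and the health check are stand-ins:

```python
# Toy rolling update: swap one instance at a time, gate on a health check.
import time

instances = ["web-1", "web-2", "web-3"]
new_image = "acmeair/webapp:v2"          # hypothetical new image tag

def replace(instance, image):
    print("stopping", instance, "and relaunching from", image)
    return instance + "-v2"              # stand-in for the new container id

def healthy(instance):
    time.sleep(1)                        # stand-in for polling a health URL
    return True

for i, old in enumerate(instances):
    new = replace(old, new_image)
    while not healthy(new):
        time.sleep(5)
    instances[i] = new

print("rolling update complete:", instances)
```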

Automatic recovery


I like how Brendan showed the failure and recovery through a nice UI, as compared to me watching log files of the health manager.  Other than presentation, the use case and functionality were the same.  The system discovered a failed instance and recovered it.
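Regardless of the UI, the underlying pattern in both systems is a reconcile loop.  The following is a self-contained toy sketch of that health manager idea - not code from either demo - where the instance set, the simulated failure, and the "relaunch" are all made up; a real implementation would query the Docker API or the service registry instead:

```python
# Toy reconcile loop: compare desired versus observed instances and replace
# whatever has died.
import random, time

desired = 3
running = {"i-1", "i-2", "i-3"}

def observe():
    # Randomly drop an instance to simulate the kill scenario in the demos.
    if running and random.random() < 0.3:
        running.discard(next(iter(running)))
    return set(running)

def reconcile():
    healthy = observe()
    for _ in range(desired - len(healthy)):
        new_id = "i-%d" % random.randint(100, 999)
        running.add(new_id)              # stand-in for relaunching a container per HA policy
        print("recovered missing instance as", new_id)

for _ in range(5):
    reconcile()
    time.sleep(1)
```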

Service registration

Brendan talked about how Kubernetes offers the concept of services based on labeling.  Under the covers this is implemented by a process that performs selects against the labeled containers and updates an etcd service registry.  In the cloud services fabric demo, we talked about how this is done with NetflixOSS Eureka in a more intrusive (but maybe more app-centric and valuable) way.  I have also hinted at how important it is to consider availability in your service discovery system.
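As a rough illustration of the Kubernetes-style mechanism described above, here is a hypothetical registrar that selects containers by label and publishes the result to etcd's v2 keys API.  The etcd address and the container list are stand-ins; a real registrar would ask the cluster for its labeled containers:

```python
# Hypothetical registrar: select containers by label, publish endpoints to etcd.
import json, time, requests

ETCD = "http://127.0.0.1:4001"           # made-up etcd address

def select_by_label(label):
    # Stand-in for querying the scheduler/Docker hosts for labeled containers.
    containers = [
        {"labels": {"name": "frontend"}, "endpoint": "10.0.1.11:8080"},
        {"labels": {"name": "frontend"}, "endpoint": "10.0.2.12:8080"},
        {"labels": {"name": "backend"},  "endpoint": "10.0.1.13:9090"},
    ]
    return [c["endpoint"] for c in containers if c["labels"].get("name") == label]

while True:
    endpoints = select_by_label("frontend")
    # etcd v2 keys API: PUT a value under a key.
    requests.put(ETCD + "/v2/keys/services/frontend",
                 data={"value": json.dumps(endpoints)})
    time.sleep(10)
```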

Service discovery and load balancing across service implementations

Brendan talked about how, in Kubernetes, this is currently handled by a basic round-robin load balancer.  Under the covers, each Kubernetes node starts this load balancer, and any defined service gets started on the load balancer across the cluster, with information being passed to client containers via two environment variables: one for the address of the Kubernetes local node load balancer, and one for the port assigned to a specific service.  In the cloud services fabric, this is handled by Eureka-enabled clients (for example NetflixOSS Ribbon for REST), which do not require a separate load balancer and are more direct, and/or by the similar NetflixOSS Zuul load balancer in cases where the existing clients can't be used.
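Here is a small sketch contrasting the two client experiences.  The environment variable names follow the pattern Brendan describes (local proxy address plus assigned service port) but are illustrative rather than authoritative, and the Ribbon-style half is simplified to a plain round-robin over a registry-provided instance list:

```python
# Two discovery styles, side by side (all names and addresses are made up).
import itertools, os

# Kubernetes-style: read the injected environment variables and talk to the
# node-local round-robin proxy.
proxy_addr = os.environ.get("FRONTEND_SERVICE_HOST", "127.0.0.1")
proxy_port = os.environ.get("FRONTEND_SERVICE_PORT", "80")
frontend_url = "http://%s:%s/" % (proxy_addr, proxy_port)

# Eureka/Ribbon-style: the client holds the full instance list (from the
# registry) and round-robins directly, with no separate load balancer in the path.
instances = itertools.cycle(["10.0.1.11:8080", "10.0.2.12:8080"])

def next_backend():
    return "http://%s/" % next(instances)

print(frontend_url, next_backend(), next_backend())
```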

FWIW, I haven't seen end to end service registration/discovery/load balancing specifically supported in non-Kubernetes resource managers/schedulers.  I'm sure you could build something similar on top of Mesos/Marathon (or people already have) and CoreOS/etcd, but I think Kubernetes' concepts of labels and services (much like Eureka) are right in starting to integrate the concept of services into the platform, as they are so critical in microservices based devops.

I could continue to draw comparison points for other IaaS++ features like application centric metrics, container level metrics, dynamic configuration management, other devops workflows, remote logging, service interaction monitoring, etc., but I'll leave that to the reader.  My belief is that many of these concepts will be implemented in both approaches, as they are required to run an operationally competent system.

Also, I think we need to consider tougher points like how this approach scales (in both demos, under the covers, networking was implemented via a subnet per Docker host, which wouldn't necessarily scale well), the approach to cross host image propagation (again, both demos used a less than optimal way to push images across every node), and integration with other important IaaS networking concepts (such as external load balancers and firewalls).

What is different?

The key difference that I see in these systems is terminology and implementation.

In the IBM demo, we based the concept of a cluster on what Asgard defines as a cluster.  That cluster definition and state is based on multiple separate, but connected by version naming, auto scaling groups.  It is then the autoscaler that decides placement based not only on "resource availability", but also on high availability (spreading deployments across distinct failure domains) and locality policies.  Most everyone is familiar with the high availability side of these policies from existing IaaS - in SoftLayer we use datacenters or pods, and in other clouds the concept is called "availability zones".  Also, in public clouds, the policy for co-location is usually called "placement groups".

Marathon (a long running scheduler on top of the Mesos resource manager) offers these same concepts through constraints.  Kubernetes doesn't seem, today, to offer these concepts, likely due to its initial focus on smaller scenarios.  Given its roots in Google Omega/Borg, I'm sure there is no reason why Kubernetes couldn't eventually expose the same policy concepts within its replication controller.  In fact, at the end of the Kubernetes talk, there is a question from the crowd on how to make Kubernetes scale across multiple Kubernetes configurations, which could just as well have been asked from a high-availability perspective.
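For reference, here is roughly how such a placement policy is expressed to Marathon - an app definition with a constraints field posted to its /v2/apps endpoint.  The app itself (id, command, sizing, Marathon address) is a made-up example; only the constraints idea is the point:

```python
# Hedged sketch of a Marathon app definition using a placement constraint.
import requests

app = {
    "id": "acmeair-webapp",                       # hypothetical app id
    "cmd": "python -m SimpleHTTPServer 8080",     # placeholder command
    "instances": 3,
    "cpus": 0.5,
    "mem": 256,
    # Spread instances across hosts, analogous to spreading an ASG across zones.
    "constraints": [["hostname", "UNIQUE"]],
}

requests.post("http://marathon.example.com:8080/v2/apps", json=app)
```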

So to me, the concept of an autoscaler and its underlying implementation seems very similar to the concept of a resource manager and scheduler.  I wonder: if public cloud autoscalers were open sourced, would they be called resource managers and long running schedulers?

The reason I ask all of this is that, as we move forward with containers, I think we might be tempted to build another cloud within our existing clouds.  I also think the Mesos and Kubernetes technologies will have people building clouds within clouds until cloud providers natively support containers as a deployment option.  At that point, will we have duplication of resource management and scheduling if we don't combine the concepts?  Also, what will people do to integrate these new container deployments with other IaaS features like load balancers, security groups, etc.?

I think others are asking the same question as well.  As shown in the IBM Cloud demo, we are thinking through this right now.  We have also experimented internally with OpenStack deployments of Docker containers as the IaaS layer under a similar IaaS++ layer.  Those experiments led to a similar cloud container IaaS deployment option that leverages existing OpenStack approaches for resource management and scheduling, as compared to creating a new layer on top of OpenStack.  Also, there is a public cloud that likely considered this a long time ago - Joyent.  Joyent has had SmartOS zones, which are similar to containers, under its IaaS API for a long time without the need to expose the formal concepts of resource management and scheduling to its users.  Also, right at the end of the Kubernetes demo, someone in the crowd asks the same question.  I took this question to ask: when will Compute Engine support container deployment this way without having a user set up their own private set of Kubernetes systems (and possibly without having to consider resource management/scheduling with anything more than policy)?

As I said in the intro, I'm still learning here.  What are your thoughts?