Monday, May 5, 2014

Cloud Services Fabric (and NetflixOSS) on Docker

At IBM Impact 2014 last week we showed the following demo:

Direct Link (HD Version)

The demo showed an end to end NetflixOSS based environment running on Docker on a laptop.  The components running in Docker containers shown included:
  1. Acme Air Web Application - The front end web application that is NetflixOSS enabled.  In fact, this was run as a set of containers within an auto scaling group.  The web application looks up (in Eureka) ephemeral instances of the auth service micro-service and performs on-instance load balancing via Netflix Ribbon.
  2. Acme Air Auth Service - The back end micro-service application that is NetflixOSS enabled.  In fact, this was run as a set of containers within an auto scaling group.
  3. Cassandra - This was the Acme Air Netflix port that runs against Cassandra.  We didn't do much with the data store in this demo, other than making it into a container.
  4. Eureka - The NetflixOSS open source service discovery server.  The ephemeral instances of both the web application and auth service automatically register with this Eureka service.
  5. Zuul - The NetflixOSS front end load balancer.  This load balancer looks up (in Eureka) ephemeral instances of the front end web application instances to route all incoming traffic across the rest of the topology.
  6. Asgard - The NetflixOSS devops console, which allows an application or micro-service implementer to configured versioned clusters of instances.  Asgard was ported to talk to the Docker remote API as well as the Auto scaler and recovery service API.
  7. Auto scaler and recovery service.  Each of the instances ran an agent that communicates via heartbeats to this service.  Asgard is responsible for calling API's on this Auto scaler to create clusters.  The auto scaler then called Docker API's to create instances of the correct cluster size.  Then if any instance died (stopped heartbeating), the auto scaler would create a replacement instance.  Finally, we went as far as implementing the idea of datacenters (or availability zones) when launching instances by tagging this information in a "user-data" environment variable (run -e) that had an "az_name" field.
You can see the actual setup in the following slides:

Docker Demo IBM Impact 2014 from aspyker

Once we had this setup, we can locally test "operational" scenarios on Docker including the following scenarios:
  1. Elastic scalability.  We can easily test if our services can scale out and automatically be discovered by the rest the environment and application.
  2. Chaos Monkey.  As shown in the demo, we can test if killing single instances impacted overall system availability and if the system auto recovered a replacement instance.
  3. Chaos Gorilla.  Given we have tagged the instances with their artificial data center/availability zone, we can kill all instances within 1/3 of the deployment emulating a datacenter going away.  We showed this in the public cloud SoftLayer back at dev@Pulse.
  4. Split Brain Monkey.  We can use the same datacenter/availability tagging to isolate instances via iptables based firewalling (similar to Jepsen).
We want to use this setup to a) help our Cloud Service Fabric users understand the Netflix based environment more quickly b) allow our users to do simple localized "operational" tests as listed above before moving to the cloud and c) use this in our continuous integration/delivery pipelines to do mock testing on a closer to production environment than possible on bare metal or memory hungry VM based setups.  More strategically, this work shows that if clouds supported containers and the Docker API we could move easily between a NeflixOSS powered virtual machine and container based approach.

Some details of the implementation:

The Open Source

Updated 2014/06/09 - This project is now completely open source.  For more details see the following blog entry.

The Auto Scaler and agent

The auto scaler and on instance agents talking to the auto scaler being used here are working prototypes from IBM research.  Right now we do not have plans to open source this auto scaler which makes open sourcing the entire solution impossible.  The work to implement an auto scaler is non-trivial and was a large piece of work.

The Asgard Port

In the past, we had already ported Asgard to talk to IBM's cloud (SoftLayer) and its auto scaler (RightScale).  We extended this porting work to instead talk to our Auto scaler and Docker's remote API.  The work was pretty similar and therefore easily achieved in a week or so of work.

The Dockerfiles and their containers

Other than the aforementioned auto scaler and our Asgard port, we were able to use the latest CloudBees binary releases of all of the NetflixOSS technologies and Acme Air.  If we could get the auto scaler and Asgard port moved to public open source, anyone in the world could replicate this demo themselve easily.  We have a script to compile all of our Docker files (15 in all, including some base images) and it takes about 15 minutes on a decent Macbook.  This time is spent mostly in download time and compile steps for our autoscaler and agent.

Creation of these Dockerfiles took about a week to get the basic functionality.  Making them work with the autoscaler and required agents took a bit longer.

We choose to run our containers as "fuller" OS's vs. single process.  On each node we ran the main process for the node, a ssh daemon (to allow more IaaS like access to the filesystem) and the auto scaling agent.  We used supervisord to allow for easy management of these processes inside of Ubuntu on Docker.

The Network

We used the Eureka based service location throughout with no changes to the Eureka registration client.  In order to make this easy to humans (hostnames vs. IP's) we used skydock and skydns to give each tier of the application it's own domain name using --dns and --name options when running containers to associate incremental names for each cluster.  For example, when starting two cassandra nodes, they would show up in skydns as and  We also used routing and bridging to make the entire environment easy to access from the guest laptop.

The Speed

The fact that I can start this all on a single laptop isn't the only impressive aspect.  I ran this with my virtual box being set to three gigs of memory for the boot2docker VM.  Running the demo spins the cooling fan as this required a good bit of CPU, but in terms of memory it was far lighter than I've seen in other environments.
The really impressive aspect is that I can in 90 seconds (including a 20 second sleep waiting for Cassandra to peer) restart an entire environment including two auto scaling clusters of two nodes each and the other five infrastructural services.  This includes all the staggered starts required for starting the database, loading it with data, starting service discovery and dns, starting an autoscaler and defining the clusters to the auto scaler and the final step of them all launching and interconnecting.

Setting this up in a traditional cloud would have taken at least 30 minutes based on my previous experience.

I hope this explanation will be of enough interest to you to consider future collaboration.  I also hope to get up to Dockercon in June in case you also want to talk about this in person.

The Team

I wanted to give credit where credit is due.  The team of folks working on this included folks across IBM developer and research including Takahiro Inaba, Paolo Dettori, and Seelam Seetharami.