Tuesday, January 7, 2014

Experiments With Docker For Acme Air Dev

Over the holidays, I decided to do a side project to learn a few things:

1) Strategies and technology for continuous delivery
2) Cassandra and Astyanax
3) Docker for local laptop development

I did all of these projects together with the goal of having a locally developed and tested "cloud" version of Acme Air (NetflixOSS enabled) that, upon code commit, produced wars that could be fed into an Aminator-like build or Chef scripts and immediately deployed into "production". I had wanted to learn Cassandra for some time, and it paired well with continuous delivery using CloudBees for continuous build, since the Cassandra branch could use only openly available dependencies (OSS on Maven Central). For this blog post I'll be focusing on Docker local laptop cloud development, but I thought it would be good for you to understand that I did these all together.  If you're interested in picking up from this work, the CloudBees CI builds are here.

I decided to make the simplest configuration of Acme Air which would be a single front end web app tier (serving Web 2.0 requests to browsers/mobile) connecting to a single back end micro service for user session creation/validation/management (the "auth" service). Then I connected it to a pretty simple Cassandra ring of three nodes.

Here you can see the final overall configuration as run on my Macbook Pro:


This is all running on my Macbook Pro.  The laptop runs a virtual machine managed by Vagrant, and that virtual machine runs Docker containers for the actual cluster nodes.  In a testing setup, I usually have five Docker containers (three Cassandra nodes, one node to run the data loader part of Acme Air and ad hoc Cassandra cqlsh queries, one node to run the auth service micro service web app, and one node to run the front end web app).  Starting up this configuration takes about three minutes and about ten commands.  I could likely automate this all down to one command, and I would suggest anyone following this for their own development shop perform such automation.  If automated, the startup time would be less than 30 seconds.
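As a rough sketch of what that one command could look like, here is the shape of such a startup script.  The image names (cassandra-node, acmeair-loader, acmeair-auth, acmeair-webapp) and the loader's hostname are placeholders for my local setup, and the per-container DNS/pipework wiring described in the next paragraph is omitted for brevity:

#!/bin/bash
# Sketch only: bring up the five-container Acme Air cluster inside the Vagrant VM
for i in 1 2 3; do
  docker run -d -h cass$i cassandra-node              # three Cassandra nodes, cass1 is the seed
done
docker run -d -h cass250 acmeair-loader               # data loader / ad hoc cqlsh node
docker run -d -h cass252 acmeair-auth                 # auth service micro service
docker run -d -h cass251 -p 8080:8080 acmeair-webapp  # front end web app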

As a base Docker image, I went with Nicolas Favre-Felix's Docker Cassandra automation.  He put together a pretty complete system to allow experimentation with Cassandra 1.2.x and 2.0.x.  In making this work, I think he created a pretty general purpose networking configuration for Docker that I'll explain.  Nicolas used pipework and dnsmasq to provide a Docker cluster with well known static hostnames and IP addresses.  When any Docker container is started, he forced it to have a hostname of "cassX" with X being between 1 and 254 (using the docker run -h option).  He did that so he could have the Cassandra ring always start at "cass1" and have all other nodes (and clients) know that the "cass1" hostname is the first (seed) Cassandra node.  Then he used pipework to add an interface to each node with the IP address of 192.168.100.X.  In order to make these hostnames resolve to these IP addresses across all nodes, he used dnsmasq with every 192.168.100.X address mapped to the corresponding cassX hostname.  Further, to make non-Docker hostnames resolvable, dnsmasq was configured to forward other lookups to the well known Google nameservers, and the container itself was configured to use the dnsmasq nameserver locally (using the docker run -dns 127.0.0.1 option).  With all of this working, it is easy to start any container on any IP address/hostname with all other containers being able to address that hostname statically.
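Put together, the wiring for a single node looks roughly like this.  This is a sketch of the approach rather than Nicolas' exact scripts; the bridge name (br1) and image name are assumptions:

# Sketch: start node 5 with a well known hostname and its own local dnsmasq for name resolution
cid=$(docker run -d -h cass5 -dns 127.0.0.1 cassandra-node)

# From the Vagrant VM, give the container a second interface on the cluster "VLAN"
pipework br1 $cid 192.168.100.5/24

# dnsmasq configuration baked into the image: one fixed mapping per possible node,
# plus Google's nameservers for everything else, along these lines:
#   address=/cass5/192.168.100.5
#   server=8.8.8.8
#   server=8.8.4.4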

I'd like to eventually generalize this setup with hostnames of "host1", "host2", etc. vs. "cass1", "cass2".  In fact, I already extended Nicolas' images for my application server instances, knowing that I'd always start the auth service on "cass252" and the web app front end on "cass251".  That meant when the front end web app connected to the auth service, I hardcoded Ribbon REST calls to http://cass252/rest/api/... and I knew it would resolve that to 192.168.100.252.  Eventually I'd like to start up a Eureka Docker container on a well known hostname, which would allow me to remove the REST call hardcoding (I'd just hardcode the Eureka configuration for this environment).  Further, I can imagine a pretty simple configuration-driven wrapper to such a setup that says start n instances of one node type on hosts one through ten, m instances of another on hosts eleven through twenty, and so on.  This would allow me to have full scale out testing in my local development/testing.
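A configuration-driven wrapper along those lines could be as simple as this sketch; the generic hostX naming, the bridge name, and the image names are all assumptions for illustration:

# Sketch: start a tier of identical containers on a contiguous range of host slots
start_tier() {
  local image=$1 first=$2 last=$3
  for i in $(seq $first $last); do
    cid=$(docker run -d -h host$i -dns 127.0.0.1 $image)
    pipework br1 $cid 192.168.100.$i/24
  done
}

start_tier cassandra-node  1 10    # node type one on host1..host10
start_tier acmeair-auth   11 20    # node type two on host11..host20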

This networking setup gave me a "VLAN" of 192.168.100.X addresses shared by all nodes in the cluster that was only accessible inside of the Vagrant virtual machine.  To complete the networking and allow testing, I needed to expose the front end web app to my laptop browser.  I did this by using port mapping in Docker and host only networking in Vagrant.  To get the port exposed from the Docker container to Vagrant I used the docker run -p option:
docker run -p 8080:8080
At this point I could curl http://localhost:8080 on the Vagrant virtual machine.  To get a browser to work from my laptop's desktop (OS X), I needed to add a "host only" network to the Vagrantfile configuration:
Vagrant.configure("2") do |config|
  config.vm.network "private_network", ip: "192.168.50.4"
end
At that point, I could load the full web application by browsing locally to http://192.168.50.4:8080/.  I could also map other ports if I wanted (the auth service micro service, a JDWP debug port, etc.).  Being able to map the Java remote debug port to a "local cloud" with local latency is a game changer.  Even with really good connectivity, "remote cloud" debugging is still a challenge due to latency.
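For example, exposing a JDWP port is just one more port mapping plus the usual debug options on the JVM inside the container.  This is a sketch; port 7777 and the image name are arbitrary choices:

# Sketch: publish a remote debug port alongside the web app port
docker run -d -h cass251 -p 8080:8080 -p 7777:7777 acmeair-webapp

# Inside the container, start the app server JVM with something like:
#   -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=7777

With the host-only network in place, Eclipse on the Mac can then attach a remote debugger to 192.168.50.4:7777.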

So, returning to the diagram above, the summary of networking is (see blue and grey lines in diagram starting at bottom right):

1) Browser to http://192.168.50.4:8080/ which forwards to
2) Vagrant port 8080 which forwards to
3) VLAN 192.168.100.251:8080 which forwards to
4) Docker image for the front end web app listening on port 8080 which connects to
5) VLAN 192.168.100.252:8080 which is
6) Docker image for the auth service listening on port 8080

Both #4 and #6 connect to 192.168.100.1/2/3:9042, which are

7) Cassandra Docker nodes listening on port 9042 (quick spot checks of these hops are sketched below).
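Those spot checks could look like this (a sketch; it assumes curl and nc are available in each location):

# From the OS X desktop, through the Vagrant host-only network to the front end web app
curl -I http://192.168.50.4:8080/

# From inside the Vagrant VM, the Docker port mapping
curl -I http://localhost:8080/

# From inside any container, the auth service and a Cassandra node on the 192.168.100.X VLAN
curl -I http://cass252:8080/
nc -z 192.168.100.1 9042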

I also wanted to have easier access to my Macbook Pro filesystem from all Docker containers.  Specifically, I wanted to be able to work with the code on my desktop with Eclipse and compile within an OS X console, but then have the containers easily access the wars, the Cassandra loader Java code, and the Cassandra DDL scripts.  I was able to "forward" the filesystem through each of the layers as follows (see black lines in diagram starting at middle bottom):

When starting Vagrant, I used:
Vagrant.configure("2") do |config|
  config.vm.synced_folder "/Users/aspyker/work/", "/vagrant/work"
end
When starting each Docker container, I used:
docker run ... -v /vagrant/work:/root/work ...
Once configured, any Docker instance has a shared view of my laptop filesystem at /root/work.  "Copying" war files to Docker containers was instantaneous; even 30 meg files appeared immediately.  Also, any change on my local system is immediately reflected in every container.  Again, this is game changing as compared to working with a remote cloud.  Remote cloud file copies are limited by bandwidth, and most virtual instances do not allow shared filesystems, so copies need to be done many times for the same files (or rsync'ed).
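A typical iteration then looks roughly like this (a sketch; the project layout, build command, and the app server's deploy directory are placeholders):

# On the Mac: build the wars straight into the shared folder
cd /Users/aspyker/work/acmeair
./gradlew war                        # or whatever build the project uses

# Inside the web app container: the new war is already visible, no network copy needed
cp /root/work/acmeair/acmeair-webapp/build/libs/*.war /opt/server/dropins/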

In the end, this left me with a very close to "production" cloud environment running locally on my laptop, with local latency for debugging and file manipulation regardless of my networking speed (think coffee shop and/or airplane hacking).  With a bit more automation, I could extend this environment to every team member around me, ensuring a common development environment that is source controlled under git.

I have seen a few blogs that mention setting up "local cloud" development environments like this for their public cloud services.  I hope this blog post showed some tricks to make this possible.  I am now considering how to take this Acme Air sample and extend it to the public cloud services we are working on at IBM.  I expect that by doing this, development cloud spend will decrease and development will speed up, as developers will be able to work locally with faster turnaround on code/compile/test cycles.  I do wonder how easy the transition will be to integration tests in the public cloud, given the Docker environment won't be exactly the same as the public cloud, but I think the environments are likely close enough that the trade-off and risk are justified.

I need to find a way to share these Docker images and Dockerfile scripts.  Until then, if you have questions feel free to ask.