Monday, August 18, 2014

A NetflixOSS sidecar in support of non-Java services

In working on supporting our next round of IBM Cloud Service Fabric service tenants, we found that the service implementers came from very different backgrounds.  Some were skilled in Java, some in Ruby, and others were C/C++ focused, so their service implementations were just as diverse.  Given the size of the teams behind the services we're on-boarding and the timeframe for going public, recoding all of these services to use the NetflixOSS Java libraries that bring the operational excellence (like Archaius, Karyon, Eureka, etc.) seemed pretty unlikely.

For what it is worth, we faced a similar challenge with earlier services (mostly due to existing C/C++ applications) and we created what was called a "sidecar".  By sidecar, what I mean is a second process on each node/instance that did Cloud Service Fabric operations on behalf of the main process (the side-managed process).  Unfortunately those sidecars were all one-offs built for their particular service.  In this post, I'll describe a more general sidecar that doesn't force users to have these one-offs.

Sidenote:  For those not familiar with sidecars, think of the motorcycle sidecar below.  Snoopy would be the main process with Woodstock being the sidecar process.  The main work on the instance would be the motorcycle (say serving your users' REST requests).  The operational control is the sidecar (say serving health checks and management plane requests of the operational platform).


Before we get started, we need to note that there are multiple types of sidecars; predominantly there are two.  There are sidecars that manage durable and/or storage tiers.  These sidecars need to manage things that other sidecars do not (like joining a stateful ring of servers, or joining a set of slaves and discovering masters, or backup and recovery of data).  Some sidecars that exist in this space are Priam (for Cassandra) and Exhibitor (for Zookeeper).  The other type manages stateless mid-tier services like microservices.  An example of this is AirBNB's Synapse and Nerve.  You'll see in the announcement of Synapse and Nerve on AirBNB's blog that they are trying to solve some (but not all) of the issues I will mention in this blog post.

What are some things that a microservice sidecar could do for a microservice?

1. Service discovery registration and heartbeat

This registration with service discovery would have to happen only after the sidecar detects the side-managed process as ready to receive requests.  This isn't necessarily the same as if the instance is "healthy" as an instance might be healthy well before it is ready to handle requests (consider an instance that needs to pre-warm caches, etc.).  Also, all dynamic configuration of this function (where and if to register) should be considered.
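To make this concrete, here is a minimal sketch of how a sidecar might gate Eureka registration on readiness, using the Eureka 1.x Java client API.  The readiness probe, sideManagedProcessIsReady(), is a hypothetical placeholder for whatever check the sidecar actually uses:

import com.netflix.appinfo.ApplicationInfoManager;
import com.netflix.appinfo.InstanceInfo.InstanceStatus;
import com.netflix.appinfo.MyDataCenterInstanceConfig;
import com.netflix.discovery.DefaultEurekaClientConfig;
import com.netflix.discovery.DiscoveryManager;

public class SidecarRegistrar {
    public void registerWhenReady() throws InterruptedException {
        // Initialize the Eureka client; instance and client settings come from
        // the eureka-client.properties on the classpath.
        DiscoveryManager.getInstance().initComponent(
                new MyDataCenterInstanceConfig(), new DefaultEurekaClientConfig());

        // Stay in STARTING until the side-managed process says it can take traffic.
        ApplicationInfoManager.getInstance().setInstanceStatus(InstanceStatus.STARTING);
        while (!sideManagedProcessIsReady()) {
            Thread.sleep(5000);
        }

        // Flip to UP; the Eureka client heartbeats on the instance's behalf from here on.
        ApplicationInfoManager.getInstance().setInstanceStatus(InstanceStatus.UP);
    }

    private boolean sideManagedProcessIsReady() {
        // hypothetical probe, e.g. hit the side-managed process's warm-up/ready URL
        return true;
    }
}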

2.  Health check URL

Every instance should have a health check url that can communicate out of band the health of an instance.  The sidecar would need to query the health of the side-managed process and expose this url on behalf of the side-managed process.  Various systems (like auto scaling groups, front end load balancers, and service discovery queries) would query this URL and take sick instances out of rotation.

3.  Service dependency load balancing

In a NetflixOSS based microservice, routing can be done intelligently based upon information from service discovery (Eureka) via smart client side load balancing (Ribbon).  Once you move this function out of the microservice implementation, as AirBNB noted as well, it is likely unneeded and problematic in some cases to move back to centralized load balancing.  Therefore it would be nice if the sidecar would perform load balancing on behalf of the side-managed process.  Note that Zuul (on instance in the sidecar) could fill this role in NetflixOSS.  In AirBNB's stack, the combination of service discovery and this item is done through Synapse.  Also, all dynamic configuration of this function (states of routes, timeouts, retry strategy, etc) should be considered.

One other area to consider here (especially in the NetflixOSS space) would be if the sidecar should provide for advanced devops filters in load balancing that go beyond basic round robin load balancing.  Netflix has talked about the advantages of Zuul for this in the front/edge tier, but we could consider doing something in between microservices.
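As a rough illustration of the client-side load balancing the sidecar would be standing in for, here is a sketch using Ribbon's Eureka-enabled RestClient.  The client name, VIP address, and request path are made-up values for the example:

import java.net.URI;
import com.netflix.client.ClientFactory;
import com.netflix.client.http.HttpRequest;
import com.netflix.client.http.HttpResponse;
import com.netflix.config.ConfigurationManager;
import com.netflix.niws.client.http.RestClient;

public class DependencyCall {
    public static void main(String[] args) throws Exception {
        // Standard Ribbon client configuration keys: resolve servers from Eureka by VIP.
        ConfigurationManager.getConfigInstance().setProperty(
                "authservice.ribbon.DeploymentContextBasedVipAddresses", "acmeair-auth-service");
        ConfigurationManager.getConfigInstance().setProperty(
                "authservice.ribbon.NIWSServerListClassName",
                "com.netflix.niws.loadbalancer.DiscoveryEnabledNIWSServerList");

        RestClient client = (RestClient) ClientFactory.getNamedClient("authservice");
        HttpRequest request = HttpRequest.newBuilder()
                .uri(new URI("/rest/api/login/status")).build();

        // Ribbon picks an instance (round robin by default) and handles retries.
        HttpResponse response = client.executeWithLoadBalancer(request);
        System.out.println("status = " + response.getStatus());
    }
}

A sidecar would offer the equivalent behavior over a local port so the side-managed process never needs to link these libraries.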

4.  Microservice latency/health metrics

Being able to have operational visibility into the error rates on calls to dependent services as well as latency and overall state of dependencies is important to knowing how to operate the side-managed process.  In NetflixOSS by using the Hystrix pattern and API, you can get such visibility through the exported Hystrix streams.  Again, Zuul (on instance in the sidecar) can provide this functionality.

5.  Eureka discovery

We have found service implementations in IBM that already have their own client-side load balancing or cluster technologies.  Also, Netflix has talked about other OSS systems such as Elastic Search.  For these systems it would be nice if the sidecar could provide a way to expose Eureka discovery outside of load balancing.  Then the client could ingest the discovery information and use it however it felt necessary.  Also, all dynamic configuration of this function should be considered.
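A sketch of what that exposure could look like inside the sidecar, using the Eureka 1.x DiscoveryClient to return a simple ip:port list that any language could parse (the output format is just an illustration):

import java.util.List;
import com.netflix.appinfo.InstanceInfo;
import com.netflix.discovery.DiscoveryClient;
import com.netflix.discovery.DiscoveryManager;

public class DiscoveryEndpoint {
    // The sidecar could serve this from a local REST resource so any language
    // can pull the current view of a dependency's instances.
    public String listUpInstances(String vipAddress) {
        DiscoveryClient client = DiscoveryManager.getInstance().getDiscoveryClient();
        List<InstanceInfo> instances = client.getInstancesByVipAddress(vipAddress, false);

        StringBuilder sb = new StringBuilder();
        for (InstanceInfo info : instances) {
            if (info.getStatus() == InstanceInfo.InstanceStatus.UP) {
                sb.append(info.getIPAddr()).append(':').append(info.getPort()).append('\n');
            }
        }
        return sb.toString();
    }
}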

6.  Dynamic configuration management

It would be nice if the sidecar could expose dynamic configuration to the side-managed process.  While I have mentioned the need to have the previous sidecar functions dynamically configured, it is important that the side-managed process's configuration be considered as well.  Consider the case where you want the side-managed process to use a common dynamic configuration management system but all it can do is read from property files.  In NetflixOSS this is managed via Archaius, but that requires using the NetflixOSS libraries.
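For reference, this is roughly what Archaius gives a Java process natively; the sidecar's job in this item is to bridge the same behavior to something like a plain properties file for processes that can't link these libraries.  The property name below is hypothetical:

import com.netflix.config.DynamicPropertyFactory;
import com.netflix.config.DynamicStringProperty;

public class DbConfig {
    // The value tracks the dynamic configuration source; no restart needed.
    private static final DynamicStringProperty DB_URL =
            DynamicPropertyFactory.getInstance().getStringProperty("acmeair.db.url", "jdbc:default");

    public static void main(String[] args) {
        // Optional callback fired whenever the property changes.
        DB_URL.addCallback(new Runnable() {
            public void run() {
                System.out.println("db url changed to " + DB_URL.get());
            }
        });
        System.out.println("current db url: " + DB_URL.get());
    }
}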

7.  Circuit breaking for fault tolerance to dependencies

It would be nice if the sidecar could provide an approximation of circuit breaking.  I believe this is impossible to do as "cleanly" as using NetflixOSS Hystrix natively (the sidecar wouldn't have the user's specific business logic for handling failures in ways that reduce calls to the dependency), but it would be nice to have some level of guaranteed fast failure for the calls routed through #3.  Also, all dynamic configuration of this function (timeouts, etc.) should be considered.
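For comparison, here is the shape of a native Hystrix command that a sidecar could only approximate; the command, group key, and fallback below are illustrative, not taken from a real service:

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class AuthServiceCommand extends HystrixCommand<String> {
    private final String sessionId;

    public AuthServiceCommand(String sessionId) {
        super(HystrixCommandGroupKey.Factory.asKey("AuthService"));
        this.sessionId = sessionId;
    }

    @Override
    protected String run() throws Exception {
        // Call the dependency; latency and errors feed the circuit breaker and
        // the metrics stream mentioned in #4.
        return callAuthService(sessionId);
    }

    @Override
    protected String getFallback() {
        // Business-logic fallback that runs on failure, timeout, or open circuit.
        return "UNAUTHENTICATED";
    }

    private String callAuthService(String sessionId) {
        return "OK"; // placeholder for the real HTTP call
    }
}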

8.  Application level metrics

It would be nice if the sidecar could allow the side-managed process to more easily publish application-specific metrics to the metrics pipeline.  While every language likely already has a nice binding to systems like statsd/collectd, it might be worth making the interface to these systems common through the sidecar.  For NetflixOSS, this is done through Servo.
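To show what such a common interface would be wrapping on the Java side, here is a minimal Servo counter (the metric name is made up); the sidecar idea is to offer an equivalent, language-neutral way to get numbers into the same pipeline:

import com.netflix.servo.DefaultMonitorRegistry;
import com.netflix.servo.monitor.Counter;
import com.netflix.servo.monitor.Monitors;

public class BookingMetrics {
    // Registered monitors get picked up by whatever Servo poller/publisher
    // the metrics pipeline is configured with.
    private static final Counter BOOKINGS = Monitors.newCounter("acmeair.bookings");

    static {
        DefaultMonitorRegistry.getInstance().register(BOOKINGS);
    }

    public static void recordBooking() {
        BOOKINGS.increment();
    }
}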

9. Manual GUI and programmatic control

We have found the need to sometimes quickly dive into a specific instance with human eyes.  Having a private web based UI is far easier than loading up ssh.  Also, if you want to script access to the functions and data collected by the sidecar, a REST or even JMX interface to the control offered in the sidecar would be useful.

This all said, I started a quick project last week to create a sidecar that does some of these functions using NetflixOSS so it integrates cleanly into our existing IBM Cloud Services Fabric environment.  I decided to do it on GitHub, so others can contribute.

By using Karyon as a base for the sidecar, I was able to get a few of the items on the list automatically (specifically #1, #2 partially and #9).  I started with the most basic sidecar in the trunk project.  Then I added two more things:


Consul style health checks:


In work leading up to this, Spencer Gibb pointed me to the sidecar agent checks that Consul uses (which they said they based on Nagios).  I implemented a similar set of checks for my sidecar.  You can see in this Archaius config file how you'd configure them:

com.ibm.ibmcsf.sidecar.externalhealthcheck.enabled=true
com.ibm.ibmcsf.sidecar.externalhealthcheck.numchecks=1

com.ibm.ibmcsf.sidecar.externalhealthcheck.1.id=local-ping-healthcheckurl
com.ibm.ibmcsf.sidecar.externalhealthcheck.1.description=Runs a script that curls the healthcheck url of the sidemanaged process
com.ibm.ibmcsf.sidecar.externalhealthcheck.1.interval=10000
com.ibm.ibmcsf.sidecar.externalhealthcheck.1.script=/opt/sidecars/curllocalhost.sh 8080 /
com.ibm.ibmcsf.sidecar.externalhealthcheck.1.workingdir=/tmp

com.ibm.ibmcsf.sidecar.externalhealthcheck.2.id=local-killswitch
com.ibm.ibmcsf.sidecar.externalhealthcheck.2.description=Runs a script that tests if /opt/sidecarscripts/killswitch.txt exists
com.ibm.ibmcsf.sidecar.externalhealthcheck.2.interval=30000
com.ibm.ibmcsf.sidecar.externalhealthcheck.2.script=/opt/sidecars/checkKillswitch.sh

Specifically, you define a check as an external script that the sidecar executes; if the script returns an exit code of 0, the check is marked as healthy (1 = warning, anything else = unhealthy).  If all defined checks come back healthy for more than three iterations, the instance is considered healthy.  I have coded up some basic shell scripts that we'll likely give to all of our users (like curllocalhost.sh and checkkillswitchtxtfile.sh).  Once I had these checks being executed by the sidecar, it was pretty easy to change the Karyon/Eureka HealthCheckHandler class to query the CheckManager logic I added.
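A rough sketch of that check aggregation behavior, in Java, might look like the following; this is illustrative of the rules described above, not the actual CheckManager code in the project:

import java.io.File;
import java.util.List;

public class CheckManager {
    private int consecutiveHealthyIterations = 0;

    public boolean runIteration(List<ExternalCheck> checks) throws Exception {
        boolean allHealthy = true;
        for (ExternalCheck check : checks) {
            Process p = new ProcessBuilder(check.scriptAndArgs)
                    .directory(new File(check.workingDir))
                    .start();
            int rc = p.waitFor();
            // 0 = healthy; 1 = warning; anything else = unhealthy.
            // Only a fully healthy iteration extends the healthy streak.
            if (rc != 0) {
                allHealthy = false;
            }
        }
        consecutiveHealthyIterations = allHealthy ? consecutiveHealthyIterations + 1 : 0;
        // Report the instance healthy only after more than three healthy iterations.
        return consecutiveHealthyIterations > 3;
    }

    public static class ExternalCheck {
        public List<String> scriptAndArgs; // the configured script plus its arguments
        public String workingDir;
    }
}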


Integration with Dynamic Configuration Management


We believe most languages can easily register events based on files changing and can easily read properties files.  Based on this, I added another feature configured by this Archaius config file:

com.ibm.ibmcsf.sidecar.dynamicpropertywriter.enabled=true
com.ibm.ibmcsf.sidecar.dynamicpropertywriter.file.template=/opt/sidecars/appspecific.properties.template
com.ibm.ibmcsf.sidecar.dynamicpropertywriter.file=/opt/sidecars/appspecific.properties

What this says is that a user of the sidecar puts all of the properties they care about in the file.template properties file; then, as configuration is dynamically updated in Archaius, the sidecar sees this and writes out a copy to the main properties file with the values filled in.
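A simplified sketch of that behavior (again illustrative, not the project's actual implementation) would read the keys named in the template, look up their current values in Archaius, and rewrite the target file whenever configuration changes:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;
import com.netflix.config.DynamicPropertyFactory;

public class DynamicPropertyWriter {
    public static void writeOut(File template, File target) throws IOException {
        Properties wanted = new Properties();
        try (FileInputStream in = new FileInputStream(template)) {
            wanted.load(in);
        }

        Properties filled = new Properties();
        for (String key : wanted.stringPropertyNames()) {
            // The template value doubles as the default if the property isn't set dynamically.
            String current = DynamicPropertyFactory.getInstance()
                    .getStringProperty(key, wanted.getProperty(key)).get();
            filled.setProperty(key, current);
        }

        try (FileOutputStream out = new FileOutputStream(target)) {
            filled.store(out, "written by the sidecar dynamic property writer");
        }
    }
}

The side-managed process then only needs to watch the target file for changes and re-read it, which, as noted above, most languages can do easily.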

With these changes, I think we now have a pretty solid story for #1, #2, #6 and #9.  I'd like to next focus on #3, #4, and #7 adding a Zuul and Hystrix based sidecar process but I don't have users (yet) pushing for these functions.  Also, I should note that the code is a proof of concept and needs to be hardened as it was just a side project for me.

PS.  I do want to make it clear that while this sidecar approach could be used for Java services (as opposed to languages that don't have NetflixOSS bindings), I do not advocate moving these functions outside of your Java implementation.  There are places where offering this function in a sidecar isn't as "excellent" operationally and is closer to "good enough".  I'll leave it to the reader to work through these tradeoffs.  However, I hope that work in this microservice sidecar space leads to easier NetflixOSS adoption in non-Java environments.

PPS.  This sidecar might also be useful in the container space, at a host level.  Taking the sidecar and making it work across multiple single-process instances on a host would be an interesting extension of this work.


Wednesday, August 6, 2014

Sidecars and service registration

I have been having internal conversations on sidecars to manage microservices.  By sidecar, I mean a separate process on each instance node that performs things on behalf of the microservice instance like service registration, service location (for dependencies), dynamic configuration management, service routing (for dependencies), etc.  I have been arguing that an in-process (vs. sidecar) approach to providing these functions, while intrusive (it requires every microservice to code to or implement a certain framework), is better.  I believe it is hard for folks to understand why things are "better" without actually running into nasty things that happen in real world production scenarios.

Today I decided to simulate a real world scenario.  I decided to play with Karyon which is the NetflixOSS in process technology to manage bootstrap and lifecycle of microservices.  I did the following:

  1. I disabled registry queries, which Karyon by default does for the application assuming it might need to look up dependencies (eureka.shouldFetchRegistry=false).  I did this just to simplify the timing of pure service registration.
  2. I "disabled" heartbeats for service registration (eureka.client.refresh.interval=60).  Again, I did this just to simplify the timing of initial service registration.
  3. I shortened the time for the initial service registration to one second (eureka.appinfo.initial.replicate.time=1).  I did this to be able to force the registration to happen immediately.
  4. I added a "sleep" to my microservice registration (@Application initialize() { .. Thread.sleep(1000*60*10) } ).  I did this to simulate a microservice that takes some time to "startup".
Once I did this, I saw the following:

The service started up and immediately called initialize, but of course this stalled.  The service also then immediately registered itself into the Eureka service discovery server.  At this point, a query of the service instance in the service registry returns a status of "STARTING".  After 10 minutes, the initialization finishes.  At this later point, the query of the service instance returns a status "UP".  Pretty sensible, no?

I then started to think about whether a sidecar could somehow get this level of knowledge by poking its side-managed process.  If you look at Airbnb Nerve (a total sidecar based approach) it does exactly this.  I could envision a Eureka sidecar, similar to Nerve, that pinged the "healthcheck URL" already exposed by Karyon.

This got me thinking about whether a health check URL returning 200 (OK) would be a sufficient replacement for deciding on service registration status.  Specifically, if the health check returns OK for three or so checks, have the sidecar put the service into service discovery as "up"; similarly, take it out if three or so checks return != 200.

I started a Twitter question on this idea and received great feedback from Spencer Gibb.  His example was a service that needed to do database migration before starting up.  In that case, while the service is healthy, it shouldn't tell others that it is ready to handle requests until the migration completes.  This is especially true if the health manager of your cluster is killing off instances that aren't "healthy", so you can't solve the issue by just reporting "unhealthy" until the service is ready to handle requests.

This said, if a sidecar is to decide when a service should be marked as ready to handle traffic, it would stand to reason that every side-managed process needs a separate URL (distinct from the health check and/or the main microservice interface) that exposes the boot state of the service.  Also, this would imply the side-managed process likely needs a framework to consistently decide on the state to be exposed by that URL.  In NetflixOSS that framework is Karyon.

I will keep thinking about this, but I find it hard to understand how a pure sidecar based approach with zero changes to a microservice (without a framework embedded into the side-managed process) can know when a service is really "UP" and ready to handle requests vs. "STARTING" vs. "SHUTTING DOWN", etc.  I wonder if AirBNB asks its service developers to define a "READYFORREQUESTS" URL and that is what they pass as configuration to Nerve?


Thursday, July 24, 2014

Multitenancy models (and frats/sororities, toilets, and kids)

I have had more than a few discussions lately with various IBM teams as we move forward with some of our internal cloud technologies leveraging NetflixOSS technology.  I have found that one of the conversations that is hard to talk through is multitenancy.

Let's set the stage with the definition of multitenancy and how this affects cloud computing.

Wikipedia definition of multitenancy:

"Multitenancy refers to a principle in software architecture where a single instance of the software runs on a server, serving multiple client-organizations (tenants). Multitenancy contrasts with multi-instance architectures where separate software instances (or hardware systems) operate on behalf of different client organizations. With a multitenant architecture, a software application is designed to virtually partition its data and configuration, and each client organization works with a customized virtual application."

Wikipedia further explains multitenancy in the context of cloud computing:

Multitenancy enables sharing of resources and costs across a large pool of users thus allowing for:
centralization of infrastructure in locations with lower costs (such as real estate, electricity, etc.),
peak-load capacity increases (users need not engineer for highest possible load-levels),
utilisation and efficiency improvements for systems that are often only 10–20% utilised.

It seems like everyone comes with their own definition of multitenancy, and from what I can see they are all shades of the same definition.  Specifically, as you'll see below, the differences are in how the software is "designed to virtually partition its data and configuration" and to what extent that partitioning can be affected by other users.

In an effort to have more meaningful conversations, I propose the following poor analogies based on humans inhabiting space.

The "Big Room"


Consider a really big room with one door.  That door is where all people living in the room enter and exit.  Also, there is a single toilet in the middle of the room.  Also consider that we allow anyone to come in and one of the inhabitants is a bit crazy and loves to run around the room at full speed randomly bouncing off the walls.  While this "big room" is multi-tenant (more than one human could live there) it doesn't well partition or protect inhabitants from each other.  Also, the use of the toilet (a common resource) might be a bit more than embarrassing to say the least.  I think most people I have talked to would consider this environment to lack even the weakest definitions of multi-tenancy.

The "Doorless Single Family Home"


Consider a typical North American single family home but remove all the internal doors.  In this new analogy, we might still have crazy inhabitants (I have two and they are called kids).  We can start to partition them from the rest of us by putting them in a room, and only once in a while do they escape into the areas affecting others.  Now the toilet and other shared resources are easier to use safely, but still not as safe as you'd want.  One other big change is the type of inhabitants and their ability to share.  In a family, they likely all have semi-common goals and one won't destroy common resources (the toilet), and if they do, the family works to ensure that it doesn't happen again.  Finally, there is a benefit to this family living together, as they likely share their services freely.

The improved "Single Family Home with Doors".


Consider the previous example, but add back in the typical doors - doors that can be opened and closed, but likely not locked.  Now our private moments are improved.  Also, the crazy kids won't bounce out of their rooms as frequently.  The doors are there to help prevent accidental bad interactions, but they can be opened freely to help achieve family goals more quickly than with the doors closed.

The "Fraternity/Sorority house"


Continuing the poor analogy, what if the inhabitants have goals that are similar, but more divergent than a single family's?  Of course all of the inhabitants of a fraternity or sorority house want to graduate college, and they might be working on similar subjects where they could share information and learning amongst themselves, but sometimes you really don't want your co-inhabitant to enter your part of the house.  When that co-inhabitant is drunk (never happens in college, right?), you really would like a locked door between you and them.  The co-inhabitant isn't really meaning to cause you harm, but they could cause you harm nonetheless, so you probably added a locked door just in case.  Ok, I'll admit I have now totally lost the toilet part of the analogy, but likely there are still shared resources that the house works to protect and share responsibly.

The "Apartment Building"


Now we finish the analogies with what I think most people I talk to about multitenancy consider from the start.  Consider an apartment building where every tenant gets his or her own lockable front door.  Also, all of their toilets and important resources are private to just them.  The inhabitants don't have any common shared goals.  Therefore, the apartment living conditions make sense.  However, these living conditions can be problematic in two ways.  First, this is a more costly way to live and operate, both for each inhabitant and for their non-shared resources.  Second, if any of the inhabitants do have shared goals, the lack of internal doors between units means a much slower communication channel and forward progress will be slower.

The Wrap-Up


Now going back to the non-analogy world.  Many of the NetflixOSS projects (Eureka/Asgard/etc) come from a model that I think is best described by the "Doorless Single Family Home".  There is nothing wrong with that for the type of organization Netflix is, and likely when deployed inside of Netflix there are more doors added beyond the public OSS.  At IBM, in our own usage I believe we need at least the "Single Family Home with Doors", mostly to add some doors that protect us from new users of the cloud technology accidentally impacting others.  Some have argued that we need the "Fraternity/Sorority house", adding locked doors until people are confident that others won't impact them even when the doors can't be locked.  Adding locks to doors means things like adding owner-writable-only namespaces to Eureka, locking down which clusters can be changed in Asgard, providing segmented network VLANs, etc.  Finally, if we ever looked to run this fabric across multiple IBM customers (say Coke and Pepsi), it is hard to argue that we wouldn't need the full "Apartment Building" approach.

I hope this helps others in discussing multitenancy.  I hope my own team won't get tired of these new analogies.




Friday, June 27, 2014

How is a multi-host container service different from a multi-host VM service?

Warning:  I am writing this blog post without knowing the answer to the question I am asking in the title.  I am writing this post to force myself to articulate a question I've personally been struggling with as we move towards what we all want - containers with standard formats changing how we handle many cases in the cloud.  Also, I know there are folks that have thought about this for FAR longer than myself and I hope they comment or write alternative blogs so we can all learn together.

That said, throughout the time leading up to Dockercon and since, I have seen what seem to be divergent thoughts that, when I step back, aren't so divergent.  Or maybe they are?  Let's see.

On one hand, we have existing systems on IaaS clouds using virtual machines that have everything controlled by API's with cloud infrastructural services that help build up an IaaS++ environment.  I have specifically avoided using the word PaaS as I define PaaS as something that tends to abstract IaaS to a point where IaaS concepts can't be directly seen and controlled.  I know that everyone doesn't accept such a definition of PaaS, but I use it as a means to help explain my thoughts (please don't just comment exclusively on this definition as it's not the main point of this blog post).  By IaaS++ I mean an environment that adds to IaaS offering services like continuous delivery workflows, high availability fault domains/automatic recovery, cross instance networking with software defined networking security, and operational visibility through monitoring.  And by not calling it PaaS, I suggest that the level of visibility into this environment includes IaaS concepts such as (VM) instances accessible through ssh or other commonly used *nix tools, full TCP network stack access, full OS's with process and file system control, etc.

On the other hand, we have systems growing around resource management systems and schedulers using "The Datacenter as a Computer" that are predominantly tied to containers.  I'll admit that I'm only partially through the book on the subject (now in its 2nd edition).  Some of the open source systems that implement such datacenter-as-a-computer/warehouse-scale machines are Yarn (for Hadoop), CoreOS/Fleet, Mesos/Marathon and Google Kubernetes.

At Dockercon, IBM (and yours truly) demoed a Docker container deployment option for the IBM SoftLayer cloud.  We used our cloud services fabric (partially powered by NetflixOSS technologies) on top of this deployment option as the IaaS++ layer.  Given that IBM SoftLayer and its current API don't support containers as a deployment option, we worked to implement some of the ties to the IaaS technologies as part of the demo by reusing the Docker API.  Specifically, we showcased an autoscaling service for automatic recovery, cross availability zone placement, and SLA based scaling.  Next we used the Docker private registry alongside the Dockerhub public index for image management.  Finally we did specific work to natively integrate the networking from containers into the SoftLayer network.  Doing this networking work was important as it allowed us to leverage existing IaaS provided networking constructs such as load balancers and firewalls.

Last night I watched the Kubernetes demo at Google I/O by Brendan Burns and Craig McLuckie.  The talk kicks off with an overview of the Google Compute Engine VM optimized for containers and then covers the Kubernetes container cluster management open source project which includes a scheduler for long running processes, a labeling system that is important for operational management, a replication controller to scale and auto recover labeled processes, and a service abstraction across labeled processes.

I encourage you to watch the two demo videos before proceeding, as I don't want to force you into thinking only from my conclusions.  Ok, so now that you've watched the videos yourself, let me use the two videos to look at use case comparison points (the links jump to the similar place in each video):

Fast development and deployment at scale



Brendan demonstrated rolling updates on the cloud.  In the IBM demo, we showed the same, but as an initial deployment on a laptop.  As you see later in the demo, due to the use of Docker, running on the cloud is exactly the same as the laptop.  The IBM cloud services fabric devops console - NetflixOSS Asgard - also has the concept of rolling updates as well as the demonstrated initial deployment.  Due to Docker, both demos use essentially the same approach to image creation/baking.

Automatic recovery


I like how Brendan showed the failure and recovery through a nice UI, as compared to me watching log files of the health manager.  Other than presentation, the use case and functionality were the same.  The system discovered a failed instance and recovered it.

Service registration

Brendan talked about how Kubernetes offers the concept of services based on tagging.  Under the covers this is implemented by a process that runs label selects against the tagged containers and updates an etcd service registry.  In the cloud services fabric demo we talked about how this was done with NetflixOSS Eureka in a more intrusive (but maybe more valuable, app-centric) way.  I also have hinted about how important it is to consider availability in your service discovery system.

Service discovery and load balancing across service implementations

Brendan talked about how, in Kubernetes, this is currently handled by a basic round robin load balancer.  Under the covers each Kubernetes node starts this load balancer, and any defined service gets started on the load balancer across the cluster, with information being passed to client containers via two environment variables: one for the address of the Kubernetes local node load balancer, and one for the port assigned to a specific service.  In the cloud services fabric this is handled by Eureka enabled clients (for example NetflixOSS Ribbon for REST), which do not require a separate load balancer and are more direct, and/or by the similar NetflixOSS Zuul load balancer in cases where the existing clients can't be used.
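To illustrate the mechanism, a client container would read something like the following; the environment variable names here are invented for the example, since the real names are derived from how the services are defined:

public class KubernetesServiceLookup {
    public static void main(String[] args) {
        // Address of the node-local load balancer and the port assigned to the
        // dependency's service, both injected as environment variables.
        String lbHost = System.getenv("SERVICE_LB_HOST");      // hypothetical variable name
        String authPort = System.getenv("AUTHSERVICE_PORT");   // hypothetical variable name

        // Every call goes through the local load balancer, which round robins
        // across the service's containers cluster-wide.
        String url = "http://" + lbHost + ":" + authPort + "/rest/api/login/status";
        System.out.println("would call " + url);
    }
}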

FWIW, I haven't seen end to end service registration/discovery/load balancing specifically supported in non-Kubernetes resource managers/schedulers.  I'm sure you could build something similar on top of Mesos/Marathon (or people already have) and CoreOS/etcd, but I think Kubernetes' concepts of labels and services (much like Eureka) are right in starting to integrate the concept of services into the platform, as they are so critical in microservices based devops.

I could continue to draw comparison points for other IaaS++ features like application centric metrics, container level metrics, dynamic configuration management, other devops workflows, remote logging, service interaction monitoring, etc., but I'll leave that to the reader.  My belief is that many of these concepts will be implemented in both approaches, as they are required to run an operationally competent system.

Also, I think we need to consider tougher points like how this approach scales (in both demos, under the covers networking was implemented via a subnet per Docker host, which wouldn't necessarily scale well), approach to cross host image propagation (again, both demos used a less than optimal way to push images across every node), and integration with other important IaaS networking concepts (such as external load balancers and firewalls).

What is different?

The key difference that I see in these systems is terminology and implementation.

In the IBM demo, we based the concept of a cluster on what Asgard defines as a cluster.  That cluster definition and state is based on multiple separate auto scaling groups connected by version naming.  It is then the autoscaler that decides placement based not only on "resource availability", but also on high availability (spreading deployments across distinct failure domains) and locality policies.  Most everyone is familiar with the concept of high availability in these policies in existing IaaS - in SoftLayer we use Datacenters or pods, in other clouds the concept is called "availability zones".  Also, in public clouds, the policy for co-location is usually called "placement groups".

Marathon (a long running scheduler on top of the Mesos resource manager) offers these same concepts through constraints.  Kubernetes doesn't seem, today, to offer these concepts, likely due to its initial focus on smaller scenarios.  Given its roots in Google Omega/Borg, I'm sure there is no reason why Kubernetes couldn't eventually expose the same policy concepts within its replication controller.  In fact, at the end of the Kubernetes talk, there is a question from the crowd on how to make Kubernetes scale across multiple Kubernetes configurations, which could just as well have been asked from a high-availability perspective.

So to me, the concept of an autoscaler and its underlying implementation seems very similar to the concept of a resource manager and scheduler.  I wonder if public cloud auto scalers were open sourced if they would be called resource managers and long running schedulers?

The reason I ask all of this is that, as we move forward with containers, I think we might be tempted to build another cloud within our existing clouds.  I also think the Mesos and Kubernetes technologies will have people building clouds within clouds until cloud providers natively support containers as a deployment option.  At that point, will we have duplication of resource management and scheduling if we don't combine the concepts?  Also, what will people do to integrate these new container deployments with other IaaS features like load balancers, security groups, etc?

I think others are asking the same question as well.  As shown in the IBM Cloud demo, we are thinking through this right now.  We have also experimented internally with OpenStack deployments of Docker containers as the IaaS layer under a similar IaaS++ layer.  The experiments led to a similar cloud container IaaS deployment option leveraging existing OpenStack approaches for resource management and scheduling, as compared to creating a new layer on top of OpenStack.  Also, there is a public cloud that likely considered this a long time ago - Joyent.  Joyent has had SmartOS zones, which are similar to containers, under its IaaS API for a long time without the need to expose the formal concepts of resource management and scheduling to its users.  Also, right at the end of the Kubernetes demo, someone in the crowd asks the same question.  I took this question to be asking when the compute engine will support container deployment this way without having a user set up their own private set of Kubernetes systems (and possibly without having to consider resource management/scheduling with anything more than policy).

As I said in the intro, I'm still learning here.  What are your thoughts?

Friday, June 20, 2014

Quick notes on a day of playing with Acme Air / NetflixOSS on Kubernetes

I took Friday to play with the Kubernetes project open sourced by Google at Dockercon.

I was able to get a basic multi-tier Acme Air (NetflixOSS enabled) application working. I was able to reuse (for the most part) containers we built for Docker local (laptop) from the IBM open sourced docker port. By basic, I mean the front end Acme Air web app, back end Acme Air authentication micro service, Cassandra node and Acme Air data loader, and the NetflixOSS Eureka service discovery server. I ran a single instance of each, but I believe I could have pretty easily scaled up each tier of the Acme Air application itself.

I pushed the containers to Dockerhub (as Kubernetes by default pulls all container images from there). This was pretty easy using these steps:

1. Download and build locally the IBM Acme Air NetflixOSS Docker containers
2. Log in to Dockerhub (needed before I could push) via 'docker login'
3. Tag the images - docker tag [imageid] aspyker/acmeair-containername
4. Push the containers to Dockerhub - docker push aspyker/acmeair-containername

I started each container as a single instance via the cloudcfg script:

cluster/cloudcfg.sh -p 8080:80 run aspyker/acmeair-webapp 1 webapp

I started with "using it wrong" (TM, Andrew 2014) with regards to networking. For example, when Cassandra starts, it needs to know about what seed and peer nodes exist and Cassandra wants to know what IP addresses these other nodes are at. For a single Cassandra node, that means I needed to update the seed list to the IP address of the Cassandra container's config file to itself. Given our containers already listen on ssh and run supervisord to run the container function (Cassandra in this case), I was able to login to the container, stop Cassandra, update the config file with the container's IP address (obtained via docker inspect [containerid] | grep ddr), and restart Cassandra. Similarly I needed to update links between containers (for how the application/micro service found the Cassandra container as well how the application/micro service found Eureka). I could ssh into those containers and update routing information that exists in NetflixOSS Archaius config files inside of the applications.

This didn't work perfectly, as the routing in NetflixOSS powered by Ribbon and Eureka uses hostnames by default. The hostnames currently assigned to containers in Kubernetes are not resolvable by all other containers (so when the web app tried to route to the auth service based on the hostname registered and discovered in Eureka, it failed with UnknownHostException). We hit this in our SoftLayer runs as well and had patched the Eureka client to never register the hostname.  I had asked about this previously on the Eureka mailing list and discovered this is something that Netflix fixes internally in Ribbon. I ended up writing a patch for this for Ribbon to just use IP addresses and patched the ribbon-eureka module in Acme Air.
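The Eureka-side flavor of this workaround can be sketched as an instance config override that advertises the IP address instead of the hostname; this is illustrative only, not the actual patches to the Eureka client and the ribbon-eureka module referenced above:

import com.netflix.appinfo.MyDataCenterInstanceConfig;

public class IpAddressInstanceConfig extends MyDataCenterInstanceConfig {
    @Override
    public String getHostName(boolean refresh) {
        // Register/advertise the container IP rather than a hostname that other
        // containers cannot resolve.
        return getIpAddress();
    }
}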

At this point, I could map the front end web app instance to the Kubernetes minion host via the cloudcfg run -p 8080:80 port specification and access Acme Air from the Internet in my browser.

My next steps are to look at running replicationControllers around the various tiers of the application as well as making them services so I can use the Kubernetes built-in service location and routing.  I can see how to do this via the guestbook example.  In running that example I can see how, if I "bake" into my images an idea of a port for each service, I can locate the port via environment variables.  Kubernetes will ensure that this port is routing traffic to the right service implementations on each Kubernetes host via a load balancer.  That will mean that I can start to route all Eureka traffic to port 10000, all web app traffic to port 10001, all Cassandra traffic to port 10002, and all auth micro service traffic to port 10003, for example.  This approach sounds pretty similar to an approach used at Netflix with Zuul.

Beyond that I'll need to consider additional items like:

1.  Application data and more advanced routing in the service registration/location

2.  How available the service discovery is, especially as we consider adding availability zones/fault domains.

3.  How do I link this into front facing (public internet) load balancers?

4.  How would I link in the concept of security groups?  Or is the port exposure enough?

5.  How I could start to do chaos testing to see how well the recovery and multiple fault domains work.

I do want to thank the folks at Google that helped me get through the newbie GCE and Kubernetes issues (Brendan, Joe and Daniel).

Tuesday, June 10, 2014

Docker SoftLayer Cloud Talk at Dockercon 2014

The overall concept



Today at Dockercon, Jerry Cuomo went over the concept of borderless cloud and how it relates to IBM's strategy.  He talked about how Docker is one of the erasers of the lines between various clouds with regards to openness.  He talked about how, regardless of vendor, deployment option and location, we need to focus on the following things:

Fast


Especially in the age of devops and continuous delivery, lack of speed is a killer.  Even worse, and actually unforgivable, is having manual steps that introduce error; that is not acceptable any longer.  Docker helps with this by having layered file systems that allow for just updates to be pushed and loaded.  Also, with its process model it starts as fast as you'd expect your applications to start.  Finally, Docker helps by having a transparent (all the way to source) description model for images which guarantees you run what you coded, not some mismatch between dev and ops.

Optimized


Optimized means not only price/performance but also optimization of location of workloads.  In the price/performance area IBM technologies (like our IBM Java read-only memory class sharing) can provide for much faster application startup and less memory when similar applications are run on a single node.  Also, getting the hypervisor out of the way can help I/O performance significantly (still a large challenge in VM based approaches) which will help data oriented applications like Hadoop and databases.

Open


Openness of cloud is very important to IBM, just like it was for Java and Unix/Linux.  Docker can provide the same write once, run anywhere experience for cloud workloads.  It is interesting how this openness combined with the fast/small footprint also allows for advances in devops not possible before with VM's.  It is now possible to run production-like workload configurations on premise (and on developers' laptops) in almost the exact same way as deployed in production, due to the reduction in overhead vs. running a full virtual machine.

Responsible


Moving fast isn't enough.  You have to move fast with responsibility.  Specifically you need to make sure you don't ignore security, high availability, and operational visibility when moving so fast.  With the automated and repeatable deployment possible with Docker (and related scheduling systems), combined with micro-service application design, high availability and automatic recovery become easier.  Also, enterprise deployments of Docker will start to add to the security and operational visibility capabilities.

The demo - SoftLayer cloud running Docker



After Jerry covered these areas, I followed up with a live demo.

On Monday, I showed how the technology we've been building to host IBM public cloud services, the Cloud Services Fabric (CSF), works on top of Docker.  We showed how the kernel of the CSF, based in part on NetflixOSS, and powered by IBM technologies was fully open source and easily run on a developer's laptop.  I talked about how this can even allow developers to Chaos Gorilla test their micro-service implementations.

I showed how building the sample application and its microservice was extremely fast.  Building an update to the war file took more time than containerizing the same war for deployment.  Both were done in seconds.  While we haven't done it yet, I could imagine eventually optimizing this so that container generation happens as part of an IDE auto compile.


In the demo today, I followed this up by showcasing how we could take the exact same environment and marry it with the IBM SoftLayer public cloud.  I took the exact same sample application container image and, instead of loading it locally, pushed it through a Docker registry to the SoftLayer cloud.  The power of this portability (and openness) is very valuable to our teams as it will allow local testing to mirror production deployment more closely.

Finally, I demonstrated how adding SoftLayer to Docker added to the operational excellence.  Specifically I showed how, once we told Docker to use a non-default bridge (that was assigned a SoftLayer portable subnet attached to the host private interface), I could have Docker assign IPs out of a routable subnet within the SoftLayer network.  This networking configuration means that the containers spun up would work in the same networks as SoftLayer bare metal and virtual machine instances transparently around the global SoftLayer cloud.  Also, advanced SoftLayer networking features such as load balancers and firewalls would work just as well with the containers.  I also talked about how we deployed this across multiple hosts in multiple datacenters (availability zones), further adding to the high availability options for deployment.  To prove this, I unleashed targeted chaos-army-like testing.  I showed how I could emulate a failure of a container (by doing a docker rm -f) and how the overall CSF system would auto recover by replacing the container with a new container.

Some links



You can see the slides from Jerry's talk on slideshare.

The video:

Direct Link (HD Version)

Saturday, June 7, 2014

Open Source Release of IBM Acme Air / NetflixOSS on Docker

In a previous blog, I discussed the Docker "local" (on laptop) prototype of the IBM Cloud Services Fabric, powered in part by NetflixOSS.

One big question on twitter and my blog went unanswered.  The question was ... How can someone else run this environment?  In the previous blog post, I mentioned how there was no plan to make key components open source at that point in time.

Today, I am pleased to announce that all of the components to build this environment are now open source and anyone can reproduce this run of IBM Acme Air / NetflixOSS on Docker.  All it takes is about an hour, a decent internet connection, and a laptop with VirtualBox (or boot2docker, or vagrant) installed.

Specifically, the aspects that we have added to open source are:

  1. Microscaler - a small scale instance health manager and auto recovery/scaling agent that works against the Docker remote API.  Specifically we have released the Microscaler service (that implements a REST service), a CLI to make calling Microscaler easier, and a Microscaler agent that is designed to manage clusters of Docker nodes.
  2. The Docker port of the NetflixOSS Asgard devops console.  Specifically we ported Asgard to work against the Docker API for managing IaaS objects such as images and instances as well as the Microscaler API for clusters.  The port handles some of the most basic CRUD operations in Asgard.  Some scenarios (like canary testing, red/black deployment) are yet to be fully implemented.
  3. The Dockerfiles and build scripts that enable anyone to build all of the containers required to run this environment.  The Dockerfiles build containers of the Microscaler, the NetflixOSS infrastructural servers (Asgard, Eureka and Zuul), as well as the full microservices sample application Acme Air (web app, microservice and cassandra data tier).  The build scripts help you build the containers and give easy commands to do the end to end deployment and common administration tasks.
If you want to understand what this runtime showcases, please refer to the previous blog entry.  There is a video that shows the Acme Air application and basic chaos testing that proves the operational excellence of the environment.

Interesting compare:


It is interesting to note that the scope of what we released (the core of the NetflixOSS cloud platform + the Acme Air cloud sample/benchmark application) is similar to what we previously released back at the Netflix Cloud Prize in the form of Amazon EC2 AMIs.  I think it is interesting to consider the difference when using Docker in this release as our portable image format.  Using Docker, I was able to easily release the automation of building the images (Dockerfiles) in source form, which makes the images far more transparent than an AMI in the Amazon marketplace.  Also, the containers built can be deployed anywhere that Docker containers can be hosted.  Therefore, this project is going to be valuable to far more than a single cloud provider -- likely more on that later as Dockercon 2014 happens next week.

If you want to learn how to run this yourself, check out the following video.  It shows building the containers for open source, starting an initial minimal environment and starting to operate the environment.  After that go back to the previous blog post and see how to perform advanced operations.


Direct Link (HD Version)