This blog post is part one of a two-part series about the work we've been doing in the performance space around application scalability. While the second part will discuss the results we can share on Acme Air, this first part covers some of the history and context.
Over the last year, I have been looking at the benchmarking space and at how well current benchmarks demonstrate the scaling of technologies. If you look at some of the most popular standardized benchmarks, the scaling aspects are less relevant than they used to be. In the past it was interesting to prove that certain middleware and data technologies could scale up to larger SMP systems and across clusters of SMP systems. However, as hardware systems have grown larger, some of the current approaches to scaling applications (pinning processes to network cards, CPU sockets, memory subsystems, etc.) have become less relevant: most single applications don't need to scale to the size of modern large-scale systems, and most users can't afford to replicate such complex tuning. These complex tunings are very error prone for general users and, if not applied properly, can lead to worse performance than not using them at all.
Also, from a middleware perspective, high availability usually isn't tested fully. Many of the benchmarks do not require realistic high availability in the middle tier, which allows for stateful implementations that wouldn't work under failure. Meanwhile, the data tier remains monolithic and non-distributed. This means the benchmark competition isn't based on technical merit, but on who can buy and optimize the largest database. In fact, some of the most popular standardized benchmarks have documented costs ranging from a few million dollars to, in one case, thirty million dollars.
For these reasons, in existing middleware benchmarks we have chosen to focus on the number of transactions per processor core, as (a) this reflects true engineering performance work in our products regardless of scale, and (b) it is directly related to how people pay for products: any improvement in this metric means cost savings for end users. Keep this history in mind as we discuss the performance of Acme Air, and specifically the scaling work on Acme Air. We call the scale-up work using Acme Air "Project Scale".
When I looked externally to see who was gauging scale in the world, I found ProgrammableWeb's "Billionaire's Club". This study documents the trend of supporting Web API calls from what used to be browser-facing clients and is now, more and more, not only browsers but also mobile and business-to-business Web APIs. If you look at the web and cloud today, the list of use cases and billionaires in this presentation documents the scaling that matters today better than standardized benchmarks do. I agree with the motivating factors listed in the presentation for why this modern scaling is occurring: mobile and partner Web API enablement, many times driven by consumer-facing applications. I would add sensor and Internet of Things based use cases. Consider the number of sensors in automobiles, those tracking your bags during airline flights, traffic monitoring, streams of point-of-sale purchases, etc. When you put mobile together with the Internet of Things, applications grow to a scale with bounds not yet known in typical enterprise applications. Also, you will note that almost every current billionaire is a "born on the cloud" application. Cloud is the more typical approach to deployment these days in large-scale systems, and deploying on the cloud changes the approach to running and tuning workloads dramatically compared to the approaches in standardized benchmarks. Finally, some of the top billionaires are documenting, through blogs and open source, how they are getting there (NetflixOSS and LinkedIn, for example).
As I looked at what it takes to scale middleware to the levels shown by this billionaire's club, my experience told me it would be important to consider typical enterprise qualities of service: high availability, security, and transactionality (when required). There are many real-world examples and benchmarks that did not consider those aspects, and when the world discovered this, the end result was usually a black eye for any existing performance claims. First, on the "billionaire's club" side, many of the companies born on the cloud found success greater than their initial expectations and at some point failed under the increased load. On the technology side, a great example was some of the initial MongoDB benchmarking, which was not run with the more advanced durability and replication configurations. A funny (and NSFW) video that documents this is the "MongoDB is Web Scale" video. To be fair to MongoDB, since then the benchmarks have been re-run with better levels of durability and availability. In any benchmark of large scale, the benchmark and the technologies employed must consider high availability, security, and transactionality.
Given all of this, my team set out to create a workload (embodied in the recently released open source project Acme Air) and run it at levels of scale that would put it in the billionaire's club, reaching a few billion mobile and browser requests per day. We decided early on to do the scaling work in the cloud. We also made sure we continuously considered the requirements for security, high availability, and transactionality. Finally, we decided to do the work in the open (through open source, discussion forums, and blogs) to build the same level of worldwide understanding that similar billionaire's club members have.
In the next post, I'll document the scaling results of Project Scale. I hope this context gives you enough background to understand why we are doing this work the way we are.