Thursday, July 18, 2013

Acme Air Goes To The (Streaming) Movies - The AcmeAir / NetflixOSS Port

I had the opportunity to present some interesting work at the latest Netflix OSS meetup last night.

I presented the following slides:



If you walk through the slides, they cover the following work.

1.  We started from the Acme Air OSS project, which is currently monolithic in design.  Specifically, you will note that the user authentication service, which is called on every request, is basically a local library call.  Slide 1.

2.  To take advantage of a micro-services architecture, we then split this authentication service off into its own separate application/service.  To load balance calls to it, we could have naively bounced back out through the front end nginx load balancing tier.  Splitting the application this way allows for better scalability of subsystems and better fault tolerance of the main web application with regard to its dependent services.  Slide 2.

3.  Next, we started to re-implement both the web application and the authentication service using runtime technologies from Netflix OSS, specifically Karyon, Eureka, Hystrix, and Ribbon.  By using these technologies we added more elastic scaling, better HA, and increased performance and operational visibility.  You can check out both open source projects (the original and the NetflixOSS enabled version) and do a diff to see the changes required in the application (see the sketch after this list).  Slide 3.

4.  Finally, we deployed the web app, the auth service, and our data tier of WebSphere eXtreme Scale through Asgard into auto scaling groups / clusters.  This gives us easier scaling control, integration of application concepts like security groups and load balancers, and the ability to roll out code changes with zero downtime.  Slide 4.
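
To make the Hystrix piece concrete, here is a minimal sketch of wrapping the remote auth service call in a Hystrix command.  This is not the actual Acme Air code - the hostname and endpoint path are made up for illustration, and in the real port Ribbon and Eureka locate the auth service instances rather than a fixed URL.

import java.net.HttpURLConnection;
import java.net.URL;

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

// Wraps the remote auth service call so that slowness or failure of the
// auth service is isolated from the main web application request path.
public class ValidateSessionCommand extends HystrixCommand<Boolean> {
    private final String sessionId;

    public ValidateSessionCommand(String sessionId) {
        super(HystrixCommandGroupKey.Factory.asKey("AuthService"));
        this.sessionId = sessionId;
    }

    @Override
    protected Boolean run() throws Exception {
        // hypothetical endpoint; in the port, Ribbon/Eureka resolve the real instance
        URL url = new URL("http://auth-service/rest/api/login/validate/" + sessionId);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(500);
        conn.setReadTimeout(500);
        return conn.getResponseCode() == 200;
    }

    @Override
    protected Boolean getFallback() {
        // fail closed when the auth service is unavailable or slow
        return Boolean.FALSE;
    }
}

A caller simply runs new ValidateSessionCommand(sessionId).execute() and gets timeouts, circuit breaking, and the fallback behavior for free.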

We then ran the benchmark against a small (1X) configuration.  This run included a single JMeter driver, a single instance of the web application, a single instance of the auth service, and a single data grid member.  You can see the performance results here.  The throughput was pretty solid, with zero errors across the entire run.

We then scaled up the workload.  Scaling it was pretty simple using Asgard by adjusting the minimum size of the cluster.  I didn't record these results, as we wanted to move the workload to a larger instance type.

After some experimentation we decided that we could get around 1 billion operations / day (which is about 12K requests per second) with 18X m1.large instances for the web application tier.  We put together a run with that size of instance and 20 JMeter instances, 18 web app instances, 20 auth service instances, and 16 data service instances.  You can see the performance results here.
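
For reference, the back-of-the-envelope math behind those numbers: there are 86,400 seconds in a day, so 1 billion requests / 86,400 seconds is roughly 11,600 requests per second, and sustaining 12K requests per second works out to just over 1 billion requests per day.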

The results show two things.  First, the workload peaks out at the expected overall throughput (around 13K requests per second, which works out to 1.1 billion requests per day).  Second, there is a fall-off after the peak is achieved.  I am currently working to see if this is a result of poor tuning or something throttling the environment.

Here is a screen shot that shows the various consoles during the run.  At the top left you'll see the Eureka server.  At the bottom you'll see the Asgard server.  At the right, you'll see a Hystrix console monitoring a single web app instance.


Thanks to Ruslan and Adrian and team from Netflix for putting this event together.  It was very valuable to me personally.  I can't wait to continue this work and for the next meetup.

Thursday, July 11, 2013

After a week or two with my Macbook Pro

I got a Macbook Pro a while back so I could run the Apple tools to do final packaging of my hybrid mobile applications for Acme Air created with IBM Worklight.  While IBM Worklight gave me the full set of developer tools to create the application, I still needed the Apple tools to create Apple packages for installation on iOS.  As far as I knew, such tools weren't available outside of Mac OS X, so I was stuck having to use a Mac.

That said, a few weeks back my new Thinkpad or its SSD drive started to fail (I haven't had time to debug which).  I used this as an opportunity to move over to the Macbook for my development.  I wanted to take a few minutes to quickly net out the things I like and dislike:

Like
  • Most people know that I'm an Android fan and dislike the entire Apple ecosystem of iCloud, iTunes, etc.  I was worried that in moving I'd be forced to use this ecosystem.  I can happily say that I'm using the Mac and not touching any of these Apple ecosystem tools.
  • Unix shell.  This is *THE* reason as a developer to move to Mac.  Being able to automate things with a real Unix shell is amazingly helpful as a developer.  If you are a developer and not working a majority of your time on Unix, I'd be surprised.
  • Many years ago I ran Linux on my desktop in college and on my first Thinkpads at IBM.  I stopped doing that about seven years ago because, while it was great for doing my development, my interaction through collaboration tools (our required IBM Lotus Notes, the world's required MS Office, etc.) suffered.  I can say that Mac OS strikes a good balance, and most of the non-development tools work fine there (one example is PokerStars - not a great work example, but it shows a program that is likely never going to work well natively on Linux yet works just fine on Mac OS).  One glaring missing tool for Mac is BeyondCompare (or any decent developer centric complex visual diff tool).
  • Virtual desktops (Mission Control).  As a developer of large scale systems, using multiple monitors and multiple virtual desktops is a must - something that has been common on Linux for a long time.  On Windows I was pretty happy with Dexpot, but on Mac this is baked in.
  • Gestures.  The magic trackpad and its gestures are amazing.  I am used to some of these from my tablets, but Mac OS takes this to a whole new level.  I have to say I missed out on this over the years without knowing it even existed, and I have to give a lot of credit to Apple for having this innovation likely well before tablets.
  • Stability.  There are weeks where I never reboot the system.  I just close the lid at work, reopen at home, and repeat.  I could never have run Windows this long without the system getting wonky.  When you have tons of windows open and a setup arranged across virtual desktops for your work, the ability to get going in seconds after opening the lid is important, and you can only do this effectively if you avoid shutdown/restart cycles.
  • The magnet based plug on the side.  It is really slick engineering.  All other power connections look stupid to me now.
Dislike
  • The magic trackpad and clicking/pointing.  I have to say I'm still better with a mouse for accuracy and after a long day of coding my pointer finger hurts from clicking.  I had some Apple friends advise me here, but I have to say I'm almost thinking about having both a mouse (for point/click/drag) and magic trackpad (for gestures) connected to be able to switch between them.
  • The laptop shell.  I find the lack of a rounded bottom edge below the keyboard annoying on my wrists.  It looks pretty, but I don't think it's very ergonomic.  I also feel like the heat transfers to my lap worse than I remember with the plastic'ish Thinkpads.
  • The lack of an equivalent to Windows' Win-Left/Win-Right and drag-to-side snapping to full screen/half right/half left screen.  Again, when you work with many windows, being able to quickly organize them is important.  Most Linux window managers had this functionality years ago, and when Windows 7 added it I was very happy.  It seems like Mac OS still doesn't have this support.  I was able to get a utility called TileWindowsLite that adds this as keyboard shortcuts, but it's annoying that this isn't built into Mac OS.
  • People will now assume that I'm an Apple fanboy.  I'm not.  I'm a developer who wants a laptop that is suited to developers.  I have an "Android guy eating Apple" lid sticker coming to help explain that nuance. :)
  • Lack of keys on the keyboard.  I'm not sold that I needed dedicated keys for eject, volume, etc. more than I needed backspace, home, end, page up, and page down keys.  Maybe this is a coder thing vs. a general user thing, but I can't wait for the full keyboard I have on order to arrive.  I constantly have to stop and remember how to do backspace, home/end, etc.
  • Command vs. control.  Having another meta key is cool given it opens up far more keyboard shortcuts.  However, many of the systems I connect to still assume control is the frequently used key.  Due to this I again tend to spend a lot of time stopping to remember if it's control or command for shortcuts.
  • There seems to be a bug in how Mac OS deals with TVs used as monitors.  Using the HDMI adapter, the display looked terrible (grainy) when I knew the TV worked fine with another laptop.  It turns out this is a known problem, and I have to use the VGA connection instead so the Mac doesn't see that it's a TV and downsample the output.
  • The lack of a docking station.  I hate reconnecting five wires every time I sit down somewhere.  There is a Kickstarter project to create a single connector that I'll likely buy once it's ready.  However, I really dislike the idea of not being able to buy a port replicator/docking station.
  • The inability to easily drive two external monitors while the laptop is shut.  My setup at home is a rack where I used to dock my Thinkpad (off to the side of my desk) and two "monitors", one a huge TV and the other a 17" monitor to the side.  I used to dock my laptop with its lid shut and drive both monitors, with the TV being the main one.  My understanding is that I could do something similar with Thunderbolt driving the TV and a USB adapter for the second monitor, but I believe the USB solution is substandard, likely requires more CPU burn, and will likely have poor video quality.  With my Thinkpad and docking station I had two DVI outputs that easily drove HDMI to the TV and VGA to the second monitor.
  • The cost of the Apple peripherals and the lack of third party options.  In the "PC" world there were many cheap and good alternatives to buying Thinkpad branded options.  I think I've spent about $300 now just on Apple keyboards, trackpads, and display adapters, and I haven't bought the much needed second power cord yet.
All this said, I'm sticking with the Mac.  The UNIX shell for programming and cloud development makes the Mac much better than any other option at this point regardless of the dislikes.  If you have experience with any of my dislikes let me know.  I'd love to smooth some of the edges (not just the keyboard edge) of my Mac experience.

Tuesday, July 9, 2013

Acme Air List Of Links

Acme Air is an open source sample application and benchmark designed to be cloud native and mobile enabled, and it has been run at Web Scale (over 4 billion mobile and browser client requests per day).

Acme Air is currently implemented in both Java and Node with a variety of NoSQL data tier implementations.  The workload has been run across a few IaaS clouds, PaaS platforms and bare metal deployments.  Feel free to learn from the code, documentation and the user community.   We encourage you to contribute your own implementations of different runtimes, data tiers, and cloud providers.

Below is the information that continues to be updated over time.  If you have further questions, add some comments to the blog or jump into the user group discussion.

Blogs


Introduction and Open Source Release - This blog covers the basic ideas of Acme Air.

Web Scale Part One - Historical Perspective and Functional Thoughts - This blog covers how this benchmark differs from other standardized benchmarks in scaling and performance.

Web Scale Part Two - Performance and Scaling Results - This blog covers a single "Web Scale" run of over 50,000 requests per second (over 4 billion requests per day).

Acme Air Goes To The (Streaming) Movies - The AcmeAir / NetflixOSS Port - This blog covers a side project of Acme Air where we ported the application and operational approach to the NetflixOSS cloud platform.

Main Project Resources


Acme Air Open Source
Acme Air Wiki Documentation
User Group Discussion

Related works


Netflix OSS version of Acme Air
Netflix OSS version of Acme Air ported to WebSphere Liberty Profile
Netflix OSS Acme Air AMIs
Netflix OSS Acme Air running on the IBM SoftLayer cloud


Last Updated:  2013-07-14

Monday, July 8, 2013

Web Scale with Acme Air (Project Scale) - Part Two

In the last blog post, I talked about why we created a new workload called Acme Air and the fundamental functional thinking that went into the performance and scaling work of the benchmark we internally called Project Scale.

Now I want to share results.  You can view the summary report of one of the large scale runs we talked about at Impact 2013.  This report was created by a project called "acmeair-reporter" (not yet open sourced) that takes all the log files from all the instances and processes them to calculate throughput, number of errors, latency, and per node performance metrics.
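
Since acmeair-reporter isn't open sourced yet, here is a minimal sketch of the kind of per-request aggregation such a reporter does.  The one-line-per-request CSV log format (HTTP status, latency in milliseconds) is purely hypothetical for illustration - the real tool parses the actual JMeter and server logs and also merges data across all of the instances.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MiniReporter {
    public static void main(String[] args) throws IOException {
        List<Long> latencies = new ArrayList<Long>();
        long errors = 0;
        // hypothetical format: one "<httpStatus>,<latencyMillis>" line per request
        for (String line : Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            String[] fields = line.split(",");
            if (!"200".equals(fields[0])) {
                errors++;
            }
            latencies.add(Long.valueOf(fields[1]));
        }
        Collections.sort(latencies);
        // 90th percentile: the latency that 90% of the requests came in under
        long p90 = latencies.get((int) (latencies.size() * 0.9));
        System.out.println("requests=" + latencies.size()
                + " errors=" + errors + " p90ms=" + p90);
    }
}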

Now, let me summarize the results.  The results were collected over a time span of 10 minutes.  While it would be better to run for an hour or two to ensure there is no variability in results over longer runs, we have proven internally that the throughput is sustainable, and we wanted to simplify the report.  Also note that (a) the throughput over the 10 minutes is very constant (ignore the beginning and end, which are artifacts of the JMeter tool and our reporter) and (b) this was after thirty minutes of warm-up, and a second run of 10 minutes repeated the results.

As a top line number, we achieved roughly 50,000 requests / second.  These are end-to-end mobile client and Web 2.0 browser requests, which means each request involves mobile enablement aspects, REST endpoints, business logic, and data tier interactions.  The lightest request requires at least one deep data tier read (to validate the user's session) along with a flight query cache query and all the REST enablement to expose this to a browser and/or mobile client.  Calculated out to 24 hours, this is roughly 4.3 billion requests per day.  As mentioned in the previous blog post, this puts this benchmark run in the "Billionaire's Club".  It means that on average, for every second of the benchmark run, the application was serving the same order of magnitude of requests previously documented by Google, Facebook, and Netflix.

While many internet companies talk about the technology they use to serve such throughput at Web Scale, I don't believe there are any publicly available open source workloads that someone can run to repeat web scale results.  With the open source application code, the documented results should be repeatable by anyone willing to work with the application and spend the operational expense to try the workload on a cloud.  To be fair to the internet companies, their applications are likely far more complex.  That said, the Acme Air application is as complicated as most other traditional benchmarks and far more complete in breadth (especially in cloud first and mobile architectures).  We believe this balance between repeatable benchmark and real world sample application should be beneficial to the world and help us start to focus open community conversation around web scale and cloud workload benchmarking.

We released the results of the "Java" implementation, which is based on the WebSphere Liberty Profile application server (a lightweight and nimble app server that is well suited to cloud deployments), WebSphere eXtreme Scale (an advanced data grid product that is also well suited to cloud), IBM Worklight (which gave us not only the development and deployment platform for our natively deployed Android and iOS mobile applications, but also a server to mobile enable our application services), and nginx (a popular cloud http proxy with load balancing features).  We used 51 WebSphere Liberty Profile servers, 46 WebSphere eXtreme Scale data containers, 28 IBM Worklight servers, and 10 nginx servers.  We drove the 50,000 requests/sec using 49 JMeter load driver instances, all coordinated to run emulated mobile and desktop browser traffic.  We ran all of this on the IBM Smart Cloud Enterprise cloud, mostly in the Germany cell.  This is a sum total of 185 "copper" instances.  While that is likely far smaller than the number of servers used in any internet company, for a web scale enterprise application this was a significantly sized deployment.

Digging deeper into the results, it is worth noting that the report covers many aspects of the systems' performance (not only overall throughput but also latency distribution, per node system performance, error rate, and workload distribution).

Some cool numbers taken from the report:
  1. The run had 30 million requests.  Of those 30 million requests, only 4 reported errors.  That is roughly one hundred-thousandth of a percent (about 0.000013%) of all requests.
  2. The 90th percentile response time never peaked over 300 ms for these end-to-end requests, and the overall average was around 70 ms per request.
  3. The main workhorse in the benchmark (the application server) averaged 91% CPU utilization across all 51 application servers.  CPU utilization was likely even higher, as some outliers at the run start/end are averaged in.  The other workhorse tiers (the data grid containers and the mobile servers) averaged 83% and 62% respectively, which means with cluster resizing we could have used fewer servers.
  4. We ran both browser traffic and mobile traffic.  In this run we decided to focus more on the browser traffic, emulating an application that was just starting to take on mobile users.  On average we ran browser traffic at a 5 to 1 ratio over mobile traffic.

One last cool number - PRICE!


In the previous blog post I talked about the cost of some of the largest scale standardized benchmarks.  I mentioned how one of the largest documented standardized benchmark runs cost approximately thirty million dollars to procure the base system needed to run the workload.  That specific publication ran an order of magnitude more transactions per second than shown in this Acme Air performance result, but the transactions were purely database focused.  The standardized benchmark did not contain the end-to-end mobile app and browser enablement.  It was also monolithic in nature, meaning the failure of the database would mean zero throughput (all or nothing high availability).  With its cloud architecture and breadth of features, the Acme Air application does not suffer from these legacy shortcomings.  In comparison to the $30 million for the existing standardized benchmarks, our cost was a few thousand dollars in cloud spend per month to produce this Acme Air performance number.  We didn't get an exact cost, as we tore down the workload once the performance tests were over, but you can (and should) do your own math on the number of instances we used running 24/7 for a month.  We also achieved this with a team of five people at IBM.  I believe this dramatic reduction in expense and required human resources is significant, and it is the key reason why performance work needs to focus on cloud first deployments going forward.  The cloud and cloud first application architectures have fundamentally changed the world of computing.

Next up ... some words about our availability and transactionality.  Stay tuned!

Tuesday, July 2, 2013

Some tips/tricks with gradle builds

Blog post is under construction ...

Dependencies


Understanding dependencies ...

./gradlew -q acmeair-webapp:dependencyInsight --dependency javax.servlet

I looked at the output and then saw the exact package it was bringing in:

org.eclipse.jetty.orbit:javax.servlet:2.5.0.v201103041518

And added this to providedRuntime in my packaging:


providedRuntime 'org.eclipse.jetty.orbit:javax.servlet:2.5.0.v201103041518'

And finally, the specific jar was gone from my war.

Beware: by adding something to providedRuntime, you can break things, as this instructs the packager to treat the dependency and its transitive dependencies as not required.  I ran into this issue when I told it not to include aws-java-sdk, which depends on commons-logging-1.1.1.  Without being more specific, I ran into Spring not starting with the infamous error, which was actually this.

You can be more specific in the exclusion by adding @jar, like:


providedRuntime 'com.amazonaws:aws-java-sdk:1.3.27@jar'
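
For background, the @jar suffix is Gradle's artifact-only notation.  It tells the resolver to match just that single jar rather than the module plus its transitive dependencies, so in this case the aws-java-sdk jar itself stays out of the war while commons-logging (which Spring needs) still gets packaged.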

Plugins


Using the application plugin

To use the application plugin, here are all four steps for getting it to work with arguments in a multi-project build:

project(':somesubproject') {
  // step 1: apply the plugin to get the 'run' task
  apply plugin: 'application'
  // step 2: tell it which class to launch
  mainClassName = 'org.spyker.MainClass'
  // step 3: set the program arguments (always a list)
  run {
    args = ['someargument', 'asecondargument']
  }
}

./gradlew :somesubproject:run

Notes:  If you only have a single argument, remember to still put it in an array: args = ['singlearg'].  I eventually need to find out how to pass in args via a -P parameter.