Tuesday, August 14, 2012

Measuring performance of platform focused private clouds

I had the opportunity yesterday to present to some of our partners what we're doing to measure the performance of our Pure Application System, and I wanted to write down what I presented.

Previously, Jerry Cuomo blogged on how we are looking at benchmarking clouds. He covered the concepts of time to deploy, density, elastic scale, resiliency, runtime performance, and time to genesis. While I think all of these matter, what has become clear is that they matter in different ways when we're looking at the kinds of clouds that Pure Application Systems enables.

Due to being platform focused (vs. infrastructure focused)

When looking at the performance of setting up a typical web-to-database solution, you must consider the entire platform that is part of "deployment". If you consider a typical configuration of this sort of application that provides elastic scaling and high availability, the "application" could easily involve up to ten servers (multiple application servers, session state / caching offload servers, load balancers, databases, etc.). When one of these "applications" gets started, each one of these servers needs to be configured, interlinked, and started in concert. It does no good to allow your application server to handle requests without the database or session caching server being known and interconnected first. In measuring the performance of this time to deployment, the end-to-end solution must be considered. We are looking at the time from when a user clicks deploy on their application until the *entire* set of servers is ready to handle the first request to the entire platform. In contrast, infrastructural benchmarks look only at the time to get a single virtual machine up and running. Typically, such performance measurements don't consider the time for the servers on top of the VMs to be ready to handle a request, and they certainly don't consider the time it takes for a group of servers associated with a more complex platform to be ready to work together.
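To make "when to stop the watch" concrete, here is a minimal sketch (in Java, with hypothetical health URLs and a simplistic polling loop) of timing a deployment only until every server in the platform answers a request. A real measurement would start the clock at the moment the user clicks deploy; this just illustrates the end-to-end idea.

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.List;

// Sketch only: stop the watch when the *whole* platform answers, not when one VM boots.
// The health URLs below are hypothetical placeholders for each member of the deployment.
public class DeployTimer {
    public static void main(String[] args) throws Exception {
        List<String> healthUrls = Arrays.asList(
                "http://app1:9080/health", "http://app2:9080/health",
                "http://cache1:9090/health", "http://db1:50000/health");

        long start = System.nanoTime();
        for (String url : healthUrls) {
            while (!isReady(url)) {
                Thread.sleep(1000); // keep polling until this member of the platform is up
            }
        }
        long seconds = (System.nanoTime() - start) / 1_000_000_000L;
        System.out.println("Entire platform ready to handle requests after ~" + seconds + "s");
    }

    private static boolean isReady(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            return conn.getResponseCode() == 200;
        } catch (Exception e) {
            return false; // not reachable yet
        }
    }
}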

The Pure Application System provides common multi-tenant services for things like session caching and elastic load balancing. Such services aren't usually considered by infrastructural performance measurements. These shared services are just as critical as the non-shared platform components. Given that these shared services are multi-tenant, careful performance work must be done to ensure they scale across applications within the platform while providing resiliency and isolation.

Due to being private cloud focused (vs. public)

There exist many frameworks to measure the performance of public clouds (EC2, Rackspace, etc.). However, due to the opaqueness of using a public cloud, these frameworks can't focus on key aspects that can be analyzed in private cloud offerings.

Resiliency is one reason why people consider private vs. public clouds. With resiliency, you want to consider not only high availability/failover scenarios, but also scenarios that ensure performance is maintained in the face of "bad" applications that end up being co-located on a multi-tenant cloud. "Bad" applications could be intentionally bad applications (where isolation for security is key) or non-intentionally bad applications (where isolation for common resource protection is key). On a public cloud, such resiliency would be hard to test, as it's impossible to force co-location of workloads. With a private cloud it's far easier to force scenarios to test resiliency. This isn't to say that resiliency and isolation aren't important to measure on public clouds, but it should be clear that they are far easier to ensure on private clouds.

With private clouds, you are investing in infrastructure on site. Due to this, ensuring that the system is fully optimized for the platform being provided is critically important. In a public cloud you pay for an instance, and you can't tell how optimal that instance is during its own operations or when co-located with other instances. In a private cloud you'll know very clearly how many applications and management operations can be supported per unit of on-site infrastructure (density). Therefore, when comparing private cloud systems, it's very important to ensure you are getting the most top-to-bottom optimized software and hardware stack. This has been one of the top focus areas of Pure Application Systems.

With private clouds, the owner is in charge of getting them ready to handle requests. Jerry talked about time to genesis in his blog. This measures the time to get the system from power-on to the point where the first deployment is available. Any operations that occur in this phase cost not only time, but introduce the possibility for mistakes that could negatively impact the system well into the operational life of the cloud. While this isn't a typical "performance measurement", it is very important in private cloud infrastructures. As Jerry stated, you have to be very careful about "when to stop the watch". With Pure Application Systems we stop the clock when the platform is ready for applications to be deployed. I have seen other measurements (again, related to being platform focused vs. infrastructure focused) that stop the clock when the operating systems or virtual machines are started. You can already see our timed genesis process in the video on the Pure Application Systems homepage.

Friday, July 27, 2012

Good article on crowdsourced benchmarks

In a recent article, Derrick Harris talked about how crowdsourcing could be the future of benchmarking. As someone who participates deeply in standardized benchmarks (TPC and SPEC), I wanted to comment on some of the important messages in his blog.

Derrick talks about benchmarking within the context of Hadoop, but in general the article applies to benchmarking across multiple technologies. While SPEC and TPC benchmarks have incredible industry credibility, it's hard to ignore the fact that Hadoop, NoSQL, and many open source projects have long since played a different game. I read blogs all the time that talk about simple developer laptop performance tests. While these benchmarks (more realistically, performance experiments) aren't what would matter for enterprise application performance in a datacenter, after some review and adjustment they usually contain good bits of performance knowledge. I also see single vendor performance results and claims that give very little information.

I have, in the past, talked about the value of standardized benchmarks. I talked about why doing such benchmarks at SPEC leads to unbiased and trusted results. I think the key reason is the rigor and openness by which the review is done and the focus on scenarios that matter within enterprise computing. SPEC also has years of benchmarking experience to leverage in avoiding common performance testing mistakes. It's impossible to compare a developer laptop performance experiment to any SPEC benchmark result; the result from SPEC is likely far more credible. With SPEC, benchmark results are usually submitted by a large number of vendors, meaning the benchmark matters to the industry. With performance experiments, until there is community review and community participation, there is only one vendor, which leads to "one-off" tests that have less long-standing industry value. The scenario where I wrote about this - a Microsoft "benchmarketing" single vendor result - is a very good example of how results from a single vendor don't have much value.

But there is a problem with some SPEC benchmarks - the community by which results are disclosed and benchmarks are designed is a closed community. It's great that SPEC is an unbiased third party to the vendors, but that doesn't mean the review is a community of the consumers of the results. I think Derrick reflects on this by talking about how "big benchmarks" aren't running workloads anyone runs in production. I disagree, but I do believe that, due to the lack of an open community, it's harder for the consumers to understand how the results compare to their own workloads. I personally will attest to how SPECjEnterprise 2010 and its predecessors have improved Java based application servers for all customer applications. While it might not be clear how that specific benchmark matters to a specific customer's use of a Java based application server, it is not true that improvements shown via the benchmark don't benefit the customer's applications over the long haul. In contrast to Derrick's views, this is why customers benefit from vendors participating themselves in such benchmarking - I don't think this would have occurred if all benchmarking was done without vendor involvement.

BTW, full disclosure of the performance experiment and results is critical. You can see from the recent Oracle ad issue that the whole industry loses without such disclosure. Any performance data should be explained within the context of publicly available tests, methodology, tuning, etc.

I think if you put some of these views together (Derrick's and my standardized benchmark views), you'll start to see some possible common threads. Here, I think, are the key points:

1) We need open, community-based benchmarking in this day and age (crowdsourcing is one option; more open standardized benchmarking is another). By doing this, the results should be seen as not only trustable but also understandable.

2) Any benchmark, to have value, must have multiple participants actively engaged in publishing results and actively discussing the results and technologies that led to the results. By doing this, the benchmark will have long-standing industry value.

I hope this post generates discussion (positive and negative). I'd love to take action and start to figure out how the industry can move forward in open and community based benchmarking.

Thursday, July 26, 2012

Mobile Development on Resume - Check

In the last few weeks, I've been working on a mobile application as a side project. I used IBM Worklight Studio (get the free-for-developers version here) to design and package the application. Today I used the Android SDK to deploy that application to my Motorola Photon (Android) smartphone.

Not really a performance oriented post, but I wanted to quickly talk about how easy this was. I was able to, without any knowledge of Android programming specifics, get this application written and deployed.

Worklight allows you to use open and portable HTML5/JavaScript and popular AJAX widget libraries (jQuery/DOJO) to implement applications that look native on each device you target, all while allowing you to access device-specific features. So far, I've only deployed to my personal Android (note I'm not an Apple fan). If my wife lets me, I might deploy to her iPhone (she unfortunately is an Apple fan). The cool thing about Worklight is I should be able to take the same HTML5/JS codebase and re-target it to the iPhone. My guess is that now that I have one version done, moving to the iPhone shouldn't take more than a few hours.

With HTML5/JavaScript/JavaScript mobile widgets and embedded browsers becoming ubiquitous, it really does seem like this environment is becoming to mobile devices what Java is to servers. Write "once", run "everywhere". I did run into small issues (like Date formatting), so it's not perfect yet, but it's getting darn close. It tends to feel like applets on the client years and years ago. I wonder if this development paradigm will, over time, make the mobile development experience as easy and open as Java has made writing server applications.

Thursday, July 12, 2012

Basic WebSphere Liberty Profile Tuning

I recently needed to go beyond out-of-the-box tuning for the WebSphere Liberty Profile.  I was working with a system that was front-ending services hosted in Liberty.  We wanted to make sure that Liberty wasn't the bottleneck in the overall system.  It turns out the testing proved that Liberty wasn't an issue even with the out-of-the-box tuning.  However, since the tuning isn't yet documented, I wanted to put out what I learned.  There will be more tuning information coming in a refresh to the InfoCenter, so I'll update this post with that link when that formal documentation exists. [EDIT]The InfoCenter now has tuning information - see this topic[/EDIT].  Here is what I tuned:

JVM Heap Sizing:

I won't advise you on heap size tuning, as there is a wealth of information on JVM tuning that applies equally to WAS or Liberty and is mostly driven by your application's memory needs. Of course, Liberty has a lower baseline memory footprint, but beyond that the tuning is similar. In order to do basic heap tuning, create a file named jvm.options in the same directory as the server with the following contents:
-Xms1024m
-Xmx1024m

Thread Pool Sizing:

Thread pool tuning is always interesting. It's easy to say that you should create roughly the same number of threads as the number of server processor threads, or slightly more, if the application has no I/O. Unfortunately, no application is devoid of I/O and therefore needs to wait sometimes. Therefore, you usually want to allocate 4 to 5 times the number of threads that could execute with no I/O. Based on this (assuming a server that has 16 cores), add the following to your server.xml. Of course, this totally depends on the I/O your application does. I suggest tuning this to a lower value and doing a performance run. If you can't saturate the application, tune it higher and repeat. If you set this value over the optimal value, it won't hurt you tremendously, but you will be more efficient if you get closer to the optimal value, due to reduced context/thread switching.
<executor name="LargeThreadPool" id="default" coreThreads="40" maxThreads="80" keepAlive="60s" stealPolicy="STRICT" rejectedWorkPolicy="CALLER_RUNS" />
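
If you want a quick starting point for these values on your own hardware, a tiny calculation like the following mirrors the heuristic above. The multipliers are assumptions to tune from, not a rule; the 40/80 values in the snippet came from a 16-core box.

// Rough starting point for coreThreads/maxThreads based on the 4-5x heuristic above.
// availableProcessors() reports logical processors (including SMT threads).
public class ThreadPoolSizing {
    public static void main(String[] args) {
        int hardwareThreads = Runtime.getRuntime().availableProcessors();
        int maxThreads = hardwareThreads * 5;   // upper end of the heuristic
        int coreThreads = maxThreads / 2;       // mirrors the 40/80 ratio in the example above
        System.out.println("coreThreads=" + coreThreads + " maxThreads=" + maxThreads);
    }
}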

HTTP Keep-Alive Tuning:

As my application was services based, I wanted the clients to be able to send multiple requests over the same connection using HTTP keep-alive to keep latency down. Otherwise, without this tuning, the connection would close and I'd have to endure HTTP/TCP setup/teardown cost on every request, which can be slow and burn up ephemeral ports on a load client. If you want keep-alives to be controlled by the client and effectively unlimited from the server side, set the following option in server.xml (make sure this is under an httpEndpoint stanza):
<httpOptions maxKeepAliveRequests="-1" />
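
On the client side, you also need to make sure connections actually get reused. Here's a minimal Java sketch (the URL is a hypothetical endpoint; adjust host, port, and path) showing the two things that matter with HttpURLConnection: fully draining each response body, and not calling disconnect(), so the connection goes back into the JVM's keep-alive cache.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch of a client that benefits from keep-alive: repeated requests to the same
// host reuse the TCP connection as long as each response is fully read and closed.
public class KeepAliveClient {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:9080/myapp/rest/ping"); // hypothetical endpoint
        for (int i = 0; i < 100; i++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (InputStream in = conn.getInputStream()) {
                while (in.read() != -1) {
                    // Drain the body; an unread body prevents the connection from
                    // being returned to the keep-alive cache.
                }
            }
            // Note: do not call conn.disconnect() here - that hints the JVM to close
            // the socket instead of reusing it for the next request.
        }
    }
}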

Monitoring:

I was pinged by an IBMer trying to monitor the performance of Liberty. There is a good overview in the InfoCenter of the monitor feature you can add to a server to enable JMX-based monitoring.
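
As a quick illustration, here's a hedged sketch of listing the monitoring MBeans from code running inside the server (for example, from a servlet). It assumes the monitor feature registers its MXBeans with the platform MBean server under a "WebSphere" domain; verify the actual ObjectNames for your release with jconsole or the InfoCenter.

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Minimal sketch: print the MBeans visible to in-server code. The "WebSphere:*"
// domain pattern is an assumption - check the real names with a JMX console.
public class MonitorPeek {
    public static void listMonitorMBeans() throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        for (ObjectName name : mbs.queryNames(new ObjectName("WebSphere:*"), null)) {
            System.out.println(name);
        }
    }
}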

Updated 2012-08-09 - Added monitoring section.

Tuesday, July 10, 2012

Web Scale and Web 2.0/Mobile Changes to App Server Performance

Recently, I've been working on two performance projects.  The first relates to a Web 2.0 and Mobile application designed for web scale.  The second relates to recent performance improvements we made in SPECjEnterprise 2010 for the WebSphere Application Server 8.5, which is based upon Servlet/JSP MVC presentation technology designed to be run on a clustered application server configuration for scale-out.  I wanted to write about how the app server behaves differently between these applications based on their inherently different approaches to application architecture.

A few years ago, I remember discussing how Web 2.0 would change the performance profile of a typical application server handling requests from browsers.  It was pretty simple to see that Web 2.0 would increase the number of "service" (JAX-RS doing http/xml or http/json) requests.  Other, less obvious changes to the performance profile of an application server are documented below.

Static content goes away completely, saving app server cycles.

It should already be a well-known practice that HTML pages, images, and stylesheets that people consider static shouldn't be served by an application server and should instead be moved to an HTTP server or content distribution network (CDN).  A full-blown application server just isn't the fastest server for serving basic HTTP requests for files that don't change.

If you look at a typical Servlet/JSP (Web 1.0 server-side MVC) approach, you'll see JSP pages stored on the server with substantial static content plus scriptlets or tag libs that mix in some dynamic content.  If you look at a typical web page (let's say Twitter, for example), my guess is the static content on any dynamic page (the HTML markup to organize the table containing the tweets) is like 70% of the page content, with 30% being actual dynamic data.  We have spent a lot of time making our app server perform well sending out this mixed static and dynamic data from our JSP engine.  The output of such data includes handling dynamic includes, character set conversions, processing of tag libraries, etc.  This action of outputting the JSP content is the aggregation of basically static content and true dynamic content from back-end systems like databases.

Once you move to Web 2.0 and Mobile, you can treat that static content as truly static, moving it to a web server or CDN.  Now the browser can do the page aggregation, leveraging AJAX calls to get only the dynamic content from the application server via JAX-RS services, static content in the form of HTML and JavaScript served from web servers or CDNs, and JavaScript libraries to combine the two sets of content.  Now all that work that used to be done in the JSP engine is removed from the application server, freeing up cycles for true dynamic service computing.
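
As a concrete (and hypothetical) example of what the application server is left serving, here is a minimal JAX-RS resource that returns only the dynamic data as JSON; the markup and JavaScript that render it would live on an HTTP server or CDN.  It assumes a JSON provider is configured for POJO serialization, and the resource and type names are made up.

import java.util.Arrays;
import java.util.List;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Hypothetical JAX-RS resource: the app server serves only dynamic data as JSON.
@Path("/tweets")
public class TweetResource {

    public static class Tweet {
        public String user;
        public String text;
        public Tweet() { }
        public Tweet(String user, String text) { this.user = user; this.text = text; }
    }

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public List<Tweet> latest() {
        // In a real application this would come from a database or cache, not a literal list.
        return Arrays.asList(new Tweet("alice", "hello"), new Tweet("bob", "world"));
    }
}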

Sessions/Authentication change in Web Scale applications offering easier scale out.

As customers are starting to implement mobile solutions, they are finding that the load those solutions drive ends up being "Web Scale", or scalable beyond the load generated by browser traffic alone, due to the always-accessible apps offered to their customers.

In SPECjEnterprise, or any full-blown JEE application that uses HttpSession, sessions are typically sticky, with a load balancer out front redirecting the second and following web requests from any client back to the server which last serviced the request, based on an HTTP session cookie that identifies the preferred server for following requests.  Additionally, this session data is typically replicated across a cluster in case the primary server for any user fails, so the user can be redirected to a server that has a copy of the stateful session data.  These architectures simply assume that if the session isn't loadable locally or replicated, the user must not be logged in yet.

If one wants to write an application that scales to web scale, this approach isn't sufficient.  You will find most services at such web scale (Amazon S3, Twitter, etc.) force the user to log in before accessing any AJAX services.  In doing so, they associate a cookie-based token with that browser that acts as an authorization token that each service can double-check before allowing access.  They can check this token against a central authority no matter which application server the user comes through.  This allows the infrastructure to stay stateless and scale in ways that the clustered HttpSession/sticky load balancer approach doesn't allow.

This approach changes the performance profile of each request, as it means each service call needs to first authenticate the token before performing the actual service work.  I'm still experimenting with ways of optimizing this (using application-centric data grids such as eXtreme Scale can help), but it seems like this trade-off of peak request latency for the benefit of easier horizontal scale-out is to be expected in Web Scale architectures.  A rough sketch of the idea is below.
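
Here's that sketch: a hypothetical servlet filter that validates an auth token cookie on every service call, with a local cache standing in for a data grid to trim the added latency.  The cookie name, cache, and central-authority lookup are all placeholders, not a prescription.

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical filter: every service call authenticates a token cookie before doing work.
public class TokenAuthFilter implements Filter {

    // Stand-in for a shared data grid (e.g., eXtreme Scale); here just an in-JVM map.
    private final Map<String, String> tokenCache = new ConcurrentHashMap<>();

    @Override
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) resp;

        String token = readTokenCookie(request);
        if (token == null || !isValid(token)) {
            response.sendError(HttpServletResponse.SC_UNAUTHORIZED);
            return;
        }
        chain.doFilter(req, resp); // token checks out; run the actual service
    }

    private String readTokenCookie(HttpServletRequest request) {
        Cookie[] cookies = request.getCookies();
        if (cookies == null) return null;
        for (Cookie c : cookies) {
            if ("authToken".equals(c.getName())) return c.getValue(); // hypothetical cookie name
        }
        return null;
    }

    private boolean isValid(String token) {
        // Check the local cache first; fall back to the central authority on a miss.
        if (tokenCache.containsKey(token)) return true;
        String user = lookupInCentralAuthority(token);
        if (user == null) return false;
        tokenCache.put(token, user);
        return true;
    }

    private String lookupInCentralAuthority(String token) {
        // Placeholder for a call to the shared authentication service or data grid.
        return token.isEmpty() ? null : "user-for-" + token;
    }

    @Override public void init(FilterConfig config) { }
    @Override public void destroy() { }
}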

I think both of these changes are quite interesting when considering the performance of Web 2.0 / Mobile and Web Scale applications, and they aren't obvious until you start implementing such an architecture.  I think both show simplifying approaches to web development that embrace web concepts and help the performance and scale of your applications.

Have you seen any other changes to your application infrastructure as you have moved to Web 2.0 and Mobile?  If so, feel free to leave a comment.

Thursday, June 14, 2012

WebSphere Liberty Profile (Quick Performance View)

I've had the pleasure of developing a new application on the WebSphere Liberty Profile for the last week.  I wanted to take the time to write a quick summary of my experience.  While there are very cool and interesting blogs proving the point about the small size of the server by running the profile on a Raspberry Pi, I wanted to take some time to document my experience developing on the profile and talk in terms that matter to the typical developer.

Some "performance" points:

  1. At one point, I got the server into a non-recoverable state.  Later, Alasdair told me how I could have fixed the issue, but the server creation time made it pretty much a non-issue.  I was able to delete the server directory for the server I was working on, run the server creation command (server.bat create server1), re-run my ant script to repopulate the dropins directory with my application, and restart the new server in about two minutes.
  2. Speaking of repopulating the dropins directory with my application, the normal redeployment of an application is FAST.  My war file is about 5 megs, of which 4.5 is library jar files under WEB-INF\lib.  I typically see a redeployment time of about four seconds.  To make this truly interesting, this is with the server started with debugging.  If I'm not debugging, it usually takes around 3 seconds for redeploys.  I pretty much call this instantaneous, as it takes just about that long to swap from the eclipse ant build back to my browser for testing.  Every once in a while I'll catch the application mid-deployment and get a 404 error, but a refresh is all it takes to see the app show up.  I used to think about checking Twitter or Facebook between app deployments, but now I can't.  Now I can keep working through a basically immediate compile, deploy, test life cycle.  Darn it, no productivity distractions any more.
  3. I don't restart the server at all during the day, but it's worth stating the server start-up time.  With this application deployed, it takes about seven seconds to start the server.
  4. I haven't yet tuned the server at all.  I just point the server to my IBM Java for JAVA_HOME and start the server.  That means I'm not tuning the heap or thread pools or anything like that.  Granted, so far, I'm just doing single user testing of my application.  But that said, here is what my process explorer reports about the instance:
    • Virtual memory size = 1.5G, peak private bytes = 144M, peak working set = 160M
    • I didn't specify any Java command line options, but it looks like the start-up script sets -XX:MaxPermSize=256m
All of these numbers are from my laptop, which is a Thinkpad W500 with a 2-core 2.53GHz Intel Core Duo, 4G of memory (2.5 usable as I didn't install 64-bit Windows), and an SSD disk (btw, get an SSD; spinning disks are horrible!).

All in all, this server is a joy to develop applications on.  Congrats to the WebSphere Liberty team!

Tuesday, June 5, 2012

New Blog

I have reluctantly decided to move my main blogging from the WebSphere Community Blog to here. I have chosen to do so after much thought. Over the last few years, I have spent a fair amount of time encouraging multiple authors to blog on the WebSphere Community Blog, seeing it as the best place to have the most relevant content for WebSphere customers. Even so, I have decided to move my blogging here for the following reasons:

1. I have returned my focus to performance at IBM (I can't seem to get away from performance, and I do truly love doing performance related work). While performance is interesting to all WebSphere customers, I doubt that, as I start to ramp up performance discussions, all topics will be just WebSphere focused. Therefore, I don't want to put such blog posts on the WebSphere Community Blog.

2. WebSphere is such a broad portfolio these days. Much of the WebSphere Community Blog is focused on what we refer to as the WebSphere Foundation (the application servers, the data/caching grids, etc). In my current role I will be focusing across the entire WebSphere portfolio and beyond.

3. More and more, IBM has started to release solutions that aren't tied to a specific brand (WebSphere). I am working on such products and solutions, which again are not a great fit for the WebSphere blog.

4. I have changed focus and jobs within IBM four or so times within the last ten years. I would like this blog to stay around forever, regardless of what focus I have.

5. To be brutally honest, one of the best parts of being on the WebSphere Community Blog was the traffic. When you search for "WebSphere blog" on Google, the WebSphere Community Blog is the first entry. We see a steady stream of traffic there. I think the way people read blogs, starting from the top down, will change over time. I know I have a steady stream of followers on Twitter and expect that no matter where my content is, my followers will see it, and if it's the best content on the topic, specific posts will show up in Google searches.

6. All the above points have discouraged me posting in the last year or so. I'm hoping making the move will encourage me to start blogging again.

Let's see how active I become on this blog. I will still try to cross-post WebSphere related posts, but for now, let's hope iSpyker.blogspot.com thrives all on its own.