Friday, October 17, 2014

Kafka Part 1

Just a blog post to document some things I learned as I started to play with Kafka.  Hopefully others doing similar work will find this via Google and benefit ...

I wanted to try to repeat (as a starting point) the performance blog post on Kafka.  Here are the issues I ran into and how I fixed them.

I first started to play with Kafka on a simple instance and chose m3.medium.  While this gave me a place to play with Kafka functionally, if I was going to push 75MB/sec to the Kafka logs, given the on-instance storage is 4GB, I was going to run out of storage in 4*1024/75 = ~55 seconds.  I decided to move up to i2.xlarge, which gives me 800GB, or 800*1024/75 = ~11,000 seconds = ~3 hours.
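The back-of-the-envelope math is just instance storage divided by write rate; a quick sanity check in shell arithmetic (storage sizes as above):

```shell
# Seconds until the instance storage fills at a sustained 75 MB/sec.
rate=75                                            # MB/sec write rate
echo "m3.medium: $(( 4   * 1024 / rate )) sec"     # 4 GB local SSD
echo "i2.xlarge: $(( 800 * 1024 / rate )) sec"     # 800 GB local SSD
```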

After I got through basic functional testing, I started to look for the performance testing tools.  The aforementioned blog shows a new performance test client available in Kafka versions after 0.8.1.  Unfortunately 0.8.2 isn't available pre-compiled yet, and the performance client uses the new Java client.  So off I went to compile Kafka from source (using trunk).

Kafka builds using gradle -- something I'm used to.  Weirdly, Kafka doesn't have the gradle wrapper checked in.  This means you have to pay attention to the readme, which clearly says you need to install gradle separately and run gradle once to get the gradle wrapper.  Unfortunately, I tried to "fix" the lack of a wrapper without reading the docs (doh!).  I just added another version of the gradle wrapper to kafka/gradle/wrapper.  This let me build, but I immediately got the error:

Execution failed for task ':core:compileScala'.
> com.typesafe.zinc.Setup.create(Lcom/typesafe/zinc/ScalaLocation;Lcom/typesafe/zinc/SbtJars;Ljava/io/File;)Lcom/typesafe/zinc/Setup;

Running gradle with --info showed me that there was a class mismatch error launching the zinc plugin.  I was able to get rid of the error by changing the dependency in build.gradle to the latest zinc compiler.  Doing this made me wonder whether there was a dependency issue between gradle and the zinc plugin.  Once I realized this, I re-read the readme, where it asks you to install gradle first and then run gradle once to get the correct wrapper.  Oops.  After following the directions (TM), trunk compiles worked.
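For the record, the sequence the readme asks for is short (run from the source root; this assumes gradle is already installed on the box):

```shell
cd kafka          # source checkout (trunk)
gradle            # run once: generates the matching gradle wrapper
./gradlew jar     # from here on, build with the wrapper
```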

I then did some basic single-host testing and I was up and running on Kafka trunk.  I then tried to create multiple servers, separated the clients from the servers, and created replicated topics.  This fails horribly if you have issues with hostnames.  Specifically, upon creating a replicated topic you will see the following error hundreds of times before one server crashes, and then the next server will do the same thing.

[ReplicaFetcherThread-0-0], Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 4; ClientId: ReplicaFetcherThread-0-0; ReplicaId: 2; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [my-replicated-topic,0] -> PartitionFetchInfo(0,1048576). Possible cause: java.nio.channels.ClosedChannelException (kafka.server.ReplicaFetcherThread)

Upon looking at the more detailed logs, I noticed it was trying to connect to the other servers by the hostname they know themselves as (vs. a DNS-resolvable name).  I'm guessing that the server upon startup does an InetAddress.getLocalHost().getHostName() and then registers that name in Zookeeper.  When the other servers then try to connect, unless that hostname is resolvable (mine wasn't), it just won't work.  You'll see similar issues with the client-to-server connection saying certain servers aren't resolvable.  For now, I have solved this by adding hostname-to-IP-address mappings in /etc/hosts on all clients and servers.  I'll certainly need to see how this can be fixed automatically without the need for DNS (hopefully leveraging Eureka discovery eventually).
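The workaround is just a static mapping on every box (the broker names and addresses below are made up for illustration); the broker configuration also has an advertised.host.name setting that should let you override what gets registered in ZooKeeper instead:

```
# /etc/hosts on every client and broker: map each broker's
# self-reported hostname to its reachable IP.
10.0.0.11   kafka-broker-1
10.0.0.12   kafka-broker-2
10.0.0.13   kafka-broker-3
```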

The aforementioned blog says there was no tuning done in addition to the "out of the box" settings, but upon inspection of the configuration, it seems like either the changes were trivial or things have changed in trunk since the blog was written.  Something to come back to eventually, but for now I decided to just use the out-of-the-box settings so I could get a baseline and then start tweaking.

I was then able to spin up five c3.xlarge instances as clients, which should have enough CPU to push the servers.

I was able to get the following results:

Single producer thread, no replication - 657431 records/sec @ 62.70 MB/sec.

This is 80% of the original blog's result.  I'm guessing I'll be able to tune a bit to get there, as I've seen initial peaks hit 710K.  First performance results are always low or wrong.
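These runs use the new producer performance tool; my invocation looked roughly like the one in the benchmark post (the topic name and broker address here are placeholders, and the exact class name and arguments vary by version):

```shell
# Push 50M 100-byte records from one client, unthrottled (-1), acks=1.
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance \
  test-topic 50000000 100 -1 acks=1 \
  bootstrap.servers=broker1:9092 \
  buffer.memory=67108864 batch.size=8196
```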

Then, with three producers (one per client), no replication:

client 1 - 679412 records/sec @ 64.79 MB/sec
client 2 - 678398 records/sec @ 64.70 MB/sec
client 3 - 678150 records/sec @ 64.67 MB/sec
total ~ 2035000 records/sec @ ~194 MB/sec

This is about 3X the previous result, as expected.  I also ran vmstat and netstat -an | grep 9092 to confirm what was going on and to help me understand the clustering technology.

At five clients, things dropped off in an unexpected way.  I've hit some bottleneck.  I basically get lower results per client that in total only add up to the three-client result.

I have also played with replicated topics, but won't post those numbers yet.

Things I'm thinking I need to investigate (in order of priority):

1.  Networking speed.  It looks like I am limited by network (or disk, but I doubt it) at this point.  I am new to Amazon, so when the instance table says "moderate" network performance, I'm not sure how much that means.  I'm more used to 100Mb or 1Gb links, etc.

2.  Enhanced networking performance.  While I launched HVM images, I'm not sure yet whether the base image I used has the additional enhanced-networking drivers.

3.  Tuning of the settings for Kafka.

4.  Java tuning.

5.  Different instance types.  HS1 would give me a way to split the traffic between volumes.

6.  Kafka code.

All this said, I also need to play with message sizes (larger and mixed) and look at some of the expected application access patterns -- expanding into consumers as well.

More in part 2 coming soon.

Sunday, October 12, 2014

Performance Testing via "Microservice Tee" Pattern

In my last blog post, I warned that I might be too busy to blog with the new job.  I received feedback from many folks that they valued my blog and wanted me to keep blogging in the new job.  Ask and you shall receive.

This blog post actually started with a cool story I heard in Gene Kim's talk at Dockercon.  He discussed how Facebook used existing users' sessions to send test messages to their new messenger service well before its actual release.  It's well worth listening to that part of the video, as it gives you some idea of what I'm about to discuss.  Net net, the approach described here is the only way Facebook could have been confident in releasing the new messenger service without risk of performance being a problem on day one.

I had actually heard something somewhat similar when Netflix open sourced Zuul.  I heard stories of how you could use Zuul to direct a copy (or in Unix terms a "tee") of all requests to a parallel cluster that is running active profiling.  The real request goes down the main path to the main production cluster, but at the same time a copy goes down a duplicate path to a performance-instrumented cluster.  The request to the non-critical performance cluster is asynchronous and the response is ignored (fire and forget, basically).  As a performance engineer who has helped many IBM customers, I started to salivate over such an approach.  I was excited because it would mean you could do performance analysis on services with exact production traffic and very minimal impact on the upstream system.
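The Unix analogy is literal; as a toy illustration (the file names are mine), tee duplicates a stream so the shadow side sees exactly the same traffic as the primary without being in the response path:

```shell
# Duplicate a "request" stream: one copy lands in the production log,
# an identical copy in a shadow log that can be analyzed (or broken)
# without affecting the primary path.
echo "GET /titles/123" | tee shadow.log > prod.log
```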

Many folks I've helped in the past needed such an approach.  To see why, let's discuss what typically happens without it.  As a good performance engineer, I suggest getting in early while code is being developed.  What that means is creating load testing scripts with virtual users and running these while the code is being developed, focusing on what function is available at the time.  The problem here is rooted in the word "virtual".  Not only are the users in this approach artificially modeled, but the timing between requests, the data sent, etc. are at best estimates.  While this approach works most of the time with careful consideration, many times I've seen a service go into production and fail on performance.  Why?  Simply put -- bad assumptions in the artificial load testing failed to model something that wasn't obvious about how users were going to interact with the service.

So far, none of this relates to what I've seen since I joined Netflix.  Interestingly, in the four days I've worked thus far, I've heard a similar approach described for two major projects -- two technologies that will have a large impact on Netflix.  The idea is to deploy a new version of a technology and make sure that initially all requests, while serviced by the older version, get sent in duplicate to the new technology.  That way, Netflix can ensure the new technology is behaving as expected with exactly the production traffic it will eventually receive.  You can use such an approach to do basic functional testing of your new code.  You can then do performance testing, much like the performance-instrumented tee described above.  Finally, you can even take this beyond performance testing by doing chaos testing (high availability) and scale up/down (capacity and elastic scaling) testing on the new technology implementation without fear of what it would do to the parallel production path.

I'm not sure if there is already a name for this approach.  If you find other descriptions of this formally, let me know.  For now, I think I'll call it the "Performance testing via microservices tee" pattern.

Ok.  Time to go back to going deep at Netflix.

PS.  I have now heard this type of testing called "shadow traffic".  That doesn't describe how to achieve it, but it's still a good name.

Monday, October 6, 2014

Red Is the New Blue

I am excited to announce that I will be starting a new step in my career tomorrow.  After over 15 years with IBM, I will be joining Netflix (hence the Orange is the New Black homage in the title) starting Tuesday.

In my time at IBM I've had the honor to work with incredible colleagues, on amazing projects, and through exciting times.  The work on IBM Host On-Demand (the first commercial Java product in the world), IBM WebSphere Application Server function and performance (starting with V4), WebSphere's SOA (WS-*, Business Process Management and Enterprise Service Bus) and IBM's XML (XSLT2/XQuery) technology over the years has been challenging, interesting, and most importantly fun.  More recently, in my time in IBM's Emerging Technology Institute, I was able to work on performance, then scale, then cloud, and then the creation of the Cloud Services Fabric (CSF).  This focus on cloud was very exciting to be a part of, and I believe the technology I helped build will continue at IBM for some time, being part of what helps IBM public cloud services have rock solid operational capabilities, empowering development to deliver function quickly to customers with the insight needed to adjust quickly to opportunities that present themselves.  I'd like to personally thank all of those people who have worked beside me over the years.  Unfortunately, with the length of my career at IBM, I cannot thank each individual person here.  Just know that every team I worked on is near and dear to my heart.

This said, it was time in my career to make a change.  I will be joining the cloud platform engineering team at Netflix in Los Gatos, California.  I will further explain the role as appropriate in the near future.  I have had the opportunity, through NetflixOSS, to work with many of my soon-to-be colleagues already.  The technology, teams, and challenges ahead of me are all very exciting.  Below are some points of comparison that all factored into my decision to make a change in my career:
  1. Big company to less big company
    • Except for short stints in college and in the Raleigh/Durham startup scene in 2000, I have worked for one of the largest companies in the world for my entire career.  It is amazing how many people I know across the world and across the various business units of IBM.  The scope of projects that can be done with such resources is great.  I hope to replace that scope at Netflix with the intense technology culture and collaboration with similar companies in the valley.
    • Some stats for the compare:  Employees (~400,000 IBM, ~ 2,000 Netflix), Market cap ($189B IBM, $27B Netflix), Countries served (~170 IBM, ~50 Netflix - and growing every day)
  2. Enterprise to consumer
    • Enterprise is a challenging market to succeed in.  The expectations are high and the legacy is strong.  Keeping the world's governments, banks, insurance companies, hospitals, and manufacturers running is no easy task.  I am proud to have been a part of powering many of those critical businesses.
    • In moving to the consumer side, I expect to learn a new view on computing which was tough to focus on in the enterprise market.  I think the size and scale of the Netflix platform and service will be a huge change and learning experience for me personally.  I doubt Netflix is stopping at 50 million subscribers, and it is clear that they will continue to help drive Internet bandwidth policy discussions across the globe.
    • I believe the technology that powers consumer offerings is contributing significantly to technology innovation.  You don't have to look far beyond the cloud vendors to see the consumer companies and system-of-engagement applications that drove the initial creation of cloud.  The same holds true for mobile.  Other aspects of technology like NoSQL, containers, and large scale distributed computing were also born out of the necessity to implement solutions to consumer companies' problems, given their massive scale and always-on requirements.
  3. Research Triangle Park (North Carolina) to Bay Area (California)
    • This was not only the hardest aspect of my decision to make a change, but was a key motivating factor to make a change.
    • In my career, decisions like this are easy for me to make.  The decision to move to the Bay Area seems easy for someone who wants to create technology.  In the past, I have considered areas like Boston, New York City, or Texas.  All of these areas create more innovative technology than Research Triangle Park these days.  In the last year or so, I have witnessed the amazing collaboration that exists between companies in the Bay Area.  I believe the collaboration possible in the Bay Area exceeds that of the other tech areas.  In the end, I came to the decision that there is no better place for my personal career in technology.
    • In my personal and family life, moving from the east to the west coast is a huge change.  Our family is well established on the Atlantic coast.  We have amazing friends and neighbors in the Raleigh/Durham area.  Without going into personal details, I do think that while the change will be hard, there are still very good reasons for a family to settle in the Bay Area.  With kids that are interested in technology, I know of Bay Area opportunities to share in learning with them.  Also, I hope the access to great California outdoor space with moderate temperatures will be a reason to spend more time with the family enjoying the surroundings.
So today I am sitting on a plane with the first one way ticket I've had in a long time.  I've got my new Netflix red shoes on (see below) ready to take a first step on a huge new journey.  I'm sure I'll be swamped for the first month or so, so don't expect too many blog updates.  Once I get my feet firmly back on the ground, I'll let you know more about what I am doing.

Thanks!  If you get to the San Jose area, look me up!