In my last blog post, I warned that I might be too busy to blog with the new job. I received feedback from many folks that they valued my blog and wanted me to keep blogging in the new job. Ask and you shall receive.
This blog post actually started with a cool story I heard in Gene Kim's talk at DockerCon. He discussed how Facebook used existing users' sessions to send test messages to their new messenger service well before its actual release. It's well worth listening to that part of the video, as it gives you some idea of what I'm about to discuss. Net net, the approach described here is the only way Facebook could have been confident in releasing the new messenger service without risk of performance being a problem on day one.
I had actually heard something similar when Netflix open sourced Zuul. I heard stories of how you could use Zuul to direct a copy (or in Unix terms, "tee") of all requests to a parallel cluster running active profiling. The real request goes down the main path to the main production cluster, but at the same time a copy goes down a duplicate path to a performance-instrumented cluster. The request to the non-critical performance cluster would be asynchronous and its response ignored (fire and forget, basically). As a performance engineer who has helped many IBM customers, I started to salivate over such an approach. I was excited because it would mean you could do performance analysis on services with exact production traffic and very minimal impact on the upstream system.
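To make the idea concrete, here is a minimal sketch of the tee, and to be clear this is not actual Zuul filter code: the primary cluster serves the request as usual, while a copy is sent asynchronously to a performance-instrumented cluster and its response is thrown away. The host names and the handle() method are made up purely for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of the "tee": the real request goes to the primary cluster,
// a copy goes to a shadow cluster asynchronously (fire and forget).
public class RequestTee {

    private static final HttpClient client = HttpClient.newHttpClient();

    // Hypothetical endpoints; in a real gateway these would come from config.
    private static final String PRIMARY = "http://primary.example.internal";
    private static final String SHADOW  = "http://perf-shadow.example.internal";

    public static String handle(String path) throws Exception {
        HttpRequest primaryReq = HttpRequest.newBuilder(URI.create(PRIMARY + path)).build();
        HttpRequest shadowReq  = HttpRequest.newBuilder(URI.create(SHADOW + path)).build();

        // Fire and forget: the shadow call runs asynchronously and any
        // response or failure is ignored, so it cannot slow the real request.
        client.sendAsync(shadowReq, HttpResponse.BodyHandlers.discarding())
              .exceptionally(e -> null);

        // The real request goes down the main path and its response is returned.
        HttpResponse<String> primaryResp =
                client.send(primaryReq, HttpResponse.BodyHandlers.ofString());
        return primaryResp.body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(handle("/api/recommendations"));
    }
}
```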
Many folks I've helped in the past needed such an approach. To see why, let's look at what typically happens without it. As a good performance engineer, I suggest getting in early while code is being developed. That means creating load testing scripts with virtual users and running them as the code is developed, focusing on whatever function is available at the time. The problem here is rooted in the word "virtual". Not only are the users in this approach artificially modeled, but the timing between requests, the data sent, and so on are at best estimates. While this approach works most of the time with careful consideration, I've seen many services go into production and fail on performance. Why? Simply put: bad assumptions in the artificial load testing failed to model something that wasn't obvious about how users were going to interact with the service.
So far, none of this relates to what I've seen since I joined Netflix. Interestingly, in the four days I've worked thus far, I've heard of the same approach being used for two major projects, each involving a technology that will have a large impact on Netflix. The idea is to deploy a new version of a technology and, initially, have every request that is serviced by the older version also sent in duplicate to the new one. That way, Netflix can verify the new technology behaves as expected under exactly the production traffic it will eventually receive. You can use such an approach to do basic functional testing of your new code (see the sketch below). You can then do performance testing much like the performance-instrumented tee described above. Finally, you can even take this beyond performance testing by doing chaos testing (high availability) and scale up/down testing (capacity and elastic scaling) on the new technology implementation without fear of what it would do to the parallel production path.
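Here is a similarly hedged sketch of that validation step: the old service still answers the user, a copy of each request goes to the new implementation asynchronously, and any difference in the responses is logged off the critical path. The service endpoints and the mismatch handling are hypothetical, not how any particular Netflix project does it.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of shadowing traffic to a new implementation: the old service
// produces the response the user sees, the new one gets a copy and is
// checked off the critical path.
public class ShadowValidator {

    private static final HttpClient client = HttpClient.newHttpClient();

    // Hypothetical endpoints for the old and new implementations.
    private static final String OLD_SERVICE = "http://old.example.internal";
    private static final String NEW_SERVICE = "http://new.example.internal";

    public static String handle(String path) throws Exception {
        // The old service still serves the real response.
        HttpResponse<String> oldResp = client.send(
                HttpRequest.newBuilder(URI.create(OLD_SERVICE + path)).build(),
                HttpResponse.BodyHandlers.ofString());

        // The new service receives the same request asynchronously; any
        // difference is logged for investigation rather than returned.
        client.sendAsync(
                HttpRequest.newBuilder(URI.create(NEW_SERVICE + path)).build(),
                HttpResponse.BodyHandlers.ofString())
              .thenAccept(newResp -> {
                  if (!newResp.body().equals(oldResp.body())) {
                      System.err.println("Shadow mismatch on " + path);
                  }
              })
              .exceptionally(e -> null);

        return oldResp.body();
    }
}
```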
I'm not sure if there is already a name for this approach. If you find it formally described elsewhere, let me know. For now, I think I'll call it the "Performance testing via microservices tee" pattern.
Ok. Time to go back to going deep at Netflix.
P.S. I have since heard this type of testing called "shadow traffic". The name doesn't describe how to achieve it, but it's still a good term.