For love of all things QA, before you launch a new application, test production!
“What? That’s stupid! Why would I want to perform a load test production and risk an outage? That impacts my SLAs. I can’t impact my SLAs!”
Remember the number one rule of quality control: if you don’t find it, your customers will.
When you are about to launch a brand new application into your production environment, test that application against production. However, this only applies for new applications. New applications will introduce new, additional load on the environment, while existing, revised applications already have added that load to the system. Essentially, with an existing application, you already know how well the production environment can handle the additional demand generated by the application’s audience. New applications have not yet generated that load, and production has yet to prove itself.
There is no hard evidence that production can take the additional demand. Maybe your production load balancer can only handle another 5 MB/s, and your new application will demand another 7. Perhaps it is one of the switches, instead. Or for my recent life, maybe it is your ISP. You will not know until you test it, until you measure it, and “if you didn’t measure it, you didn’t do it.”
With a past project, my company created an intranet application for our client, and our application just happened to be hosted off-site. The off-site environment was green, and wasn’t hosting anything else, so our client had no issue with us testing this environment fully since it was going to be production, but wasn’t yet. The hosting company and their ISP rated the environment at 45 Mbps (That’s megabits–lower-case ‘b’), and based on the clients traffic expectations, we needed about 30. It is a good thing we tested the site because we found an issue with the load balancer at about 15 Mbps, a problem with server memory when it was processing enough transactions to produce 20 Mbps, a problem with the database switches when we were generating 22 Mbps, and–this one is the kicker–a bandwidth ceiling at 28. Though all of the routers, switches, balancers, and servers were performing well, we couldn’t get more than 28 Mbps to the web servers. It turns out that the ISP didn’t ever expect anyone to use that 45 Mbps rating, and never tested to make sure they could handle it.
“If you didn’t measure it, you didn’t do it.”
Through two months of midnight through 0600 testing, we upgraded the load balancer, added more memory, put in gigabit switches, had the ISP tweak their infrastructure, pushed through all of the data we needed, and successfully proved that the off-site environment and our new application could handle the load. But, the environment still wasn’t fully tested. Our client used everyone’s favorite single-signon, SiteMinder. However, they wouldn’t let us test the application while integrating their productional SiteMinder policy servers. We could only use staging, and when the staging servers couldn’t handle the load, “that’s okay because it’s staging.” But no matter how much we advocated, we couldn’t test production. We might impact the environment and the SLAs. So, we launched without testing it, and guess what happened? The policy servers failed, and they severely impacted their SLAs.
And to think, we could have tested that at 1:00 AM on a Saturday, and they even if we fried the policy servers, they would have had all weekend to fix it. And most importantly, we would have identified it before the end-user did. But what really cooked their goose was the difference between productional load and performance testing load: performance tests can be stopped. It is a lot harder to fix a jet engine at 30,000 ft.
The moral of the story: when launching a new application, always test production. Always.
Remember Me