It’s rare these days to build a web application without some reliance on a third-party hosted tool, service or API. This can have a profound effect on the reliability of your application, because no matter how stable and robust your own code and hosting environment are, you are relying on ‘black box’ services. Their reliability may be good, but you have no way of knowing whether that’s down to good design and planning, or simply luck.
Say your application A is loading data from web service B, authenticating users with web service C, and integrating advertising from ad network D. In a random month, B has 13 minutes of downtime, C has an hour, and D has 42 minutes. Assuming your own app doesn’t suffer any internal downtime, that these periods of downtime don’t overlap, and that your app fails whenever any one of the services does, your app is going to have, in total, 1 hour 55 minutes of downtime, more than any of the individual misbehaving services.
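The arithmetic can be checked with a few lines of Python. This is only a sketch of the sum above: the function names are made up for illustration, and the 30-day month used for the availability figure is an assumption, not something stated in the example.

```python
# Outage minutes for each third-party service, taken from the example above.
OUTAGES = {"B": 13, "C": 60, "D": 42}


def total_downtime(outages):
    """Sum the outages, assuming none overlap and each one takes the app down."""
    return sum(outages.values())


def availability(downtime_minutes, days_in_month=30):
    """Fraction of the month the app was up (the 30-day month is an assumption)."""
    return 1 - downtime_minutes / (days_in_month * 24 * 60)


def as_hours_and_minutes(minutes):
    """Format a minute count as hours and minutes."""
    hours, rest = divmod(minutes, 60)
    return f"{hours}h {rest:02d}m"
```

Calling as_hours_and_minutes(total_downtime(OUTAGES)) gives "1h 55m", matching the total above, which is nearly double the worst single service.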
In fact, when I was recently building a site that integrated ads using OpenX hosted, OpenX had some problems and the forums started filling up with panic-stricken messages such as:
None of my pages are loading! When will OpenX fix this?
One reason that third-party APIs often cause problems like this is that when you build your website, the APIs are typically working properly, and you simply don’t include API outages in your test scenarios. So you unwittingly introduce a dependency on the API that you might, if you thought about it, be able to avoid.
I’ve created a tool on Google App Engine that I hope might help. It allows you to simulate a range of misbehaving API scenarios, and test how resilient your application is to them.
For example:
http://badapi.trib.tv/req?wait=20
Will take 20 seconds to return output. Any integer between 1 and 30 will work.
http://badapi.trib.tv/req?resp=500
Will return a 500 Internal Server Error (any valid HTTP response code will work). If no number is given, returns a 200.
http://badapi.trib.tv/req?op=json1
Returns standard test JSON output number 1 (see index page for full list of test outputs, and ability to define your own). If not specified, returns the string ‘OK’.
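To show what being resilient to these scenarios might look like on the calling side, here is a minimal sketch of a client that gives up after a timeout and falls back to a default value. The function name fetch_or_fallback, its parameters and the 3 second default are assumptions made for illustration; only the standard library urllib calls are real.

```python
import urllib.error
import urllib.request


def fetch_or_fallback(url, timeout_seconds=3.0, fallback=None,
                      opener=urllib.request.urlopen):
    """Return (status, body) from url, or (status_or_None, fallback) on failure.

    The opener parameter exists only so the network call can be swapped out
    in tests; by default it is urllib.request.urlopen.
    """
    try:
        with opener(url, timeout=timeout_seconds) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses, e.g. ?resp=500.
        return err.code, fallback
    except OSError:
        # URLError and socket timeouts are both OSError subclasses, so this
        # covers connection failures and slow responses such as ?wait=20.
        return None, fallback
```

Pointing this at http://badapi.trib.tv/req?wait=20 with a 3 second timeout should hand back the fallback rather than hang the page, and ?resp=500 should hand back the fallback along with the status code so it can be logged.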
You can combine these directives to produce a more tailored response, e.g.:
http://badapi.trib.tv/req?resp=500&wait=2&op=json1&ct=js&cs=utf
This will wait 2 seconds, then give you a 500 Internal Server Error response containing standard JSON output number 1, served as text/javascript and advertised as UTF-8 encoding.
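If you are generating a lot of these test URLs, a small helper can assemble the query string for you. This is a sketch only: the helper name bad_api_url is hypothetical, the directive names are the ones shown above, and the only validation is the 1 to 30 range given for wait.

```python
from urllib.parse import urlencode

BASE_URL = "http://badapi.trib.tv/req"


def bad_api_url(**directives):
    """Build a test URL from directives such as wait, resp, op, ct and cs."""
    wait = directives.get("wait")
    if wait is not None and not 1 <= int(wait) <= 30:
        raise ValueError("wait must be an integer between 1 and 30")
    if not directives:
        return BASE_URL
    # urlencode keeps the keyword arguments in the order they were passed.
    return f"{BASE_URL}?{urlencode(directives)}"
```

For example, bad_api_url(resp=500, wait=2, op="json1", ct="js", cs="utf") produces the combined URL shown above.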
I hope others also find this useful. Let me know if you have any suggestions.