Monday, February 13, 2012

On the Use, and Misuse, of Software Test Metrics


“You will manage what you measure” – Frederick W. Taylor
Testing verifies that a thing conforms to its requirements. A metric is a measurement, a quantitative valuation. So a test metric is a measurement that helps show how well a thing aligns with its requirements.

Consider a test you get in school. The goal of the test is to show that you understand a topic, by asking you questions about it. Depending on the subject, the questions may be specific and fact-based (When did the USSR launch Sputnik?); they may be logic-based (Sputnik orbits the earth every 90 minutes, at an altitude of 250 km. How fast is it moving?); or they may be interpretative (Why did the Soviet Union launch the Sputnik satellite?).

Or they can be just evil: write an essay about Sputnik. Whoever provides the longest answer will pass the test.

Note that by asking different kinds of questions we learn about the student's capabilities in different dimensions. The same holds for software. When a piece of software shows up, the purpose of testing should not be to find out what it does (a never-ending quest) but to find out if it does what it is supposed to do (conformance to requirements). The requirements may be about specific functions (Does the program correctly calculate the amount of interest on this loan?); about operational characteristics (Does the program support 10,000 concurrent users submitting transactions at an average rate of one every three minutes, while providing response times under 1.5 sec for 95 percent of those users as measured at the network port?); or about infrastructural characteristics (Does the program support any W3C-compliant browser?).
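To make the operational example concrete, here is a minimal sketch of how such a requirement could be turned into a pass/fail check. The function names and the idea of feeding in a plain list of measured response times are assumptions made for illustration; a real load test would collect those samples at the network port with its own tooling.

```python
from typing import Sequence


def offered_load_per_second(users: int, seconds_between_transactions: float) -> float:
    """Average transactions per second the whole user population submits."""
    return users / seconds_between_transactions


def fraction_within(samples: Sequence[float], limit_seconds: float) -> float:
    """Fraction of measured response times strictly under the limit."""
    if not samples:
        raise ValueError("need at least one response-time sample")
    within = sum(1 for s in samples if s < limit_seconds)
    return within / len(samples)


def meets_requirement(
    samples: Sequence[float],
    limit_seconds: float = 1.5,      # "response times under 1.5 sec"
    required_fraction: float = 0.95,  # "for 95 percent of those users"
) -> bool:
    """True when enough of the samples come in under the limit."""
    return fraction_within(samples, limit_seconds) >= required_fraction


if __name__ == "__main__":
    # 10,000 users, one transaction every three minutes (180 seconds) each.
    print(round(offered_load_per_second(10_000, 180), 1))
    # Hypothetical measurements: 95 fast responses, 5 slow ones.
    print(meets_requirement([1.0] * 95 + [2.0] * 5))
```

With 10,000 users each submitting one transaction every 180 seconds, the system has to absorb roughly 55.6 transactions per second on average. The check itself is a yes-or-no answer to the requirement, which is the point: the metric comes from the program's intended use, not from counting anything about the people who built it.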

These metrics follow from the program's intended use. Management may use other metrics to evaluate the staff: How many bugs did we find? Who found the most? How much time does it take, on average, to find a bug? How long does it take to fix one? Who created the most bugs?

The problem with these staff metrics is that they generally misinform managers and lead to perverse behaviors. If I am rated on the number of bugs I write, then I have a reason to write as little code as possible, and stay away from the hard stuff entirely. If I am rated on the number of bugs I find, then I am going to discourage innovations that would improve the quality of new products, because better code leaves me fewer bugs to find. So management must focus on those metrics that serve the wider goal: produce high-quality, low-defect code, on time.

Software testing takes a lot of thinking: serious, hard, detailed, clear, patient, logical reasoning. Metrics are not testing; they are a side effect, and they can have unintended consequences if used unwisely. Taylor advised care when picking any metric. His remark is often misquoted as "you can't manage what you do not measure," but his intent was to warn us. Lord Kelvin said "You cannot calculate what you do not measure" but he was talking about chemistry, not management. Choose your metrics with care.
