@macshonle@c.cim - Mastodon

08 January 2011

How do you know if your software tests are any good?

Project management by check boxes gives you a nice, but false, sense of security that everything is going smoothly. Although three decades have passed since Glenford Myers wrote the classic The Art of Software Testing, many practitioners' approach to testing is simply to bang out some buzzwords and be done with it.

You can say that you've passed 100% of your unit tests, but that isn't meaningful if most of the tests are trivial or redundant with one another. You might've achieved 95% code coverage, but that won't matter if important edge cases haven't been covered. So, how do you know if your tests are any good? If the purpose of testing is to find bugs, then your tests aren't good unless they've found bugs. If a test does not find a bug, it fails as a test.

While that’s simple to state, it can still be daunting if you’re not familiar with testing. There are three main techniques you can use to improve your test design: (1) whitebox techniques; (2) blackbox techniques; and (3) mutation testing.

Whitebox techniques are used with specific source code in mind. One important aspect of whitebox testing is code coverage. For example:
  • Is every function called? [Functional coverage]
  • Is every statement executed? [Statement coverage. Both functional coverage and statement coverage are very basic, but better than nothing]
  • For every decision (e.g., if, while, ...), do you have a test that forces it to be true, and another that forces it to be false? [Decision coverage]
  • For every condition that is a conjunction (uses &&) or disjunction (uses ||), does each subexpression have a test where it is true/false? [Condition coverage]
  • Loop coverage: Do you have a test that forces 0 iterations, 1 iteration, 2 iterations?
  • Is each break from a loop covered?
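The decision and loop items above can be sketched with a toy function. The function and its tests below are invented for illustration; they are not from the original answers:

```python
def first_negative(values):
    """Return the first negative number in values, or None if there is none."""
    for v in values:      # loop: tests should force 0, 1, and 2+ iterations
        if v < 0:         # decision: tests should force both true and false
            return v
    return None

# Loop coverage: zero iterations
assert first_negative([]) is None
# Loop coverage: one iteration; decision forced true
assert first_negative([-3]) == -3
# Loop coverage: two iterations; decision false, then true
assert first_negative([5, -1]) == -1
# Decision stays false through the whole loop
assert first_negative([1, 2, 3]) is None
```

Note that a single test like `first_negative([5, -1])` already executes every statement (100% statement coverage), yet the zero-iteration and all-false cases are still needed for full loop and decision coverage.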

Blackbox techniques are used with specific requirements in mind. Blackbox testing follows the principle that a test should not test a single program, but the full class of possible programs. The following blackbox techniques can lead to high-quality tests:
  • Do your blackbox tests cover multiple testing goals? You want your tests to be “fat”: Not only do they test feature X, but they also test Y and Z. The interaction of different features is a great way to find bugs.
  • But you don't want fat tests when you are testing an error condition, such as invalid user input. If you tried to achieve multiple invalid input testing goals (for example, a test to cover an invalid zip code and an invalid street address) it’s likely that one would just mask the other.
  • Consider the input types and form equivalence classes for them. For example, if your code checks whether a triangle is equilateral, a test that uses a triangle with sides (1, 1, 1) will probably find the same kinds of errors that the test data (2, 2, 2) and (3, 3, 3) will find. It's better to spend your time thinking of other classes of input. For example, if your program handles taxes, you'll want a test for each tax bracket. [This is called equivalence partitioning.]
  • Special cases are often associated with defects. Your test data should also include boundary values, such as those on, above, or below the edges of an equivalence class. For example, in testing a sorting algorithm, you'll want to test with an empty array, a single-element array, an array with two elements, and then a very large array. You should consider boundary cases not just for input, but for output as well. [This is called boundary-value analysis.]
  • Another technique is error guessing. Do you have a feeling that some special combination of inputs will break your program? Then just try it! Remember: Your goal is to find bugs, not to “confirm” that the program is valid. Some people have a knack for error guessing.
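Equivalence partitioning and boundary-value analysis combine naturally in the tax-bracket example above. The two-bracket tax function below is hypothetical; its brackets and rates are invented for this sketch:

```python
def tax(income):
    """Hypothetical tax: 10% on income up to 10,000; 20% on the amount above."""
    if income <= 10_000:
        return income * 0.10
    return 10_000 * 0.10 + (income - 10_000) * 0.20

# Equivalence partitioning: one representative per class (each bracket).
# More tests inside the same bracket would likely find the same bugs.
assert tax(5_000) == 500.0
assert tax(20_000) == 3_000.0

# Boundary-value analysis: probe on and just above the bracket edge,
# where an off-by-one in the comparison (< vs <=) would hide.
assert tax(10_000) == 1_000.0
assert tax(10_001) == 1_000.2
```

If the implementation had mistakenly used `<` instead of `<=`, only the boundary test at exactly 10,000 would notice; the bracket representatives alone would pass.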
Finally, suppose you already have lots of nice tests for whitebox coverage, and applied blackbox techniques. What else can you do? It’s time to test your tests. One technique you can use is mutation testing. Under mutation testing, you make a modification to (a copy of) your program, in the hopes of creating a bug. A mutation might be:
  • Change a reference of one variable to another variable
  • Insert the abs() function
  • Change less-than to greater-than
  • Delete a statement
  • Replace a variable with a constant
  • Delete an overriding method
  • Delete a reference to a super method
  • Change argument order
Create several dozen mutants, in various places in your program [the mutated program must still compile in order to run the tests]. If your tests do not find these bugs, then you know you need to write a test that can find the bug in the mutated version of your program. Once a test finds the bug, you have killed the mutant and can try another.
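A single hand-made mutant can illustrate the kill cycle. In practice, tools such as mutmut (for Python) or PIT (for Java) generate mutants automatically; the leap-year function and its mutant below are invented for this sketch:

```python
def is_leap_year(year):
    """A year is a leap year if divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def is_leap_year_mutant(year):
    """Mutant: the century exception has been deleted (a 'delete a statement' mutation)."""
    return year % 4 == 0

# A suite of ordinary years does NOT kill this mutant -- the outputs agree:
for y in (2023, 2024, 2025):
    assert is_leap_year(y) == is_leap_year_mutant(y)

# A boundary-flavored test kills it: 1900 is divisible by 4 but is not a leap year.
assert is_leap_year(1900) is False
assert is_leap_year_mutant(1900) is True  # mutant caught -- a real suite would fail here
```

The surviving mutant tells you exactly which test is missing: until you add a century year like 1900, your suite cannot distinguish the correct program from the broken one.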

Testing is complete when you have stopped finding bugs. Or, more practically, when the rate at which you find new bugs slows down and you see diminishing returns.

Bugs tend to “cluster” in certain modules and features: The moment you find a bug in one, you know you should look at it further for more bugs. (For example, why does Apple keep having trouble with the iPhone alarm? It's a perfect candidate for increased testing efforts.) To find bugs, you can use the techniques of blackbox testing, whitebox testing, and mutation testing. As long as you are finding bugs, you know that your testing process is working!


This post is a revision to two of my answers on the Programmers StackExchange.