It's easy to make mistakes when testing software, ranging from picking the wrong set of measurements, to conducting inadequate tests, to misinterpreting the results. One of the things that helps me reduce the mistakes I make is applying the principles described in a book by Gordon M. Bragg, published in 1974, titled "Principles of Experimentation and Measurement" (ISBN 0-13-701169-5). I've bought several copies so far, at about $5 including delivery, from online bookstores.
Tests are similar to experiments - both are intended to obtain answers to questions where we want the answers sooner than we might otherwise obtain them (e.g. after software has been launched). Gordon's work can help us to create effective tests. For instance, chapter 2 "Defining the Problem" provides several examples of how the definition of the problem (what we want to achieve with our testing) determines which measurements are worth taking, including measuring the height of waves in Lake Huron. "The wave heights in a situation such as this range from less than 1/10th in. to several feet. ... How long a sample is required and under what range of conditions? ... If we consider the reason for the measurements, we can eliminate those which are not required. If the wave heights are required for ship design, then the frequency and magnitude of the largest waves will be important. ... If, now, our purpose is to study the effect of wind on creating possible currents in the lake, we have quite a different situation. In this case the action of the waves on quite small surface waves may be a mechanism for transferring energy from the wind to the water. ... If our purpose is to determine the shape of the waves, we would certainly expect this to vary with wave size. However, here we may need only to measure a few waves in each height range (say 50). The relative frequency may well be unimportant. Each of these cases will require a quite different approach to the basic problem."
By comparison, when we need to 'test' software, there are lots of things we might want to assess, and many possible measurements we could obtain. For example, is the absolute position of an item on screen critical to the behavior of the software? Is the speed of response critical, and if so, does it matter how much the speed varies, provided the average is within the specified requirements? (And here we could ask what 'average' represents, e.g. the mean or the median, and whether it is even a useful measure.)
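To make the question about averages concrete, here is a minimal sketch in Python. The response times and the 200 ms requirement are invented for illustration; they are assumptions and do not come from any real system or specification. The sketch only shows how different summaries of the same measurements can tell different stories.

```python
import statistics

# Hypothetical response times in milliseconds, invented for illustration.
response_times_ms = [110, 120, 115, 125, 118, 122, 119, 121, 117, 900]

# Assumed requirement; not taken from any real specification.
REQUIREMENT_MS = 200


def summarize(samples):
    """Return several summaries of the same set of measurements."""
    ordered = sorted(samples)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "stdev": statistics.stdev(ordered),  # sample standard deviation
        "max": ordered[-1],
    }


summary = summarize(response_times_ms)
for name, value in summary.items():
    print(f"{name}: {value:.1f} ms")

# The mean is within the assumed requirement even though one response
# took several times longer than the requirement allows.
print("mean within requirement:", summary["mean"] <= REQUIREMENT_MS)
print("slowest within requirement:", summary["max"] <= REQUIREMENT_MS)
```

With these made-up numbers the mean sits just under the requirement, the median is far below it, and the slowest response is well over it. Which of these matters depends on why we are measuring, which is the same point the wave-height example makes.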