Testing, Failures and the Inability to Duplicate
Testing and Repeatability
Repeatability of test results is important to establishing cause and corrective action. If we cannot repeat the sequence of events leading to a failure, we cannot replicate the failure, and it becomes difficult to find and fix its cause. The steps that evoked the problem are necessary to replicate it, yet sometimes repeating those steps seemingly does not work.
Testing and Configuration Management
A specific system configuration may be implicated in the failure. Failing to find a failure on one configuration after having found it on another does not mean the failure does not exist. This is another link between testing and configuration management.
Repeating the test with some fidelity requires documenting the test steps with some diligence, as well as traceability of the test specimen, because not all parts are created equal. This is especially true for software and embedded products, since it is impossible to look at compiled code or a microchip and tell what revision of software the product contains. All of this traces back to configuration management.
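One way to make the specimen traceable is to record a fingerprint of the software image alongside the hardware details every time a test runs. The sketch below is only an illustration under assumptions: the function names, the record fields (serial_number, hardware_rev, image_sha256) and the idea that the image is available as bytes are all hypothetical, not part of any particular tool.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint_image(image_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of a software or firmware image."""
    return hashlib.sha256(image_bytes).hexdigest()


def describe_specimen(serial_number: str, hardware_rev: str, image_bytes: bytes) -> dict:
    """Collect the details that identify exactly what was tested.

    The field names are hypothetical; use whatever your configuration
    management system already calls these items.
    """
    return {
        "serial_number": serial_number,
        "hardware_rev": hardware_rev,
        "image_sha256": fingerprint_image(image_bytes),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def same_configuration(a: dict, b: dict) -> bool:
    """Two specimens match only if hardware revision and image both match."""
    return all(a[key] == b[key] for key in ("hardware_rev", "image_sha256"))
```

Comparing the record from the unit that failed with the record from the unit that did not fail shows quickly whether the two tests were really run on the same configuration.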
Exploratory Testing
The test cases can be another story. Repeating a failure is more difficult with exploratory testing, or when things just happen to go wrong while manipulating (playing with) the application. Often in these situations we have not written a test case with the steps prior to execution; rather, we are spelunking the product. Replicating the failure then means exactly repeating the steps, including the timing of and between steps. These factors can be important contributors, since timing is sometimes an ingredient of fault replication. Thus reproducing the failure from recall is difficult. The ability to record the steps can help, even for exploratory testing. Generally, the longer the test sequence, the more difficult it is to recall, though well-documented test cases can help here.
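Recording steps during exploratory testing does not need elaborate tooling. As a minimal sketch, assuming the tester (or a test harness) can call a function at each action, the hypothetical StepRecorder class below logs each action together with the time elapsed since the previous one, so the timing between steps is not left to memory.

```python
import time


class StepRecorder:
    """Logs each action with the time elapsed since the previous action."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the recorder can be exercised
        # with a fake clock; by default it uses a monotonic timer.
        self._clock = clock
        self._last = None
        self.steps = []  # list of (seconds_since_previous_step, action)

    def record(self, action: str) -> None:
        now = self._clock()
        gap = 0.0 if self._last is None else now - self._last
        self._last = now
        self.steps.append((gap, action))

    def as_test_case(self) -> str:
        """Render the recorded session as numbered, timed steps."""
        return "\n".join(
            f"{number}. (+{gap:.3f} s) {action}"
            for number, (gap, action) in enumerate(self.steps, start=1)
        )
```

Calling record("press MENU") at each action and saving the output of as_test_case() when something goes wrong turns an impromptu session into a draft test case that another person can follow.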
Initialization and Variables Have Testing Impact
Perhaps the bug that was momentarily demonstrated actually depends upon the prior test case execution, or upon the original state of the product (for example, the contents of a memory location). In embedded products, for example, we may not have defined the starting contents of a memory location, yet later use that content. In this case the phenomenon will occur once and then seem not to be repeatable. However, it would perhaps be repeatable on other instances of the product, the first time each is used. Once we find this undefined memory location (a flag register or semaphore, for example) and define its initial value in the software, we can cure the fault. For the right type of software product, an ICE (in-circuit emulator) or simulator may be helpful, as these tools allow you to set breakpoints and peek into memory locations.
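The occurs-once behavior can be illustrated with a toy model. The sketch below is not embedded code; it is a simulation under assumptions: the SimulatedDevice class, the busy_flag location and the recovery path that clears the flag are all hypothetical, chosen only to show why an undefined starting value produces a failure on the first attempt and never again on the same unit.

```python
class SimulatedDevice:
    """Toy model of a product whose memory is not defined at first power-on."""

    def __init__(self, power_on_garbage: int, initialize_flag: bool = False):
        # Assumption: the memory location holds an arbitrary value at power-on.
        self.ram = {"busy_flag": power_on_garbage}
        if initialize_flag:
            # The fix: define the starting value before anything reads it.
            self.ram["busy_flag"] = 0

    def start_job(self) -> bool:
        """Return True if the job runs, False if it is rejected."""
        if self.ram["busy_flag"] != 0:
            # The device believes a job is already running and rejects
            # the request. The (assumed) recovery path clears the flag,
            # which is why the failure never shows up a second time.
            self.ram["busy_flag"] = 0
            return False
        self.ram["busy_flag"] = 1  # job in progress
        self.ram["busy_flag"] = 0  # job finished
        return True


unit = SimulatedDevice(power_on_garbage=0xA5)
first_attempt = unit.start_job()    # rejected: flag held garbage
second_attempt = unit.start_job()   # succeeds: flag was cleared by recovery
```

A second, fresh instance created with a nonzero power-on value fails on its first attempt as well, which matches the observation that the fault is repeatable on other instances of the product the first time. Constructing the device with initialize_flag=True removes the failure.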
Testing and Risk
Reporting based upon level of risk is a great idea, though I am not sure we should neglect to report a fault just because we cannot replicate it and it appears to have a low-risk failure impact. I think it may be more prudent to report the failure so there is a record of the event occurring. If the symptom occurs in the field, we will then not be in a position to say “we have never seen that failure”. Maybe we can learn something more from the field reports that will help us replicate the fault and fix it if necessary.
Summary
There are many reasons why a failure may not be immediately reproducible. That does not mean the failure or fault does not exist.
- Record the steps via test cases
- Record (as best you can) the impromptu or exploratory testing
- Check the configuration of the product against the intended or expected configuration
- Consider variables in the software associated with the feature
- Consider that there could be an unpredictable timing element involved (or a timing collision with a background feature)
- Log / report the failure in the fault tracking tool along with as much information as possible (configuration, test case, sequence prior, etc.)
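The last point can be made concrete with a small structured record. This is a sketch only: the FailureReport class and its field names are hypothetical and do not correspond to any particular fault tracking tool. The point is that a failure that could not be reproduced is still logged, with the configuration, the steps and the prior sequence attached.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FailureReport:
    """A failure record that is worth filing even if it was seen only once."""

    summary: str
    configuration: dict           # what exactly was under test
    steps: List[str]              # the steps that evoked the failure
    prior_sequence: List[str] = field(default_factory=list)  # tests run before
    reproduced: bool = False      # unreproduced failures are still recorded
    attempts: int = 0             # how many times replication was tried

    def to_json(self) -> str:
        """Serialize the report for attachment to a fault tracking entry."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


report = FailureReport(
    summary="Display froze after MENU",
    configuration={"hardware_rev": "B"},
    steps=["power on", "press MENU"],
    attempts=5,
)
```

If the same symptom later appears in a field report, a search of these records shows that the failure has been seen before and under which configuration.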