Testing
Testing takes a lot of time and effort (i.e. it is expensive) and is not productive in itself; however, it is vital to assure quality. But even elaborate tests can never guarantee the absence of errors. Why?
As an example, take a very simple device with 8 digital inputs, generating a particular output. A full (exhaustive) test requires 256 distinct input values, so 256 tests. That is perfectly do-able.
Now take 2 such devices (or double the device): 16 inputs, requiring over 65,000 tests. That would be difficult to test, but maybe not impossible (e.g. do-able after automation).
Now double the device again to 32 inputs; that would require over 4 billion tests; it does not seem possible to test such a device in full.
And what is a system with only 32 binary inputs nowadays? Most equipment has at least 10 such devices on each board. And when the inputs are not digital but analogue, or allow a serial stream of bits, or have some internal state (i.e. reflect some history), it gets much, much worse. This is called the combinatorial explosion.
Clearly it is impossible to simply test all possible cases on any system of some size. So what can you do?
The point is to avoid the combinatorial explosion!
How? Avoid the 'black box' that a system presents to the outside world: open up the system by splitting it into small testable blocks. For example, instead of the above-mentioned system with 32 inputs, test it as 4 blocks with 8 inputs each. That results in 4 x 256 = 1024 tests altogether instead of over 4 billion, i.e. an improvement by a factor of roughly 4 million.
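As a back-of-the-envelope illustration of these numbers (a minimal sketch in Python; the device and block sizes are the hypothetical ones from the example above):

```python
# Exhaustive test counts for the hypothetical 8-, 16- and 32-input devices above.
for inputs in (8, 16, 32):
    print(f"{inputs} binary inputs -> {2**inputs:,} exhaustive tests")

# Splitting the 32-input device into 4 independently testable blocks of 8 inputs:
blocks, inputs_per_block = 4, 8
split_tests = blocks * 2**inputs_per_block   # 4 x 256 = 1024
improvement = 2**32 // split_tests           # roughly 4 million
print(f"split into blocks: {split_tests} tests, improvement factor {improvement:,}")
```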
Also consider ranges of values. Suppose the system gets some input value with an allowed range from 1 to 100, boundaries included, and the system checks whether the value is in range. Assuming that the system has no exceptional (conditional) processing for any input value in the range 1..100, it makes no sense to test all 100 values; any single value in that range should be sufficient. However, the boundaries must be tested to see whether the system handles them correctly (boundaries are known problem cases). So test cases with input values 0 (outside the range), 1, 57 (i.e. some value in range), 100 & 101 should be sufficient: 5 cases instead of over 100. This is an example of 'clever testing'.
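A minimal sketch of those five boundary-value cases (the `check_in_range` function is a made-up stand-in for the system's range check, and plain asserts stand in for whatever test framework is actually used):

```python
def check_in_range(value: int) -> bool:
    """Hypothetical system check: accept values 1..100, boundaries included."""
    return 1 <= value <= 100

# Five cases instead of one hundred: below, lower boundary, in range, upper boundary, above.
assert check_in_range(0) is False    # just outside the range
assert check_in_range(1) is True     # lower boundary
assert check_in_range(57) is True    # an arbitrary in-range value
assert check_in_range(100) is True   # upper boundary
assert check_in_range(101) is False  # just outside the range
```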
In particular for software, it is important to consider testing while still in the design phase. This is called 'Design for Testability'. Among other things, this implies that major internal values –in particular states– are made accessible so they can be examined and possibly set for testing purposes. The axiom for developers must be 'How can this be tested?'. See Design Recommendations for more info, and the section Operational Tests below.
Related to design is coding such that the software is reliable and suited for testing (e.g. use of Assert statements; see Assert Statement for more info).
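For example, a small sketch of such a defensive assert (the function and its invariant are purely illustrative):

```python
def set_output_level(percent: int) -> None:
    # Defensive check of an internal invariant; a failing assert immediately
    # pinpoints the faulty caller instead of silently corrupting state.
    assert 0 <= percent <= 100, f"output level out of range: {percent}"
    ...  # normal processing
```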
Splitting a system into testable blocks introduces another problem: the individual blocks may work correctly, but the interworking between blocks has not been tested. And as interworking is not easy, it is likely to be flawed.
Again, avoid the combinatorial explosion: do not put all components of the system together in one go (infamous as 'big bang integration'), but do it step-wise.
The issues for step-wise integration are:
It is much more effective to develop a system with all functionality in it, and –in case particular functions are not required (i.e. not paid for)– block the undesired functions. This of course requires some extra testing effort, but that is negligible compared to the alternative. Of course it should not be easy to circumvent the blocking method.
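One possible way to block undesired functions is a configuration-controlled gate; below is a hedged sketch (all names are made up, and a real blocking method should be considerably harder to circumvent, e.g. backed by a signed licence file):

```python
# Hypothetical configuration listing the functions this customer has paid for.
ENABLED_FUNCTIONS = {"basic_reporting", "export_csv"}

def require_function(name: str):
    """Decorator that blocks a function unless it is enabled in the configuration."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            if name not in ENABLED_FUNCTIONS:
                raise PermissionError(f"function '{name}' is not enabled")
            return func(*args, **kwargs)
        return wrapper
    return decorator

@require_function("advanced_analytics")
def run_advanced_analytics(data):
    ...  # the full functionality is present, but blocked for this configuration
```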
Similarly, if a 'standard' system has to be modified for a customer (i.e. Customer Design Engineering), avoid modifying the generic system; instead add the required functionality in separate modules (typically interfaces). This saves enormous amounts of testing effort for future releases.
Tests –and therefore test cases– should be derived from the Requirements Specification. However, that will most likely not produce a sufficient number of test cases to assure real test coverage. So the number of test cases must be expanded significantly. For that, existing input datasets and existing databases of operational systems are often used. That, however, has proven to be less effective.
Existing datasets are large, but as test cases most entries are very similar (nearly all 'positive tests'), take a long time to run and analyse, and hardly provide discriminatory power. It is much better to develop a small set of test cases with more variety in them. You should still run a large existing dataset once or twice, but use it to select the cases that actually provide added value.
With existing databases there is another problem: databases have a tendency to become inconsistent, and then you are not testing your new system but the database. Use a small database; you might even insert inconsistent data but then you are aware of that and you can use it in a test case to check the system's behaviour.
Positive tests are 'normal cases' with values in range, and should result in 'normal' output, whereas Negative tests use 'incorrect' data and should result in an exception.
It is best to first run a couple of positive tests (if these fail, who cares about the negative tests), then the negative tests (if these fail, who cares about boundary cases), and then the boundary test cases.
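A small sketch of that ordering (the grouping and the `in_range` check are illustrative only; any test runner can enforce a similar order):

```python
def in_range(v: int) -> bool:        # the same hypothetical 1..100 check as above
    return 1 <= v <= 100

def run_groups(groups):
    """Run test groups in the given order; skip later groups once a group fails."""
    for name, tests in groups:
        if not all(test() for test in tests):
            print(f"{name} tests failed; later groups skipped")
            return False
        print(f"{name} tests passed")
    return True

run_groups([
    ("positive", [lambda: in_range(57)]),
    ("negative", [lambda: not in_range(0), lambda: not in_range(101)]),
    ("boundary", [lambda: in_range(1), lambda: in_range(100)]),
])
```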
From the discussion above it should be clear that it is impossible to fully test any large system in a lifetime: there are just too many test cases. The simple solution to this is not to use all test cases, but only those that are likely to show problems, or have done so in the past. In particular make sure to include test cases for reported errors; avoid that a problem in release 1, which was solved in release 2, reappears in release 3 (definitely perceived as unprofessional).
Also do simple Risk Management on potential problems in the system: test aspects which are vital to the system, and parts that will be used often. Information on how potential customers will use the system helps a great deal.
It also implies that testing is effective as long as you find errors; if you don't find errors for some time you should change the test methods, or stop testing.
Develop 'test sets' (series of specific tests), starting with:
As you gather experience with the tests and the system under test, you will adapt tests and test sets, and create new ones.
The general rules for 'smart testing' are:
For effective testing it is important to:
The system to be tested is commonly referred to as the SUT (System Under Test) or DUT (Device Under Test). To automate testing one needs to provide test input to the DUT, capture the output it produces, and compare that with reference output. Other interfaces (e.g. man-machine) may be needed to set parameters, or to set or inspect internal values and similar.
[Figure: the Test System drives the DUT's input(s) and captures its output(s).]
The test system is probably composed of equipment of various kinds to accommodate the various types of interfaces, and will need some controller to synchronise all equipment. The controller is responsible for processing the test cases in some form of script.
Depending on the DUT, it might not be easy to completely automate the tests (i.e. run them without human intervention). Particularly in the beginning, the test system will need additional refinement (e.g. to accept legitimately varying output like date/time-stamps). However, full automation is the ultimate goal –at least for most tests– as this allows the test system to run day and night unsupervised, doing Regression Tests for example.
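A skeleton of such a controller script, reduced to its essence (the `send_to_dut` and `capture_from_dut` functions are placeholders for whatever equipment interfaces the real test system provides, not an actual API):

```python
def send_to_dut(stimulus):
    """Placeholder for driving the DUT's input interface(s) with one stimulus."""
    raise NotImplementedError

def capture_from_dut():
    """Placeholder for capturing the DUT's output after a stimulus."""
    raise NotImplementedError

def run_script(test_cases):
    """Process a test script: stimulate the DUT, capture output, compare with reference."""
    report = []
    for name, stimulus, expected in test_cases:
        send_to_dut(stimulus)
        actual = capture_from_dut()
        verdict = "PASS" if actual == expected else "FAIL"
        report.append((name, verdict, actual))
    return report
```

In practice the comparison step needs to tolerate legitimately varying output such as date/time stamps, as noted above.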
Test name | Phase | Description |
---|---|---|
Formal | General | Test using a formal Test Specification (or prescribed test script), and commonly resulting in a Test Report. Most tests are formal. |
Positive | General | Test case(s) with correct input, which should result in correct output. Complement of Negative Test. |
Negative | General | Test case(s) with 'incorrect' input, which should result in 'fault handling'. Mostly used during Development and Release phases; complement of Positive testing. |
Black Box | General | Testing a 'box' without considering the box's internals, i.e. purely testing the requirements on input & output. Required for components to whose internals one has no access (e.g. buy-in components). |
Grey Box | General | A test where the tester has access to the component's design, but not to the actual internals (i.e. he can not inspect or alter internal values). In the Integration Test such is often the case. |
White Box | General | Test where the tester has access to the internals of the box, i.e. he may inspect and set internal values. Preferably all development tests are White Box tests (avoiding the combinatorial explosion), with the notable exception of the Qualification Test. |
Glass box | General | Term used for White Box testing, but also used by a customer for (formal) quality assurance on the developers' processes (so not only looking at tests but also at procedures). |
Confidence | General | A test to obtain confidence that a system is in working order (i.e. not an in-depth test), or that work on a system is performed adequately. Common for systems/components which have been Qualification tested before, i.e. after Release. |
Verification | General | A Confidence test, typically to verify that recent work has been done correctly on a product/component which was earlier tested in depth. It may test general aspects of a product, though it can be specific for certain aspects. |
Acceptance | General | Used by a customer to accept a product, but the term is unclear about the what, where and how. Could be a Confidence Test (e.g. a component Intake Test, a Factory Acceptance Test or a Site Acceptance Test at regular delivery), but also a system's Qualification Test (by the development organisation) or a System Acceptance Test (by the customer) at system Release. |
Dynamic | General | Testing a component in an operational system or simulated (test) environment. However, failures cannot always be attributed to that component; interfacing components can be the culprit. Dynamic testing is the opposite of Static testing. |
Static | General | Also known as Reviewing, i.e. testing the design and/or implementation through inspection/reviews/audits (manually, potentially doing a simulation). For software development also called Code reviews or Walk-throughs. The main advantage of Static testing is that it can be executed without any code or platform, i.e. in the early stages when error correction is cheap. It is the opposite of Dynamic testing. |
System | General | General term for a test of the whole system (opposite of Unit/Module test). |
Functional | General | Testing the functional aspects of the requirements. |
Alpha | General | Operational test in either a simulated or actual environment, executed by the (potential) customer or by an independent test team at the developers' site. Less formal test (usually no test cases derived from requirements specs, but confrontation with the 'real world'). Commonly as a first step for software before it is released in limited numbers for Beta Testing. |
Beta | General | The Beta version of the software –corrected for the problems found in the Alpha Test– is released to a limited number of customers/users for testing in their respective operational environments (i.e. not a Formal test). |
Module | Development | or Unit Test: testing of a single component/subsystem/module/unit on its own (usually software). |
Unit | Development | or Module Test: testing of a single component (i.e. subsystem, module or something similar) on its own. |
Coverage | Development | Usually done at Module testing to verify that all parts of the software have been covered in a test (all branches executed). |
Exhaustive | Development | Test all possible cases (i.e. all possible input values & internal states). Except for very limited subsystems, the number of all possible values & states will be too large to execute all potential cases. |
Integration | Development | Basically testing the interworking between components (which should already have been tested individually through a Module Test). It is not just a pass or fail test on the interworking between components (modules), but preferably also a Diagnostic Test on the cause and location of the problem when it fails (which can be a considerable effort). |
Regression | Development | Redoing old tests as something has changed on one or more of the components. Can be during a Module test, but typically done at Integration Testing after debugging software or testing a new version. Automation of such testing is highly desirable. |
Factory Acceptance Test (FAT) | Release | Acceptance Test of the product at the manufacturer's site (i.e. before delivery & integration). The term is sometimes also used for a quality assurance audit of the manufacturer's processes (i.e. to accept the 'factory'). |
Qualification | Release | An in-depth test to verify whether the system under test conforms to the corresponding Requirements Specification. This can be done by the development organisation or an independent organisation to release the system for general sales; when executed by a customer to formally accept the system, it is commonly called a System Acceptance test. For systems which pose a risk (health, safety, environmental) the authorities may prescribe full verification of critical aspects. |
System Acceptance | Release | An in-depth test (a Qualification Test by the customer) to verify whether the system under test conforms to the corresponding Requirements Specification. |
Certification | Release | An in-depth test to verify whether the system conforms to the rules & laws from authorities (typical for systems which pose a risk regarding health, safety and/or environment). Typically, it is a non-functional test. |
Validation | Release | Formally a Qualification Test, however often used ambiguously. |
Performance | Release | Basically testing the non-functional requirements: a Stress Test to show the system's performance under particular circumstances (e.g. high load, or abnormal conditions). On systems built to cope with such circumstances it is very hard to create sufficient stress; therefore often executed with an impaired system. |
Load | Release | A Stress Test to show the system's performance under heavy load (i.e. a Performance test). On systems built to cope with such a load, it is very hard to generate sufficient load. Therefore often executed with an impaired system. |
Stress | Release | Any test to put the system under stress, i.e. extreme conditions. For example extreme environmental conditions (temperature, humidity, …), failure of (multiple) components, a Load test, etc. Stress, Load and Performance testing are commonly used as synonyms. |
Intake | Delivery | Simple Confidence test to verify that the product/component is in order (e.g. a buy-in component acceptable for assembly, or ready for test and not something taken from incomplete development). |
Site Acceptance Test (SAT) | Delivery | Acceptance test (Confidence test) of the system at the actual site of the customer (i.e. after delivery & installation). Could be before or after integration with other equipment. |
Commissioning | Delivery | Thorough test of correct installation and correct operation of a system. After Commissioning a system is supposed to be operational (i.e. commissioning commonly involves more than testing, e.g. also migration & start-up). Except for the initial delivery, Commissioning is not a Qualification test, but more like a thorough Confidence test; however, for systems which pose a risk (health, safety, environmental) the authorities may prescribe full verification of critical aspects for each delivery. |
Sanity | Operational | A Confidence test to check whether the system is operational. Typically run at boot or start-up, or after maintenance (repair or replacement). |
Diagnostic | Operational | An in-depth test to locate a detected/suspected problem in a subsystem (i.e. it should provide clues to what exactly may be failing). Also used to verify the correct operation of a subsystem (when diagnostic tests don't reveal anything, it rules out all known problems). The component under test is commonly temporarily not available for normal operation. |
Maintenance | Operational | Test which verifies the sanity of an operational (sub)system (commonly run while the system remains in operation). |
Commonly the following tests can be identified during a system's life cycle:
during Development:
during system Release or system selection/acceptance:
during Manufacturing and Delivery:
during Operational life:
For (1) a Debugger is a useful tool.
The tests for (4) & (7/8), and the tests for (5), (6) & (7/8/9) have considerable overlap (i.e. many tests will be similar, so can be 'borrowed').
Note that most tests must be developed for each system.
Operational tests (confidence and diagnostic tests) are part of the product, so they should be in the Requirements Specification of the product (but are often forgotten).
It makes a lot of sense to develop these tests during development (immediately after developing the corresponding module), according to and stressing the principle Design for Test.
See also Quality Management Plan.
A Test Plan describes how you are going to do all tests, and what facilities and mechanisms you plan to use (e.g. test set-up) to develop a good product. Implicitly it describes the test strategy. It consists of:
Preferably, all components to be developed are known, so the test effort can be estimated.
The Test Requirements document describes, for each (major) type of component, what (which aspects) is going to be tested and how. It is a detailed application of the Test Plan for a particular type of component. If some components are very similar, they can share the same Test Requirements document. When there are a great many tests, one may decide to plan only a subset; when too many tests fail during execution, one should expand the test subset (levels of success/fail rates should be defined).
The Test Specification describes each test in detail. Because of that it will be a document of considerable size, or consist of several documents. Not all tests must be performed (depending on the rules specified in the Test Requirements and Test Plan).
For each individual test it describes:
The actual testing, i.e. the execution of the Test Specification, results in the Test Report.
That is straightforward when all tests are basically successful. But they usually aren't (otherwise one may also have doubts about the thoroughness of the tests).
It depends on the type of test (in the system's life cycle) what to do with failed tests. When it is possible to locate the fault and correct it (e.g. by using more diagnostic tests), one should. But during Qualification or Acceptance testing a failed test is not acceptable (it would require starting all over after correction of the fault). And sometimes it is not possible to correct the problem (i.e. the product is rejected or gets a waiver).
=O=