Introduction to Pairwise Testing

Pairwise testing can be quite confusing. Introductions to pairwise testing often involve symbol-heavy mathematics, Greek letters and a lot of jargon. Pairwise testing also has several alternative names, which may or may not have the same meaning. Pairwise testing is applicable in certain stages of the development of software or hardware, but discussions of pairwise testing often jump right into the more technical aspects of a sub-sub-step of development without relating it to the full picture of the quality assurance of software or hardware.

Before delving into the details of pairwise testing, let us first cover some questions that people who are new to pairwise testing usually ask:

What is the difference between pairwise testing and … ?

When reading about pairwise testing, you will probably encounter many of these terms: All-pairs testing, orthogonal array testing, t-wise testing, t-way testing, combinatorial testing, 1-, 2-, 3-, 4-, 5- and 6-wise testing, 1-, 2-, 3-, 4-, 5- and 6-way testing, mixed-strength testing.

Let us look briefly at these different terms.

Pairwise testing can be a more or less thorough kind of testing. The degree of thoroughness is usually given as a number from 1 to 6 (for reasons that will be covered later!). So, for example, 3-wise (or 3-way) testing is more thorough than 2-wise (or 2-way) testing, which is more thorough than 1-wise (or 1-way) testing. The word pairwise (i.e. pair-wise) is really “2-wise” where the number 2 has been replaced with the word “pair” (for reasons that will also be covered later!). So, in fact, pairwise testing really only refers to a thoroughness of 2. Thus, the term t-wise (or its equivalent, t-way) was introduced as a more general term that does not refer to one particular thoroughness. The letter t is a placeholder for the thoroughness, which is usually 1 to 6. For example, 3-wise testing is t-wise testing with a thoroughness of 3.

Pairwise testing is usually used as a synonym for t-wise testing, even though t-wise testing is a better term as it clearly refers to any thoroughness. This article follows the custom of using the term pairwise testing to mean any thoroughness.

The term all-pairs testing is simply equivalent to pairwise testing.

Mixed-strength testing means mixing, for example, 2-wise and 3-wise testing in the same round of testing. This will be covered later.

What then about the two terms Orthogonal Array Testing and Combinatorial Testing? Combinatorial testing is a more general term that includes pairwise testing but also other testing techniques. As for Orthogonal Array Testing, for this introduction, it is sufficient to understand that orthogonal array testing is pairwise testing, but not necessarily the other way around.

When should we apply pairwise testing?

Pairwise testing fits nicely into the testing phase of software or hardware development. It works well with agile development practices.

Is pairwise testing completely automatic?

Pairwise testing is usually not fully automatic. Pairwise testing usually requires some manual work to describe the thing that is to be tested. A pairwise testing tool then takes this description (or model) and generates the pairwise tests automatically. These tests usually require some manual work to be set up as automated tests or to be done manually.

Is pairwise testing applicable to … ?

Pairwise testing is probably applicable to testing your software or hardware system, and it will probably exercise the system to such an extent that it will uncover bugs. There are, however, certain things it is less suited for. For example, performance testing and stress testing are testing problems that focus on other aspects than pairwise testing does. Pairwise testing is more focused on putting your system into varied situations, which is simply different from loading the system with work.

Is pairwise testing better than manually making tests?

Having a domain expert make tests for a software or hardware system is valuable and highly recommended. Pairwise testing is not a replacement for this kind of testing but complementary to it. Pairwise testing will put the software or hardware system into varied situations and is sure to explore other paths than those most familiar to a domain expert.

Also, pairwise tests can be a good inspiration for making manual tests. New users of some software or hardware will often interact with it in a different way than those who made it. Pairwise tests act in a similar way: they explore the different paths of interaction. Can you push this button after pushing that button? If yes, then the pairwise tests will include that case. Can you install this version of a plugin with this version of some other plugin? If yes, a pairwise test will include it.

Does pairwise testing scale?

Many testing techniques work well for testing smaller software or hardware systems, but become too heavy-weight and cumbersome for the larger cases.

From a user’s point of view, pairwise testing is light-weight and simple to apply to even the larger cases.

From the pairwise tool developer’s point of view, however, pairwise testing involves some heavy-weight computations. Until the last few years, this was a bottleneck in applying pairwise testing to the larger real-life cases, but in recent years effective algorithms for pairwise testing have been invented and are now available in commercial products.

What skills are needed to do pairwise testing?

From a user’s point of view, pairwise testing is one of the simpler testing techniques to use. It does not require in-depth knowledge of software or hardware development, and the execution of the tests can be done manually. A pairwise test interacts with some software or hardware in the same way a user would: Which buttons to push in what sequence, which modules to use to compose some hardware, which plugins to install, which situation to put some software or hardware in, etc. The different alternatives are given to a pairwise testing tool as parameters with different values. For example, which browser to use to access some website under test is given as:

Browser: Chrome, Firefox, Internet Explorer, Opera

And the operating system to use is given as:

Operating System: Windows, Linux, Mac

And the constraints between the values are given as:

if "Browser" is "Internet Explorer" then "Operating System" is "Windows"

This is the basic complexity level of telling a pairwise testing tool how to make tests for your system. Of course, testing an actual website requires more than just which operating system to use and which browser to use, but if those three lines made sense to you, then you can use a pairwise testing tool to generate tests for a piece of software or hardware you understand.
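
As a rough illustration, the three lines above could be written down in JavaScript as follows. This is only a sketch: the object layout and the names `model`, `isAllowed` and `allCombinations` are assumptions made for this example and are not the input format of any particular pairwise testing tool.

```javascript
// Hypothetical in-memory form of the model above (all names are illustrative).
const model = {
  "Browser": ["Chrome", "Firefox", "Internet Explorer", "Opera"],
  "Operating System": ["Windows", "Linux", "Mac"],
};

// The constraint:
// if "Browser" is "Internet Explorer" then "Operating System" is "Windows"
function isAllowed(test) {
  return test["Browser"] !== "Internet Explorer" ||
         test["Operating System"] === "Windows";
}

// Enumerate every combination of one value per parameter.
function allCombinations(parameters) {
  let combos = [{}];
  for (const [name, values] of Object.entries(parameters)) {
    combos = combos.flatMap(c => values.map(v => ({ ...c, [name]: v })));
  }
  return combos;
}

// Keep only the combinations that the constraint allows.
const validTests = allCombinations(model).filter(isAllowed);
console.log(validTests.length); // 10 of the 12 combinations satisfy the constraint
```

With only two parameters, every pair of values is a complete test, so nothing is saved yet. The savings of pairwise testing appear once the model has three or more parameters.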

How effective is pairwise testing at finding bugs?

The effectiveness of pairwise testing is neither hypothetical nor only academic. It has been studied by looking at existing large-scale software or hardware and their histories of problems and bugs. Studies (referenced below) examined how bugs that were discovered by other means than pairwise testing could have been found by pairwise testing. They found that as the thoroughness of pairwise testing increases, an increasing number of bugs can be found. In one large study, it was found that for a thoroughness of 1, 2 and 3, the percentage of bugs that might be found was 50, 75 and 90%, respectively. This study found no bugs that required more than a thoroughness of 6, a result that has stood the test of time. This is the reason why a thoroughness of 6 is usually the maximum considered.

As the thoroughness of pairwise testing increases, so does the effort required to do the tests. A thoroughness of 2, i.e. 2-wise or pairwise, has a very good tradeoff between thoroughness and effort. A thoroughness of 3 is good for more important rounds of testing, for example, for later stages of development.

For testing highly critical software or hardware, pairwise testing is a good start, but other quality assurance techniques should be used in addition.

Telling a Pairwise Testing Tool what to Test

Let us now look at how to do pairwise testing. The first step of pairwise testing is telling a pairwise testing tool what to test. We tell the pairwise testing tool what to test by creating a model of the software or hardware under test.

To understand what is meant by a model, consider this example: When testing the aerodynamics of a car or a plane, a small model of the car or the plane is made. For example, like these:

Notice that these models are not actually working cars or working planes. To make these models, the testers have extracted the shape of the car or the plane and left out the other aspects, such as how the engine works. These models can be used to test the aerodynamics of the car or the plane.

Pairwise testing is not a good technique for testing the aerodynamics of a car or a plane. However, it is a good technique for testing, for example, the customization of a car.

Usually when ordering a car you are given a form with a lot of options: Which transmission, which engine, which color, which extras etc. Writing out the options, their alternatives and their relationships is in fact a model of one aspect of the car. This model can be given to a pairwise testing tool and pairwise testing is well suited for testing that aspect of a car.

Based on that model, the pairwise testing tool will generate a set of filled-out forms. If each resulting car is tested, the confidence that any configuration will in fact work is strengthened. It is also likely that doing such testing will uncover a faulty interaction between some two options that a customer might select.

If there are two options that a customer can select and that will produce a problem in the resulting car, this is an example of a fault that will be found by pairwise testing. In fact, pairwise testing ensures that every pair of options a customer can select will occur in at least one of the forms generated.

One might think that this will result in a lot of filled-out forms, but a crucial feature of pairwise testing is that many pairs can be packed into a single configuration. In fact, when every option is a yes-or-no choice, a single filled-out form includes 25% of all possible pairs. For example, a form with 200 yes-or-no options might result in as few as 30 or 40 filled-out forms for pairwise testing.
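
The 25% figure can be checked by simple counting, assuming that every option is a yes-or-no choice and that there are no constraints between the options. The function name `pairCounts` is made up for this sketch.

```javascript
// Count value pairs for a form with n yes-or-no options and no constraints.
function pairCounts(n) {
  const optionPairs = n * (n - 1) / 2;   // ways to choose two different options
  const allPairs = optionPairs * 2 * 2;  // each option pair has 2 x 2 value pairs
  const pairsPerForm = optionPairs;      // one form fixes one value pair per option pair
  return { allPairs, pairsPerForm, fraction: pairsPerForm / allPairs };
}

console.log(pairCounts(200));
// { allPairs: 79600, pairsPerForm: 19900, fraction: 0.25 }
```

Since one form covers a quarter of the pairs, at least four forms are always needed. In practice later forms unavoidably repeat pairs that earlier forms already covered, which is why the total ends up in the tens rather than at four.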

(If you want to look at completely worked-out models for pairwise testing right now, check out Example 1 and Example 2.)

Applicability of Pairwise Testing

Pairwise testing is applicable to many kinds of software and hardware testing problems. Let us look at a few cases that should demonstrate the kind of testing problems for which pairwise testing is applicable.

Software and Hardware Product Lines

A product line is some software or hardware that can be customized by the producer to fit a customer’s need. For example, a car can be configured with different engines, transmissions and extra equipment. A piece of software can be delivered with many kinds of plugins and extensions activated.

Pairwise testing is well-suited for testing that different combinations of customization will work.

Highly Configurable Systems

Software or hardware might also be configured by the customers after it has been delivered. In this case, all parts of the hardware or software must be delivered to the customer for him or her to configure.

Pairwise testing is well-suited for testing that the different configurations the customers choose will work.

Complex and Varied Environments

Software must typically coexist and interact with many other software systems. To add to this complexity, software is usually evolving with new versions being delivered regularly. Some software systems might even run in as many different kinds of environments as there are customers. A website will most likely run in 10 versions of 10 different browsers with a multitude of browser extensions running and the whole package being installed on 5 versions of 8 different operating systems. Some of the browsers might be configured with 200 different settings such as security and privacy settings, window sizes and color configurations.

Pairwise testing is well-suited for testing that software will run in the customer’s software environment.

GUI Work-Flows

Many kinds of software and hardware take the user through a series of screens that allow the user to select a multitude of options. Examples are checkouts of e-commerce websites, wizards and forms.

Pairwise testing is well-suited for testing that different paths through a work-flow will work.

Functions, Methods, Classes and Modules

Most software is organized internally as functions, methods, classes and modules. They usually exist in complex environments and should be used in a certain sequence.

Pairwise testing is well-suited for testing that functions, methods, classes and modules work in varied environments and that different flows of usage work.

The Pairwise Tests

A test generated by a pairwise tool will typically be a set of values given to all the options or parameters in the model. For example, if we are testing a website and the options are browser and operating system, each test will include one particular browser and one particular operating system; for example, Chrome and Windows, Opera and Linux, Safari and Mac OS.

Notice that these do not include the expected result. A pairwise testing tool does not figure out by itself what the correct result is. There are, however, some general things we can say. For a website, we can say that for each test, the website should look similar to the way it looks on the developer’s machine. Or, we might say that some software or hardware should finish each test without crashing. Or, we might say that a car should complete a certain set of trials with some measurements within certain thresholds.

When we have the pairwise tests, we have basically two options: We can either run the tests manually or import them into an automated testing setup.

Manually Running the Tests

If we decide that a tester will do the tests manually, it is helpful to generate a test script for the tester to follow. A test script is a sequence of written steps that a tester can follow.

For example, these are the first three test cases of a pairwise test suite for a website as test scripts:

Test 1.

– Get a Desktop in front of you with Windows.
– In the network options, make sure IPv4 will be used.
– Set network connection to cable and make sure it is connected.
– On the desktop machine, set system language to english.
– Open Chrome.
– Set window size to wide and large.
– Open URL “https://inductive.no/pairwiser”.
– Check that everything looks good.

Test 2.

– Get a Mobile in front of you with Android.
– In the network options, make sure IPv4 will be used.
– Set network connection to wifi and make sure it is connected.
– On the mobile phone, set system language to english.
– Open Chrome.
– Set window size to tall and small.
– Open URL “https://inductive.no/pairwiser”.
– Check that everything looks good.

Test 3.

– Get a Mobile in front of you with iOS.
– In the network options, make sure IPv6 will be used.
– Set network connection to edge and make sure it is connected.
– On the mobile phone, set system language to non-english.
– Open Opera.
– Set window size to wide and small.
– Open URL “http://inductive.no/pairwiser”.
– Check that everything looks good.

These are generated based on the pairwise test cases and a test script template. A test script template has placeholders where the concrete values for each test are entered. For example, in the three tests above, the first two lines of each test are generated from these two lines of the test script template:

Test |testnr|.

- Get a |Device| in front of you with |if "Device" is "Mobile"||Mobile OS||else||Desktop OS||endif|.

Here we can see a special syntax for including the specific values for each test case. “|testnr|” is replaced with the current test number. “|Device|” is replaced by Mobile or Desktop depending on which value is in the test.
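
To make the substitution concrete, here is a minimal sketch of how such a template line could be filled in. It is not the implementation of any actual tool: the function name `renderTemplate` is hypothetical, nested conditions are not handled, and the literal text is assumed not to contain the `|` character.

```javascript
// Minimal renderer for the placeholder syntax shown above (an illustrative sketch).
function renderTemplate(template, values) {
  // After splitting on "|", even indexes hold literal text and odd indexes hold directives.
  const parts = template.split("|");
  let output = "";
  let emitting = true; // false while inside an if/else branch that does not apply
  parts.forEach((part, i) => {
    if (i % 2 === 0) {
      if (emitting) output += part;
      return;
    }
    const condition = part.match(/^if "([^"]+)" is "([^"]+)"$/);
    if (condition) {
      emitting = values[condition[1]] === condition[2];
    } else if (part === "else") {
      emitting = !emitting;
    } else if (part === "endif") {
      emitting = true;
    } else if (emitting) {
      output += values[part]; // plain placeholder such as |Device| or |testnr|
    }
  });
  return output;
}

const template = '- Get a |Device| in front of you with ' +
  '|if "Device" is "Mobile"||Mobile OS||else||Desktop OS||endif|.';

console.log(renderTemplate("Test |testnr|.", { "testnr": 2 }));
// Test 2.
console.log(renderTemplate(template, { "Device": "Mobile", "Mobile OS": "Android" }));
// - Get a Mobile in front of you with Android.
```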

Automatically Running the Tests

If we are running the pairwise tests automatically, we need to import them into a testing tool and set up the expected results.

For example, if we are testing a JavaScript function that classifies triangles, we first need to generate a JavaScript array with the tests. We can use the following test script template to export the tests as a JavaScript array:

[|a|, |b|, |c|, ""],

The three sides of the triangle are a, b and c. This generates the following array:

[5, 5, 5, ""],
[5, 4, 4, ""],
[5, 3, 4, ""],
[5, 4, 6, ""],
[0, 4, 5, ""],
[0, 1, 1, ""],
... etc.

The fourth entry in each element is the expected value. We need to write those in manually and wrap the tests in an array that our testing system can use.

var pairwiseTests = [
    // Each entry: side a, side b, side c, expected classification
    [5, 5, 5, "Equilateral"],
    [5, 4, 4, "Isosceles"],
    [5, 3, 4, "Right"],
    [5, 4, 6, "Scalene"],
    [0, 4, 5, "Invalid"],
    [0, 1, 1, "Degenerate"],
    // ... etc.
];

The testing system can, for example, put the three sides as input to the function under test and check the return value against the expected value.

The pairwise tests are now fully automated and can be run again and again.
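
A minimal sketch of such a testing setup is shown below. The function `classifyTriangle` stands in for the function under test and `runTests` for the testing system. Both are written only for this example, and the exact classification rules (for instance, that "Right" takes precedence over "Scalene") are assumptions chosen to match the expected values above. The array repeats the first six tests so that the block can be run on its own.

```javascript
// Hypothetical function under test; a real project would import its own.
function classifyTriangle(a, b, c) {
  const [x, y, z] = [a, b, c].sort((p, q) => p - q); // x <= y <= z
  if (x + y < z) return "Invalid";       // the sides cannot meet
  if (x + y === z) return "Degenerate";  // the triangle collapses to a line
  if (x === z) return "Equilateral";
  if (x * x + y * y === z * z) return "Right";
  if (x === y || y === z) return "Isosceles";
  return "Scalene";
}

const pairwiseTests = [
  [5, 5, 5, "Equilateral"],
  [5, 4, 4, "Isosceles"],
  [5, 3, 4, "Right"],
  [5, 4, 6, "Scalene"],
  [0, 4, 5, "Invalid"],
  [0, 1, 1, "Degenerate"],
];

// Feed the three sides to the function and collect every mismatch.
function runTests(tests, fn) {
  return tests
    .map(([a, b, c, expected]) => ({ input: [a, b, c], expected, actual: fn(a, b, c) }))
    .filter(result => result.actual !== result.expected);
}

const failures = runTests(pairwiseTests, classifyTriangle);
console.log(failures.length === 0 ? "all tests passed" : failures);
```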

Degrees of Thoroughness

We have already covered the degree of 2, the classic 2-wise or pairwise tests. A pairwise test suite with a thoroughness of 2 includes every pair of values of two different parameters in at least one of the test cases.
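
This definition can be checked mechanically. The following sketch lists every pair of values that no test contains. The function name `uncoveredPairs` and the three yes-or-no settings are made up for the example, and constraints between values are not considered.

```javascript
// List every pair of values (from two different parameters) that no test contains.
function uncoveredPairs(model, tests) {
  const names = Object.keys(model);
  const missing = [];
  for (let i = 0; i < names.length; i++) {
    for (let j = i + 1; j < names.length; j++) {
      for (const v of model[names[i]]) {
        for (const w of model[names[j]]) {
          const covered = tests.some(t => t[names[i]] === v && t[names[j]] === w);
          if (!covered) missing.push({ [names[i]]: v, [names[j]]: w });
        }
      }
    }
  }
  return missing;
}

// Hypothetical model: three yes-or-no settings of a website.
const model = {
  "Cookies": ["on", "off"],
  "JavaScript": ["on", "off"],
  "Images": ["on", "off"],
};

// Four tests cover all 12 pairs; testing every combination would take 8 tests.
const tests = [
  { "Cookies": "on",  "JavaScript": "on",  "Images": "on"  },
  { "Cookies": "on",  "JavaScript": "off", "Images": "off" },
  { "Cookies": "off", "JavaScript": "on",  "Images": "off" },
  { "Cookies": "off", "JavaScript": "off", "Images": "on"  },
];

console.log(uncoveredPairs(model, tests).length); // 0
```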

There are other degrees of thoroughness. Let us look at them in more detail.

Thoroughness of 1

Pairwise testing is a kind of combinatorial testing because it includes all combinations of some number of values in at least one of the tests. If we talk of combining something, we must at least have two things to combine. So, as a kind of combinatorial testing, the lowest thoroughness is 2.

However, we do talk about the thoroughness of 1. This is a special case where each value of each parameter is included in at least one test. This is a useful first level of testing, but it is not, strictly speaking, combinatorial testing, nor does it test the interaction between the values.

For example, if we are testing a website, the 1-wise testing will include every browser and every operating system in at least one test, but it will not include every pair of operating system and browser.
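
One simple way to build such a 1-wise suite, ignoring constraints, is to walk through the value lists in parallel and wrap around when a shorter list runs out. The name `oneWiseTests` and the small model are assumptions for this sketch; an actual tool may choose the values differently.

```javascript
// Build a 1-wise suite: as many tests as the longest value list,
// cycling through the shorter lists.
function oneWiseTests(model) {
  const names = Object.keys(model);
  const count = Math.max(...names.map(name => model[name].length));
  const tests = [];
  for (let i = 0; i < count; i++) {
    const test = {};
    for (const name of names) {
      test[name] = model[name][i % model[name].length];
    }
    tests.push(test);
  }
  return tests;
}

const tests = oneWiseTests({
  "Browser": ["Chrome", "Firefox", "Opera"],
  "Operating System": ["Windows", "Linux"],
});
console.log(tests.map(t => t["Browser"] + " on " + t["Operating System"]));
// [ 'Chrome on Windows', 'Firefox on Linux', 'Opera on Windows' ]
```

All five values appear, but only three of the six browser and operating system pairs do.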

Thoroughness of 3+

A thoroughness of 3, for example, means that every combination of 3 values occurs in at least one of the tests. For example, if we are testing a website, the tests will include every combination of browser, operating system and a third option, for example, whether cookies are enabled or not. For 4-wise testing, every combination of 4 values occurs in at least one test, and so on for higher degrees.

For each added degree of thoroughness, the number of tests is roughly multiplied by some factor. If there are 10 2-wise tests, then there might be 40 3-wise tests and 160 4-wise tests etc.
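
The growth can also be seen in the number of value combinations that a test suite has to cover. The sketch below counts them for a given thoroughness t, assuming no constraints. The function name `combinationsToCover` is made up for this example, and the result is the amount of work to be covered, not the number of tests a particular tool will produce.

```javascript
// Number of value combinations that t-wise testing has to cover, given the
// number of values of each parameter and assuming no constraints.
function combinationsToCover(sizes, t) {
  // counts[j] = sum, over all groups of j parameters, of the product of their sizes
  const counts = new Array(t + 1).fill(0);
  counts[0] = 1;
  for (const size of sizes) {
    for (let j = t; j >= 1; j--) {
      counts[j] += counts[j - 1] * size;
    }
  }
  return counts[t];
}

const tenYesNoOptions = new Array(10).fill(2);
for (let t = 1; t <= 4; t++) {
  console.log(t, combinationsToCover(tenYesNoOptions, t));
}
// 1 20
// 2 180
// 3 960
// 4 3360
```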

Mixed Thoroughness

When you look at the list of parameters and their values for your system under test, you might conclude that some parameters will not interact with the other parameters to cause any kind of problem, or you might conclude that some larger group of parameters will likely interact to cause problems. You can use mixed thoroughness to optimize for these situations.

For example, whether a website is transported with IPv4 or IPv6 will not cause any rendering problems. Still, you do want to test that the website works on both IPv4 and IPv6. Therefore, you might set the thoroughness of the parameter IP version to 1 even though the other parameters are set to 2. This means that all pairs of values of the other parameters will be covered, but not together with each of the IP version values. This will decrease the number of tests without causing the test suite to detect fewer bugs.

Continuing the website example, a similar argument can be used for three other parameters: Rendering the website might not work correctly on Internet Explorer when JavaScript minification is enabled and CSS minification is enabled. The interaction between rendering and the two types of minification is likely to cause a problem, so the thoroughness of these three is set to 3. This will ensure that any combination of these three is included in at least one test case without including all triplets of all other parameters. Thus, we increase the number of tests a little, while increasing the bug-detection capability a lot.

In this context we might actually talk about a thoroughness of 0, meaning that any value of a particular parameter will do for a test case and we do not care which.

Partial Thoroughness

A pairwise test suite might contain, for example, 30 test cases. The accumulated coverage after each test case has been added will typically form a curve like this:

As we can see in this example, the first 12 tests include more than 95% of the pairs. So, the last 18 tests cover less than 5%.

In ordinary pairwise test generation, the pairs are included (pseudo-)randomly, meaning that any faulty pair is most likely to be included early in the test suite. Thus, a partial test suite covering, for example, 95% of the pairs can be expected to detect about 95% of the bugs that a complete pairwise test suite would find.
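
The accumulated coverage of a suite can be computed by counting the distinct pairs seen so far after each test. The following is a small sketch of that calculation. The function name `cumulativePairCoverage` and the four yes-or-no options A to D are made up for the example.

```javascript
// Number of distinct value pairs covered after each test, in suite order.
function cumulativePairCoverage(tests) {
  const seen = new Set();
  return tests.map(test => {
    const names = Object.keys(test);
    for (let i = 0; i < names.length; i++) {
      for (let j = i + 1; j < names.length; j++) {
        seen.add(JSON.stringify([names[i], test[names[i]], names[j], test[names[j]]]));
      }
    }
    return seen.size;
  });
}

// Hypothetical suite: five tests covering all 24 pairs of four yes-or-no options.
const suite = [
  { A: "no",  B: "no",  C: "no",  D: "no"  },
  { A: "no",  B: "yes", C: "yes", D: "yes" },
  { A: "yes", B: "no",  C: "yes", D: "yes" },
  { A: "yes", B: "yes", C: "no",  D: "yes" },
  { A: "yes", B: "yes", C: "yes", D: "no"  },
];

console.log(cumulativePairCoverage(suite)); // [ 6, 12, 17, 21, 24 ]
```

Even in this tiny example, each test adds fewer new pairs than the one before it (6, 6, 5, 4 and 3), which is the flattening that the coverage curve shows on a larger scale.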