Introduction to Model-Based Testing

Software Testing

Software quality is a matter of increasing importance and growing concern. Systematic testing plays an important role in the quest for improved quality and reliability of software systems. Software testing, however, is often an error-prone, expensive, and time-consuming process. Estimates are that testing consumes 30-50% of total software development costs. The effort spent on testing continues to increase, due to the ongoing quest for better software quality and the ever-growing size and complexity of systems. The situation is aggravated by the fact that the complexity of testing tends to grow faster than the complexity of the systems being tested, in the worst case even exponentially. Whereas development and construction methods for software allow the building of ever larger and more complex systems, there is a real danger that testing methods cannot keep pace with these construction and development methods, so that new systems can no longer be tested sufficiently fast and thoroughly. This may seriously hamper the development and testing of future generations of software systems.

Model-Based Testing

Model-Based Testing (MBT) is one of the technologies to meet the challenges imposed on software testing. With MBT, a System Under Test (sut) is tested against an abstract model of its required behaviour. This model serves as the system specification and is the starting point for testing. It prescribes what the sut should, and should not, do: the behaviour of the sut shall conform to the behaviour prescribed in the model. The model itself is assumed to be correct and valid; it is not the direct subject of testing or validation.

The main virtue of MBT is that the model is a perfect basis for the generation of test cases, allowing test automation that goes well beyond the mere automatic execution of manually crafted test cases. MBT allows for the algorithmic generation of large amounts of test cases, including test oracles for the expected results, completely automatically, from the model of required behaviour. Moreover, if this model is valid, i.e., expresses precisely what the system under test should do, all these generated tests are provably valid, too.

From an industrial perspective, model-based testing is a promising approach to detect more bugs faster and cheaper. The current state of practice is that test automation mainly concentrates on the automatic execution of tests, but that the problem of test generation is not addressed. Model-based testing aims at automatically generating high-quality test suites from models, thus complementing automatic test execution.

From an academic perspective, model-based testing is a formal-methods approach to testing that complements formal verification and model checking. Formal verification and model checking intend to show that a system has specified properties by proving that a model of that system satisfies these properties. Thus, any verification is only as good as the validity of the model on which it is based. Model-based testing, on the other hand, starts with a (verified) model, and then aims at showing that the real, physical implementation of the system behaves in compliance with this model. Due to the inherent limitations of testing, such as the limited number of tests that can be performed in a reasonable time, testing can never be complete: "testing can only show the presence of errors, not their absence" [R24].

Benefits of model-based testing

Model-based testing makes it possible to generate test cases automatically, enabling the next step in test automation. It makes it possible to generate more, longer, and more diversified test cases with less effort, and, being based on sound algorithms, these test cases are provably valid. Since a model specifies both the possible stimuli (inputs) to the sut and the allowed responses (outputs), the generated test cases contain inputs to be provided to the sut as well as outputs expected from the sut, which leads to better test oracles and less misinterpretation of test results.
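To make this concrete, a generated test case can be thought of as alternating stimuli and expected responses. The following Haskell sketch, with illustrative names not taken from any particular tool, shows such a structure together with a trivial test oracle:

    -- A sketch of a generated test case: alternating stimuli for the SUT
    -- and the sets of responses the model allows; names are illustrative.
    data TestStep input output
      = Stimulate input      -- input to be provided to the SUT
      | Expect [output]      -- outputs the model allows at this point

    type TestCase input output = [TestStep input output]

    -- The test oracle: an observed output is correct iff the model allows it.
    verdict :: Eq output => [output] -> output -> Bool
    verdict allowed observed = observed `elem` allowed

In MBT, both the stimuli and the allowed-response sets are computed from the model, so the oracle comes for free with the generated test case.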

Creating models for MBT usually already leads to better understanding of system behaviour and requirements and to early detection of specification and design errors. Moreover, constructing models for MBT paves the way for other model-based methods, such as model-based analysis, model checking, and simulation, and it forms the natural connection to model-based system development that is becoming an important driving force in the software industry.

Test suite maintenance, i.e., continuously adapting test cases when systems are modified, is an important challenge of any testing process. In MBT, maintenance of a multitude of test cases is replaced by maintenance of a model. Also diagnosis, i.e., localizing the fault when a failure is detected, is facilitated through model-based diagnostic analysis. Finally, various notions of (model-)coverage can be automatically computed, expressing the level of completeness of testing and allowing better selection of test cases.

Altogether, MBT is a promising approach to detect more bugs faster and cheaper, and thus to improve the quality and reliability of the system under test.

Sorts of model-based testing

There are different kinds of testing, and thus of model-based testing, depending on the kind of models being used, the quality characteristics being tested, the level of formality involved, the degree of accessibility and observability of the system being tested, and the kind of system being tested. Here, we consider model-based testing as formal, specification-based, active, black-box, functionality testing of reactive systems.

It is testing, because it involves checking some properties of the sut by systematically performing experiments on the real, running sut, as opposed to, e.g., formal verification, where properties are checked on a model of the system. The kind of properties being checked are concerned with functionality, i.e., testing whether the system correctly does what it should do in terms of correct responses to given stimuli, as opposed to, e.g., performance, usability, reliability, or maintainability properties. Such classes of properties are often referred to as quality characteristics [R41].

We do specification-based, black-box testing. The sut is seen as a black box without internal detail, which can only be accessed and observed through its external interfaces, as opposed to white-box testing, where the internal structure of the sut, i.e., the code, is the basis for testing. The externally observable behaviour of the system is compared with what has been specified in the model.

The testing is active, in the sense that the tester controls and observes the sut in an active way by giving stimuli and triggers to the sut and observing its responses, as opposed to passive testing, or monitoring. Our suts are dynamic, data-intensive, reactive systems. Reactive systems react to external events (stimuli, triggers, inputs) with output events (responses, actions, outputs). In dynamic systems, outputs depend on inputs as well as on the system state. Data-intensive means that instances of complex data structures are communicated in inputs and outputs, and that state transitions may involve complex computations and constraints.

Finally, we deal with formal testing: the model, which serves as the specification prescribing the desired behaviour, is written in some formal language with precisely defined syntax and semantics. Moreover, there is a formal, well-defined theory underpinning these models, suts, tests, and their relations, in particular the correctness (conformance) of suts with respect to models, and the validity of tests with respect to models. This enables formal reasoning about soundness and exhaustiveness of test generation algorithms and the generated test suites, i.e., that tests test exactly what they should test.

In another form of model-based testing, called statistical model-based testing, models do not prescribe the required behaviour of the sut but describe how users use a system. Such models are called statistical usage profiles, operational profiles, or customer profiles. The idea is that tests are selected based on the expected usage of the sut, so that behaviours that are used more often are tested more thoroughly [R55]. Such models are derived from usage information such as usage logs. Statistical model-based testing enables the rich toolbox of statistics to be used for assessing the reliability of systems.
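As a small illustration of this idea, the following Haskell sketch (illustrative only, not taken from any statistical MBT tool; it uses the standard random package) selects a test scenario from an operational profile by weighted random sampling, so that frequently used behaviour is exercised proportionally more often:

    import System.Random (randomRIO)

    -- An operational profile: usage scenarios weighted by how often
    -- users are expected to exercise them (weights need not sum to 1).
    type Profile s = [(Double, s)]

    -- Weighted random selection: scenarios with larger weights are
    -- chosen, and hence tested, proportionally more often.
    pick :: Profile s -> IO s
    pick profile = do
      r <- randomRIO (0, sum (map fst profile))
      let go x ((w, s) : rest)
            | x <= w || null rest = s
            | otherwise           = go (x - w) rest
          go _ []                 = error "empty profile"
      pure (go r profile)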

Theory for model-based testing

A theory for model-based testing must, naturally, first of all define the models that are considered. The modelling formalism determines the kind of properties that can be specified and, consequently, the kind of properties for which test cases can be generated. Secondly, it must be precisely defined what it means for an sut to conform to a model. Conformance can be expressed using an implementation relation, also called a conformance relation [R17]. Since the sut is considered a black box, its behaviour is unknown, and we cannot construct a model that precisely describes the behaviour of the sut. Yet, we do assume that such a model, though unknown, exists in a domain of implementation models. This assumption is commonly referred to as the testability hypothesis, or test assumption [R31]. The testability hypothesis allows reasoning about suts as if they were formal models, and it makes it possible to define the implementation relation as a formal relation between the domain of specification models and the domain of implementation models. Soundness of test suites, i.e., do all correct suts pass, and exhaustiveness, i.e., do all incorrect suts fail, are defined with respect to an implementation relation.
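In the style of the conformance-testing literature [R17], these two notions can be phrased as follows, where IMPS is the domain of implementation models, imp is the implementation relation, and "i passes T" means that implementation i passes all test cases in test suite T:

    T \text{ is sound for } s \;\Longleftrightarrow\; \forall i \in \mathit{IMPS}:\; i \;\mathbf{imp}\; s \;\Rightarrow\; i \text{ passes } T
    T \text{ is exhaustive for } s \;\Longleftrightarrow\; \forall i \in \mathit{IMPS}:\; i \text{ passes } T \;\Rightarrow\; i \;\mathbf{imp}\; s

A test suite that is both sound and exhaustive is called complete; in practice, generated test suites are sound, whereas exhaustiveness is only attainable in the limit.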

In the domain of testing reactive systems there are two prevailing ‘schools’ of formal model-based testing. The older one uses Mealy machines, also called finite-state machines (FSM); see [R18], [R47], [R51]. Here, we concentrate on the other one, which uses labelled transition systems (LTS) for modelling. A labelled transition system is a structure consisting of states, with transitions, labelled with actions, between them. The states model the system states; the labelled transitions model the actions that a system can perform. There is a rich and well-understood theory for MBT with LTS, which is elaborated in Sect. 3. Other approaches to MBT for non-reactive systems include abstract-data-type based testing [R7] and property-based testing, of which the tool QuickCheck is the prime example [R19]. Originally developed for Haskell, property-based testing is now available for many languages.
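As an illustration of the underlying structure, the following Haskell sketch gives a minimal, explicit representation of a labelled transition system; the names are illustrative and not the API of any MBT tool:

    -- A labelled transition system: states, an initial state, and
    -- transitions between states, each labelled with an action.
    data LTS state action = LTS
      { states      :: [state]
      , initial     :: state
      , transitions :: [(state, action, state)]
      }

    -- The states reachable from state s by performing action a.
    after :: (Eq state, Eq action) => LTS state action -> state -> action -> [state]
    after lts s a = [ s' | (p, b, s') <- transitions lts, p == s, b == a ]

    -- Example: a simple coffee machine that alternates between
    -- accepting a coin and producing coffee.
    coffee :: LTS Int String
    coffee = LTS [0, 1] 0 [(0, "coin", 1), (1, "coffee", 0)]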

Labelled transition systems form a well-defined semantic basis for modelling and model-based testing, but they are not suitable for writing down models explicitly. Typically, realistic systems have more states than there are atoms on earth (approximately 10^50), so an explicit representation of states is impossible. What is needed is a language to represent large labelled transition systems. Process algebras have semantics in terms of labelled transition systems, they support different ways of composition, such as choice, parallelism, and sequencing, and they were heavily investigated in the eighties [R50], [R39], [R42]. They are a good candidate to serve as a notation for LTS models.
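The following Haskell sketch shows how a toy process-algebra syntax can denote large transition systems compactly; it is a minimal illustration, not the syntax or semantics of any existing process algebra, with parallelism simplified to pure interleaving:

    -- A toy process language with action prefix, choice, and parallelism.
    data Proc a
      = Stop                       -- deadlock: no transitions
      | Prefix a (Proc a)          -- action prefix: a . P
      | Choice (Proc a) (Proc a)   -- nondeterministic choice: P + Q
      | Par (Proc a) (Proc a)      -- interleaving parallelism: P ||| Q

    -- Structural operational semantics: the labelled transitions that a
    -- process term can take, i.e., its denotation as (part of) an LTS.
    step :: Proc a -> [(a, Proc a)]
    step Stop         = []
    step (Prefix a p) = [(a, p)]
    step (Choice p q) = step p ++ step q
    step (Par p q)    = [ (a, Par p' q) | (a, p') <- step p ]
                     ++ [ (a, Par p q') | (a, q') <- step q ]

A small term such as Par (Prefix "coin" Stop) (Prefix "coffee" Stop) already denotes an LTS with several states, which is exactly why such notations scale where explicit state enumeration does not.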

Model-based testing challenges

Software is everywhere, and ever more systems depend on software: software controls, connects, and monitors almost every aspect of systems, be it a car, an airplane, a pacemaker, or a refrigerator. Consequently, overall system quality and reliability are more and more determined by the quality of the embedded software. Typically, such software consists of several million lines of code, with complex behavioural control-flow as well as intricate data structures, with distribution and a lot of parallelism, with complex and heterogeneous interfaces, and controlling diverse, multidisciplinary processes. Moreover, this software often comes in many variants, with different options and for different platforms. It is continuously evolving and being modified to adapt to different environments and new user requirements, while increasing in size, complexity, connectivity, and variability. Software is composed into larger systems and systems-of-systems, whereas system components increasingly originate from heterogeneous sources: there can be legacy, third-party, out-sourced, off-the-shelf, open-source, or newly developed components.

For model-based testing, these trends lead to several challenges. First, the size of the systems implies that making complete models is often infeasible, so that MBT must deal with partial and under-specified models and with abstraction, and partial knowledge and uncertainty cannot be avoided. Secondly, the combination of complicated state-behaviour and intricate input- and output-data structures, and their dependencies, must be supported in modelling. Thirdly, distribution and parallelism imply that MBT must deal with concurrency in models, which introduces additional uncertainty and non-determinism. In the fourth place, since complex systems are built from sub-systems and components, and systems themselves are more and more combined into systems-of-systems, MBT must support compositionality, i.e., building complex models by combining simpler models. Lastly, since complexity leads to an astronomical number of potential test cases, test selection, i.e., how to select, from all potential test cases, those tests that can catch the most, and the most important, failures within constraints of testing time and budget, is a key issue in model-based testing.

In short, to be applicable to the testing of modern software systems, MBT must support partial models, underspecification, abstraction, uncertainty, state & data, concurrency, non-determinism, compositionality, and test selection. Though several academic and commercial MBT tools exist, few support all of these aspects.

Model-based testing tools

Model-based testing activities are too laborious to be performed completely manually, so, for MBT to be effective and efficient, tool support is necessary. A large number of MBT tools exist, as a Web-search will immediately show. TorXakis is one of these MBT tools.

TorXakis is a proof-of-concept research tool that is being developed by the Radboud University Nijmegen, the University of Twente, and ESI (TNO) in the Netherlands. It is an on-line (on-the-fly) MBT tool for formal, specification-based, active, black-box, functionality testing of reactive systems, rooted in the ioco-testing theory for labelled transition systems [R61], [R62]. It implements the ioco-test generation algorithm for symbolic transition systems [R30], and it uses the process-algebraic modelling language Txs, inspired by the language LOTOS [R11], [R42], which is supplemented with an algebraic data-type specification formalism, for which rewriting and SMT solvers are used for calculation and manipulation [R22]. Moreover, TorXakis deals with most of the challenges posed in the previous section: it supports modelling of state-based control flow together with complex data, and it deals with non-determinism, abstraction, partial models and under-specification, concurrency, and composition of complex models from simpler models. TorXakis supports state & data, but no probabilities, real-time, or hybrid systems. Test selection is primarily random, but guidance can be provided using test purposes. TorXakis is an experimental MBT tool, used in (applied) research, education, and industrial case studies and experiments. TorXakis currently lacks good usability, its scalability does not always match the requirements of complex systems-of-systems, and more sophisticated test selection strategies are needed, but these are being investigated [R14].
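The heart of the ioco-testing theory can be expressed in a single formula. Following [R61], [R62], an implementation i conforms to a specification s if, after every suspension trace of s, every output that i can produce, including quiescence (the observable absence of output, denoted δ), is allowed by s:

    i \;\mathbf{ioco}\; s \;\Longleftrightarrow\; \forall \sigma \in \mathit{Straces}(s):\; \mathit{out}(i \;\mathbf{after}\; \sigma) \;\subseteq\; \mathit{out}(s \;\mathbf{after}\; \sigma)

Here Straces(s) denotes the traces of s extended with quiescence observations, and out(p after σ) is the set of outputs, including quiescence, that p can show after performing σ.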

Future developments

Current MBT algorithms and tools can potentially generate many more tests from a model than can ever be executed. Consequently, test selection is one of the major research issues in model-based testing. Test selection concerns the problem of finding criteria for selecting, from the astronomical number of potential test cases, those tests that have the largest chance of detecting the most, and the most important, bugs with the least effort. Random approaches, which are often used for small systems, do not suffice for large and complex systems: the probability of completely randomly selecting an important test case within the space of all possible behaviours converges to zero. At the other end of the spectrum there is the explicit specification of test purposes, i.e., a tester specifies explicitly what she wishes to test, but that requires a lot of manual effort, and, moreover, it raises the question how the tester should know what to test. Different approaches have been identified for determining what the “most important behaviours” are, such as testing based on system requirements, code coverage, model coverage, risk analysis, error-impact analysis, or expected user behaviour (statistical usage profiles, or operational profiles).

Related to a priori test selection is the a posteriori assessment of coverage, quality, and confidence in the tested system. Since exhaustive testing is practically impossible, the question arises what has been achieved after testing: can the coverage of testing, the quality of the tested sut, or the confidence in correct functioning of the sut somehow be formalized and quantified? It is not to be expected that these fundamental research questions will be completely solved soon.

MBT is an interesting technique once a model of the sut is available. Availability of behavioural models, however, is one of the issues that currently prohibit the widespread application of MBT. In the first place, there is the question of making and investing in models: there is reluctance to invest in making models, which are considered yet another software artifact. Secondly, mastering the art of behavioural modelling requires education and experience that are not always available. Thirdly, the information necessary to construct a model, in particular for legacy, third-party, or out-sourced systems or components, is not always (easily) available.

These issues lead to the question whether models can be generated automatically, e.g., for use in regression testing or in testing systems after refactoring. Model generation from an sut, a kind of black-box reverse engineering, (re)constructs a model by observing the behaviour of the sut, either passively from system logs, or actively by performing special tests. This activity is called model learning, also known as test-based modelling, automata learning, or grammatical inference, and it is currently a popular research topic [R65].

New software testing methodologies are needed if testing is to keep up with software development and meet the challenges imposed on it; otherwise we may not be able to test future generations of systems. Model-based testing may be one of these methodologies.