FORMAL VERIFICATION, TESTING AND CHECKING OF REAL-TIME SYSTEMS
Insup Lee

As computers become essential components of complex systems, it is increasingly important to address the reliability of computer systems. In particular, computers are increasingly used in safety-critical environments. Any failure of such computer systems may cause great financial loss, environmental disaster, or even loss of life. The potentially high cost associated with incorrect operation of these systems has created a demand for a rigorous framework for reasoning about their correctness. Reliability and correctness can be approached in several ways, including analysis and testing of the system before it is deployed, monitoring of the system while it is in operation to ensure that it stays in a safe state, and adjustment of the system while it is in operation in response to errors detected by the monitor.

Many techniques have been developed and used in practice for improving reliability; they range from static analysis based on formal methods and scheduling theory to dynamic analysis based on testing and run-time monitoring. Static analysis of real-time systems using formal methods can lead to a proof that a system specification correctly meets the given requirements specification. However, there is a gap between the abstract model described by formal methods and the physical model of the system. This gap is precisely what makes formal methods effective and useful, since unnecessary details are abstracted away. At the same time, it can be a problem, since the software components of an implementation can be much more complex and detailed than the corresponding components of the specification and can fail in unexpected ways. In addition, a specification often makes assumptions about the hardware (e.g., mechanical and electrical components) that may not always be met in the implementation. Formal verification does not and cannot address these sources of failure.

One possible way of gaining confidence in the physical system is by testing it. However, exhaustive testing of real-time systems is impossible given their complexity. Using reasonable assumptions about system behavior, it is sometimes possible to design test suites carefully enough that testing with them improves our confidence in the correctness of the system. A particularly promising technique is automatic test generation from correctness requirements specified using formal methods.
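As a concrete, hypothetical illustration of such specification-based test generation, the sketch below derives a test suite from a small finite-state specification by covering every transition. The traffic-light machine, its state and input names, and the choice of transition coverage as the criterion are illustrative assumptions, not taken from any particular system mentioned here.

```python
# Sketch: deriving a test suite from a finite-state specification
# (transition coverage). The specification below is a toy example.

def transition_cover(transitions, start):
    """Return input sequences that together exercise every transition
    of a deterministic finite-state specification."""
    tests = []
    for (src, inp), dst in transitions.items():
        # Find a shortest input sequence reaching `src` (BFS over states),
        # then append the input that fires the transition under test.
        frontier = [(start, [])]
        seen = {start}
        while frontier:
            state, path = frontier.pop(0)
            if state == src:
                tests.append(path + [inp])
                break
            for (s, i), d in transitions.items():
                if s == state and d not in seen:
                    seen.add(d)
                    frontier.append((d, path + [i]))
    return tests

# Toy specification of a traffic light: (state, input) -> next state.
spec = {
    ("red", "go"): "green",
    ("green", "caution"): "yellow",
    ("yellow", "stop"): "red",
}
suite = transition_cover(spec, "red")
```

Each generated sequence would then be fed to the implementation, whose observed outputs are compared against those required by the specification.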

Another way is to ensure the correct functioning of the system by monitoring it at run time. Although there is a fair amount of work on monitoring real-time systems, and monitoring techniques are used in practice, most monitoring approaches are {\em ad hoc} and work only for specific systems. Furthermore, most of the monitors that have been proposed do not provide a proof that the system is functioning correctly, since they check only a set of necessary, but not sufficient, conditions for correct functioning.
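The limitation described above can be made concrete with a minimal sketch of such a monitor. The event names, trace format, and the 5 ms deadline below are illustrative assumptions; the point is that a deadline check is only a necessary condition for correctness, so a trace that passes may still compute wrong values.

```python
# Sketch: a run-time monitor that checks one necessary timing condition,
# namely that every 'request' is answered by a 'response' within a deadline.
# Passing this check does NOT prove the responses are functionally correct.

def monitor(events, deadline_ms=5.0):
    """Scan a timestamped event trace and report deadline violations."""
    violations = []
    pending = None  # timestamp of the outstanding request, if any
    for t, name in events:
        if name == "request":
            pending = t
        elif name == "response" and pending is not None:
            if t - pending > deadline_ms:
                violations.append((pending, t))
            pending = None
    return violations

# An example trace: the second request is answered 7 ms later, a violation.
trace = [(0.0, "request"), (3.0, "response"),
         (10.0, "request"), (17.0, "response")]
```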

Thus, an important topic for future investigation is how to derive necessary and sufficient conditions for the correct functioning of a system. Monitoring against such conditions produces proofs of correctness of the system behavior, and we will refer to it as ``checking'' to distinguish it from more traditional monitoring. The idea of checking was recently proposed for traditional programs by Blum and his coworkers. Since then, program checkers have been designed for a number of problems for which verification is very difficult because of the size and complexity of the programs involved. These results demonstrate the viability of the program checking paradigm. It may be possible to extend this paradigm to real-time systems, thereby obtaining run-time proofs of the correct behavior of such systems.
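The flavor of the program checking paradigm can be sketched with the classic example of checking a sorting routine: rather than verifying the sorter once and for all, a small trusted checker certifies each individual output at run time. The helper names below are hypothetical; the two conditions checked (sortedness and multiset equality with the input) are together necessary and sufficient for that particular run to be correct.

```python
# Sketch of Blum-style result checking: an untrusted sorter is run, and a
# simple trusted checker certifies this particular output. Acceptance is a
# per-run proof of correctness, without verifying the sorter itself.
from collections import Counter

def checked_sort(sort_fn, xs):
    """Run an untrusted sorting routine and certify its output."""
    ys = sort_fn(list(xs))
    ordered = all(a <= b for a, b in zip(ys, ys[1:]))
    permutation = Counter(ys) == Counter(xs)  # same multiset of elements
    if not (ordered and permutation):
        raise RuntimeError("sorter produced an incorrect result")
    return ys

# A deliberately buggy sorter that silently drops duplicates; the
# permutation check catches it.
buggy = lambda xs: sorted(set(xs))
```

Extending this idea to real-time systems would additionally require the checker to certify timing behavior, not just functional results.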

Although static analysis, testing, and checking are quite effective when applied independently, they have not been applied in a unified fashion. What is needed is a general framework in which these different techniques can be applied uniformly. With such a framework, a single model of the system is specified and can then be used for both static and dynamic analyses. This would save the effort of developing multiple models, since otherwise each technique requires its own model of the system. Furthermore, it would eliminate consistency checks between the models used by different techniques, since there is only one model.

Such a general framework can be developed by extending an existing real-time formalism, such as a logic, automata/state machines, Petri nets, or a process algebra. Based on our previous work, we believe that it is possible to develop such a framework based on the Algebra of Communicating Shared Resources (ACSR), a real-time process algebra that we have developed for the specification and analysis of distributed real-time systems. Other researchers have shown that it should also be possible to develop such a framework based on other formal methods.

There are many challenges and problems that need to be addressed to improve the reliability of real-time systems. In particular, we need to develop principles and concepts for building reliable software systems that are proven effective in practice. Among these challenges are: 1) What are effective methods and techniques for analyzing and validating large-scale real-time systems? 2) How can we ensure the proper operation of a deployed system? For example, how can a running system be monitored and recovered from failure? These techniques should be based on formal methods, not {\em ad hoc} approaches.