LARGE SCALE DISTRIBUTED REAL-TIME SYSTEMS
Gary Koob

Real-time technology has historically been associated with the niche domains of embedded systems, such as control systems, and critical applications, such as combat control or air traffic control. The unifying attribute of such systems is tight coupling to external processes, characterized by stringent time constraints. The emergence of large-scale distributed systems enabled by advances in networking technology has introduced real-time concerns into mainstream commercial products. This is most visible in the current commercial emphasis on multimedia, but several trends promise to establish advanced real-time technology as a commercial priority for the foreseeable future: increasingly complex network applications; the expansion of networks to encompass not just general-purpose computers but specialized "appliances" and embedded control systems; continually evolving configurations; and mobility.

As though the challenges of providing assured real-time service in small, customized, or dedicated systems were not difficult enough, this new scenario significantly raises both the stakes and the risks. Real-time assurance fundamentally relies on the predictability of the timing behavior of both the workload and the platform. In this new environment, individual applications may still be amenable to detailed analysis, but the platform is now highly unstable and unpredictable, and each application must share resources with an unknown and dynamically changing workload. The fundamental challenge is how to deliver assured real-time service to applications in such an environment.

Time must become a first-class attribute at all system levels. Techniques appropriate for closed, static systems must be re-engineered into a framework of services that allows dynamic construction of the execution environment needed to guarantee application-specified timing constraints. While networking technology is emerging that supports low-level quality-of-service negotiation and adaptation (ATM) and real-time support (RTTP), there is currently no framework for mapping application requirements onto these services. Quality-of-service concepts, currently confined largely to the networking domain, need to be expanded to encompass all system resources in a manner that permits global, end-to-end assurance. The inherent uncertainties of a large-scale, evolving system dictate that adaptation be the norm: innovative methods are needed for structuring applications so that they adapt to the available service quality while still delivering acceptable results (possibly degraded in an understandable way).
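The negotiate-then-adapt pattern described above can be sketched in a few lines of Python. This is a toy illustration only: the names (QosRequest, ResourceManager, negotiate) and the utilization model are hypothetical, not drawn from any existing QoS framework. The point is that an application states its timing requirement and a preferred quality tier, and degrades gracefully when the system can grant only part of the request.

```python
from dataclasses import dataclass

@dataclass
class QosRequest:
    """End-to-end timing requirement for one application stream (illustrative)."""
    period_ms: float    # how often a result must be produced
    deadline_ms: float  # latency bound per result
    quality: int        # requested application-level quality tier

class ResourceManager:
    """Toy admission controller: capacity is an abstract budget, and a
    quality tier of n costs n units. A real manager would reason about
    CPU, network, and memory reservations per resource."""

    def __init__(self, capacity: int):
        self.capacity = capacity

    def negotiate(self, req: QosRequest) -> int:
        """Grant the highest quality tier (at most req.quality) that still
        fits in the remaining budget; 0 means the request is rejected."""
        granted = min(req.quality, self.capacity)
        self.capacity -= granted
        return granted

def adaptive_app(mgr: ResourceManager, req: QosRequest) -> str:
    """Application-side adaptation: accept full service, a degraded tier,
    or fall back to best-effort operation."""
    granted = mgr.negotiate(req)
    if granted == req.quality:
        return f"full service at tier {granted}"
    if granted > 0:
        return f"degraded to tier {granted}"
    return "rejected: fall back to best-effort"
```

The key property is that degradation is explicit and understandable: the application learns exactly which tier it received and can adjust its output (frame rate, resolution, precision) accordingly rather than missing deadlines unpredictably.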

Critical technologies for achieving this goal include: self-describing resources; real-time monitoring technology to measure performance and relate it to application-level expectations; incorporation of timing semantics into module interfaces, allowing specification of and reasoning about the timing properties of systems composed from such modules; open implementations of resources, allowing low-level, application-specific manipulation of timing properties; and, finally, application development environments that permit the construction of flexible applications supporting rational tradeoffs among timing constraints and result quality.
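Two of these technologies, timing semantics in module interfaces and monitoring that relates measurements to application-level expectations, can be combined in a minimal sketch. The deadline decorator below is a hypothetical annotation (not part of any standard library or real-time framework): it records a declared deadline on the interface and counts run-time violations, so that a composition tool could read the declared bounds and a monitor could report measured behavior against them.

```python
import functools
import time

def deadline(ms: float):
    """Hypothetical interface annotation: declare a per-call deadline in
    milliseconds and monitor actual execution time against it."""
    def wrap(fn):
        @functools.wraps(fn)
        def timed(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            timed.last_ms = elapsed_ms           # most recent measurement
            if elapsed_ms > ms:
                timed.violations += 1            # deadline miss observed
            return result
        timed.deadline_ms = ms   # declared timing property, readable by tools
        timed.violations = 0
        timed.last_ms = 0.0
        return timed
    return wrap

@deadline(ms=50.0)
def decode_frame(data: bytes) -> int:
    """Stand-in for a real-time task such as decoding one media frame."""
    return len(data)
```

Because the declared deadline is attached to the interface itself (decode_frame.deadline_ms), a composition tool can in principle sum or compare such bounds when reasoning about the end-to-end timing of a pipeline built from annotated modules.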

While this discussion has focused on the challenges of achieving real-time assurance in the emerging environment of large-scale distributed systems, it should be noted that time is but one of the attributes that must be addressed. An integrated application and system view of quality of service, encompassing dependability, security, and safety constraints as well as timing, must be developed; the relationships among these attributes are currently poorly understood. Finally, as future applications are likely to follow the current trend toward increasingly interactive and collaborative paradigms, the role of the human user(s) must be explicitly taken into account as well.