Building embedded real-time systems is an area in which the applications being designed have outpaced the available underlying system support. Examples of such applications can be found in several fields, including robot control, avionics, and plant control. These systems all have hard real-time requirements: if a deadline is missed, the result is catastrophic. Furthermore, such deadlines must often be met even in the face of bounded processor or network failures. Yet the principles for building such systems are still being developed, and very few available systems support these principles.
One of the most important characteristics required of a real-time system is predictability, which is achieved in part by ensuring that all timing constraints are met. To guarantee that timing constraints are met, the worst-case execution time must be computable. Hence, all actions need to be time bounded so that the cost of a given thread can be computed, and a scheduling policy must be used that guarantees that resource contention does not cause deadlines to be missed [LL73, SRL90].
Several recent research projects have addressed the problem of predictability in the context of both centralized and distributed systems, including ARTS [TM89], RT-Mach [TNR90], MARS [DRSK89], and Spring [SR87, SR89]. These projects are based on real-time scheduling algorithms, and usually also include tools for the off-line development of pre-defined schedules.
The issue of predictable operation in the face of crashes and network failures, however, has not been as well addressed. Failures are masked by using redundancy. For example, in a distributed system the failure of a given process can be masked by replicating the process on several different machines. The failure of one replica (caused, for example, by the crash of its machine) then does not imply a failure of the service: the other replicas can still provide the desired service [Sch90].
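The idea of masking a failure through replication can be illustrated with a minimal sketch. It assumes crash failures that the caller can detect, and every name in it (`call_replicated`, `ReplicaFailure`, `ServiceUnavailable`, and the two toy replicas) is hypothetical and chosen only for illustration; it is not taken from ISIS or from any of the systems cited above.

```python
from typing import Callable, Sequence


class ReplicaFailure(Exception):
    """Raised by a replica that has crashed or is unreachable (hypothetical)."""


class ServiceUnavailable(Exception):
    """Raised when every replica has failed, so the failure cannot be masked."""


def call_replicated(replicas: Sequence[Callable[[int], int]], request: int) -> int:
    """Return the reply of the first replica that answers.

    Failures of individual replicas are hidden from the caller; the call
    fails only if all replicas fail.
    """
    for replica in replicas:
        try:
            return replica(request)
        except ReplicaFailure:
            continue  # mask this failure and try the next replica
    raise ServiceUnavailable("all replicas failed")


def healthy(x: int) -> int:
    """Toy replica that provides the service (doubles its input)."""
    return x * 2


def crashed(x: int) -> int:
    """Toy replica whose machine has crashed."""
    raise ReplicaFailure("machine crashed")


if __name__ == "__main__":
    # The first replica has crashed, yet the caller still gets an answer.
    print(call_replicated([crashed, healthy, healthy], 21))  # prints 42
```

In this sketch the service survives as long as at least one replica answers. It ignores timing entirely: each failed attempt costs time, which is exactly why masking failures and meeting deadlines have to be considered together.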
Even ignoring predictability, the development of fault-tolerant applications can be a complex task when the programmer does not have supporting software tools. At Cornell, we have developed the ISIS toolkit, which supplies a group programming paradigm for building fault-tolerant programs [BJS87, BC91]. However, the current version of this system is not suitable for building real-time programs: ISIS runs on top of Unix and contains no scheduling support for writing predictable real-time applications.
Our goal is to create an environment that supports the development of hard real-time systems even in the face of resource loss. Corto, the system we are building, will support the basic programming abstractions of ISIS, namely ordered delivery of messages to groups of processes and agreement on membership. Corto will also support the predictable scheduling of processes and communication that systems like ARTS and RT-Mach provide.
We are finding it challenging to integrate these two goals. ISIS supports a model of programming called virtual synchrony, in which events such as failure, recovery, and message delivery are totally ordered. This abstraction is fundamental to ISIS: because of virtual synchrony, building applications that maintain distributed state in the face of changing resources becomes very straightforward. However, virtual synchrony is implemented by a kind of distributed scheduler, which must itself be made predictable. Hence, implementing Corto is not simply a matter of running ISIS on top of a real-time kernel.
Our initial approach is to build a suite of basic mechanisms, described below, that support a small set of real-time applications. We are implementing these mechanisms on top of the ISIS transport layer (MUTS [vRBC+92]) running on a stand-alone Unix system with minimal terminal support and our own scheduling.
While such a system will not be completely hard real-time, this version will help us refine the right set of mechanisms needed for highly available real-time applications. We will then move the system to a kernel that supports hard real-time scheduling.
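To make the virtual synchrony abstraction discussed earlier more concrete, the following toy sketch shows what it means for failures and message deliveries to be totally ordered. It is an illustration under strong simplifying assumptions, not the ISIS or Corto protocol: a single hypothetical `Sequencer` object stands in for the distributed mechanism, and the names `Member`, `Sequencer`, `broadcast`, and `demo` are invented here.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Event = Tuple[int, str, str]  # (sequence number, kind, argument)


@dataclass
class Member:
    """A group member that applies events in the order it receives them."""
    name: str
    view: Set[str] = field(default_factory=set)   # current membership
    log: List[Event] = field(default_factory=list)

    def apply(self, seq: int, kind: str, arg: str) -> None:
        if kind == "fail":
            self.view.discard(arg)  # membership changes at this point in the order
        # "msg" events leave the view unchanged; every event is logged.
        self.log.append((seq, kind, arg))


class Sequencer:
    """Hypothetical central sequencer imposing one total order on all events."""

    def __init__(self, members: List[Member]) -> None:
        self.live = {m.name: m for m in members}
        self.next_seq = 0

    def broadcast(self, kind: str, arg: str) -> None:
        if kind == "fail":
            self.live.pop(arg, None)  # a crashed member receives nothing further
        seq = self.next_seq
        self.next_seq += 1
        for member in self.live.values():
            member.apply(seq, kind, arg)


def demo() -> List[Member]:
    names = ("p1", "p2", "p3")
    members = [Member(n, view=set(names)) for n in names]
    group = Sequencer(members)
    group.broadcast("msg", "update-1")
    group.broadcast("fail", "p2")
    group.broadcast("msg", "update-2")
    return members


if __name__ == "__main__":
    p1, p2, p3 = demo()
    # Survivors saw the same events in the same order, including the failure.
    print(p1.log == p3.log, sorted(p1.view))  # prints: True ['p1', 'p3']
```

Because both survivors observe the failure of `p2` at the same position relative to the two updates, they can maintain replicated state without further coordination. A central sequencer is used only to keep the sketch short; it is itself a single point of failure, and nothing here bounds how long ordering takes, which is the predictability problem the text identifies.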