Systems of systems and coordinated atomic actions

Robert Schaefer

doi:10.1145/1039174.1039196

Abstract

System of systems (SoS) is an emerging field in the design and development of complex systems that are built from large-scale component systems. A SoS has the following attributes: operational and managerial independence of its components, a geographic extent that limits control mechanisms to information exchange, an evolutionary nature, and emergent behavior. The subsystems that make up the SoS are often built by different organizations with conflicting goals, designed under different assumptions, and built to different quality standards. These factors affect fault detection, fault isolation, and fault tolerance, and can result in systems that cannot easily be debugged, integrated, or maintained. When fault detection and fault tolerance are deficient, the system may behave in a fragile or brittle manner, crashing randomly and repeatedly. Crashes prevent automated diagnosis algorithms from running and can prevent manual root cause analysis by erasing system state. Fragility during system integration can cause schedule milestones and deadlines to be missed. Deficient fault detection and fault isolation also affect end users and system maintainers. (Think <insert name of infamous project here>.) From the system architect's point of view, designing a system that can detect all possible fault conditions across all components can be an extremely difficult, if not impossible, challenge. Can any system be trusted to diagnose or repair itself when it has been corrupted by faults? How do you prevent local faults from growing into global failures? End users may have unreasonable expectations about how the system should behave when components within the SoS behave abnormally or fail. They may expect better behavior than they get from the typical PC.
System maintainers may expect a coherent, system-level view of failures that lets them isolate faulted components and carry out an orderly and safe shutdown or recovery. (Think power grid blackouts, telecom failures, etc.) The most beneficial way to achieve fault tolerance is to design in fault detection and fault reporting so that defined boundaries, such as subsystems, serve as natural firewalls for fault containment. Partitioning a system into subsystems for fault containment is well known and widely practiced, yet the result, as experienced at system integration time, is rarely a success. COTS middleware, intended to aid distributed design, often becomes in effect a step backwards by providing fertile ground for faults and failures that breach fault containment boundaries. (Think <insert name of OS or middleware vendor here>.) What can be done to improve this situation? This paper addresses the architectural partitioning concept of Coordinated Atomic Actions (CAA). CAA promotes a different way of organizing software architecture that improves fault containment across potentially faulty components. CAA was first developed by members of Brian Randell's research group at the University of Newcastle upon Tyne in the mid-1990s. CAA promotes the concept of the "transaction", which has traditionally been associated with database applications. When you access your bank account via an ATM, you are exercising database transactions within your bank's financial SoS. CAA applies transactions to cooperating concurrent distributed processes, which are the basis for most large complex computing systems.
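As a rough illustration of the idea, the following Python sketch is built on a deliberately simplified CAA model. Each role runs concurrently in its own thread. No role proceeds until all role bodies have finished. An exception raised by any role is resolved into a single exception that every role then handles. Changes to a shared external object are staged and either committed or rolled back. The names `ExternalObject`, `Role`, `run_caa` and `ActionFailure` are hypothetical and introduced only for this sketch. The "lowest-numbered role wins" resolution rule is likewise an assumption. None of this is the CAA framework described in the paper. Real CAA designs involve richer exception resolution, nested actions and properly isolated transactional objects.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


class ExternalObject:
    """Hypothetical transactional shared object (a toy stand-in for a database).

    Writes are staged and reach `state` only on commit(), so an aborted action
    leaves committed state untouched. Assumes one action at a time uses it.
    """

    def __init__(self, state: Dict[str, int]) -> None:
        self.state = dict(state)
        self._staged: Dict[str, int] = {}
        self._lock = threading.Lock()

    def read(self, key: str) -> int:
        with self._lock:
            return self._staged.get(key, self.state[key])

    def write(self, key: str, value: int) -> None:
        with self._lock:
            self._staged[key] = value

    def commit(self) -> None:
        with self._lock:
            self.state.update(self._staged)
            self._staged.clear()

    def abort(self) -> None:
        with self._lock:
            self._staged.clear()


class ActionFailure(Exception):
    """Signalled to the enclosing context when recovery inside the action fails."""


@dataclass
class Role:
    body: Callable[[ExternalObject], None]                # normal behavior
    handler: Callable[[ExternalObject, Exception], None]  # cooperative recovery


def run_caa(roles: List[Role], ext: ExternalObject) -> None:
    """Run each role in its own thread as one coordinated atomic action."""
    n = len(roles)
    raised: List[Optional[Exception]] = [None] * n
    handler_failed = [False] * n
    resolved: List[Optional[Exception]] = [None]

    def resolve() -> None:
        # Called by one thread once every role has finished its body.
        # Hypothetical resolution rule: the lowest-numbered role's exception wins.
        resolved[0] = next((e for e in raised if e is not None), None)

    bodies_done = threading.Barrier(n, action=resolve)

    def participant(i: int) -> None:
        try:
            roles[i].body(ext)
        except Exception as exc:
            raised[i] = exc
        bodies_done.wait()  # no role moves on until all bodies have finished
        if resolved[0] is not None:
            # Every role handles the same resolved exception, not just the raiser.
            try:
                roles[i].handler(ext, resolved[0])
            except Exception:
                handler_failed[i] = True

    threads = [threading.Thread(target=participant, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # synchronous exit: the action ends only when all roles end

    if any(handler_failed):
        ext.abort()  # backward recovery: discard staged external changes
        raise ActionFailure("recovery failed; external changes rolled back") from resolved[0]
    ext.commit()  # normal completion or successful forward recovery


# Usage: an ATM-style transfer split into two cooperating roles.
def debit(ext: ExternalObject) -> None:
    ext.write("checking", ext.read("checking") - 50)


def credit(ext: ExternalObject) -> None:
    ext.write("savings", ext.read("savings") + 50)


def give_up(ext: ExternalObject, exc: Exception) -> None:
    raise exc  # this handler cannot recover, so the whole action aborts


account = ExternalObject({"checking": 100, "savings": 0})
run_caa([Role(debit, give_up), Role(credit, give_up)], account)
print(account.state)  # {'checking': 50, 'savings': 50}
```

Suppose one role raises and no handler can recover. In that case `run_caa` discards the staged debit and credit and signals `ActionFailure`, so the committed balances keep their original values. This is the all-or-nothing behavior that the ATM example relies on. A handler that repairs the staged state instead of re-raising lets the action commit, which corresponds to forward recovery. The fault is contained inside the action either way, rather than leaking into the surrounding system as a half-finished transfer.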

Journal: ACM SIGSOFT Software Engineering Notes
Publication Date: Jan 1, 2005
Citations: 7

Similar Papers

Mutation analysis for system of systems policy testing
...
20 May 2017

Fault detection and isolation based on novel unknown input observer design
Weitian Chen ... M Saif
01 Jan 2006

System of Systems (SoS) enterprise systems engineering for information‐intensive organizations
Paul G Carlock ... Robert E Fenton
Systems Engineering | VOL. 4
01 Jan 2001

FAULT TOLERANCE FOR TWO WHEEL MOBILE ROBOT USING FSM (FINITE STATE MACHINE)
Chan Shi Jing ... Mohammad Fadhil Abas
International Journal of Software Engineering and Computer Systems | VOL. 3
28 Feb 2017