Chip Self-Organization and Fault Tolerance in Massively Defective Multicore Arrays

Jacques Henri Collet, Piotr Zajac, Mihalis Psarakis, Dimitris Gizopoulos

doi:10.1109/tdsc.2009.53

Abstract

We study chip self-organization and fault tolerance at the architectural level to improve the dependable, continuous operation of multicore arrays in massively defective nanotechnologies. Architectural self-organization results from the conjunction of self-diagnosis and self-disconnection mechanisms (to identify and isolate most permanently faulty or inaccessible cores and routers), plus self-discovery of routes to maintain communication in the array. In the methodology presented in this work, chip self-diagnosis is performed in three steps, following an ascending order of complexity: interconnects are tested first, then routers through mutual test, and cores in the last step. The mutual testing of routers is especially important as faulty routers are disconnected by good ones with no assumption about the behavior of defective elements. Moreover, the disconnection of faulty routers is not physical (“hard”) but logical (“soft”) in that a good router simply stops communicating with any adjacent router diagnosed as defective. There is no physical reconfiguration in the chip and no need for spare elements. Ultimately, the multicore array may be viewed as a black box, which incorporates protection mechanisms and self-organizes, while external control reduces to a simple chip validation test, which in the simplest cases amounts to counting the number of valid and accessible cores.
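
To make the last point concrete, here is a minimal sketch of the validation count on a 2D mesh. It is not the authors' implementation. The abstract does not describe the test procedures, so the outcome of the three diagnosis steps is simply given as sets of faulty links, routers, and cores. The function name, the mesh model, the `io_port` entry point, and the centralized breadth-first search are all assumptions made for illustration; the search stands in for the distributed self-discovery of routes.

```python
from collections import deque


def count_accessible_cores(width, height, bad_links, bad_routers, bad_cores,
                           io_port=(0, 0)):
    """Count good cores reachable from io_port (hypothetical mesh model).

    bad_links   -- set of frozenset({node_a, node_b}) for faulty interconnects
    bad_routers -- set of (x, y) nodes whose router was diagnosed as defective
    bad_cores   -- set of (x, y) nodes whose core was diagnosed as defective
    """
    def neighbors(node):
        x, y = node
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height:
                yield (nx, ny)

    def link_enabled(a, b):
        # Soft disconnection: nothing is physically removed. A good router
        # just refuses to talk over a faulty link or to a faulty neighbor.
        if frozenset((a, b)) in bad_links:
            return False
        return a not in bad_routers and b not in bad_routers

    if io_port in bad_routers:
        return 0  # assumed single entry point is itself unusable

    # Breadth-first search as a stand-in for route discovery (assumption).
    seen = {io_port}
    queue = deque([io_port])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node):
            if nxt not in seen and link_enabled(node, nxt):
                seen.add(nxt)
                queue.append(nxt)

    # A core counts only if its router is reachable and the core is valid.
    return sum(1 for node in seen if node not in bad_cores)
```

For example, on a 3x3 mesh with the center router faulty, the core at (2, 2) faulty, and the link between (0, 0) and (1, 0) broken, every remaining router is still reachable by going around the defects, so the count is 7. If both neighbors of the entry point have faulty routers, only the core at the entry point remains accessible.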

Journal: IEEE Transactions on Dependable and Secure Computing
Publication Date: Mar 1, 2011
Citations: 79

Similar Papers

On the design and analysis of fault tolerant NoC architecture using spare routers
Yung-Chang Chang ... Chung-Kai Liu
01 Jan 2010

On the design and analysis of fault tolerant NoC architecture using spare routers
...
25 Jan 2011

Fault-tolerant communication in invasive networks on chip
Jan Heisswolf ... Marco Duden
01 Jun 2015

Fault-Tolerant Mesh-Based NoC with Router-Level Redundancy
Yung-Chang Chang ... Cihun-Siyong Alex Gong
Journal of Signal Processing Systems | VOL. 92
07 Sep 2019
