TY - GEN
T1 - Prototype of fault adaptive embedded software for large-scale real-time systems
AU - Messie, Derek
AU - Jung, Mina
AU - Oh, Jae C.
AU - Shetty, Shweta
AU - Nordstrom, Steven
AU - Haney, Michael
PY - 2005
Y1 - 2005
N2 - This paper describes a comprehensive prototype of large-scale fault adaptive embedded software developed for the proposed Fermilab BTeV high energy physics experiment. Lightweight self-optimizing agents embedded within Level 1 of the prototype are responsible for proactive and reactive monitoring and mitigation based on specified layers of competence. The agents are self-protecting, detecting cascading failures using a distributed approach. Adaptive, reconfigurable, and mobile objects for reliablility are designed to be self-configuring to adapt automatically to dynamically changing environments. These objects provide a self-healing layer with the ability to discover, diagnose, and react to discontinuities in real-time processing. A generic modeling environment was developed to facilitate design and implementation of hardware resource specifications, application data flow, and failure mitigation strategies. Level 1 of the planned BTeV trigger system alone will consist of 2500 DSPs, so the number of components and intractable fault scenarios involved make it impossible to design an 'expert system' that applies traditional centralized mitigative strategies based on rules capturing every possible system state. Instead, a distributed reactive approach is implemented using the tools and methodologies developed by the Real-Time Embedded Systems group
AB - This paper describes a comprehensive prototype of large-scale fault adaptive embedded software developed for the proposed Fermilab BTeV high energy physics experiment. Lightweight self-optimizing agents embedded within Level 1 of the prototype are responsible for proactive and reactive monitoring and mitigation based on specified layers of competence. The agents are self-protecting, detecting cascading failures using a distributed approach. Adaptive, reconfigurable, and mobile objects for reliablility are designed to be self-configuring to adapt automatically to dynamically changing environments. These objects provide a self-healing layer with the ability to discover, diagnose, and react to discontinuities in real-time processing. A generic modeling environment was developed to facilitate design and implementation of hardware resource specifications, application data flow, and failure mitigation strategies. Level 1 of the planned BTeV trigger system alone will consist of 2500 DSPs, so the number of components and intractable fault scenarios involved make it impossible to design an 'expert system' that applies traditional centralized mitigative strategies based on rules capturing every possible system state. Instead, a distributed reactive approach is implemented using the tools and methodologies developed by the Real-Time Embedded Systems group
UR - http://www.scopus.com/inward/record.url?scp=28344434790&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=28344434790&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:28344434790
SN - 0769523080
SN - 9780769523088
T3 - Proceedings - 12th IEEE International Conference and Workshops on the Engineering of Computer-Based Systems, ECS 2005
SP - 498
EP - 505
BT - Proceedings - 12th IEEE International Conference and Workshops on the Engineering of Computer-Based Systems, ECS 2005
A2 - Rozenblit, J.
A2 - O'Neill, T.
A2 - Peng, J.
T2 - Proceedings - 12th IEEE International Conference and Workshops on the Engineering of Computer-Based Systems, ECS 2005
Y2 - 4 April 2005 through 7 April 2005
ER -