Optimal Checkpointing Strategy for Real-time Systems with Both Logical and Timing Correctness

Lin Zhang, Zifan Wang, Fanxin Kong

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Real-time systems are susceptible to adversarial factors such as faults and attacks, leading to severe consequences. This paper presents an optimal checkpoint scheme to bolster fault resilience in real-time systems, addressing both logical consistency and timing correctness. First, we partition message-passing processes into a directed acyclic graph (DAG) based on their dependencies, ensuring checkpoint logical consistency. Then, we identify the DAG's critical path, representing the longest sequential path, and analyze the optimal checkpoint strategy along this path to minimize overall execution time, including checkpointing overhead. Upon fault detection, the system rolls back to the nearest valid checkpoints for recovery. Our algorithm derives the optimal checkpoint count and intervals, and we evaluate its performance through extensive simulations and a case study. Results show a 99.97% and 67.86% reduction in execution time compared to checkpoint-free systems in simulations and the case study, respectively. Moreover, our proposed strategy outperforms prior work and baseline methods, increasing deadline achievement rates by 31.41% and 2.92% for small-scale tasks and 78.53% and 4.15% for large-scale tasks.
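The critical-path step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `critical_path`, the input format (a dict of per-node execution times plus a list of dependency edges), and the example numbers are all assumptions made for illustration. It only finds the longest sequential path through a DAG of timed processes; the paper's derivation of checkpoint counts and intervals along that path is not reproduced here.

```python
from collections import defaultdict, deque


def critical_path(durations, edges):
    """Return (length, path) of the longest path in a DAG of timed nodes.

    durations: dict mapping node -> execution time (hypothetical input format)
    edges: iterable of (u, v) pairs meaning u must finish before v starts
    """
    succ = defaultdict(list)
    indeg = {n: 0 for n in durations}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Kahn's algorithm: visit nodes in topological order so that a node's
    # finish time is final before any of its successors are relaxed.
    ready = deque(n for n, d in indeg.items() if d == 0)
    finish = dict(durations)            # earliest finish time per node
    pred = {n: None for n in durations}  # predecessor on the longest path
    visited = 0
    while ready:
        u = ready.popleft()
        visited += 1
        for v in succ[u]:
            if finish[u] + durations[v] > finish[v]:
                finish[v] = finish[u] + durations[v]
                pred[v] = u
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(durations):
        raise ValueError("dependency graph contains a cycle")

    # Walk back from the node that finishes last to recover the path.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = pred[node]
    return length, path[::-1]


# Illustrative data only: four processes with made-up execution times.
durations = {"A": 3, "B": 2, "C": 4, "D": 1}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
length, path = critical_path(durations, edges)
```

In this made-up example the path A, C, D has total execution time 8 and is therefore the critical path. It is the path along which a checkpoint placement analysis such as the one in the abstract would be carried out.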

Original language: English (US)
Article number: 66
Journal: Transactions on Embedded Computing Systems
Volume: 22
Issue number: 4
State: Published - Jul 24 2023

Keywords

  • Real-time systems
  • checkpointing
  • fault resilience
  • logical consistency
  • timing correctness

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
