Module 1: Introduction
Fault Classification, Types of Redundancy, Basic Measures of Fault Tolerance, Hardware Fault Tolerance, The Rate of Hardware Failures, Failure Rate, Reliability, and Mean Time to Failure, Canonical and Resilient Structures, Other Reliability Evaluation Techniques.
Module 2: Information Redundancy
Information Redundancy, Coding, Resilient Disk Systems, Data Replication, Voting: Hierarchical Organization, Primary-Backup Approach, Algorithm-Based Fault Tolerance, Fault-Tolerant Networks: Measures of Resilience, Common Network Topologies and Their Resilience, Fault-Tolerant Routing.
Module 3: Software Fault Tolerance
Acceptance Tests, Single-Version Fault Tolerance, N-Version Programming, Recovery Block Approach, Preconditions, Postconditions, and Assertions, Exception-Handling, Software Reliability Models, Fault-Tolerant Remote Procedure Calls.
Module 4: Checkpointing
Introduction, Checkpoint Level, Optimal Checkpointing: An Analytical Model, Cache-Aided Rollback Error Recovery (CARER), Checkpointing in Distributed Systems, Checkpointing in Shared-Memory Systems, Checkpointing in Real-Time Systems, Case Studies: NonStop Systems, Stratus Systems, Cassini Command and Data Subsystem, IBM G5, IBM Sysplex, Itanium.
Fault Tolerant Systems