Fault-Tolerance Techniques for High-Performance Computing
Author: Thomas Herault
Publisher: Springer
Total Pages: 325
Release: 2015-07-01
Genre: Computers
ISBN: 3319209434
This timely text presents a comprehensive overview of fault-tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, and silent error detection and correction, together with some application-specific techniques such as ABFT. Emphasis is placed on analytical performance models. This is followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources of errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.