Supercomputer Stress Test

Los Alamos National Laboratory Computer Science, 2011-12

Liaison(s): Scott Pakin, Randal Rheinheimer
Advisor(s): Christopher Stone
Student(s): Adam Novak, Benson Khau, Camille Marvin, Kimberly Sheely (PM)

As supercomputers grow in scale and incorporate an increasingly diverse range of hardware, failures become more frequent, and fewer off-the-shelf test suites are available to find them. The goal of this project is to design, implement, test, and deliver to Los Alamos National Laboratory a system built around a test suite for finding such failures. The system can also narrow the possible source of a failure from the entire supercomputer to a small set of components.