NSF Grant Supports Storage-System Research

To help enhance the quality of systems research, the National Science Foundation has awarded Harvey Mudd College computer science professor Geoff Kuenning a grant totaling $161,186.

The project, “Collaborative Research: CI-SUSTAIN: National File System Trace Repository,” will help alleviate a longstanding problem in the study of computer systems: the difficulty of providing workloads to drive the system being studied.

Using funds from several sources, including prior NSF grants and industry donations, Kuenning and Erez Zadok, professor of computer science at Stony Brook University, developed and deployed the Storage Networking Industry Association’s I/O Tools, Traces and Analysis (IOTTA), a national repository for file system traces. Since its launch, the repository has proven its worth to the scientific community, has become the standard resource for file system traces and is used by researchers in a wide variety of projects. The repository was initially expected to last only a few years, but its utility and popularity suggest that it will be needed for at least another decade, and possibly far longer. With funds from the current grant, Kuenning and Zadok will continue operating the repository. This will allow investigators to make rapid progress in storage-system research, which in turn will improve the quality of computer systems of all sizes, types and applications.

“Our project will ensure the continued utility of the trace repository by providing ongoing maintenance, development of new tools essential to the successful use of traces and transition to community support facilities,” says Kuenning, who also studies techniques for tracing, modeling and deduplicating file systems. “In particular, we will continue to provide site management and user support, establish standards for trace storage, gather and distribute traces, ensure that existing tools are supported and coordinate the development of additional tools. We will also expand the site’s services, including adding an overseas mirror.”

A critical requirement of the proposal is to transition the IOTTA repository to long-term community sustainability. “The transition to long-term sustainability will allow the repository to provide a quality resource for many years to come,” says Kuenning.

The NSF grant will also benefit prospective and current students. It supports Harvey Mudd student researchers, who will be able to establish relationships with researchers at other institutions, a benefit that enhances their graduate school opportunities. The project will serve as a vehicle for recruiting undergraduates, women and underrepresented minorities into the research community. Thanks to the standardized format and tools the repository provides, undergraduate and graduate students will be able to study real-world trace data easily. Both professors have already demonstrated this in their courses: students in Kuenning’s File Systems course have used IOTTA trace data in their term projects, and Zadok has taught special-topics graduate courses titled “Storage Systems” in which IOTTA’s traces were used in class projects.

In earlier work, Kuenning, a Harvey Mudd faculty member since 1998, contributed to the development of a memory-based file system, a study of file size distributions and a system for detecting insider misbehavior by observing file access patterns. Before that, he built the SEER predictive hoarding system, which lets mobile users ensure that they will have the files they need while working disconnected. The success of SEER demonstrated that prediction is possible and can have a significant impact on computer systems. Some of the SEER work has been applied to Kuenning’s more recent research and continues to influence researchers at other institutions.

“The broader impacts of IOTTA are the essence of the repository,” says Kuenning. “The studies enabled by the repository will help researchers design optimal file systems for 21st-century computers and storage devices, which in turn will be critical to achieving maximum performance in computer systems.”