Computer Science Thesis Proposal

  • Gates Hillman Centers
  • Traffic21 Classroom 6501
  • ABUTALIB AGHAYEV
  • Ph.D. Student
  • Computer Science Department
  • Carnegie Mellon University

Efficiently Adopting Zone Devices in Distributed Storage

Distributed storage systems, such as cluster and parallel file systems and distributed object stores, have conventionally relied on general-purpose local file systems as storage backends. So far, this convention has delivered reasonable performance, so the suitability of file systems as distributed storage backends has gone largely unquestioned.

Recent developments in the storage hardware targeted at data centers, however, present a challenge for this convention. Solid-state drives (SSDs) are abandoning the flash translation layer to achieve predictable performance and low tail latency. Hard disk drives (HDDs) are adopting shingled magnetic recording for higher capacity at low cost. Most importantly, these data center SSDs and HDDs are evolving to use the same new backward-incompatible zone interface. Adopting these devices is problematic for most file systems because file systems heavily depend on the venerable block interface and carry the legacy of decades-old design from the era of small drives and single-node operating systems.
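
To make the incompatibility concrete, the following is a minimal toy model of the constraint a zone interface imposes: each zone accepts writes only at its current write pointer, and space is reclaimed only by resetting a whole zone. This is an illustrative sketch, not the API of any real device or library; the names `Zone`, `write`, and `reset` are hypothetical.

```python
class Zone:
    """Toy model of a single zone (hypothetical, for illustration only)."""

    def __init__(self, start, size):
        self.start = start            # first block address of the zone
        self.size = size              # zone capacity in blocks
        self.write_pointer = start    # next block address that may be written

    def write(self, lba, nblocks):
        # Writes must begin exactly at the write pointer: no in-place updates.
        if lba != self.write_pointer:
            raise ValueError("non-sequential write rejected")
        # Writes may not run past the end of the zone.
        if self.write_pointer + nblocks > self.start + self.size:
            raise ValueError("write crosses zone boundary")
        self.write_pointer += nblocks

    def reset(self):
        # Reclaiming space discards the whole zone and rewinds the pointer.
        self.write_pointer = self.start
```

A block-interface file system that overwrites a block in place would correspond to calling `write` with an address below the write pointer, which this model rejects; that mismatch is the reason such file systems need either a translation layer or a redesign.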

Our thesis is that to achieve the low cost and predictable performance offered by zone devices, distributed storage systems should abandon file systems as storage backends and implement specialized backends from scratch that allow them to quickly and effectively leverage the benefits of zone devices.

In this proposal, we present the following evidence to support our thesis. We show that using file systems on HDDs with a translation layer has high garbage collection cost: even on a sequential workload, the overhead can be up to 40%. We perform a longitudinal study of storage backends in Ceph, a widely used distributed storage system, and show that essential services, such as transactions, can be up to 80% faster when implemented directly on a raw device than when implemented on top of file systems. We propose techniques for adapting BlueStore, a Ceph backend implemented on raw devices, to work effectively on top of zone devices.

Thesis Committee:
George Amvrosiadis (Chair)
Gregory R. Ganger
Garth A. Gibson
Peter J. Desnoyers (Northeastern University)
Remzi H. Arpaci-Dusseau (University of Wisconsin-Madison)
Sage A. Weil (Red Hat, Inc.)

