Friday Dec 7th, 1-2AM @ BA5205
Speaker: Gala Yagdar
On Fault Tolerance, Locality, and Optimality in Locally Repairable Codes
Large-scale storage systems lie at the heart of the big data revolution. As these systems grow in scale and capacity, their complexity grows accordingly, building on new storage media, hybrid memory hierarchies, and distributed architectures. Numerous layers of abstraction hide this complexity from the applications, but also hide valuable information that could improve the system’s performance considerably. I will demonstrate how to bridge this semantic gap in the context of erasure codes, which are used to guarantee data availability and durability.
Locally repairable codes (LRCs) offer tradeoffs between storage overhead and repair cost. They facilitate more efficient recovery scenarios by storing additional parity blocks in the system, but these additional blocks may eventually increase the number of blocks that must be reconstructed. Existing codes differ in their use of the additional parity blocks, but also in their locality semantics and in the parameters for which they are defined. As a result, existing theoretical models cannot be used to directly compare different LRCs to determine which code will offer the best recovery performance, and at what cost.
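To make the overhead-versus-repair tradeoff concrete, here is a minimal sketch of the local-parity idea, assuming the simplest possible construction: data blocks are split into groups and each group gets one XOR parity. It is not any of the codes compared in the talk, and the function names (xor_blocks, encode_local_parities, repair_block) are hypothetical. It only illustrates that a single lost block can be rebuilt from its own group rather than from all k data blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_local_parities(data_blocks, group_size):
    """Split the data blocks into groups and compute one XOR parity per group."""
    groups = [data_blocks[i:i + group_size]
              for i in range(0, len(data_blocks), group_size)]
    return [xor_blocks(group) for group in groups]


def repair_block(data_blocks, local_parities, lost_index, group_size):
    """Rebuild one lost data block using only its local group.

    Returns the rebuilt block and the number of blocks that were read.
    """
    group = lost_index // group_size
    start = group * group_size
    end = min(start + group_size, len(data_blocks))
    survivors = [data_blocks[i] for i in range(start, end) if i != lost_index]
    helpers = survivors + [local_parities[group]]
    return xor_blocks(helpers), len(helpers)


# Illustrative setup (assumed values): k = 6 data blocks, local groups of 3.
data = [bytes([i] * 4) for i in range(1, 7)]
parities = encode_local_parities(data, group_size=3)
rebuilt, blocks_read = repair_block(data, parities, lost_index=4, group_size=3)
```

In this toy setup the repair reads 3 blocks (two surviving group members plus the local parity), whereas a code without locality over the same 6 data blocks would need 6 blocks to rebuild one. The price is the two extra parity blocks that must be stored.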
In this study, we performed the first systematic comparison of existing LRC approaches in light of two new metrics: the average degraded read cost and the normalized repair cost. We show how these costs trade off against the code’s fault tolerance, and that different approaches occupy different points in this tradeoff. Our experimental evaluation on a Ceph cluster deployed on Amazon EC2 further demonstrates that the normalized repair cost metric can reliably identify the LRC approach that would achieve the lowest repair cost in each system setup.
Gala Yagdar from the Technion will give a talk on her USENIX ATC ’18 work. The talk is based on joint work with Oleg Kolosov, Matan Liram, Eitan Yaakobi, Itzhak Tamo, and Alexander Barg.