
Today's large-scale distributed storage systems comprise thousands of nodes and store hundreds of petabytes of data. In these systems, failures are common, which makes it essential to store data redundantly to ensure reliability and availability. The most common way of adding redundancy is replication. However, replication is highly inefficient in terms of storage capacity utilization, and hence many distributed storage systems are now turning to Reed-Solomon (erasure) codes. While these codes are optimal in terms of storage capacity utilization, they perform poorly with respect to other system resources such as disk I/O and network bandwidth: during recovery of a failed or otherwise unavailable unit of data, traditional techniques require a large amount of data to be read and transferred across the network. The above video provides a fun introduction to this issue.


My research deals with constructing new erasure codes (i.e., designing new encoding and decoding algorithms) that overcome the limitations of classical erasure codes in large-scale distributed storage systems, and with building storage systems that employ this new generation of storage codes in novel ways. Here is a brief description of my past projects.
Hitchhiker
A new erasure-coded storage system that reduces both network traffic and disk I/O by around 25-45% during recovery, with no additional storage, the same fault tolerance, and arbitrary flexibility in the choice of parameters, as compared to Reed-Solomon-based systems. We have implemented Hitchhiker on top of the Hadoop Distributed File System (HDFS) and evaluated various metrics on the data-warehouse cluster in production at Facebook with real-time traffic and workloads. Hitchhiker's underlying erasure code is designed using the Piggybacking framework (see the project below). You can find more details about Hitchhiker in our ACM SIGCOMM 2014 paper.
Piggybacking Code Design Framework
Piggybacking is a framework for designing distributed storage codes that are efficient in data I/O and network bandwidth consumption during node repair. The basic idea behind the framework is to take multiple instances of an existing code and add carefully designed functions of the data of one instance onto the others. You can find more details about Piggybacking in our IEEE ISIT 2013 paper.
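To make the idea concrete, here is a minimal sketch in Python. It uses plain integers as a stand-in for finite-field symbols, a toy (4, 2) MDS code, and an illustrative piggybacking function chosen for this example (not one taken from the paper):

```python
# Toy piggybacking demo: two instances ("substripes") a and b of a (4, 2)
# MDS code are stored side by side, and the symbol a1 is "piggybacked"
# onto node 4's b-symbol. Integers stand in for finite-field symbols.

def encode(x1, x2):
    """Encode one substripe with the toy (4, 2) MDS code."""
    return [x1, x2, x1 + x2, x1 + 2 * x2]

def encode_with_piggyback(a, b):
    ca, cb = encode(*a), encode(*b)
    cb[3] += a[0]                  # piggyback: add a1 onto node 4's b-symbol
    return list(zip(ca, cb))       # node i stores the pair (ca[i], cb[i])

a, b = (5, 7), (11, 13)
nodes = encode_with_piggyback(a, b)

# Repair node 1, i.e. recover (a1, b1), reading only 3 symbols instead of
# the 4 symbols a plain erasure-decoding repair would read:
b2 = nodes[1][1]                   # node 2's b-symbol is b2
b1 = nodes[2][1] - b2              # node 3's b-symbol is b1 + b2
a1 = nodes[3][1] - b1 - 2 * b2     # node 4's b-symbol is b1 + 2*b2 + a1
assert (a1, b1) == (a[0], b[0])
```

In this toy instance, repairing node 1 reads 3 symbols rather than 4, a 25% saving; the piggyback does not hurt fault tolerance, since a decoder can first recover the a-substripe, strip the piggyback, and then decode the b-substripe.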
Regenerating Code Constructions
'Regenerating Codes' are a new class of erasure codes for distributed storage systems which are optimal with respect to both storage space and network bandwidth utilization. Regenerating codes come in two flavors: (i) Minimum Storage Regenerating (MSR) codes, which minimize the repair bandwidth while storing the minimum possible amount of data per node, and (ii) Minimum Bandwidth Regenerating (MBR) codes, which minimize the repair bandwidth further by allowing a slightly higher storage per node.
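In the standard notation of the regenerating-codes literature (assumed here, since it is not introduced above): a file of size $B$ is stored across $n$ nodes, each holding $\alpha$ symbols, and a failed node is repaired by downloading $\beta$ symbols from each of $d$ helper nodes. The two operating points are then

```latex
(\alpha_{\mathrm{MSR}}, \beta_{\mathrm{MSR}})
  = \left( \frac{B}{k},\; \frac{B}{k(d-k+1)} \right),
\qquad
(\alpha_{\mathrm{MBR}}, \beta_{\mathrm{MBR}})
  = \left( \frac{2Bd}{k(2d-k+1)},\; \frac{2B}{k(2d-k+1)} \right).
```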
Here is a description of some of our regenerating code constructions.
Product-Matrix Codes
Product-Matrix codes are the most general class of regenerating code constructions available in the literature, spanning both the MBR and MSR points. These codes are the first and only explicit construction of regenerating codes that allows the number of nodes in the system to scale irrespective of the other system parameters. This attribute allows one to vary the number of nodes in the system on the fly, which is very useful, for example, in provisioning resources based on demand. Further, these codes can be implemented using existing Reed-Solomon encoders and decoders as building blocks.
Under these codes, each node is associated with an encoding vector. The source data is assembled in the form of a matrix, and each node stores the projection of this matrix along its encoding vector. To recover the data stored in a failed node, each helper node passes a projection of the data it stores along the direction of the encoding vector of the failed node.
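The repair procedure just described can be sketched numerically. The following toy illustration works over the rationals for readability (real constructions operate over a finite field, and the actual MBR message matrix has additional structure beyond mere symmetry); the Vandermonde encoding vectors are an assumption made for concreteness:

```python
from fractions import Fraction

d, n = 3, 5
# Encoding vector of node i: psi_i = (1, t_i, t_i^2) with distinct t_i.
psi = [[Fraction(t) ** j for j in range(d)] for t in range(1, n + 1)]
# Symmetric message matrix (symmetry is what makes single-symbol repair work).
M = [[Fraction(v) for v in row]
     for row in ([1, 2, 3], [2, 5, 7], [3, 7, 11])]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def solve(A, b):
    """Gauss-Jordan elimination for a small invertible system over Fractions."""
    m = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(len(m)):
        p = next(r for r in range(c, len(m)) if m[r][c] != 0)
        m[c], m[p] = m[p], m[c]
        m[c] = [v / m[c][c] for v in m[c]]
        for r in range(len(m)):
            if r != c and m[r][c] != 0:
                m[r] = [v - m[r][c] * w for v, w in zip(m[r], m[c])]
    return [row[-1] for row in m]

# Node i stores the projection psi_i^T M (computed as M psi_i by symmetry).
stored = [matvec(M, p) for p in psi]

# Repair node f: each of d helpers sends ONE symbol, the projection of its
# own stored data along psi_f, i.e. psi_j^T M psi_f.
f, helpers = 0, [1, 2, 3]
received = [dot(stored[j], psi[f]) for j in helpers]
# Stacking the helpers' encoding vectors gives a linear system in M psi_f.
recovered = solve([psi[j] for j in helpers], received)
assert recovered == stored[f]
```

The key step is that the helpers' single symbols stack into an invertible linear system whose unknown is exactly the failed node's data, so only d symbols cross the network instead of d full node contents.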
We have also optimized these codes with respect to other aspects of distributed storage systems, such as security, error correction, and scaling via ratelessness.


Repair-by-Transfer Codes
These are MBR codes which perform repair-by-transfer: the data stored in any node can be recovered by mere transfer of data from other nodes, without requiring any computation. This minimizes disk I/O, since each node reads only the data that it transfers, and also permits the use of "dumb" nodes. The animated video above presents the working of these codes.
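A minimal sketch of the repair-by-transfer idea, assuming the well-known complete-graph construction on n = 5 nodes (one coded symbol per edge of K5, each node storing the symbols of its incident edges) and using placeholder integers in place of actual coded fragments:

```python
from itertools import combinations

# Toy repair-by-transfer code on n = 5 nodes (d = 4): one symbol per edge
# of the complete graph K5; node i stores the symbols of edges incident
# to i (4 symbols per node). Integers stand in for coded fragments.
n = 5
edges = list(combinations(range(n), 2))               # the 10 edges of K5
symbols = {e: 100 + i for i, e in enumerate(edges)}   # placeholder data

store = {i: {e: symbols[e] for e in edges if i in e} for i in range(n)}

# Repair node f: every surviving node j transfers the single symbol it
# shares with f. No arithmetic is performed anywhere.
f = 2
repaired = {}
for j in range(n):
    if j == f:
        continue
    shared = (min(j, f), max(j, f))
    repaired[shared] = store[j][shared]               # pure transfer

assert repaired == store[f]
```

Because every helper merely reads and forwards one stored symbol, disk reads equal network transfers, and the helpers need no computational capability at all.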


References:
 "Explicit Construction of Optimal Exact Regenerating Codes for Distributed Storage", K. V. Rashmi, N. B. Shah, P. V. Kumar and K. Ramchandran, Proc. Allerton Conf., Sep. 2009.
 "Distributed Storage Codes with Repair-by-Transfer and Nonachievability of Interior Points on the Storage-Bandwidth Tradeoff", N. B. Shah, K. V. Rashmi, P. V. Kumar and K. Ramchandran, IEEE Transactions on Information Theory, March 2012.

Interference Alignment based MISER Codes
Interference Alignment is a concept that was proposed in the field of wireless communication to efficiently handle multiple interfering communications. We show that this concept necessarily arises during repair in MSR codes. Using these insights, we construct codes operating at the MSR point.

References:
 "Explicit Codes Minimizing Repair Bandwidth for Distributed Storage", N. B. Shah, K. V. Rashmi, P. V. Kumar and K. Ramchandran, Proc. IEEE Information Theory Workshop (ITW), Jan. 2010.
 "Interference Alignment in Regenerating Codes for Distributed Storage: Necessity and Code Constructions", N. B. Shah, K. V. Rashmi, P. V. Kumar and K. Ramchandran, IEEE Transactions on Information Theory, April 2012.

Twin Codes
This is a simple yet powerful framework which allows one to use any erasure code and still remain bandwidth-efficient. Under this framework, nodes are partitioned into two types, and the data is encoded using two (possibly different) codes. The encoding is performed in a manner such that the problem of repairing nodes of one type is reduced to that of erasure-decoding the code employed by the other type. One can then choose the constituent codes based on the properties required; for instance, employing LDPC codes allows for low-complexity decoding algorithms.
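A toy sketch of this reduction, assuming both constituent codes are small (4, 2) MDS codes over the rationals; the encoding vectors and parameters here are illustrative choices, not taken from the papers:

```python
from fractions import Fraction

# Toy Twin-code instance: a k x k message matrix M, with Type-0 nodes
# encoded by code C0 (vectors a_i) and Type-1 nodes by code C1 (vectors b_j).
k, n = 2, 4
M = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
a = [[Fraction(1), Fraction(t)] for t in range(n)]        # C0 encoding vectors
b = [[Fraction(1), Fraction(t + 5)] for t in range(n)]    # C1 encoding vectors

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

Mt = [list(col) for col in zip(*M)]                       # transpose of M
store0 = [matvec(M, ai) for ai in a]                      # Type-0 node i: M a_i
store1 = [matvec(Mt, bj) for bj in b]                     # Type-1 node j: M^T b_j

# Repair Type-1 node f: any k Type-0 helpers each send ONE symbol, the
# projection of their data along b_f. The symbol a_i^T (M^T b_f) is the i-th
# symbol of the C0-encoding of the vector M^T b_f, so repairing node f is
# exactly erasure-decoding C0 (here, a 2x2 system solved by Cramer's rule).
f, helpers = 1, [0, 2]
received = [dot(store0[i], b[f]) for i in helpers]
A = [a[i] for i in helpers]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
x0 = (received[0] * A[1][1] - A[0][1] * received[1]) / det
x1 = (A[0][0] * received[1] - received[0] * A[1][0]) / det
assert [x0, x1] == store1[f]
```

The design choice is visible in the last few lines: the helpers never run a repair-specific algorithm, they only project their data, and the newcomer runs an off-the-shelf erasure decoder for the other type's code.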

Nonachievability Results
It has been shown that there is a fundamental tradeoff between the two resources, storage space and network bandwidth, and that MSR and MBR are the two extreme points of this tradeoff. While the codes described above operate at these extreme points, we have also shown that no exact-repair codes can achieve the interior points on this tradeoff.
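In the standard notation of the regenerating-codes literature (file size $B$, per-node storage $\alpha$, $\beta$ symbols downloaded from each of $d$ helpers; this notation is assumed here rather than introduced above), the tradeoff is captured by the cut-set bound

```latex
B \;\le\; \sum_{i=0}^{k-1} \min\{\alpha,\,(d-i)\beta\}.
```

MSR and MBR are the two endpoints of the resulting curve, obtained by minimizing $\alpha$ first and $\beta$ first, respectively; the nonachievability result concerns the interior points of this curve under exact repair.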

References:
Master's Thesis
