Electrical Engineering and Computer Sciences

Nye's Trie and Floret Estimators: Techniques for Detecting and Repairing Divergence in the SCADS Distributed Storage Toolkit

Jesse Trutna

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2010-30
March 18, 2010

http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-30.pdf

We present two novel data structures developed in the SCADS distributed storage toolkit for synchronizing replicated datasets with predictable performance: Nye's trie is a lightweight index for ordered key-value sets that supports synchronization with time and bandwidth utilization proportional to the number of diverging entries. While efficient, this process is only predictable if the number of divergent entries can be measured. For this, we introduce the floret estimator, a novel sublinear-space set summarization structure used to estimate the cardinalities of set difference, union, and intersection operations. We describe how these structures satisfy the design requirements of the SCADS system, detail their design and implementation, and present a set of microbenchmarks demonstrating their functionality.


BibTeX citation:

@techreport{Trutna:EECS-2010-30,
    Author = {Trutna, Jesse},
    Editor = {Patterson, David A. and Fox, Armando},
    Title = {Nye's Trie and Floret Estimators: Techniques for Detecting and Repairing Divergence in the SCADS Distributed Storage Toolkit},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2010},
    Month = {Mar},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-30.html},
    Number = {UCB/EECS-2010-30},
    Abstract = {We present two novel data structures developed in the SCADS distributed storage toolkit for synchronizing replicated datasets with predictable performance: Nye's trie is a lightweight index for ordered key-value sets that supports synchronization with time and bandwidth utilization proportional to the number of diverging entries. While efficient, this process is only predictable if the number of divergent entries can be measured. For this, we introduce the floret estimator, a novel sublinear-space set summarization structure used to estimate the cardinalities of set difference, union, and intersection operations. We describe how these structures satisfy the design requirements of the SCADS system, detail their design and implementation, and present a set of microbenchmarks demonstrating their functionality.}
}

EndNote citation:

%0 Report
%A Trutna, Jesse
%E Patterson, David A.
%E Fox, Armando
%T Nye's Trie and Floret Estimators: Techniques for Detecting and Repairing Divergence in the SCADS Distributed Storage Toolkit
%I EECS Department, University of California, Berkeley
%D 2010
%8 March 18
%@ UCB/EECS-2010-30
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-30.html
%F Trutna:EECS-2010-30