# Nye's Trie and Floret Estimators: Techniques for Detecting and Repairing Divergence in the SCADS Distributed Storage Toolkit

### Jesse Trutna

EECS Department

University of California, Berkeley

Technical Report No. UCB/EECS-2010-30

March 18, 2010

http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-30.pdf

We present two novel data structures, developed in the SCADS distributed storage toolkit, for synchronizing replicated datasets with predictable performance. Nye's trie is a lightweight index for ordered key-value sets that supports synchronization with time and bandwidth utilization proportional to the number of divergent entries. While efficient, this synchronization is predictable only if the number of divergent entries can be measured. To measure it, we introduce the floret estimator, a novel sublinear-space set summarization structure used to estimate the cardinalities of set difference, union, and intersection operations. We describe how these structures satisfy the design requirements of the SCADS system, detail their design and implementation, and present a set of microbenchmarks demonstrating their functionality.
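The abstract does not describe the floret estimator's internals. Purely for comparison, the sketch below uses a different, well-known technique, the k-minimum-values (KMV) sketch, to illustrate the general idea of a fixed-size summary from which the cardinalities of set union, intersection, and difference can be estimated without shipping the sets themselves. The names `KMVSketch` and `estimate`, the SHA-1-based 64-bit hash, and `k=1024` are hypothetical choices for illustration, not SCADS code.

```python
import hashlib
import heapq

# Illustrative KMV sketch (NOT the floret estimator): hypothetical names throughout.
HASH_SPACE = 2 ** 64  # hashes are treated as uniform over [0, 2^64)


def _hash64(key: str) -> int:
    """Map a key to a pseudo-uniform 64-bit integer (first 8 bytes of SHA-1)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class KMVSketch:
    """Retains only the k smallest distinct key hashes, so space is O(k)."""

    def __init__(self, k: int = 1024):
        self.k = k
        self._heap = []        # max-heap (negated values) of the k smallest hashes
        self._members = set()  # the same hashes, for O(1) membership tests

    def add(self, key: str) -> None:
        h = _hash64(key)
        if h in self._members:
            return  # duplicate key: sketch is unchanged
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Evict the largest retained hash in favour of the smaller new one.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)


def estimate(a: KMVSketch, b: KMVSketch) -> dict:
    """Estimate |A | B|, |A & B| and |A - B| from two sketches."""
    k = min(a.k, b.k)
    union_mins = sorted(a._members | b._members)[:k]
    n = len(union_mins)
    if n == 0:
        return {"union": 0.0, "intersection": 0.0, "difference": 0.0}
    # Any hash among the union's k smallest that belongs to A is also among
    # A's k smallest, so membership in A's sketch is decided exactly.
    in_both = sum(1 for h in union_mins if h in a._members and h in b._members)
    a_only = sum(1 for h in union_mins if h in a._members and h not in b._members)
    if n < k:
        union = float(n)  # neither sketch is full: counts are exact
    else:
        union = (k - 1) * HASH_SPACE / union_mins[-1]  # standard KMV estimator
    return {
        "union": union,
        "intersection": union * in_both / n,
        "difference": union * a_only / n,
    }
```

Under these assumptions, two replicas could exchange only their sketches and estimate `estimate(a, b)["difference"]` before running a repair, which is the kind of divergence measurement the abstract says is needed for predictable synchronization. The relative error of the union estimate shrinks roughly as 1/sqrt(k); estimates of a difference that is a small fraction of the union are noisier.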

BibTeX citation:

@techreport{Trutna:EECS-2010-30, Author = {Trutna, Jesse}, Editor = {Patterson, David A. and Fox, Armando}, Title = {Nye's Trie and Floret Estimators: Techniques for Detecting and Repairing Divergence in the SCADS Distributed Storage Toolkit}, Institution = {EECS Department, University of California, Berkeley}, Year = {2010}, Month = {Mar}, URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-30.html}, Number = {UCB/EECS-2010-30}, Abstract = {We present two novel data structures developed in the SCADS distributed storage toolkit for synchronizing replicated datasets with predictable performance: Nye's trie is a lightweight index for ordered key-value sets that supports synchronization with time and bandwidth utilization proportional to the number of diverging entries. While efficient, this process is only predictable if the number of divergent entries can be measured. For this, we introduce the floret estimator, a novel sublinear-space set summarization structure used to estimate the cardinalities of set difference, union, and intersection operations. We describe how these structures satisfy the design requirements of the SCADS system, detail their design and implementation, and present a set of microbenchmarks demonstrating their functionality.} }

EndNote citation:

%0 Report %A Trutna, Jesse %E Patterson, David A. %E Fox, Armando %T Nye's Trie and Floret Estimators: Techniques for Detecting and Repairing Divergence in the SCADS Distributed Storage Toolkit %I EECS Department, University of California, Berkeley %D 2010 %8 March 18 %@ UCB/EECS-2010-30 %U http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-30.html %F Trutna:EECS-2010-30