Finding Files Fast

James A. Woods

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-83-148
January 1983

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1983/CSD-83-148.pdf

A fast filename search facility for UNIX is presented. It consolidates two data compression methods with a novel string search technique to rapidly locate arbitrary files. The code, integrated into the standard find utility, consults a preprocessed database, regenerated daily. This contrasts with the usual mechanism of matching search keys against candidate items generated on-the-fly from a scattered directory structure. The pathname database is an incrementally-encoded lexicographically sorted list (sometimes referred to as a "front-compressed" file) which is also subjected to common bigram coding to effect further space reduction. The storage savings are a factor of five to six over the standard ascii representation. The list is scanned using a modified linear search specially tailored to the incremental encoding; typical "user time" required by this algorithm is 40%-50% less than with naive search.


BibTeX citation:

@techreport{Woods:CSD-83-148,
    Author = {Woods, James A.},
    Title = {Finding Files Fast},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {1983},
    Month = {Jan},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1983/5392.html},
    Number = {UCB/CSD-83-148},
    Abstract = {A fast filename search facility for UNIX is presented.  It consolidates two data compression methods with a novel string search technique to rapidly locate arbitrary files. The code, integrated into the standard find utility, consults a preprocessed database, regenerated daily. This contrasts with the usual mechanism of matching search keys against candidate items generated on-the-fly from a scattered directory structure. The pathname database is an incrementally-encoded lexicographically sorted list (sometimes referred to as a "front-compressed" file) which is also subjected to common bigram coding to effect further space reduction. The storage savings are a factor of five to six over the standard ascii representation. The list is scanned using a modified linear search specially tailored to the incremental encoding; typical "user time" required by this algorithm is 40%-50% less than with naive search.}
}

EndNote citation:

%0 Report
%A Woods, James A.
%T Finding Files Fast
%I EECS Department, University of California, Berkeley
%D 1983
%@ UCB/CSD-83-148
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1983/5392.html
%F Woods:CSD-83-148