Electrical Engineering
      and Computer Sciences

Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

Finding Files Fast

James A. Woods

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-83-148
January 1983

http://www.eecs.berkeley.edu/Pubs/TechRpts/1983/CSD-83-148.pdf

A fast filename search facility for UNIX is presented. It consolidates two data compression methods with a novel string search technique to rapidly locate arbitrary files. The code, integrated into the standard find utility, consults a preprocessed database, regenerated daily. This contrasts with the usual mechanism of matching search keys against candidate items generated on-the-fly from a scattered directory structure. The pathname database is an incrementally-encoded lexicographically sorted list (sometimes referred to as a "front-compressed" file) which is also subjected to common bigram coding to effect further space reduction. The storage savings are a factor of five to six over the standard ascii representation. The list is scanned using a modified linear search specially tailored to the incremental encoding; typical "user time" required by this algorithm is 40%-50% less than with naive search.


BibTeX citation:

@techreport{Woods:CSD-83-148,
    Author = {Woods, James A.},
    Title = {Finding Files Fast},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {1983},
    Month = {Jan},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/1983/5392.html},
    Number = {UCB/CSD-83-148},
    Abstract = {A fast filename search facility for UNIX is presented.  It consolidates two data compression methods with a novel string search technique to rapidly locate arbitrary files. The code, integrated into the standard find utility, consults a preprocessed database, regenerated daily. This contrasts with the usual mechanism of matching search keys against candidate items generated on-the-fly from a scattered directory structure. The pathname database is an incrementally-encoded lexicographically sorted list (sometimes referred to as a "front-compressed" file) which is also subjected to common bigram coding to effect further space reduction. The storage savings are a factor of five to six over the standard ascii representation. The list is scanned using a modified linear search specially tailored to the incremental encoding; typical "user time" required by this algorithm is 40%-50% less than with naive search.}
}

EndNote citation:

%0 Report
%A Woods, James A.
%T Finding Files Fast
%I EECS Department, University of California, Berkeley
%D 1983
%@ UCB/CSD-83-148
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/1983/5392.html
%F Woods:CSD-83-148