James A. Woods
EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-83-148
January 1983
http://www2.eecs.berkeley.edu/Pubs/TechRpts/1983/CSD-83-148.pdf
A fast filename search facility for UNIX is presented. It consolidates two data compression methods with a novel string search technique to rapidly locate arbitrary files. The code, integrated into the standard find utility, consults a preprocessed database, regenerated daily. This contrasts with the usual mechanism of matching search keys against candidate items generated on-the-fly from a scattered directory structure. The pathname database is an incrementally-encoded lexicographically sorted list (sometimes referred to as a "front-compressed" file) which is also subjected to common bigram coding to effect further space reduction. The storage savings are a factor of five to six over the standard ascii representation. The list is scanned using a modified linear search specially tailored to the incremental encoding; typical "user time" required by this algorithm is 40%-50% less than with naive search.
BibTeX citation:
@techreport{Woods:CSD-83-148, Author = {Woods, James A.}, Title = {Finding Files Fast}, Institution = {EECS Department, University of California, Berkeley}, Year = {1983}, Month = {Jan}, URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1983/5392.html}, Number = {UCB/CSD-83-148}, Abstract = {A fast filename search facility for UNIX is presented. It consolidates two data compression methods with a novel string search technique to rapidly locate arbitrary files. The code, integrated into the standard find utility, consults a preprocessed database, regenerated daily. This contrasts with the usual mechanism of matching search keys against candidate items generated on-the-fly from a scattered directory structure. The pathname database is an incrementally-encoded lexicographically sorted list (sometimes referred to as a "front-compressed" file) which is also subjected to common bigram coding to effect further space reduction. The storage savings are a factor of five to six over the standard ascii representation. The list is scanned using a modified linear search specially tailored to the incremental encoding; typical "user time" required by this algorithm is 40%-50% less than with naive search.} }
EndNote citation:
%0 Report %A Woods, James A. %T Finding Files Fast %I EECS Department, University of California, Berkeley %D 1983 %@ UCB/CSD-83-148 %U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1983/5392.html %F Woods:CSD-83-148