This project examines the feasibility of peer-to-peer full-text keyword search of the Web. Two classes of keyword search techniques are in use or have been proposed: flooding of queries over an overlay network (as in Gnutella), and intersection of index lists stored in a distributed hash table (DHT). A simple model of capacity and workload suggests that the Internet does not have enough capacity to make naive use of either of these techniques attractive for Web search. We then present a number of existing and novel optimizations, such as caching and precomputation, bloom filters, gap compression and adaptive set intersections  with clustering. We estimate their effect on performance, and conclude that in combination these optimizations would bring the problem to within an order of magnitude of feasibility. To achieve the last order of magnitude, we propose using Fagin's algorithm (FA)  in conjunction with a ranking function to generate incremental results. This may allow savings in communication if users terminate their queries early.
We intend to build a DHT-based search engine, and compare its performance characteristics with Gnutella and KaZaA.