Interweave: A Fine Grain File Sharing Utility

Ben Y. Zhao and Benjamin Poon¹
(Professors Anthony D. Joseph and John D. Kubiatowicz)
(NSF) ANI-9985250, (NSF) ITR CCR-0085899, and (UC MICRO) 00-049

A few years ago, file sharing utilities were the first applications to explore the power of peer-to-peer computing. Protocols such as Napster, Gnutella, Freenet, and KaZaa gave users an easy way to share popular files, documents, and other data. These first-generation protocols had significant limitations. Freenet and Gnutella could not guarantee that a search would find a file even when it existed in the network. Napster's scalability was limited by its reliance on centralized servers, and Gnutella's by its use of broadcast query messages. All of these protocols relied on a document's popularity (and the resulting large number of replicas) to make it easy to find via search.

The second generation of peer-to-peer systems, including Tapestry [1,2], Chord [3], content-addressable networks (CAN) [4], and Pastry [5], provides reliable decentralized object location and routing (DOLR) while keeping per-node state logarithmic in the network size. These systems are designed to scale, and in the absence of failures they guarantee that a query finds the desired object if it exists anywhere in the network.
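To make "logarithmic state" concrete, the sketch below does the back-of-the-envelope arithmetic for digit-by-digit routing of the kind Tapestry and Pastry use. It assumes node IDs are strings of base-b digits, that each hop resolves one more digit of the destination ID, and that each routing-table level holds one entry per digit value; it ignores the backup neighbors real implementations keep. The function name `prefix_routing_cost` is hypothetical.

```python
from typing import Tuple


def prefix_routing_cost(num_nodes: int, base: int) -> Tuple[int, int]:
    """Estimate hops and routing-table entries for digit-by-digit routing.

    Assumes IDs are base-`base` digit strings and each hop matches one
    more digit of the destination, so the number of levels is the number
    of digits needed to distinguish `num_nodes` nodes, i.e. ceil(log_base N).
    """
    # Count levels with integer arithmetic to avoid floating-point log errors.
    levels, capacity = 0, 1
    while capacity < num_nodes:
        capacity *= base
        levels += 1
    hops = levels                  # at most one hop per resolved digit
    table_entries = levels * base  # one row per level, one column per digit
    return hops, table_entries


if __name__ == "__main__":
    for b in (4, 16):
        hops, entries = prefix_routing_cost(10**6, b)
        print(f"base {b}: ~{hops} hops, ~{entries} routing entries for 10^6 nodes")
```

With a million nodes and hexadecimal digits, this gives about 5 hops and 80 routing entries per node, which is why per-node state stays small even for very large networks.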

To exploit this deterministic file location property, we have designed and implemented Interweave, a fully decentralized P2P file location utility built on top of Tapestry. Interweave lets users advertise local files by publishing, through Tapestry's object location layer, IDs generated from user-specified keywords. Users can perform multi-field searches on multiple keywords or filenames and further prune the results by file size and last-modified date. Most importantly, search results are precise: a document can be found even if only a single copy exists anywhere in a global network. Finally, Interweave uses Tapestry's network locality to restrict results according to the network distance between the client and each replica.
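The advertise/search flow described above can be sketched as follows. This is a minimal, single-process illustration under stated assumptions, not Interweave's implementation: the names `Replica`, `keyword_id`, `InMemoryLocator`, `advertise`, and `search` are hypothetical; IDs are derived with SHA-1 over the lowercased keyword (the abstract does not specify Interweave's hashing); a local dictionary stands in for Tapestry's distributed publish/locate; and a `distance` callback stands in for the network-distance information Tapestry's locality provides.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Replica:
    host: str       # node holding this copy (hypothetical field)
    filename: str
    size: int       # bytes
    modified: int   # last-modified time, Unix seconds


def keyword_id(keyword: str) -> str:
    """Map a keyword to a 160-bit hex ID (SHA-1 is an assumption)."""
    return hashlib.sha1(keyword.strip().lower().encode("utf-8")).hexdigest()


class InMemoryLocator:
    """Hypothetical local stand-in for Tapestry's publish/locate layer."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[Replica]] = {}

    def publish(self, object_id: str, replica: Replica) -> None:
        self._index.setdefault(object_id, set()).add(replica)

    def locate(self, object_id: str) -> Set[Replica]:
        return set(self._index.get(object_id, set()))


def advertise(locator: InMemoryLocator, replica: Replica,
              keywords: Iterable[str]) -> None:
    # Publish one ID per keyword, and one for the filename itself.
    for kw in list(keywords) + [replica.filename]:
        locator.publish(keyword_id(kw), replica)


def search(locator: InMemoryLocator, keywords: List[str], *,
           distance: Callable[[str], float],
           min_size: int = 0, max_size: Optional[int] = None,
           modified_after: int = 0,
           max_distance: Optional[float] = None) -> List[Replica]:
    """Return replicas matching every keyword, pruned and sorted by distance."""
    per_keyword = [locator.locate(keyword_id(kw)) for kw in keywords]
    if not per_keyword:
        return []
    # Multi-field search: a replica must match all keywords.
    matches = set.intersection(*per_keyword)
    results = [
        r for r in matches
        if r.size >= min_size
        and (max_size is None or r.size <= max_size)
        and r.modified >= modified_after
        and (max_distance is None or distance(r.host) <= max_distance)
    ]
    # Closest replicas first.
    return sorted(results, key=lambda r: distance(r.host))
```

In this sketch, publishing one ID per keyword means a query costs one location lookup per keyword, and intersecting the per-keyword result sets yields the multi-field semantics; size, date, and distance pruning are then applied to the merged results.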

The Interweave design has no points of centralization and no inherent scalability limitations, which makes it suitable for deployment to a large, global user base. The current implementation is feature-complete and interoperates with the existing Tapestry Java implementation.

Current work focuses on improving the usability of the Interweave client and on performance optimizations that reduce the number of results returned for popular requests, along with the associated bandwidth usage.

[1] B. Y. Zhao, J. D. Kubiatowicz, and A. D. Joseph, "Tapestry: An Infrastructure for Fault-Tolerant Wide-Area Location and Routing," UC Berkeley Computer Science Division, Report No. UCB/CSD 01/1141, April 2001.
[2] K. Hildrum, J. D. Kubiatowicz, S. Rao, and B. Y. Zhao, "Distributed Object Location in a Dynamic Network," Proc. ACM Symp. Parallel Algorithms and Architectures, Winnipeg, Canada, August 2002.
[3] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications," Proc. SIGCOMM, San Diego, CA, August 2001.
[4] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, "A Scalable Content-Addressable Network," Proc. SIGCOMM, San Diego, CA, August 2001.
[5] A. Rowstron and P. Druschel, "Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems," Proc. IFIP/ACM Middleware, Heidelberg, Germany, November 2001.
¹ Undergraduate (EECS)

More information: http://www.cs.berkeley.edu/~ravenben/tapestry

Contact: ravenben@eecs.berkeley.edu

