The XSet XML Search Engine and XBench XML Query Benchmark

Ben Yanbin Zhao and Anthony Joseph

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-00-1112
September 2000

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2000/CSD-00-1112.pdf

Internet-scale distributed applications (such as wide-area service and device discovery and location, user preference management, and the Domain Name System) impose interesting requirements on information storage, management, and retrieval. They maintain structured soft state and issue numerous queries against it. These applications typically require a custom, proprietary query engine that is often poorly optimized for performance and costly in resources. Alternatives include traditional databases, which can hamper flexibility and extensibility (both critical requirements for Internet-scale applications), and LDAP (Lightweight Directory Access Protocol), which poses composability problems and imposes a rigid structure on queries. This paper proposes a different approach: using the Extensible Markup Language (XML) as a data storage language, together with a main-memory database and search engine. Using XML lets applications adopt dynamic, simple, and flexible data schemas and perform simpler but faster queries. The approach yields a single, common data management platform, XSet. XSet is an easy-to-use, main-memory, hierarchically structured database with partial ACID properties. Preliminary measurements show that XSet performs well: insertion time is a small constant, and query time grows logarithmically with dataset size. A portable Java-based version of XSet is available for download, both as a standalone application and as a component of the Ninja service infrastructure.
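
The abstract does not describe XSet's internal index structures, so the following is only a hypothetical illustration of the general idea of a main-memory search engine over hierarchically structured XML, not XSet's implementation or API. It assumes each leaf element's text is indexed under its full tag path (e.g. `service/name`), so that a query is a conjunction of path/value constraints answered by intersecting sets of document ids. The class `TagPathIndex` and its methods `insert` and `query` are invented for this sketch. Hash maps give roughly constant-cost insertion per indexed leaf; the sketch makes no attempt to reproduce the performance figures reported for XSet.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of an in-memory tag-path index over XML documents (not the XSet API). */
public class TagPathIndex {
    // tag path (e.g. "service/name") -> leaf text value -> ids of documents containing that value
    private final Map<String, Map<String, Set<Integer>>> index = new HashMap<>();
    // documents are kept in main memory; a document's id is its position in this list
    private final List<String> documents = new ArrayList<>();

    /** Parses an XML document, stores it, and indexes every leaf element under its tag path. */
    public int insert(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Element root = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))).getDocumentElement();
        int id = documents.size();
        documents.add(xml);
        indexElement(root, root.getTagName(), id);
        return id;
    }

    // Recursively walks child elements; elements with no child elements are treated as leaves.
    private void indexElement(Element element, String path, int id) {
        NodeList children = element.getChildNodes();
        boolean hasChildElement = false;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                Element c = (Element) child;
                indexElement(c, path + "/" + c.getTagName(), id);
            }
        }
        if (!hasChildElement) {
            index.computeIfAbsent(path, p -> new HashMap<>())
                 .computeIfAbsent(element.getTextContent().trim(), v -> new HashSet<>())
                 .add(id);
        }
    }

    /** Returns ids of documents matching every (path, value) constraint, in ascending order. */
    public SortedSet<Integer> query(Map<String, String> constraints) {
        SortedSet<Integer> result = null;
        for (Map.Entry<String, String> c : constraints.entrySet()) {
            Set<Integer> ids = index.getOrDefault(c.getKey(), Map.of())
                                    .getOrDefault(c.getValue(), Set.of());
            if (result == null) {
                result = new TreeSet<>(ids);   // first constraint seeds the candidate set
            } else {
                result.retainAll(ids);         // later constraints narrow it by intersection
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) throws Exception {
        TagPathIndex idx = new TagPathIndex();
        idx.insert("<service><name>printer</name><location><room>415</room></location></service>");
        idx.insert("<service><name>printer</name><location><room>510</room></location></service>");
        idx.insert("<service><name>scanner</name><location><room>415</room></location></service>");
        System.out.println(idx.query(Map.of("service/name", "printer")));  // prints [0, 1]
        System.out.println(idx.query(
                Map.of("service/name", "printer", "service/location/room", "415")));  // prints [0]
    }
}
```

Keying on full tag paths rather than bare tag names keeps `service/name` distinct from, say, a hypothetical `user/name`, which is one simple way to exploit the hierarchical structure of XML data without a fixed schema.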


BibTeX citation:

@techreport{Zhao:CSD-00-1112,
    Author = {Zhao, Ben Yanbin and Joseph, Anthony},
    Title = {The XSet XML Search Engine and XBench XML Query Benchmark},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2000},
    Month = {Sep},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2000/5795.html},
    Number = {UCB/CSD-00-1112},
}

EndNote citation:

%0 Report
%A Zhao, Ben Yanbin
%A Joseph, Anthony
%T The XSet XML Search Engine and XBench XML Query Benchmark
%I EECS Department, University of California, Berkeley
%D 2000
%@ UCB/CSD-00-1112
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2000/5795.html
%F Zhao:CSD-00-1112