The BayesStore Project (BayesStore)
Eirinaios C. Michelakis, Daisy Zhe Wang, Michael Franklin, Minos Garofalakis and Joseph M. Hellerstein
Several real-world applications need to effectively manage and reason about large amounts of data that are inherently uncertain. For instance, pervasive computing applications must constantly reason about volumes of noisy sensory readings to accomplish tasks like motion prediction and modeling of human behavior. Another example arises in information extraction systems that try to convert textual information from multiple sources into structured data. In information extraction, it is important to record probabilities from the extracted and integrated data, because all the missing data and errors occur in the extraction and integration pipeline.
Such probabilistic data analysis requires sophisticated machine-learning tools that can effectively model the complex correlation patterns present in uncertain data. Unfortunately, to date, most existing approaches to probabilistic database systems have relied on somewhat simplistic models of uncertainty that can be easily mapped onto existing relational architectures. In those works, probabilistic information is typically associated with individual data tuples, with limited or no support for effectively capturing and reasoning about complex data correlations.
BAYESSTORE is a novel probabilistic data management architecture built on the principle of handling statistical models and probabilistic inference tools as first-class citizens of the database system. Adopting a machine-learning view, BAYESSTORE employs concise statistical relational models to effectively encode the correlation patterns between uncertain data, and promotes probabilistic inference and statistical model manipulation as part of the standard DBMS operator repertoire to support efficient and sound query processing.
We present BAYESSTORE's uncertainty model based on a novel, first-order statistical model, we redefine traditional query processing operators, and we define inference query operators to manipulate the data and the probabilistic models of the database in an efficient manner.