Horizontal Data Partitioning

Amol Deshpande
(Professor Joseph M. Hellerstein)
IBM Research Fellowship

Current database systems treat database relations as a whole when executing queries over them. For example, for executing the declaratively specified query R JOIN S JOIN T, a typical database system considers only two execution plans: (1) join R with S, and join the result with T, and (2) join S with T, and join the result with R. But in many cases, it might be significantly cheaper to partition the relations involved into multiple parts, and to use different execution plans for different partitions of the relations. For example, for the above example query, partitioning S into two partitions S1 and S2, and executing different plans for the two subqueries R JOIN S1 JOIN T, and R JOIN S2 JOIN T, may lead to faster execution times, as well as lower memory footprint.

Currently, we are trying to define the problem of horizontal data partitioning formally, and trying to develop exhaustive algorithms for finding the optimal partitioning of the database relations that minimize the number of intermediate tuples generated. Such an exhaustive algorithm will probably be infeasible in practice, and we are planning to develop heuristic algorithms to find good partitioning (using summary information such as histograms present on the relations).

More information (http://www.cs.berkeley.edu/~amol) or

Send mail to the author : (amol@eecs.berkeley.edu)

Edit this abstract