Scale-Independent
Query Processing
with PQL
Michael Armbrust, Kristal Curtis, Tim Kraska, Armando Fox,
Michael J. Franklin, David A. Patterson
http://www.cs.berkeley.edu/~marmbrus/

Motivation

Interactive Web Applications

Applications Grow Rapidly

Success Disaster

Databases Often the Culprit

Easy to add stateless application servers. [1]

Harder to scale stateful storage systems. [2]

[1] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. 2010. A view of cloud computing. Commun. ACM 53, 4 (April 2010), 50-58. DOI=10.1145/1721654.1721672 http://doi.acm.org/10.1145/1721654.1721672 online
[2] John Adams. "Billions of Hits: Scaling Twitter" Chirp 2010. online

Traditional Relational Databases

Data Independence

Dirty Secret: Declarative languages hide complexity.

Traditional Relational Databases

Minimum Average Cost Optimization

Dirty Secret: Fails for the 0.1%! http://www.flickr.com/photos/stephen_downes/6246522381/

NoSQL 'Solution'

PIQL Solution

Guarantee performance for sophisticated relational queries through Scale Independence

Architecture Overview

PIQL uses a distributed key/value store [1] for:

[1] Michael Armbrust, Armando Fox, David Patterson, Nick Lanham, Haruki Oh, Beth Trushkowsky, and J. Trutna. SCADS: Scale-independent Storage for Social Computing Applications. In Conference on Innovative Data Systems Research (CIDR), 2009. online
[2] Beth Trushkowsky, Peter Bodík, Armando Fox, Michael J. Franklin, Michael I. Jordan, and David A. Patterson. 2011. The SCADS director: scaling a distributed storage system under stringent performance requirements. In Proceedings of FAST'11 online

Outline

Scale-Independent Optimization

Bound work required to answer all queries [1]

Given the set of all queries in an application $Q$. Let $Exec(q_i)$ denote the number of operations performed in the worst case by a query and $c_{ops}^{q_i}$ be a constant.

$$\forall q_i \in Q \exists c_{ops}^{q_i} : Exec(q_i) < c_{ops}^{q_i}$$
[1] M. Armbrust, K. Curtis, T. Kraska, A. Fox, M. J. Franklin, and D. A. Patterson. Piql: Success-tolerant query processing in the cloud. PVLDB, 5(3):181–192, 2011. online

Mapping Queries to K/V Ops

Example

SELECT *
FROM Posts
WHERE topicId = <id>
ORDER BY timestamp
LIMIT 10
$c_{ops}=1$ GetRange

Mapping Queries to Key/Value Ops

Index Scan

$c_{ops}=1$ GetRange

Mapping Queries to Key/Value Ops

Foreign Key Join

$c_{ops}=|ChildPlan|$ Gets

Mapping Queries to Key/Value Ops

(Sorted) Index Join

$c_{ops}=|Child Plan|$ GetRanges

Handling Harder Queries

Ex: Timeline queries

SELECT p.text
FROM Post p, Subscriptions s
WHERE s.owner = <currentUser> AND
      s.topicId = p.topicId AND
      s.approved = true
ORDER BY p.timestamp

Why is this safe to run? Developers have more information than optimizer.

Language Extensions

  • Bounding Data Returned

    PAGINATE turns an unbounded query into an unbounded set of bounded queries

Language Extensions

  • Bounding Data Returned

    PAGINATE turns an unbounded query into an unbounded set of bounded queries

  • Bounding Intermediate Results

    Cardinality constraints in the schema allows developers to inform the optimizer of natural relationship limits.

Bound Intermediate and Final Results

SLO Compliance Prediction

Convolve empirical distributions for each operator to predict overall latency.

Measure Single-Op
Performance

Predict Query SLO
Compliance

Performance Insight Assistant

Helps developers choose acceptable cardinalities

$ Q_{Timeline} = $
$ \quad \Theta_{IndexScan}( SubscrCard, SubscrSize) \textrm{ } * $
$ \quad \Theta_{SortedIndexJoin}( SubscrCard, PostCard, PostSize) $

Evaluation

TPC-W Scale Experiment

Throughput

Linear scaling as application grows.

Latency

Constant response time as application grows.

Future Work

Conclusion

By adding scale independence to the relational model, PIQL makes it easier to build web application that work even as they explode in popularity.

Techniques:

Learn More

http://www.cs.berkeley.edu/~marmbrus/

Questions?

/

#