Scale-Independent
Query Processing
with P
QL
Michael Armbrust, Kristal Curtis, Tim Kraska,
Michael Franklin, Armando Fox, David Patterson
Motivation
Target: Applications with SLOs for query execution time. (e.g. facebook, twitter, gmail, salesforce.com, digg)
-
Response time is crucial
Delays as small as 200ms affect user behavior [1]
-
Extremely rapid growth
# of tweets per day growing exponentially since 2007
-
Declarative queries hide underlying complexity
Possibly expensive queries expressed as simple declarative statements. Query plans change as statistics change leading to unexpected performance.
[1] E. Schurman and J. Brutlag. Performance Related Changes and their
User Impact. Presented at Velocity Web Performance and Operations
Conference, June 2009
The NoSQL 'Solution'
SQL databases are fundamentally non-scalable, and there is no magical pixie dust that we, or anyone, can sprinkle on them to suddenly make them scale.
Adam Wiggins - Heroku
Separate NoSQL Systems
-
Imperative Code Querying a Denormalized Key/Value Store
- Pros: K/V stores scale trivially, expensive queries are obvious.
- Cons: No optimization or data-independence.
-
Batch Jobs Written in Map/Reduce
- Pros: Process ever growing amounts of data in constant time by adding machines
- Cons: No optimization or data-independence. High latency for updates compared to incremental approaches [1].
[1] “Large-scale Incremental Processing Using Distributed Transactions and Notifications”, Daniel Peng, Frank Dabek, Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation, 2010.
Overview: Scale Independence
A scale independent system guarantees consistent performance for any sized database.
Our techniques:
- Scale independent optimization (to appear in VLDB'12) [1]
- Materialized view selection and incremental maintenance (on-going work)
[1] Michael Armbrust, Kristal Curtis, Tim Kraska, Armando Fox, Michael J. Franklin, David A. Patterson: PIQL: Success-Tolerant Query Processing in the Cloud. PVLDB 5(3): 181-192 (2011)
Scale Independent Relational Architecture
The PIQL library relies on a distributed key/value store to provide:
- Scalability with predictable performance [1]
- Consistency
[1] Beth Trushkowsky, Peter Bodík, Armando Fox, Michael J. Franklin, Michael I. Jordan, and David A. Patterson. 2011. The SCADS director: scaling a distributed storage system under stringent performance requirements. In Proceedings of FAST'11
Scale Independent Optimization
-
Definition of Scale
- $|r|$ - the size of all base relations
-
Definition of Scale Independence
- read queries complete in a bounded amount of time independent of $|r|$.
-
Scale Invariants
- for every query the optimizer has can statically calculate an upper bound on the number of k/v operations independent of $|r|$.
Optimization
Select physical operators that limit total k/v store operations, creating indexes as needed.
SLO Compliance Prediction
Use empirical response-time distributions to predict query SLO compliance.
Evaluation
Workloads
- TPC-W - Standard database benchmark simulating a web store
- SCADr - Clone of twitter
Qualitative Analysis
- Introduced suggested cardinality constraints.
- PIQL automatically selected indexes.
- Prediction model accurately predicted SLO compliance.
Methodology
- Ran each benchmark on EC2, increasing both data and system size (weak scaling).
- Measured throughput and 99th percentile response time.
TPC-W Scale Experiment
Throughput scales linearly with the number of machines.
TPC-W Scale Experiment
The 99th percentile latency remains constant, independent of the size of the data and the number of machines.
Comparison with Cost-Based Optimization
SELECT * FROM SUBSCRIPTIONS
WHERE target = <target user> AND
owner IN <friends of current user>
What About Other Queries?
Some interesting queries can't be answered on-demand by a scale-independent relational system:
- How many followers does Oprah have?
- Show me the top 5 pictures that are tagged 'berkeley' and 'amp lab'.
- How many tweets have I tweeted?
Solution: Incrementally Maintained Materialized Views and Stream Processing
Scale Independent View Maintenance
-
Definition of Scale
- $|r|$ - the size of all base relations
- $\Delta r$ - the update rate for all base relations
- $q_{rate}$ - the rate of read queries in the system
-
Definition of Scale Independence
- Read queries complete in a bounded amount of time.
- Updates to base relations will be visible in all indexes and views in a bounded amount of time.
-
Scale Invariants
- Static upper bound on the number of k/v operations independent of $|r|$.
- Total storage must be smaller than $c_s |r|$.
- Total computation must be smaller than $c_c(\Delta r + q_{rate})$.
- Secondary structures don't cause contention as $\Delta r$ increases.
Consistency and Failure Handling
Consistency Semantics
- Developer decides consistency or availability per query.
- View maintenance occurs asynchronously and requires ACID transactions.
Failure Semantics
- Update visibility guarantees contingent on a functioning system.
- During a failure, developers can specify choice of availability over consistency for maintenance.
- Errors corrected automatically when failure is resolved.
View Selection Rules
Create a view such that:
- The view answers the query for all possible run-time parameters
- The answer to the query can be calculated by scanning a contiguous section of the view.
- The view can be efficiently maintained incrementally. [1]
[1] Ceri, S. and Widom, J. (1991) Deriving Production Rules for Incremental View Maintenance. In: VLDB.
View Selection Example
How many followers does a given user have:
SELECT count(*)
FROM Subscriptions
WHERE target = <user>
The resulting view counts the followers for every user:
SELECT target, count(*)
FROM subscriptions
GROUP BY target
Reasoning about View Size
Available Information on Relation Size and Growth
- Cardinality Constraints
- Foreign Key Constraints
- Uniqueness Constraints
- Timestamp Columns
Acceptable Rates of Growth
- Constant
- Linear
- Linear * Time (with optional retention policy)
Incremental View Maintenance 101
For each update run a delta query that computes any tuples that will be added or removed from a given view
The delta query can be derived by substituting the updated tuple in the view definition and simplifying the resulting relational algebra [1].
[1] Ceri, S. and Widom, J. (1991) Deriving Production Rules for Incremental View Maintenance. In: VLDB.
Query Delta Calculation
Given a query that shows all tweets in order for a given user's subscriptions:
SELECT t.*
FROM tweets t, subscriptions s
WHERE s.owner = <user>
t.owner = s.target
s.approved = true
ORDER BY t.timestamp
PAGINATE 10
Query Delta Calculation
The selected view computes the timeline for all possible users:
SELECT s.owner, t.timestamp, t.owner
FROM tweets t, subscriptions s
WHERE s.owner = <user>
t.owner = s.target
s.approved = true
ORDER BY t.timestamp
PAGINATE 10
Query Delta Calculation
When a new tweet (@t) is inserted the delta query distributes it to all followers:
SELECT s.owner, t.timestamp, t.owner
FROM @t, subscriptions s
WHERE @t.owner = s.target
s.approved = true
ORDER BY t.timestamp
PAGINATE 10
Query Delta Calculation
When a new subscription (@s) is inserted the delta query distributes updates the users timeline with any existing tweets:
SELECT s.owner, t.timestamp, t.owner
FROM tweets t, @s
WHERE t.owner = @s.target
@s.approved = true
ORDER BY t.timestamp
PAGINATE 10
Techniques for Bounding Update Visibility Delay
- Check to see if the optimizer can find a bounded query plan.
- Optimization could result in the creation of other views / indexes.
- Bounded operations for update query plan ensures bounded propagation delay for update
- If ordering is preserved and pagination present, use lazy view maintenance.
- First page will be updated with-in SLOs
- Subsequent pages will be updated eventually
- Otherwise, attempt to parallelize the update
Techniques for Limiting Contention
Aggregations
- For bounded ranges => safe independent of aggregate function
- For unbounded ranges => must be commutative. Execution engine automatically uses partial aggregation to avoid contention.
Conclusion
By using scale independence as the objective function for optimization, PIQL guarantees consistent performance even for applications with exploding data volumes.
Techniques:
- Scale independent plan selection
- PIQL language extensions
- Automatic index / view selection
- Scalable incremental view maintenance
- SLO compliance modeling
Questions?
Bounding Data Returned
Pagination and Limit DDL
- Adding
PAGINATE [N] turns an unbounded query into one that returns N results at a time.
SELECT * FROM Tweets
WHERE owner = <user>
ORDER BY timestamp DESC
PAGINATE 10
Implemented using serializable client size cursors
Also supports LIMIT [1] clause when only the first set of results is required
[1] M. J. Carey and D. Kossmann. On saying “Enough already!” in SQL.
SIGMOD Rec., 26(2):219–230, 1997.
View Selection Example #1
Show the first 10 objects that are tagged with two given words:
SELECT t1.id
FROM tags t1, tags t2
WHERE t1.tag = <tag1> AND
t2.tag = <tag2> AND
t1.id = t2.id
PAGINATE 10
The resulting view pre materializes all pairs of tags for a given object:
SELECT t1.tag, t2.tag, t1.id
FROM tags t1, tags t2
WHERE t1.tag = <tag1> AND
t2.tag = <tag2> AND
t1.id = t2.id
PAGINATE 10
View Size Example
SELECT t1.tag, t2.tag, t1.id
FROM tags t1, tags t2
WHERE t1.id = t2.id AND
PAGINATE 10
Size without cardinality constraint: $|tags| * |tags|$
Size with constraint on tags per object $|tags| * c$
←
→
/
#