CS262B Advanced Topics in Computer Systems
Spring 2009
Paper Title: Bigtable: A Distributed Storage System for Structured Data
Author:
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A.
Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
Date: 2006
Novel Idea: The Bigtable team has provided a simple data model where
access is by row or column name to achieve Internet scale, high
availability, reliability, and performance.
Main Result(s): The authors have designed and implemented the Bigtable
service, which hosts large-scale structured data and is used extensively
by more than sixty Google products.
Impact: This technology is widely used within Google.
Evidence:
The authors provide a description of the design and implementation of
Bigtable. The main empirical evidence is Figure 6, which shows that
read/write rates scale nearly linearly with the number of tablet
servers. They also describe the Google products that make use of
Bigtable to show that it is useful.
Prior Work: Similar work that also tries to provide Internet-scale
storage includes distributed hash tables (DHTs) and DBMSs. The problem
setting for DHTs is quite different, due to high churn and untrusted
clients. DBMSs are not able to scale to the number of clients that
Bigtable supports; Bigtable's interface is lower-level, and it does not
have to support complex queries.
Competitive work: As in the Chubby paper, the Bigtable authors remark
that their service has some overlap with Boxwood but differs mostly in
that Boxwood is meant to be a layer that supports the development of
higher-level services such as file systems, while Bigtable is intended
to serve client applications.
Reproducibility: This work would be difficult to reproduce since they
understandably don't provide many details about the implementation.
Also, to do so at a large scale would present many challenges.
Question:
The authors mention that a novel aspect of this work is that the query
set supported is simple (data access is by row/column name). Could
similar results have been obtained using a standard RDBMS and
restricting the queries issued?
Criticism: In the Lessons section, the authors discuss their efforts
developing the tablet server membership protocol. Right after saying
how important it is to use simple designs, the authors say that their
first protocol did not work because it was simple. They should have
qualified their prescription of simple designs -- clearly, not just any
simple protocol will do.
Ideas for further work: I would like to deploy an RDBMS and restrict
the query set supported so that it would be roughly comparable to
Bigtable and then compare the performance and scalability of the two
systems.
Paper Title: Bigtable: A Distributed Storage System for Structured Data
Author: Chang, Dean, Ghemawat, Hsieh, Wallach, Burrows, Chandra, Fikes,
Gruber
Date: 2006
Novel Idea:
By providing a less rich interface than traditional databases, one can
give applications a distributed database
with better performance and scalability and still enough flexibility
for their tasks.
Main Result(s):
The authors' distributed database scales to hundreds of servers
with approximately 50% overhead compared to the 1-server case and is
able to provide the persistent data backend for several highly utilized
services.
Impact:
Shows the utility of developing distributed databases from a more
minimal featureset, rather than (as with the "traditional" approach)
making support for everything a commercial relational database can do a
goal.
Evidence:
The authors have deployed their distributed database as the back
end for several Google projects in production (from which they gain
anecdotes indicating its practical usefulness) and measure its
performance for random and sequential accesses using 1 to 500 nodes and
a synthetic workload.
Prior Work:
This uses the Chubby lock service for coordination and GFS for
storage. The concept of providing a distributed data store with this
sort of interface is similar to earlier work on DHTs, though the
authors were blessed with a friendlier environment.
Competitive work:
More recently: Dynamo, Cassandra [which serve a similar
application, but split up the data in a Chord-like fashion rather than
relying heavily on a distributed filesystem and a distributed lock
service for these tasks];
Traditional distributed databases [for which providing full
transaction support is usually important, and thus have more
sophisticated coordination between servers]
Reproducibility:
Hard, especially given the dependence on other services. Given an
implementation (and enough machines), the performance experiments seem
well enough described.
Question:
Is this (or what is) enough functionality on top of which to implement
a "complete" distributed database?
Criticism:
Though the authors attributed some of their less-than-ideal scaling
to their inadequate ability to do load balancing ("rebalancing is
throttled .... and the load generated ... shifts around"), it is
unclear how load balancing is actually being performed in the system or
how the authors came to the conclusion that an imbalance in load
prevented good scaling.
Ideas for further work:
- Everything SCADS is doing/wants to do. (e.g. supporting queries
involving joins on this.)
- How much would Cassandra/Dynamo/other alternatives lose/gain if
they were implemented on top of a general-purpose distributed FS?
Bigtable: A Distributed Storage System for Structured Data
Chang et al
Novel Idea: They create a large scale distributed storage system, based
on key value pairs. Their system is designed for ridiculously large
scale datacenters. While they don't mention it much, I think one of
the big hidden ideas is that the clients are exposed to locality of the
data, and thus how performance is affected by the manner of their use.
Comparatively, the traditional relational DB view hides performance
from the client (to a greater degree). Likewise, their system gets
away with a bit more since they only provide exactly what they need -
not the full blown relational DB.
Main Result: It works. Like many of the Google papers, they are really
just telling us about something they already do and practical things
they learned from a real deployment. They do show some evidence of the
scalability of the system.
Impact: For them, it allows the reuse of a key-value pair store for all
of their internal apps, instead of starting from scratch. For everyone
else, it sets the bar high.
Evidence: They have some minor benchmarks that show scanning and
sequential accesses scale well, and random reads are lousy, as
expected. They come off as not caring about convincing us it works, so
they don't really give us a lot of performance info. Some of it is
just example numbers of how large these things are.
Prior Work: Boxwood, and C-Store
Competitive Work: commercial DBs (Oracle, IBM), and later the Hadoop
project started HBase, which is their version of Bigtable.
Reproducibility: Lots of engineering. See Hadoop's HBase.
Question: I'm not too familiar (beyond scanning Wikipedia) with Bloom
filters and friends.
Criticism: It was a little boring, actually. Maybe because I read this
one second, and also read it a long time ago. And the Chubby paper was
more honest about whether or not it was "research."
Ideas for further work: I think this inspired SCADS in the RAD Lab a
little.
Paper Title: Bigtable: A Distributed Storage System for Structured Data
Author: Fay Chang et al (Google)
Date: OSDI 2006
Novel Idea:
This paper describes Bigtable, which is a
distributed storage system developed at Google.
Main Result(s):
They are able to create a system with good
scaling by limiting the interface and not providing a full relational
database.
Impact:
I think it could be important. As we create more data, having
databases that provide limited interfaces to get better scaling and
performance might be worthwhile.
Evidence:
They did a series of random and sequential writes on different
numbers of nodes and measured the performance to show the scaling.
They also claimed success by having a user base, but they are in a
corporation so their users have more pressure to use internal tools.
Prior Work:
They are related to other work on distributed
hash tables such as Chord or DDS.
Competitive work: It's really hard to say because they are
working in an internal system with a closed software stack, although
it is similar to other database systems that have masters and workers
and try to have high availability.
Reproducibility:
Definitely not easily, since they are using
other internal Google tools such as Chubby.
Question:
How do we compare these corporate designs to
those in research?
Criticism:
It's hard to see if this is a great new idea or
just an engineering solution to their Google needs.
Ideas for further work: It didn't really inspire me
in too many ways.
Paper Title: Bigtable
Author: Google
Date: OSDI 06
Novel Idea:
"Novel" is generally not the right word for an industry paper. I don't know a
lot of the related work, but I hadn't seen any other system that didn't
just do key-value storage. This is just a system only Google could
build, as it relies on GFS and Chubby.
Main Result(s):
I'm not really sure about this either. People use it? It
didn't scale spectacularly...
Impact:
Something like 60 Google web services use it. I would expect that this has
informed future systems, but I don't know any (aside from maybe SCADS)
that use a table store idiom.
Evidence:
Again, 60 services. That doesn't seem huge, but this was '06.
Prior Work:
DDS is the immediate one, though it did key/value. More
appropriate prior work would be the distributed databases.
Competitive work:
Oracle sells very expensive things for this exact purpose.
Reproducibility:
None at all. This is proprietary, built on proprietary systems
using proprietary hardware and networking.
Question:
When does the scaling end? You know Google could have run this to 50k
machines. Hell, when I was there I ran 1000 Hadoop instances before
that could scale. 500 is a pretty arbitrary end of the graph...
Criticism:
I didn't think that the paper was well written. There was little
discussion of WHY they made the decisions they did. I see that Chubby
provides certain properties, but it seems like you're stacking a lot of
latency and variance. Why the B-tree? Why not a DHT, or for that matter
just a normal hash table? It's very industry: we're not worried about
what's best, just what works. It's sad that this system wouldn't get in
if it were from any other place.
Ideas for further work:
I think that a lot of the individual pieces can be optimized.
Too bad it's not open enough to do that.
Paper Title: Bigtable: A Distributed Storage System for Structured Data
Author: Chang et al.
Date: 2006
Novel Idea:
BigTable's data model gives clients dynamic control of data layout and
format, allowing them to reason about the locality properties of the
data when selecting access schema. This control comes at the cost of a
fully relational data model.
Main Result(s):
The main goal of this paper is to document the data model, client API,
and underlying implementation of BigTable. The authors point out that
BigTable has been used in production at Google with table sizes of up
to 800 TB. They provide a small analysis of the effect of the number of
tablet servers on the throughput of various operations.
Impact:
The implementation has been used successfully at Google. The paper has
also inspired open source reproductions of the service that have seen
deployment in many contexts. The recognition that control of locality
and availability may in some cases trump supporting a full relational
model has had an impact on distributed systems research.
Evidence:
The main emphasis of the paper is on describing concepts employed in
the system. Evidence is mainly supplied in the form of anecdotal
descriptions of deployment and use in well known Google services. Some
throughput numbers and installation size statistics are also reported.
Prior Work:
The authors cite the Boxwood project's distributed services, work on
WAN services (especially DHTs), and services based on key-value pairs.
Competitive work:
Oracle's RAC, IBM's DB2, and C-Store are systems in the same space
(but with relational guarantees). More recently, Hypertable/HBase and
other open source implementations are more widely used outside of
Google.
Reproducibility:
It is extremely difficult to reproduce the size of the instances that
Google uses. However, open source developers have already succeeded in
mimicking the functionality of BigTable, which speaks to the clarity of
the conceptual presentation provided by this paper.
Question:
To what degree are the demands placed on the database by Google's apps
(e.g. single row queries) representative of internet applications as a
whole? Did they influence the design to such a degree that BigTable
cannot be adapted for use by applications with more diverse
requirements?
Criticism:
It is impossible for BigTable to scale outside of the data center, due
to the centralized nature of the single master server, which also
presents a possible availability bottleneck.
Ideas for further work:
Increasing the expressible nature of the schema to handle
additional locality and consistency tradeoffs.
Paper Title:
Bigtable: A Distributed Storage System for Structured Data
Author:
Chang
Novel Idea:
If a different interface is provided than that presented by relational
databases, the system can achieve significantly greater performance.
For example, the application may provide information about which
portions of the data to keep in memory for faster accesses.
Main Results:
The authors demonstrate a system that scales well in the cluster
environment and is able to serve as a storage mechanism for a variety
of Google products such as Google Earth and Personalized Search.
Impact:
This system effectively challenges the assumption that relational
databases are the best approach to storing structured data. It also
provides evidence that designs should focus on simplicity and
robustness.
Evidence:
Benchmarks are provided for the performance of Bigtable during
random/sequential reads and writes for different cluster sizes. We see
the potential benefits when the application specifies the parts of the
data that should be mapped into memory. Lastly, we see that as the
number of nodes is varied from 1 to 500, the throughput increases by a
factor of 100.
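To make the scaling claim above concrete, here is a small worked calculation. It uses only the two figures quoted in this review (1 to 500 nodes, throughput up by a factor of 100) and assumes "throughput" means aggregate throughput across all servers; the function name is made up for illustration.

```python
def per_server_efficiency(servers, aggregate_speedup):
    """Share of ideal linear scaling that each server retains."""
    return aggregate_speedup / servers


# Figures quoted in this review: 1 -> 500 nodes, throughput up by a factor of 100.
efficiency = per_server_efficiency(servers=500, aggregate_speedup=100)
print(f"per-server throughput is {efficiency:.0%} of the single-server rate")
```

Under that reading, each of the 500 servers delivers about a fifth of what a lone server does, so the aggregate gain is large while the scaling is well short of linear.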
Prior Work:
Bigtable utilizes the services provided by GFS and Chubby. It also
borrows some ideas from Log-Structured Merge Tree. Lastly, Bigtable
makes use of some of the load-balancing techniques present in
shared-nothing databases.
Competitive Work:
Distributed hash tables seem to have some similar goals present in
Bigtable. However, Bigtable operates in a different environment with
different operating assumptions and argues for a richer API. C-Store,
which operates like a relational database, seems to be the most direct
competitor to Bigtable. However, C-Store tends to focus on maximizing
read performance.
Reproducibility:
This work would be very difficult to reproduce for two reasons. The
first one is that it would be very challenging to get access to the
appropriate cluster. The second reason is that Bigtable depends on many
other systems such as Chubby and GFS.
Question:
The authors of this paper claim that many of the assumptions about
clusters, such as the absence of network partitions, are not valid when
developing robust systems. To what extent does this invalidate the
other papers we have read that explicitly make these types of
assumptions?
Criticism:
The experiments in this paper do not provide a comparison of how other
systems would perform in a similar environment. As a result, it is
difficult to assess the actual performance gain compared to a
relational database.
Ideas for Future Work:
There seems to be some potential for improvement in how Bigtable
scales. I would like to develop an improved load rebalancing mechanism
for this system that avoids thrashing yet achieves better performance
than that seen here.
Bigtable:
Novel Idea:
The data model they have chosen, as well as the access patterns they
provide to the data, are quite novel in this paper. The row,
column family, column model is an interesting shift from the relational
model, allowing more flexible structuring which can work better for
some data sets (like their crawl table). Their access scheme is
essentially primary key only, but it is interesting that this works for
most of their applications. Having a single master is a novel
design choice for a distributed system like this.
Main Result:
The authors of this paper have created a highly available, scalable
persistent storage system. Their system automatically handles
failures in data nodes, and supports a number of large Google
services. In addition, this paper establishes a data model (the
row->column family->column model) that seems well suited to a
number of applications and has been adopted by other storage systems.
Impact:
Bigtable shows that RDBMSes are not always the right tool for data
storage, especially in an environment with load rates like Google's
where there is no need for complex queries or transactions.
The impact has been the creation of a number of 'copy-cat' systems from
Cassandra to Hypertable, as well as a push to make these 'not quite
ACID' (but highly scalable and available) systems able to back a wide
variety of applications.
Evidence:
This paper can present strong evidence simply because it's from
Google. Given the wide variety of Google applications that are
running on Bigtable we can reasonably assume it works quite well and
that the data model suits programmers. Since Bigtable presents a
fairly limited data access model their tests also cover the main access
patterns. They show that they scale well for these patterns and
that they have high throughput.
Prior/Competitive Work:
Bigtable builds on a large amount of previous work. Database
research for things like atomic reads/writes, b-trees and logging, DHTs
for hash based distributed content location, and parallel databases for
data partitioning strategies. A big difference between Bigtable
and somewhat similar systems like Chord is that Bigtable assumes all
its machines are in one admin domain, and that they will all have fast
interconnects. As such it is able to remove much of the overhead
associated with dealing with wide area networks and as a result is much
faster and more scalable than any of the DHTs that came before it.
Reproducibility:
Bigtable has been reproduced. Hypertable (www.hypertable.org)
is an opensource implementation that is quite mature, stable and
fast. It works in a very similar manner to Bigtable (they also
have their own version of Chubby, and run on top of the hadoop file
system) and are able to achieve excellent performance. It's hard
to compare exactly however, since Google hasn't published detailed
Bigtable performance numbers (to my knowledge).
Question:
Does the Bigtable data model make sense for a large variety of
applications, or was it simply convenient for their WebCrawl
data? Certainly other apps have used it, but perhaps that was
despite the data model rather than because of it.
Criticism:
I would have liked to see some data about how well Bigtable performs
with failures. They claim very low failure rates, but in a less
reliable environment things like a single master could cause
pathological cases. I would like a story about these sorts of
situations.
Ideas for further work:
The SCADS project is pretty much pure future work from this
project. Some obvious ideas like secondary indexes are the low
hanging fruit. At a deeper level, however, Bigtable is something
of a move away from the data independence that the database
community has championed for so many years. Giving programmers
control over data placement, for example, means the system can't
optimize for usage patterns it might be able to detect. A
direction SCADS is going in is removing developer control from things
like replica count and data placement and using clever algorithms to do
a better job, and to allow the system to adapt to dynamic workloads.
Paper Title: Bigtable: A Distributed Storage System for Structured Data
Author: Fay Chang et al.
Date: OSDI 2006
Novel Idea
Let application influence the way data is stored and accessed. (Both locality and in memory caching policy)
Main Result(s):
Much improved scalability compared to traditional databases
Interface not too awkward to use
Impact:
Pretty much all Google applications now use Bigtable. Open source projects are starting open implementations of similar systems, which are being adopted by other large Internet companies.
Evidence:
Microbenchmark: showing that it does not scale linearly due to load balancing issues
Macrobenchmark: still better than SQL
Prior Work:
Shared-nothing database, distributed algorithms
Competitive work:
Other column based databases
Reproducibility
Since there is now an open source implementation of it, it is easy to reproduce these experiments
Question:
I think a secondary index is not going to be free (some scalability is going to be sacrificed). Is it really necessary in real applications?
Criticism:
The lack of generalized queries makes the integration of Bigtable with SQL-based databases difficult. For an application with multiple needs, the current approach does not provide easy ways for these two types of database to coexist and have data flow in both directions.
Novel Ideas:
- A non-relational data model and partitioning strategy that make it
possible to create efficient and very large tables.
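As a rough illustration of the partitioning idea in the bullet above, the sketch below range-partitions a sorted row-key space across servers. It is a minimal sketch under stated assumptions, not the paper's actual tablet location scheme: the class name, split keys, and server names are all hypothetical.

```python
import bisect


class TabletMap:
    """Maps a row key to the server holding its row range (hypothetical names)."""

    def __init__(self, split_keys, servers):
        # split_keys are sorted; tablet i holds keys <= split_keys[i],
        # and the last tablet holds everything after the final split key.
        assert len(servers) == len(split_keys) + 1
        assert list(split_keys) == sorted(split_keys)
        self._splits = list(split_keys)
        self._servers = list(servers)

    def locate(self, row_key):
        # Binary search over the split points picks the owning tablet.
        index = bisect.bisect_left(self._splits, row_key)
        return self._servers[index]


tablets = TabletMap(split_keys=["g", "p"], servers=["ts-a", "ts-b", "ts-c"])
for key in ["apple", "grape", "grapefruit", "zebra"]:
    print(key, "->", tablets.locate(key))
```

Because rows are kept in sorted order, lexicographically close keys ("grape", "grapefruit") land on the same server, which is what lets a client reason about locality when choosing row names.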
Results:
- BigTable scales, and performs well enough to power a bunch of popular
Google services.
Impact:
- There are about 12 open-source clones of this project now, in various
stages of completion.
Evidence:
- Microbenchmarks showing scaling and performance of various
operations. Discussion about scale and throughput achieved in real
applications.
Prior Work:
- DDS was certainly the progenitor of cluster-based non-relational
data stores. Parallel databases (e.g. column stores) also contributed
to BigTable, though none had the same kind of consistency guarantees
and the same data model.
Competitive Work:
- It's interesting to compare the design choices in BigTable with
Dynamo. Dynamo seems to lean more towards neat algorithms, use of
randomness, etc than BigTable, but the design choices in BigTable are
very pragmatic.
Reproducibility:
- Nobody's reproduced the engineering effort that went into the
implementation yet, but the experiments themselves should be
reproducible given it.
Question:
- Why did they choose the particular data model they did (rows, columns
and timestamps)? Are there others that could be useful?
Criticism:
- It would've been nice to explain why they went with BigTable
instead of, say, partitioned MySQL or BerkeleyDB for certain apps.
Google is also a huge user of MySQL.
Ideas for Future Work:
- BigTable is quite different from other work we've read because
it's very modular and reuses some solid components (GFS, Chubby, etc).
This is a nice thing to be able to do when engineering large (or small)
systems. Is there some set of abstractions, libraries, building blocks,
etc one can provide for writing data center apps that would make it
easier to build these kinds of systems? It might be interesting just to
list these first.
Paper Title: Bigtable: A Distributed Storage System for Structured Data
Author: Chang, Dean, Ghemawat, Hsieh, Wallach, Burrows, Chandra, Fikes,
Gruber.
Date: 2006
Novel Idea: They describe the design and implementation of a
distributed storage system for structured data. The main novelty
compared to traditional databases is that they define a simple data
model (instead of supporting a full relational data model) that
supports dynamic control over data layout and format and also gives
clients some locality control for the data. Bigtable does not follow
the typical convention of having a fixed number of rows. It is a
sparse, distributed, persistent multi-dimensional sorted map. It
achieves scalability, high performance, flexibility and high
availability. They describe many smaller techniques in their
implementation as for example their minor, major and merging
compaction. It is hard to know from reading the paper, which of these
are new.
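The compaction terms mentioned above can be pictured with a toy log-structured store. This is only a sketch loosely following those terms, assuming an in-memory buffer that is flushed to immutable sorted runs; the class and method names are invented here, and real concerns such as deletion markers, commit logs, and on-disk formats are left out.

```python
class TinyLSM:
    """Toy store: a mutable memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # oldest first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def minor_compaction(self):
        # Freeze the memtable into a new sorted, immutable run.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def major_compaction(self):
        # Rewrite all runs into one; later runs overwrite earlier values.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []

    def get(self, key):
        # Newest data wins: memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


store = TinyLSM(memtable_limit=2)
store.put("a", "1")
store.put("b", "2")   # triggers a minor compaction
store.put("a", "3")
store.put("c", "4")   # triggers another minor compaction
runs_before = len(store.sstables)
store.major_compaction()
```

Reads get slower as runs accumulate, since each run may have to be consulted; merging them back into one run bounds that cost.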
Main Results: Bigtable achieves scalability, high performance, high
availability and flexibility. They further showed that Bigtable has
many applications; at the time of the paper, 60 projects used it.
Impact: Bigtable seems to be a great way to store very large amounts of
(semi-)structured data. Unlike commercial databases, it can handle
scaling issues well and offers many other advantages: flexibility,
high-performance, dynamic control over data layout and format etc. The
only issue I see with using it widely outside of Google, is to actually
reproduce Bigtable.
Evidence: They use a set of benchmarks (sequential write, random write,
sequential read, random read, scans and random reads in-memory) to test
the performance of a single tablet-server. They check if the system
scales well, by analyzing the throughput and the performance as they
increase the number of tablet servers. They study how the
unavailability of Chubby affects the availability of Bigtable. Finally,
they present several real applications that use Bigtable.
Prior Work: Google File System: stores the persistent data.
Chubby: to ensure that there is at most one active master, to store the
bootstrap location of Bigtable data, to discover tablet servers and
finalize tablet server deaths, to store Bigtable schema information and
to store access control lists.
Map Reduce: used to read and write BigTable data.
Sawzall: BigTable can execute client supplied scripts on the server.
The scripts must be written in Sawzall, a language developed at Google
to process data.
Bloom filters: they use Bloom filters in SSTables to reduce the number
of disk accesses.
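For readers unfamiliar with Bloom filters, the following is a minimal sketch of how such a filter can let a read skip a disk access. It is a generic textbook-style filter, not Bigtable's implementation; the sizes, the hashing scheme, and the `load_from_disk` stand-in are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


disk_reads = []


def load_from_disk(key):
    # Hypothetical stand-in for an expensive file read.
    disk_reads.append(key)
    return f"value-for-{key}"


def read(key, bloom, loader):
    # A "definitely absent" answer avoids touching the disk at all.
    if not bloom.might_contain(key):
        return None
    return loader(key)


empty = BloomFilter()
bloom = BloomFilter()
bloom.add("row1")
hit = read("row1", bloom, load_from_disk)
miss = read("row2", empty, load_from_disk)
```

The filter can say "maybe present" for a key that was never added, so a positive answer still requires the real lookup; only negative answers save work.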
Competitive Work: no comparison to competitive work.
Reproducibility: It would be very hard to reproduce it since it is a
very large system that builds on many previous systems.
Criticism: Maybe one negative point is that they don’t support
general-purpose transactions. They also needed to change the interface
of traditional databases. This is not necessarily something negative,
but it forces users to get used to the new interface.
Ideas for Future Work:
Paper Title: The Chubby lock service for loosely-coupled distributed
systems.
Author: Mike Burrows
Date: 2006
Novel Idea: As they make it clear in the introduction, this paper is
not about new algorithms and techniques, but describes more an
engineering experience. Its main contribution is to describe the design
and implementation of a centralized lock service in a large
loosely-coupled distributed system. It is a very practical paper and
takes care of client’s implementation complexities and corner cases for
scaling and failures.
Main Results: Chubby is a lock service that allows its clients to
synchronize their activities and to agree on basic information about
their environment. Chubby achieves reliability, availability,
simplicity and scales to a relatively large number of clients. It
doesn’t focus as much on high-performance and storage capacity. At the
time of the paper, Chubby had already been used for several Google
services as a lock service and as a name server.
Impact: I think it will have large impact. Chubby handles many
practical issues to achieve scalability and clarity. It is also very
simple and a clean basis to add modifications on top as the
requirements of services change.
Evidence: They provide anecdotal evidence to their design choices
throughout the paper, by indicating how they first implemented it and
what problem this caused and why they needed to modify the design. In
the same way, they do not present detailed measurements of the system,
but only provide some anecdotal numbers to indicate typical causes of
outages in their cells, how often they have lost data, request latency
at their servers, RPC read and write latencies measured at the client,
general statistics for the Chubby cell (what kind of files, how many
clients hold locks etc.).
Prior Work: Chubby is based on well-known ideas: distributed consensus
among a few replicas for fault tolerance, consistent client-side
caching to reduce server load while retaining simple semantics, timely
notification updates, a familiar file system interface etc.
Competitive Work: no comparison to competitive work
Reproducibility: I think that some of the ideas in Chubby are presented
in enough detail to reproduce them.
Criticism: I think that on several occasions, more quantitative measurements
would have made the paper more convincing. For example, I wonder if
their invalidation strategy is actually scalable for a large number of
clients. They do not really provide any support for this claim.
Ideas for Future Work: There are some small extensions that one could
add to the paper. For example, implement the hybrid scheme that
switches caching tactics (block calls that access the node during
invalidation vs. treat the node as uncachable while cache
invalidations remain unacknowledged) if there is an overload.
Paper Title:
Bigtable: A Distributed Storage System for Structured Data
Author:
Chang, Dean, Ghemawat, Hsieh, Wallach, Burrows, Chandra, Fikes
& Gruber
Date:
2006
Novel Idea:
The authors describe Bigtable, a distributed storage system
used at Google for a wide variety of applications. Bigtable presents a
different abstraction of data than the other distributed storage
systems we've looked at, allowing a fair amount of control over
locality, memory residency, and version control over particular items
in the table.
Main Result(s):
Google has successfully implemented the Bigtable system and is
using it for a variety of services, including Google Earth,
personalized search, and Google Analytics. The authors show that the
system scales well and is suitably efficient for real-time data
services.
Impact:
Bigtable's achievements in scalability and application author
control are commendable (though perhaps near duplications of work that
we've read in recent weeks). The work on optimizing for "sparse" and
multi-dimensional maps is interesting and should influence distributed
storage system work in the future. The Chubby lock service is novel and
useful, and the subject of the next paper we're reading.
Evidence:
The authors ran a series of benchmarks to determine the
effectiveness of the Bigtable data service system. The bulk of these
experiments involved running random reads/writes and sequential
reads/writes against various sizes of Bigtable clusters. These tests show
that Bigtable is effective, and that it scales reasonably well (though
far from perfectly). Secondly, the authors examine the performance of a
single server, standing alone, and show that the performance is
acceptable and that the bottlenecks are primarily related to the
network stack on an individual node.
Finally, the authors discuss a number of Google services that
are currently using the Bigtable system as a backend. They argue that
the effectiveness of these services (and some details of their Bigtable
access patterns) is strong evidence for the performance of Bigtable
itself.
Prior Work:
Bigtable builds on a lot of work in distributed hash tables,
such as Tapestry and Chord. Sadly missing from the list of related work
in the paper, but obviously relevant, is the work on TACC, BASE, and
DDS done at Berkeley. The performance gains due to locality in Bigtable
are similar to those seen in KDB+, MonetDB, and Ailamaki.
Competitive work:
Bigtable has strong similarities to Boxwood, but is focused
more directly on providing services to client applications, rather than
on building high-level data abstractions. Bigtable is different from
most commercial databases in the same way, providing a direct link to
underlying cluster technology for client applications.
Reproducibility:
I could not reproduce these results directly without being
much more friendly with Google than I currently am, as I imagine they
protect this source code fairly tightly. That said, one could build
their own system with comparable goals, embodying the design principles
discussed in this paper.
Question:
A lot of this work seems less than entirely novel. I feel like
I'm obviously missing something. I'm curious to see what that something
is.
Criticism:
Google seemed to reinvent the wheel a lot in the context of
this work. While I understand this from a corporate perspective, it
seems odd from an academic/publishing perspective. I'm surprised that
this paper was accepted to OSDI with such weak connections to previous
work, but less that it won a best paper award.
Ideas for further work:
As with several other papers that we've read this semester,
the authors tout the client application API as easy to use and useful,
then go on to provide absolutely no evidence to support this claim. I'm
growing increasingly intrigued by this phenomenon, and would be
interested in finding a way to study it.
Novel Idea
<Describe in a sentence or two the new ideas presented in the paper>
This paper describes a storage system designed for scalability and
availability. The system differs from traditional relational databases
in that it stores semi-structured data and provides only a lookup
interface. The system maps a row name, column name and time value to
an arbitrary value string. This structure allows each row to have a
variable number of columns and potentially multiple versions.
Scalability is achieved through data partitioning by rows.
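As a minimal sketch of the data model described above (not Google's implementation; the class name SparseTable and its methods are hypothetical, chosen only for illustration), the map from row name, column name and timestamp to an uninterpreted string could be modeled in Python as a sparse nested dictionary:

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory model of a (row, column, timestamp) -> value map."""

    def __init__(self):
        # row -> column -> {timestamp: value}; cells that were never
        # written simply do not exist, so rows may have different columns.
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp`.

        With timestamp=None the latest version is returned; a missing
        cell yields None.
        """
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]

    def columns(self, row):
        """List the column names actually present in a row, sorted."""
        return sorted(self._rows.get(row, {}))


# Illustrative usage with made-up values.
table = SparseTable()
table.put("com.cnn.www", "contents:", 1, "<html>v1")
table.put("com.cnn.www", "contents:", 2, "<html>v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 1, "CNN")
table.put("org.example", "contents:", 5, "<html>")
latest = table.get("com.cnn.www", "contents:")
older = table.get("com.cnn.www", "contents:", timestamp=1)
```

The two rows above hold different sets of columns, and the "contents:" cell keeps two versions, which is the variable-column, multi-version structure the paragraph describes.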
Main Result(s):
<Describe in a sentence or two the main results obtained in the paper>
Web services require a light-weight distributed storage model for
scalability. BigTable provides the required scalability with good
performance; however, the scalability is not completely linear. There
are several bottlenecks including load imbalance and network
saturation when performing small random reads.
Impact:
<What is the importance of these results. What impact might they
have on theory or practice of Computer Systems>
Systems based on BigTable have been adopted by many web services aimed
at high scalability. This paper has also sparked further research into
high performance parallel data storage. Finally, it has brought
attention to previous work on parallel databases, specifically on how
they can be improved for today's applications.
Evidence:
<What reasoning, demonstration, analytical or empirical analysis
did they use to establish their results>
The authors provide a limited set of scale-up benchmarks showing a
slightly less than linear increase in reads/writes per second versus
number of tablet servers. They also provide convincing anecdotal
evidence by listing services that use BigTable, including Google
Earth, Google Analytics, Crawl and Orkut.
Prior Work:
<What previously established results does it build upon and how>
This work builds heavily on prior research into parallel databases.
The key scalability technique of horizontal data partitioning and the
semi-structured table layout comes from the database literature.
However, this project removed many unnecessary DB features such as the
SQL interface, query optimizer and all operations besides lookups.
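The horizontal partitioning mentioned above can be sketched as a lookup from a row key to the server that owns its contiguous key range. This is an illustrative assumption about how such a lookup might work, not the policy used by BigTable; RangePartitioner, the split keys and the server names are all hypothetical:

```python
import bisect


class RangePartitioner:
    """Maps a row key to the server holding its contiguous key range."""

    def __init__(self, split_keys, servers):
        # split_keys are sorted boundaries: range i holds keys below
        # split_keys[i] (and at or above split_keys[i - 1]); the last
        # range holds everything from the final split key upward.
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per key range")
        if list(split_keys) != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        self._split_keys = list(split_keys)
        self._servers = list(servers)

    def server_for(self, row_key):
        # bisect_right counts the split keys <= row_key, which is the
        # index of the range that contains row_key.
        index = bisect.bisect_right(self._split_keys, row_key)
        return self._servers[index]


# Illustrative usage: three ranges, (-inf, "g"), ["g", "p"), ["p", +inf).
partitioner = RangePartitioner(["g", "p"], ["server-0", "server-1", "server-2"])
owner_apple = partitioner.server_for("apple")
owner_mango = partitioner.server_for("mango")
owner_zebra = partitioner.server_for("zebra")
```

Because rows are kept in sorted ranges, adjacent row keys land on the same server, and adding capacity amounts to splitting a range and assigning one half to a new server.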
Competitive work:
<How do they compare their results to related prior or contemporary work>
Distributed hash tables can be seen as a competitive work. The key
difference is that DHTs such as Chord assume a high churn rate and
untrusted peers. So BigTable is able to achieve better overall
performance.
Reproducibility
<Could you reproduce the findings? If so, how? If not, why not?>
Unlikely. The paper describes the design at a very high level without
describing specific policies for load balancing, partitioning, etc.
BigTable also relies on two other Google projects: Chubby and GFS.
Question:
<A question about the work to discuss in class>
How does BigTable performance compare to "Scalable, Distributed Data
Structures for Internet Service Construction"?
Criticism:
<A criticism of the work that merits discussion>
The reliability and availability of the system were not thoroughly
evaluated. Also, further scalability/performance benchmarks would be
useful.
Paper Title: BigTable: A Distributed Storage System for Structured Data
Author: Fay Chang
Date: OSDI '06
Novel Idea
BigTable is a compressed, high performance, and proprietary database system
built on Google File System (GFS), Chubby Lock Service, and a few other Google
programs. Its primary goal is to store a wide variety of data types while
maintaining the ability to scale to unprecedented size.
Main Result(s):
Google's reasons for developing its own database include scalability and
better control of performance characteristics. Microbenchmarks show that
these characteristics are met. Unfortunately, however, performance doesn't
scale completely linearly.
Impact:
It is used by over 60 different Google products including Google Reader,
Google Maps, Google Book Search, "My Search History", Google Earth,
Blogger.com, Google Code hosting, Orkut, and YouTube.
Evidence:
Only microbenchmarks were provided, and no comparison to similar database-type
systems was provided. Microbenchmarks included throughput of
reads/writes, as well as
Prior Work:
Based on Google File System (GFS), Chubby Lock Service.
Competitive work:
Yahoo!'s Hadoop project is working on a clone of this service.
Reproducibility:
No ability to actually run experiments since the source code is not available.
Google probably wants to keep it proprietary anyway.
Question:
The authors mention that the interface to BigTable is a bit hard to get used
to. Would it be possible to extend BigTable to provide semantics similar to
that of traditional relational databases? Would this even be necessary?
Criticism:
I would have liked to see a comparison between an application implemented using
BigTable and one implemented using a traditional database.
Ideas for further work:
None
I presented a summary of and led discussion about this paper in Ion's
294 course on Cloud Computing yesterday. Ironically the reading
assigned for his class yesterday was the same set of papers assigned
for today. I also blogged my notes and thoughts about BigTable for his
course at http://netw0rk-king.blogspot.com/2009/02/googles-bigtable.html
Novel Idea (Describe in a sentence or two the new ideas presented in
the paper):
A scalable, highly available, new take on simple semantic table storage
built on top of GFS and Chubby.
Impact (What is the importance of these results. What impact
might they have on theory or practice of Computer Systems):
They have released Google AppEngine, which exposes BigTable as the
primary data store. AppEngine is Google's first and only public-facing
Cloud Computing offering.
Evidence (What reasoning, demonstration, analytical or empirical
analysis did they use to establish their results):
They present an evaluation in which they run GFS and the whole
BigTable stack on a set of 1786 machines. Since they allow the choice
between disk and memory storage, they obviously see a significant
difference in performance (sequential reads/writes and random writes are
all fairly clustered, peaking at about 1M per second with 1000-byte values).
Prior Work (What previously established results does it build upon and
how):
-They reference DHTs which provide much more functionality than
they want in the form of handling variance in bandwidth, churn, etc.
while not having the indexing, versioning, and other functionality they
want, built in already. They of course borrowed much from DBMS research
of the last 30 years.
Competitive work (How do they compare their results to related prior or
contemporary work):
-They compare themselves to C-Store, and the Log Structured Merge
Tree. For C-Store, they point out that they differ significantly in
their API. They claim that they perform better on writes than C-Store,
though their performance evaluation shows poor performance for random
writes (admittedly, a hard problem).
-More generally, DHTs such as DDS could be seen as competitors, though
they address this in Related Work.
Reproducibility (Could you reproduce the findings? If so,
how? If not, why not?):
There are MANY open source projects aiming at various points in
this space. At Yahoo, a table store project built on top of HDFS was in
its infancy stages while I was working there over the summer.
Question (A question about the work to discuss in class):
-Is it possible to successfully find a sweet spot between general
declarative languages (SQL) and a key-value access model (i.e. DHTs)?
Is it necessary? In my opinion, the Amazon S3 storage model is more
approachable than the Google AppEngine approach exactly because S3 is
simpler. They even comment on this problem within Google (where a lot
of filtering has already happened in terms of engineering abilities).
Criticism (A criticism of the work that merits discussion):
-Is the interface too complex?
Ideas for further work (Did this paper give you ideas for future work,
projects, or connections to other work?):
-The RAD Lab SCADS project, and the various projects it encounters
(and undoubtedly cites as related work in papers), are a testament that
there are many potential research topics worth exploring in the
scalable data structures space.