CS262B Advanced Topics in Computer Systems
Spring 2009

Paper Title:  Bigtable:  A Distributed Storage System for Structured Data
Author:  Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
Date:  2006

Novel Idea:  The Bigtable team has provided a simple data model where access is by row or column name to achieve Internet scale, high availability, reliability, and performance.
Main Result(s):  The authors have designed & implemented the Bigtable service, which hosts large-scale structured data and is used extensively by more than sixty Google products.
Impact:  This technology is widely used within Google. 
Evidence:  The authors provide a description of the design and implementation of Bigtable.  The main empirical evidence is Figure 6, which shows that read/write rates scale nearly linearly with the number of tablet servers.  They also describe the Google products that make use of Bigtable to show that it is useful.
Prior Work:  Some similar work that also tries to provide Internet-scale storage includes distributed hash tables and DBMSs.  The problem setting for DHTs is quite different, due to high churn and untrusted clients.  DBMSs cannot scale to the number of clients that Bigtable serves; Bigtable's interface is lower-level, and it does not have to support complex queries.
Competitive work:  As in the Chubby paper, the Bigtable authors remark that their service has some overlap with Boxwood but differs mostly in that Boxwood is meant to be a layer that supports the development of higher-level services such as file systems, while Bigtable is intended to serve client applications.
Reproducibility:  This work would be difficult to reproduce since they understandably don't provide many details about the implementation.  Also, to do so at a large scale would present many challenges.
Question:  The authors mention that a novel aspect of this work is that the query set supported is simple (data access is by row/column name).  Could similar results have been obtained using a standard RDBMS and restricting the queries issued?
Criticism:  In the Lessons section, the authors discuss their efforts developing the tablet server membership protocol.  Right after saying how important it is to use simple designs, the authors note that their first protocol did not work precisely because it was too simple.  They should have qualified their prescription of simple designs -- clearly, not just any simple protocol will do.
Ideas for further work:  I would like to deploy a RDBMS and restrict the query set supported so that it would be roughly comparable to Bigtable and then compare the performance and scalability of the two systems.

Paper Title: Bigtable: A Distributed Storage System for Structured Data
Author: Chang, Dean, Ghemawat, Hsieh, Wallach, Burrows, Chandra, Fikes, Gruber
Date: 2006

Novel Idea:
By providing a less rich interface than traditional databases, one can give applications a distributed database
with better performance and scalability and still enough flexibility for their tasks.
Main Result(s):
The authors' distributed database scales to hundreds of servers with approximately 50% overhead compared to the single-server case and is able to provide the persistent data backend for several highly utilized services.
Impact:
It shows the utility of developing distributed databases from a more minimal feature set, rather than (as with the "traditional" approach) making support for everything a commercial relational database can do a goal.
Evidence:
The authors have deployed their distributed database as the back end for several Google projects in production (from which they gain anecdotes indicating its practical usefulness) and measure its performance for random and sequential accesses using 1 to 500 nodes and a synthetic workload.
Prior Work:
This uses the Chubby lock service for coordination and GFS for storage. The concept of providing a distributed data store with this sort of interface is similar to earlier work on DHTs, though the authors were blessed with a friendlier environment.
Competitive work:
More recently: Dynamo, Cassandra [which serve a similar application, but split up the data in a Chord-like fashion rather than relying heavily on a distributed filesystem and a distributed lock service for these tasks];
Traditional distributed databases [for which providing full transaction support is usually important, and thus have more sophisticated coordination between servers]
Reproducibility:
Hard, especially given the dependence on other services. Given an implementation (and enough machines), the performance experiments seem well enough described.
Question:
Is this (or what is) enough functionality on top of which to implement a "complete" distributed database?
Criticism:
Though the authors attributed some of their less-than-ideal scaling to their inadequate ability to do load balancing ("rebalancing is throttled .... and the load generated ... shifts around"), it is unclear how load balancing is actually being performed in the system or how the authors came to the conclusion that an imbalance in load prevented good scaling.
Ideas for further work:
- Everything SCADS is doing/wants to do. (e.g. supporting queries involving joins on this.)
- How much would Cassandra/Dynamo/other alternatives lose/gain if they were implemented on top of a general-purpose distributed FS?

Bigtable: A Distributed Storage System for Structured Data
Chang et al.

Novel Idea: They create a large scale distributed storage system, based
on key value pairs. Their system is designed for ridiculously large
scale datacenters. While they don't mention it much, I think one of
the big hidden ideas is that clients are exposed to the locality of the
data, and thus to how performance is affected by the way they use it.
Comparatively, the traditional relational DB view hides performance
from the client (to a greater degree). Likewise, their system gets
away with a bit more since they only provide exactly what they need -
not the full blown relational DB.

Main Result: It works. Like many of the Google papers, they are really
just telling us about something they already do and practical things
they learned from a real deployment. They do show some evidence of the
scalability of the system.

Impact: For them, it allows the reuse of a key-value pair store for all
of their internal apps, instead of starting from scratch. For everyone
else, it sets the bar high.

Evidence: They have some minor benchmarks that show scanning and
sequential accesses scale well, and random reads are lousy, as
expected. They come off as not caring about convincing us it works, so
they don't really give us a lot of performance info. Some of it is
just example numbers of how large these things are.

Prior Work: Boxwood and C-Store

Competitive Work: commercial DBs (Oracle, IBM), and later the Hadoop
project started HBase, which is its version of Bigtable.

Reproducibility: Lots of engineering. See Hadoop's HBase.

Question: I'm not too familiar (beyond scanning Wikipedia) with Bloom
filters and friends.

Criticism: It was a little boring, actually. Maybe because I read this
one second, and also read it a long time ago. And the Chubby paper was
more honest about whether or not it was "research."

Ideas for further work: I think this inspired SCADS in the radlab a

Paper Title:     Bigtable: A Distributed Storage System for Structured Data
Author:     Fay Chang et al (Google)
Date:     OSDI 2006

Novel Idea
    This paper describes Bigtable, a distributed storage system developed at Google.
Main Result(s):
    They are able to create a system with good scaling by limiting the interface and not providing a full relational database.
Impact:
    I think it could be important.  As we create more data, having databases that provide limited interfaces to get better scaling and performance might be worthwhile.
Evidence:
    They did a series of random and sequential writes on different numbers of nodes and measured the performance to show the scaling.  They also claimed success by having a user base, but they are in a corporation, so their users have more pressure to use internal tools.
Prior Work:
    They are related to other work on distributed hash tables such as Chord or DDS.
Competitive work:     It's really hard to say because they are working in an internal system with a closed software stack.   Although it is similar to other database systems that have masters and workers and try to have high availability.
Reproducibility:
    Definitely not easily, since they are using other internal Google tools such as Chubby.
Question:
    How do we compare these corporate designs to those in research?
Criticism:
    It's hard to see if this is a great new idea or just an engineering solution to their Google needs.
Ideas for further work:    It didn't really inspire me in too many ways.

Paper Title: Bigtable
Author: Google
Date: OSDI 2006

Novel Idea
Novel's not the right word for an industry paper, generally. I don't know a lot of the related work, but I hadn't seen any other system that didn't just do key-value storage. This is just a system only Google could build, as it relies on GFS and Chubby.
Main Result(s):
I'm not really sure about this either. People use it? It didn't scale spectacularly...
Impact:
Something like 60 Google web services use it. I would expect that this has informed future systems, but I don't know any (aside from maybe SCADS) that use a table-store idiom.
Evidence:
Again, 60 services. That doesn't seem huge, but this was '06.
Prior Work:
DDS is the immediate one, though it did key/value. More appropriate prior work would be the distributed databases.
Competitive work: Oracle sells very expensive things for this exact purpose.
Reproducibility:
None at all. This is proprietary, built on proprietary systems using proprietary hardware and networking.
Question:
When does the scaling end? You know Google could have run this to 50k machines. Hell, when I was there I ran 1000 Hadoop instances before that could scale. 500 is a pretty arbitrary end of the graph...
Criticism:
I didn't think that the paper was well written. There was little discussion of WHY they made the decisions they did. I see that Chubby provides certain properties, but it seems like you're stacking a lot of latency and variance. Why the B-tree? Why not a DHT, or for that matter just a normal hash table? It's very industry: we're not worried about what's best, just what works. It's sad that this system wouldn't get in if it were from any other place.
Ideas for further work: I think that a lot of the individual pieces can be optimized. Too bad it's not open enough to do that.

Paper Title: Bigtable: A Distributed Storage System for Structured Data
Author: Chang et al.
Date: 2006

Novel Idea
BigTable's data model gives clients dynamic control of data layout and format, allowing them to reason about the locality properties of the data when selecting access schema. This control comes at the cost of a fully relational data model.
Main Result(s):
The main goal of this paper is to document the data model, client API, and underlying implementation of BigTable. The authors point out that BigTable has been used in production at Google with table sizes of up to 800 TB. They provide a small analysis of the effect of the number of tablet servers on the throughput of various operations.
Impact:
The implementation has been used successfully at Google. The paper has also inspired open source reproductions of the service that have seen deployment in many contexts. The recognition that control of locality and availability may in some cases trump supporting a full relational model has had an impact on distributed systems research.
Evidence:
The main emphasis of the paper is on describing concepts employed in the system. Evidence is mainly supplied in the form of anecdotal descriptions of deployment and use in well known Google services. Some throughput numbers and installation size statistics are also reported.
Prior Work:
The authors cite the Boxwood project's distributed services, work on WAN services especially DHTs, and services based on key-value pairs.
Competitive work: Oracle's RAC and IBM's DB2 and C-store are systems in the same space (but with relational guarantees). More recently, Hypertable/Hbase and other open source implementations are more widely used outside of Google.
Reproducibility:
It is extremely difficult to reproduce the size of the instances that Google uses. However, open source developers have already succeeded in mimicking the functionality of BigTable, which speaks to the clarity of the conceptual presentation provided by this paper.
Question:
To what degree are the demands placed on the database by Google's apps (e.g. single row queries) representative of internet applications as a whole? Did they influence the design to such a degree that BigTable cannot be adapted for use by applications with more diverse requirements?
Criticism:
It is impossible for BigTable to scale outside of the data center, due to the centralized nature of the single master server, which also presents a possible availability bottleneck.
Ideas for further work: Increasing the expressible nature of the schema to handle additional locality and consistency tradeoffs.

Paper Title:
Bigtable: A Distributed Storage System for Structured Data
Novel Idea:
If a different interface is provided than that presented by relational databases, the system can achieve significantly greater performance. For example, the application may provide information about which portions of the data to keep in memory for faster accesses.
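As a rough sketch of what such an interface might look like (all names here are invented for illustration; the paper's actual client API is in C++ and differs in detail), a table can store timestamped versions per cell and let the client flag which column families to keep in memory:

```python
# Toy sketch of a Bigtable-like client interface. Names are invented;
# this is not the real API, only the shape of the idea.

class Table:
    def __init__(self, name):
        self.name = name
        self.rows = {}          # row key -> {column -> [(timestamp, value)]}
        self.in_memory = set()  # column families the client asked to cache

    def set_in_memory(self, family):
        # The client, not the system, decides which column families
        # should be served from memory for faster access.
        self.in_memory.add(family)

    def put(self, row, column, value, timestamp):
        cells = self.rows.setdefault(row, {}).setdefault(column, [])
        cells.append((timestamp, value))

    def get(self, row, column):
        # Return the most recent version, as Bigtable does by default.
        versions = self.rows.get(row, {}).get(column, [])
        return max(versions)[1] if versions else None

table = Table("webtable")
table.set_in_memory("language")  # small, hot column family kept in RAM
table.put("com.example.www", "language:", "EN", timestamp=1)
table.put("com.example.www", "language:", "DE", timestamp=2)
assert table.get("com.example.www", "language:") == "DE"
```

The point is only that memory residency is a client-visible knob rather than something the storage system decides on its own.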
Main Results:
The authors demonstrate a system that scales well in the cluster environment and is able to serve as a storage mechanism for a variety of Google products such as Google Earth and Personalized Search.
Impact:
This system effectively challenges the assumption that relational databases are the best approach to storing structured data. It also provides evidence that designs should focus on simplicity and robustness.
Evidence:
Benchmarks are provided for the performance of Bigtable during random/sequential reads and writes for different cluster sizes. We see the potential benefits when the application specifies the parts of the data that should be mapped into memory. Lastly, we see that as the number of nodes is varied from 1 to 500, the throughput increases by a factor of 100.
Prior Work:
Bigtable utilizes the services provided by GFS and Chubby. It also borrows some ideas from Log-Structured Merge Tree. Lastly, Bigtable makes use of some of the load-balancing techniques present in shared-nothing databases.
Competitive Work:
Distributed hash tables seem to have some similar goals present in Bigtable. However, Bigtable operates in a different environment with different operating assumptions and argues for a richer API. C-Store, which operates like a relational database, seems to be the most direct competitor to Bigtable. However, C-Store tends to focus on maximizing read performance.
Reproducibility:
This work would be very difficult to reproduce for two reasons. The first is that it would be very challenging to get access to an appropriate cluster. The second is that Bigtable depends on many other systems such as Chubby and GFS.
Question:
The authors of this paper claim that many of the assumptions about clusters, such as the absence of network partitions, are not valid when developing robust systems. To what extent does this invalidate the other papers we have read that explicitly make these types of assumptions?
Criticism:
The experiments in this paper do not provide a comparison of how other systems would perform in a similar environment. As a result, it is difficult to assess the actual performance gain compared to a relational database.
Ideas for Future Work:
There seems to be some potential for improvement in how Bigtable scales. I would like to develop an improved load rebalancing mechanism for this system that avoids thrashing yet achieves better performance than that seen here.


Novel Idea:
The data model they have chosen, as well as the access patterns they provide to the data, are quite novel in this paper.  The row, column family, column model is an interesting shift from the relational model, allowing more flexible structuring which can work better for some data sets (like their crawl table).  Their access scheme is essentially primary-key only, but that this works for most of their applications is interesting.  Having a single master is a novel design choice for a distributed system like this.
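To make the crawl-table point concrete, here is a small sketch (the helper is hypothetical, not Google's code) of the reversed-hostname row keys the paper uses for Webtable; since rows are stored in lexicographic order, pages from the same domain end up adjacent and can be scanned together:

```python
# Toy illustration of the Webtable row-key trick: storing pages under
# the reversed domain name keeps pages from the same site adjacent in
# the row-sorted order, so a range scan reads them together.

def row_key(url):
    host, _, path = url.partition("/")
    return ".".join(reversed(host.split("."))) + "/" + path

keys = sorted(row_key(u) for u in [
    "maps.google.com/index.html",
    "www.cnn.com/world",
    "www.google.com/search",
])
# Both google.com hosts sort next to each other:
assert keys == ["com.cnn.www/world",
                "com.google.maps/index.html",
                "com.google.www/search"]
```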

Main Result:
The authors of this paper have created a highly available, scalable persistent storage system.  Their system automatically handles failures in data nodes, and supports a number of large Google services.  In addition, this paper establishes a data model (the row->column family->column model) that seems well suited to a number of applications and has been adopted by other storage systems.

Impact:
Bigtable shows that RDBMSes are not always the right tool for data storage, especially in an environment with load rates like Google's, where there is no need for complex queries or transactions.  The impact has been the creation of a number of 'copy-cat' systems from Cassandra to Hypertable, as well as a push to make these 'not quite ACID' (but highly scalable and available) systems able to back a wide variety of applications.

Evidence:
This paper can present strong evidence simply because it's from Google.  Given the wide variety of Google applications that are running on Bigtable we can reasonably assume it works quite well and that the data model suits programmers.  Since Bigtable presents a fairly limited data access model their tests also cover the main access patterns.  They show that they scale well for these patterns and that they have high throughput.

Prior/Competitive Work:
Bigtable builds on a large amount of previous work.  Database research for things like atomic reads/writes, b-trees and logging, DHTs for hash based distributed content location, and parallel databases for data partitioning strategies.  A big difference between Bigtable and somewhat similar systems like Chord is that Bigtable assumes all its machines are in one admin domain, and that they will all have fast interconnects.  As such it is able to remove much of the overhead associated with dealing with wide area networks and as a result is much faster and more scalable than any of the DHTs that came before it.

Reproducibility:
Bigtable has been reproduced.  Hypertable (www.hypertable.org) is an open-source implementation that is quite mature, stable, and fast.  It works in a very similar manner to Bigtable (they also have their own version of Chubby, and run on top of the Hadoop file system) and is able to achieve excellent performance.  It's hard to compare exactly, however, since Google hasn't published detailed Bigtable performance numbers (to my knowledge).

Question:
Does the Bigtable data model make sense for a large variety of applications, or was it simply convenient for their WebCrawl data?  Certainly other apps have used it, but perhaps that was despite the data model rather than because of it.

Criticism:
I would like to have seen some data about how well Bigtable performs with failures.  They claim very low failure rates, but in a less reliable environment things like a single master could cause pathological cases.  I would like a story about these sorts of situations.

Ideas for further work:
The SCADS project is pretty much pure future work from this project.  Some obvious ideas like secondary indexes are the low-hanging fruit.  At a deeper level, however, Bigtable is something of a move away from the data independence that the database community has championed for so many years.  Giving programmers control over data placement, for example, means the system can't optimize for usage patterns it might be able to detect.  A direction SCADS is going in is removing developer control from things like replica count and data placement and using clever algorithms to do a better job, and to allow the system to adapt to dynamic workloads.

Paper Title: Bigtable: A Distributed Storage System for Structured Data
Author: Fay Chang et al.
Date: OSDI 2006

Novel Idea
Let application influence the way data is stored and accessed. (Both locality and in memory caching policy)
Main Result(s):
Much improved scalability compared to traditional databases
Interface not too awkward to use
Impact:
Pretty much all Google applications now use Bigtable. Open source projects are starting open implementations of similar systems, which are being adopted by other large internet companies.
Evidence:
Microbenchmark: showing that it does not scale linearly due to load balancing issues
Macrobenchmark: still better than SQL
Prior Work:
Shared-nothing database, distributed algorithms
Competitive work:
Other column based databases

Reproducibility:
Since there is now an open source implementation, it is easy to reproduce these experiments.
Question:
I think a secondary index is not going to be free (some scalability is going to be sacrificed). Is it really necessary in real applications?


Criticism:
The lack of a generalized query interface makes the integration of Bigtable with SQL-based databases difficult. For an application with multiple needs, the current approach does not provide an easy way for these two types of databases to coexist and have data flow in both directions.

Novel Ideas:
- A non-relational data model and partitioning strategy that make it possible to create efficient and very large tables.

Main Result(s):
- BigTable scales, and performs well enough to power a bunch of popular Google services.

Impact:
- There are about 12 open-source clones of this project now, in various stages of completion.

Evidence:
- Microbenchmarks showing scaling and performance of various operations. Discussion about scale and throughput achieved in real applications.

Prior Work:
- DDS was certainly the progenitor of cluster-based non-relational data stores. Parallel databases (e.g. column stores) also contributed to BigTable, though none had the same kind of consistency guarantees and the same data model.

Competitive Work:
- It's interesting to compare the design choices in BigTable with Dynamo. Dynamo seems to lean more towards neat algorithms, use of randomness, etc. than BigTable, but the design choices in BigTable are very pragmatic.

Reproducibility:
- Nobody's reproduced the engineering effort that went into the implementation yet, but the experiments themselves should be reproducible given it.

Question:
- Why did they choose the particular data model they did (rows, columns and timestamps)? Are there others that could be useful?

Criticism:
- It would've been nice to explain why they went with BigTable instead of, say, partitioned MySQL or BerkeleyDB for certain apps. Google is also a huge user of MySQL.

Ideas for Future Work:
- BigTable is quite different from other work we've read because it's very modular and reuses some solid components (GFS, Chubby, etc). This is a nice thing to be able to do when engineering large (or small) systems. Is there some set of abstractions, libraries, building blocks, etc one can provide for writing data center apps that would make it easier to build these kinds of systems? It might be interesting just to list these first.

Paper Title: Bigtable: A Distributed Storage System for Structured Data
Author: Chang, Dean, Ghemawat, Hsieh, Wallach, Burrows, Chandra, Fikes, Gruber.
Date: 2006
Novel Idea: They describe the design and implementation of a distributed storage system for structured data. The main novelty compared to traditional databases is that they define a simple data model (instead of supporting a full relational data model) that supports dynamic control over data layout and format and also gives clients some locality control over the data. Bigtable does not follow the typical convention of having a fixed number of columns. It is a sparse, distributed, persistent multi-dimensional sorted map. It achieves scalability, high performance, flexibility and high availability. They describe many smaller techniques in their implementation, such as their minor, merging, and major compactions. It is hard to know from reading the paper which of these are new.
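A minimal sketch of the compaction cycle mentioned above, with invented names and none of the real system's persistence, logging, or concurrency: writes accumulate in an in-memory buffer (the memtable), a minor compaction freezes it into an immutable sorted table (an SSTable), and a major compaction merges the SSTables back into one:

```python
# Toy model of Bigtable's compaction cycle (names invented; the real
# system persists SSTables in GFS and handles concurrency and logging).

class TabletStore:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []          # immutable sorted tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def minor_compaction(self):
        # Freeze the memtable as a new immutable SSTable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def major_compaction(self):
        # Merge all SSTables into one; newer entries win.
        merged = {}
        for sst in self.sstables:        # oldest first, so updates override
            merged.update(sst)
        self.sstables = [dict(sorted(merged.items()))]

    def read(self, key):
        # Check the memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sst in reversed(self.sstables):
            if key in sst:
                return sst[key]
        return None
```

A merging compaction in the paper merges only a few SSTables at a time; the major compaction here is the degenerate case that merges all of them.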
Main Results: Bigtable achieves scalability, high performance, high availability and flexibility. They further showed that Bigtable has many applications; at the time of the paper, 60 projects used it.
Impact: Bigtable seems to be a great way to store very large amounts of (semi-)structured data. Unlike commercial databases, it can handle scaling issues well and offers many other advantages: flexibility, high-performance, dynamic control over data layout and format etc. The only issue I see with using it widely outside of Google, is to actually reproduce Bigtable.
Evidence: They use a set of benchmarks (sequential write, random write, sequential read, random read, scans and random reads in-memory) to test the performance of a single tablet-server. They check if the system scales well, by analyzing the throughput and the performance as they increase the number of tablet servers. They study how the unavailability of Chubby affects the availability of Bigtable. Finally, they present several real applications that use Bigtable.
Prior Work: Google File System: stores the persistent data.
Chubby: to ensure that there is at most one active master, to store the bootstrap location of Bigtable data, to discover tablet servers and finalize tablet server deaths, to store Bigtable schema information and to store access control lists.
MapReduce: used to read and write BigTable data.
Sawzall: BigTable can execute client supplied scripts on the server. The scripts must be written in Sawzall, a language developed at Google to process data.
Bloom filters: they use Bloom filters in SSTables to reduce the number of disk accesses.
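A toy Bloom filter along these lines (parameters and hash scheme invented; the paper does not specify them at this level): a negative answer is definitive, so the read path can skip the disk access for any SSTable whose filter says the row/column pair is absent:

```python
# Toy Bloom filter: per-SSTable membership test with no false negatives.
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _hashes(self, item):
        # Derive k hash positions from SHA-256 (illustrative choice only).
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for pos in self._hashes(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means definitely absent (skip the disk read for this
        # SSTable); True means possibly present (false positives happen).
        return all(self.bits[pos] for pos in self._hashes(item))

bf = BloomFilter()
bf.add("com.cnn.www/anchor:")
assert bf.might_contain("com.cnn.www/anchor:")
```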
Competitive Work: no comparison to competitive work.
Reproducibility: It would be very hard to reproduce it since it is a very large system that builds on many previous systems.
Criticism: Maybe one negative point is that they don’t support general-purpose transactions. They also needed to change the interface of traditional databases. This is not necessarily something negative, but it forces users to get used to the new interface.
Ideas for Future Work:

Paper Title: The Chubby lock service for loosely-coupled distributed systems.
Author: Mike Burrows
Date: 2006
Novel Idea: As they make clear in the introduction, this paper is not about new algorithms and techniques; rather, it describes an engineering experience. Its main contribution is to describe the design and implementation of a centralized lock service in a large loosely-coupled distributed system. It is a very practical paper and takes care of clients' implementation complexities and corner cases for scaling and failures.
Main Results: Chubby is a lock service that allows its clients to synchronize their activities and to agree on basic information about their environment. Chubby achieves reliability, availability, simplicity and scales to a relatively large number of clients. It doesn’t focus as much on high-performance and storage capacity. At the time of the paper, Chubby had already been used for several Google services as a lock service and as a name server.
Impact: I think it will have large impact. Chubby handles many practical issues to achieve scalability and clarity. It is also very simple and a clean basis to add modifications on top as the requirements of services change.
Evidence: They provide anecdotal evidence to their design choices throughout the paper, by indicating how they first implemented it and what problem this caused and why they needed to modify the design. In the same way, they do not present detailed measurements of the system, but only provide some anecdotal numbers to indicate typical causes of outages in their cells, how often they have lost data, request latency at their servers, RPC read and write latencies measured at the client, general statistics for the Chubby cell (what kind of files, how many clients hold locks etc.).
Prior Work: Chubby is based on well-known ideas: distributed consensus among a few replicas for fault tolerance, consistent client-side caching to reduce server load while retaining simple semantics, timely notification updates, a familiar file system interface etc.
Competitive Work: no comparison to competitive work
Reproducibility: I think that some of the ideas in Chubby are presented in enough detail to reproduce them.
Criticism: I think at several occasions, more quantitative measurements would have made the paper more convincing. For example, I wonder if their invalidation strategy is actually scalable for large number of clients. They do not really provide any support for this claim.
Ideas for Future Work: There are some small extensions that one could add. For example, implement the hybrid scheme that switches caching tactics (block calls that access the node during invalidation vs. treat the node as uncachable while cache invalidations remain unacknowledged) if there is an overload.

Paper Title:

Bigtable: A Distributed Storage System for Structured Data


Chang, Dean, Ghemawat, Hsieh, Wallach, Burrows, Chandra, Fikes & Gruber



Novel Idea

The authors describe Bigtable, a distributed storage system used at Google for a wide variety of applications. Bigtable presents a different abstraction of data than the other distributed storage systems we've looked at, allowing a fair amount of control over locality, memory residency, and version control over particular items in the table.

Main Result(s):

Google has successfully implemented the Bigtable system and is using it for a variety of services, including Google Earth, personalized search, and Google Analytics. The authors show that the system scales well and is suitably efficient for real-time data services.


Impact:

Bigtable's achievements in scalability and application author control are commendable (though perhaps near duplications of work that we've read in recent weeks). The work on optimizing for "sparse" and multi-dimensional maps is interesting and should influence distributed storage system work in the future. The Chubby lock service is novel and useful, and the subject of the next paper we're reading.


Evidence:

The authors ran a series of benchmarks to determine the effectiveness of the Bigtable data service system. The bulk of these experiments involved running random read/writes and sequential read/writes against various sizes of Bigtable clusters. These tests show that Bigtable is effective, and that it scales reasonably well (though far from perfectly). Secondly, the authors examine the performance of a single server, standing alone, and show that the performance is acceptable and that the bottlenecks are primarily related to the network stack on an individual node.

Finally, the authors discuss a number of Google services that are currently using the Bigtable system as a backend. They argue that the effectiveness of these services (and some details of their Bigtable access patterns) is strong evidence for the performance of Bigtable itself.

Prior Work:

Bigtable builds on a lot of work in distributed hash tables, such as Tapestry and Chord. Sadly missing from the list of related work in the paper, but obviously relevant, is the work on TACC, BASE, and DDS done at Berkeley. The performance gains due to locality in Bigtable are similar to those seen in KDB+, MonetDB, and Ailamaki's work.

Competitive work:

Bigtable has strong similarities to Boxwood, but is focused more directly on providing services to client applications, rather than on building high-level data abstractions. Bigtable is different from most commercial databases in the same way, providing a direct link to underlying cluster technology for client applications.


Reproducibility:

I could not reproduce these results directly without being much friendlier with Google than I currently am, as I imagine they protect this source code fairly tightly. That said, one could build a system with comparable goals, embodying the design principles discussed in this paper.


A lot of this work seems less than entirely novel. I feel like I'm obviously missing something. I'm curious to see what that something is.


Google seemed to reinvent the wheel a lot in the context of this work. While I understand this from a corporate perspective, it seems odd from an academic/publishing perspective. I'm surprised that this paper was accepted to OSDI with such weak connections to previous work, but less surprised that it won a best paper award.

Ideas for further work:

As with several other papers that we've read this semester, the authors tout the client application API as easy to use and useful, then go on to provide absolutely no evidence to support this claim. I'm growing increasingly intrigued by this phenomenon, and would be interested in finding a way to study it.

Novel Idea
<Describe in a sentence or two the new ideas presented in the paper>
This paper describes a storage system designed for scalability and
availability. The system differs from traditional relational databases
in that it stores semi-structured data and provides only a lookup
interface. The system maps a row name, column name and time value to
an arbitrary value string. This structure allows each row to have a
variable number of columns and potentially multiple versions.
Scalability is achieved through data partitioning by rows.
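The data model described above can be sketched as a toy in-memory structure. This is purely illustrative (the class and method names are my own, not Bigtable's API), but it shows the (row, column, timestamp) -> value mapping, variable column sets per row, and multiple versions per cell:

```python
from collections import defaultdict

class SparseTable:
    """Toy sketch of the Bigtable data model (illustrative only):
    (row, column, timestamp) -> uninterpreted byte string.
    Rows may have different column sets; each cell keeps
    multiple timestamped versions, newest first."""

    def __init__(self):
        # row -> column -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        versions = self._rows[row][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp`
        (newest overall if timestamp is None), or None if absent."""
        for ts, value in self._rows[row][column]:
            if timestamp is None or ts <= timestamp:
                return value
        return None

table = SparseTable()
table.put("com.cnn.www", "contents:", 1, b"<html>v1</html>")
table.put("com.cnn.www", "contents:", 2, b"<html>v2</html>")
```

A lookup with no timestamp returns the latest version (`b"<html>v2</html>"` here), while passing `timestamp=1` returns the older one, mirroring the versioned-cell semantics the paper describes.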

Main Result(s):
<Describe in a sentence or two the main results obtained in the paper>
Web services require a light-weight distributed storage model for
scalability. BigTable provides the required scalability with good
performance; however, the scalability is not completely linear. There
are several bottlenecks, including load imbalance and network
saturation when performing small random reads.

<What is the importance of these results. What impact might they
have on theory or practice of Computer Systems>
Systems based on BigTable have been adopted by many web services aimed
at high scalability. This paper has also sparked further research into
high performance parallel data storage. Finally, it has brought
attention to previous work on parallel databases, specifically on how
they can be improved for today's applications.

<What reasoning, demonstration, analytical or empirical analysis
did they use to establish their results>
The authors provide a limited set of scale-up benchmarks showing a
slightly less than linear increase in read/writes per second versus
number of tablet servers. They also provide convincing anecdotal
evidence by listing services that use BigTable, including Google
Earth, Analytics, Crawl, and Orkut.

Prior Work:
<What previously established results does it build upon and how>
This work builds heavily on prior research into parallel databases.
The key scalability technique of horizontal data partitioning and the
semi-structured table layout comes from the database literature.
However, this project removed many unnecessary DB features such as the
SQL interface, query optimizer and all operations besides lookups.
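The horizontal (row-range) partitioning mentioned above can be sketched briefly. In Bigtable's terms the contiguous row ranges are "tablets," each served by one tablet server; the split keys and function names below are hypothetical, chosen just to illustrate the lookup:

```python
import bisect

# Rows are kept in sorted order and split into contiguous ranges
# ("tablets"), each of which could be served by a different tablet
# server. The split keys here are made up for illustration.
SPLIT_KEYS = ["g", "n", "t"]  # boundaries chosen by the system

def tablet_for_row(row_key):
    """Map a row key to the index of the tablet whose range holds it."""
    return bisect.bisect_right(SPLIT_KEYS, row_key)
```

With these splits, "apple" falls in tablet 0, "google" in tablet 1, and "zebra" in tablet 3; because the mapping depends only on sorted row keys, clients can route requests without any central coordinator on the data path.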

Competitive work:
<How do they compare their results to related prior or contemporary work>
Distributed hash tables can be seen as competitive work. The key
difference is that DHTs such as Chord assume a high churn rate and
untrusted peers, so BigTable is able to achieve better overall
performance.
<Could you reproduce the findings? If so, how? If not, why not?>
Unlikely. The paper describes the design at a very high level without
describing specific policies for load balancing, partitioning, etc.
BigTable also relies on two other Google projects: Chubby and GFS.

<A question about the work to discuss in class>
How does BigTable performance compare to "Scalable, Distributed Data
Structures for Internet Service Construction"?

<A criticism of the work that merits discussion>
The reliability and availability of the system were not thoroughly
evaluated. Also, further scalability/performance benchmarks would be
helpful.
Paper Title: BigTable: A Distributed Storage System for Structured Data
Author: Fay Chang et al.
Date: OSDI '06

Novel Idea
BigTable is a compressed, high performance, and proprietary database system
built on Google File System (GFS), Chubby Lock Service, and a few other Google
programs. Its primary goal is to store a wide variety of data types while
maintaining the ability to scale to unprecedented size.
Main Result(s):
Google's reasons for developing its own database include scalability and
better control of performance characteristics. Microbenchmarks show that
these characteristics are met. Unfortunately, however, performance doesn't
scale completely linearly.
It is used by over 60 different Google products including Google Reader,
Google Maps, Google Book Search, "My Search History", Google Earth,
Blogger.com, Google Code hosting, Orkut, and YouTube.
Only microbenchmarks were provided, and no comparison to similar database
systems was made. The microbenchmarks measured the throughput of
reads and writes.
Prior Work:
Based on Google File System (GFS), Chubby Lock Service.
Competitive work:
Yahoo!'s Hadoop project is working on a clone of this service.
No ability to actually run experiments since the source code is not available.
Google probably wants to keep it proprietary anyway.
The authors mention that the interface to BigTable is a bit hard to get used
to. Would it be possible to extend BigTable to provide semantics similar to
those of traditional relational databases? Would this even be necessary?
I would have liked to see a comparison between an application implemented using
BigTable and one using a traditional database.
Ideas for further work:


I presented a summary of and led discussion about this paper in Ion's 294 course on Cloud Computing yesterday. Ironically the reading assigned for his class yesterday was the same set of papers assigned for today. I also blogged my notes and thoughts about BigTable for his course at http://netw0rk-king.blogspot.com/2009/02/googles-bigtable.html

Novel Idea (Describe in a sentence or two the new ideas presented in the paper):
A scalable, highly available, new take on simple semantic table storage built on top of GFS and Chubby.

Impact (What is the importance of these results.  What impact might they have on theory or practice of Computer Systems):
They have released Google AppEngine, which exposes BigTable as the primary data store. AppEngine is Google's first and only public-facing cloud computing offering.

Evidence (What reasoning, demonstration, analytical or empirical analysis did they use to establish their results):
They present an evaluation in which they run GFS and the whole BigTable stack on a set of 1,786 machines. Since they allow the choice between disk and memory storage, they naturally see a significant difference in performance (sequential reads/writes and random writes are all fairly clustered, peaking at about 1M per second with 1000-byte values).

Prior Work (What previously established results does it build upon and how):
-They reference DHTs, which provide functionality they don't need (handling variance in bandwidth, churn, etc.) while lacking the indexing, versioning, and other functionality they do want built in. They of course borrowed much from DBMS research of the last 30 years.

Competitive work (How do they compare their results to related prior or contemporary work):
-They compare themselves to C-Store, and the Log Structured Merge Tree. For C-Store, they point out that they differ significantly in their API. They claim that they perform better on writes than C-Store, though their performance evaluation shows poor performance for random writes (admittedly, a hard problem).
-More generally, DHTs such as DDS could be seen as competitors, though they address this in Related Work.

Reproducibility (Could you reproduce the findings?  If so, how?  If not, why not?):
There are MANY open source projects aiming at various points in this space. At Yahoo, a table store project built on top of HDFS was in its infancy stages while I was working there over the summer.

Question (A question about the work to discuss in class):
-Is it possible to successfully find a sweet spot between general declarative languages (SQL) and a key-value access model (i.e., DHTs)? Is it necessary? In my opinion, the Amazon S3 approach to storage is more approachable than the Google AppEngine approach exactly because S3 is simpler. They even comment on this problem within Google (where a lot of filtering has already happened in terms of engineering abilities).

Criticism (A criticism of the work that merits discussion):
-Is the interface too complex?

Ideas for further work (Did this paper give you ideas for future work, projects, or connections to other work?):
-The RAD Lab SCADS project, and the various projects it encounters (and undoubtedly cites as related work in papers), is a testament that there are many potential research topics worth exploring in the scalable data structures space.