CS262B Advanced Topics in Computer Systems
Spring 2009

Paper Title: The Chubby lock service for loosely-coupled distributed systems.
Author: Mike Burrows
Date: 2006
Novel Idea:
Provide a distributed locking and small-data storage service as a useful way for distributed applications to coordinate.
Main Result(s):
The lock service the authors implement is able to act as a coordination and naming service, using five machines to handle tens of thousands of client processes.
Impact:
Suggests that distributed consensus services substantially above the level of Paxos libraries but not as sophisticated as a distributed database or filesystem are useful to application developers.
Evidence:
The authors implemented Chubby and had experience with it being used by many of Google's internal and external services.
Prior Work:
This uses Paxos, and is derived from distributed file systems' (usually less user-visible) session and lock handling.
Competitive work:
Other work has provided explicit locking management for some distributed data structure, but usually in concert with a heavier-weight distributed data structure than Chubby provides, like a full-featured distributed filesystem or database.
Reproducibility:
There aren't any systematic experiments here to reproduce, and I certainly do not have the environment (of user applications) to see if the runtime statistics displayed in Section 4.1 are realistic.
Question:
Is this model less error-prone for application programmers than providing a more Paxos-like interface? Is exporting a familiar parallel interface for a distributed system really a good idea?
Criticism:
Though they include valuable and rare measurements of how much use the service achieves in practice, they lack data about the latency of individual operations such as locking (how would the system perform if people used it for fine-grained locking?) and establishing a new connection, and about the actual overhead per file/lock.
Ideas for further work:
- Making programming the fail-over parts of services like Chubby less error-prone.


The Chubby Lock Service for Loosely-coupled Distributed Systems
Mike Burrows

Novel Idea: Provide a coarse-grained locking service with high
availability and reliability for distributed systems. It differs from
traditional distributed systems research in that it is a practical
implementation. Instead of attempting to provide a Paxos or virtual
synchrony library, they provide a service that is easy to use and
understand. I also liked how it scales better than DNS for a name
service, since they can keepalive a client with one message and cover
all name caches the client is using. Another thing is how he just
made up his own interface, similar to POSIX to make it easy to work
with, but threw away all the complications.

Main Result: He simply tells us what he did, why, and what happened
later. Other than some usage information, he was not trying to
convince us of anything.

Impact: The distributed system research community wasn't put on its
head, but it's a great practical system, used all over Google.

Evidence: None required.

Prior Work: Paxos, Birman's stuff, AFS (statefulness, compared to NFS)

Competitive Work: Boxwood

Reproducibility: Could probably build this, with time.

Question: Did Chubby actually use Paxos? BigTable said they did, but
it seemed like their writes just needed a quorum of replicas.

Criticism: None really. He was upfront about nothing being new, and he
kept it lively by sprinkling in nice systems-nuggets and connections to
prior work.

Ideas for further work: Mostly just more examples of the stateful vs
stateless tradeoff when it comes to scaling. Another thing was his creation of
a compare-and-swap facility. It's just a good idea to give a strong
atomic primitive for other systems to build on.
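
To illustrate why a compare-and-swap facility is a handy building block, here is a minimal sketch. It assumes a hypothetical in-memory store (TinyStore, with invented method and file names); it is not Chubby's actual API. A writer supplies the version it last read, and the write only succeeds if nobody else has updated the entry in between.

```python
import threading


class TinyStore:
    """Hypothetical in-memory store; an assumption for illustration, not Chubby's API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # name -> (version, value)

    def read(self, name):
        # Unknown names read as version 0 with no value.
        with self._lock:
            return self._data.get(name, (0, None))

    def compare_and_swap(self, name, expected_version, new_value):
        """Write new_value only if the stored version still matches."""
        with self._lock:
            version, _ = self._data.get(name, (0, None))
            if version != expected_version:
                return False  # someone else wrote first; caller must re-read
            self._data[name] = (version + 1, new_value)
            return True


store = TinyStore()
version, _ = store.read("/ls/cell/master")
# Two candidates race using the same version they both read earlier.
first = store.compare_and_swap("/ls/cell/master", version, "host-a")
second = store.compare_and_swap("/ls/cell/master", version, "host-b")
```

Only the first writer wins; the second sees a stale version and has to re-read, which is what lets other systems build election or configuration updates on top of one small atomic operation.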
 



Paper Title:     The Chubby lock service for loosely-coupled distributed systems.
Author:     Mike Burrows (Google)
Date:     OSDI 2006

Novel Idea
    This is a lock service which allows distributed systems to obtain locks
Main Result(s):
    They are able to create a lock service using keepalives which is highly available by having 5 copies running with an elected master.
Impact:
    Having a keepalive protocol and a lock service allows the creation of many applications without having to worry that some worker/lock holder died with the lock.  At Google it ended up being used mostly as a name service.
Evidence:
    They present the number of messages, failures, etc. of Chubby in the Google system.
Prior Work:
    They are based on distributed file systems, token protocols, and locking systems.
Competitive work:     Chubby provides a lock service and session service in the same mechanism while other systems have these as separate services.
Reproducibility
    No, since they are using internal Google usage patterns.
Question:
    A lock service seems like a heavyweight mechanism for locking; is this really necessary?
Criticism:
    Their client side code is huge.  What makes it so complex?
Ideas for further work:     I don't know.  The idea that you have to rely on an external service to know if parts of your system are working since they could all go down is rather interesting.


Paper Title: Chubby
Author: Mike Burrows
Date: OSDI 06

Novel Idea

Main Result(s):
They explicitly say that this is not research. I disagree; the lessons here are probably the most important research. How do you build useful software?
Impact:
Hugely influential in Google, at least.
Evidence:
I don't think they gave explicit service counts, but both MapReduce and Bigtable use it.
Prior Work:
Mostly DFS locking mechanisms.
Competitive work: Surprisingly, DNS. In retrospect, it should be obvious to use a locking mechanism for discovery.
Reproducibility
Proprietary, but well explained. This should be pretty easy to reproduce, as there's no performance data.
Question:
I don't really have one. This paper is a great snapshot of the engineering difficulties at Google. There's always interest in the future: how has this thing changed?
Criticism:
It was a little dense, but it was 16 pages of information. They organized it really well, giving the tradeoffs they made and which worked out. Just to make it easier to read, this should have been two papers. It probably wouldn't have gotten into OSDI then. However, this was way better than the Bigtable paper.
Ideas for further work: How does this apply to multicore locking? One big trick is the coarse-grained locking, which encourages applications to use less locking. Multicore locking wants fine-grained locking to reduce sharing. I feel like the difference is that we're OK with processes blocking. The model of them knowing they can't get the lock, but continuing, is good. Just signal the process when the lock is available. Again, I'm not sure if there's anything here. It's an interesting line though.


Novel Idea
Chubby provides coarse-grain locking and reliable low-volume storage for loosely coupled distributed machines connected by a high speed network. The goal of the service is to allow clients to synchronize and form consensus about global state. The design goals were reliability and availability to a large set of clients, as opposed to high throughput or capacity.
Main Result(s):
The paper describes the rationale behind Chubby, and a description of its interface and implementation, as well as how these changed as the requirements of Google services became better understood. Several use cases are examined.
Impact:
The implementation has been used successfully at Google. It is used not only for synchronization but also as a name service and configuration information repository. The problems Chubby solves are clearly ones faced by programmers in many distributed environments, where the design ideas presented here would certainly apply.
Evidence:
The main emphasis of the paper is on describing concepts employed in the system. Evidence is mainly supplied in the form of anecdotal descriptions of deployment and use in well known Google services. A snapshot of run time statistics is also reported. A qualitative comparison to the Boxwood lock server is provided.
Prior Work:
The authors state that Chubby is based on well established ideas found in the distributed filesystems literature, citing Echo, V, VMS, and AFS. The difference between Chubby and such systems is in the amount of data clients can store and the level of performance (latency/throughput) that they can expect. The authors compare to Boxwood's lock server, stating that most of the differences are in the division of services and the high level API provided by Chubby, as well as different default parameters based on different assumptions about use.
Competitive work: Boxwood has a similar service. Other datacenter environment developers have created open source systems similar to Chubby (e.g. ZooKeeper) to address similar requirements.
Reproducibility
There are no quantitative results to reproduce. The concepts described in the paper have already been reproduced by open source developers (though this work is ongoing and nontrivial).
Question:
What limitations are created by the single Master? Are they similar to the limitations created in BigTable?
Criticism:
The authors state that they optimized for reliability and availability in exchange for capacity and performance; however, this tradeoff was never quantified to my satisfaction. I would like a better understanding of the spectrum of possible designs and the spot in that space which Chubby occupies.
Ideas for further work: Improve scaling via proxies and partitioning.


Chubby:

Novel Idea:
As stated in the paper, there are no new algorithms or techniques presented in this paper.  Rather, I think what's novel is the abstraction and interface presented to clients by Chubby.  It would have been possible to give clients a more lock-oriented view, but this method allows Chubby to be more flexible and programmable, while still providing the services it set out to provide.

Main Result:
The result of this paper is a design for a distributed lock manager that can support a wide range of services and that works well for them.  In addition, this paper provides a lot of useful information about how applications will use a lock service and also, how they won't use it.

Impact:
I'm not sure this paper will have a huge impact on the CS community, if only because it doesn't contain a large amount of 'new' work.  However, I know of at least one system inspired by Chubby (Hyperspace), and I'm sure the lessons learned will influence the design of future distributed lock managers.

Evidence:
As in the Bigtable paper, Chubby benefits from having been used in production for some time.  The paper doesn't present any direct experiments, but the fact that they can support over 50 thousand clients and have been at the core of some of Google's most used services (like Bigtable) suggests that it works quite well.

Prior/Competitive Work:
In some sense, Chubby is all prior work, as it is a bringing together of existing techniques.  It relies primarily on Paxos and on work in distributed file systems. The actual API mirrors the Unix filesystem API quite closely (by design).  Boxwood is compared to Chubby in the paper, the primary difference seeming to be that Boxwood targets a more sophisticated audience that needs finer-grained locks.

To compare Chubby to other similar lock services one would need to set up controlled experiments and test how long it takes to obtain and release locks, how usable the API was (perhaps with a user study), how stable the service was, and how well it recovered from failures.

Reproducibility:
The techniques Chubby uses are all well known, so I would expect the work to be fairly easy to reproduce.  Hyperspace (the lock manager for Hypertable) has reproduced much of Chubby's functionality, which indicates this is indeed the case.

Question:
Do developers really not want fine-grained locks, or do they just make do without them since Chubby doesn't provide them?  Would there be other interesting applications we could enable by efficiently supporting fine-grained distributed locks?

Criticism:
I would like to have seen some actual performance numbers in this paper.  As it is we simply have to take it on faith that Chubby works well.

Ideas for further work:
It would be interesting to see how much of Chubby could be written in P2.  I know there's been an implementation of Paxos and some of GFS in P2 and Chubby seems like an obvious candidate for another service that could be specified declaratively.


Paper Title: The Chubby lock service for loosely-coupled distributed systems
Author: Mike Burrows
Date: OSDI 2006

Novel Idea
Coarse grained locking has completely different design parameters
Design for availability rather than performance
Main Result(s):
Used in unexpected ways as a name service
distributed locking service hides complexity from the end user, but can be abused if the user is unaware of the nature of the locking service
Impact:
Operation of other Google services depends on Chubby to elect a master copy, store metadata, serve as the root of a lookup tree, etc.

Prior Work:
Various distributed algorithms
Reproducibility
Difficult to reproduce
Question:
Are people at Google worried about Chubby being the single point of failure? It seems like services will be negatively impacted should Chubby go down.
Criticism:
N/A. Difficult to criticize an experience paper such as this one.


Novel Ideas:

- Centralized lock manager architecture for building distributed
systems. Designed to be easy to use (filesystem interface,
coarse-grained locks rather than fine-grained consensus, lock checks
not mandated, etc).



Results:

- System serves up to tens of thousands of clients at once and supports many production services.



Impact:

- Clearly this has been useful in Google. There is also an open-source Chubby-like system now (Apache ZooKeeper).



Evidence:

- Mostly anecdotes about usage of Chubby in Google.



Prior Work:

- Consensus protocols had been studied for a long time. The new
thing in Chubby is a means of exposing them to applications and a
particular point in the space of number of replicas, number of things
being locked, availability, etc.



Competitive Work:

- They provide a comparison to Boxwood, which is a storage system
project with an internal lock manager rather than a general-purpose
lock service.



Reproducibility:

- Lots of detail about the algorithms means it should be possible to build something similar.



Question:

- How do you test a system like Chubby?



Criticism:

- The author says Chubby isn't really research. I disagree - Chubby
is precisely like a lot of OS research, trying to define an interface
over some service that provides adequate performance, generality, etc.
In this sense it is also a piece of data center architecture research.
Finally, the "social research" of how well programmers understand the
concepts and how they use them is useful as well.



Ideas for Future Work:

- While Chubby makes it possible to write replicated, highly
available masters, you still need to do a lot of the replication work
that is not locking yourself. Again looking from the point of view of
data center programming primitives, is there something we could provide
that would make writing e.g. the master node in GFS or the master in
MapReduce very easy? Both of these contain replicated state; perhaps
some kind of replicated DB primitive on top of which programmers only
write a soft-state master?



Paper Title: The Chubby lock service for loosely-coupled distributed systems
Author: Mike Burrows
Date: OSDI '06

Novel Idea:
The main goal is to develop a coarse-grained locking mechanism based on
advisory locks for distributed systems, whose primary design criteria are
reliability and availability rather than performance.
Main Result(s):
There is no real experimental data in this paper. It is more a description of
the mechanisms used to design and implement the Chubby lock server. The most
concrete result is that it is used to serve as the primary lock server for
providing coarse-grained synchronization of all of Google's distributed systems
services.
Impact:
It is used by Google....
Evidence:
Throughout the paper they talk about how using this lock server eases
developer effort above all. This is reminiscent of the arguments used for
promoting MapReduce.
Prior Work:
All of Chubby's services are based on existing work. These ideas are
primarily borrowed from systems such as Echo, the V system, VMS, Boxwood, and
Paxos.
Competitive work:
I would be surprised if other cloud computing giants like Amazon and Yahoo!
don't have similar systems in place.
Reproducibility:
There are no real findings. It may be possible, however, to reimplement the
Chubby capabilities given the descriptions provided in the paper, although I
suspect not everything about its inner workings has been revealed.
Question:
How often do we all as developers really consider availability when
designing/implementing our own algorithms?
Criticism:
I'd like to see a comparison of some service implemented using Chubby with a
comparable service implemented using a traditional consensus protocol.
Ideas for further work:
Coarse-grained locking in multicore for communication among processor cores through shared memory.



Novel Idea:  The authors created a lock service which consists of a few (~5) servers which use the Paxos consensus protocol to elect a leader, which then responds to a large number of client requests.  Client-side caching and small files stored at the master enable more clients to be served.
Main Result(s):  The authors have deployed Chubby within Google, and it is widely used as Google's internal name service; its master-election capability is used by other services within Google such as GFS, MapReduce, and BigTable.
Impact:  This technology is widely used. 
Evidence:  The authors provided little evidence in this paper beyond their explanations of how it was designed and some information about how it is used.  The only numerical data they provided was a table with some statistics about a Chubby cell's use during a ten-minute period.
Prior Work:  Chubby makes use of the Paxos consensus protocol and the idea of leases, which are granted to masters.  Once the leases expire, if a master has died, a new master can be elected with minimal impact to the clients due to the coarse granularity of the locks issued.
Competitive work:  Chubby is similar to distributed file systems; it differs in that it sacrifices performance for consistency, availability, and reliability.  Since it serves small files and coarse-grained locks, it can afford to provide lower performance.  It is different from another lock server called Boxwood mostly in the simplicity of the interface, which makes it accessible to developers of varied skill levels.
Reproducibility:  This work would be difficult to reproduce since they understandably don't provide many details about the implementation.  Also, to do so at a large scale would present many challenges.
Question:  They mentioned that users sometimes abused Chubby, storing files larger than it was intended to serve.  How could these files automatically be moved to a more appropriate service such as GFS?
Criticism:  It would have been more compelling if the authors had made it clear earlier in the paper some of the most popular uses of Chubby within Google as well as the differences between Chubby and regular DFSs.
Ideas for further work:  I'd be interested to try using machine learning to automatically identify the best location for an application's files (GFS vs. Chubby) and then move them.  This is an interesting case where even though the applications are trusted since they are all internal, they do not always know what is best for them, so some oversight would be helpful.
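
The lease behavior described under Prior Work above can be illustrated with a small sketch. It is a single-process stand-in under stated assumptions: the class LeaseTable, its method names, and the 10-second lease length are all hypothetical, and the real system agrees on the master through Paxos among replicas rather than through one shared object. The point is only that a master which keeps renewing stays master, while one that stops renewing is replaced once its lease runs out.

```python
class LeaseTable:
    """Hypothetical single-process stand-in for a replicated master lease."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate, now):
        """Grant or renew the lease if it is free, expired, or already ours."""
        if self.holder in (None, candidate) or now >= self.expires_at:
            self.holder = candidate
            self.expires_at = now + self.lease_seconds
            return True
        return False

    def master(self, now):
        # Nobody is master once the lease has run out.
        return self.holder if now < self.expires_at else None


table = LeaseTable(lease_seconds=10.0)  # lease length is an arbitrary assumption
a_wins = table.try_acquire("replica-a", now=0.0)
b_blocked = table.try_acquire("replica-b", now=5.0)       # lease still held by a
a_renews = table.try_acquire("replica-a", now=8.0)        # extends the lease
b_takes_over = table.try_acquire("replica-b", now=30.0)   # a stopped renewing
```

Time is passed in explicitly so the sketch stays deterministic; a real implementation would also have to account for clock drift between machines.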



Paper Title:
The Chubby lock service for loosely-coupled distributed systems
Author:
Burrows
Novel Idea:
When developers are creating new systems they usually do not become concerned with issues such as availability until the system becomes popular. A lock service that is integrated with a file system can mitigate the challenges experienced by developers at this stage.
Main Results:
One Chubby cell can handle up to 90,000 clients simultaneously. This service has been very useful to a variety of Google’s products and has even replaced Google’s DNS servers that were previously used for internal name resolution.
Impact:
This work shows how a file system can be combined elegantly with a lock system in order to allow clients acquiring locks to effectively communicate. It also provides an important example of the value of reducing the complexity experienced by developers.
Evidence:
Statistics are provided about a Chubby cell. These show that Chubby can in fact scale to a large number of clients (if designed effectively) because 93% of the RPCs are keep alives. We also see that reads are significantly more common than writes, which supports the ideas of client caches and cache invalidations.
Prior Work:
Chubby borrows many of its ideas from distributed file systems, such as caching. It also utilizes the idea that many objects may be represented as a file, which is an idea present in Unix. Lastly, the idea of providing a lock service is present in VMS.
Competitive Work:
Chubby competes most directly with Boxwood. While Chubby is designed as a monolithic system with a simple interface that is accessible to a wide range of developers, Boxwood focuses on providing a toolkit that is only suitable for more experienced developers. Furthermore, the parameters set in Chubby and Boxwood vary dramatically because of their differing assumptions about things such as service uptime.
Reproducibility:
This work would be very challenging to reproduce. Very little information is provided about the environment under which the Chubby cell statistics were gathered. Furthermore, it would be challenging to obtain access to an appropriate datacenter and to obtain access to the code used in Chubby.
Question:
Can we learn some things from Chubby in improving our DNS system or are all of the advantages that are gained the result of operating in the cluster environment?
Criticism:
The experimental results that are provided do not seem substantial enough to validate this work. For example, we do not see how the performance of Chubby scales as the number of clients increases.
Ideas for Future Work:
The parameters used for Boxwood and Chubby vary dramatically. I would like to perform a characterization of the typical cluster environment so that appropriate parameters could be chosen for these types of systems.




Novel Idea (Describe in a sentence or two the new ideas presented in the paper):
-A highly reliable and scalable (90,000+ clients) coarse-grained lock service that is easy to use (file-system-like interface).

Impact (What is the importance of these results.  What impact might they have on theory or practice of Computer Systems):
-Describing a very popular implementation of Paxos, this paper discusses scale that academics at the time were unable to achieve. This also inspired the open source project ZooKeeper that has many of the same goals. The combination of the publication of this Google paper with the GFS and BigTable papers established Google's reputation in the academic systems realm.

Evidence (What reasoning, demonstration, analytical or empirical analysis did they use to establish their results):
-The paper has a very anecdotal feeling, presenting many engineering lessons learned. It lacks a substantive evaluation section, in which they could have presented an actual application making use of Chubby, measured various service times and graphed them.

Prior Work (What previously established results does it build upon and how):
-They built on Paxos and a whole slew of other distributed systems work including ordering of messages in distributed systems, and the file system like interface they provide to Chubby.

Competitive work (How do they compare their results to related prior or contemporary work):
-They say, rather unconvincingly, "The large number of file systems and lock servers described in the literature prevents an exhaustive comparison, so we provide details on one." They subsequently compare Chubby to Boxwood's lock server, presenting a convincing argument that they are significantly different. I would have liked to see a short list of the "large number of lock servers described in the literature" especially if they were only planning on addressing a single one of those.

Reproducibility (Could you reproduce the findings?  If so, how?  If not, why not?):
Chubby was built to satisfy a very practical business demand internal to Google. The existence of ZooKeeper, which is used heavily inside of Yahoo and has similar goals to Chubby, is evidence that the results are reproducible.

Question (A question about the work to discuss in class):
Even inside of Google, where they seemingly have complete control over development of each project, the Chubby dev. team still had to choose a point in the tradeoff space between usability/retrofit-ability and advanced features. This theme shows up several times in their discussion, such as when they discuss sequencers for ordering guarantees. Given that they aim for a simple design, and thus cannot please everyone (another tradeoff space), would the software that uses Chubby have benefited in stability or correctness if they had chosen to present fewer but arguably more "correct" options to programmers, potentially at the expense of adoptability?

Criticism (A criticism of the work that merits discussion):
-They don't provide mandatory locks, only advisory. Couldn't they have provided both for the developer to choose from?
-The table in 4.1 presenting statistics is vague about what their percentage numbers mean for RPC calls. Is it a percentage of RPC throughput or of the number of RPC calls?
-Is the latency associated with the global cell (inter-datacenter communication) obvious to developers or hidden behind an abstraction? Similar to the anti-RPC argument.
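
To make the advisory-lock and sequencer discussion above concrete, here is a minimal sketch of how a protected server might reject a request from a client whose lock has since changed hands. All class, method, and file names here (Sequencer, LockService, FileServer, /ls/cell/db-master) are hypothetical, and the toy lock service is an assumption for illustration rather than Chubby's actual interface. The paper describes a sequencer as carrying the lock name, mode, and generation number; the sketch keeps only the name and generation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequencer:
    lock_name: str
    generation: int


class LockService:
    """Hypothetical toy lock service; names are illustrative only."""

    def __init__(self):
        self._generation = {}  # lock name -> latest generation number
        self._holder = {}      # lock name -> current holder or None

    def acquire(self, lock_name, client):
        if self._holder.get(lock_name) is not None:
            return None  # lock is held; advisory, so callers must check
        gen = self._generation.get(lock_name, 0) + 1
        self._generation[lock_name] = gen
        self._holder[lock_name] = client
        return Sequencer(lock_name, gen)

    def release(self, lock_name):
        self._holder[lock_name] = None

    def is_current(self, seq):
        return (self._holder.get(seq.lock_name) is not None
                and self._generation.get(seq.lock_name) == seq.generation)


class FileServer:
    """Protected resource that rejects requests carrying a stale sequencer."""

    def __init__(self, locks):
        self.locks = locks
        self.log = []

    def write(self, seq, data):
        if not self.locks.is_current(seq):
            return False
        self.log.append(data)
        return True


locks = LockService()
server = FileServer(locks)
old = locks.acquire("/ls/cell/db-master", "client-a")
locks.release("/ls/cell/db-master")  # e.g. client-a's session expired
new = locks.acquire("/ls/cell/db-master", "client-b")
stale_ok = server.write(old, "late write from a")
fresh_ok = server.write(new, "write from b")
```

Because the lock is advisory, nothing stops client-a from sending its late write; the protection comes from the server checking the generation number, which is the kind of opt-in ordering guarantee the question above is asking about.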

Ideas for further work (Did this paper give you ideas for future work, projects, or connections to other work?):
-Over the summer, I helped build a monitoring framework at Yahoo called Chukwa. During the engineering phase, we discussed potential uses of ZooKeeper, but decided not to use it because of the overhead of learning yet another non-native framework. Maybe that design decision should be revisited, especially in light of the small, collaborative, clustered nature of the Chukwa data collector nodes.