The Zebra Striped Network File System

John Henry Hartman

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-95-867
December 1994

This dissertation presents a new network file system, called Zebra, that provides high performance file access and is highly available. Zebra stripes file data across its servers, so that multiple servers may participate in a file access and the file access bandwidth therefore scales with the number of servers. Zebra is also highly available because it stores parity information in the style of a RAID [Patterson88] disk array; this increases storage costs slightly but allows the system to continue operation even while a single storage server is unavailable.

Zebra differs from other striped network file systems in the way it stripes data. Instead of striping individual files (file-based striping), Zebra forms the data written by each client into an append-only log, which is then striped across the servers. In addition, the parity of each log is computed and stored as the log is striped. I call this form of striping log-based striping, and its operation is similar to that of a log-structured file system (LFS) [Rosenblum91]. Zebra can be thought of as a log-structured network file system: whereas LFS uses a log abstraction at the interface between a file server and its disks, Zebra uses a log abstraction at the interface between a client and its servers. Striping logs, instead of files, simplifies Zebra's parity mechanism, reduces parity overhead, and allows clients to batch together small writes.
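
The log-based striping idea described above can be illustrated with a minimal Python sketch. It is not taken from the Zebra prototype: the function names (`xor_blocks`, `stripe_log`, `rebuild_fragment`), the fixed fragment size, the zero padding of the final stripe, and the use of one XOR parity fragment per stripe are all assumptions made for illustration. It shows a client's append-only log being cut into stripes, a parity fragment being computed per stripe, and a lost fragment being rebuilt from the survivors.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_log(log, num_data_servers, fragment_size):
    """Cut a client's append-only log into stripes.

    Each stripe holds `num_data_servers` data fragments plus one XOR parity
    fragment. Zero-padding the last stripe is an assumption of this sketch.
    """
    stripe_bytes = num_data_servers * fragment_size
    stripes = []
    for start in range(0, len(log), stripe_bytes):
        chunk = log[start:start + stripe_bytes].ljust(stripe_bytes, b"\0")
        fragments = [chunk[i:i + fragment_size]
                     for i in range(0, stripe_bytes, fragment_size)]
        stripes.append((fragments, xor_blocks(fragments)))
    return stripes


def rebuild_fragment(fragments, parity, lost_index):
    """Recover the fragment at `lost_index` from the survivors and the parity."""
    survivors = [f for i, f in enumerate(fragments) if i != lost_index]
    return xor_blocks(survivors + [parity])


# Small writes from one client are batched into a single log before striping.
small_writes = [b"ab", b"cde", b"fghij"]
log = b"".join(small_writes)
stripes = stripe_log(log, num_data_servers=3, fragment_size=2)

# Pretend the server holding fragment 1 of the first stripe is unavailable.
fragments, parity = stripes[0]
recovered = rebuild_fragment(fragments, parity, lost_index=1)
```

Because parity is computed over the client's own log rather than over individual files, each stripe's parity depends only on data that one client already holds, which is one way to read the claim that striping logs simplifies the parity mechanism.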

I have built a prototype implementation of Zebra in the Sprite operating system [Ousterhout88]. Measurements of the prototype show that Zebra provides 4-5 times the throughput of the standard Sprite file system or NFS for large files, and a 15-300% improvement for writing small files. The utilizations of the system resources indicate that the prototype can scale to support a maximum aggregate write bandwidth of 20 Mbytes/second, or about ten clients writing at their maximum rate.

Advisor: John K. Ousterhout


BibTeX citation:

@phdthesis{Hartman:CSD-95-867,
    Author = {Hartman, John Henry},
    Title = {The Zebra Striped Network File System},
    School = {EECS Department, University of California, Berkeley},
    Year = {1994},
    Month = {Dec},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/1994/5596.html},
    Number = {UCB/CSD-95-867},
    Abstract = {This dissertation presents a new network file system, called Zebra, that provides high performance file access and is highly available. Zebra stripes file data across its servers, so that multiple servers may participate in a file access and the file access bandwidth therefore scales with the number of servers. Zebra is also highly available because it stores parity information in the style of a RAID [Patterson88] disk array; this increases storage costs slightly but allows the system to continue operation even while a single storage server is unavailable. Zebra differs from other striped network file systems in the way it stripes data. Instead of striping individual files (file-based striping), Zebra forms the data written by each client into an append-only log, which is then striped across the servers. In addition, the parity of each log is computed and stored as the log is striped. I call this form of striping log-based striping, and its operation is similar to that of a log-structured file system (LFS) [Rosenblum91]. Zebra can be thought of as a log-structured network file system: whereas LFS uses a log abstraction at the interface between a file server and its disks, Zebra uses a log abstraction at the interface between a client and its servers. Striping logs, instead of files, simplifies Zebra's parity mechanism, reduces parity overhead, and allows clients to batch together small writes. I have built a prototype implementation of Zebra in the Sprite operating system [Ousterhout88]. Measurements of the prototype show that Zebra provides 4-5 times the throughput of the standard Sprite file system or NFS for large files, and a 15-300% improvement for writing small files. The utilizations of the system resources indicate that the prototype can scale to support a maximum aggregate write bandwidth of 20 Mbytes/second, or about ten clients writing at their maximum rate.}
}

EndNote citation:

%0 Thesis
%A Hartman, John Henry
%T The Zebra Striped Network File System
%I EECS Department, University of California, Berkeley
%D 1994
%@ UCB/CSD-95-867
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/1994/5596.html
%F Hartman:CSD-95-867