Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

The Zebra Striped Network File System

John Henry Hartman

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-95-867
December 1994

This dissertation presents a new network file system, called Zebra, that provides high performance file access and is highly available. Zebra stripes file data across its servers, so that multiple servers may participate in a file access and the file access bandwidth therefore scales with the number of servers. Zebra is also highly available because it stores parity information in the style of a RAID [Patterson88] disk array; this increases storage costs slightly but allows the system to continue operation even while a single storage server is unavailable.

Zebra differs from other striped network file systems in the way it stripes data. Instead of striping individual files (file-based striping), Zebra forms the data written by each client into an append-only log, which is then striped across the servers. In addition, the parity of each log is computed and stored as the log is striped. I call this form of striping log-based striping, and its operation is similar to that of a log-structured file system (LFS) [Rosenblum91]. Zebra can be thought of as a log-structured network file system: whereas LFS uses a log abstraction at the interface between a file server and its disks, Zebra uses a log abstraction at the interface between a client and its servers. Striping logs, instead of files, simplifies Zebra's parity mechanism, reduces parity overhead, and allows clients to batch together small writes.
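
The log-based striping described above can be illustrated with a minimal sketch. It is not the Zebra prototype's code: the function names, the fragment size, and the number of servers are assumptions chosen for illustration, and details such as where each parity fragment is placed are left out. A client's writes are batched into one append-only log, the log is cut into stripes of equal-sized fragments (one per data server), and each stripe gets one XOR parity fragment, so any single lost fragment can be rebuilt from the rest.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_log(log: bytes, num_data_servers: int, fragment_size: int):
    """Cut a client's log into stripes (hypothetical helper).

    Each stripe is a (fragments, parity) pair: num_data_servers data
    fragments plus one parity fragment that is the XOR of all of them.
    The final stripe is zero-padded to a full stripe.
    """
    stripe_bytes = num_data_servers * fragment_size
    stripes = []
    for start in range(0, len(log), stripe_bytes):
        chunk = log[start:start + stripe_bytes].ljust(stripe_bytes, b"\0")
        fragments = [chunk[i:i + fragment_size]
                     for i in range(0, stripe_bytes, fragment_size)]
        parity = reduce(xor_bytes, fragments)
        stripes.append((fragments, parity))
    return stripes


def reconstruct(fragments, parity, lost_index: int) -> bytes:
    """Rebuild the fragment at lost_index from the survivors and the parity."""
    survivors = [f for i, f in enumerate(fragments) if i != lost_index]
    return reduce(xor_bytes, survivors, parity)


# Small writes are appended to one log before striping, so parity is
# computed once per stripe rather than once per small file.
log = b"".join([b"small write A;", b"small write B;", b"one larger write follows here."])
stripes = stripe_log(log, num_data_servers=3, fragment_size=8)

# Simulate the loss of server 1 and recover its fragment in every stripe.
for fragments, parity in stripes:
    assert reconstruct(fragments, parity, lost_index=1) == fragments[1]
```

With three data servers in this sketch, parity adds one fragment for every three data fragments; more data servers per stripe would lower that storage overhead proportionally.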

I have built a prototype implementation of Zebra in the Sprite operating system [Ousterhout88]. Measurements of the prototype show that Zebra provides 4-5 times the throughput of the standard Sprite file system or NFS for large files, and a 15-300% improvement for writing small files. The utilizations of the system resources indicate that the prototype can scale to support a maximum aggregate write bandwidth of 20 Mbytes/second, or about ten clients writing at their maximum rate.

Advisor: John K. Ousterhout


BibTeX citation:

@phdthesis{Hartman:CSD-95-867,
    Author = {Hartman, John Henry},
    Title = {The Zebra Striped Network File System},
    School = {EECS Department, University of California, Berkeley},
    Year = {1994},
    Month = {Dec},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/1994/5596.html},
    Number = {UCB/CSD-95-867},
    Abstract = {This dissertation presents a new network file system, called Zebra, that provides high performance file access and is highly available. Zebra stripes file data across its servers, so that multiple servers may participate in a file access and the file access bandwidth therefore scales with the number of servers. Zebra is also highly available because it stores parity information in the style of a RAID [Patterson88] disk array; this increases storage costs slightly but allows the system to continue operation even while a single storage server is unavailable.  <p>  Zebra differs from other striped network file systems in the way it stripes data.  Instead of striping individual files (file-based striping), Zebra forms the data written by each client into an append-only log, which is then striped across the servers.  In addition, the parity of each log is computed and stored as the log is striped.  I call this form of striping log-based striping, and its operation is similar to that of a log-structured file system (LFS) [Rosenblum91].  Zebra can be thought of as a log-structured network file system: whereas LFS uses a log abstraction at the interface between a file server and its disks, Zebra uses a log abstraction at the interface between a client and its servers. Striping logs, instead of files, simplifies Zebra's parity mechanism, reduces parity overhead, and allows clients to batch together small writes.  <p>  I have built a prototype implementation of Zebra in the Sprite operating system [Ousterhout88].  Measurements of the prototype show that Zebra provides 4-5 times the throughput of the standard Sprite file system or NFS for large files, and a 15-300% improvement for writing small files.  The utilizations of the system resources indicate that the prototype can scale to support a maximum aggregate write bandwidth of 20 Mbytes/second, or about ten clients writing at their maximum rate.}
}

EndNote citation:

%0 Thesis
%A Hartman, John Henry
%T The Zebra Striped Network File System
%I EECS Department, University of California, Berkeley
%D 1994
%@ UCB/CSD-95-867
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/1994/5596.html
%F Hartman:CSD-95-867