Electrical Engineering
      and Computer Sciences

Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

Practical Shape Analysis

Bill McCloskey

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2010-57
May 10, 2010

http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-57.pdf

Shape analysis is a program analysis technique used to prove that imperative programs using manual memory management will not crash. In the past, shape analysis has been applied to data structures like linked lists and binary trees. It has also been used on simplified versions of Windows device drivers. We describe techniques that allow us to apply shape analysis to data structures that occur commonly in systems code. These data structures often use arrays, hash tables, C strings, and buffers of a known size. Sometimes, memory in these data structures is managed by manual reference counting. Analyzing such code is difficult or impossible with existing shape analyses. Most difficult of all, many data structures use several of these patterns at the same time, such as a hash table pointing to reference counted objects through which a doubly linked list threads. We describe an analysis capable of handling these data structures easily and efficiently. Our technique uses abstract interpretation over the combination of two abstract domains. One, based on three-valued logic, is used for analyzing the heap. The other domain reasons about integers and set cardinality. The key feature of the combined domain is that quantified facts can be shared between the integer and heap domains. The precision we achieve is significantly greater than if either domain were used independently. Besides improvements in precision, we also describe changes that make both domains more scalable and efficient. We present the results of experiments analyzing the cache data structure of the thttpd web server, which uses a hash table, linked lists, and reference counting in a single data structure. We successfully prove the absence of memory errors in about two minutes.

Advisor: Eric Brewer


BibTeX citation:

@phdthesis{McCloskey:EECS-2010-57,
    Author = {McCloskey, Bill},
    Title = {Practical Shape Analysis},
    School = {EECS Department, University of California, Berkeley},
    Year = {2010},
    Month = {May},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-57.html},
    Number = {UCB/EECS-2010-57},
    Abstract = {Shape analysis is a program analysis technique used to prove that imperative programs using manual memory management will not crash. In the past, shape analysis has been applied to data structures like linked lists and binary trees. It has also been used on simplified versions of Windows device drivers.

We describe techniques that allow us to apply shape analysis to data structures that occur commonly in systems code. These data structures often use arrays, hash tables, C strings, and buffers of a known size. Sometimes, memory in these data structures is managed by manual reference counting. Analyzing such code is difficult or impossible
with existing shape analyses. Most difficult of all, many data structures use several of these patterns at the same time, such as a hash table pointing to reference counted objects through which a doubly linked list threads.

We describe an analysis capable of handling these data structures easily and efficiently. Our technique uses abstract interpretation over the combination of two abstract domains.  One, based on three-valued logic, is used for analyzing the heap. The other domain reasons about integers and set cardinality. The key feature of the combined domain is that quantified facts can be shared between the integer and heap domains. The precision we achieve is significantly greater than if either domain were used independently.

Besides improvements in precision, we also describe changes that make both domains more scalable and efficient. We present the results of experiments analyzing the cache data structure of the thttpd web server, which uses a hash table, linked lists, and reference counting in a single data structure. We successfully prove the absence of
memory errors in about two minutes.}
}

EndNote citation:

%0 Thesis
%A McCloskey, Bill
%T Practical Shape Analysis
%I EECS Department, University of California, Berkeley
%D 2010
%8 May 10
%@ UCB/EECS-2010-57
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-57.html
%F McCloskey:EECS-2010-57