Electrical Engineering
      and Computer Sciences

Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

Elkhound: A Fast, Practical GLR Parser Generator

Scott G. McPeak

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-02-1214
December 2002

http://www.eecs.berkeley.edu/Pubs/TechRpts/2002/CSD-02-1214.pdf

Elkhound is a parser generator based on the Generalized LR (GLR) parsing algorithm. Because it uses GLR, Elkhound can parse with any context-free grammar, including those that are ambiguous or require unbounded lookahead. Due to a novel improvement to the GLR algorithm, wherein the parser can choose between GLR and ordinary LR on a token-by-token basis, Elkhound parsers are about as fast as LALR(1) parsers on deterministic portions of the input.

Unlike existing GLR implementations, Elkhound allows the user to associate arbitrary action code with reductions, and to directly manage subtree sharing and merging. These capabilities make Elkhound adaptable to a wide range of applications and memory management strategies.

To demonstrate Elkhound's effectiveness, we used it to build a parser for C++, a language notorious for being difficult to parse. Our C++ parser is small (3500 lines), efficient, maintainable, and elegantly handles even the most troublesome constructs -- by delaying disambiguation until semantic analysis when necessary.


BibTeX citation:

@techreport{McPeak:CSD-02-1214,
    Author = {McPeak, Scott G.},
    Title = {Elkhound: A Fast, Practical GLR Parser Generator},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2002},
    Month = {Dec},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2002/6187.html},
    Number = {UCB/CSD-02-1214},
    Abstract = {Elkhound is a parser generator based on the Generalized LR (GLR) parsing algorithm. Because it uses GLR, Elkhound can parse with any context-free grammar, including those that are ambiguous or require unbounded lookahead. Due to a novel improvement to the GLR algorithm, wherein the parser can choose between GLR and ordinary LR on a token-by-token basis, Elkhound parsers are about as fast as LALR(1) parsers on deterministic portions of the input. <p>Unlike existing GLR implementations, Elkhound allows the user to associate arbitrary action code with reductions, and to directly manage subtree sharing and merging. These capabilities make Elkhound adaptable to a wide range of applications and memory management strategies. <p>To demonstrate Elkhound's effectiveness, we used it to build a parser for C++, a language notorious for being difficult to parse. Our C++ parser is small (3500 lines), efficient, maintainable, and elegantly handles even the most troublesome constructs -- by delaying disambiguation until semantic analysis when necessary.}
}

EndNote citation:

%0 Report
%A McPeak, Scott G.
%T Elkhound: A Fast, Practical GLR Parser Generator
%I EECS Department, University of California, Berkeley
%D 2002
%@ UCB/CSD-02-1214
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2002/6187.html
%F McPeak:CSD-02-1214