## Programming Languages Questions

### (PL)

```
(Fall 2015 - Necula and Sen):

Q1: Axiomatic semantics for a simple imperative language

* write the syntax for an imperative language with integer and boolean
expressions, and assignment, conditionals, while loops

* write the syntax for the partial and total correctness Hoare triples.

* write the syntax for precondition and postcondition predicates

* write the formal definition of the total and partial correctness, using
either denotational or operational semantics

* write the axiomatic semantics rules for the statements in the language
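```

For the last bullet, a sketch of two standard partial-correctness rules (one possible answer, with P as the loop invariant):

```
      {P ∧ b} c {P}
-----------------------------  (while, partial correctness)
{P} while b do c {P ∧ ¬b}

{P[e/x]} x := e {P}            (assignment)
```

```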

Q2: automatic memory management/garbage collection

* enumerate some algorithms for automatic memory management

* discuss reference counting, what it is, how it is done, what drawbacks it has

* discuss mark-and-sweep, what it is, how it is done

* what auxiliary data structure may be needed for the mark phase, how much memory would
this take, and how can you do it while using a constant amount of additional memory

* discuss copying garbage collection, what it is, how it is done

* which of mark-and-sweep and copying garbage collection can be used for C and C++, and
under what conditions

* can you have a memory leak in a program written in a language with garbage collection?

(Fall 2012 - Bodik and Necula):

Problem 1:

This problem concerns the definition of an abstract interpretation
over the domain of intervals. The abstract interpretation analysis
will have to compute at each program point of an imperative program an
interval abstraction of the form [L .. H] (or, L <= x <= H) for each
variable in the program.

a) Define the abstract domain with all its aspects.

b) Define a concretization and an abstraction function. How does
the concretization function relate to the ordering relation in your
domain? What relationship exists between the concretization and the
abstraction function?

Consider the following simple imperative language:

```
e ::= n | x | e1 + e2 | e1 - e2
c ::= skip | x := e | c1; c2 | if e >= 0 then c1 else c2
```

c) Define the abstract interpretation for the conditional command.
State the soundness property.

d) Consider the program "if x - 2 >= 0 then x = x + 1 else x = 3".
What refinements do you need to make to your abstract interpretation
to be able to infer that x >= 3 at the end of that command?

Problem 2:

(a) Define datarace.

(b) Is a datarace a static or a dynamic property, and what do "static"
vs. "dynamic" mean in the context of datarace detection?

(c) What is the relationship of the datarace to other nondeterministic
conditions in concurrent shared memory programs?  Specifically, what
(harmless) nondeterminism does datarace intentionally not model? What
(harmful) nondeterminism conditions datarace fails to model?

(d) Name key program analysis techniques for datarace detection.  Are
these techniques static or dynamic?  Can they be either, i.e., is the
datarace property orthogonal to whether we are analyzing a program
or an execution?  (Also, what is an execution in a concurrent
program?)

(e) Define the lockset-based datarace detection technique.  Can we
think of the lockset property as an abstract domain (as in abstract
interpretation)?  If yes, what is the concrete semantics that the
lockset abstraction approximates?  Does lockset have false positives?
If yes, give an example of a detected datarace that is not a real
datarace.  Does lockset have false negatives?  Again, give an example.
Finally, does dynamic lockset analysis generalize its analysis of a
single execution into properties of similar executions?

(f) Repeat (e) for happens-before detection.

(g) Implementation.  Which of lockset / hb is more efficient to
implement? Can you suggest a strategy for increasing the efficiency
of the more expensive detector?  What is the effect of your strategy
on false positives and false negatives?

(Spring 2012 - Bodik and Hilfinger):

What is a continuation?  Pick any two flavors of "general" continuations
(ie not exceptions) and define their semantics.  (These could be call-cc,
reset/shift, coroutines.)  Show example programs using these continuations.

Variations of continuation constructs:

What is the rationale for the spectrum of continuations?

What program concept does a continuation reify?  How is a continuation represented?

How many times can a continuation be invoked, and how does the choice
influence implementation efficiency?

Give three examples using a continuation construct X to implement
language construct Y, eg exceptions with call/cc, iterators with
coroutines, regex with coroutines. Implement a lazy iterator for a
recursive tree traversal with a coroutine.  Consider the impact of
semantic differences between Lua and Python coroutines/generators
on programmers and on language implementers.
```
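The lazy tree iterator asked for above is a few lines with Python generators (one-shot, asymmetric coroutines). One semantic difference the question targets: Lua's coroutines can yield across nested calls, while a Python generator must re-yield with `yield from` at every recursion level:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def inorder(node):
    """Lazy in-order traversal: each next() runs only until the next yield."""
    if node is not None:
        yield from inorder(node.left)   # must delegate explicitly at each level
        yield node.value
        yield from inorder(node.right)

tree = Node(2, Node(1), Node(3))
it = inorder(tree)
first = next(it)          # traversal is suspended after the first element
values = [first] + list(it)
```
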
```
(Fall 2011 - Necula and Sen):

Questions:

1.  * Write the syntactic rules for typed lambda calculus with integers.
* Add exceptions to this language, with the following requirements:
** The exceptions have a name and carry a value
** Exceptions are first-class values
** The program can declare new exception names with static
scoping, like variable names
** The exception handlers can specify the name of the
exceptions to be handled
** Maintain type soundness

* Show the syntax of the extended language

* Show an example of an expression that uses all the new
constructs and explain informally how you type check and run the
expression. You can assume that you have basic types and
operators (integers, arithmetic, booleans, comparisons).

Compare this example with a similar one written in Java.

* Show the typing rules

* Show the operational semantics

2. * Describe static and dynamic scoping using examples.
Scoping both in terms of implementation and usability.  Is is
possible to compile dynamically scoped languages efficiently?

```

```(Spring 2011 - Hilfinger & Necula):
"1. How might one use "reflection" in a programming language?

2. What kind of overheads can one expect from the use of reflection (as
compared to conventional accesses to fields and methods)?  What might
a compiler/runtime implementor do to improve performance of reflection
mechanisms?

3. Consider a simple language with the following type system:

T ::= string | { f1 : T1, ..., fn : Tn } | object

where f1-fn are field names and object is the supertype of all other
types. Assume that T list is an abbreviation for the type { car : T, cdr : T list }.
Consider that the language runtime system provides the following reflection functions:

gfs : returns the list of field names for a value. The function
is not defined for string.
gf : given an object and a field name returns the value of the field.
isString : given an object returns true iff the object is of type string.

3a. Write the type of these functions.

3b. How should gf behave when passed a nonexistent field name?

3c. Write a pretty printing function using these functions.

3d. What exceptions may your function throw, according to a
standard type checker?

3e. How can you enrich the type system to enable the type checker
to verify the absence of exceptions in your function?"
```
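Question 3c can be approximated in Python, where `vars`/`getattr` play the role of the hypothetical `gfs`/`gf` and `isinstance(v, str)` plays `isString` (this is a sketch by analogy, not the exam's language):

```python
def gfs(v):
    """Field names of a record value (undefined for strings, as specified)."""
    return sorted(vars(v).keys())

def gf(v, f):
    """Field access by name; raising on a nonexistent field is one
    reasonable answer to 3b (getattr raises AttributeError)."""
    return getattr(v, f)

def pretty(v):
    if isinstance(v, str):            # the isString test
        return repr(v)
    fields = ", ".join(f"{f}: {pretty(gf(v, f))}" for f in gfs(v))
    return "{" + fields + "}"

class Rec:
    """An ad-hoc record: fields are whatever keywords it was built with."""
    def __init__(self, **kw):
        vars(self).update(kw)

p = pretty(Rec(car="a", cdr=Rec(car="b", cdr="nil")))
```

For 3d/3e: a standard checker would admit the field-access failure; enriching the types (e.g. typing gf so the field name must be drawn from the record's row, as in row polymorphism) lets the checker rule it out.
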

```(Fall 2010 - Bodik & Sen):

1. What is a data race?  Why should multithreaded programs avoid data races?

2. The following sequential program finds a minimum cost solution:

best = infinity
for w in queue:
    cost = compute_cost(w)
    if cost < best:
        best = cost
        best_soln = w

Parallelize this code.

3.

best = infinity
for w in queue:
    if lower_bnd(w) >= best:
        continue
    cost = compute_cost(w)
    if cost < best:
        best = cost
        best_soln = w

Note 1. compute_cost is expensive call
Note 2. lower_bnd is inexpensive call
Note 3. lower_bnd(w) <= compute_cost(w)

Parallelize this code.

The second question asked about continuations.  What is a continuation? What language constructs
can be encoded with (are special case of) continuations? What is continuation-passing style?
What are the uses and benefits of CPS?  Transform a program into CPS form.  Translate an exception
construct into lambda calculus in CPS form."
```
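The CPS question above can be sketched in Python: in continuation-passing style every function receives a success continuation k, and adding a failure continuation h encodes exceptions — `raise` is just a call to h, and installing a handler means passing a new h (names are mine):

```python
def div_cps(x, y, k, h):
    """CPS division: call k with the result, or h on division by zero."""
    if y == 0:
        h("divide by zero")    # 'raise': transfer control to the handler
    else:
        k(x / y)

def safe_div(x, y):
    """The direct-style 'try: x/y except: -1', written by wiring up
    the two continuations explicitly."""
    result = []
    div_cps(x, y,
            k=lambda v: result.append(v),      # normal continuation
            h=lambda msg: result.append(-1))   # handler continuation
    return result[0]

ok = safe_div(10, 2)
caught = safe_div(1, 0)
```

This is the essence of translating an exception construct into lambda calculus in CPS form: no control-flow machinery beyond function calls is needed.
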

```(Fall 2008 - Hilfinger & Sen):
"1. This question concerns the differences between static and dynamic
typing in programming languages.

a. Define what these mean.   Illustrate the difference.
b. Methodology: what methodological advantages are claimed for
statically typed languages?  For dynamically typed ones?
c. Formal semantics:  Compare and contrast the formal semantic
definitions of statically and dynamically typed languages.
d. Implementation:  The compiler presumably knows more about
statically typed programs.  How can it use that information?
What analyses might it perform on dynamically typed programs to
get some of this information?

2. Assume that we have a simple program whose syntax is

Prog ::= L: START; S*

S ::= L: Var = C

S ::= L: Var = Var op Var

op ::= + | - | *

S ::= L: if (Var cmp Var) goto L

cmp ::= > | < | <= | >= | == | !=

S ::= L: ERROR

S ::= L: END

Assume we have a set of N predicates P1, P2, P3, ..., PN over program
variables.

Assume [P] is the set of program states that satisfy P.

What is [P1 \/ P2]?

What is [P1 /\ P2]?

What is [~P]?

What is P1 -> P2 in terms of [P1] and [P2]?

When do we say that a formula P1 is weaker than P2?

What is the weakest precondition of a formula P and a statement S?

How do you compute the weakest precondition of a statement of the form
WP(P, x = e)?

How do you compute the weakest precondition of a statement of the form
WP(P, assume [x1 == x2])?

How do you compute the weakest precondition of a statement of the form
WP(P, S1; S2)?

What is the symbolic path constraint of a program execution?

Given an execution path, how can you compute the symbolic path
constraint using weakest precondition computation?

What is the strongest postcondition?

Assume that the program states are abstracted by the set of predicates

P1, P2, ..., PN.

Given an abstract state F, how do you compute the state resulting from
the execution of S using weakest precondition computation?

How can you use symbolic execution to compute strongest post condition?

Given a program, how can you compute a predicate abstraction?  Is it
finite?  What is the complexity of a predicate abstraction?  How can
it be used to check program correctness?"
```
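The weakest-precondition questions above can be sketched by representing predicates as functions from a state (a dict) to bool; then WP(x := e, P) is substitution P[e/x], WP(assume b, P) is implication, and sequencing is just nesting (a toy semantic encoding of my own, not a syntactic WP engine):

```python
# Predicates: state -> bool.  Expressions: state -> value.

def wp_assign(x, e, P):
    """WP(x := e, P) = P[e/x]: P must hold of the updated state."""
    return lambda s: P({**s, x: e(s)})

def wp_assume(b, P):
    """WP(assume b, P) = b -> P (vacuously true when b fails)."""
    return lambda s: (not b(s)) or P(s)

# WP(x := x + 1, x > 0) is x + 1 > 0, i.e. x > -1:
wp = wp_assign("x", lambda s: s["x"] + 1, lambda s: s["x"] > 0)

# Sequencing by nesting: WP(x := x + 1; assume x == y, y > 0):
Q = wp_assign("x", lambda s: s["x"] + 1,
              wp_assume(lambda s: s["x"] == s["y"],
                        lambda s: s["y"] > 0))
```

Conjoining the WP of each branch taken along a path (the `assume` statements) is exactly how a symbolic path constraint is assembled.
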

```(Spring 2008 - Bodik & Hilfinger):
"Question 1:

dependence analysis.  The question was based on the paper "Automatic
Translation of FORTRAN Programs to Vector Form".  The subquestions
were as follows: What is vectorization? What is a vector machine?
Give a loop that is vectorizable and one that is NOT.  Describe some
transformations that you perform before vectorization.  What is the
goal of these transformation?  What does it mean when we say that a
statement depends on itself?  Formalize the notion of dependence for
the loop "do i= 1,N: X(f(i)) F(X(g(i)))".  Express the condition
for existence of the dependence using equation f(x) - g(y) = 0
(dependence exits if solution lies in a particular x-y region.)
Derive the GCD dependence test.  What are the limitations of the GCD
test?  How can we improve on the GCD test by means of solving the
above equation in the domain of reals? Dependence graphs: what do they
express?  Name the three kinds of dependences.  Why do these
dependences must be enforced during loop transformations?

Question 2:

This was intended as an open-ended question to see how well candidates
understood some of the issues raised by parallel programming.  The
parts below are the collective result of where the three students took
this question.

a. What makes parallel programming difficult?
b. What memory models accurately describe the behavior of actual
programs?
c. How might one deal with a weak memory-consistency models?
d. How does functional programming address the issues of parallel
programming?
e. What problems does one encounter when attempting to use functional
programming to write parallel programs?"
```

```(Spring 2006 - Bodik & Graham):
"Question 1:  This question presented a simple language extension to Java and
asked the student to develop static analysis for analyzing programs written
with this extension.  Specifically, single-threaded Java was extended with
multi-threaded constructs for distributed memory computing (places,
async's), inspired by the X10 language.  The goal was to develop analysis
for proving safety of run-time remote-access checks.

Question 2: The question asked the student to give a high-level outline of
the structure of an optimizer for an optimizing compiler and to explain the
respects in which an optimizer only approximates optimality.  The student
was then asked to explain what kind of runtime information might inform
optimization, how that information might be collected, and how it might be
used."
```

```(Fall 2005 - Hilfinger & Necula):
"1. Write the notation and the definition for Hoare triples for
partial and total correctness. Discuss how the definitions must be
changed for nondeterministic languages. Show a few partial correctness
rules. Show the rule for "do c until b" and argue informally why it is
sound and complete. Show a total correctness rule for a looping
construct.

2. This question concerns a programming language and environment (compiler,
editor, project manager, etc.)  with these features:

A. A program consists of a set of units (typically stored one per file).
Units are either 'specifications' or 'bodies' of packages (modules),
generic packages, or subprograms.

B. A specification of a subprogram is like a C function declaration (i.e.,
no body).  A specification of package is like a C++ namespace,
and contains only type declarations, subprogram specifications,
variable declarations, and (nested) package specifications (possibly
generic).  A specification of a generic package is just like that
of a package, but can be parameterized by types and values (incl.
subprograms), as in C++.

C. Each unit explicitly imports all subprograms and packages it
refers to (other than the standard prelude).

D. Bodies of packages contain bodies for all subprograms and nested packages
in the specification, plus any additional types, subprograms, (static)
variables, nested packages, and static initializers that are needed.

E. Specifications of subprograms may be marked "inline".

F. The programming environment for this language allows you to compile by
specifying a unit.  As long as all SPECIFICATIONS mentioned by that
unit (recursively) are available, the compilation will succeed, performing
all static checks on the available units (and bodies, too, if available).
That is, if the compilation of each body succeeds in the absence of all
other bodies, the compilation will succeed when all bodies are present.

G. If all bodies are present as well, the system will also produce a
linked executable.  Typically, one specifies the subprogram that is to
serve as the "main program", and the environment does whatever is
necessary.

H. The environment attempts to honor as many inline declarations as feasible.

I. The environment attempts to avoid recompiling anything that doesn't
need recompilation in order to speed up compilation.

In answering the questions, assume we understand ordinary C compilation,
and filter out anything that would be obvious to someone who has implemented
a C compiler.  We've left out a lot of detail about the language; you may
assume the rest is a "typical" language.

Questions (very open-ended):

Q1: What language restrictions that we didn't mention must be true for
this design to be possible?

Q2: How would you implement the compiler so as to have the characteristics
described in F-I?"

(Spring 2005 - Bodik & Graham):
"Question 1
SSA Form: definition, properties, construction.  Sub-questions follow.

-What does “SSA” stand for?
-What types of programs was SSA defined for? functional, imperative, OO
-Briefly, what is the key idea behind transforming a program into SSA form?
-Define SSA form, i.e., give a rule for when a program is in SSA form
-Construct SSA form for an example program
-Why does SSA simplify bookkeeping in optimizations?
-How much can the program grow as a result of SSA transformation?
  - show a pathological case
-Optimal SSA
  - There is a commonly used notion of SSA optimality.  What aspects of the form does it refer to?
-Naive SSA construction
-Applications, benefits of SSA
  - SSA makes some analyses/transformations/optimizations more efficient, even asymptotically.  Why?
  - SSA makes some analyses more effective (i.e., more precise).  Why?
-The dominance frontier algorithm.
Question 2
What are macros?
What are their purposes?
How are they implemented?

Macros are “outside” the programming language and often don’t use PL principles.  How can the purposes
of macros be achieved by language redesign or augmentation?  How do the language solutions differ from
the pre-processor solutions?

[If there is time] Many language designers like to have as much information as possible determined at
compile-time rather than runtime.  Why is that desirable?  Give some examples of language features and
trade-offs if the information can change at runtime."

(Fall 2004 - Hilfinger & Yelick):
"Q1: For the first question, we gave the student a brief paper summary of the
original (1976) Gries-Owicki axiomatization of parallel execution, and asked:

1. Justify the axiom for cobegin ... coend in the absence of shared
regions ('with' statements in their syntax).
2. Justify the axiom for 'with' itself.
3. What, if any, relaxation on the conditions in the axioms is possible
if we assume that the processes in a cobegin coend are executed
sequentially in a nondeterministic order rather than in parallel?
4. How would you formulate the actual behavior of global-variable
reference within their model (esp. taking into account memory
consistency models)?

Q2:  The second question was about the following code fragment:

for i = 1 to 3
  for j = 1 to 4
    b[j,i] = a[j,i-1] + a[j,i+1]
    a[j,i] = a[j,i] + b[j,i]

We first asked the student to draw a picture of the iteration space for this
loop, which they did without trouble.  We then asked about possible optimizations
and what types of checks a compiler would have to do to see if those transformations
were legal.  A Pentium 4 architecture was used as the example."

(Spring 2004 - Bodik & Graham):
"Q1: 	The first question explored the structure of a compiler, then a source-to-source
compiler, then a C++ to Java translator.

Q2: 	The second question asked about garbage collection:
the implementation of stop-and-copy collectors, and finally about incremental
stop-and-copy."

(Fall 2003 - Bodik & Necula):
"Q1: 	The first problem was concerned with axiomatic semantics. Students had
to define the notions of partial and total correctness, based on the
operational semantics, for a simple imperative language. Then the students
had to write the derivation rules for the partial correctness assertions.
The last part of the question was concerned with the relationship between
weakest preconditions and partial correctness assertions.

Q2: 	The second question asked the student to develop static analysis for verifying
(model checking) the lock usage in a simple imperative program (with no pointers
and no procedures).  First, the question sought to determine whether flow-sensitive
or flow-insensitive class of analyses would be more suitable for this verification
task, focusing on differences of the two styles of analyses and inherent limitations
of the weaker of the two analyses.  Next, the question asked to develop dataflow
analysis for this verification problem.  The focus was on making the analysis as
efficient as possible, and on identifying "symmetry" with a similar error checking
problem.  Finally, several questions concerned the role of monotonicity
and distributivity in dataflow analysis."

(Fall 2002 - Necula & Yelick):
"Q1: Subtyping in lambda calculus
Q2: After being presented with a short fragment of code performing a matrix
operation, discuss various optimizations that could be performed on that code."

(Spring 2002 - Graham & Hilfinger):
"Q1: Semantics
Q2: Optimization"

(Fall 2001 - Aiken & Necula):
"1. Parametric polymorphism. Syntax and typing rules of polymorphic
lambda calculus. Kinds of parametric polymorphism. Let polymorphism.

2. Graph-coloring register allocation. The problem to be solved,
computing the register interference graph, the coloring heuristic,
what to do when the heuristic fails, register spilling."

(Fall 2000 - Yelick & Necula):
"Q1: The first question was about the semantics of a simple object-oriented
language.

Consider the language of expressions:

e ::= x | new T | if e1 then e2 else e3 | while e1 do e2 |
x := e | let x:T = e1 in e2 | e0.m(e1,...,en)

In the above language, all variables are introduced by let. They are
statically scoped and the scope is the body of the let (e2). The
language of types consists of a set of class identifiers along with
a single inheritance hierarchy. There is a distinguished class called
Bool.

a) Define a typing judgment and write typing rules for this language
taking into consideration subtyping originating from subclassing in the
class hierarchy. How do the typing rules change if we have multiple
inheritance?

b) Define an evaluation judgment and write the evaluation rules.
Consider that the set of values consists of elements of the form Obj(T)

c) If you have Gamma |- e : T and (Sigma, e) -> (Sigma', Obj(T')) what
relationship is there between the two judgments and under what
conditions.

Q2: This question is about implementation of object-oriented languages.
Consider the following Java program fragment:

In Java:
class X {
public int val;
public int get () { return val; }
public void set (int y) { val = y; }
}

Part 1)
Draw a picture of the runtime structures that would exist after
the following statements have executed:

X x1 = new X();
X x2 = new X();

Part 2) Inheritance
Now assume there is an additional class:

class B extends X {
set (int z) { ... }
print () { ... }
}

How would the picture change after the creation of a B object?

B b = new B();

Part 3) Method Inlining
In Java, if objects are small, this representation is quite expensive, even
for accessing the data fields.  There are two sources of inefficiency: the method
call overhead and the object indirection.  Under what circumstances
can a method be inlined?

Part 4) Object Inlining
Under what circumstances can the extra level
of indirection in the object representation be eliminated,
i.e., when is it a safe transformation for the compiler
to take an object (represented as a pointer to a record)
and store the record in place of the pointer?

Part 5) Distribution
Imagine you now want to implement a system in which you can
access objects that are on other machines.  In particular,
say you have an object on some processor P2 and you want to
call methods on it from P1.  What kind of code and runtime
structures would you need on your own node to make this work?"

(Spring 2000 - Aiken & Yelick):
"Q1: Regarding the design and implementation of various garbage
collection schemes;

Q2: Regarding efficient compilation of a subset of MATLAB."

(Fall 1999 - Necula & Hilfinger):
"Q1: Given this program fragment:
x <- 1
L: y <- x
x <- x + 1
if P(x) jmp L
return x
- translate to SSA;
- optimize the SSA; and
- translate back.
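```

One possible worked sketch for Q1 (a hedged answer; the variable numbering is mine):

```
x1 <- 1
L: x2 <- phi(x1, x3)
y1 <- x2            ; y1 is never used: dead code
x3 <- x2 + 1
if P(x3) jmp L
return x3
```

After dead-code elimination removes y1, translating out of SSA (each phi becomes copies on its incoming edges, and x1/x2/x3 can share one register) recovers the original loop without the "y <- x" statement.

```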

Q2: Given a language with templates, write some typing rules, and
explain some of the issues templates raise in implementing a compiler."

August 2000

```