Artificial Intelligence Questions


(Spring 2013 - Abbeel* & Bartlett):

  1. MDPs
    What is an MDP? Formulate elevator control as an MDP. Describe either value iteration or policy iteration. Explain how partial observability can be handled.
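
For the value-iteration part of this question, a minimal sketch on an invented two-state MDP (the states, actions, transitions, and rewards are all made up for illustration; this is not an elevator model):

```python
GAMMA = 0.9  # discount factor

# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
P = {
    0: {"stay": [(0, 1.0)], "go": [(1, 0.8), (0, 0.2)]},
    1: {"stay": [(1, 1.0)], "go": [(0, 1.0)]},
}
R = {
    0: {"stay": 0.0, "go": 1.0},
    1: {"stay": 2.0, "go": 0.0},
}

def value_iteration(tol=1e-8):
    """Apply the Bellman optimality backup until values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        V_new = {
            s: max(R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
                   for a in P[s])
            for s in P
        }
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new

V = value_iteration()
```

Policy iteration would instead alternate a policy-evaluation solve with a greedy improvement step; partial observability replaces the discrete state with a belief distribution.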

  2. Kernel methods
    Formulate linear regression as a maximum likelihood learning problem. How would you compute a MAP estimate with a Gaussian prior on the parameters? Explain how these two regression methods can be kernelized.
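
One way to sketch the kernelization step: with a Gaussian prior, the MAP estimate is ridge regression, and its dual solution alpha = (K + lam*I)^{-1} y touches the data only through inner products, so a kernel can replace them. A sketch on synthetic data (the RBF bandwidth, regularizer, and dataset are invented):

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """Gram matrix of the Gaussian (RBF) kernel between row sets a and b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def fit(X, y, lam=0.1):
    # Dual ridge solution: alpha = (K + lam*I)^{-1} y
    K = rbf(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_new):
    # f(x) = sum_i alpha_i k(x_i, x)
    return rbf(X_new, X_train) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0])                     # noiseless toy target
alpha = fit(X, y)
```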

  3. Vision
    Explain how to recover depth from stereo images. Explain an algorithm for camera calibration. Explain an algorithm for object recognition.

  4. Statistical language processing
    What is a PCFG? Explain how you could estimate a PCFG from data. What are N-gram language models and how can they be estimated from data?

  5. Graphical models
    Consider a directed graphical model with discrete variables. Describe an algorithm for parameter estimation and discuss its computational complexity. How could you deal with missing data and parameter sharing?

(Spring 2012 - Jordan & Abbeel):

  1. Reinforcement Learning

  2. Computer Vision

  3. Graphical models

  4. NLP

(Fall 2010 - Klein & Malik):
  -- Express the ASR problem as an HMM.  What are the features and state space?
  -- Imagine you have a sequence of words and a PCFG in CNF.  Define the quantity computed by the CKY algorithm and give the recurrence used by the algorithm.
  -- How would you jointly do the ASR and parsing?  Explain how to merge the two algorithms.
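
For the CKY part: chart[(i, j)][A] below holds the best probability that nonterminal A derives words i..j, and the recurrence maximizes over split points k and binary rules A -> B C. A sketch with an invented toy grammar in CNF:

```python
from collections import defaultdict

# Binary rules (A, B, C, prob) meaning A -> B C, plus lexical rules
# (A, word, prob); both the grammar and the sentence are invented.
binary = [("S", "NP", "VP", 1.0), ("VP", "V", "NP", 1.0)]
lexical = [("NP", "she", 0.4), ("NP", "fish", 0.6), ("V", "eats", 1.0)]

def cky(words):
    n = len(words)
    chart = defaultdict(dict)   # chart[(i, j)][A] = best prob A => words[i:j]
    for i, w in enumerate(words):
        for A, word, p in lexical:
            if word == w:
                chart[(i, i + 1)][A] = max(chart[(i, i + 1)].get(A, 0.0), p)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                  # split point
                for A, B, C, p in binary:
                    cand = p * chart[(i, k)].get(B, 0.0) \
                             * chart[(k, j)].get(C, 0.0)
                    if cand > chart[(i, j)].get(A, 0.0):
                        chart[(i, j)][A] = cand
    return chart[(0, n)].get("S", 0.0)
```

Replacing max with a sum over splits and rules yields the inside probability instead of the best parse score.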

  -- Define texture.
  -- Explain how to segment an image by texture.  What is the key reason simple methods like convolving with basic filters don't work? What is the solution?  How does segmentation work?
  -- Explain how to classify a region given reference examples of various textures.  What would the features be, etc?

Graphical Models / Search:
  -- [Example GM, questions about what is conditionally independent of what] 
  -- State the Bayes ball (or d-separation) algorithm as a state-space search problem.  What are the states, successor function, and so on?
  -- Give an admissible heuristic for telling whether a source and target variable are conditionally independent given evidence.

Reinforcement Learning:
  -- State Blackjack as an MDP.  What are the states, rewards, etc?
  -- What is a value?  How do you compute the values in a known MDP?
  -- Imagine you did not know the transition function for Blackjack (or didn't want to compute it).  Describe how to determine values of a policy.
  -- What is Q-learning?  What is the difference between on- and off-policy learning?  What is the purpose of an epsilon-greedy policy?
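
The Q-learning update, sketched on an invented two-state environment (not Blackjack): the off-policy TD target bootstraps from the best next action, while behavior follows an epsilon-greedy policy.

```python
import random

GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1

def step(s, a):
    """Invented deterministic dynamics: only (s=0, a=1) pays off."""
    if s == 0 and a == 1:
        return 1, 1.0          # next state, reward
    return 0, 0.0

random.seed(0)
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
s = 0
for _ in range(5000):
    if random.random() < EPS:                      # explore
        a = random.choice((0, 1))
    else:                                          # exploit
        a = max((0, 1), key=lambda act: Q[(s, act)])
    s2, r = step(s, a)
    # Off-policy target: bootstrap from the greedy next action, not the
    # action the epsilon-greedy behavior policy will actually take.
    target = r + GAMMA * max(Q[(s2, 0)], Q[(s2, 1)])
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
    s = s2
```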

(Spring 2009 - Klein & Jordan):
"1) What is a CSP?  Define formally what a constraint is, etc.  What is
arc consistency and why would we want it?  Give an algorithm for
establishing it.

2) What is PCA?  What is kernel PCA?  Why would we use it?  What are
the inputs and outputs?  Give the algorithm.
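
Plain PCA can be sketched as an eigendecomposition of the sample covariance (synthetic data; kernel PCA would instead eigendecompose the centered Gram matrix, giving nonlinear components):

```python
import numpy as np

def pca(X, k):
    """Return the top-k variances and principal directions of X."""
    Xc = X - X.mean(axis=0)                    # center the data
    cov = Xc.T @ Xc / (len(X) - 1)             # sample covariance
    vals, vecs = np.linalg.eigh(cov)           # ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]

rng = np.random.default_rng(0)
# Data stretched along the x-axis, so PC1 should align with it.
X = rng.normal(size=(500, 2)) * np.array([5.0, 1.0])
vals, vecs = pca(X, 2)
```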

3) What is the EM algorithm?  What is it for?  What does it do,
formally?  How does it work for a bivariate Gaussian with missing
components?  What are the sufficient statistics?

4) Describe how a statistical machine translation system works.  What
data does it require?  How are the models learned?  How does decoding work?

5) Describe how depth can be recovered from an image.  What about
without stereo or motion?"

(Spring 2007 - Bartlett & Jordan):
"Candidates were asked to answer the following four questions covering reinforcement 
learning, vision, language and graphical models:

1. What is an MDP?
   Explain how to formulate the game of Tetris as an MDP.
   Describe an algorithm for finding the optimal policy for this game.
   Suggest an efficient approximate approach that might be more effective in practice.

2. What is image segmentation, and why might one want to segment an image?
   Describe an image segmentation algorithm and explain its advantages and disadvantages.

3. What is part-of-speech tagging?
   Suggest two approaches to the part-of-speech tagging problem.

4. What is the junction tree algorithm?
   Given a parameterized graphical model, and given observations of a subset of
   nodes of the model, describe a general approach to maximum likelihood parameter 
   estimation for the model.  What role might the junction tree algorithm play in 
   this procedure?"

(Fall 2006 - Klein & Malik):
"General AI:
   -- What is an MDP?  Define it formally.
   -- What are the Bellman equations?
   -- What is the value iteration algorithm?
   -- What is Q-learning?

   -- Define linear regression as an optimization problem.
   -- What is a kernel and why are they used?
   -- Can you kernelize linear regression?  How?

   -- Describe a chain of processing from a wave file to first-order logic.
   -- How does ASR work?  What is an HMM in general?  What is the Viterbi algorithm?
   -- How does parsing work?  What is an efficient algorithm for probabilistic parsing?
   -- How might semantic translation to first-order logic work?
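
The Viterbi question admits a compact sketch; the weather HMM below is a standard toy example, not an acoustic model:

```python
states = ("Rainy", "Sunny")
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def viterbi(obs):
    """Most likely state sequence via max-product dynamic programming."""
    # delta[s] = probability of the best path ending in state s so far.
    delta = {s: start[s] * emit[s][obs[0]] for s in states}
    back = []                                  # backpointers per time step
    for o in obs[1:]:
        prev = delta
        delta, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] * trans[p][s])
            delta[s] = prev[best] * trans[best][s] * emit[s][o]
            ptr[s] = best
        back.append(ptr)
    # Follow the backpointers from the best final state.
    path = [max(states, key=lambda s: delta[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```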

   -- Define the object recognition problem.  What makes it hard?
   -- How might you segment and classify a zip code from a picture of an envelope?
   -- What is edge detection, and how does it work?"

(Fall 2005 - Klein & Jordan):
"General AI:
  1. Robot motion planning
     a.  Draw a configuration space for a 2D robot arm with rotational shoulder and elbow joints.  
	   Is the mapping from configurations to workspace positions of the actuator invertible?
     b.  If the shoulder joint is suspended from the ceiling, how does the free space change?
     c.  Assume now there's a floor with a hole in it that the arm can move obstacles into.  Why is 
	   the free space no longer convex?
  2. Search
     a.  How could I use state-space search to plan paths for the robot discussed before, assuming 
	   there is no uncertainty in the robot's movements and the obstacles are fully known?  What 
	   are the drawbacks to the method you proposed?
     b.  How could I use A* search?  What would be an admissible heuristic?

Statistical Learning Theory:
  1. Kernels
     a.  What is Fisher discriminant analysis (aka linear discriminant analysis)?
     b.  Can it be kernelized?  What does that mean?  How do you know that the 
     parameter vector will be a linear combination of transformed data points?
  2. Parameter estimation
     a.  Show how logistic regression can be motivated from a generative
     perspective.  In particular, what assumptions on the class-conditional
     densities do you need?
     b.  Given a specific generative model, describe two ways to estimate 
     parameters for the resulting logistic regression, using maximum likelihood
     in both cases.  Will they give the same results?
  3. Graphical models
     a.  What is the treewidth of a graph?  What does it have to do with the
     complexity of inference?
     b.  What is a junction tree?  How is it different from an arbitrary tree of cliques?

Language and Speech
  1. PCFGs
     a.  What is a PCFG?  How could one be used in speech recognition?
     b.  Sketch an algorithm for finding the best parse or the sum of all parses for a sentence.  
         What are the time and space complexities of your algorithm and why?
  2. Semantic Translation
     a.  Write a reasonable translation of "Tom likes Amy" into FOPL.  How could you derive 
         this expression compositionally from a parse tree?
     b.  How about "Everyone likes Amy"?  "Everyone likes someone"?
     c.  How might we represent tense and aspect in this process?

Vision
  1. Shape detection
     a.  What is a Hough transform?  How can it be used for segmentation?
  2. Depth
     a.  What are some techniques for finding depth in images?"

(Spring 2005 - Russell & Wilensky):
"1. Vision
	Define object recognition and explain why it's hard.
	How might learning help?
	Why might it be important for higher-level visual
		hypotheses to inform lower-level processes such 
		as edge detection?
2. KR
	What are the principal differences between propositional and 
		first-order formalisms?
	Can you write the rules of chess in propositional logic?
	Are graphical models propositional? Can we make them first-order?

3. Planning
	How might actions be described in first-order logic?  
	What are the benefits and drawbacks to such a representation?  
	What are some alternatives?
	Are there domain-independent heuristics for planning?

4. Language, speech
	What are some important types of ambiguity that occur in 
		natural language use?  
	Pick one of these and explain some approaches to addressing it.  
	What are some examples of semantic ambiguity other than 
		lexical semantic ambiguity?
	What sorts of ambiguity arise in speech? How do speech systems
		resolve those ambiguities?

5. Robotics
	How can a robot get its hand from here to there?
	What are some of the problems that one would encounter in designing 
	a robot to perform some task in a new environment, such as finding a 
	telephone in an unfamiliar house? Solutions?

6. Learning
	Suppose you wanted to get a machine to play Tetris.  
	How might you formulate and solve this as a learning problem?  
	Ditto for Battleship."

(Fall 2004 - Feldman & Klein):
  1. What are some levels of structure in natural languages, and what formal models 
     are used for them?
  2. Why might we want to model syntax as something other than a CFG?  Name some 
     options and reasons.

  1. What is state-space search?  Describe a general heuristic approach to search.
  2. What is A* search, what is admissibility, and why is it important? What would 
     happen if we used an inadmissible heuristic?
  3. Contrast state-space search and dynamic programming.  Can they be combined?
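
The A* question can be sketched on a small grid with Manhattan distance as an admissible heuristic (the grid and obstacle layout are invented); with an inadmissible heuristic the first path popped at the goal need not be shortest.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path length on a grid; grid[r][c] == 1 is an obstacle."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # admissible
    frontier = [(h(start), 0, start)]          # (f = g + h, g, position)
    best_g = {start: 0}
    while frontier:
        f, g, pos = heapq.heappop(frontier)
        if pos == goal:
            return g                           # cost of the shortest path
        if g > best_g.get(pos, float("inf")):
            continue                           # stale queue entry
        r, c = pos
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) \
                    and not grid[nr][nc] \
                    and g + 1 < best_g.get((nr, nc), float("inf")):
                best_g[(nr, nc)] = g + 1
                heapq.heappush(frontier, (g + 1 + h((nr, nc)), g + 1, (nr, nc)))
    return None                                # no path exists

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
```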

  1. Describe your favorite classifier.  What objective function does it maximize, 
     and what are its properties?  What is its mechanism for preventing overfitting?
  2. When might your classifier be inappropriate?  What are its weaknesses? What would 
     you use instead, and why?

  1. For 2D images, suggest a domain where automatic visual recognition is successful. 
     What representations and techniques are used?
  2. For 3D, do the same.

(Spring 2004 - Jordan & Russell):
"1. What is a planning problem and how could one solve such problems using
    a satisfiability algorithm?

2. What is boosting and how does it work? What formal properties
   of boosting can be shown?
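
A minimal AdaBoost sketch with threshold stumps on an invented 1-D dataset; the weighted-error alpha and the exponential reweighting are the pieces the formal analysis (the training-error bound) builds on.

```python
import math

X = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [-1, -1, -1, 1, 1, 1]          # separable by a single threshold

def stump_error(thr, sign, w):
    """Weighted error of the stump h(x) = sign if x > thr else -sign."""
    return sum(wi for xi, yi, wi in zip(X, y, w)
               if (sign if xi > thr else -sign) != yi)

def adaboost(rounds=5):
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        thr, sign, err = min(
            ((t, s, stump_error(t, s, w))
             for t in [x + 0.5 for x in X] for s in (1, -1)),
            key=lambda c: c[2])
        err = max(err, 1e-10)       # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((thr, sign, alpha))
        # Reweight: misclassified examples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * (sign if xi > thr else -sign))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (s if x > t else -s) for t, s, a in ensemble)
    return 1 if score > 0 else -1

model = adaboost()
```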

3. Consider the problem of moving a robot arm from one place to another.
   How can such problems be formulated and solved?

4. What is the part-of-speech tagging problem and how can it be solved?

5. Explain the EM algorithm for hidden Markov models. Is forward-backward
   an elimination algorithm?"

(Fall 2003 - Feldman & Wilensky):
"1. Vision 
	Motion is a major problem for visual systems.
	Describe how Kalman filters can be used in tracking moving objects.  
	How would you handle the problem of data association?  
	To what extent is the problem of tracking people solved?
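
A scalar Kalman filter for tracking a roughly constant 1-D position under random-walk dynamics (noise variances and measurements invented). Data association, i.e. which of several detections to feed in as the measurement, is exactly the part this sketch leaves out.

```python
def kalman_1d(measurements, q=0.01, r=1.0):
    """Scalar Kalman filter: random-walk state, noisy direct observations.

    q is the process-noise variance, r the measurement-noise variance.
    """
    x, p = 0.0, 1000.0          # diffuse prior on the initial position
    estimates = []
    for z in measurements:
        p = p + q               # predict: variance grows by process noise
        k = p / (p + r)         # Kalman gain
        x = x + k * (z - x)     # correct with the innovation z - x
        p = (1 - k) * p
        estimates.append(x)
    return estimates

est = kalman_1d([5.1, 4.9, 5.2, 5.0, 4.8, 5.05])
```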

2. Language 
	What is the problem of reference resolution?  
	Can it be divided into subproblems that may be subject to different constraints?  
	Does the need to address this problem have representational consequences?  
	What are some possible algorithms to address this problem?

3. Speech recognition 
	What are hidden Markov models, and why and how are they used in speech recognition?  
	How are HMMs trained?  
	What other components are needed in a speech recognition system? 

	A current speech recognition system, given two utterances, transcribed them as follows: 
		1.	“One carrot cake, one carat diamond, one karat gold ring”
		2.	“My cake is made with carrots. Diamonds are weighed in carrots. 
			Gold is weighing the in carrots.”
	The sound variously transcribed as “carrot”, “carat” and “karat” is identical.   
	How might one account for this variation in performance?

4. Logic and planning 
	Assuming a STRIPS-like representation, how might we represent a simple 
	operator, e.g., "Go" (i.e., the robot goes from one location to another).  
	Such a representation is generally preferred to a "pure" logical formulation.  
	Why?  Compare state-based and plan-based approaches to problem solving.  
	Suppose that our robot could pick something up and take it with it.  
	How might we reason now about the consequences of actions?  
	How is the planning process complicated? 

5. Search
	Propose an approach to solving cryptarithmetic problems.  
	Within this approach, what are some different strategies one can exploit?

6. Learning
	Tell us about your favorite learning algorithm.  
	Is there a class of problems for which it is not appropriate?"

(Spring 2003 - Feldman & Russell):
"1. Vision
        What is segmentation and what role does it play in the overall
        vision problem? [Explain with reference to a specific image.]
        How is segmentation done?
        What is the connection between segmentation and edge detection?
        How is edge detection done?

2. Language
        What is unification grammar? What basic advantage does it
        offer over standard context-free grammars?
        Give an example of a unification grammar rule that
        incorporates number agreement.
        What about semantics?
        What are some limitations of unification grammar?

3. Speech recognition
        What is a language model and how is it used in speech?
        Describe the probability models used in statistical speech recognition.
        Describe the EM algorithm and how it is used to train these models.

4. Logic and planning
        Describe an efficient SAT algorithm. Is your algorithm complete?
        Explain how to use a SAT algorithm to solve a STRIPS planning
        problem such as the following:
           Preconditions: At(x)
           Effects: ~At(x), At(y)
           Initial state: At(A)
           Goal state: At(B)

5. Search and robotics
        Describe an algorithm for finding the shortest path to a
        known goal location in a deterministic grid world with obstacles.
        What if the map of the grid world is not available?
        How can a real robot build a map of a real environment?

6. Learning
        Explain decision tree learning
        Explain how to combine this with boosting
        Explain the problems involved in learning Bayesian networks
        from data, distinguishing between Bayesian and maximum-likelihood approaches."

(Spring 2002 - Jordan & Russell):
"See ( for the
reading requirements."
(Fall 2001 - Feldman & Wilensky):
"1) What are computational methods for going from 2D images to
   models of the 3D world?

2) Tell us about your favorite learning algorithm.

3) Discuss state-space and plan-space planning techniques.

4) Compare Neural Networks and Bayesian Networks.

5) Discuss the feasibility of Context Free Grammars for Natural
   Language processing."

(Fall 1999 - Jordan & Russell):
"Q1: search as a solution to robot path planning 
  a. admissible search algorithm
  b. heuristics
  c. what if you can't assume polygonal obstacles (configuration space)?

Q2: belief networks 
  a. are diagnostic or causal queries easier?
  b. can we remove a node from the network for a given query?
  c. complexity of variable elimination for diagnostic/causal queries

Q3: MDPs: what are two ways of solving them, and what are their pros and cons?
  a. what if they are not observable?
  b. what is the size of the search space (unobservable)?
  c. why should we assume the Markov property holds in belief space?

Q4: MDPs vs. DBNs
  a. complexity of inference (forward algorithm)
  b. what if the model is a mixture of Gaussians?  What is the complexity of the
forward algorithm then?  What if the Gaussian nodes have discrete parents?

Q5: MDPs vs. reinforcement learning
  a. what are the differences?
  b. what are some strategies for reinforcement learning?"

(Fall 1998 - Malik & Wilensky):
"Q1: Draw a belief network with two causal nodes A and B, and one
symptom node C.  What are the independence relationships in the network?
Number of parameters in the CPT?  Write a joint probability.  What will
happen if we reverse some arrows (can we still capture the same
independence relationships)?

Q2: Draw a belief network for "smoking", "lung cancer", "coughing",
"positive x-ray", and "black lung".  How do you do diagnostic query?
What do you do when the polytree algorithm doesn't work here?  Suppose we use
clustering.  Is there any systematic way of deciding which nodes to cluster?

Q3: Construct a neural network for the example in the last question.
Compare the neural nets with belief nets, both domain independent and
domain specific properties.  Why do people use sum-of-square error
instead of sum-of-absolute error in neural net training?

Q4: What is non-linear planning?  Why is that better than linear
planning?  Illustrate resolving threats with promotion and demotion.
What other problems does planning have in real world situations? How do we
cope with uncertainties?  Can you incorporate probability into POP? Any
other ways of using probability?  Do you get a plan out of a Markov 
decision process? How do you find such a policy?

Q5: What is version space? Compare version space to neural nets,
advantages and disadvantages."