Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

Markov Decision Processes with Multiple Long-run Average Objectives

Krishnendu Chatterjee

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2007-105
August 22, 2007

http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-105.pdf

We consider Markov decision processes (MDPs) with multiple long-run average objectives. Such MDPs arise in design problems where one wishes to optimize several criteria simultaneously, for example, latency and power. The possible trade-offs between the different objectives are characterized by the Pareto curve. We show that every Pareto optimal point can be ε-approximated by a memoryless strategy, for every ε > 0. In contrast to the single-objective case, the memoryless strategy may require randomization. We show that the Pareto curve can be approximated (a) in polynomial time in the size of the MDP for irreducible MDPs; and (b) in polynomial space in the size of the MDP for all MDPs. Additionally, we study the problem of deciding whether a given value vector is realizable by some strategy, and show that it can be decided in polynomial time for irreducible MDPs and in NP for all MDPs. These results provide algorithms for design exploration in MDP models with multiple long-run average objectives.
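For concreteness, here is a minimal sketch of one standard ingredient behind results of this kind: for an irreducible MDP, the long-run average value vectors achievable by memoryless randomized strategies correspond to a polytope of long-run state-action frequencies, so checking whether a given value vector is realizable reduces to a linear feasibility program of polynomial size. The sketch below (Python with scipy; the function name, array layout, and variable names are illustrative, and this is not claimed to be the report's algorithm) checks realizability of a target vector and, when feasible, reads off a memoryless randomized strategy from the frequencies.

```python
# Sketch, not the report's construction: realizability check for an irreducible MDP
# via a feasibility LP over long-run state-action frequencies x[s, a].
import numpy as np
from scipy.optimize import linprog

def is_realizable(P, R, v):
    """P[s, a, t]: transition probability from state s to t under action a.
    R[k, s, a]: reward of objective k for playing a in s.
    v[k]: target long-run average value for objective k.
    Returns (feasible, sigma), where sigma[s, a] is a memoryless randomized
    strategy achieving at least v componentwise if one exists."""
    S, A, _ = P.shape
    K = R.shape[0]
    n = S * A                      # one variable x[s, a] per state-action pair
    idx = lambda s, a: s * A + a

    # Equality constraints: x is a stationary state-action distribution.
    A_eq = np.zeros((S + 1, n))
    b_eq = np.zeros(S + 1)
    for t in range(S):             # flow balance: inflow of t equals outflow of t
        for s in range(S):
            for a in range(A):
                A_eq[t, idx(s, a)] += P[s, a, t]
                if s == t:
                    A_eq[t, idx(s, a)] -= 1.0
    A_eq[S, :] = 1.0               # frequencies sum to one
    b_eq[S] = 1.0

    # Inequality constraints: each long-run average objective meets its target,
    # i.e. sum_{s,a} x[s,a] * R[k,s,a] >= v[k], written as <= for linprog.
    A_ub = np.zeros((K, n))
    for k in range(K):
        for s in range(S):
            for a in range(A):
                A_ub[k, idx(s, a)] = -R[k, s, a]
    b_ub = -np.asarray(v, dtype=float)

    res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        return False, None
    x = res.x.reshape(S, A)
    # Normalize each state's frequencies into action probabilities
    # (the small floor is only a numerical safeguard).
    sigma = x / np.maximum(x.sum(axis=1, keepdims=True), 1e-12)
    return True, sigma
```

Sweeping such a realizability check over a grid of candidate value vectors with spacing ε gives one crude way to ε-approximate the Pareto curve for irreducible MDPs; the polynomial-time and polynomial-space bounds stated in the abstract are established in the report itself.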


BibTeX citation:

@techreport{Chatterjee:EECS-2007-105,
    Author = {Chatterjee, Krishnendu},
    Title = {Markov Decision Processes with Multiple Long-run Average Objectives},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2007},
    Month = {Aug},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-105.html},
    Number = {UCB/EECS-2007-105},
    Abstract = {We consider Markov decision processes (MDPs) with multiple long-run average objectives. Such MDPs arise in design problems where one wishes to optimize several criteria simultaneously, for example, latency and power. The possible trade-offs between the different objectives are characterized by the Pareto curve. We show that every Pareto optimal point can be \epsilon-approximated by a memoryless strategy, for every \epsilon>0. In contrast to the single-objective case, the memoryless strategy may require randomization. We show that the Pareto curve can be approximated (a) in polynomial time in the size of the MDP for irreducible MDPs; and (b) in polynomial space in the size of the MDP for all MDPs. Additionally, we study the problem of deciding whether a given value vector is realizable by some strategy, and show that it can be decided in polynomial time for irreducible MDPs and in NP for all MDPs. These results provide algorithms for design exploration in MDP models with multiple long-run average objectives.}
}

EndNote citation:

%0 Report
%A Chatterjee, Krishnendu
%T Markov Decision Processes with Multiple Long-run Average Objectives
%I EECS Department, University of California, Berkeley
%D 2007
%8 August 22
%@ UCB/EECS-2007-105
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-105.html
%F Chatterjee:EECS-2007-105