Learning Semantic Image Representations at a Large Scale

Yangqing Jia

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2014-93
May 16, 2014

http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-93.pdf

I present my work towards a better computer vision system, one that learns and generalizes object categories better and behaves in ways closer to how humans behave. Specifically, I focus on two key components of such a system: learning better features, and revisiting existing problem statements. For the first component, I propose and analyze novel receptive field learning and dictionary learning methods, mathematically justified by Nyström sampling theory, that learn more compact and effective features for object recognition tasks. For the second component, I propose to combine computer vision and cognitive science studies that have so far developed independently, and present the first large-scale system that allows computers to learn and generalize in a way closer to how a human learner does. I also provide a large-scale human behavior database, which will hopefully enable further research in this direction.

Following the recent success of convolutional neural networks (CNNs), I present and release a well-engineered framework for general deep learning research, and provide an extensive analysis of the generality of deep features learned from the state-of-the-art CNN pipeline: whether they serve as a general-purpose visual descriptor that can be adopted in various applications, and what future research directions such general features make possible.

Advisor: Trevor Darrell


BibTeX citation:

@phdthesis{Jia:EECS-2014-93,
    Author = {Jia, Yangqing},
    Title = {Learning Semantic Image Representations at a Large Scale},
    School = {EECS Department, University of California, Berkeley},
    Year = {2014},
    Month = {May},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-93.html},
    Number = {UCB/EECS-2014-93},
}

EndNote citation:

%0 Thesis
%A Jia, Yangqing
%T Learning Semantic Image Representations at a Large Scale
%I EECS Department, University of California, Berkeley
%D 2014
%8 May 16
%@ UCB/EECS-2014-93
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-93.html
%F Jia:EECS-2014-93