Algorithms and Frameworks for OCR and Content-based Image Retrieval
Jike Chong, Bryan Christopher Catanzaro, Narayanan Sundaram, Fares Hedayati and Kurt Keutzer
A new breed of general-purpose manycore computing platforms is emerging. Examples include the Niagara from Sun Microsystems, the G80 from Nvidia, the Cell from IBM, and the upcoming Larrabee from Intel. Each of these manycore processors packs 8 to 32 relatively simple cores on a chip, supports up to hundreds of threads, and offers peak single-chip performance up to the teraFLOPS range. However, traditional algorithms and applications in many domains cannot take advantage of much of the parallelism these platforms provide.
The ParLab at Berkeley was recently founded in part to help meet this acute need for novel algorithmic approaches that unleash the performance potential of emerging manycore platforms across a wide range of application domains. It proposes to concentrate on analyzing the communication and computation patterns (or Dwarfs) of important classes of algorithms underlying modern application domains, and on developing techniques to parallelize them efficiently for general-purpose manycore platforms.
We concentrate on the domain of image recognition and retrieval, using the Intel PIRO content-based image retrieval framework as a motivating application. Specifically, we study classification algorithms used in machine learning and develop parallelization techniques that improve the performance of these algorithms and applications on emerging manycore platforms.