Projects

 

Fall 2011 - Spring 2012

Topic
clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs
Content
This is the first framework from the OLOV project. It can be used in iterative sparse solvers, including eigensolvers, linear solvers, and optimization methods. The framework is implemented in OpenCL, and it is optimized for both Nvidia and AMD platforms. SpMV is a memory-bound computation, so a very important step in optimizing it is to find the best representation of a sparse matrix. The clSpMV framework proposes the Cocktail Format, a combination of multiple formats in which each specialized submatrix is represented using the format that suits it best. The performance is 83% higher than the vendor-optimized code, and 17% higher than any single sparse matrix format. The source code and more information can be accessed here.
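The idea of mixing formats can be illustrated with a minimal Python sketch. It is not the clSpMV implementation: the function names, the choice of just two formats (DIA and CSR), and the `min_fill` splitting rule are all assumptions made for illustration. Well-filled diagonals are stored in a diagonal (DIA) format, the remaining scattered entries go to CSR, and the product y = A x is the sum of the two partial products.

```python
def split_cocktail(dense, min_fill=0.5):
    """Split a square matrix into a DIA part and a CSR remainder.

    Hypothetical rule: a diagonal goes to DIA when its nonzero count
    divided by n is at least min_fill; everything else goes to CSR.
    """
    n = len(dense)
    dia = {}  # diagonal offset -> n values indexed by row (zero padded)
    for off in range(-(n - 1), n):
        rows = [i for i in range(n) if 0 <= i + off < n]
        nnz = sum(1 for i in rows if dense[i][i + off] != 0)
        if nnz / n >= min_fill:
            dia[off] = [dense[i][i + off] if 0 <= i + off < n else 0.0
                        for i in range(n)]
    # Entries not covered by a DIA diagonal are stored in CSR.
    values, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        for j in range(n):
            if dense[i][j] != 0 and (j - i) not in dia:
                values.append(dense[i][j])
                col_idx.append(j)
        row_ptr.append(len(values))
    return dia, (values, col_idx, row_ptr)


def spmv_dia(dia, x):
    """y = A_dia * x for the diagonals stored in DIA form."""
    n = len(x)
    y = [0.0] * n
    for off, diag in dia.items():
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                y[i] += diag[i] * x[j]
    return y


def spmv_csr(csr, x):
    """y = A_csr * x for the remainder stored in CSR form."""
    values, col_idx, row_ptr = csr
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def spmv_cocktail(dia, csr, x):
    """Full product: sum of the partial products of the two submatrices."""
    return [a + b for a, b in zip(spmv_dia(dia, x), spmv_csr(csr, x))]


# A tridiagonal matrix plus one stray entry in the bottom-left corner.
A = [[2, 1, 0, 0],
     [1, 2, 1, 0],
     [0, 1, 2, 1],
     [5, 0, 1, 2]]
x = [1, 2, 3, 4]
dia, csr = split_cocktail(A)
y = spmv_cocktail(dia, csr, x)
print(sorted(dia), csr, y)
```

In this example the three central diagonals land in the DIA part, while the single corner entry is kept in CSR. This is the kind of split that avoids padding an almost-empty diagonal with zeros.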

Fall 2011 - Current

Topic
OLOV: OpenCL for OpenCV
Content
Based on our experience with computer vision applications, we are developing OpenCV modules using OpenCL for cross-platform portability. To differentiate our work from the current CUDA efforts in OpenCV, we target only computationally intensive algorithms such as eigenvalue and eigenvector decomposition.

Fall 2010 - Spring 2011

Topic
Object Recognition on Mobile Platforms
Content
The computing capability of mobile platforms is evolving at a fast pace. In this project, we implemented the region-based object recognition system on mobile platforms. We analyzed the performance differences among a selection of desktop and mobile platforms, and explained the results by comparing the processor speed, memory bandwidth, and power consumption of the different platforms. Finally, we proposed an ideal cloud-client collaboration model for the object recognition system.

Summer 2009 - Spring 2010

Topic
A Parallel Object Recognition System Based on Region Matchings
Content
The region-based object recognition system developed by Gu et al. achieves the highest quality on the ETHZ shape benchmark. However, it is time-consuming: it takes about 6.5 minutes to classify a 0.15-megapixel image. The training procedure takes about 40 minutes even if the segmentations of all training images are given beforehand. We parallelized the entire system and achieved a 110-120x speedup on both the classification and training stages.

Fall 2008 - Spring 2009

Topic
Damascene: An Efficient, High Quality Image Contour Detector
Content
The current leading algorithm for image contour detection requires 4 minutes to detect the contours of a 0.15-megapixel image. Damascene is the tool that we developed for the image contour detection problem. It takes only 1.8 seconds for images of the same size and achieves the same accuracy as the leading algorithm.

Summer 2008

Topic
Partitioning for Parallelizing Post-Layout Timing Optimization
Content
The saturation of single-thread performance on microprocessors will drive computationally intensive applications toward parallel implementations. Post-layout timing optimization is a computationally demanding problem that typically takes several hours to finish on a design with 0.5 million cells. Multiple problem instances may be run on distributed computers; however, the only way to further improve computational performance on a single instance is to explore parallel algorithms. We believed that the performance of fine-grained parallel timing optimization would depend on the quality of the partitions induced on the circuit graphs. In this project, we showed how the partitions influence the parallel timing optimization algorithms, and proposed several partitioning algorithms that will help the timing optimization algorithm run on future platforms with hundreds of processing units. A pleasant surprise is that these parallel approaches also improve final circuit quality relative to sequential approaches.