Research Projects

[Back to Homepage]

Hashing Algorithms for Image Search
Metric Learning
Scalable Semidefinite Programming
Graph Clustering and Image Segmentation
Topic Tracking


Hashing Algorithms for Image Search


A common problem in large-scale data is that of quickly extracting nearest neighbors to a query from a large database. In computer vision, for example, this problem arises in content-based image retrieval, 3-d image reconstructions, human body pose estimation, and object recognition problems.

My work has developed algorithms for quickly and accurately performing large-scale image searches using hashing techniques. We developed algorithms for incorporating hashing methods for learned metrics (CVPR 2008) as well as for performing locality-sensitive hashing over arbitrary kernel functions (ICCV 2009), two prominent scenarios arising in modern computer vision applications. We have applied our algorithms to several large-scale data sets, including the 80 million images of the Tiny Image data set.

Some representative publications:

  • Kernelized Locality-Sensitive Hashing for Scalable Image Search
    Brian Kulis & Kristen Grauman
    In Proc. 12th International Conference on Computer Vision (ICCV), 2009.
    [pdf] [webpage and code]

  • Fast Image Search for Learned Metrics
    Prateek Jain, Brian Kulis, & Kristen Grauman
    In. Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
    CVPR 2008 Best Student Paper Award
    [pdf]

  • Metric Learning


    Metric learning is the problem of learning an appropriate distance function for a given task. For example, similarity searches over large-scale data rely on the underlying metric, which may change from task to task or over time. A majority of existing methods for metric learning learn a Mahalanobis distance, but are limited to low-dimensional and/or small data sets, making them impractical for many real-world problems.

    In my recent work, I have shown how to learn Mahalanobis metrics in a variety of settings. These include: learning metrics for high-dimensional data and non-linear transformations via kernelization (ICML 2007), learning metrics in online scenarios (NIPS 2008), and learning metrics for large-scale data (CVPR 2008). This work has been applied in conjunction with hashing techniques to a variety of problems in computer vision.

    Some of our metric learning code is available here.

    Some representative publications:

  • Low-Rank Kernel Learning with Bregman Matrix Divergences
    Brian Kulis, Matyas Sustik, & Inderjit Dhillon
    Journal of Machine Learning Research, 10 (Feb): 341--376, 2009.
    [pdf]

  • Online Metric Learning and Fast Similarity Search
    Prateek Jain, Brian Kulis, Inderjit Dhillon, & Kristen Grauman
    In Neural Information Processing Systems (NIPS), 2008.
    Oral Presentation: 2.7% Acceptance Rate
    [pdf], Longer version: [pdf]

  • Information-Theoretic Metric Learning
    Jason Davis, Brian Kulis, Prateek Jain, Suvrit Sra, & Inderjit Dhillon
    In Proc. 24th International Conference on Machine Learning (ICML), 2007.
    ICML 2007 Best Student Paper Award
    [pdf] [code]

  • Scalable Semidefinite Programming


    Semidefinite programming arises in a number of machine learning problems, including recommender systems, graph cut problems, metric learning, and others. One of the disadvantages to semidefinite programming is the lack of scalability to large problems.

    I have studied algorithms for scaling semidefinite programming to large data sets, with a focus on semidefinite programming problems arising in practical machine learning applications. With colleagues at Microsoft Research, I developed algorithms for low-rank semidefinite programming (SDPs with an additional low-rank constraint) for embedding and clustering problems (AISTATS 2007), and with colleagues at UT Austin and Max Planck Institute, I developed a general, scalable first-order SDP algorithm by analyzing convex perturbations for SDPs (AISTATS 2009).

    Some representative publications:

  • Convex Perturbations for Scalable Semidefinite Programming
    Brian Kulis, Suvrit Sra, & Inderjit Dhillon
    In Proc. 12th Intl. AISTATS Conference, 2009.
    [pdf]

  • Fast Low-Rank Semidefinite Programming for Embedding and Clustering
    Brian Kulis, Arun Surendran, & John Platt
    In Proc. 11th Intl. AISTATS Conference, 2007.
    [pdf]

  • Graph Clustering and Image Segmentation


    Spectral clustering partitions the nodes of a graph by minimizing one of several graph cut objectives; it has received a significant amount of attention recently and is used in numerous applications. I worked on Graclus, the first algorithm for optimizing spectral clustering objectives that does not use any eigenvector computation. This is possible due to a mathematical connection between spectral clustering and weighted kernel k-means. Our resulting multilevel algorithm outperforms the best existing spectral methods in terms of quality, speed, and memory usage. Applications include fast image segmentation, speech segmentation, and gene network clustering.

    Graclus is available for download here.

    Some representative publications:

  • Weighted Graph Cuts without Eigenvectors: A Multilevel Approach
    Inderjit Dhillon, Yuqiang Guan, & Brian Kulis
    IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 11, pp. 1944--1957, 2007.
    [pdf] [code]

  • Semi-Supervised Graph Clustering: A Kernel Approach
    Brian Kulis, Sugato Basu, Inderjit Dhillon, & Raymond Mooney
    In Proc. 22nd International Conference on Machine Learning (ICML), 2005.
    ICML 2005 Best Student Paper Award
    [pdf]

  • Kernel k-means, Spectral Clustering, and Normalized Cuts
    Inderjit Dhillon, Yuqiang Guan, & Brian Kulis
    In Proc. 10th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2004.
    [pdf]

  • Topic Tracking


    As an undergraduate, I worked on tracking topics in networked data over time. Our goal was to understand how research topics in computer science evolve by analyzing the citation structure of computer science publication networks. We developed an algorithm based on finding natural communities---clusters that are insensitive to small perturbations. We applied our techniques to the CiteSeer network.

    Some representative publications:

  • Tracking Evolving Communities in Large Linked Networks
    John Hopcroft, Omar Khan, Brian Kulis, & Bart Selman
    Proc. of the National Academy of Sciences, vol. 101, pp. 5249--5253, April 2004.
    [pdf]

  • Natural Communities in Large Linked Networks
    John Hopcroft, Omar Khan, Brian Kulis, & Bart Selman
    In Proc. 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2003.
    [pdf]