SBA-term: Sparse Bilingual Association for Terms

  • Authors: Xinyu Dai, Jinzhu Jia, Laurent El Ghaoui, and Bin Yu.

  • Abstract: Bilingual semantic term association is useful in cross-language information retrieval, statistical machine translation, and many other natural language processing applications. In this paper, we present a method, named SBA-term, that applies sparse linear regression (the Lasso, i.e. least squares with an l_1 penalty), together with l_2 rescaling of the design matrix, to the task of bilingual term association. The approach hinges on formulating the task as a feature selection problem within a classification framework. Our experimental results indicate that the proposed method is more efficient than co-occurrence at extracting relevant bilingual semantic term associations. In addition, our approach connects the vibrant area of sparse machine learning to an important problem in natural language processing.

  • BibTeX reference:

@inproceedings{DJEB:11,
   Author = {X. Dai and J. Jia and L. {El Ghaoui} and B. Yu},
   Title = {{SBA}-term: Sparse Bilingual Association for Terms},
   BookTitle = {Fifth IEEE International Conference on Semantic Computing},
   Address = {Palo Alto, CA, USA},
   Month = sep,
   Year = {2011}
}
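
  • Illustrative sketch: The abstract describes the method only at a high level, so the following is a minimal sketch of one way to read it, not the authors' implementation. It assumes a sentence-aligned toy corpus (the hypothetical PAIRS below), labels each pair +1 or -1 according to whether the source side contains the query term, uses target-side term occurrences as features, rescales each feature column to unit l_2 norm, and fits a Lasso by plain coordinate descent. The function names, the +1/-1 labelling, and the penalty value lam=0.5 are all assumptions made for illustration.

```python
import math

# Hypothetical sentence-aligned toy corpus: (source sentence, target sentence).
PAIRS = [
    ("the cat sleeps", "le chat dort"),
    ("the cat eats", "le chat mange"),
    ("the dog sleeps", "le chien dort"),
    ("the dog eats", "le chien mange"),
    ("a cat runs", "un chat court"),
    ("a dog runs", "un chien court"),
]


def build_design(pairs):
    """Return the target vocabulary and one binary occurrence column per term."""
    vocab = sorted({t for _, tgt in pairs for t in tgt.split()})
    cols = {t: [1.0 if t in tgt.split() else 0.0 for _, tgt in pairs]
            for t in vocab}
    return vocab, cols


def l2_rescale(col):
    """Rescale a column to unit l_2 norm (all-zero columns are left unchanged)."""
    norm = math.sqrt(sum(v * v for v in col))
    return [v / norm for v in col] if norm > 0 else list(col)


def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_cd(cols, y, lam, n_iter=100):
    """Minimise 0.5 * ||y - Xw||^2 + lam * ||w||_1 by coordinate descent.

    `cols` is the design matrix stored as a list of columns.
    """
    w = [0.0] * len(cols)
    r = list(y)  # residual y - Xw, with w initially zero
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            sq = sum(c * c for c in col)
            if sq == 0.0:
                continue
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(c * ri for c, ri in zip(col, r)) + w[j] * sq
            new = soft_threshold(rho, lam) / sq
            delta = new - w[j]
            if delta != 0.0:
                r = [ri - c * delta for c, ri in zip(col, r)]
                w[j] = new
    return w


def associate(pairs, query, lam=0.5):
    """Map each target term to its Lasso weight for the given source term."""
    vocab, cols = build_design(pairs)
    y = [1.0 if query in src.split() else -1.0 for src, _ in pairs]
    X = [l2_rescale(cols[t]) for t in vocab]
    return dict(zip(vocab, lasso_cd(X, y, lam)))


if __name__ == "__main__":
    weights = associate(PAIRS, "cat")
    selected = {t: w for t, w in weights.items() if w > 0}
    print(sorted(selected, key=selected.get, reverse=True))
```

    On this toy corpus the only target term that receives a positive weight for the query "cat" is "chat", while terms that occur equally often with and without the query, such as "le", receive weight zero. Treating only positive weights as associations is a choice of this sketch.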