Imagine that we wish to train classifiers for related tasks in a high-dimensional space, but that we only have a few labelled samples per task. To overcome this limitation, we can exploit the fact that we have labelled data for hundreds of related tasks that share relevant features. The focus of this work is on multi-task learning methods that exploit this shared structure to successfully learn hundreds of tasks for which we only have a few labelled samples each.
Recent approaches to multi-task learning have investigated a variety of matrix norm regularization schemes for promoting feature sharing across tasks. In essence, these approaches aim at extending the l1 framework for sparse single-task approximation to the multi-task setting.
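To make the idea concrete, one common choice in this family (the abstract does not name a specific norm, so this is an illustrative assumption) is the l1/l2 mixed norm: stack the per-task weight vectors into a matrix W whose rows index features and whose columns index tasks, then penalize the sum of the l2 norms of the rows. The matrix W and the function name below are hypothetical, for illustration only.

```python
import numpy as np

def l1_l2_penalty(W):
    """l1/l2 mixed norm: sum of the l2 norms of the rows of W.

    Driving a whole row to zero discards that feature for every
    task simultaneously, which is how the penalty promotes
    feature sharing across tasks.
    """
    return float(np.sum(np.linalg.norm(W, axis=1)))

# Hypothetical joint parameter matrix: 3 features x 2 tasks.
W = np.array([[3.0, 4.0],
              [0.0, 0.0],   # feature 2 is dropped for *all* tasks
              [1.0, 0.0]])

print(l1_l2_penalty(W))  # row norms are 5, 0, 1 -> penalty 6.0
```

Acting like an l1 norm over rows, the penalty yields row-wise sparsity: a selected feature is shared, and a discarded one is discarded jointly.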
In this talk I will focus on the computational aspects of training a jointly regularized model and propose an optimization algorithm whose complexity is O(n log n), where n is the number of parameters of the joint model. Our algorithm is based on casting jointly regularized loss minimization as a convex constrained optimization problem, for which we develop an efficient projected gradient algorithm.
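The abstract does not spell out the projection step, but a classic example of a projection with exactly this O(n log n) flavor, dominated by a single sort of the parameters, is the Euclidean projection onto an l1 ball. The sketch below shows that sorting-based projection; it is illustrative of how such a projection step can look, not a description of the talk's actual algorithm, and the function name is hypothetical.

```python
import numpy as np

def project_l1_ball(v, z=1.0):
    """Euclidean projection of v onto the l1 ball of radius z.

    The single descending sort of the magnitudes dominates the cost,
    giving O(n log n) in the number of parameters n; everything else
    is a linear pass.
    """
    if np.sum(np.abs(v)) <= z:
        return v.copy()                    # already feasible
    u = np.sort(np.abs(v))[::-1]           # magnitudes, descending
    cssv = np.cumsum(u)                    # running sums of u
    j = np.arange(1, len(v) + 1)
    # Largest index where the candidate threshold stays positive.
    rho = np.nonzero(u - (cssv - z) / j > 0)[0][-1]
    theta = (cssv[rho] - z) / (rho + 1.0)  # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

print(project_l1_ball(np.array([3.0, 1.0]), z=2.0))  # -> [2. 0.]
```

A projected gradient iteration then alternates an ordinary gradient step on the loss with a call to the projection, keeping the iterates inside the constraint set induced by the joint regularizer.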