Excess Risk Bounds for Multi-Task Learning
Ambuj Tewari and Peter Bartlett
The idea that it should be easier to learn several tasks if they are related in some way is intuitive and has proved useful in many practical settings. There has been some interest in obtaining theoretical results to better understand this phenomenon (e.g., [1,2]). Maurer [1] considers the case where the "relatedness" of the tasks is captured by requiring that all tasks share a common "preprocessor." A different linear classifier is then learned for each task, with all classifiers operating on the preprocessed input. Maurer obtains dimension-free and data-dependent bounds in this setting.
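As a rough illustration of this setting (a sketch in our own notation, not code from the paper), suppose the shared preprocessor is a linear map P and each of the m tasks fits its own linear classifier w_t on the preprocessed features; all names and dimensions below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: m tasks, input dimension d,
# preprocessed feature dimension k, n samples per task.
m, d, k, n = 3, 10, 4, 50

P = rng.standard_normal((k, d))   # shared "preprocessor" (here: a linear map)
W = rng.standard_normal((m, k))   # one linear classifier per task

def predict(t, X):
    """Predictions for task t: sign(w_t . (P x)) for each row x of X."""
    return np.sign(X @ P.T @ W[t])

X = rng.standard_normal((n, d))   # a batch of inputs for some task
y_hat = predict(0, X)             # labels in {-1, 0, +1} for task 0
```

The point of the shared P is that information from all m tasks can be pooled to learn it, while each w_t is fit from that task's data alone.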
We study this setting when a loss function is also given to measure the performance of the selected classifiers. Our aim is to bound the difference between the average risk (i.e., expected loss) per task of the classifiers learned from the data and the least possible value of the average risk per task. We wish to derive tight bounds analogous to those obtained in the single-task setting [3].
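In symbols (our notation, not taken from the abstract): writing $\ell$ for the loss, $\hat f_t$ for the classifier learned for task $t$, and $\mathcal{F}$ for the class of admissible tuples of classifiers sharing a common preprocessor, the excess risk being bounded is

```latex
\frac{1}{m}\sum_{t=1}^{m} \mathbb{E}\,\ell\big(\hat f_t(X_t), Y_t\big)
\;-\;
\inf_{(f_1,\dots,f_m)\in\mathcal{F}}\;
\frac{1}{m}\sum_{t=1}^{m} \mathbb{E}\,\ell\big(f_t(X_t), Y_t\big),
```

where $(X_t, Y_t)$ is drawn from the distribution of task $t$.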
[1] A. Maurer, "Bounds for Linear Multi-Task Learning," Journal of Machine Learning Research, Vol. 7, 2006, pp. 117-139.
[2] J. Baxter, "A Model of Inductive Bias Learning," Journal of Artificial Intelligence Research, Vol. 12, 2000, pp. 149-198.
[3] P. L. Bartlett, O. Bousquet, and S. Mendelson, "Local Rademacher Complexities," Annals of Statistics, Vol. 33, No. 4, 2005, pp. 1497-1537.