Prediction using linearly parametrized function classes is a topic of fundamental importance to machine learning, and the study of the generalization properties of linear prediction occupies an important place within it. In this talk, I will present sharp Rademacher and Gaussian complexity bounds for a large family of linearly parametrized function classes; these complexity measures play a key role in deriving generalization bounds. The bounds are derived using conjugate duality, a key tool from convex analysis, and they readily yield a number of corollaries, including: risk bounds for linear prediction (including settings where the weight vectors are subject to either L_2 or L_1 constraints), margin bounds (for both L_2 and L_1 margins, along with more general notions based on relative entropy), a proof of the PAC-Bayes theorem, and L_2 covering numbers (with L_p norm constraints and relative entropy constraints). In addition to providing a unified analysis, these results give some of the sharpest known risk and margin bounds, improving upon a number of previous results by logarithmic factors. Interestingly, our results show that the uniform convergence rates of empirical risk minimization algorithms tightly match the regret bounds of online learning algorithms for linear prediction (up to a constant factor of 2).
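As a minimal numerical sketch of the kind of quantity the talk bounds (this example is illustrative and not taken from the talk itself): for the L_2-constrained linear class {x -> <w, x> : ||w||_2 <= B}, the empirical Rademacher complexity has the closed form B * ||(1/n) sum_i sigma_i x_i||_2 by Cauchy-Schwarz, and the classical bound on its expectation is B * X / sqrt(n), where X bounds the norms of the data points. All names and parameter values below are hypothetical choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n points in d dimensions, weight-vector radius B.
n, d, B = 100, 20, 1.0
data = rng.normal(size=(n, d))
X_max = np.linalg.norm(data, axis=1).max()  # X: largest L_2 norm among the points

def empirical_rademacher(points, radius, num_draws=500, rng=rng):
    """Monte Carlo estimate of E_sigma sup_{||w||_2 <= radius} (1/n) sum_i sigma_i <w, x_i>.

    For each Rademacher sign vector sigma, the supremum over the L_2 ball
    is attained at w proportional to sum_i sigma_i x_i (Cauchy-Schwarz),
    giving the closed form radius * ||(1/n) sum_i sigma_i x_i||_2.
    """
    n = points.shape[0]
    sigmas = rng.choice([-1.0, 1.0], size=(num_draws, n))
    return radius * np.linalg.norm(sigmas @ points / n, axis=1).mean()

estimate = empirical_rademacher(data, B)
bound = B * X_max / np.sqrt(n)  # classical O(B X / sqrt(n)) complexity bound
print(f"estimate = {estimate:.4f}, bound = {bound:.4f}")
```

The estimate sits below the B X / sqrt(n) bound; the sharpness results in the talk concern exactly how tight bounds of this form can be made across different norm and entropy constraints.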
(Joint work with Sham M. Kakade and Karthik Sridharan)