Surrogate loss functions, divergences and decentralized detection
Most of the machine learning literature on detection and classification
is abstracted away from considerations of an underlying
communication-theoretic infrastructure, constraints from which may
prevent an algorithm from aggregating all relevant data at a central
site. In many real-life applications, however, resource limitations
make it necessary to transmit only partial descriptions of data.
Examples include sensor networks, in which each sensor operates
under power or bandwidth constraints.
In this talk, I shall describe an algorithmic framework for
decentralized detection from empirical data. In contrast to standard
work on classification, we need to learn both quantization rules at
individual sensors and a classification rule at the central site.
The key ingredients of our framework are the use of surrogate loss
functions and marginalized kernels, which result in a computationally
efficient learning algorithm.
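As a rough sketch of the setup (the notation here is my own, assumed for illustration: labels y_i in {-1,+1}, sensor observations x_i, quantization rules Q, and a fusion/classification rule gamma), the framework minimizes an empirical surrogate risk of the form:

```latex
% Hedged sketch of the decentralized surrogate-risk objective.
% \phi is a convex surrogate for the 0-1 loss, e.g. the hinge loss.
\min_{Q,\,\gamma}\;
  \frac{1}{n}\sum_{i=1}^{n} \phi\bigl(y_i\,\gamma(Q(x_i))\bigr),
\qquad \text{e.g. } \phi(\alpha) = \max(0,\,1-\alpha)
```

The joint minimization over both the quantizers Q and the fusion rule gamma is what distinguishes this from standard classification, where only gamma is learned.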
In the second part of the talk, I'll show that there is a class of
convex surrogate losses for which our learning procedure is consistent:
it is guaranteed to find optimal decision rules with respect to
the 0-1 loss.
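As a standard illustration (not specific to this talk) of a surrogate and the loss it stands in for, with margin alpha = y * gamma(x):

```latex
% 0-1 loss vs. a classification-calibrated convex surrogate (hinge).
% Minimizing the phi-risk over all measurable decision rules recovers
% a Bayes-optimal rule for the 0-1 loss when phi is calibrated.
\ell_{0\text{-}1}(\alpha) = \mathbf{1}[\alpha \le 0],
\qquad
\phi_{\mathrm{hinge}}(\alpha) = \max(0,\, 1 - \alpha)
```

The hinge, logistic, and exponential losses are all classification-calibrated in this sense.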
This is due to a precise correspondence between surrogate loss
functions and a class of divergence functionals known as Ali-Silvey
distances, or f-divergences. This correspondence has implications far
beyond the decentralized detection setting. In particular, it motivates
a nonparametric M-estimation method for estimating f-divergence
functionals and the density ratio of two probability distributions,
given only empirical data. I shall describe theoretical properties and an
empirical evaluation of this estimator.
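To make the M-estimation idea concrete, here is a minimal sketch of my own (an illustration, not the talk's estimator). The KL divergence is the f-divergence with f(t) = t log t, and it admits the variational representation D(P||Q) = sup_g { E_P[g] - E_Q[exp(g - 1)] }, attained at g = 1 + log(p/q). Maximizing the empirical version of this objective over a class of functions g estimates both the divergence and the density ratio. The Gaussian data and the simple linear class g(x) = a*x + b below are assumptions made to keep the example self-contained:

```python
import math
import random

def kl_m_estimate(xs_p, xs_q, lr=0.05, steps=1500):
    """Estimate KL(P||Q) by maximizing the empirical variational
    objective  mean_P[g] - mean_Q[exp(g - 1)]  over linear g(x) = a*x + b.
    At the optimum, exp(g(x) - 1) is an estimate of the ratio p(x)/q(x)."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        # Gradient ascent on the concave objective in (a, b).
        ew = [math.exp(a * x + b - 1) for x in xs_q]
        grad_a = (sum(xs_p) / len(xs_p)
                  - sum(w * x for w, x in zip(ew, xs_q)) / len(xs_q))
        grad_b = 1.0 - sum(ew) / len(xs_q)
        a += lr * grad_a
        b += lr * grad_b
    # Divergence estimate: the objective value at the fitted g.
    obj = (sum(a * x + b for x in xs_p) / len(xs_p)
           - sum(math.exp(a * x + b - 1) for x in xs_q) / len(xs_q))
    ratio = lambda x: math.exp(a * x + b - 1)  # estimated p(x)/q(x)
    return obj, ratio

# Illustration: P = N(1, 1), Q = N(0, 1); true KL = 0.5, and the
# true density ratio is p(x)/q(x) = exp(x - 0.5).
rng = random.Random(0)
xs_p = [rng.gauss(1.0, 1.0) for _ in range(2000)]
xs_q = [rng.gauss(0.0, 1.0) for _ in range(2000)]
kl_hat, ratio_hat = kl_m_estimate(xs_p, xs_q)
```

The estimator described in the talk optimizes over a richer nonparametric class (rather than a fixed two-parameter family); the linear class here happens to contain the true log-ratio for two unit-variance Gaussians, which is what makes this toy version work.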