Surrogate loss functions, divergences and decentralized detection
Abstract
Most of the machine learning literature on detection and
classification
is abstracted away from considerations of an underlying
communication-theoretic infrastructure, constraints from which may
prevent an algorithm from aggregating all relevant data at a central
site. In many real-life applications, however, resource limitations
make it necessary to transmit only partial descriptions of data.
Examples include sensor networks, in which each sensor operates
under power or bandwidth constraints.
In this talk, I shall describe an algorithmic framework for
nonparametric
decentralized detection from empirical data. In constrast to
classical
work on classification, we need to learn both quantization rules at
individual sensors and a classification rule at the central
coordinator.
The key ingredients of our framework are the use of surrogate
convex loss
functions and marginalized kernels, which result in a
computationally
efficient learning algorithm.
In the second part of the talk, I'll show that there is a class of
surrogate convex losses for which our learning procedure is
consistent, i.e.,
it is guaranteed to find optimal decision rules with respect to
the 0-1 loss.
This is due to a precise correspondence between surrogate loss
functions
and a class of divergence functionals known as Ali-Silvey
distances or
f-divergences. This correspondence has implications far beyond the
decentralized detection setting. In particular, it motivates a
nonparametric M-estimation method for estimating f-divergence
functionals
and the density ratio of two probability distributions, given only
the empirical data. I shall describe theoretical properties and an
empirical evaluation of this estimator.
Maintained by:
Fei Sha