Recovery of discrete data corrupted by discrete noise is an increasingly encountered problem in contexts as diverse as digital communications, bio-molecular sequence analysis, and the Internet. Perfect recovery is sometimes possible but, more generally, performance is measured under a given fidelity criterion.
This talk will focus on the often realistic scenario where the corruption mechanism is a "discrete memoryless channel", meaning that the components of the corrupted data are independent given the underlying clean data. I will introduce an algorithm for this setting which performs essentially as well as a genie that has access, in addition to the noisy data, also to the underlying clean data, and can choose to switch between "sliding-window" denoisers in a way that optimizes the overall performance. This will be shown to be the case in several strong statistical senses, within a "semi-stochastic" setting where the underlying clean data are deterministic and the only randomness is due to the noise. Further, when the clean data form a piecewise stationary stochastic process or field, the algorithm achieves the optimum distribution-dependent performance. These performance guarantees are contingent on a certain growth rate condition that must be imposed on the number of switches, which is necessary in the sense that any scheme fails to compete in the above senses when the condition does not hold.
The key issue in implementing our scheme is to efficiently learn the best segmentation of the data and the associated denoisers that the genie is using, based solely on the noisy data. We will describe our approach to this problem, which results in a practical algorithm: implementable with complexity (time and memory) growing linearly with the data size and the number of switches. Preliminary experimental results suggest that the new scheme has the capacity to improve on prior art in applications where the nature of the data abruptly changes in time (or space), as is often the case in practice. I will conclude with a discussion of some remaining challenges.
Joint work with Professor Tsachy Weissman (Stanford University).