# Practical Distributed Source Coding and Its Application to the Compression of Encrypted Data

### Daniel Hillel Schonberg

###
EECS Department

University of California, Berkeley

Technical Report No. UCB/EECS-2007-93

July 25, 2007

### http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-93.pdf

Distributed source codes compress multiple data sources by leveraging the correlation between those sources, even without access to the realization of each source, through the use of a joint decoder. Though distributed source coding has been a topic of research since the 1970s (Slepian and Wolf, 1973), there has been little practical work. In this thesis, we present a practical and flexible framework for distributed source coding and its applications. Only recently, as applications have arisen, did practical work begin in earnest. The applications range from dense sensor networks to robust video coding for wireless transmission to the compression of encrypted data. Unfortunately, most published work offering practical codes consider only a limited scenario. They focus on simple idealized data models, and they assume a priori knowledge of the underlying joint statistics. These assumptions are made to ease the description and analysis of their solutions. We argue though that a full solution requires a construction flexible enough to be adapted to real-world distributed source coding scenarios. This solution must be able to accommodate any source and unknown statistics. In this thesis, we develop practical distributed source codes, applicable to idealized sources and real world sources such as images and videos, that are rate adaptable to all scenarios. This thesis is broken into two halves. In the first half we discuss analytic considerations for generating practical distributed source codes. We begin by assuming a priori knowledge of the underlying source statistics, and then consider source coding with side information, a special case of distributed source coding. As a basis for our solution, we develop codes for independent and identically distributed (i.i.d.) sources and then adapt the codes to parallel sources. We then generalize this framework to a complete distributed source coding construction, flexible enough for arbitrary scenarios and adaptable to important special cases. Finally we conclude this half by eliminating our assumption of a priori knowledge of the statistics. We discuss a protocol for rate adaptation, ensuring that our codes can operate blind of the source statistics. Our protocol converges rapidly to the entropy-rate of a stationary source. In the second half of this thesis, we consider distributed source coding of real world sources. As a motivating application, we consider the compression of encrypted images and video. Since encryption masks the source, traditional compression algorithms are ineffective. However, through the use of distributed source-coding techniques, the compression of encrypted data is possible. It is possible to reduce data size without requiring data be compressed prior to encryption. We develop algorithms for the practical lossless compression of encrypted data. Our methods leverage the statistical correlations of the source, even without direct access to their instantiations. For video, we compare our performance to a state-of-the-art motion-compensated lossless video encoder that requires unencrypted video as input. It compresses each unencrypted frame of the ``Foreman'' test video sequence by 59% on average. In comparison, our proof-of-concept implementation, working on encrypted data, compresses the same sequence by 33%.

**Advisor:** Kannan Ramchandran

BibTeX citation:

@phdthesis{Schonberg:EECS-2007-93, Author = {Schonberg, Daniel Hillel}, Title = {Practical Distributed Source Coding and Its Application to the Compression of Encrypted Data}, School = {EECS Department, University of California, Berkeley}, Year = {2007}, Month = {Jul}, URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-93.html}, Number = {UCB/EECS-2007-93}, Abstract = {Distributed source codes compress multiple data sources by leveraging the correlation between those sources, even without access to the realization of each source, through the use of a joint decoder. Though distributed source coding has been a topic of research since the 1970s (Slepian and Wolf, 1973), there has been little practical work. In this thesis, we present a practical and flexible framework for distributed source coding and its applications. Only recently, as applications have arisen, did practical work begin in earnest. The applications range from dense sensor networks to robust video coding for wireless transmission to the compression of encrypted data. Unfortunately, most published work offering practical codes consider only a limited scenario. They focus on simple idealized data models, and they assume a priori knowledge of the underlying joint statistics. These assumptions are made to ease the description and analysis of their solutions. We argue though that a full solution requires a construction flexible enough to be adapted to real-world distributed source coding scenarios. This solution must be able to accommodate any source and unknown statistics. In this thesis, we develop practical distributed source codes, applicable to idealized sources and real world sources such as images and videos, that are rate adaptable to all scenarios. This thesis is broken into two halves. In the first half we discuss analytic considerations for generating practical distributed source codes. We begin by assuming a priori knowledge of the underlying source statistics, and then consider source coding with side information, a special case of distributed source coding. As a basis for our solution, we develop codes for independent and identically distributed (i.i.d.) sources and then adapt the codes to parallel sources. We then generalize this framework to a complete distributed source coding construction, flexible enough for arbitrary scenarios and adaptable to important special cases. Finally we conclude this half by eliminating our assumption of a priori knowledge of the statistics. We discuss a protocol for rate adaptation, ensuring that our codes can operate blind of the source statistics. Our protocol converges rapidly to the entropy-rate of a stationary source. In the second half of this thesis, we consider distributed source coding of real world sources. As a motivating application, we consider the compression of encrypted images and video. Since encryption masks the source, traditional compression algorithms are ineffective. However, through the use of distributed source-coding techniques, the compression of encrypted data is possible. It is possible to reduce data size without requiring data be compressed prior to encryption. We develop algorithms for the practical lossless compression of encrypted data. Our methods leverage the statistical correlations of the source, even without direct access to their instantiations. For video, we compare our performance to a state-of-the-art motion-compensated lossless video encoder that requires unencrypted video as input. It compresses each unencrypted frame of the ``Foreman'' test video sequence by 59% on average. In comparison, our proof-of-concept implementation, working on encrypted data, compresses the same sequence by 33%.} }

EndNote citation:

%0 Thesis %A Schonberg, Daniel Hillel %T Practical Distributed Source Coding and Its Application to the Compression of Encrypted Data %I EECS Department, University of California, Berkeley %D 2007 %8 July 25 %@ UCB/EECS-2007-93 %U http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-93.html %F Schonberg:EECS-2007-93