Efficient Internet Video Streaming

In the current deployment of the Internet, switches in the network are typically oblivious to the structure or content of the packets that they process. Typical network switches provide only a simple first-in-first-out (FIFO) queueing/scheduling policy and discard incoming packets when output queues are full. Moreover, a network switch does not provide explicit congestion feedback to senders when its queue starts to fill up. The sender therefore typically estimates the available connection capacity by ``probing'' the network (i.e., periodically incrementing the transmission rate) until some network switch drops a packet due to queue overflow. Upon detecting this loss (via end-to-end feedback from the receiver), the sender throttles the transmission rate by a typically large multiplicative factor. This simplistic network design and capacity estimation strategy is ill-matched to video packet flows, and adversely impacts them, for two important reasons.
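The probing strategy described above can be illustrated with a minimal sketch of additive-increase, multiplicative-decrease rate control. The function name `aimd_trace`, its parameters, and the loss model (a loss is assumed whenever the sending rate exceeds a fixed bottleneck capacity) are illustrative assumptions, not details of any particular protocol implementation.

```python
def aimd_trace(capacity, steps, increase=1.0, decrease=0.5, start=1.0):
    """Simulate a sender that probes for capacity by raising its rate until loss.

    capacity: fixed bottleneck rate; sending above it is treated as a queue
              overflow that the sender learns about one step later.
    Returns the sending rate used in each step (e.g. one step per round trip).
    """
    rate = start
    trace = []
    for _ in range(steps):
        trace.append(rate)
        if rate > capacity:
            # Loss reported by the receiver: cut the rate multiplicatively.
            rate = rate * decrease
        else:
            # No loss: keep probing by a fixed additive increment.
            rate = rate + increase
    return trace


trace = aimd_trace(capacity=10.0, steps=40)
steady = trace[11:]  # rates after the first loss
print(min(steady), max(steady), sum(steady) / len(steady))
```

In this toy model the capacity never changes, yet the rate keeps oscillating between roughly half of the capacity and slightly above it, and every cycle ends in a loss.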

First, typical video-encoded bitstreams are highly structured, e.g., characterized by a natural hierarchy of importance layers or resolutions. Examples of ``highly important'' encoded video signal descriptors include motion information used to reduce temporal redundancy, anchor-frame (e.g., MPEG I-frame) data, and ``coarse'' information such as DC and low-frequency AC transform data. Examples of ``less critical'' descriptors include the motion-prediction error, high-frequency ``detail,'' etc. Because switches drop packets without regard to this hierarchy, the delivered quality depends strongly on which packets happen to be lost. Second, because the sender discovers the available capacity only by inducing losses, the connection experiences non-negligible packet losses and consequent rate variations even if the network resources available for the connection remain invariant. This can lead to a potentially high variance in the delivered video quality.

We see that there is an inherent mismatch between the source coding algorithms, which are priority-oriented or multiresolution (MR) in character, and the network-layer mechanisms in the Internet, which are not endowed with the ``smarts'' to discern prioritized classes. This underlines the need for an efficient ``transcoding'' mechanism that converts the scalable MR-based prioritized video bitstream into a non-prioritized one that is better matched to the Internet. We propose a novel way of doing this conversion based on Multiple Description (MD) coding principles, anchored on Forward Error Correction (FEC) channel codes. While MR-to-MD transcoding addresses the sensitivity of a packet flow to the relative position of packet losses, we also need to address the issue of fluctuating rate.
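The text above does not spell out the MR-to-MD construction, so the following is only a sketch of one common way FEC is used for this purpose, stated here as an assumption: the MR bitstream is cut into N layers at cumulative byte boundaries R_1 <= ... <= R_N, layer i is protected with an (N, i) erasure code, and each of the N equally important packets carries one coded piece of every layer. Receiving any k packets then recovers the first R_k bytes, regardless of which packets were lost. All names (`packet_payload`, `decodable_bytes`, `expected_decodable_bytes`) and the independent-loss model are hypothetical; the erasure coding itself is not implemented.

```python
from fractions import Fraction
from math import comb


def packet_payload(boundaries):
    """Bytes carried by each of the N packets.

    boundaries[i-1] is R_i, the cumulative number of source bytes in layers 1..i.
    """
    payload = Fraction(0)
    previous = 0
    for i, r in enumerate(boundaries, start=1):
        if r < previous:
            raise ValueError("layer boundaries must be non-decreasing")
        # Layer i is split into i pieces and expanded to N coded pieces by an
        # (N, i) erasure code; every packet carries exactly one of them.
        payload += Fraction(r - previous, i)
        previous = r
    return payload


def decodable_bytes(boundaries, received):
    """Source bytes recoverable when any `received` of the N packets arrive."""
    return boundaries[received - 1] if received > 0 else 0


def expected_decodable_bytes(boundaries, loss_prob):
    """Expected recoverable bytes if each packet is lost independently."""
    n = len(boundaries)
    total = 0.0
    for k in range(n + 1):
        p_k = comb(n, k) * (1 - loss_prob) ** k * loss_prob ** (n - k)
        total += p_k * decodable_bytes(boundaries, k)
    return total


boundaries = [100, 300, 600, 1000]  # hypothetical R_1..R_4 in bytes
print(packet_payload(boundaries))
print([decodable_bytes(boundaries, k) for k in range(5)])
print(expected_decodable_bytes(boundaries, 0.1))
```

Under these assumptions, choosing the boundaries trades redundancy against resolution: pushing the R_i lower protects the important early layers more heavily, while pushing them higher carries more detail when few packets are lost.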

From the end user's point of view, a multimedia application that delivers moderate but constant quality is generally preferable to one whose quality reaches high peaks but exhibits large variations over time. In this paper, we propose a congestion control algorithm (Linear Increase, Graded Multiplicative Decrease) that achieves low rate fluctuations when the connection capacity is invariant, as well as quick response to sudden changes in the connection capacity. The novelty of our congestion control algorithm is that it achieves lower fluctuation (and hence higher throughput, or average transmission rate) than state-of-the-art congestion control algorithms without compromising responsiveness to the onset of congestion. The novelty of the overall approach lies in the fact that the source transcoding module and the congestion control module exchange connection state in a very simple manner, and yet the transcoder uses this information to generate an MD stream that results in the optimal expected quality at the receiver in spite of the dynamically fluctuating connection capacity. A block diagram of the end-to-end streaming system follows.




A Block Diagram of the End-to-End Streaming System
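The text names the congestion control algorithm (Linear Increase, Graded Multiplicative Decrease) but does not give its update rule. The sketch below is therefore purely illustrative: it assumes that ``graded'' means the multiplicative cut scales with the observed severity of loss, so that an isolated loss causes a small reduction and heavy loss causes a large one. The functions `graded_decrease_factor` and `next_rate`, the linear grading, and the constants 0.9 and 0.5 are all hypothetical choices, not the authors' specification.

```python
def graded_decrease_factor(loss_fraction, mild=0.9, severe=0.5):
    """Hypothetical grading: interpolate between a mild and a severe rate cut.

    loss_fraction: fraction of packets lost in the last feedback interval.
    Returns the factor the rate is multiplied by (mild for light loss,
    severe when everything was lost).
    """
    f = min(max(loss_fraction, 0.0), 1.0)  # clamp to [0, 1]
    return mild - (mild - severe) * f


def next_rate(rate, loss_fraction, increase=1.0):
    """One update step: linear increase without loss, graded cut with loss."""
    if loss_fraction <= 0.0:
        return rate + increase
    return rate * graded_decrease_factor(loss_fraction)


print(next_rate(10.0, 0.0))   # no loss: linear increase
print(next_rate(10.0, 0.05))  # light loss: small cut
print(next_rate(10.0, 1.0))   # severe loss: large cut
```

With a rule of this shape, probing losses at a stable capacity would trigger only small reductions, while a sudden capacity drop that causes heavy loss would still trigger a sharp one, which is the qualitative behaviour the text claims for the algorithm.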