Auditory researchers believe that the human auditory system computes many different representations of sound, reflecting different time and frequency resolutions. However, automatic speech recognition systems tend to be based on a single representation of the short-term speech spectrum.
We are attempting to improve the robustness of automatic speech recognition systems by using a set of two-dimensional Gabor filters with varying extents in time and frequency and varying ripple rates to analyze a spectrogram . These filters have some characteristics in common with the responses of neurons in the auditory cortex of primates, and can also be seen as two-dimensional frequency analyzers.
Promising results have been obtained in a noisy digit recognition task , especially when this analysis method was combined with more conventional analysis. Work is ongoing in the use of this approach for larger-vocabulary recognition tasks, and in the use of the Gabor filters in a multi-stream, multi-classifier architecture.
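As a rough illustration of the analysis described above, the sketch below builds a small bank of two-dimensional Gabor kernels with different time-frequency extents and ripple rates and applies them to a spectrogram-like array. The kernel sizes, ripple rates, and FFT-based circular filtering (kernel placed at the array origin, so responses are circularly shifted) are illustrative choices, not the parameters used in our system.

```python
import numpy as np

def gabor_2d(shape, omega_t, omega_f, sigma_t, sigma_f):
    """Complex 2D Gabor kernel: Gaussian envelope times a plane-wave ripple.

    omega_t, omega_f : ripple rates (cycles per bin) along time and frequency.
    sigma_t, sigma_f : envelope extents along time and frequency.
    """
    t = np.arange(shape[1]) - shape[1] // 2
    f = np.arange(shape[0]) - shape[0] // 2
    T, F = np.meshgrid(t, f)
    envelope = np.exp(-0.5 * ((T / sigma_t) ** 2 + (F / sigma_f) ** 2))
    carrier = np.exp(2j * np.pi * (omega_t * T + omega_f * F))
    return envelope * carrier

def gabor_features(spectrogram, bank):
    """Filter a (freq x time) spectrogram with each kernel via FFT convolution."""
    out = []
    for k in bank:
        pad = np.zeros_like(spectrogram, dtype=complex)
        pad[:k.shape[0], :k.shape[1]] = k
        resp = np.fft.ifft2(np.fft.fft2(spectrogram) * np.fft.fft2(pad))
        out.append(np.abs(resp))          # magnitude response as a feature map
    return np.stack(out)

# A tiny bank: one kernel rippled along time, one along frequency,
# with different envelope extents.
bank = [gabor_2d((9, 9), 0.25, 0.0, 2.0, 2.0),
        gabor_2d((9, 9), 0.0, 0.25, 1.0, 4.0)]
spec = np.random.rand(32, 64)             # stand-in log-spectrogram
feats = gabor_features(spec, bank)
print(feats.shape)                        # (2, 32, 64)
```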
When a data source is to be transmitted across an insecure, bandwidth-constrained channel, the standard solution is to first compress the data and then encrypt it. We examine the problem of reversing the order of these steps, first encrypting and then compressing. Such a scheme could be used in a scenario where the data generator and the compressor are not co-located, and the link between them is vulnerable to eavesdropping.
We encrypt a binary data sequence by forming the mod-2 sum with a pseudo-random binary key. Because the decoder will also have access to the same key, we can compress the encrypted data using distributed source coding principles. The DISCUS framework provides a constructive method for generating codes that approach the Slepian-Wolf bound. By modifying the DISCUS decoding procedure, we should be able to incorporate more complex encryption algorithms into our scheme.
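As a toy illustration of these principles, the sketch below encrypts a 7-bit block by mod-2 addition with a shared key and transmits only the 3-bit syndrome of a (7,4) Hamming code; a decoder whose side information differs from the ciphertext in at most one bit recovers the plaintext exactly. The single-bit correlation model, the decoder-side "prediction," and the tiny block length are simplifying assumptions, not the codes used in DISCUS practice.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code: column j is j in binary,
# so any single-bit error position can be read off the syndrome.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def syndrome(v):
    return H @ v % 2

def sw_decode(s, y):
    """Recover x from its 3-bit syndrome s and side info y with dist(x, y) <= 1."""
    diff = (s + syndrome(y)) % 2              # syndrome of the pattern x ^ y
    e = np.zeros(7, dtype=int)
    if diff.any():
        pos = int("".join(map(str, diff[::-1])), 2) - 1   # error position
        e[pos] = 1
    return (y + e) % 2

rng = np.random.default_rng(0)
key = rng.integers(0, 2, 7)                   # shared pseudo-random key
plain = rng.integers(0, 2, 7)
cipher = (plain + key) % 2                    # mod-2 encryption

# Hypothetical decoder-side prediction of the plaintext, wrong in one bit.
prediction = plain.copy()
prediction[3] ^= 1
side_info = (prediction + key) % 2            # correlated with the ciphertext

s = syndrome(cipher)                          # only 3 bits sent instead of 7
recovered = (sw_decode(s, side_info) + key) % 2
print(recovered.tolist() == plain.tolist())   # True
```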
Figure 1: Secure compression framework
We address the problem of compressing audio sources that are noisy filtered versions of the same audio signal. Exploiting the correlations between the remote sources will lead to better quality while maintaining the same transmission rate, as compared to a scheme that neglects these correlations. The objective is to develop algorithms for distributed compression of these correlated audio sources to attempt to achieve the gains predicted in theory . The algorithms are based on the distributed source coding using syndromes (DISCUS) framework and incorporate the use of perceptual masks, which are common in the compression of audio signals. We are currently in the process of implementing our algorithm. Initial results have been very encouraging.
We expect our algorithms to operate in a very power- and bandwidth-constrained environment. As such, our algorithms are designed to operate in medium to low bit rate regimes. Furthermore, we allow little or no communication among the sensors. Since the sensors may also have stringent power requirements and low computing power, the algorithms need to have sufficiently low complexity and also need to be optimized for the specific architectures of the audio sensors.
Figure 1: System set-up
We consider the problem of seamless distribution of multimedia data over high bandwidth wireless networks. Specifically, we focus on the popular IEEE 802.11b wireless LAN. In addition to the wireless channel related challenges, such as fluctuations in channel quality and high bit error rate, there is also the problem of heterogeneity among receivers, since each user will have different channel conditions, power limitations, processing capabilities, etc. We investigate novel algorithms for real-time multimedia transmission in such scenarios. This problem is complicated due to the close interaction of system parameters including source coding, channel coding, networking, transport layers, etc. Here, we address an integrated, end-to-end implementation of a multimedia dissemination system, rather than optimizing a component in isolation. To tackle this problem, we use progressive source coding together with the unequal error protection ideas presented in [1,2] to make the video bit stream insensitive to the position of packet loss. For the multicast case, we formulate the problem as an optimization of a maximum regret cost function across the multicast user space . We have developed a modular server that works on a Linux platform and presents a transparent interface that enables it to support application layer error resilience with a wide range of video coders and transport protocols. We have successfully integrated the server with a real-time version of the packetization and transcoding ideas presented in . We multicast the video stream to receivers with different channel characteristics and observe graceful quality degradation with the variation in packet loss rate and bandwidth. The end-to-end system operates in the 3 Mb/s regime and can be configured to run at higher rates. For details please refer to .
In this project, we address the problem of streaming real-time video from multiple servers in a distributed manner. We invoke a scalable video encoding format (e.g., H.263+, MPEG-4 FGS (fine granular scalability)), and use a network-friendly distributed streaming algorithm. The interface between the video content and the network is provided by a distributed packetization strategy. This packetization approach uses an FEC-based multiple description coding framework that is dynamically matched, in a rate-distortion efficient manner, to both the video source content and the network channel conditions.
Due to the diversity effect of multiple servers, our distributed algorithm is immune to single points of failure, provides natural load-balancing on the multiple servers, and provides a graceful degradation in quality with an increase in multi-server load or a decrease in network throughput.
We have developed an algorithm to deliver a nearly constant perceptual video quality to the client by streaming simultaneously from multiple servers leading to a satisfying user experience. The algorithm accomplishes this task while minimizing the net aggregate bandwidth used from all the servers put together. The input video sequence is assumed to be a scalable video bit-stream. Our solution is inherently distributed, so that the servers need not communicate for encoding purposes. Synchronization among the servers is achieved through the receiver.
We have implemented our distributed video streaming algorithm and are currently running tests over the Internet. We are in the process of developing solutions for multicast. For details please refer to .
Figure 1: Block diagram of the VISDOM system
Wavelet thresholding has been a popular technique for denoising since the seminal work by Donoho and Johnstone . The basic principle of wavelet thresholding is to identify and zero out wavelet coefficients of a signal which are likely to contain mostly noise. By preserving the most significant coefficients, wavelet thresholding preserves important highpass features of a signal such as discontinuities. This property is useful, for example, in image denoising to maintain the sharpness of the edges in an image .
It has long been recognized that the periodic shift-invariance of the wavelet transform implies that denoising by wavelet thresholding is itself periodically shift-invariant. Thus, various choices of shifting, denoising, and shifting back give different denoised estimates, and there is no simple way to choose the best from among these estimates. Coifman and Donoho introduced cycle spinning, a technique that estimates the true signal as the linear average of individual estimates derived from wavelet-thresholded translated versions of the noisy signal.
In this work, it is demonstrated that such an average can be dramatically improved upon . In its conceptually simplest form, the proposed algorithm applies wavelet thresholding recursively by repeatedly translating, denoising the input via basic wavelet denoising, and then translating back. (Significant computational improvements are possible.) Exploiting the convergence properties of projections, the proposed algorithm can be regarded as a sequence of denoising projections that converge to the projection of the original noisy signal onto a small subspace containing the true signal. Certain local and global convergence properties are proven, and simulations on piecewise polynomial signals show marked improvement over both basic wavelet thresholding and standard cycle spinning. Figure 1 shows the improvement in denoising typically provided by our algorithm when the true signal is a piecewise polynomial. See  for details. Further work involves investigating various extensions to sets of wavelet bases and sets of observations.
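A minimal numerical sketch of basic thresholding and standard cycle spinning (the baselines our recursive algorithm improves upon) might look as follows. The one-level Haar transform, hard threshold value, and piecewise-constant test signal are illustrative simplifications, not the wavelets or signals used in our experiments.

```python
import numpy as np

def haar_denoise(x, thresh):
    """One-level Haar transform, hard-threshold the detail band, invert."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)       # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)       # detail coefficients
    d = np.where(np.abs(d) > thresh, d, 0.0)   # keep only significant details
    y = np.empty_like(x)
    y[0::2] = (a + d) / np.sqrt(2)
    y[1::2] = (a - d) / np.sqrt(2)
    return y

def cycle_spin(x, thresh, denoiser=haar_denoise):
    """Average denoised estimates over all circular shifts (Coifman-Donoho)."""
    n = len(x)
    est = np.zeros_like(x)
    for k in range(n):
        est += np.roll(denoiser(np.roll(x, k), thresh), -k)
    return est / n

rng = np.random.default_rng(1)
clean = np.repeat([0.0, 4.0], 32)              # piecewise-constant signal
noisy = clean + 0.5 * rng.standard_normal(64)
noisy_mse = np.mean((noisy - clean) ** 2)
plain_mse = np.mean((haar_denoise(noisy, 1.5) - clean) ** 2)
spin_mse = np.mean((cycle_spin(noisy, 1.5) - clean) ** 2)
print(noisy_mse, plain_mse, spin_mse)          # thresholding reduces the error
```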
Figure 1: Piecewise polynomial denoising results
In this work we present an error concealment scheme for damaged spatial blocks in image/video sequences. A promising error concealment approach in such a scenario is based on the projection onto convex sets (POCS) method. If the two convex sets intersect, this algorithm guarantees a reconstructed block that is smoothly connected with adjacent blocks, without altering the correctly received adjacent blocks. If the two sets do not intersect, the algorithm oscillates between two points, each of which comes close to satisfying both constraints. The POCS method has the additional benefit of edge continuity and preservation via edge detection and adaptive filters. However, one shortcoming of the algorithm is that it assumes a damaged block is surrounded by undamaged blocks. In this work we deal with the case where damaged blocks are often contiguous or located on the boundary of the image. This has potential application in some recently proposed low-complexity video coding algorithms [2,3] that rely on the coding with side information approach to achieve compression, as opposed to conventional motion compensation based approaches. We seek to generalize the POCS algorithm to incorporate these cases. Furthermore, we are investigating various methods of providing an initial estimate of the missing block for the POCS algorithm. Simple averaging techniques have been remarkably successful for individual missing blocks, but more elaborate methods are needed as the number of contiguous missing blocks increases.
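A one-dimensional sketch of the POCS iteration, alternating projections onto the set of signals agreeing with the received samples and onto a set of smooth, lowpass signals, is given below. The 1D signal, DFT-based smoothness set, and gap size are illustrative stand-ins for the 2D block-concealment setting.

```python
import numpy as np

def project_known(x, known, observed):
    """Project onto the set of signals agreeing with the received samples."""
    y = x.copy()
    y[known] = observed[known]
    return y

def project_smooth(x, keep=4):
    """Project onto the set of signals whose DFT occupies only `keep` low bands."""
    X = np.fft.fft(x)
    mask = np.zeros(len(x), dtype=bool)
    mask[:keep] = True
    mask[-(keep - 1):] = True          # conjugate-symmetric negative frequencies
    return np.fft.ifft(np.where(mask, X, 0)).real

n = 64
true = np.cos(2 * np.pi * 2 * np.arange(n) / n)   # smooth "image row"
known = np.ones(n, dtype=bool)
known[28:34] = False                              # a contiguous damaged run
observed = np.where(known, true, 0.0)

x = observed.copy()                               # simple initial estimate
for _ in range(500):                              # alternate the two projections
    x = project_smooth(project_known(x, known, observed))
x = project_known(x, known, observed)
print(np.max(np.abs(x - true)) < 1e-3)            # True: the sets intersect here
```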
In the context of bandlimited signals, there are two ways to sample a signal. One is the PCM method, in which the sampling rate is slightly above the Nyquist rate and each sample is quantized with a precise A/D converter. To reduce cost, other schemes use a one-bit A/D converter with oversampling. Recently, it was shown [1,2] that one can achieve accuracy similar to that of PCM from the oversampling method at the same rate. (PCM accuracy is so far the best known result.) This indicates that one should be able to trade off oversampling against A/D precision. We have some partial results in this area. First, the dither-based schemes were generalized to trade off average oversampling with the accuracy of the A/D converter. Moreover, the one-bit dither method and the PCM method both turn out to be special cases of our scheme, which thus connects two seemingly unrelated schemes. We also have a distributed and energy-efficient algorithm for sampling a one-dimensional bandlimited field.
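The basic oversampling-versus-precision trade-off can be illustrated with a one-bit dithered converter: a single comparator plus a known uniform dither yields an unbiased estimate whose accuracy improves like 1/sqrt(n) with the number of samples. The uniform dither model and constant input below are illustrative simplifications, not our generalized scheme.

```python
import numpy as np

def one_bit_estimate(x, n_samples, rng):
    """Estimate x in [-1, 1] from n one-bit comparisons against uniform dither."""
    dither = rng.uniform(-1.0, 1.0, n_samples)
    bits = (x > dither).astype(float)   # the comparator's only output
    return 2.0 * bits.mean() - 1.0      # E[bit] = (x + 1) / 2, so this is unbiased

rng = np.random.default_rng(7)
x = 0.3
coarse = one_bit_estimate(x, 100, rng)      # low oversampling: rough estimate
fine = one_bit_estimate(x, 100_000, rng)    # heavy oversampling: fine estimate
print(abs(coarse - x), abs(fine - x))       # error shrinks as samples grow
```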
A further related problem that we are pursuing is the distributed sampling of finite rate of innovation signals . This requires selecting suitable models, then devising sampling schemes/algorithms and analyzing them. Our main goals are accurate reconstruction and low energy consumption.
Figure 1: An example of conservation of bits principle
Connexions is a new approach to authoring, teaching, and learning that aims to fully exploit modern information technology. Available free of charge to anyone under open-content and open-source licenses, Connexions offers custom-tailored, current course material, is adaptable to a wide range of learning styles, and encourages students to explore the links among courses and disciplines.
In contrast to the traditional process of textbook writing and publishing, Connexions fosters world-wide, cross-institution communities of authors, instructors, and students, who collaboratively and dynamically fashion "modules" from which the courses are constructed.
We believe the ideas and philosophy embodied by Connexions are pedagogically sound, both time and cost efficient, and fun.
Multiple description (MD) source coding has recently emerged as an attractive framework for robust multimedia transmission over unreliable channels, such as the Internet and the wireless medium. The basic idea in MD coding is to generate multiple descriptions of the source, such that each description independently describes the source with a certain fidelity, and when more than one description is available, they can be synergistically combined to enhance the quality.
In this work, we present new achievable rate regions and code constructions for the symmetric n-channel multiple descriptions (MD) coding problem for n>2. Our approach is inspired by unexplored connections between MD and the problem of distributed source coding.
We describe the underlying information-theoretic framework, and then formulate practical code constructions based on scalar/vector quantizers and linear channel codes to emulate the information theoretic performance.
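As a toy illustration of the MD idea (not our quantizer/linear-code constructions), the sketch below forms two descriptions of a smooth source by odd/even sample splitting: either description alone gives a coarse reconstruction, and together they reconstruct the source exactly.

```python
import numpy as np

def md_encode(x):
    """Two descriptions by even/odd sample splitting of a smooth source."""
    return x[0::2], x[1::2]

def md_decode(d_even=None, d_odd=None, n=None):
    """Reconstruct from whichever descriptions arrived."""
    if d_even is not None and d_odd is not None:
        y = np.empty(n)
        y[0::2] = d_even
        y[1::2] = d_odd
        return y                              # both received: full quality
    d = d_even if d_even is not None else d_odd
    return np.repeat(d, 2)[:n]                # one received: sample-and-hold

t = np.linspace(0, 1, 100)
x = np.sin(2 * np.pi * 2 * t)                 # slowly varying source
d0, d1 = md_encode(x)
both = md_decode(d0, d1, n=len(x))
one = md_decode(d_even=d0, n=len(x))
print(np.allclose(both, x))                   # True: descriptions combine
print(np.mean((one - x) ** 2) < np.mean(x ** 2))  # True: graceful degradation
```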
We study the problem of seamless digital upgrade of analog transmission systems in an information-theoretic setting. Our notion of seamlessness is twofold: (1) backward compatibility with existing analog receivers, and (2) no extra digital spectrum for the upgrade. In this work, we consider Gaussian sources with memory (parallel Gaussian source model) and arbitrarily colored Gaussian channels (parallel Gaussian channel model), with the goal of optimizing (under a squared-error metric) the delivered analog and digital signal quality under a fixed total transmission power constraint. We show that a constructive scheme consisting of the cascade of source coding with side information (SCSI) and channel coding with side information (CCSI) attains the jointly optimal end-to-end delivered digital and analog signal qualities corresponding to the peak deliverable digital upgrade quality.
We address duality in a variety of multiple-input-multiple-output (MIMO) source and channel coding problems of interest under different scenarios of "one-sided" inter-terminal collaboration at either the transmitter or at the receiver, including certain cases of duality between (1) broadcast channel coding and distributed source coding, and (2) multi-access channel coding and multiple-descriptions source coding. Our notion of duality in this project is in a functional sense, where the optimal encoder mapping for a MIMO source coding problem becomes identical to the optimal decoder mapping for the dual MIMO channel coding problem, and vice versa. For ease of illustration we give the formulation only for two-input-two-output systems, which can be easily extended to the MIMO case. We present the precise mathematical conditions under which these encoder-decoder mappings are swappable in the two dual MIMO problems, identifying the key roles played by the source distortion and channel cost measures respectively in the MIMO source and channel coding problems in capturing this duality.
Since the first observation by Shannon in 1959 that source and channel coding problems can be studied as information-theoretic duals of each other, a number of researchers have made significant contributions to furthering this understanding, as noted in excellent textbooks such as [2,3]. In , this duality has been studied in the context of quantization and modulation. As multiuser information theory has gained significant interest recently, there has been a corresponding interest in extending this duality to these scenarios. Duality between source and channel coding with side information was first reported in , and later studied in .
A mathematical formulation of the functional duality between conventional point-to-point source and channel coding problems was given in , where the important roles of distortion and cost measures for these two problems were highlighted, inspired by seemingly unrelated work on the optimality of uncoded transmission in .
This concept was then extended to the case of source and channel coding with side information. The notion of duality addressed in  is in a functional sense, i.e., the optimal encoder-decoder mappings for one problem become the optimal decoder-encoder mappings for the dual problem. The precise conditions under which such swappable mappings are feasible involve dual relationships between distortion and cost measures in the source and channel coding problems respectively. Study of this functional duality serves two important purposes: (1) it provides new insights into these problems from the different perspectives of source and channel coding, and allows for cross-leveraging of advances in the individual fields; (2) more importantly, it provides a basis for sharing efficient constructions of the encoder and decoder functions in the two problems, e.g., through the use of structured algebraic codes, turbo-like codes, trellis-based codes, etc.
In this work we extend this notion of functional duality to more instances of MIMO source and channel coding problems. We study various MIMO structures admitting different scenarios of collaboration among multi-terminal inputs and/or outputs. (To keep the exposition simple, in this work we describe only two-input-two-output systems.) The collaboration scenarios we consider involve those where either the multi-terminal encoders or the multi-terminal decoders can collaborate, i.e., be joint, but not both. (The case of collaboration at both ends degenerates to point-to-point MIMO systems.) Under this one-sided collaboration abstraction, we address four problems of interest in source and channel coding: (1) distributed source coding; (2) broadcast channel coding; (3) multiple description source coding with no excess sum-rate; and (4) multiple access channel coding with independent message sets. In (1) and (4), the decoders collaborate, whereas in (2) and (3), the encoders collaborate. These four problems have been studied in the literature extensively. In this project we point out that for a given distributed source coding problem, under certain cases, we can obtain a specific dual broadcast channel coding problem and vice versa. Similarly, for a given multiple description coding problem, we can find a specific dual multiple access channel coding problem and vice versa.
We explore a mathematical characterization of the functional duality between classical source and channel coding, formulating the precise conditions under which the optimal encoder for one problem is functionally identical to the optimal decoder for the other problem. We then extend this functional duality to the case of coding with side information. We consider several examples which correspond to both discrete-valued and continuous-valued cases to illustrate our formulation.
Classical source coding under a rate-distortion (R-D) constraint and channel coding under a channel cost constraint have long been considered information-theoretic duals (not to be confused with Lagrangian duality in optimization theory) of each other, starting from Shannon's landmark paper in 1959. In the source coding (rate-distortion) problem, the goal is to find, for a given source distribution, the conditional distribution between the input source and the output reconstruction that minimizes the mutual information between the two, subject to a target distortion constraint corresponding to a given distortion metric. In the channel coding (capacity-cost) problem, the goal is to find, for a given channel input-output conditional distribution, the channel input distribution that maximizes the mutual information between input and output, subject to a target channel cost (or power) constraint corresponding to a channel cost metric. Cover and Thomas point out that these two problems are information-theoretic duals of each other. This duality has also been exploited in the formulation of numerical optimization algorithms for the channel capacity and rate-distortion problems respectively using the Blahut-Arimoto algorithm. In  (see Sec 13.5), a packing versus covering interpretation of duality in Gaussian source and channel coding has been formulated to illustrate this duality. Indeed, some of the solutions to practical source coding (channel coding) have been inspired by solutions to channel coding (source coding): e.g., trellis-coded quantization in source coding was inspired by trellis-coded modulation in channel coding.
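The capacity-cost side of this picture can be made concrete with the Blahut-Arimoto iteration mentioned above. The sketch below computes the capacity of a binary symmetric channel; the fixed iteration count and the lack of a convergence test are simplifications for illustration.

```python
import numpy as np

def blahut_arimoto(P, iters=500):
    """Capacity (bits) of a discrete memoryless channel P[y|x] (rows: inputs)."""
    m = P.shape[0]
    p = np.full(m, 1.0 / m)                   # input distribution, start uniform
    for _ in range(iters):
        q = p @ P                             # induced output distribution
        # D( P[.|x] || q ) for each input x, in nats
        d = np.sum(P * np.log(np.maximum(P, 1e-300) / q), axis=1)
        p = p * np.exp(d)                     # multiplicative update
        p /= p.sum()
    return np.max(d) / np.log(2)              # at convergence this equals C

# Binary symmetric channel with crossover 0.1: C = 1 - H(0.1) ~ 0.531 bits.
bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
print(round(blahut_arimoto(bsc), 3))          # 0.531
```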
We address the duality between source and channel coding with the objective of providing a mathematical characterization of their functional duality. To be specific, given an optimal source (channel) coding scheme (encoder and decoder), we detail the conditions under which such a scheme is a functional dual to a channel (respectively source) coding scheme (encoder and decoder), in the sense that the optimal encoder mapping for one problem is functionally identical to the optimal decoder mapping for the other problem. We show in the sequel that this functional duality is enabled through appropriate choices of metrics and constraints for the dual source (or channel) coding problem. Our inspiration for this formulation is derived from recent work by Gastpar et al. on the optimality conditions for uncoded communications . We then extend these results to coding with side information and obtain the conditions under which the source and channel coding with side information are exact functional duals.
In this work we propose the concept of successive refinement of information for multiple users. We give achievable rate-distortion regions for the Gaussian source. The performance of the proposed scheme is shown to be superior to conventional approaches based on multiplexed solutions of optimal point-to-point successively refinable transmission strategies. We also provide a more universal approach to multiuser successive refinement based on the Wyner-Ziv method of coding with side-information, where the source reconstruction based on the base layer is treated as side-information during the refinement phase.
With the exponential growth in communication bandwidth and computing speed, more users than ever are networked for seamless transmission of information to one another. Furthermore, heterogeneity of connectivity in the networking world is ever-increasing: the bandwidth of a communication link might vary from a few tens of kilobits per second for hand-held devices to a few megabits per second for high-speed ADSL lines. In such scenarios, the management of information storage and transmission must account for the interaction of the information signals of multiple users. This provides the motivation for the study undertaken in this paper.
Consider an encoder wishing to transmit an information source X to two users as shown in Figure 1. The two receivers are connected to the encoder through errorless channels of unequal capacities. We wish to have a successively refinable transmission of information to these two receivers. We assume the broadcasting of information such that only a part of the information transmitted to the user with higher capacity is received by the user with lower capacity. Thus this is a generalization of the successive refinement of information for a single user. Since the capacities of the channels are unequal, the refinement of the users must also be unequal and proportional to the capacities of the individual links. Note that this work deals with source coding only, and the individual links are assumed to be error-free.
Figure 1: Multiuser successive refinement: the encoder sends an information sequence at rate R bits/sample. This information is intended for both the decoders. Each decoder has access to a quantized version of X.
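The single-user successive refinement that this work generalizes can be sketched with a uniform scalar quantizer whose cells are split in two by each refinement layer: the low-capacity user decodes only the base index, while the high-capacity user also applies the refinement bits. The step sizes and layer count below are illustrative choices.

```python
import numpy as np

def encode_sr(x, base_step, levels=2):
    """Base-layer index plus per-layer refinement bits (each halves the step)."""
    idx = np.floor(x / base_step).astype(int)         # base-layer index
    bits = []
    step = base_step
    for _ in range(levels):
        step /= 2
        b = ((x - idx * (2 * step)) >= step).astype(int)
        idx = 2 * idx + b                             # refine the cell
        bits.append(b)
    return idx, bits

def decode_sr(idx, base_step, levels_used):
    """Reconstruct at the cell center for the number of layers decoded."""
    step = base_step / 2 ** levels_used
    return (idx + 0.5) * step

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 10_000)
base_step = 0.25

base_idx = np.floor(x / base_step).astype(int)
coarse = (base_idx + 0.5) * base_step                 # low-capacity user
fine_idx, _ = encode_sr(x, base_step, levels=2)
fine = decode_sr(fine_idx, base_step, levels_used=2)  # high-capacity user
print(np.mean((fine - x) ** 2) < np.mean((coarse - x) ** 2))  # True
```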
1 Professor, University of Michigan
Microarrays are a powerful tool in bioinformatics because they enable quantitative measures of gene expression levels in the transcription phase of thousands of genes simultaneously. Although DNA sequencing techniques have become automated, understanding the function of genes remains a huge challenge. For example, given that most cells in our body carry the same set of genetic material, how do cells differentiate? How are complex biological pathways genetically regulated? What are genetic causes of disease?
Microarrays are a new and evolving technology. Experiments are costly and large quantities of 40 MB image files are generated routinely, yet there are no standardized methods for image processing and information extraction. Consequently, compression is a necessary tool, and introduces an interesting question: what is the effect of compression loss on extracted features?
We propose SLOCO, a practical compression scheme that is both lossless and lossy with refinement. We show empirically that the error in extracted gene expression levels caused by our compression loss is smaller than the variability across repeated experiments, which in turn is smaller than the variability caused by different image processing methods . In fact, compression has a denoising effect on the extracted features. We further study rate-distortion performance using multiple nonlinear distortion measures.
Figure 1: This display shows 0.5% of an inkjet microarray image in false color: red indicates high intensities and blue indicates low intensities. Each spot corresponds to one gene.
The focus of this project is the fast reconstruction of photo-realistic 3D city models, in order to facilitate interactive walk-, drive-, and fly-throughs. This modeling effort occurs at two levels: (1) ground-based modeling of building facades using 2D laser scanners and a video camera, and (2) airborne modeling using airborne laser scans and aerial images. We are currently working on merging these two models into a single one.
For ground-based modeling, an acquisition system has been built that can acquire 3D information from the facades of the buildings. Our experimental setup consists of a truck equipped with one color camera and two fast, inexpensive 2D laser scanners. One scanner is mounted vertically in order to scan the building facades. The other one is mounted horizontally and captures 180° scans while traveling on city streets under normal traffic conditions. The horizontal scans are used to determine an estimate of the vehicle's motion in a scan matching process, and relative motion estimates are concatenated to an initial path. Assuming that features such as buildings are visible from both ground-based and airborne views, this initial path is corrected by using probabilistic Monte-Carlo localization. Specifically, the final global pose is obtained by utilizing an aerial photograph or a DSM as a global map, to which the ground-based horizontal laser scans are matched. Figure 1 shows the reconstructed acquisition path superimposed over a digital surface map of Berkeley.
We have developed a set of completely automated data processing algorithms to handle the large data size, and to cope with imperfections and non-idealities inherent in laser scanning systems, such as occlusions and reflections from glass surfaces. Dominant building structures are detected, and points are classified into a foreground layer (trees, cars, etc.) and a background layer (building facades). Large holes in the facades, caused by occlusion from foreground objects, are filled in by adaptive interpolation techniques, and further processing removes isolated points and fills remaining small holes. The processed scans are triangulated and texture mapped using the camera images. Applying our technique, we have reconstructed photo-realistic, texture-mapped 3D facade models of five downtown Berkeley city blocks, as shown in Figures 2 and 3. For this highly detailed model, the acquisition time was 11 minutes, and the total computation time for the completely automated reconstruction was 151 minutes.

For airborne modeling, airborne laser scans are acquired and resampled to obtain a digital surface map (DSM), containing roof and terrain shape complementary to the ground-based facades. The DSM is processed in order to sharpen edges and remove erroneous points, triangulated, and texture mapped with an aerial image, in order to obtain an airborne surface mesh as shown in Figure 4 for the east Berkeley campus. Finally, facade models are merged with the DSM by removing redundant parts and filling gaps. The result is a 3D city model usable for both walk- and fly-throughs, as shown in Figure 5.
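The hole-filling step described above can be sketched with a much simpler stand-in: row-wise linear interpolation of invalid depth samples from their nearest valid neighbors. Our actual pipeline uses adaptive interpolation and additional post-processing; the toy depth map below is purely illustrative.

```python
import numpy as np

def fill_holes_rows(depth, valid):
    """Fill invalid depth samples row-by-row by linear interpolation.

    Occluded facade points are interpolated from the valid scan points
    on either side of each hole.
    """
    out = depth.copy()
    cols = np.arange(depth.shape[1])
    for r in range(depth.shape[0]):
        good = valid[r]
        if good.sum() >= 2:
            out[r, ~good] = np.interp(cols[~good], cols[good], depth[r, good])
    return out

depth = np.tile(np.linspace(5.0, 8.0, 8), (3, 1))   # toy facade depth map
valid = np.ones_like(depth, dtype=bool)
valid[1, 3:6] = False                               # an occlusion hole
holey = np.where(valid, depth, 0.0)
filled = fill_holes_rows(holey, valid)
print(np.allclose(filled, depth))                   # True: the row is linear
```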
Figure 1: Reconstructed acquisition path in Berkeley
Figure 2: Downtown Berkeley facade models
Figure 3: Downtown Berkeley facade models
Figure 4: East Berkeley campus
Figure 5: Merged model as seen from a bird's eye view
Video streaming from a wired network to a wireless network involves many challenges. The first question is the rate at which a fixed host should stream video to a mobile host. The rate must be optimal in two ways. First, the rate should be fair to other flows (e.g., TCP flows) in the wired network, which is known as the TCP-friendly property. Second, the rate should use the wireless bandwidth efficiently. In order to achieve these two goals, the endhost (i.e., sender or receiver) should apply a congestion control mechanism only when there is congestion. The problem then translates into congestion detection in the presence of wireless channel errors. This problem is not easy to solve, since both congestion and wireless channel errors produce packet loss, which is traditionally a sign of congestion in the Internet.
This problem is well recognized in the wireless TCP scenario, and there are some effective non-end-to-end solutions to the problem [1,2]. However, these solutions break the end-to-end property, and thus introduce scalability problems. Moreover, those solutions fail when the routes between the sender and receivers are asymmetric. In summary, this problem is not well solved in wireless TCP.
In this project, we explore the possibility of detecting congestion in the presence of wireless channel errors using active UDP probing. This solution differs from many previous solutions in that it is based on purely end-to-end observation. Although it is hard to devise an end-to-end solution for TCP, we think it might be possible in a UDP scenario, since it is possible to fully control the sending pattern while doing active probing.
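One plausible end-to-end heuristic of the kind we are investigating: if one-way probe delays trend upward just before a loss (suggesting queue build-up), label the loss congestion-induced; otherwise attribute it to a wireless channel error. The function below is a toy sketch with illustrative, untuned thresholds, not our deployed detector.

```python
def classify_loss(delays_before_loss, window=5, rise_thresh=0.005):
    """Toy classifier for a loss event from one-way probe delays (seconds).

    A rising delay trend hints at queue build-up (congestion); a flat trend
    suggests a random wireless error. Window and threshold are illustrative.
    """
    recent = delays_before_loss[-window:]
    if len(recent) < 2:
        return "unknown"
    slope = (recent[-1] - recent[0]) / (len(recent) - 1)
    return "congestion" if slope > rise_thresh else "wireless"

# Queueing delay ramps up before a congestion loss...
print(classify_loss([0.050, 0.055, 0.062, 0.071, 0.080]))  # congestion
# ...but stays flat when the loss is a random wireless error.
print(classify_loss([0.050, 0.051, 0.050, 0.049, 0.050]))  # wireless
```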
Früh and Zakhor developed efficient techniques for close range modeling of urban environments [1-3]. Combining the close range 3D model with a far range model of building rooftops and terrain yields a complete model of the urban environment.
Far range modeling of urban environments presents some distinct challenges due to the operational disparity in the sensors that acquire information. To fuse the information from different sensors, such as an airborne laser scanner and an airborne camera, efficient techniques are being developed for fast and accurate registration, segmentation, and polygonalization. Some of the issues in far range modeling are: low sampling density of the airborne laser scanner and aerial images, sampling inconsistency between camera images and laser scanner, loss of boundary and edge detail while rebinning the aerial scans, difficulty in segmenting the aerial-scan-based digital surface map (DSM) to identify buildings and vegetation, registration between aerial scans and images, etc. Our research concentrates on rebinning the scan points using edge-preserving resampling techniques for laser scan points, and on combining information from aerial images to rectify the boundaries and improve the segmentation performance. These steps lead to an improved polygonal approximation of building rooftops, which are then texture mapped using the aerial images for accurate 3D models of building roofs and the terrain shape.
With the recent proliferation of broadband Internet access, video streaming applications have become commonplace. From news clips on sites such as CNN to video on demand from sites such as www.movieflix.com, the current Internet offers a much richer multimedia experience than in the past. Traditionally, most video streaming applications have shied away from TCP and have instead used UDP for greater flexibility. To prevent congestion collapse and ensure fairness with TCP, much research has gone into creating congestion-controlled UDP protocols.
There have been several recent proposals [2-4] which challenge the notion that TCP is unsuitable for multimedia streaming. It has been noted that TCP is the most widely used protocol, and has undergone countless enhancements for host and network performance. Fairness and congestion control are non-issues if the streaming application uses TCP itself. Furthermore, at times it is necessary to use TCP due to client-imposed firewalls, which may only permit HTTP traffic.
In recent work, we have shown that, under the assumption that the user's access link is the bottleneck, a modified TCP receiver can control the amount of bandwidth received by individual applications. Specifically, we created a receiver-controlled bandwidth sharing system (BWSS), which allows the user to specify a minimum rate and a share of the overall bandwidth for each application. The BWSS operates by clamping the receiver's advertised window to limit the throughput of less important flows, as specified by the user's profile. Hence, by using TCP's native flow-control mechanisms, it is possible to provide additional quality of service for applications. Preliminary results indicate that efficient video streaming is possible with simple client-side modifications, such as the BWSS, which allow rapid and easy deployment. Figure 1 shows the results of attempting to stream a video at 480 Kbps (60 KB/s) over standard TCP, with interfering UDP traffic injected from 50 to 150 seconds. Figure 2 shows the same experiment over TCP using the BWSS, with the video stream given higher priority than the ftp traffic.
Figure 1: Video streaming at 480 Kbps using standard TCP
Figure 2: Video streaming at 480 Kbps using the BWSS
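The window-clamping idea behind the BWSS can be sketched as follows. The function name, the segment size, and the 64 KB cap are illustrative assumptions, not our implementation: to hold a low-priority flow near a target rate, the receiver advertises a window no larger than that flow's bandwidth-delay product.

```python
# Sketch: receiver-side throughput limiting via the TCP advertised window.
# TCP throughput is roughly window / RTT, so a window equal to the
# bandwidth-delay product caps the flow near the target rate.

def advertised_window(rate_bps, rtt_s, mss=1460, max_window=65535):
    """Window (bytes) to advertise so TCP throughput ~= rate_bps."""
    bdp = int(rate_bps / 8 * rtt_s)          # bandwidth-delay product in bytes
    # Round down to a whole number of segments, never below one MSS.
    window = max(mss, (bdp // mss) * mss)
    return min(window, max_window)
```

For example, capping a flow at 480 Kbps over a 100 ms round-trip path yields an advertised window of 5840 bytes (four segments).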
At the Video and Image Processing Laboratory (VIP Lab), we are engaged in automated 3D model generation for urban environments. Ground based modeling involves two 2D laser scanners and a digital camera mounted on top of a truck. As the truck is driven through a city, the laser scans provide depth information via the LIDAR time-of-flight principle. These scans are then subjected to accurate localization and 3D data processing algorithms to create a 3D mesh of the urban environment. The resulting mesh is texture mapped with the camera images to produce photo-realistic models.
However, objects such as trees, cars, and lampposts occlude parts of the buildings from the laser scanners and the digital camera, leaving holes both in the geometry (mesh) and in the texture (camera images). In a complete 3D model, the user should be able to view the building facade that was hidden behind a tree or some other obstacle. Hole filling in geometry is discussed in [1,2].
We present a simple and efficient method for hole filling in images. Given an image with regions of unknown rgb values (holes) our task is to determine the rgb values (fill the holes) as sensibly as we can from the information available to us from the rest of the image. We use our method to fill holes in the texture atlases generated during automated 3D modeling of urban environments. Hole filling can also be used for other applications such as restoring old and damaged photographs, removing objects from images, and special effects.
We first fill regions of low frequency with a pass of 1D horizontal interpolation, in which for each row we interpolate the missing columns (corresponding to holes) that lie in a region of low frequency; this is followed by a pass of 1D vertical interpolation. We then employ a copy-paste method based on the idea of texture synthesis, illustrated in Figure 1: we take a window around the hole, find a matching region elsewhere in the image, and fill the hole by copying the matching region and pasting it over the hole.
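The copy-paste step can be sketched as below. This is a simplified illustration, not our production code: it assumes a grayscale image, a single compact hole, a fixed window size, and an exhaustive search for the best-matching source window, scored by sum-of-squared differences over the known pixels only.

```python
import numpy as np

# Sketch of the copy-paste hole-filling step: score candidate source
# windows against the window around the hole using only the known pixels,
# then paste the best match into the hole.

def fill_hole(img, mask, win=3):
    """img: 2D float array; mask: True where pixel is unknown (hole)."""
    out = img.copy()
    h, w = img.shape
    ys, xs = np.where(mask)
    cy, cx = int(ys.mean()), int(xs.mean())          # hole center
    t0, t1 = max(0, cy - win), min(h, cy + win + 1)  # target window bounds
    l0, l1 = max(0, cx - win), min(w, cx + win + 1)
    tgt, tmask = img[t0:t1, l0:l1], mask[t0:t1, l0:l1]
    th, tw = tgt.shape
    best, best_err = None, np.inf
    for y in range(h - th + 1):
        for x in range(w - tw + 1):
            if mask[y:y + th, x:x + tw].any():       # source must be hole-free
                continue
            cand = img[y:y + th, x:x + tw]
            err = np.sum(((cand - tgt) ** 2)[~tmask])  # compare known pixels
            if err < best_err:
                best, best_err = cand, err
    if best is not None:
        region = out[t0:t1, l0:l1]
        region[tmask] = best[tmask]                  # paste only into the hole
    return out
```

A practical implementation would restrict the search neighborhood and handle color images and multiple irregular holes.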
The approach is found to work well on most images and does not suffer from the limitations of local inpainting in traditional hole filling schemes. Figure 2 shows part of a texture atlas with holes marked in red. Figure 3 shows the hole-filled image.
Figure 1: Illustrating the copy-paste method
Figure 2: Part of a texture atlas with holes
Figure 3: Hole-filled image
With the explosive growth of streaming and interactive multimedia applications over the Internet, many approaches have been proposed to stream data effectively over packet switched, best-effort networks. Many use techniques from source and channel coding, implement transport protocols, or modify system architecture in order to deal with the delay, loss, and time-varying nature of the Internet. In this research, we address the packet loss and delay associated with the Internet by proposing a path diversity framework in which multimedia data is streamed via multiple paths to the receiver, leading to higher tolerance against loss and delay.
We investigate two approaches for achieving path diversity for streaming and real-time applications over the Internet. For streaming applications, we use multiple senders to stream data to a single receiver, while for real-time applications, disjoint paths from a sender to a receiver are created using a collection of relay nodes. In the latter approach, we propose a heuristic scheme for selecting a redundant path between a sender and a receiver via a relay node, based on information returned by the network tool traceroute. We show through simulations on many Internet-like topologies that our heuristic scheme is able to find a highly disjoint redundant path. We further demonstrate that a substantial reduction in packet loss can be achieved by dividing packets between the default and redundant paths.
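The core of such a heuristic can be sketched as follows. The overlap metric and function names are illustrative assumptions rather than our exact scheme: given router lists such as those returned by traceroute, we score each candidate relay by how few routers its path shares with the default path.

```python
# Sketch: choose a relay node whose path is maximally disjoint from the
# default sender-receiver path, using traceroute-style router lists.

def disjointness(default_path, relay_path):
    """Fraction of the relayed path's routers NOT shared with the
    default path (1.0 = fully disjoint)."""
    shared = set(default_path) & set(relay_path)
    return 1.0 - len(shared) / len(set(relay_path))

def pick_relay(default_path, candidate_paths):
    """candidate_paths: {relay_name: router_list}. Most disjoint wins."""
    return max(candidate_paths,
               key=lambda r: disjointness(default_path, candidate_paths[r]))
```

A shared router is a shared point of failure, so maximizing this score minimizes the chance that one outage takes down both paths.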
Within the path diversity framework, we propose a TCP-friendly, receiver-driven protocol for simultaneous video streaming via multiple paths to a single receiver. The TCP-friendly, receiver-driven protocol employs a novel rate allocation scheme and packet partition algorithm. The rate allocation scheme determines the sending rate for each sender based on available network bandwidth, amount of forward error correction (FEC), and channel characteristics in such a way as to minimize the probability of packet loss. The packet partition algorithm ensures that every packet is sent using one and only one path, and at the same time, minimizes the probability of late packets. Using both simulations and actual Internet experiments, we demonstrate the effectiveness of our protocol in reducing packet loss, and hence, achieving higher quality for the streamed media.
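The packet partition idea can be sketched as below. The delay model and function name are illustrative simplifications, not the actual algorithm: each packet is assigned to exactly one sender, chosen as the one with the earliest estimated delivery time given its allocated rate and path delay.

```python
# Sketch: partition packets among senders so each packet travels exactly
# one path, greedily minimizing its estimated arrival time.

def partition_packets(n_packets, pkt_bits, senders):
    """senders: {name: (rate_bps, prop_delay_s)}. Returns packet -> sender."""
    next_free = {s: 0.0 for s in senders}   # when each sender can transmit next
    assignment = []
    for _ in range(n_packets):
        # Estimated arrival time if this packet went on each sender's path.
        eta = {s: next_free[s] + pkt_bits / senders[s][0] + senders[s][1]
               for s in senders}
        best = min(eta, key=eta.get)
        assignment.append(best)
        next_free[best] += pkt_bits / senders[best][0]  # occupy the link
    return assignment
```

With one sender twice as fast as the other, the greedy rule naturally sends roughly two packets on the fast path for every one on the slow path, keeping late arrivals rare.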
Achieving sub-100 nm device fabrication requires a shift of paradigm from today's optical lithography techniques to other alternatives. Examples of such alternatives include X-ray, EUV, and E-beam lithography. The goal of this project is to apply data/signal/image processing/compression techniques to solve data handling problems that are common to a variety of different future lithography techniques ranging from nano-mirror arrays to multiple probes made of 2D array of cantilevers, to ion beam and electron beam lithography. We explore suitable data organizations as well as processing algorithms to prepare the data stream for delivery to massively parallel arrays of writing instruments.
The data handling challenges facing future lithography technologies are similar to those facing the mask writing industry today, except that the data is usually written directly on product wafers instead of masks. A state of the art pattern generator, which can write 4X mask plates at the rate of one plate per hour, could in principle be sped up to write directly onto wafers. However, today's optical lithography projection systems maintain a throughput of one wafer per minute. A hundred-fold speed-up is needed to go from one plate per hour to one plate per minute. Secondly, the design data on a wafer is about 100 times more than that on a plate; i.e., after projection, the image on a plate only covers about 1/100 of the wafer area. Thus the challenge of data handling for maskless lithography is to accomplish a 10,000 fold throughput improvement over today's state of the art mask writers.
To translate this into typical data rates needed in maskless lithography, assume a wafer 300 millimeters in diameter and a writing pixel size of 25 nanometers. For the wafer to be written in 60 seconds, data rates of 1.9 tera-pixels per second are needed. These tera-pixel writing rates and terabit storage force the adoption of a massively parallel writing strategy and system architecture.
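The estimate above follows directly from the stated wafer and pixel dimensions, as this short calculation reproduces:

```python
import math

# Back-of-the-envelope data rate: a 300 mm wafer written with 25 nm
# pixels in 60 seconds, as stated in the text.

wafer_diameter_nm = 300e6          # 300 mm
pixel_nm = 25.0
write_time_s = 60.0

wafer_area = math.pi * (wafer_diameter_nm / 2) ** 2   # nm^2
pixels = wafer_area / pixel_nm ** 2
rate = pixels / write_time_s       # pixels per second

print(f"{rate / 1e12:.2f} tera-pixels/s")   # ~1.9 tera-pixels/s
```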
The goal of the data handling system is to bring a chip's design data stored on disk to a massive array of parallel writers at a data rate of 1.9 tera-pixels per second. Based on memory, processing power, and throughput requirements, we propose a system architecture consisting of storage disks, a processor board, and decode circuitry fabricated on the same chip as the hardware writers, as shown in Figure 1.
The critical bottleneck of this design is the transfer of data from the processor board to the on-chip hardware, which is limited in throughput to 400 Gb/s by the number of pins on the chip and the frequency at which the pins can operate, e.g., 1,000 pins operating at 400 MHz. Another critical bottleneck is the real-time decode that must be done on-chip, which precludes such complex operations as rasterization. Considering that the writers require about ten terabits per second of data, and the processor board can deliver at most 400 Gb/s to the on-chip hardware, we estimate that a compression ratio of 25 is necessary to achieve the data rates desired.
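The required compression ratio follows directly from the two throughput figures: the writers consume about ten terabits per second, while the pins can deliver at most 400 Gb/s.

```python
# Required compression ratio for the maskless lithography data path.

writer_demand_bps = 10e12      # ~10 terabits per second to the writers
pin_bandwidth_bps = 400e9      # e.g., 1,000 pins at 400 MHz
required_ratio = writer_demand_bps / pin_bandwidth_bps
print(required_ratio)          # 25.0
```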
We have tested several compression algorithms capable of achieving high compression ratios on modern layout data, including a lossless version of SPIHT image compression, Ziv-Lempel (LZ77), our own 2D-LZ which extends LZ matching to two dimensions, and the Burrows-Wheeler transform (BWT) as implemented by BZIP2. The results are presented in Table 1 (Figure 4). Clearly, for the test data, LZ77, 2D-LZ, and BZIP2 consistently achieve a compression ratio larger than 25, with BZIP2 outperforming 2D-LZ, which in turn outperforms LZ77. In terms of implementation complexity, LZ77 is simpler than 2D-LZ, which is in turn simpler than BZIP2, as can be seen from the decoder buffer size requirements of each algorithm listed in the last row of Table 1: LZ77 requires a 2 KB buffer for decoding, 2D-LZ requires a 200 KB buffer, and BZIP2 requires a 900 KB buffer. These compression results demonstrate that there is, in fact, a tradeoff between decoding complexity and compression efficiency. Because different maskless lithography systems have different implementation requirements, we seek to develop a spectrum of techniques that can trade off compression efficiency for lower implementation complexity.
Combinatorial coding (CC) is a new compression technique we developed which achieves the compression efficiency of arithmetic coding with the speed and simplicity of Huffman coding. It has a low-complexity implementation, requiring only a set of small fixed code tables and the ability to perform 32-bit integer addition, subtraction, and comparison operations at the decoder. To test the capabilities of combinatorial coding, we apply arithmetic, combinatorial, and Huffman coding in conjunction with standard context-based modeling of binary (bi-level) images to a binary rasterized image of a VLSI layout with 3694x3078 pixels, for a total size of 11370 Kb. Samples of this image are shown in Figure 2, and a simple 3-pixel context used to model this layout is shown in Figure 3. The resulting file sizes are shown in Table 2 (Figure 5), along with compression ratios in parentheses, and encoding/decoding run times on an 800 MHz Pentium 3. The principal limitation of CC is that it is a binary code. In order to apply CC to non-binary sources, proper binarization techniques still need to be developed.
Figure 1: System architecture
Figure 2: Samples of the binary rasterized image of VLSI layout 132x150 in size
Figure 3: 3-pixel context model for VLSI layout
Figure 4: Table 1: Compression ratios of SPIHT, LZ77, 2D-LZ, BZIP
Figure 5: Table 2: Result of 3-pixel context based binary image compression
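The context-based modeling step that feeds any of the three coders can be sketched as follows. This is a generic illustration rather than our exact model: each pixel's context is formed from three previously coded neighbors (giving eight contexts), per-context counts are updated adaptively, and the ideal code length approximates what an arithmetic (or combinatorial) coder would spend.

```python
import math

# Sketch: 3-pixel context modeling of a bi-level image. The context is
# built from the West, North-West, and North neighbors; per-context
# adaptive counts give the coding probability for each pixel.

def context_code_length(img):
    """img: list of rows of 0/1 ints. Returns ideal code length in bits."""
    counts = {c: [1, 1] for c in range(8)}   # Laplace-smoothed per context
    bits = 0.0
    for y, row in enumerate(img):
        for x, pix in enumerate(row):
            w = row[x - 1] if x > 0 else 0
            nw = img[y - 1][x - 1] if y > 0 and x > 0 else 0
            n = img[y - 1][x] if y > 0 else 0
            ctx = (w << 2) | (nw << 1) | n
            c0, c1 = counts[ctx]
            p = (c1 if pix else c0) / (c0 + c1)
            bits += -math.log2(p)            # ideal arithmetic-code cost
            counts[ctx][pix] += 1            # adaptive update
    return bits
```

On the highly structured patterns typical of VLSI layouts, the per-context statistics become strongly skewed, which is why context modeling combined with an efficient entropy coder achieves the high ratios reported in Table 2.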
The objective of this project is to develop efficient techniques for reliable video streaming over wireless ad hoc networks. A wireless ad hoc network is a collection of wireless mobile nodes which dynamically form a temporary network without infrastructure. Possible applications of video streaming over ad hoc networks include: a quick set-up for video conferencing in a place without a wireless infrastructure, transmitting video on the battlefield, and search and rescue operations after a disaster. There are many challenges for supporting video traffic over ad hoc networks. Due to the mobility of wireless nodes, the topology of an ad hoc network may change often. Thus the established connection route between a source and a destination is likely to be broken during the period of the transmission, which may cause interruption, pause, or jerkiness in the received video signal. Other factors that influence the quality include random packet loss caused by wireless channel errors and the small capacity of a wireless network.
We propose a cross-layer scheme to address the above problems. At the network layer, the routing protocol treats video traffic differently from other traffic. It establishes multiple paths between the source node and the destination for video applications, to enhance the robustness of the system and increase the bandwidth of an end-to-end connection. When one path is broken during the transmission, the routing protocol can still use the remaining paths to carry video traffic, and can use alternative paths at intermediate nodes to salvage packets stranded on the broken path. At the application layer, agents monitor the state of the paths and allocate video traffic among them adaptively.
We will verify our results using both NS simulations and experiments on a real testbed.