We are developing an interactive rendering engine to visualize large, detailed city models acquired with laser scanners and digital cameras. These models are considerably more complex than those created by hand or acquired via semi-automatic methods, and require significant complexity reduction for interactive visualization. In particular, huge quantities of texture data expose bottlenecks in the graphics pipeline that are not addressed by previous rendering algorithms.
Our rendering engine is based on three strategies that together enable large, complex 3D models to be rendered interactively: (1) frustum culling, (2) levels of detail (LODs), and (3) data management. We organize the LODs into a hierarchical structure and implement an algorithm that traverses this hierarchy to render each frame. Our system adaptively selects the appropriate LODs so that each frame fits within its slice of the available rendering time. Finally, the LODs are managed in memory to avoid swapping: only the simplest LODs are kept resident, while a background thread incrementally pre-fetches the others from disk according to a priority heuristic. Experimental results demonstrate the effectiveness of our approach.
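The time-budgeted LOD selection described above can be sketched as a greedy refinement of a hierarchy: start from the coarsest LODs and replace a node by its finer children only while the estimated render cost still fits in the frame-time slice. All names and the per-node cost field here are hypothetical illustrations, not the engine's actual data structures.

```python
class LODNode:
    def __init__(self, cost, children=None):
        self.cost = cost                 # estimated render time for this LOD (ms)
        self.children = children or []   # finer LODs covering the same region

def select_lods(root, budget_ms):
    """Greedily refine nodes while the extra render cost fits the budget."""
    selected = [root]
    spent = root.cost
    refined = True
    while refined:
        refined = False
        for node in list(selected):
            if not node.children:
                continue
            # cost delta of swapping this LOD for its finer children
            extra = sum(c.cost for c in node.children) - node.cost
            if spent + extra <= budget_ms:
                selected.remove(node)
                selected.extend(node.children)
                spent += extra
                refined = True
    return selected, spent
```

The loop terminates because every refinement strictly descends the hierarchy; a real engine would additionally weight refinement by screen-space error after frustum culling.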
Figure 1: The highest level of detail texture map of a block in Berkeley with one in every ten triangles overlaid in blue on top of the texture. Note that the texture is not tiled and the triangles are not simplified or collapsed to form larger ones.
Figure 2: The entire city model, as seen through our rendering system, has 27 blocks. Each block consists mostly of the building facades and some of the street.
Figure 3: Closeup of one of the blocks in our city model.
The focus of this project is the fast reconstruction of photo-realistic 3D city models, in order to facilitate interactive walk-, drive-, and fly-throughs. This modeling effort occurs at two levels: (1) ground-based modeling of building facades using 2D laser scanners and a video camera, and (2) airborne modeling using airborne laser scans and aerial images. We are currently working on merging these two models into a single one.
For ground-based modeling, an acquisition system has been built that can acquire 3D information from the facades of the buildings. Our experimental setup consists of a truck equipped with one color camera and two fast, inexpensive 2D laser scanners. One scanner is mounted vertically in order to scan the building facades. The other one is mounted horizontally and captures 180° scans while traveling on city streets under normal traffic conditions. The horizontal scans are used to estimate the vehicle's motion in a scan matching process, and the relative motion estimates are concatenated into an initial path. Assuming that features such as buildings are visible from both the ground-based and airborne views, this initial path is corrected using probabilistic Monte-Carlo localization. Specifically, the final global pose is obtained by using an aerial photograph or a DSM as a global map, to which the ground-based horizontal laser scans are matched. Figure 1 shows the reconstructed acquisition path superimposed over a digital surface map of Berkeley.
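Concatenating relative motion estimates into an initial path amounts to composing 2D poses: each scan-matching result (dx, dy, dθ), expressed in the vehicle frame, is rotated into the global frame and accumulated. A minimal sketch, with function names of our own choosing:

```python
import math

def compose(pose, delta):
    """Apply a relative motion (dx, dy, dtheta), given in the vehicle
    frame, to a global 2D pose (x, y, theta)."""
    x, y, th = pose
    dx, dy, dth = delta
    return (x + dx * math.cos(th) - dy * math.sin(th),
            y + dx * math.sin(th) + dy * math.cos(th),
            th + dth)

def initial_path(relative_motions, start=(0.0, 0.0, 0.0)):
    """Chain scan-matching estimates into an initial acquisition path."""
    path = [start]
    for d in relative_motions:
        path.append(compose(path[-1], d))
    return path
```

Because each step's error feeds into all later poses, the concatenated path drifts, which is exactly why the Monte-Carlo localization step against the global map is needed afterwards.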
We have developed a set of completely automated data processing algorithms to handle the large data size and to cope with imperfections and non-idealities inherent in laser scanning systems, such as occlusions and reflections from glass surfaces. Dominant building structures are detected, and points are classified into a foreground layer (trees, cars, etc.) and a background layer (building facades). Large holes in the facades, caused by occlusion from foreground objects, are filled in by adaptive interpolation techniques, and further processing removes isolated points and fills the remaining small holes. The processed scans are triangulated and texture mapped using the camera images. Applying our technique, we have reconstructed photo-realistic, texture-mapped 3D facade models of five downtown Berkeley city blocks, as shown in Figures 2 and 3. For this highly detailed model, the acquisition time was 11 minutes, and the total computation time for the completely automated reconstruction was 151 minutes. For airborne modeling, airborne laser scans are acquired and resampled to obtain a digital surface map (DSM) containing roof and terrain shape complementary to the ground-based facades. The DSM is processed in order to sharpen edges and remove erroneous points, triangulated, and texture mapped with an aerial image, in order to obtain an airborne surface mesh as shown in Figure 4 for the east Berkeley campus. Finally, the facade models are merged with the DSM by removing redundant parts and filling gaps. The result is a 3D city model usable for both walk- and fly-throughs, as shown in Figure 5.
Figure 1: Reconstructed acquisition path in Berkeley
Figure 2: Downtown Berkeley facade models
Figure 3: Downtown Berkeley facade models
Figure 4: East Berkeley campus
Figure 5: Merged model as seen from a bird's eye view
Video streaming from a wired network to a wireless network involves many challenges. The first question is the rate at which a fixed host should stream video to a mobile host. The rate must be optimal in two ways. First, the rate should be fair to other flows (e.g., TCP flows) in the wired network, a property known as TCP-friendliness. Second, the rate should use the wireless bandwidth efficiently. In order to achieve these two goals, the endhost (i.e., sender or receiver) should apply a congestion control mechanism only when there is actual congestion. The problem then translates into congestion detection in the presence of wireless channel errors. This problem is not easy to solve, since both congestion and wireless channel errors produce packet loss, which is traditionally interpreted as a sign of congestion in the Internet.
This problem is well recognized in the wireless TCP scenario, and there are some excellent non-end-to-end solutions to it [1,2]. However, these solutions break the end-to-end property and thus introduce scalability problems. Moreover, they fail when there are asymmetric routes between the sender and receivers. In summary, this problem is not well solved in wireless TCP.
In this project, we explore the possibility of detecting congestion in the presence of wireless channel errors using active UDP probing. This solution differs from many previous ones in that it is based on purely end-to-end observation. Although an end-to-end solution is hard to achieve for TCP, we believe it may be possible in a UDP scenario, since the sending pattern can be fully controlled while doing active probing.
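One classic end-to-end heuristic for the discrimination problem described above (in the spirit of delay-based schemes, and not necessarily the method this project will adopt) is to look at the one-way delays of probe packets around a loss: if delays were well above their baseline, a queue was likely building and the loss is attributed to congestion; otherwise it is attributed to the wireless channel. The threshold below is purely illustrative.

```python
def classify_loss(recent_delays, baseline_delay, threshold=1.5):
    """Attribute a packet loss to 'congestion' if recent probe delays are
    well above the uncongested baseline, else to the 'wireless' channel."""
    if not recent_delays:
        return "wireless"
    avg = sum(recent_delays) / len(recent_delays)
    return "congestion" if avg > threshold * baseline_delay else "wireless"
```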
Früh and Zakhor have developed efficient techniques for close range modeling of urban environments [1-3]. Combining the close range 3D model with a far range model of the same environment is the goal of this work.
Far range modeling of urban environments presents some distinct challenges due to the operational disparity of the sensors that acquire the information. To fuse the information from different sensors, such as an airborne laser scanner and an airborne camera, efficient techniques are being developed for fast and accurate registration, segmentation, and polygonalization. Some of the issues in far range modeling are: the low sampling density of the airborne laser scanner and aerial images, sampling inconsistency between camera images and laser scanner, loss of boundary and edge detail while rebinning the aerial scans, difficulty in segmenting the aerial scan based digital surface map (DSM) to identify buildings and vegetation, and registration between aerial scans and images. Our research concentrates on rebinning the scan points using edge-preserving resampling techniques, and on combining information from aerial images to rectify the boundaries and improve segmentation performance. These steps lead to an improved polygonal approximation of building rooftops, which are then texture mapped using the aerial images for accurate 3D models of building roofs and the terrain shape.
With the recent proliferation of broadband Internet access, video streaming applications are becoming quite commonplace. From news clips on sites such as CNN, to video on demand from sites such as www.movieflix.com, the current Internet offers a much richer multimedia experience than in the past. Traditionally, most video streaming applications have shied away from TCP and have instead utilized UDP for greater flexibility. To prevent congestion collapse and ensure fairness with TCP, much research has gone into creating congestion-controlled UDP protocols.
There have been several recent proposals [2-4] which challenge the notion that TCP is unsuitable for multimedia streaming. It has been noted that TCP is the most widely used protocol, and has undergone countless enhancements for host and network performance. Fairness and congestion control are non-issues if the streaming application uses TCP itself. Furthermore, at times it is necessary to use TCP due to client-imposed firewalls, which may only permit HTTP traffic.
In recent work, we have shown that, given the assumption that the user's access link is the bottleneck, it is possible for a modified TCP receiver to control the amount of bandwidth received by certain applications. Specifically, we created a receiver-controlled bandwidth sharing system (BWSS), which allows the user to specify a minimum rate and a share of the overall bandwidth that different applications should receive. The BWSS operates by controlling the receiver's advertised window to limit the throughput of less important flows, as specified by the user's profile. Hence, by using native TCP flow control mechanisms, it is possible to provide additional quality-of-service for applications. Preliminary results indicate that efficient video streaming is possible with simple client-side modifications, such as the BWSS, which allow rapid and easy deployment. Figure 1 shows the results of attempting to stream a video at 480 Kbps (60 KB/s) over TCP, with interfering UDP traffic injected from 50 to 150 seconds. Figure 2 shows the same experiment over TCP using the BWSS, with the video stream given higher priority than the ftp traffic.
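The mechanism the BWSS relies on follows from TCP flow control: a sender can have at most one advertised window of unacknowledged data per round-trip time, so throughput is capped at roughly rwnd / RTT. A hedged sketch of the sizing arithmetic (the function name and the rounding policy are our own illustration, not the BWSS implementation):

```python
import math

def advertised_window(target_rate_bps, rtt_s, mss=1460):
    """Receiver window (bytes) that caps a TCP flow near target_rate:
    since throughput <= rwnd / RTT, set rwnd = rate * RTT, rounded up
    to whole MSS-sized segments so the flow is not starved below target."""
    rwnd_bytes = target_rate_bps / 8.0 * rtt_s
    segments = math.ceil(rwnd_bytes / mss)
    return max(1, segments) * mss
```

For example, limiting a background flow near 480 Kbps over a 100 ms RTT path requires advertising about a 6 KB window, i.e., five 1460-byte segments.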
Figure 1: Video streaming at 480 Kbps using standard TCP
Figure 2: Video streaming at 480 Kbps using the BWSS
Streaming media applications are becoming increasingly popular on the Internet. For streaming applications, the media data must be encoded without the benefit of knowing the state of the channel during transmission. For this reason, in streaming media systems, the encoding must be flexible, and the server must adaptively select and transmit the appropriate data to the client as a function of the state of the network. Since many media streaming applications use a pre-roll buffer, there is considerable flexibility in packet scheduling, which significantly affects rate-distortion performance [1,2].
In our study, we analyze the expected global distortion in video streaming assuming error concealment at the receiver. We also propose a simple, real-time applicable rate-distortion optimization method based on the effective bandwidth approach. We evaluate the effective bandwidth of VBR video data by computing the autocovariance of the data size per group of pictures (GOP), taking into account the variance of channel bandwidth fluctuation. By assigning different priorities to motion vectors and texture, we effectively obtain SNR scalability. We also adopt temporal scalability, resulting in graceful degradation over channels with heavy loss. Experimental results show that our scheduling algorithm, motion-texture discrimination, and temporal scalability yield a performance improvement of several dBs in PSNR compared to conventional sequential transmission of video data without scalability.
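The autocovariance statistic underlying the effective bandwidth estimate can be sketched directly; this is the standard sample autocovariance of the per-GOP size sequence, not our full effective bandwidth computation:

```python
def autocovariance(sizes, lag):
    """Sample autocovariance of a sequence of GOP data sizes at a given lag.
    Large values at positive lags indicate bursty, correlated VBR traffic,
    which raises the effective bandwidth the channel must sustain."""
    n = len(sizes)
    mean = sum(sizes) / n
    return sum((sizes[t] - mean) * (sizes[t + lag] - mean)
               for t in range(n - lag)) / (n - lag)
```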
At the Video and Image Processing Laboratory (VIP Lab), we are engaged in automated 3D model generation for urban environments. Ground based modeling involves a setup of two 2D laser scanners and a digital camera mounted on top of a truck. As we drive the truck through a city, the laser scans give us depth information using the LIDAR time-of-flight principle. These laser scans are then subjected to accurate localization and 3D data processing algorithms to create a 3D mesh of the urban environment. The resulting mesh is then texture mapped with camera images to produce photo-realistic models.
However, objects such as trees, cars, lampposts, etc., occlude parts of the buildings from the laser scanners and the digital camera, and thus leave holes both in the geometry (mesh) and the texture (camera images). In a 3D model, the user should be able to view the building facade that was hidden behind a tree or some other obstacle. Hole filling in geometry is discussed in [1,2].
We present a simple and efficient method for hole filling in images. Given an image with regions of unknown RGB values (holes), our task is to determine those RGB values (fill the holes) as sensibly as we can from the information available in the rest of the image. We use our method to fill holes in the texture atlases generated during automated 3D modeling of urban environments. Hole filling can also be used for other applications, such as restoring old and damaged photographs, removing objects from images, and special effects.
We first fill in regions of low frequency with a pass of 1D horizontal interpolation, in which for each row we interpolate the missing columns (corresponding to holes) if they lie in a region of low frequency. This is followed by a pass of 1D vertical interpolation. We then employ a copy-paste method based on the idea of texture synthesis, illustrated in Figure 1: we take a window around the hole, find a matching region in the image, and fill the hole by copying the matching region and pasting it over the hole.
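The horizontal interpolation pass can be sketched as follows on a grayscale image with holes marked None. The "low frequency" test here, comparing the two endpoint values of a hole run against a threshold, is an illustrative simplification of our criterion, and the names and threshold are hypothetical.

```python
def fill_rows(image, max_gradient=8):
    """One 1D horizontal pass: linearly interpolate hole runs whose
    endpoints differ little (a low-frequency region); leave other holes
    for the vertical pass and the copy-paste stage."""
    for row in image:
        c = 0
        while c < len(row):
            if row[c] is None:
                start = c
                while c < len(row) and row[c] is None:
                    c += 1          # scan to the end of this hole run
                # interpolate only if both endpoints exist and are close
                if 0 < start and c < len(row):
                    a, b = row[start - 1], row[c]
                    if abs(a - b) <= max_gradient:
                        n = c - start + 1
                        for i in range(start, c):
                            row[i] = a + (b - a) * (i - start + 1) / n
            else:
                c += 1
    return image
```

Holes spanning a strong edge fail the endpoint test and survive this pass, which is what routes them to the texture-synthesis copy-paste step.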
The approach is found to work well on most images and does not suffer from the limitations of local inpainting in traditional hole filling schemes. Figure 2 shows part of a texture atlas with holes marked in red. Figure 3 shows the hole-filled image.
Figure 1: Illustrating the copy-paste method
Figure 2: Part of a texture atlas with holes
Figure 3: Hole-filled image
With the explosive growth of streaming and interactive multimedia applications over the Internet, many approaches have been proposed to stream data effectively over packet switched, best-effort networks. Many use techniques from source and channel coding, implement transport protocols, or modify system architecture in order to deal with the delay, loss, and time-varying nature of the Internet. In this research, we address the packet loss and delay associated with the Internet by proposing a path diversity framework in which multimedia data is streamed via multiple paths to the receiver, leading to higher tolerance against loss and delay.
We investigate two approaches for achieving path diversity for streaming and real time applications over the Internet. For streaming applications, we use multiple senders to stream data to a single receiver, while for real time applications, disjoint paths from a sender to a receiver are created using a collection of relay nodes. In the latter approach, we propose a heuristic scheme for selecting a redundant path between a sender and a receiver via a relay node, based on information returned by the network tool traceroute. We show, through simulations on many Internet-like topologies, that our heuristic scheme is able to find a highly disjoint, redundant path. We further demonstrate that a substantial reduction in packet loss can be achieved by dividing packets between the default and the redundant paths.
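The disjointness objective behind the relay-selection heuristic can be sketched with traceroute-style hop lists: score each candidate relay path by the fraction of its router hops not shared with the default path, and pick the relay with the highest score. This is an illustrative simplification of our heuristic, with hypothetical names.

```python
def disjointness(default_path, candidate_path):
    """Fraction of the candidate's hops (router addresses from
    traceroute) that are not shared with the default path."""
    default_hops = set(default_path)
    shared = sum(1 for hop in candidate_path if hop in default_hops)
    return 1.0 - shared / len(candidate_path)

def pick_relay(default_path, relay_paths):
    """Choose the relay whose sender-relay-receiver path is most
    disjoint from the default sender-to-receiver path."""
    return max(relay_paths,
               key=lambda r: disjointness(default_path, relay_paths[r]))
```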
Within the path diversity framework, we propose a TCP-friendly, receiver-driven protocol for simultaneous video streaming via multiple paths to a single receiver. The TCP-friendly, receiver-driven protocol employs a novel rate allocation scheme and packet partition algorithm. The rate allocation scheme determines the sending rate for each sender based on available network bandwidth, amount of forward error correction (FEC), and channel characteristics in such a way as to minimize the probability of packet loss. The packet partition algorithm ensures that every packet is sent using one and only one path, and at the same time, minimizes the probability of late packets. Using both simulations and actual Internet experiments, we demonstrate the effectiveness of our protocol in reducing packet loss, and hence, achieving higher quality for the streamed media.
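The packet partition constraint above, every packet on exactly one path while late arrivals are minimized, can be illustrated with a greedy earliest-estimated-delivery rule. This sketch is an assumption-laden stand-in for our actual algorithm: path state is reduced to a (rate, delay) pair and a queue backlog.

```python
def partition_packets(packets, paths):
    """Assign each packet (size in bytes) to exactly one path, greedily
    picking the path with the earliest estimated delivery time
    (queued bytes / sending rate + one-way delay)."""
    backlog = {p: 0.0 for p in paths}        # bytes already queued per path
    assignment = []
    for size in packets:
        def eta(p):
            rate, delay = paths[p]           # (bytes/s, one-way delay in s)
            return (backlog[p] + size) / rate + delay
        best = min(paths, key=eta)
        backlog[best] += size
        assignment.append(best)
    return assignment
```

Note that a slow or long-delay path naturally receives fewer packets, which is the behavior the rate allocation scheme then tunes with FEC and bandwidth estimates.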
Achieving sub-100 nm device fabrication requires a paradigm shift from today's optical lithography techniques to alternatives such as X-ray, EUV, and E-beam lithography. The goal of this project is to apply data, signal, and image processing and compression techniques to solve data handling problems that are common to a variety of future lithography techniques, ranging from nano-mirror arrays and multiple probes made of 2D arrays of cantilevers to ion beam and electron beam lithography. We explore suitable data organizations as well as processing algorithms to prepare the data stream for delivery to massively parallel arrays of writing instruments.
The data handling challenges facing future lithography technologies are similar to those facing the mask writing industry today, except that the data is usually written directly on product wafers instead of masks. A state-of-the-art pattern generator, which can write 4X mask plates at the rate of one plate per hour, could in principle be sped up to write directly onto wafers. However, today's optical lithography projection systems maintain a throughput of one wafer per minute, so a roughly hundred-fold speed-up is needed to go from one plate per hour to one plate per minute. Secondly, the design data on a wafer is about 100 times more than that on a plate; i.e., after projection, the image on a plate covers only about 1/100 of the wafer area. Thus the challenge of data handling for maskless lithography is to accomplish a 10,000-fold throughput improvement over today's state-of-the-art mask writers.
To translate this into typical data rates needed in maskless lithography, assume a wafer 300 millimeters in diameter and a writing pixel size of 25 nanometers. For the wafer to be written in 60 seconds, data rates of 1.9 tera-pixels per second are needed. These tera-pixel writing rates and terabit storage force the adoption of a massively parallel writing strategy and system architecture.
The goal of the data handling system is to bring a chip's design data stored on disk to a massive array of parallel writers at a data rate of 1.9 tera-pixels per second. Based on memory, processing power, and throughput requirements, we propose a system architecture consisting of storage disks, a processor board, and decode circuitry fabricated on the same chip as the hardware writers, as shown in Figure 1.
The critical bottleneck of this design is the transfer of data from the processor board to the on-chip hardware, which is limited in throughput to 400 Gb/s by the number of pins on the chip and the frequency at which the pins can operate, e.g., 1,000 pins operating at 400 MHz. Another critical bottleneck is the real-time decode that must be done on-chip, which precludes such complex operations as rasterization. Considering that the writers require about ten terabits per second of data, and the processor board can deliver at most 400 Gb/s to the on-chip hardware, we estimate that a compression ratio of 25 is necessary to achieve the data rates desired.
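The arithmetic behind the two headline numbers, 1.9 tera-pixels per second and the 25x compression requirement, can be checked directly from the figures stated above (300 mm wafer, 25 nm pixels, 60 s write time, ~10 Tb/s writer demand, 400 Gb/s pin bandwidth):

```python
import math

wafer_diameter_mm = 300
pixel_nm = 25
write_time_s = 60

# pixels on the wafer = wafer area / pixel area
wafer_area = math.pi * (wafer_diameter_mm * 1e-3 / 2) ** 2   # m^2
pixels = wafer_area / (pixel_nm * 1e-9) ** 2
pixel_rate = pixels / write_time_s                           # ~1.9e12 pixels/s

# required compression ratio = writer demand / pin-limited delivery rate
required = 10e12      # ~10 Tb/s needed by the writers
pin_limit = 400e9     # 400 Gb/s across the chip pins
ratio = required / pin_limit                                 # 25.0
```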
We have tested several compression algorithms capable of achieving high compression ratios on modern layout data, including a lossless version of SPIHT image compression, Ziv-Lempel (LZ77), our own 2D-LZ, which extends LZ matching to two dimensions, and the Burrows-Wheeler transform (BWT) as implemented by BZIP2. The results are presented in Table 1 (Figure 4). For the test data, LZ77, 2D-LZ, and BZIP2 all consistently achieve a compression ratio larger than 25, with BZIP2 outperforming 2D-LZ, which in turn outperforms LZ77. In terms of implementation complexity, LZ77 is simpler than 2D-LZ, which is in turn simpler than BZIP2, as can be seen from the decoder buffer size requirements listed in the last row of Table 1: LZ77 requires a 2 KB buffer for decoding, 2D-LZ a 200 KB buffer, and BZIP2 a 900 KB buffer. These compression results demonstrate a tradeoff between decoding complexity and compression efficiency. Because different maskless lithography systems have different implementation requirements, we seek to develop a spectrum of techniques that can trade off compression efficiency for lower implementation complexity.
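As an illustrative stand-in (not the codecs or data evaluated above), the LZ-versus-BWT comparison can be reproduced with Python's standard-library compressors on a synthetic, highly repetitive bitmap, repetition being the property of rasterized layouts that both algorithm families exploit:

```python
import zlib, bz2

# Synthetic stand-in for rasterized layout data: a repetitive binary
# pattern of "empty" and "filled" pixel runs.
row = (b"\x00" * 40 + b"\xff" * 24) * 16
data = row * 512

for name, compress in [("zlib (LZ77-family)", zlib.compress),
                       ("bz2 (BWT-based, as in BZIP2)", bz2.compress)]:
    ratio = len(data) / len(compress(data))
    print(f"{name}: {ratio:.0f}:1")
```

On such artificially regular input both easily exceed 25:1; real layout data is far less regular, which is why the table compares the codecs on actual layouts.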
Combinatorial coding (CC) is a new compression technique we have developed that achieves the compression efficiency of arithmetic coding with the speed and simplicity of Huffman coding. It has a low-complexity implementation, requiring only a set of small fixed code tables and the ability to perform 32-bit integer addition, subtraction, and comparison operations at the decoder. To test the capabilities of combinatorial coding, we apply arithmetic, combinatorial, and Huffman coding, in conjunction with standard context-based modeling of binary (bi-level) images, to a binary rasterized image of a VLSI layout with 3694x3078 pixels, for a total size of 11370 Kb. Samples of this image are shown in Figure 2, and the simple 3-pixel context used to model this layout is shown in Figure 3. The resulting file sizes are shown in Table 2 (Figure 5), along with compression ratios in parentheses and encoding/decoding run times on an 800 MHz Pentium III. The principal limitation of CC is that it is a binary code; in order to apply CC to non-binary sources, suitable binarization techniques still need to be developed.
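To illustrate what context-based modeling buys for such coders, the ideal coded size of a binary image under a causal 3-pixel context can be estimated by summing -log2 p(pixel | context) with per-context empirical probabilities. The choice of west, north, and north-west neighbors as the context is an assumption for this sketch; the actual context is the one shown in Figure 3.

```python
import math

def context_bits(image):
    """Estimate the ideal coded size (bits) of a binary image under a
    causal 3-pixel context model, as an arithmetic or combinatorial
    coder driven by per-context statistics would approach."""
    counts = {}   # context tuple -> [count of 0s, count of 1s]
    h, w = len(image), len(image[0])

    def px(r, c):
        # out-of-image neighbors read as 0
        return image[r][c] if 0 <= r and 0 <= c else 0

    # pass 1: gather per-context symbol counts
    for r in range(h):
        for c in range(w):
            ctx = (px(r, c - 1), px(r - 1, c), px(r - 1, c - 1))
            counts.setdefault(ctx, [0, 0])[image[r][c]] += 1

    # pass 2: sum ideal code length -log2 p(symbol | context)
    bits = 0.0
    for zeros, ones in counts.values():
        total = zeros + ones
        for n in (zeros, ones):
            if n:
                bits += n * -math.log2(n / total)
    return bits
```

Structured layouts make each context highly predictive, so this sum falls far below one bit per pixel, which is the headroom all three coders in Table 2 compete to exploit.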
Figure 1: System architecture
Figure 2: Samples of the binary rasterized image of VLSI layout 132x150 in size
Figure 3: 3-pixel context model for VLSI layout
Figure 4: Table 1: Compression ratios of SPIHT, LZ77, 2D-LZ, BZIP
Figure 5: Table 2: Result of 3-pixel context based binary image compression
The objective of this project is to develop efficient techniques for reliable video streaming over wireless ad hoc networks. A wireless ad hoc network is a collection of wireless mobile nodes which dynamically form a temporary network without infrastructure. Possible applications of video streaming over ad hoc networks include: a quick set-up for video conferencing in a place without a wireless infrastructure, transmitting video on the battlefield, and search and rescue operations after a disaster. There are many challenges for supporting video traffic over ad hoc networks. Due to the mobility of wireless nodes, the topology of an ad hoc network may change often. Thus the established connection route between a source and a destination is likely to be broken during the period of the transmission, which may cause interruption, pause, or jerkiness in the received video signal. Other factors that influence the quality include random packet loss caused by wireless channel errors and the small capacity of a wireless network.
We propose a cross-layer scheme to address the above problems. In the network layer, the routing protocol treats video traffic differently from other traffic: it establishes multiple paths between the source and destination nodes for video applications, which enhances the robustness of the system and increases the bandwidth of the end-to-end connection. When one path breaks during transmission, the routing protocol can still transmit video traffic over the remaining paths, and intermediate nodes can redirect packets stranded on the broken path onto alternative paths. In the application layer, agents monitor the state of the paths and allocate video traffic adaptively.
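The application-layer allocation step can be sketched as a proportional split over monitored paths, weighting each path by its estimated available bandwidth scaled by its delivery rate. This is a hypothetical illustration of adaptive allocation, with our own names and a deliberately simple goodness metric, not the scheme's actual policy.

```python
def allocate_traffic(path_stats, total_rate):
    """Split the video rate across paths in proportion to each path's
    goodness: available bandwidth * (1 - loss rate). A broken path
    (bandwidth 0) naturally receives no traffic."""
    weights = {p: bw * (1.0 - loss) for p, (bw, loss) in path_stats.items()}
    total = sum(weights.values())
    if total == 0:
        return {p: 0.0 for p in path_stats}
    return {p: total_rate * w / total for p, w in weights.items()}
```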
We will verify our results using both NS simulations and real experiments on an experimental testbed.