Achieving sub-100 nm device fabrication requires a paradigm shift from today's optical lithography techniques to alternatives such as X-ray, EUV, and E-beam lithography. The goal of this project is to apply data, signal, and image processing and compression techniques to solve data handling problems that are common to a variety of future lithography techniques, ranging from nano-mirror arrays, to multiple probes made of 2D arrays of cantilevers, to ion beam and electron beam lithography. We explore suitable data organizations as well as processing algorithms to prepare the data stream for delivery to massively parallel arrays of writing instruments.
The data handling challenges facing future lithography technologies are similar to those facing the mask writing industry today, except that the data is usually written directly on product wafers instead of masks. A state-of-the-art pattern generator, which can write 4X mask plates at the rate of one plate per hour, could in principle be sped up to write directly onto wafers. Two factors set the required speed-up. First, today's optical lithography projection systems maintain a throughput of one wafer per minute, so a speed-up of roughly a hundred-fold (60X to be exact) is needed to go from one plate per hour to one plate per minute. Second, the design data on a wafer is about 100 times that on a plate; i.e., after projection, the image on a plate covers only about 1/100 of the wafer area. Thus the challenge of data handling for maskless lithography is to accomplish a roughly 10,000-fold throughput improvement over today's state-of-the-art mask writers.
To translate this into typical data rates needed in maskless lithography, assume a wafer 300 millimeters in diameter and a writing pixel size of 25 nanometers. For the wafer to be written in 60 seconds, data rates of 1.9 tera-pixels per second are needed. These tera-pixel writing rates and terabit storage force the adoption of a massively parallel writing strategy and system architecture.
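The 1.9 tera-pixel figure follows from dividing the wafer area by the pixel area and then by the write time. A minimal sketch of that arithmetic, assuming a fully exposed circular wafer with no edge exclusion (the function name `required_pixel_rate` is ours, chosen for illustration):

```python
import math


def required_pixel_rate(wafer_diameter_m, pixel_size_m, write_time_s):
    """Pixels per second needed to expose a full circular wafer.

    Assumes every square pixel inside the wafer outline is written once.
    """
    wafer_area = math.pi * (wafer_diameter_m / 2) ** 2  # m^2
    pixel_area = pixel_size_m ** 2                      # m^2
    total_pixels = wafer_area / pixel_area
    return total_pixels / write_time_s


# 300 mm wafer, 25 nm pixels, 60 s per wafer (values from the text above).
rate = required_pixel_rate(300e-3, 25e-9, 60.0)
print(f"{rate / 1e12:.1f} tera-pixels per second")  # prints "1.9 tera-pixels per second"
```

The total pixel count per wafer is about 1.1 x 10^14, which is why storage is also measured in terabits.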
The goal of the data handling system is to bring a chip's design data stored on disk to a massive array of parallel writers at a data rate of 1.9 tera-pixels per second. Based on memory, processing power, and throughput requirements, we propose a system architecture consisting of storage disks, a processor board, and decode circuitry fabricated on the same chip as the hardware writers, as shown in Figure 1.
The critical bottleneck of this design is the transfer of data from the processor board to the on-chip hardware, which is limited in throughput to 400 Gb/s by the number of pins on the chip and the frequency at which the pins can operate, e.g., 1,000 pins operating at 400 MHz. Another critical bottleneck is the real-time decode that must be done on-chip, which precludes such complex operations as rasterization. Considering that the writers require about ten terabits per second of data, and the processor board can deliver at most 400 Gb/s to the on-chip hardware, we estimate that a compression ratio of 25 is necessary to achieve the data rates desired.
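The required compression ratio is simply the writer data rate divided by the pin-limited link rate. The sketch below reproduces that estimate; it assumes one bit per pin per clock cycle, which is consistent with the 1,000 pins at 400 MHz example but is not stated explicitly above, and the function names are illustrative only.

```python
def pin_limited_throughput(num_pins, pin_frequency_hz, bits_per_pin_per_cycle=1):
    """Peak link rate in bits/s (assumption: one bit per pin per cycle by default)."""
    return num_pins * pin_frequency_hz * bits_per_pin_per_cycle


def required_compression_ratio(writer_rate_bps, link_rate_bps):
    """How much the data must shrink to fit through the link."""
    return writer_rate_bps / link_rate_bps


link_rate = pin_limited_throughput(num_pins=1000, pin_frequency_hz=400e6)
ratio = required_compression_ratio(writer_rate_bps=10e12, link_rate_bps=link_rate)

print(f"link rate: {link_rate / 1e9:.0f} Gb/s")      # prints "link rate: 400 Gb/s"
print(f"required compression ratio: {ratio:.0f}")    # prints "required compression ratio: 25"
```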
We have tested several compression algorithms capable of achieving high compression ratios on modern layout data, including a lossless version of SPIHT image compression, Ziv-Lempel (LZ77), our own 2D-LZ, which extends LZ matching to two dimensions, and the Burrows-Wheeler transform (BWT) as implemented by BZIP2. The results are presented in Table 1 (Figure 4). For the test data, LZ77, 2D-LZ, and BZIP2 consistently achieve a compression ratio larger than 25, with BZIP2 outperforming 2D-LZ, which in turn outperforms LZ77. In terms of implementation complexity, LZ77 is simpler than 2D-LZ, which is in turn simpler than BZIP2, as can be seen from the decoder buffer size requirements listed in the last row of Table 1: LZ77 requires a 2 KB buffer for decoding, 2D-LZ requires a 200 KB buffer, and BZIP2 requires a 900 KB buffer. These compression results demonstrate that there is, in fact, a tradeoff between decoding complexity and compression efficiency. Because different maskless lithography systems have different implementation requirements, we seek to develop a spectrum of techniques that can trade off compression efficiency for lower implementation complexity.
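As a rough illustration of how such a comparison can be set up, the sketch below compresses a synthetic bi-level image with Python's standard `zlib` and `bz2` modules. Several things here are assumptions and not part of the experiments reported above: the random-rectangle image is a hypothetical stand-in for real layout data, `zlib` implements DEFLATE (LZ77 matching followed by Huffman coding) and not plain LZ77, and neither SPIHT nor 2D-LZ is included. The ratios it prints should therefore not be compared with Table 1.

```python
import bz2
import random
import zlib


def synthetic_layout(width, height, num_rects, seed=0):
    """Hypothetical stand-in for layout data: random axis-aligned rectangles."""
    rng = random.Random(seed)
    rows = [bytearray(width) for _ in range(height)]  # all pixels start at 0
    for _ in range(num_rects):
        x0 = rng.randrange(width - 1)
        y0 = rng.randrange(height - 1)
        x1 = min(width, x0 + rng.randrange(4, 64))
        y1 = min(height, y0 + rng.randrange(4, 64))
        for y in range(y0, y1):
            rows[y][x0:x1] = b"\x01" * (x1 - x0)
    return rows


def pack_bits(rows):
    """Pack pixels row by row, 8 pixels per byte (width must be a multiple of 8)."""
    out = bytearray()
    for row in rows:
        for i in range(0, len(row), 8):
            byte = 0
            for bit in row[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
    return bytes(out)


raw = pack_bits(synthetic_layout(1024, 1024, num_rects=300))

compressed = {
    "zlib": zlib.compress(raw, 9),  # DEFLATE: LZ77 matching + Huffman coding
    "bz2": bz2.compress(raw, 9),    # BWT-based, 900 KB block size at level 9
}
ratios = {name: len(raw) / len(data) for name, data in compressed.items()}

for name, ratio in ratios.items():
    print(f"{name}: compression ratio {ratio:.1f}")
```

The `bz2` level of 9 selects 900 KB blocks, which matches the decoder buffer size quoted for BZIP2 above.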
Combinatorial coding (CC) is a new compression technique we developed which achieves the compression efficiency of arithmetic coding with the speed and simplicity of Huffman coding. It has a low-complexity implementation, requiring only a set of small fixed code tables and the ability to perform 32-bit integer addition, subtraction, and comparison operations at the decoder. To test the capabilities of combinatorial coding, we apply arithmetic, combinatorial, and Huffman coding, in conjunction with standard context-based modeling of binary (bi-level) images, to a binary rasterized image of a VLSI layout with 3694x3078 pixels, for a total size of 11370 Kb. Samples of this image are shown in Figure 2, and a simple 3-pixel context used to model this layout is shown in Figure 3. The resulting file sizes are shown in Table 2 (Figure 5), along with compression ratios in parentheses and encoding/decoding run times on an 800 MHz Pentium 3. The principal limitation of CC is that it is a binary code. In order to apply CC to non-binary sources, proper binarization techniques still need to be developed.
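To make the role of the context model concrete, the sketch below estimates the ideal code length of a bi-level image under a 3-pixel causal context, which is the size an arithmetic coder driven by the same static model would approach. It does not implement combinatorial coding itself. The context shape (left, above-left, and above neighbors) is an assumption, since the actual shape is defined only in Figure 3, and the tiny test image is hypothetical.

```python
import math
from collections import Counter


def context_of(img, y, x):
    """Assumed 3-pixel causal context: left, above-left, above (0 outside the image)."""
    def px(r, c):
        return img[r][c] if r >= 0 and c >= 0 else 0
    return (px(y, x - 1), px(y - 1, x - 1), px(y - 1, x))


def ideal_code_length_bits(img):
    """Sum of -log2 p(bit | context) using empirical per-context probabilities."""
    counts = Counter()
    for y, row in enumerate(img):
        for x, bit in enumerate(row):
            counts[(context_of(img, y, x), bit)] += 1
    total = 0.0
    for (ctx, bit), n in counts.items():
        n_ctx = counts[(ctx, 0)] + counts[(ctx, 1)]  # pixels seen in this context
        total += -n * math.log2(n / n_ctx)
    return total


# Hypothetical 16x8 test image containing one filled rectangle.
img = [[1 if 4 <= x < 12 and 2 <= y < 6 else 0 for x in range(16)] for y in range(8)]

raw_bits = sum(len(row) for row in img)
model_bits = ideal_code_length_bits(img)
print(f"raw: {raw_bits} bits, context-model estimate: {model_bits:.1f} bits")
```

Because most pixels are fully predicted by their neighbors, the estimate falls well below one bit per pixel. A Huffman code that emits one codeword per binary pixel cannot use less than one bit per pixel, which is the gap that arithmetic-style coders close.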
Figure 1: System architecture
Figure 2: Samples of the binary rasterized image of a VLSI layout, each 132x150 pixels in size
Figure 3: 3-pixel context model for VLSI layout
Figure 4: Table 1: Compression ratios of SPIHT, LZ77, 2D-LZ, and BZIP2
Figure 5: Table 2: Results of 3-pixel context-based binary image compression