Hsin-I Liu

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2010-47

April 29, 2010

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-47.pdf

Future lithography systems must produce chips with smaller feature sizes while maintaining throughput comparable to today’s optical lithography systems. This places stringent data handling requirements on the design of any direct-write maskless system. To achieve a throughput of one wafer layer per minute with a direct-write maskless lithography system, using 22 nm pixels for 45 nm technology, a data rate of 12 Tb/s is required. A recently proposed datapath architecture for direct-write lithography systems shows that lossless compression could play a key role in reducing the system throughput requirements. This architecture integrates low-complexity hardware-based decoders with the writers in order to decode a compressed rasterized layout in real time. To this end, a spectrum of lossless compression algorithms has been developed for rasterized integrated circuit (IC) layout data to provide a tradeoff between compression efficiency and hardware complexity. In this thesis, I extend Block Context Copy Combinatorial Code (Block C4), a previously proposed lossless compression algorithm, to Block Golomb Context Copy Code (Block GC3) in order to reduce the hardware complexity and improve the system throughput. In particular, the hierarchical combinatorial code in Block C4 is replaced by a Golomb run-length code, resulting in Block GC3. Block GC3 achieves a minimum compression efficiency of 6.5 for 1024 × 1024, 5-bit Poly layer layouts in 130 nm technology. Even though this compression efficiency is 15% lower than that of Block C4, the Block GC3 decoder is 40% smaller in area than the Block C4 decoder.
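To make the phrase "Golomb run-length code" concrete, the following is a minimal textbook sketch of Golomb-Rice run-length coding in Python. It is not the bitstream format used by Block GC3: the function names, the power-of-two parameter `k`, the use of character strings for bits, and the convention of passing the symbol count `n` to the decoder are all assumptions made for illustration. The idea is that a binary stream that is mostly zeros can be described by the lengths of its zero runs, and each length is written as a unary quotient followed by a fixed-width remainder.

```python
def golomb_rice_encode(n: int, k: int) -> str:
    """Encode a non-negative integer with Golomb parameter M = 2**k.

    Output is a unary quotient (q ones, then a zero) followed by a
    k-bit binary remainder. Hypothetical helper, for illustration only.
    """
    q, r = n >> k, n & ((1 << k) - 1)
    remainder = format(r, f"0{k}b") if k > 0 else ""
    return "1" * q + "0" + remainder


def encode_runs(bits, k: int) -> str:
    """Encode a 0/1 sequence as Golomb-Rice codes of its zero-run lengths."""
    out, run = [], 0
    for b in bits:
        if b == 0:
            run += 1
        else:
            # A 1 terminates the current run of zeros.
            out.append(golomb_rice_encode(run, k))
            run = 0
    if run:
        # Trailing zeros with no terminating 1; the decoder truncates to n.
        out.append(golomb_rice_encode(run, k))
    return "".join(out)


def decode_runs(code: str, k: int, n: int):
    """Invert encode_runs, given the original number of symbols n."""
    bits, pos = [], 0
    while len(bits) < n:
        q = 0
        while code[pos] == "1":      # read the unary quotient
            q += 1
            pos += 1
        pos += 1                      # skip the terminating 0
        r = int(code[pos:pos + k], 2) if k > 0 else 0
        pos += k
        bits.extend([0] * ((q << k) | r))
        bits.append(1)
    return bits[:n]


sample = [0] * 10 + [1] + [0] * 3 + [1, 1] + [0] * 5
code = encode_runs(sample, k=2)
restored = decode_runs(code, k=2, n=len(sample))
```

In this sketch the decoder needs only a counter, a shift, and a small fixed-width read per run. That simplicity is typical of Golomb-Rice codes in general; the abstract itself states only that the Block GC3 decoder is smaller than the Block C4 decoder.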

In this thesis, I also illustrate the hardware implementation of the Block GC3 decoder with FPGA and ASIC synthesis flows. For one Block GC3 decoder with 8 × 8 block size, 3233 slice flip-flops and 3086 4-input LUTs are utilized in a Xilinx Virtex II Pro 70 FPGA, corresponding to 4% of its resources. The decoder has 1.7 KB of internal memory, which is implemented with 36 block memories, corresponding to 10% of the FPGA resources. The system runs at a 100 MHz clock rate, with an overall output rate of 495 Mb/s for a single decoder. The corresponding ASIC implementation results in a 0.07 mm² design with a maximum output rate of 2.47 Gb/s.
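The rates quoted above can be related with some back-of-the-envelope arithmetic, sketched below in Python. The 300 mm wafer diameter and the 5 bits per pixel are assumptions that do not appear in the abstract's data-rate sentence (the 5-bit depth is borrowed from the layouts it quotes elsewhere), and the calculation ignores stage overhead, wafer-edge effects, and any margin, so the thesis's own sizing may differ.

```python
import math

# Figures quoted in the abstract.
raw_rate_bps = 12e12      # required writer data rate (12 Tb/s)
fpga_clock_hz = 100e6     # FPGA clock rate
fpga_out_bps = 495e6      # FPGA decoder output rate
asic_out_bps = 2.47e9     # ASIC decoder maximum output rate
pixel_m = 22e-9           # pixel size
seconds_per_layer = 60    # one wafer layer per minute

# Assumptions for illustration only (not stated with the 12 Tb/s figure).
wafer_diameter_m = 0.300  # assumed 300 mm wafer
bits_per_pixel = 5        # assumed 5-bit gray-level pixels

# Rough raw data rate: pixels on the wafer times bits per pixel, per minute.
pixels_per_layer = math.pi * (wafer_diameter_m / 2) ** 2 / pixel_m ** 2
estimated_rate_bps = pixels_per_layer * bits_per_pixel / seconds_per_layer

# Decoded pixels per clock cycle for the single FPGA decoder.
fpga_pixels_per_clock = fpga_out_bps / bits_per_pixel / fpga_clock_hz

# ASIC decoders needed in parallel to sustain the raw output rate.
asic_decoders_needed = math.ceil(raw_rate_bps / asic_out_bps)

print(f"estimated raw rate: {estimated_rate_bps / 1e12:.1f} Tb/s")
print(f"FPGA pixels per clock: {fpga_pixels_per_clock:.2f}")
print(f"ASIC decoders needed: {asic_decoders_needed}")
```

Under these assumptions the estimate lands close to the quoted 12 Tb/s, the FPGA decoder emits about one 5-bit pixel per clock cycle, and sustaining the full rate would take several thousand ASIC decoders running in parallel.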

I also explore the tradeoff between encoder complexity and compression efficiency, with a case study for the reflective E-beam lithography (REBL) system. To accommodate REBL’s rotary writing system, I introduce Block RGC3, a variant of Block GC3 that adapts to the diagonal repetition of the rotated layout images. By increasing the encoding complexity, Block RGC3 achieves a minimum compression efficiency of 5.9 for 256 × 2048, 5-bit Metal-1 layer layouts in 65 nm technology with a 40 KB buffer; this outperforms Block GC3 and all existing lossless compression algorithms, while maintaining a simple decoder architecture.

Advisor: Avideh Zakhor


BibTeX citation:

@phdthesis{Liu:EECS-2010-47,
    Author= {Liu, Hsin-I},
    Title= {Architecture and Hardware Design of Lossless Compression Algorithms for Direct-Write Maskless Lithography Systems},
    School= {EECS Department, University of California, Berkeley},
    Year= {2010},
    Month= {Apr},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-47.html},
    Number= {UCB/EECS-2010-47},
}

EndNote citation:

%0 Thesis
%A Liu, Hsin-I 
%T Architecture and Hardware Design of Lossless Compression Algorithms for Direct-Write Maskless Lithography Systems
%I EECS Department, University of California, Berkeley
%D 2010
%8 April 29
%@ UCB/EECS-2010-47
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-47.html
%F Liu:EECS-2010-47