Chapter 2: Computer Aided Design for VLSI

The EECS Research Summary for 2003


Conflict-driven Learning in a 0-1 ILP Solver

Donald Chai
(Professors Robert K. Brayton and Andreas Kuehlmann)
(SRC) 683-003

Modern solvers for the Boolean Satisfiability (SAT) problem rely heavily on the analysis of conflicts in a backtrack search to learn useful constraints to guide the search. Other constraint families, such as linear pseudo-Boolean and cardinality constraints, can express the conditions found in many EDA applications more naturally, making the underlying structure more transparent. These constraints may be handled by generic integer linear programming (ILP) solvers, but such solvers may ignore the Boolean nature of the variables.

We generalize learning schemes from recent SAT solvers to handle these more expressive constraints efficiently. Theoretical results show that moving from conjunctive normal form to cardinality constraints can yield exponential improvements in runtime, and some of our experiments exhibit such improvements.
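
As a flavor of the expressiveness gap, consider the constraint "at most k of n variables are true": a single cardinality constraint states it directly, while a naive clause-level CNF encoding needs one clause per (k+1)-subset. A minimal sketch (the helper and variable names are ours, not part of the solver described above):

    from itertools import combinations

    def at_most_k_cnf(variables, k):
        """Naive CNF encoding of 'at most k of these variables are true':
        for every subset of size k+1, at least one variable must be false."""
        return [tuple(-v for v in subset)
                for subset in combinations(variables, k + 1)]

    n, k = 10, 3
    xs = list(range(1, n + 1))         # DIMACS-style positive literals
    print(len(at_most_k_cnf(xs, k)))   # C(10, 4) = 210 clauses

    # The same constraint as a single cardinality constraint:
    # x1 + x2 + ... + x10 <= 3, i.e., one constraint instead of 210 clauses.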


Send mail to the author : (donald@eecs.berkeley.edu)

Fishbone: A Block-level Placement and Routing Scheme

Fan Mo
(Professor Robert K. Brayton)
GSRC

Fishbone is a block-level placement and routing scheme. The routing uses a two-layer spine topology. The pin locations are configurable and restricted to certain routing grids in order to ensure full routability and precise predictability. With this scheme, exact net topologies are determined by pin positions only. Hence, during block placement, net parameters such as wire length and delay can be derived directly. The construction of Fishbone nets is much faster than that of Steiner trees. This enables the integration of block placement and routing, with no separate routing stage.
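
Because a Fishbone net's topology is fixed by pin positions, its wirelength can be evaluated in closed form during placement. A toy sketch of that idea, assuming a simplified spine model in which the trunk runs vertically through the source pin and each sink attaches with a horizontal branch (the actual Fishbone grid rules are given in [1]):

    def fishbone_length(source, sinks):
        """Wirelength of a simplified spine net: a vertical trunk at the
        source's x spans the pins' y-range; horizontal branches reach sinks."""
        sx, sy = source
        ys = [sy] + [y for _, y in sinks]
        trunk = max(ys) - min(ys)                      # vertical trunk extent
        branches = sum(abs(x - sx) for x, _ in sinks)  # horizontal branches
        return trunk + branches

    print(fishbone_length((5, 2), [(1, 7), (9, 4)]))   # 5 + (4 + 4) = 13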

[1]
F. Mo and R. K. Brayton, "Regular Fabrics in Deep Sub-Micron Integrated-Circuit Design," Int. Workshop on Logic and Synthesis, New Orleans, LA, June 2002.

More information (http://www-cad.eecs.berkeley.edu/~fanmo/Fishbone/index.html) or

Send mail to the author : (fanmo@eecs.berkeley.edu)

PLA-based Regular Structures and Their Synthesis

Fan Mo
(Professor Robert K. Brayton)
GSRC

We propose two regular circuit structures based on the programmable logic array (PLA). They provide alternatives to the widely used standard cell structure and have better predictability and simpler design methodologies. A Whirlpool PLA is a cyclic four-level structure with a compact layout; Doppio-ESPRESSO, a four-level logic minimization algorithm, was developed for the synthesis of Whirlpool PLAs. A River PLA is a stack of multiple-output PLAs that uses river routing for the interconnections of adjacent PLAs. A synthesis algorithm for River PLAs uses simulated annealing, targeting a combination of minimal area and delay.

[1]
F. Mo and R. K. Brayton, "River PLA: A Regular Circuit Structure," Design Automation Conf., New Orleans, LA, June 2002.
[2]
F. Mo and R. K. Brayton, "Whirlpool PLAs: A Regular Logic Structure and Their Synthesis," ICCAD, San Jose, CA, November 2002.

Send mail to the author : (fanmo@eecs.berkeley.edu)

MVSIS

Jie-Hong Jiang, Yunjian Jiang, Yinghua Li, Donald Chai, Alan Mishchenko1, and Tiziano Villa2
(Professor Robert K. Brayton)
(SRC) 683-004

MVSIS is an experimental software infrastructure for the optimization and synthesis of sequential multi-valued logic networks. Such a network is a network of nodes, where each node represents a multi-valued relation; it is a structural representation of a non-deterministic finite automaton. The model is general enough for reasoning about formal verification, control synthesis, and synthesis for both hardware and software. It poses new synthesis challenges that have not been addressed before. Some initial study is highlighted in [1].
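
To make the node model concrete, a node's multi-valued relation can be pictured as a table mapping each input combination to the set of output values the relation allows; any entry with more than one value expresses non-determinism, i.e., freedom left to synthesis. A hedged sketch (the representation is ours, not MVSIS's internal one):

    # A non-deterministic multi-valued node: each input combination maps to
    # the SET of allowed output values; a singleton everywhere would make
    # the node an ordinary (deterministic) function.
    node = {
        (0, 0): {0},
        (0, 1): {1, 2},   # either output value is acceptable here
        (1, 0): {2},
        (1, 1): {0, 1},
    }

    def is_deterministic(relation):
        return all(len(outputs) == 1 for outputs in relation.values())

    print(is_deterministic(node))  # False: the node leaves freedom to synthesis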

Some synthesis results have been obtained for algebraic and Boolean optimizations, as presented in [2], which are extensions of corresponding algorithms in the binary domain. We intend to continue research in the technology independent optimization of such non-deterministic logic networks.

Technology-dependent optimization techniques are also being studied for various applications. Binary encoding techniques have been studied for synchronous hardware implementations. We are also researching design flows and mapping techniques that produce delay-insensitive asynchronous circuits, which are more power-efficient than their synchronous counterparts and are correct by construction. For software synthesis in a hardware/software codesign context, we are studying efficient algorithms to generate fast code from an optimal generalized cofactoring structure.

[1]
A. Mishchenko and R. K. Brayton, "A Theory of Non-Deterministic Networks," Int. Symp. Boolean Problems, Freiberg, Germany, 2002.
[2]
M. Gao, J-H. Jiang, Y. Jiang, Y. Li, A. Mishchenko, S. Sinha, T. Villa, and R. Brayton, "Optimization of Multi-Valued Multi-level Networks," Int. Symp. Multiple-Valued Logic, Boston, MA, May 2002.
[3]
Y. Jiang and R. K. Brayton, "Software Synthesis from Synchronous Specifications Using Logic Simulation Techniques," Design Automation Conf., New Orleans, LA, June 2002.
1Postdoctoral Researcher, Portland State University
2Visiting Researcher, PARADES EEIG, Rome, Italy

More information (http://www-cad.eecs.berkeley.edu/mvsis) or

Send mail to the author : (wjiang@eecs.berkeley.edu)

Reducing Multi-Valued Algebraic Operations to Binary

Jie-Hong R. Jiang and Alan Mishchenko1
(Professor Robert K. Brayton)
California State Micro program and (SRC) 682-004

Algebraic operations were developed for binary logic synthesis and extended later to apply to multi-valued (MV) logic. Operations in the MV domain were considered more complex and slower. We show that MV algebraic operations are essentially as easy as binary ones, with only a slight overhead (linear in the expressions) in transformation into and out of the binary domain.

By introducing co-singleton sets as a new basis, any MV sum-of-products expression can be rewritten and passed to a binary logic synthesizer for fast execution. The optimized results can be directly interpreted back in the MV domain. This process, called EBD, reduces MV algebraic operations to binary ones.
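
The flavor of the transform can be seen on a single MV literal. If the binary variable b_i stands for the co-singleton "X != i", then the literal "X in S" is the product of the b_i for the values i outside S. A small sketch under that reading (the actual EBD bookkeeping is more involved):

    def cosingleton_product(domain, allowed):
        """Express the MV literal 'X in allowed' as a product (AND) of
        co-singleton binary variables b_i, where b_i means 'X != i'."""
        return ["b%d" % i for i in sorted(domain - allowed)]

    domain = {0, 1, 2, 3}
    # X in {1, 2}  ==  (X != 0) AND (X != 3)  ==  b0 * b3
    print(" * ".join(cosingleton_product(domain, {1, 2})))  # b0 * b3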

A pure MV operation differs from its corresponding EBD one mainly in that the former possesses "semi-algebraic" generality, which has not been implemented for binary logic. Preliminary experimental results show that the proposed methods are significantly faster, with little or no loss in quality, when run in comparable logic synthesis scripts.

We also show that in principle through the co-singleton transform, all MV semi-algebraic operations could be reproduced in the binary domain if the binary operations had semi-algebraic extensions. This result suggests that semi-algebraic operations in the binary domain should be investigated in the future.

1Postdoctoral Researcher, Portland State University

Send mail to the author : (jiejiang@eecs.berkeley.edu)

Sequential Equivalence Verification in the Sum State Space

Jie-Hong R. Jiang
(Professor Robert K. Brayton)
California State Micro program

The state explosion problem limits formal verification of large sequential circuits, partly because BDD sizes depend heavily on the number of variables involved. In the worst case, a BDD's size grows exponentially with the number of variables. Reducing this number can potentially enlarge the verification capacity. In particular, we show how sequential equivalence checking can be done in the sum state space.

Given two finite state machines M1 and M2 with m1 and m2 state variables, respectively, conventional formal methods verify equivalence by traversing the state space of the product machine, which has m1 + m2 registers. In contrast, we introduce a different possibility, based on partitioning the state space defined by a multiplexed machine, which can have merely max(m1, m2) + 1 registers. This substantial reduction in state variables potentially enables the verification of larger instances. Experimental results show the approach can verify benchmarks with up to 312 registers, including all of the control outputs of the 8085 microprocessor.
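
The register arithmetic behind the reduction, together with the selector idea, in a toy sketch (the precise construction is the subject of this work, so the code is only illustrative):

    def multiplexed_next(sel, state, inputs, delta1, delta2):
        """One shared state vector plus a selector bit: if sel == 0 the
        shared registers encode M1's state and step with delta1, else M2's."""
        return (sel, delta1(state, inputs) if sel == 0 else delta2(state, inputs))

    # Register counts: product machine m1 + m2, multiplexed max(m1, m2) + 1.
    m1, m2 = 150, 160
    print(m1 + m2, max(m1, m2) + 1)   # 310 vs. 161 state variables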

We are currently exploring techniques to extend the verification capacity. One possibility would be using output transformations such that equivalence classes of states induced by the new outputs are reduced and amortized. Another would be using structural similarities between two sequential circuits to reduce the verification complexity. In addition, we are investigating how to make our verification approach more scalable, especially for high-performance IC design methodologies.


Send mail to the author : (jiejiang@eecs.berkeley.edu)

Performance Estimation and Floorplanning for Globally Asynchronous Design

Philip Chong
(Professor Robert K. Brayton)
GSRC

Faster on-chip clock rates, increasing delay along global interconnects, and the need for IP reuse have provided an impetus for globally asynchronous design. In such a design environment, increased latency along global interconnects no longer affects functional correctness, but instead affects the performance of the design. Thus, performance estimation becomes a critical part of the floorplanning and physical layout of the design. Our research examines how abstract modeling of system behavior can be used to provide such performance estimation as an integrated part of the floorplanning process.


Send mail to the author : (pchong@eecs.berkeley.edu)

Wireplanning in Logic Synthesis

Satrajit Chatterjee
(Professor Robert K. Brayton)
Semiconductor Research Corporation

In this work we are studying logic synthesis techniques that lead to circuits that are "better" for placement and routing. In the current framework, we still separate logic synthesis from physical design. We are trying to characterize properties of Boolean networks that allow for better layout. Thus, in addition to the traditional literal count, we seek other metrics for Boolean networks that may be better indicators of wiring complexity during physical design. We hope to modify the traditional logic synthesis algorithms (which optimize for literal count) to optimize for a cost function that takes these new metrics into account in addition to literal count.


Send mail to the author : (satrajit@eecs.berkeley.edu)

Optimal Code Generation for Sequential Logic

Yunjian Jiang
(Professor Robert K. Brayton)
(SRC) 683-004

In a platform-based design context, we target (1) standardized or application-specific processing platforms, and (2) automatic software generation from functional models. With a focus on control-intensive embedded systems, such as automobile engine, airplane, and network protocol controls, we use a network of extended finite state machines (EFSMs) as the model of computation. The research goal is to derive a proper flow and point algorithms for code generation and mapping, given an existing architecture platform. Initial results are highlighted in [1,2]. Further research includes the following.

Generalized cofactoring of multi-valued relations: given a multi-valued relation, we want to find the optimal cofactoring structure that minimizes the functional evaluation cost. The generalized cofactoring structure is a DAG, where each node is associated with a binary cofactoring function and the two outgoing edges represent the positive and negative cofactors. The goal is to find a set of cofactoring functions that minimizes the average path length [3] of the structure.

State minimization and re-encoding: given a sequential logic network, we want to find sequential and parallel decompositions of the state machine, so that the output evaluation can be made faster.

[1]
Y. Jiang and R. K. Brayton, "Software Synthesis from Synchronous Specifications Using Logic Simulation Techniques," Design Automation Conf., New Orleans, LA, June 2002.
[2]
M. Baleani, F. Gennari, Y. Jiang, Y. Patel, R. K. Brayton, and A. Sangiovanni-Vincentelli, "HW/SW Partitioning and Code Generation of Embedded Control Applications on a Reconfigurable Architecture Platform," Int. Symp. Hardware/Software Codesign, Estes Park, CO, May 2002.
[3]
T. Sasao, Y. Iguchi, and M. Matsuura, "Comparison of Decision Diagrams for Multiple-Output Logic Functions," Int. Workshop on Logic Synthesis, New Orleans, LA, June 2002.

More information (http://www-cad.eecs.berkeley.edu/mvsis) or

Send mail to the author : (wjiang@eecs.berkeley.edu)

Approximate Refinement for Hybrid Systems

Arkadeb Ghosal, Marcin Jurdzinski1, Rupak Majumdar, and Vinayak Prabhu
(Professor Thomas A. Henzinger)
(AFOSR) MURI F49620-00-1-0327, (MARCO) GSRC 98-DT-660, and (NSF) CCR-9988172

The standard refinement approach for verifying systems consists of constructing an abstract model of the system, proving properties on the abstract level, and finally showing that the real model is a refinement of the abstract model, i.e., every trace in the actual model is present in the abstract one. In many control systems applications, the ultimate objective is to show that the system stays within some parameters. A slight deviation from the ideal behavior may be acceptable provided some constraints are not violated. We are working on a notion of refinement which considers metrics between traces. The standard refinement relation looks at the Boolean metric on traces--either the two traces are the same, or they are not. Approximate refinement uses metrics for which the distance between traces can have a gradient. It quantifies the degree to which the actual system model is close to the abstract one. The designers can decide what degree of closeness they desire. Exact trace inclusion may not be required as the model of the system may be only an approximation of the real world dynamics.
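
One natural formalization (our notation; the project may settle on a different definition) fixes a metric d on traces and relaxes trace inclusion to

    M \preceq_{\epsilon} M' \iff \forall t \in \mathrm{Traces}(M) \; \exists t' \in \mathrm{Traces}(M') : \; d(t, t') \le \epsilon ,

so that taking \epsilon = 0 under the discrete metric recovers standard refinement.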

1Former Postdoctoral Researcher

Send mail to the author : (vinayak@eecs.berkeley.edu)

The Giotto Programming Language: Scheduling Theory and Helicopter Implementation

Benjamin Horowitz
(Professor Thomas A. Henzinger)
(DARPA) SEC F33615-C-98-3614

Giotto is a real-time programming language for control systems. A control application typically consists of communicating periodic tasks, together with mode switches that enable and disable tasks. We are pursuing two lines of research relating to Giotto:

(1) Scheduling theory for Giotto. Giotto is platform-independent, and thus does not specify in detail how Giotto programs are to be scheduled. The Giotto compiler's current scheduling algorithm interprets Giotto's operational semantics quite literally. We are investigating how this literal interpretation may be relaxed, allowing more Giotto programs to be scheduled. Our starting point is precedence-constrained scheduling models, drawn from the scheduling theory literature. Single-mode, single-processor Giotto programs may be modeled as periodic instances of 1 | r_j; d_j; prec; pmtn | - (the problem of scheduling dependent tasks with release times, deadlines, and preemption on a single processor). Using a novel extension of standard earliest deadline first scheduling algorithms, I have developed a polynomial-time algorithm for schedule synthesis that captures interesting schedule optimizations; a sketch of the EDF core follows this paragraph. One direction in which these results can be generalized is multi-mode programs. I have developed a novel model of conditional scheduling problems that captures the branching behavior of multi-mode programs and permits the synthesis of schedules in polynomial time. Another direction in which these results might be generalized is multi-processor programs. For distributed and multi-CPU settings, almost all precedence-constrained scheduling models are NP-hard, so I am focusing instead on load-balancing approximation algorithms. This work is being done in conjunction with Tom Henzinger.
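
A minimal sketch of the earliest-deadline-first core that such schedule synthesis builds on (unit-time preemptive EDF on one processor, with precedences assumed to be already folded into release times and deadlines, as is standard; all names are ours):

    def edf_schedule(jobs, horizon):
        """Preemptive EDF in unit time slots. jobs: (name, release, deadline,
        wcet) tuples. Returns a slot-by-slot schedule, or None on a miss."""
        remaining = {name: wcet for name, _, _, wcet in jobs}
        schedule = []
        for t in range(horizon):
            ready = [(d, name) for name, r, d, _ in jobs
                     if r <= t and remaining[name] > 0]
            if not ready:
                schedule.append(None)      # idle slot
                continue
            d, name = min(ready)           # earliest deadline first
            if t >= d:
                return None                # a deadline has been missed
            remaining[name] -= 1
            schedule.append(name)
        return schedule if all(v == 0 for v in remaining.values()) else None

    print(edf_schedule([("t1", 0, 4, 2), ("t2", 1, 3, 1)], 4))
    # ['t1', 't2', 't1', None]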

(2) A Giotto-based helicopter control system. In conjunction with this theoretical work, we are implementing a Giotto-based helicopter control system for a Yamaha RMAX helicopter, with two goals in mind. The first goal is to produce a modular implementation. The control computer must be prepared to communicate with a wide variety of sensors--global positioning system, inertial measurement unit, etc.--each with its own data format and timing behavior. To manage the complexity in the helicopter control system, we use platform-based design, which introduces multiple layers of abstraction between application and implementation. The second goal is to produce a testable implementation. Since testing the control system on the actual helicopter is time consuming and somewhat risky, we have built a hardware-in-the-loop simulator for the Yamaha RMAX. The idea is to replace the helicopter itself with a simulated model of the helicopter's dynamics. The control system, however, remains the same, and may be tested against the simulator rather than the helicopter itself. This work is being done in conjunction with Judy Liebman, Cedric Ma (who have both graduated), John Koo, Tom Henzinger, Alberto Sangiovanni-Vincentelli, and Shankar Sastry.


Figure 1: Precedence constraints for a single-mode Giotto program

[1]
T. A. Henzinger, B. Horowitz, and C. M. Kirsch, "Giotto: A Time-triggered Language for Embedded Programming," Proc. Int. Workshop on Embedded Software, Tahoe City, CA, October 2001.
[2]
B. Horowitz, J. Liebman, C. Ma, T. J. Koo, T. A. Henzinger, A. Sangiovanni-Vincentelli, and S. Sastry, "Embedded Software Design and System Integration for Rotorcraft UAV Using Platforms," Proc. IFAC World Congress, Barcelona, Spain, July 2002.

More information (http://www-cad.eecs.berkeley.edu/~bhorowit/) or

Send mail to the author : (bhorowit@cs.berkeley.edu)

Time Safety Analysis of Embedded Programs

Christoph Kirsch1, Marco Sanvido2, and Slobodan Matic
(Professor Thomas A. Henzinger)
(DARPA) SEC F33615-C-98-3614

We consider programs of the so-called embedded (E) machine, which is a virtual machine for coordinating the interaction between physical processes and software processes [1]. The E machine executes E code, the timing and control-flow part of a program that specifies the invocation of software processes relative to events. The software processes form the functional part of the program and among them we distinguish two kinds: drivers and tasks. All information between the environment and the tasks flows through drivers, software processes with negligible worst-case execution times (WCET). On the other hand, tasks model processes with nonnegligible WCETs. One way to generate E code is from Giotto, a language designed for multi-modal control applications, where each mode is specified with a set of tasks and drivers that are periodically invoked.

The project addresses the problem of checking whether all possible executions of a given E code program on a given platform satisfy certain timing requirements. The compiler that generates E code expects sufficient performance from the platform, so that the computation of a task always finishes before drivers update task inputs or read task outputs, and before another invocation of the task is scheduled. A trace that satisfies such implicit deadlines is called time safe because it meets all timing requirements of the E code.

In [2] we show that time-safety checking of arbitrary E code is computationally hard. However, a large class of real-time programs can be expressed as restricted subclasses of general E code. For example, a Giotto source program compiled into E code in a specific way can be checked in polynomial time. In this project we will try to exploit the E code structure further to construct efficient algorithms for time-safety checking. We envision an algorithm with two parts. It first computes the deadlines and then checks if the set of tasks with given WCETs and computed deadlines can be scheduled by a specified scheduling strategy. The algorithm does not interpret conditionals in the E code: the deadline is conservatively approximated as the minimum of the deadlines from the two branches of a conditional. The check is performed by building a program graph on E code extended with particular scheduler states.
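
A small sketch of that conservative deadline approximation on a toy E code tree, where a conditional's deadline is the minimum over its branches (the tree encoding is ours; real E code is lower-level):

    def conservative_deadline(node):
        """node is ('task', deadline) or ('if', then_branch, else_branch).
        Conditionals are not interpreted: take the minimum of both branches."""
        if node[0] == "task":
            return node[1]
        _, then_branch, else_branch = node
        return min(conservative_deadline(then_branch),
                   conservative_deadline(else_branch))

    program = ("if", ("task", 10), ("if", ("task", 8), ("task", 12)))
    print(conservative_deadline(program))  # 8: safe on every execution path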

[1]
T. A. Henzinger and C. M. Kirsch, "The Embedded Machine: Predictable, Portable Real-time Code," Proc. Conf. Programming Languages Design and Implementation, Berlin, Germany, June 2002.
[2]
T. A. Henzinger, C. M. Kirsch, R. Majumdar, and S. Matic, "Time-Safety Checking for Embedded Programs," Proc. Int. Conf. Embedded Software, Grenoble, France, October 2002.
1Postdoctoral Researcher
2Postdoctoral Researcher

Send mail to the author : (matic@eecs.berkeley.edu)

Mapping Concurrent Applications onto Architectural Platforms

Andrew Mihal
(Professor Kurt Keutzer)
Gigascale Silicon Research Center

The embedded system design process is characterized by three challenges. One challenge is the task of modeling heterogeneous concurrent applications. Another is the task of finding a programming model for heterogeneous multiprocessor architectural platforms. Compounding each of these challenges is the task of implementing heterogeneous applications on heterogeneous architectures. We believe that the key problem underlying each of these challenges is the modeling of concurrency, and the key to modeling concurrency is to capture concurrent communication formally in models of computation [1].

The aim of this research is to develop a disciplined approach to the design and implementation of communication structures in embedded applications. Our approach combines the network-on-chip paradigm with the models of computation paradigm. We use models of computation to capture the communication requirements of an application as well as to abstract the capabilities of a communication architecture. Then, application requirements and architectural capabilities are matched using a discipline based on network-on-chip principles.

Our design methodology is based on the Y-chart [2]. The Y-chart is an iterative strategy for design space exploration that enables the co-evolution of hardware and software. We extend the multiple views methodology [3] to provide designers with the right abstractions for implementing heterogeneous, concurrent applications on heterogeneous multiprocessor programmable platforms.

[1]
E. A. Lee, "Computing for Embedded Systems," IEEE Instrumentation and Measurement Technology Conf., May 2001.
[2]
B. Kienhuis, E. Deprettere et al., "A Methodology to Design Programmable Embedded Systems," SAMOS: Systems, Architectures, Modeling, and Simulation, ed. F. Deprettere et al., LNCS series, Springer-Verlag, Vol. 2268, November 2001.
[3]
A. Mihal, C. Kulkarni et al., "A Disciplined Approach to the Development of Architectural Platforms," IEEE Design and Test of Computers, November/December 2002.

More information (http://www.gigascale.org/mescal) or

Send mail to the author : (mihal@eecs.berkeley.edu)

Identification of Architectural Bottlenecks in Programming Network Processors

Chidamber Kulkarni1, Christian Sauer2, and Matthias Gries3
(Professor Kurt Keutzer)
GSRC

Network processors exploit task- and packet-level parallelism to achieve large throughputs. This has resulted in a huge diversity of architectures for similar applications [1]. From a programmability perspective, these architectures can be classified along three main axes: (1) the type and organization of processing elements; (2) the scheduling schemes between processing elements and other resources, such as I/O interface buffers; and (3) the communication between different resources, such as memory operations and bus configurations. Each of these aspects significantly influences the ease of programmability.

A primary goal of this work is to implement one or more benchmarks on different network processors and understand the complex interplay between architectural components and programmability. In the process, we hope to define a broader abstraction for programming such heterogeneous application-specific multiprocessors. Such an abstraction can be seen as a hardware abstraction layer across multiple network processors or as a starting point for exploring new heterogeneous ASIPs. Note that this work differs from a typical programming model, where the emphasis is on capturing application requirements rather than on ease of mapping or porting across heterogeneous multiprocessor ASIPs. We have currently implemented a 16-port IPv4 benchmark [1] on the Intel IXP1200 and the Motorola C-Port C-5 and are investigating the architectural influences on both implementations.

[1]
M. Tsai, C. Kulkarni, C. Sauer, N. Shah, and K. Keutzer, "A Benchmarking Methodology for Network Processors," Network Processors 2002: Design Principles and Practice, Morgan Kaufmann Publishers Inc., 2002.
1Postdoctoral Researcher
2Visiting Industrial Fellow, Infineon Technologies, CPR-ST, Munich
3Postdoctoral Researcher

More information (http://www.gigascale.org/mescal) or

Send mail to the author : (kulkarni@eecs.berkeley.edu)

Programmable Peripherals

Christian Sauer1, Chidamber Kulkarni2, and Matthias Gries3
(Professor Kurt Keutzer)

Building blocks for application-specific programmable system-on-chip architectures can be classified into (multi-)processors, memory hierarchies, interconnection networks, and peripherals. While the design process puts much emphasis on the first three, peripherals, with communication interfaces as their largest subset, are often underestimated and neglected.

Our analysis [1] for the multimedia and network domains clearly shows that (1) peripherals and communication interfaces account for a significant share of the system (between 30 and 40 percent of die area), (2) there is a large number of heterogeneous peripheral functions even on a single chip, and (3) a single function can be even more complex than a processor core.

The main goal of this research is to achieve a reduction in the diversity of peripherals, thereby making the design process more regular and potentially simpler. We propose to achieve the goal by means of a programmable peripheral processor that can be targeted at multiple peripheral functions. The advantages of such a system are not only the unified design process, but also an increase in robustness of the overall system due to a regular software interface for device drivers.

[1]
C. Sauer, "Modeling Peripherals," Fully Programmable Systems Theme, GSRC Workshop, Stanford, CA, March 2002.
1Visiting Industrial Fellow, Infineon Technologies
2Postdoctoral Researcher
3Postdoctoral Researcher

Send mail to the author : (sauer@eecs.berkeley.edu)

Closing the Gap between ASIC and Custom Power

David Chinnery
(Professor Kurt Keutzer)

We have carefully examined the speed gap between ASIC and custom processors. Typical ASIC designs are a factor of 3 to 8 slower than comparable custom designs. The contributing factors to this gap are microarchitecture, timing overhead, logic style, logic design, cell design and wire sizing, layout, and process variation and accessibility. With careful attention to reducing these contributions, some ASICs can close the speed gap to only a factor of 2 to 3 [1].

Power is also a significant concern to designers. The heat dissipated by a chip greatly impacts the cost of packaging and cooling methods (ranging from passive dissipation to refrigeration). The power consumption limits battery life and adds to the cost of running an instrument. Particularly for low cost embedded applications, power consumption directly limits the maximum performance.

Custom logic styles such as dynamic domino logic are perceived as consuming more power than static logic. However, as dynamic circuits achieve higher maximum speeds, they can be downsized more and can run at a reduced supply voltage to meet slower speed targets. Custom logic styles typically have less capacitance in the PMOS pull-up network of gates, due to alternative logic styles and skewed drive strengths. As the gate loads are smaller, the drive strengths required are smaller, doubly reducing the dynamic power consumed in switching capacitances within the circuit.
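
The leverage comes from the textbook dynamic power relation (the numbers below are purely illustrative):

    P_{\mathrm{dyn}} = \alpha \, C \, V_{dd}^{2} \, f .

A design that switches 20 percent less capacitance and still meets timing at a 20 percent lower supply dissipates roughly 0.8 x 0.8^2 ≈ 0.51 of the reference dynamic power at the same frequency, which is the "double" benefit of smaller gate loads noted above.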

Given equivalent speed targets for ASIC and custom methodologies, custom approaches can exploit their speed advantage to substantially reduce power consumption. We quantify the contribution of each factor to the power gap between ASIC and custom designs. We then illustrate approaches to reducing ASIC power consumption.

[1]
D. G. Chinnery and K. Keutzer, Closing the Gap between ASIC and Custom: Tools and Techniques for High-Performance ASIC Design, Kluwer Academic Publishers, ISBN 1-4020-7113-2, May 2002.

Send mail to the author : (chinnery@eecs.berkeley.edu)

Globally Optimal Power Minimization with Multiple Supply Voltages, Multiple Threshold Voltages, and Gate Sizing with Skewed Drive Strengths

David Chinnery and Michael Orshansky1
(Professor Kurt Keutzer)
(SRC) 915.001

Some approaches to transistor sizing have used convex posynomial formulations to guarantee a globally optimal solution [1,2]. The convex posynomials are fit to experimental data characterizing a gate. More recent circuit optimization methods have generally avoided posynomial models due to perceived inaccuracies. We wish to find an optimization formulation that combines sizing with threshold voltage and supply voltage assignment.

Previous heuristic approaches [3-5] to transistor threshold voltage assignment to minimize power have not considered simultaneous sizing of transistors and threshold changes. We extend previous work by including supply voltage and performing simultaneous optimization of all gate parameters.

We chose posynomials to formulate the problem of simultaneous transistor sizing with supply voltage and threshold voltage assignment to combinational circuits. Using convex posynomials guarantees a global minimum, and avoids heuristic approaches where circuit variables are optimized independently. The convex posynomials are used for static timing analysis and power analysis. Power analysis includes all components of circuit power: switching capacitance, short-circuit power through a gate when it switches, and leakage current for the dominant leakage state (primarily subthreshold leakage). Posynomials are fit to a continuous range of supply voltage, threshold voltage, drive strength ratio, and drive strength of a gate for a range of input slew and output load capacitance conditions. The fidelity of the posynomial models for delay and power analysis is verified against SPICE circuit simulations.
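
For reference, a posynomial has the standard geometric-programming form (textbook material, not specific to this work):

    f(x) = \sum_{k} c_k \prod_{i} x_i^{a_{ik}}, \qquad c_k > 0 ,

and the substitution x_i = e^{y_i} followed by a logarithm turns every posynomial objective and constraint into a convex log-sum-exp function of y, which is why any local optimum of the continuous sizing and voltage program is also global.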

Given the globally optimal solution from the continuous sizing, continuous supply voltage, and continuous threshold voltage formulation, we discretize supply voltage and threshold voltage. Upper and lower bounds on the minimum power for a circuit can be easily determined from the posynomial formulation. A limited branch and bound during the iterative discretization ensures a discretized solution near the global optimum of the continuous problem.

[1]
J. P. Fishburn and A. E. Dunlop, "TILOS: A Posynomial Programming Approach to Transistor Sizing," Proc. ICCAD, November 1985.
[2]
M. Matson and L. Glasser, "Macromodeling and Optimization of Digital MOS VLSI Circuits," IEEE Trans. CAD, Vol. 5, No. 4, October 1986.
[3]
S. Sirichotiyakul et al., "Stand-by Power Minimization through Simultaneous Threshold Voltage Selection and Circuit Sizing," Design Automation Conf., New Orleans, LA, June 1999.
[4]
Q. Wang and S. B. K. Vrudhula, "Static Power Optimization of Deep Submicron CMOS Circuits for Dual Vt Technology," Proc. ICCAD, San Jose, CA, November 1998.
[5]
L. Wei, Z. Chen, K. Roy, Y. Ye, and V. De, "Mixed-Vth (MVT) CMOS Circuit Design Methodology for Low Power Applications," Design Automation Conf., New Orleans, LA, June 1999.
1Postdoctoral Researcher

Send mail to the author : (chinnery@eecs.berkeley.edu)

Low Power Circuit Synthesis for Advanced Manufacturing Technologies

David Nguyen, Abhijit Davare, and Michael Orshansky1
(Professor Kurt Keutzer)

Low power design methods are becoming more important as power consumption increases. The effects of higher power consumption include: (1) increased costs, because the extra heat generated must be combated by expensive packaging technology; (2) reduced reliability because of physical effects like electromigration; and (3) shortened overall battery life in portable devices. In our methodology, we propose to take advantage of advances in manufacturing technologies. Advanced manufacturing technology allows the production of circuits with dual supply and threshold voltage levels. Multiple voltage levels can be exploited to trade off power consumption and circuit speed. Because most combinational circuits contain many sub-critical paths, the delay of some circuit components can safely be increased, which results in reduced power consumption. Our research aims to develop a set of efficient algorithms to assign threshold and supply voltages to gates and perform gate sizing in a way that preserves the overall performance of the circuit while maximally reducing power. Our current approach is based on a heuristic that uses linear programming (LP) and integer-linear programming (ILP) to perform the assignment.
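
In generic form (our notation; the concrete formulation is the subject of this research), the assignment can be posed with one binary indicator x_{g,v} per gate g and voltage option v:

    \min \sum_{g} \sum_{v} x_{g,v} \, P_{g,v} \quad \text{s.t.} \quad \sum_{v} x_{g,v} = 1 \;\; \forall g , \qquad a_h \ge a_g + \sum_{v} x_{h,v} \, d_{h,v} \;\; \forall (g,h) \in E , \qquad a_o \le T \;\; \text{for all outputs } o ,

where P_{g,v} and d_{g,v} are gate g's power and delay under option v and a_g is its arrival time; relaxing x_{g,v} to [0,1] yields the LP relaxation used within such a heuristic.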

1Postdoctoral Researcher

Send mail to the author : (davare@eecs.berkeley.edu)

Low Power Design Strategies for High Performance ASSPs and ASICs

Farhana Sheikh
(Professor Kurt Keutzer)
GSRC

Nanometer designs are creating a variety of new challenges for the semiconductor industry in the areas of manufacturing, design tools, chip architectures, speed, reliability, power management, variability, and verification. Coupled with these technology challenges, there is an increasing demand for products that are low power, high performance, reliable, and portable. This has resulted not only in increased functionality on a single chip but also in a more complex design cycle, as many physical design and manufacturing issues must be considered when making design tradeoffs at earlier stages of the design process. Among these tradeoffs, a significant portion of design decisions concern minimizing power consumption and power dissipation.

Recently, a number of circuit design and manufacturing techniques to manage and minimize power have been presented: (1) use of multiple supply voltages to minimize dynamic power; (2) use of multiple threshold voltages to minimize static power; (3) use of post-synthesis gate resizing to recover power on non-critical paths; (4) a combination of (1), (2), and (3); and (5) selective phase-shifting to provide richer design tradeoff space for synthesis and mapping.

The goal of this research project is to orchestrate all of these approaches simultaneously to maximally leverage them in automated design methodologies for application-specific standard products (ASSPs) and application-specific integrated circuits (ASICs). Specifically, the desired results of the research are to arrive at new algorithms and design flows that enable designers to create low power, high performance circuits in an automated fashion. Research is focused on both front-end and back-end VLSI CAD flows.

[1]
F. Sheikh, A. Kuehlmann, and K. Keutzer, "Minimum-Power Retiming for Dual-Supply CMOS Circuits," ACM/IEEE Int. Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, December 2002.

More information (http://www-cad.eecs.berkeley.edu/~farhana) or

Send mail to the author : (farhana@eecs.berkeley.edu)

Tools, Languages, and Compilers for Constraint-based Functional Hardware and Software Design

Matthew Moskewicz
(Professor Kurt Keutzer)
GSRC

It is difficult to verify that the components of programmable systems will behave correctly with respect to either high level specifications or the intent of the designers that have implemented them. To focus on a concrete example, microprocessors traditionally use the notion of an instruction set (IS), specified at a high level, to form a boundary between the hardware designers and the compiler authors. While there are many variations on this theme, it has proved a long-lived approach to the complexities of programming and designing microprocessors. However, it is still difficult to verify that a particular implementation of a microprocessor implements a given IS, and the more hardware details the IS hides, the more difficult it is to use the IS as the sole input for an optimizing compiler. Thus, compiler authors must often be familiar with all of the low-level hardware details of a processor, and then carefully construct code using the hidden semantics of the IS with respect to a given processor or processor class to give good performance.

VLIW style instruction sets are an attempt to reduce this semantic gap by exposing more of the behavior of the processor to the compiler. The research question posed here is: "What is required to completely close the semantic gap between the hardware level description of a programmable component and the exported semantics of the same component?" The tentative answer is that, as a first step, it requires a more formal description of the device, its programmability, and its concurrency than is offered by traditional HDLs. Furthermore, given this new description, it is necessary to reformulate much of the abstraction/mapping tool chain from the hardware description to the high-level programmer's model of the device. One way to explore this issue is to attempt to build such a system. The current approach combines a functional hardware description language (sharing properties with other concurrent constraint languages such as Oz and Mercury) with the power of complex propositional reasoning (using high performance SAT solvers), and additional theories (a la CVC) including Presburger arithmetic, and any others that look useful or necessary. Currently, work is underway to develop a high performance SAT+Presburger solver while pushing the design of simply controlled (similar to a horizontal VLIW) heterogeneous processors to the level of automatic compiler generation. It is expected that the solver will be useful for the implementation of an efficient retargetable compiler, because Presburger arithmetic can encode the semantics of the current constraint language that is used to model both the hardware and the intermediate forms (at the compiler IR level) of the software.


Send mail to the author : (moskewcz@eecs.berkeley.edu)

Systematically Exploring the Design Space of Network Processors

Matthias Gries1, Chidamber Kulkarni2, and Christian Sauer3
(Professor Kurt Keutzer)
GSRC

Current designs of network processors reveal manifold micro-architectures [1]. Even processors targeted at the same application domain show a variety of differences in micro-architecture topology. It is unclear whether any of the current solutions actually represents a natural and efficient implementation of the required functionality for a given application such as IPv4 forwarding.

In order to enable the systematic evaluation of different existing and innovative designs in a reasonable time-frame, we investigate the feasibility of a recently developed analytical performance estimation approach [2] to capture inherent characteristics of the design space and to automatically guide the designer toward optimal solutions in the network processing domain.

We focus in particular on the following issues: (1) extending the analytical approach to model and automatically explore heterogeneous communication architectures; (2) comparing results from the analytical model with simulation-based results [3]; (3) pruning the design space by incorporating technology constraints as well as individual constraints defined by the designer; and (4) incorporating domain-specific design decisions, such as the partitioning of application tasks between microprocessor threads, into an automated exploration framework.

[1]
N. Shah, "Understanding Network Processors," master's thesis, UC Berkeley, September 2001.
[2]
L. Thiele, S. Chakraborty, M. Gries, A. Maxiaguine, and J. Greutert, "Embedded Software in Network Processors--Models and Algorithms," Workshop on Embedded Software, Tahoe City, CA, October 2001.
[3]
M. Tsai, C. Kulkarni, C. Sauer, N. Shah, and K. Keutzer, "A Benchmarking Methodology for Network Processors," Network Processors 2002: Design Principles and Practice, Morgan Kaufmann Publishers Inc., 2002.
1Postdoctoral Researcher
2Postdoctoral Researcher
3Visiting Industrial Fellow, Infineon Technologies

Send mail to the author : (gries@eecs.berkeley.edu)

A Probabilistic Model for Static Timing Analysis of Combinational Circuits

Michael Orshansky1 and Kaushik Ravindran
(Professor Kurt Keutzer)
GSRC

The traditional approach to worst-case static timing analysis uses deterministic delays for gates and wires. These fixed delay values are extracted from best- and worst-case process parameter sets obtained through repeated device simulations. However, with the advent of deep-submicron technology, the intra-chip variations of process components and interconnect geometries are expected to increase by as much as 65%. This, coupled with the presence of variable correlations between delay elements, causes further deviation from the fixed delay model. As a consequence, typical worst-case timing analysis leads to overly conservative solutions, as it assumes worst-case delay characteristics and fails to recognize the inherent statistical variation. The end product suffers due to lost performance and expensive over-design.

In order to reliably design modern ICs, the timing model must be freed from deterministic approximations and the underlying randomness must be exposed through a coherent probabilistic abstraction. The objective of this research is to propose a framework and methodology for statistical timing analysis. In particular, there are two driving questions: (1) choosing an appropriate probabilistic model to describe delays while trading off between accuracy and computational complexity and (2) implementing an efficient algorithm to compute delay distributions of combinational circuits based on the chosen model. The incorporation of such a probabilistic framework for timing analysis reduces the gap between timing constraints predicted by tools and the actual silicon performance, obviates over-design, and relieves the designer of unnecessary iterations to attain timing closure.
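
A toy Monte Carlo experiment illustrates the point (this is not the analytical model under investigation): with independent Gaussian gate delays, even a two-path circuit has a longest-path distribution whose mean exceeds the nominal delay of either path.

    import random

    random.seed(0)
    N = 100_000
    samples = []
    for _ in range(N):
        # Two equal parallel paths, each the sum of two Gaussian gate delays.
        p1 = random.gauss(5, 1) + random.gauss(5, 1)
        p2 = random.gauss(5, 1) + random.gauss(5, 1)
        samples.append(max(p1, p2))            # circuit delay = slowest path

    samples.sort()
    print(round(sum(samples) / N, 2))          # ~10.8, not the nominal 10.0
    print(round(samples[int(0.99 * N)], 2))    # 99th-percentile circuit delay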

1Postdoctoral Researcher

Send mail to the author : (kaushikr@eecs.berkeley.edu)

A Programming Model for Network Processors

Niraj Shah and William Plishker
(Professor Kurt Keutzer)

The past five years have witnessed over 30 attempts at programmable solutions for packet processing. With these architectures, network processor designers have employed a large variety of hardware techniques to accelerate packet processing, including parallel processing, special-purpose hardware, memory architectures, on-chip communication mechanisms, and use of peripherals [1]. However, despite this architectural innovation, relatively little effort has been made to make these architectures easily programmable. The current practice for programming network processors is to use assembly language or a subset of C. Such low-level programming languages place a large burden on the programmer to understand fine details of the architecture just to implement a packet processing application, let alone optimize it. We believe the programmer should be presented with an abstraction of the underlying hardware, or programming model, which exposes just enough detail to write efficient code for that platform. This work focuses on a programming model for network processors. Our approach starts with an abstraction based on Click, a domain-specific language for packet processing. We have demonstrated the promise of our programming model by implementing an IPv4 router on the Intel IXP1200, a common network processor.

[1]
N. Shah, "Understanding Network Processors," master's thesis, UC Berkeley, September 2001.

More information (http://www-cad.eecs.berkeley.edu/~niraj) or

Send mail to the author : (niraj@eecs.berkeley.edu)

On the Convergence of Switching Windows Computation in the Presence of Crosstalk Noise

Pinhong Chen
(Professor Kurt Keutzer)

Detecting overlap of switching windows between coupling nets is a major static technique to accurately locate crosstalk noise. However, due to the mutual dependency between switching windows, the computation requires iterations to converge. In this research, we discuss the issues and provide answers to the important questions involved in convergence and numerical properties, including the effect of coupling models, multiple convergence points, convergence rate, computational complexity, non-monotonicity, continuity, and the effectiveness of bounds. Numerical fixed-point computation results are used to explain these properties. Our contribution here builds a theoretical foundation for static crosstalk noise analysis [1].
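
The mutual dependency has a natural fixed-point form. A much-simplified sketch, assuming each window is an interval [lo, hi] that stretches by a coupling delay whenever it overlaps an aggressor's current window (real coupling models are far more detailed):

    def overlaps(a, b):
        return a[0] <= b[1] and b[0] <= a[1]

    def iterate_windows(windows, base, coupling, max_iters=100):
        """Fixed-point iteration: net i's window is its base window, widened
        by `coupling` when any other net's current window overlaps it."""
        for it in range(max_iters):
            new = {}
            for i in windows:
                lo, hi = base[i]
                if any(j != i and overlaps(windows[i], windows[j])
                       for j in windows):
                    hi += coupling         # victim may now switch later
                new[i] = (lo, hi)
            if new == windows:
                return new, it             # converged
            windows = new
        return windows, max_iters

    base = {"n1": (0, 4), "n2": (3, 6)}
    print(iterate_windows(dict(base), base, coupling=1))
    # ({'n1': (0, 5), 'n2': (3, 7)}, 1)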

[1]
P. Chen, Y. Kukimoto, C.-C. Teng, and K. Keutzer, "On Convergence of Switching Windows Computation in Presence of Crosstalk Noise," Int. Symp. Physical Design, Del Mar, CA, April 2002.

Send mail to the author : (pinhong@eecs.berkeley.edu)

Refining Switching Window by Time Slots for Crosstalk Noise Calculation

Pinhong Chen
(Professor Kurt Keutzer)

Coupling effects have become a serious concern in ultra-deep-submicron (UDSM) chip designs. Static timing analysis must take these effects into account to ensure signal integrity, but conventional approaches give pessimistic noise results due to an over-conservative assumption about switching windows.

For crosstalk noise calculation, computing the switching windows of a net helps us identify noise sources accurately. Traditional approaches use a single continuous switching window for a net. Under this model, signal switching is assumed to happen at any time within the window. Although conservative and sound, this model can result in too much pessimism, since in reality the exact timing of signal switching is determined by a path delay up to the net, i.e., the underlying circuit structure does not always allow signal switching at an arbitrary time within the continuous switching window. To address this inherent inaccuracy of the continuous switching window, we propose a refinement of the traditional approaches, where signal switching is characterized by a set of discontinuous switching windows instead of a single continuous window. Each continuous switching window is divided into multiple windows, called time slots, and the signal switching activity of each slot is analyzed separately to calculate the maximum noise with more accuracy. By controlling the size of a time slot we can trade off accuracy and runtime, which makes this approach [1] highly scalable. We have confirmed with experiments on industrial circuits that up to 90% of the noise violations detected by the traditional approach can be false alarms.
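
A sketch of the refinement in miniature, treating windows as plain intervals: slice a window into slots and keep only the slots in which some path delay can actually place a transition (the per-slot activity would come from path analysis, which we elide; all names are ours):

    def slice_into_slots(window, slot_size):
        """Split a continuous switching window into fixed-size time slots."""
        lo, hi = window
        slots, t = [], lo
        while t < hi:
            slots.append((t, min(t + slot_size, hi)))
            t += slot_size
        return slots

    def active_slots(slots, path_delays):
        """Keep only slots in which some path delay actually lands; the
        continuous-window model treats the entire window as active."""
        return [s for s in slots if any(s[0] <= d < s[1] for d in path_delays)]

    slots = slice_into_slots((0, 12), slot_size=2)
    print(len(slots), len(active_slots(slots, path_delays=[1.5, 9.0])))  # 6 2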

[1]
P. Chen, Y. Kukimoto, and K. Keutzer, "Refining Switching Window by Time Slots for Crosstalk Noise Calculation," ICCAD, San Jose, CA, November 2002.

Send mail to the author : (pinhong@eecs.berkeley.edu)

Correct by Construction Programmable Platforms

Scott Weber
(Professor Kurt Keutzer)
GSRC

We propose a method for describing programmable platforms in a constraint-based pure functional language. This description coupled with a well-defined model of computation allows us to constructively generate and restrict multiple views of the programmable platform. Our multiple views enable designers to abstract and refine the design while our system maintains consistency between the views.


Send mail to the author : (sjweber@eecs.berkeley.edu)

Automation and Optimizations for a Programming Model for Network Processors

William Plishker and Niraj Shah
(Professor Kurt Keutzer)
GSRC

Programmable solutions on network processors for packet processing have relied mostly on assembly or subsets of C as an interface. For practical implementations, programmers must effectively utilize hardware parallelism, arbitrate between shared resources, divide code among threads, lay out shared data, and interface with special function units and peripherals. This is far removed from the natural language with which application writers describe their designs. To bridge the gap between an application language and the language for a target architecture, a programming model is necessary that hides the unimportant details of the target architecture and exposes those that allow the programmer to exploit the full power of the device.

The current programming model we have developed for network processors requires the programmer to specify thread boundaries, lay out shared data, and manually eliminate redundant accesses and computation across elements. Since this presents a potentially enormous design space, this work will attempt to find optimal strategies and heuristics to automate traversing it. Furthermore, there are potential optimizations that may be found in a high-level language but are not easily found once assembler or C is generated. This work will also explore finding such optimizations automatically.


Send mail to the author : (plishker@eecs.berkeley.edu)

Ptolemy II--Heterogeneous Concurrent Modeling and Design in Java

Adam Cataldo, Chris Chang, Elaine Cheong, Christopher Hylands1, Edward A. Lee2, Xiaojun Liu, Steve Neuendorffer, Claudius Ptolemaeus3, John Reekie4, Mary P. Stewart5, James Yeh, Yang Zhao, Haiyang Zheng, and Ye Zhou
(Professor Edward A. Lee)
Agilent Technologies, Cadence Design Systems, Defense Advanced Research Projects Agency (DARPA), Hitachi, Microelectronics Advanced Research Corporation (MARCO), National Semiconductor, and Philips

Ptolemy II [1] is a set of Java packages supporting heterogeneous, concurrent modeling, simulation, and design of component-based systems. The emphasis is on a clean, modular software architecture, divided into a set of coherent, comprehensible packages. The kernel package supports definition and manipulation of clustered hierarchical graphs, which are collections of entities and relations between those entities. The actor package extends the kernel so that entities have functionality and can communicate via the relations. The domains extend the actor package by imposing models of computation on the interaction between entities. Domains that have been created include continuous time (CT), discrete event (DE), finite state machines (FSM), process networks (PN), synchronous dataflow (SDF), communicating sequential processes (CSP), and synchronous/reactive (SR) models.

The Ptolemy II software architecture supports interaction between domains (heterogeneous modeling). For example, FSM can be combined with CT to model hybrid systems.

Ptolemy II includes a number of support packages, such as the data package, which provides data encapsulation and an extensible type system; the math package, which provides matrix and vector math; and the plot package, which provides two-dimensional signal plotting.

For more information about Ptolemy II, see http://ptolemy.eecs.berkeley.edu/ptolemyII.


Figure 1: Claudius Ptolemaeus

[1]
S. S. Bhattacharyya, E. Cheong, J. Davis II, M. Goel, C. Hylands, B. Kienhuis, E. A. Lee, J. Liu, X. Liu, L. Muliadi, S. Neuendorffer, J. Reekie, N. Smyth, J. Tsay, B. Vogel, W. Williams, Y. Xiong, and H. Zheng, Heterogeneous Concurrent Modeling and Design in Java, UC Berkeley Electronics Research Laboratory Memorandum No. UCB/ERL M02/23, August 2002.
1Staff
2Professor
3Staff
4Postdoctoral Researcher
5Staff

More information (http://ptolemy.eecs.berkeley.edu/ptolemyII) or

Send mail to the author : (eal@eecs.berkeley.edu)

TinyGALS: A Programming Model for Event-driven Embedded Systems

Elaine Cheong and Jie Liu1
(Professor Edward A. Lee)

Networked embedded systems such as wireless sensor networks are usually designed to be event-driven so that they are reactive and power-efficient. Programming embedded systems with multiple reactive tasks is difficult due to the complex nature of managing the concurrency of execution threads and consistency of shared states.

We have designed a globally asynchronous and locally synchronous model, called TinyGALS, for programming event-driven embedded systems [1]. Software components are composed locally through synchronous method calls to form modules, and asynchronous message passing is used between modules to separate the flow of control. In addition, we have designed a guarded yet synchronous model, TinyGUYS, which allows thread-safe sharing of a global state by multiple modules without explicitly passing messages. Our notions of synchrony and asynchrony, which are consistent with the usage of these terms in distributed programming paradigms, refer to whether the software flow of control is immediately transferred to a component.
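
A rough sketch of the two composition mechanisms with Python stand-ins (synchronous composition is a direct call inside a module; asynchronous composition is an event queue between modules; all names are ours, not TinyGALS syntax):

    from collections import deque

    class Module:
        """Components inside a module call each other synchronously; modules
        talk to each other only through asynchronous message queues."""
        def __init__(self, handler):
            self.handler = handler
            self.inbox = deque()

        def post(self, msg):            # asynchronous: just enqueue
            self.inbox.append(msg)

        def run(self, system):          # triggered later by the scheduler
            while self.inbox:
                self.handler(self.inbox.popleft(), system)

    def sensor_handler(msg, system):
        reading = msg * 2               # synchronous local processing
        system["radio"].post(reading)   # asynchronous message to another module

    def radio_handler(msg, system):
        print("send:", msg)

    system = {"sensor": Module(sensor_handler), "radio": Module(radio_handler)}
    system["sensor"].post(21)
    for m in system.values():           # a trivial round-robin scheduler
        m.run(system)                   # prints: send: 42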

With this highly structured programming model, all asynchronous message-passing code and module-triggering mechanisms can be automatically generated from a high-level specification. The programming model and code generation facilities have been implemented for a wireless sensor network platform known as the Berkeley motes. TinyOS, an event-based operating system for these networked sensors, is being developed by the group of Professor David Culler. Our implementation of TinyGALS uses the component model provided by TinyOS, which has an interface abstraction that is consistent with synchronous method calls. The TinyGALS code generator is designed to work with preexisting TinyOS components, thus enabling code reuse.

We are currently working on formalizing the dynamic properties of TinyGALS, as well as developing a more robust implementation for communication and scheduling. TinyGALS is currently intended for running on a single node. We later plan to extend the TinyGALS model to multiple nodes for distributed multi-tasking.

[1]
E. Cheong, J. Liebman, J. Liu, and F. Zhao, "TinyGALS: A Programming Model for Event-driven Embedded Systems," Proc. ACM Symp. Applied Computing, Melbourne, FL, March 2003.
1Palo Alto Research Center (PARC)

Send mail to the author : (celaine@eecs.berkeley.edu)

Study of Synchronous Dataflow

Haiyang Zheng
(Professor Edward A. Lee)
Mobies

The purpose of this project is to study properties of synchronous dataflow (SDF), including the abstraction of actors and the composability of SDF models. In an SDF model, actors have static firing rules: they consume and produce a fixed number of tokens in each firing. The token consumption and production rates of an actor are its abstraction. Based on this abstraction, an efficient static schedule can be computed from the balance equations between actors. An important constraint in the SDF firing rules is that an actor cannot fire unless all its inputs have enough tokens. Sometimes this constraint is too conservative, in that for some executable models no executable schedule can be found [1,2]. This is because the SDF abstraction of the actors is not sufficient.
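
For a two-actor example, the balance equations and the resulting repetition vector look like this (standard SDF material, sketched in Python):

    from math import gcd

    # A -> B, where A produces 2 tokens per firing and B consumes 3.
    produce, consume = 2, 3

    # Balance equation: produce * rep(A) == consume * rep(B); the smallest
    # positive integer solution is the repetition vector.
    g = gcd(produce, consume)
    rep_a, rep_b = consume // g, produce // g
    print(rep_a, rep_b)                        # 3 2: fire A 3 times, B twice
    assert produce * rep_a == consume * rep_b  # one iteration returns the
                                               # buffer to its initial state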

This project aims to find a sufficient abstraction by exposing some of the internals of actors. The basic idea is to separate the computation and communication of an actor, and to abstract the communication part. The abstraction focuses on which conditions in the environment the actor affects. A schedule based on this abstraction reflects the computation of actors better than the general SDF schedule does. This abstraction can be regarded as a refinement of the SDF abstraction.

This abstraction can be used to study the composability of SDF models. A composite SDF model has several SDF models embedded inside it. The SDF abstraction of this composite model is based on the solution of the balance equations between the actors inside. This abstraction indicates the requirement for the composite model to restore its original state. It also abstracts away some of the internal communication behaviors, which is valuable in studying the schedulability of compositions of SDF models. Our new abstraction captures the communication part, and the schedulability analysis can be done without flattening the composite models.

[1]
T. M. Parks, J. L. Pino, and E. A. Lee, "A Comparison of Synchronous and Cycle-Static Dataflow," Proc. Signals, Systems and Computers Conf., Vol. 1, 1996.
[2]
G. Bilsen, M. Engels, R. Lauwereins, and J. A. Peperstraete, "Cyclo-static Dataflow," IEEE Trans. Signal Processing, Vol. 44, No. 2, February 1996.

Send mail to the author : (hyzheng@eecs.berkeley.edu)

CAL: A Novel Actor Language

Joern Janneck1, Chris Chang, and Yang Zhao
(Professor Edward A. Lee)

The concept of actors was first introduced by Carl Hewitt as a means of modeling distributed knowledge-based algorithms. Actors have since become widely used.

CAL is a small domain-specific language for writing down the functionality of actors, specifically including their ports, parameters, typing constraints, and firing rules. It is designed to be embedded in an environment providing the necessary infrastructure, such as data types, operations, and function libraries. The goal is a concise, high-level description of actors that supplies statically analyzable information about their behavior, such as production and consumption rates, to facilitate scheduling, composition, and static checking of actor networks.
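
The kind of statically analyzable firing information CAL exposes can be mimicked for illustration (this is Python, not CAL syntax; the declared rates are the point):

    class AddActor:
        """An adder with declared, statically analyzable rates: one token is
        consumed on each input port and one is produced on the output."""
        consumption = {"A": 1, "B": 1}   # tokens required per firing
        production = {"Out": 1}          # tokens produced per firing

        def fireable(self, queues):
            return all(len(queues[p]) >= n
                       for p, n in self.consumption.items())

        def fire(self, queues):
            a, b = queues["A"].pop(0), queues["B"].pop(0)
            queues["Out"].append(a + b)

    queues = {"A": [1, 2], "B": [10], "Out": []}
    actor = AddActor()
    while actor.fireable(queues):        # a scheduler needs only the rates,
        actor.fire(queues)               # never the body, to decide this
    print(queues["Out"])                 # [11]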

Current work includes:
(1) using program transformation techniques for manipulating CAL actor specifications, and compositions of CAL actors;
(2) generating executable code from an actor description, in order to simulate or eventually implement an actor on some target platform;
(3) scheduling;
(4) compatibility checking between connected actors; and
(5) developing and evolving the CAL language itself.

1Postdoctoral Researcher

Send mail to the author : (cbc@eecs.berkeley.edu)

Copernicus: Generating Code from Polymorphic Components and Heterogeneous Models

Stephen Neuendorffer and Christopher Hylands1
(Professor Edward A. Lee)
Ptolemy Project

The high level of abstraction possible in component-based modeling offers many advantages, such as simulation speed and the analytical strength of formal models of computation. However, the fundamental weakness of high-level modeling is the difficulty of actual implementation. Traditionally, the easiest way to get high performance has been to translate the model by hand into a low-level implementation language. Automatic code generation from the model is sometimes possible by assembling the component specifications, but only with serious performance penalties.

These penalties come from several sources:

  1. A component in the modeling environment is inevitably built to be easy to design with. Components can often accept a variety of data types (type-polymorphism), can be used in a variety of control or communication models (domain-polymorphism), can be configured with different parameters (operational-polymorphism), and can be used in any environment (context-polymorphism). This flexibility directly conflicts with the goals of most optimized implementations.
  2. A model consists of components, their ports, and the connections between those ports. The model of computation associated with a model determines how connected components communicate and control their execution. In some cases the boundaries of components correspond to physical boundaries of the implemented system. For example, there might be one component for each physical processor connected to a system bus. However, the boundaries of components often have no physical importance in a system. They may be drawn arbitrarily by the system designer, dictated by the social structure of the group designing the model, or determined by the availability of reusable components. Simple code generation strategies blindly preserve the structure of the model, which can result in unnecessary overhead.
At some level, these problems can be ameliorated using traditional code generation strategies. If a component is too flexible to generate good code, then it can be replaced with a specialized hand-written version. If there is unnecessary structure in the model, then the model can be changed to remove that structure. However, these solutions conflict directly with good engineering practice and greatly complicate the implementation procedure.

We are developing a code generation strategy that attempts to handle these difficulties automatically. The key idea is that we combine code generation from a model with compilation of the code for individual actors. We call this strategy co-compilation. This strategy directly addresses the difficulties above. We parse the code for an actor and specialize it according to its use in a particular model (the types, the particular domain, the values of parameters, and the connections that have been made to it). We can also perform cross-actor optimizations to eliminate or reorganize the structure of a model.
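
The flavor of this specialization can be suggested by a small Python sketch (Copernicus itself operates on Java bytecode); the gain actor below is a hypothetical example in which folding a known parameter value into the code removes the per-firing flexibility of the generic version.

  def make_generic_gain():
      # Flexible version: re-reads its parameter on every firing.
      def fire(state, token):
          return state["gain"] * token
      return fire

  def specialize_gain(gain_value):
      # Specialized version: the parameter is folded into the code,
      # so the per-firing lookup (and the state dictionary) disappears.
      def fire(token):
          return gain_value * token
      return fire

  generic = make_generic_gain()
  special = specialize_gain(gain_value=3)
  print(generic({"gain": 3}, 7), special(7))  # 21 21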

Co-compilation also offers a straightforward path to code generation from heterogeneous models that contain different communication and control strategies organized in a hierarchical structure. We anticipate being able to generate code for a model at one level of the hierarchy and then use the generated code as a component at a higher level of the hierarchy. This can also reduce overhead, since a system designer is not limited to a single model of computation.

We have implemented this code-generation strategy as the Copernicus package, which is part of Ptolemy II. Copernicus parses the Java bytecode for actors, optimizes it, combines it with code generation from the model, and outputs Java bytecode. The resulting generated code is currently useful for high-speed compiled code simulation. We are currently exploring how to generate code for embedded architectures and for FPGAs.

1Staff

More information (http://www.eecs.berkeley.edu) or

Send mail to the author : (neuendor@eecs.berkeley.edu)

An Actor-Oriented Framework for Distributed Software Systems

Xiaojun Liu and Yang Zhao
(Professor Edward A. Lee)

A large number of embedded systems applications require the coordination of physically distributed components or networked embedded sub-systems. Distributing system components across a network can improve the robustness of a system and simplify its architecture by allowing components to run concurrently and independently. However, designing and implementing these systems can be complex and difficult.

In recent years, several middleware technologies, such as the Common Object Request Broker Architecture (CORBA) and the Distributed Component Object Model (DCOM), have been introduced. They are based on object-oriented APIs that raise the level of abstraction and facilitate distributed systems development, but their programming models are too liberal to permit analysis of their formal properties. It is also difficult to port an application developed using one particular middleware technology to another.

In this project, we are interested in developing a framework for distributed software systems based on the Ptolemy II actor-oriented infrastructure. We will explore the design of common actor interfaces on top of today's object-oriented middleware--actor components that comply with such interfaces can be deployed in different middleware environments. We will also explore the concept of models of computation in distributed systems and adapt the current Ptolemy II execution model to distributed architectures. By doing so, it will be possible to provide guarantees on the behavior of distributed systems, rather than having properties emerge as a by-product of a specific middleware implementation.
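
As a rough illustration of the common-interface idea, and not of any actual project API, the following Python sketch lets an actor talk only to abstract ports; a concrete port could be backed by CORBA, DCOM, or a plain socket without the actor changing.

  from abc import ABC, abstractmethod

  class Port(ABC):
      # The actor sees only this interface; a concrete subclass would
      # wrap a particular middleware (CORBA, DCOM, sockets, ...).
      @abstractmethod
      def send(self, token): ...
      @abstractmethod
      def receive(self): ...

  class LocalPort(Port):
      # Trivial in-process transport, standing in for real middleware.
      def __init__(self):
          self.buffer = []
      def send(self, token):
          self.buffer.append(token)
      def receive(self):
          return self.buffer.pop(0)

  class ScaleActor:
      def __init__(self, inport: Port, outport: Port, k: float):
          self.inport, self.outport, self.k = inport, outport, k
      def fire(self):
          self.outport.send(self.k * self.inport.receive())

  p_in, p_out = LocalPort(), LocalPort()
  p_in.send(2.0)
  ScaleActor(p_in, p_out, k=1.5).fire()
  print(p_out.receive())  # 3.0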


Send mail to the author : (liuxj@eecs.berkeley.edu)

Component-based Design for Wireless Networked Systems

Ye Zhou
(Professor Edward A. Lee)
National Semiconductor

In this project, we are trying to provide a solution that supports different cellular-network standards, namely GSM and the 3G standards WCDMA, TD-SCDMA, and CDMA2000. Our approach is to use an actor-oriented system-on-chip (SoC) architecture, in which a number of small DSP cores support various programmable functions. The idea is to develop a hardware platform for wireless communications from which various SoC products can be derived by changing only the software.

We begin by examining the RAKE receiver, as it is the key part of CDMA systems. We also focus on two standards first: WCDMA and TD-SCDMA, which appear to be popular and promising both in research and in the market. The work is planned to progress in the following steps: (1) the study starts with system functional analysis, where we explore a block diagram shared between the two standards; (2) we then develop adaptive algorithms and analyze the system complexity; (3) based on this previous work, we make decisions on the system architecture and the software/hardware partition, with different levels of simulation at each step to support our ideas; and (4) eventually, the architecture is designed to reach a near-optimal point, considering the tradeoffs among factors such as power, speed, and flexibility.
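
For readers unfamiliar with the receiver, the following numpy sketch shows the essence of RAKE combining: each finger despreads the received samples at one path delay, and the finger outputs are maximal-ratio combined using channel estimates. The waveforms and parameters are made-up illustrations, not part of the project.

  import numpy as np

  def rake_demodulate(rx, code, delays, channel):
      # rx: received baseband samples; code: spreading code for one symbol;
      # delays: integer sample offsets of the resolvable paths;
      # channel: complex gain estimate for each path.
      symbol = 0j
      for d, h in zip(delays, channel):
          finger = rx[d:d + len(code)]          # align to this path
          despread = np.dot(finger, code)       # correlate with the code
          symbol += np.conj(h) * despread       # maximal-ratio combining
      return symbol

  code = np.array([1, -1, 1, 1, -1, 1, -1, -1])         # toy spreading code
  tx = 1.0                                              # one BPSK symbol
  rx = np.zeros(16, dtype=complex)
  for d, h in [(0, 0.8), (3, 0.4j)]:                    # two-path channel
      rx[d:d + len(code)] += h * tx * code
  print(rake_demodulate(rx, code, [0, 3], [0.8, 0.4j]).real > 0)  # True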

This wireless communications project is conducted in cooperation with National Semiconductor.


Send mail to the author : (zhouye@eecs.berkeley.edu)

A Practical and Adaptive Multi-Stroke Symbol Recognition System

Heloise Hse
(Professor A. Richard Newton)
Hewlett-Packard

With the increasing popularity and availability of devices like personal digital assistants (PDAs), pen computing is becoming more familiar to end users. However, many applications are still designed with moded windows, icons, menus, and pointer (WIMP) interfaces that treat pens as pointing devices instead of using them for what they are good at: sketching. We are interested in providing a solid infrastructure and a set of utilities for developing sketch-based user interfaces for a variety of applications.

Sketching is simple and natural and is especially desirable for conceptual design, either on an individual basis or in a collaborative environment. By embedding recognition engines in sketch-based programs, we can interpret and process the resulting drawings. Various computations can then be applied to the recognized sketches, thus fully leveraging the power of electronic design tools while maintaining the ease of sketching.

Currently, we are working on multi-stroke symbol recognition for a class of shapes commonly used in slide creation and diagram editing. Our technique is independent of stroke order, number, and direction, and is invariant to rotation, scaling, translation, and reflection. We take a statistical approach, using local features to capture shape information and the relative positions of strokes, thus utilizing both structural and statistical information learned from examples. Furthermore, the recognition system is adaptive: it learns from user corrections. It also provides feedback so that the user can better understand why a misrecognition took place and can adapt the drawing style accordingly.
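
A small Python sketch can convey where the translation and scale invariances come from: stroke points are normalized by the symbol centroid and RMS radius before features are extracted. The actual system's features and classifier are richer; this fragment is only an illustrative assumption.

  import numpy as np

  def normalize_symbol(strokes):
      # strokes: list of (N_i x 2) arrays of pen coordinates.
      pts = np.vstack(strokes)
      center = pts.mean(axis=0)                          # translation invariance
      radius = np.sqrt(((pts - center) ** 2).sum(axis=1).mean())
      return [(s - center) / radius for s in strokes]    # scale invariance

  # Two hypothetical strokes of a plus sign, drawn at some offset and size.
  plus = [np.array([[10.0, 15.0], [20.0, 15.0]]),   # horizontal bar
          np.array([[15.0, 10.0], [15.0, 20.0]])]   # vertical bar
  for s in normalize_symbol(plus):
      print(np.round(s, 2))   # unit-radius strokes centered at the origin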


Send mail to the author : (hwawen@eecs.berkeley.edu)

Distributed Design Data Management for EDA

Mark D. Spiller
(Professor A. Richard Newton)
MARCO/DARPA Gigascale Silicon Research Center

System-on-a-chip designs will place tremendous demands on the efficient management of design data. Time-to-market constraints require hierarchy and component re-use, while increasing chip complexity, design sizes, and specialization of components will make geographically distributed design teams more likely. While in an ideal world the definition of interfaces would allow these distributed teams to work in parallel on their respective modules with little interaction, in reality a successful design is likely to require intensive interaction and collaboration throughout the design process. Critical to this process will be the ability to build, integrate, and test the distributed design components, which will require a scalable and efficient data caching architecture.

We are working to identify the needs and requirements for an architecture that will provide integration between heterogeneous tools and efficient support for collaborative incremental design. Areas of interest include relaxed transaction models suited towards collaborative work and intelligent, distributed caching of design data. We are combining tool usage metrics, design data characteristics, and likely interactions throughout the design process to build a representative model of EDA data interactions. This model will be used in experiments that will analyze the tradeoffs of different data management techniques in various design scenarios and resource organizations. Our research goal is to find reliable design data management and transactional semantics for large datasets in a distributed, potentially unreliable, network environment.


More information (http://www-cad.eecs.berkeley.edu/~mds) or

Send mail to the author : (mds@eecs.berkeley.edu)

Hierarchical Power Management for Low-power Reactive Heterogeneous Communication Systems

Suet-Fei Li
(Professor Jan M. Rabaey)
Gigascale Silicon Research Center

We aim to develop an efficient OS for complex real-time, power-critical, reactive systems implemented on advanced heterogeneous architectures. An event-driven OS, developed specifically to target reactive event-driven systems, is much more efficient than a traditional general-purpose OS. TinyOS, an existing event-driven OS, offers some very attractive concepts, but is insufficient to fulfill the ambitious software management role demanded here. To overcome the limitations of TinyOS, we have proposed an event-driven hierarchical power management framework. The hierarchical structure enhances design scalability, supports concurrency in both the application domain and the architecture, and enables power control at various granularities. The software management framework implements a hybrid power control policy and provides the ability to formally devise and implement an optimal power scheduling policy.


Send mail to the author : (suetfei@eecs.berkeley.edu)

Interface Synthesis and Verification

Roberto Passerone, Jerry Burch1, and Luca de Alfaro2
(Professors Alberto L. Sangiovanni-Vincentelli and Thomas A. Henzinger)

Composing intellectual property blocks is an essential element of a design methodology based on re-use. The composition of these blocks when the IPs have been developed by different groups inside the same company, or by different companies, is notoriously difficult. Design rules in the form of interface specifications have been proposed that try to alleviate the problem by forcing designers to be precise about the behavior of the individual components, and to verify this behavior under a number of assumptions about the environment in which they have to operate. In this work we study the problem of the specification, synthesis, and verification of interfaces for different models of computation and across models of computation. This work supports the Metropolis Meta-Model in the construction of a system integration platform that can integrate heterogeneous models.

Roughly speaking, two agents are compatible if they fit together as they are. In this work we formalize this notion by introducing, for each model of computation, a set of agents that “behave well,” and by deriving a refinement ordering that represents substitutability, called the conformance ordering. We show that under certain assumptions we can represent the set of compatible agents by a single maximal environment, called a mirror. Two agents are compatible if one conforms to the maximal environment of the other.

When components are taken from legacy systems or from third-party vendors, it is likely that their interface communication protocols are not compatible. In this case, an additional adapter can sometimes be introduced to convert from one agent to the other so that their composition is compatible. The notion of a maximal environment allows us to formulate a synthesis algorithm for an adapter that conforms to the specification of a desired communication.

We formulate the problem in particular for synchronous and asynchronous systems. We use automata-based and game-theoretic approaches to solve the adaptation problem and to synthesize a converter between two specified protocols.

[1]
R. Passerone, J. A. Rowson, and A. L. Sangiovanni-Vincentelli, "Automatic Synthesis of Interfaces between Incompatible Protocols," Proc. Design Automation Conf., San Francisco, CA, June 1998.
[2]
R. Passerone, L. de Alfaro, T. A. Henzinger, and A. L. Sangiovanni-Vincentelli, "Convertibility Verification and Converter Synthesis: Two Faces of the Same Coin," Proc. Int. Conf. Computer Aided Design, San Jose, CA, November 2002.
1Outside Adviser (non-EECS), Cadence Berkeley Laboratories
2Professor, UC Santa Cruz

Send mail to the author : (roby@eecs.berkeley.edu)

Path Categorization Criteria for Static Timing Analysis

Alessandra Nardi1 and Luca Carloni
(Professors Alberto L. Sangiovanni-Vincentelli and Kurt Keutzer)

Static timing analysis is the most widely used approach to critical path detection in combinational circuits. Since the traditional criteria used in standard static timing analysis depend on the circuit delay assignment, the analysis has to be repeated after each timing optimization step. A new approach to static timing analysis has recently been proposed in [1]. The cornerstone of the method is grouping paths into three categories: strongly perceptible, imperceptible, and weakly perceptible. This categorization allows a definite distinction between paths that impact the circuit delay, and therefore need to be sped up, and those that do not, thus reducing the timing optimization effort. Furthermore, this new approach has the advantage of being independent of the circuit delay assignment; therefore, static timing analysis need not be reapplied after each timing optimization step. The main drawback of the method is that the criteria provided by the author for checking which category a path belongs to are partially ambiguous and not easy to compute. We identified new criteria for this path categorization based on the flattening of a circuit into an equivalent normal form (ENF). Our criteria, while straightforward to check, remove the previous ambiguity, thus capturing the full power of this new path categorization.

[1]
A. Saldanha, "Functional Timing Optimization," Proc. ICCAD, San Jose, CA, November 1999.
1Postdoctoral Researcher

More information (http://www-cad.eecs.berkeley.edu/~nardi) or

Send mail to the author : (nardi@eecs.berkeley.edu)

Low Power Network Discovery Methods for Sensor Networks

Abhijit Davare, David Nguyen, and Farinaz Koushanfar
(Professor Alberto L. Sangiovanni-Vincentelli)

Sensor networks present many opportunities for changing the way computers interact with the environment. Sensor network devices present major challenges because they must remain functional under strict power budgets. Since wireless communication is the largest source of power consumption, we explore efficient algorithms for information gathering that try to minimize the amount of communication while still gathering the requisite amount of data.

Our research focuses on an efficient traversal method to gather information from the network. We assume that nodes have a fixed communication radius. Our model also assumes that every node has complete information about each of its neighbors. In addition, a node can go into sleep mode, and it will not take part in communication while in this state. Given the above constraints, we are attempting to design heuristic methods that will gather information from all nodes while actually visiting a minimal number of nodes.

Our approach involves a graph theoretical formulation coupled with geographic knowledge about the network. In this approach, each node is represented as a vertex and a communication path is represented as an edge. We are investigating the significance of network perimeter nodes, articulation points, and sub-areas as components of this heuristic approach.
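
If hearing from a visited node's neighbors counts as gathering their data, the visit-minimization problem resembles a minimum dominating set, for which a standard greedy heuristic gives a first cut. The following Python sketch illustrates that baseline on a hypothetical network and ignores the geographic refinements discussed above.

  def greedy_visit_set(adj):
      # adj: dict mapping each node to the set of its radio neighbors.
      uncovered = set(adj)
      visits = []
      while uncovered:
          # Visit the node whose neighborhood covers the most remaining nodes.
          best = max(adj, key=lambda n: len(({n} | adj[n]) & uncovered))
          visits.append(best)
          uncovered -= {best} | adj[best]
      return visits

  # A small hypothetical network: node 2 hears everyone else.
  adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
  print(greedy_visit_set(adj))  # [2]: one visit covers all five nodes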


Send mail to the author : (davare@eecs.berkeley.edu)

Upper Bound Estimation for the Noise Current Spectrum of Digital Circuit Blocks

Alessandra Nardi1 and Luca Daniel
(Professor Alberto L. Sangiovanni-Vincentelli)
Semiconductor Research Corporation

The complexity of system-on-a-chip design requires an aggressive re-use of intellectual property (IP) circuit blocks. However, IP blocks can be safely re-used only if they do not affect other sensitive components. The switching activity of CMOS digital circuit blocks typically injects high frequency current noise both into the Gnd/Vdd system and into the substrate of integrated circuits. Such currents can potentially affect circuit reliability and the performance of other sensitive components. For instance, the Gnd/Vdd currents may produce electromigration, IR voltage drops, voltage oscillations due to resonances, and, possibly, electromagnetic interference. The substrate currents may couple noise to sensitive analog circuitry through the body effect or direct capacitive coupling. An analysis of current injection due to switching activity is needed to properly account for all such effects during the design phase.

Different effects require different types of current injection models. For instance, power consumption analysis requires time-domain average current estimation over several clock periods. IR drop, electromigration, and timing performance analysis require a time-domain noise current upper bound with respect to all possible combinations of the inputs. Signal integrity, Gnd/Vdd grid resonances, electromagnetic interference, and substrate coupling on mixed-signal ICs, on the other hand, require an upper bound on the spectrum of the current injected into the Gnd/Vdd system or into the substrate, respectively. Methodologies developed so far do not address this latter kind of analysis. For these problems we are therefore developing a methodology for estimating an upper bound, over all possible input combinations, on the spectrum of the noise current injected by the switching activity of digital blocks. The methodology identifies an upper bound for the spectrum of a circuit's noise current by combining the noise currents injected by each gate and accounting for the circuit's logic functionality.
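
One elementary building block of such a bound is the triangle inequality: at every frequency, the magnitude of the total injected current spectrum cannot exceed the sum of the magnitudes of the per-gate spectra. The Python sketch below shows only this conservative combination step on made-up waveforms; the actual methodology also exploits the circuit's logic functionality to tighten the bound.

  import numpy as np

  def spectrum_upper_bound(gate_currents):
      # gate_currents: list of sampled per-gate noise current waveforms.
      # |sum_k I_k(f)| <= sum_k |I_k(f)| at every frequency f.
      return sum(np.abs(np.fft.rfft(i)) for i in gate_currents)

  rng = np.random.default_rng(0)
  gates = [rng.standard_normal(64) for _ in range(3)]   # made-up waveforms
  bound = spectrum_upper_bound(gates)
  actual = np.abs(np.fft.rfft(sum(gates)))
  print(np.all(actual <= bound + 1e-9))  # True: the bound holds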

[1]
A. Nardi, L. Daniel, and A. Sangiovanni-Vincentelli, "A Methodology for the Computation of an Upper Bound on Noise Current Spectrum of CMOS Switching Activity," UC Berkeley Electronics Research Laboratory, Memorandum No. UCB/ERL M02/20, June 2002.
1Postdoctoral Researcher

More information (http://www-cad.eecs.berkeley.edu/~nardi) or

Send mail to the author : (nardi@eecs.berkeley.edu)

Ulysses: Protocol Synthesis from Scenario-based Specifications

Alvise Bonivento and Marco Sgroi
(Professor Alberto L. Sangiovanni-Vincentelli)
GSRC

The design of protocols, i.e., the sets of rules that define how the communication among two or more components of a distributed system takes place, requires formal techniques and tools to obtain a correct and efficient implementation. Ulysses is a new approach for protocol synthesis from scenario-based specifications. It allows the designer to specify sequences of interactions called scenarios using message sequence nets (MSNs), a model that includes message sequence charts (MSCs) and explicitly shows the relations of ordering, concurrency, or conflict among multiple MSCs. It uses Petri nets (PNs) as an underlying model to formally define the semantics of MSNs and to specify the internal protocol behavior. A covering procedure uses common-case patterns to derive a PN model consistent with the original MSN. Independently specified scenarios are then composed to derive a minimum-cost implementation.
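
To make the underlying PN semantics concrete, the following Python sketch implements basic Petri net enabling and firing; the two-transition net is a made-up example rather than a protocol from the project.

  def enabled(marking, pre):
      # A transition is enabled if every input place holds enough tokens.
      return all(marking[p] >= n for p, n in pre.items())

  def fire(marking, pre, post):
      # Consume tokens from input places, produce tokens in output places.
      m = dict(marking)
      for p, n in pre.items():
          m[p] -= n
      for p, n in post.items():
          m[p] = m.get(p, 0) + n
      return m

  # Hypothetical net: "send" moves a token from ready to sent,
  # then "ack" moves it from sent back to ready.
  transitions = {"send": ({"ready": 1}, {"sent": 1}),
                 "ack":  ({"sent": 1},  {"ready": 1})}
  m = {"ready": 1, "sent": 0}
  for t in ["send", "ack"]:
      pre, post = transitions[t]
      assert enabled(m, pre)
      m = fire(m, pre, post)
  print(m)  # {'ready': 1, 'sent': 0}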


Send mail to the author : (sgroi@eecs.berkeley.edu)

DRAFTS: Distributed Real-Time Applications Fault Tolerant Scheduling

Claudio Pinello
(Professor Alberto L. Sangiovanni-Vincentelli)
Gigascale Silicon Research Center

Some applications are so critical that they need to be resilient to faults in the computing architecture. Typically this is achieved by redundantly scheduling the application on the architecture. Starting from a network of processes, some or all of the processes and the data they exchange are replicated. Additional processes may be needed to vote on the results of different replicas of the same process in order to establish a common result: the consensus problem [1]. We must then devise an assignment and schedule of the augmented network onto the distributed architecture.

It seems profitable to relieve the designer of the burden of devising a fault-tolerant distributed schedule, and to opt for an approach based on synthesis.

In order to use resources efficiently, we want to allow the flexible use of passive replicas (replicas of a process that run only when the main replica undergoes a fault). Preliminary results have shown the usefulness of this technique in achieving higher schedulability by "reclaiming" resources from non-running replicas [2]. A further avenue of improvement may arise in the context of gracefully degrading applications, where replicas are not exact copies of the original process. Rather, they may be simpler versions with reduced functionality and/or accuracy and likely lower resource requirements [3]. This exposes an opportunity to achieve higher schedulability by requiring strong fault resilience only of the lightweight versions.

Moreover, we want to allow general architectures, removing the strict boundaries of the modules and busses found in the TTA. This also enables more general fault models for the communication subsystem. The resulting architecture is a full-fledged distributed multiprocessor system, where each node can be used in its own right and not as a mere duplicate of another. All the parallelism in the hardware can then be exploited to speed up the execution of parallel processes of the application [4,5] without affecting the degree of fault tolerance. We note that most of the results cited above have been derived under very restrictive assumptions on the fault model. We believe some of their founding principles can be rephrased in a more general framework. The expected outcome of this research is a systematization of a set of design techniques that allows easy exploration of the design alternatives arising from different fault models.

[1]
M. Barborak, M. Malek, and A. Dahbura, "The Consensus Problem in Fault-Tolerant Computing," ACM Computing Surveys, Vol. 25, No. 2, 1993.
[2]
K. Ahn, J. Kim, and S. Hong, "Fault-Tolerant Real-Time Scheduling Using Passive Replicas," Proc. Pacific Rim Int. Symp. Fault-Tolerant Systems, Taipei, Taiwan, December 1997.
[3]
M. Caccamo, G. Buttazzo, and L. Sha, "Capacity Sharing for Overrun Control," Proc. IEEE Real-Time Systems Symp., Orlando, FL, November 2000.
[4]
C. Dima, A. Girault, C. Lavarenne, and Y. Sorel, "Off-line Real-Time Fault-Tolerant Scheduling," Proc. Euromicro Workshop on Parallel and Distributed Processing, Mantova, Italy, February 2001.
[5]
J. Aguilar and M. Hernandez, "Fault Tolerance Protocols for Parallel Programs Based on Tasks Replication," Proc. MASCOTS, San Francisco, CA, August-September 2000.

More information (http://www-cad.eecs.berkeley.edu/~pinello/distributedsystems) or

Send mail to the author : (pinello@eecs.berkeley.edu)

The Fault Tolerant Data Flow Platform in Metropolis

Claudio Pinello and Yosinori Watanabe1
(Professor Alberto L. Sangiovanni-Vincentelli)

A new platform library in Metropolis allows the use of the fault tolerant data flow (FTDF) model of computation. FTDF is amenable to specifying periodic feedback control applications. It is structured in such a way that it is possible to analyze it formally and to use automatic and semi-automatic synthesis techniques for obtaining an efficient fault-tolerant implementation of the application under design. The use of the platform is illustrated by a simple controller example: an inverted pendulum stabilizer.

Redundancy can be easily introduced using the Metropolis refinement mechanism. Finally, it is possible to simulate the design and to inject faults into it. The properties of FTDF include self-synchronization and automatic degradation in the presence of faults.

1Outside Adviser (non-EECS), Cadence Berkeley Labs

More information (http://www.gigascale.org/metropolis/) or

Send mail to the author : (pinello@eecs.berkeley.edu)

CAMA: A Multi-Valued Satisfiability Solver

Cong Liu, Andreas Kuehlmann1, and Matthew W. Moskewicz
(Professor Alberto L. Sangiovanni-Vincentelli)
GSRC

Boolean satisfiability (SAT) has many applications in the area of electronic design automation. In this project, a genuine multi-valued (MV) SAT solver, CAMA, is developed. Unlike encoding approaches, it deals directly with MV variables. We extended the major speed-up techniques used in state-of-the-art binary SAT solvers, such as two-literal-watching-based Boolean constraint propagation (BCP), conflict-based learning with identification of the unique implication point (UIP), and non-chronological backtracking. An efficient conflict analysis algorithm for MV SAT is developed. It makes the learned clauses as compact as possible, which has a dramatic impact on overall performance. Finally, the performance of CAMA is compared to that of Chaff [1], one of the best binary SAT solvers known so far, using a one-hot encoding scheme. Our experimental results show that for hard MV SAT problems, CAMA outperforms Chaff. Our analysis also reveals some fundamental reasons why an MV SAT solver has inherent advantages over a binary SAT solver.
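
To fix ideas about what a genuine MV clause looks like, the following Python sketch represents an MV literal as a variable paired with a set of allowed values, so that a clause is satisfied when some variable takes a value in its literal's set. This toy checker is our illustration, not CAMA's data structure.

  # An MV literal is (variable, set of allowed values); a clause is a
  # disjunction of MV literals; a CNF is a conjunction of clauses.
  def clause_satisfied(clause, assignment):
      return any(assignment.get(var) in values for var, values in clause)

  def mv_sat_check(cnf, assignment):
      return all(clause_satisfied(c, assignment) for c in cnf)

  # x ranges over {0,1,2}, y over {0,1}:  (x in {0,2} OR y in {1})  AND
  #                                        (x in {1} OR y in {0})
  cnf = [[("x", {0, 2}), ("y", {1})],
         [("x", {1}), ("y", {0})]]
  print(mv_sat_check(cnf, {"x": 2, "y": 0}))  # True
  print(mv_sat_check(cnf, {"x": 0, "y": 1}))  # False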

[1]
M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik, "Chaff: Engineering an Efficient SAT Solver," Proc. Design Automation Conf., Las Vegas, NV, June 2001.
1Adjunct Professor

More information (http://www-cad.eecs.berkeley.edu/~congliu/) or

Send mail to the author : (congliu@eecs.berkeley.edu)

Analog Platforms for Analog System-level Design

Fernando De Bernardinis
(Professor Alberto L. Sangiovanni-Vincentelli)
GSRC

System-level analog design is a process largely dominated by heuristics. Given a set of specifications and requirements that describe the system to be realized, the selection of an optimal implementation architecture comes mainly out of experience. Usually, what is achieved is just a feasible point at the system level, while optimality is sought locally at the circuit level. The reason for this is the difficulty, in the analog world, of knowing whether something is realizable without actually attempting to design the circuit. The number of effects to consider and their complex interrelations make this problem approachable only through the experience coming from past designs. One of the main problems is that optimization at the system level is very hard because of the difficulty of abstracting single-block behaviors at the system level and of estimating their performance (and feasibility) without further knowledge of the implementation architecture and topology.

The basic idea of the proposed methodology is to extend the concept of platforms developed in the digital world to analog contexts. An analog platform (AP) [1] is an architectural resource used to map analog functionalities during early system-level exploration phases and to constrain exploration to the feasible space of the considered implementation (or set of implementations). An AP therefore consists of a set of behavioral models and of matching performance models. Behavioral models are abstracted, parameterized models of the analog functionality that introduce at the functional level a number of non-idealities, such as distortion, bandwidth limitations, and noise, that are attributable to the implementation rather than intrinsic to the functionality.

We propose applying the AP paradigm to design cases from the BWRC and extending the methodology to capture more accurate design constraints. The activity starts by encapsulating the block architectures required for the candidate transceiver within the AP abstraction. For each block, a set of behavioral models has to be defined and implemented in some executable form, e.g., AMS languages or Simulink/Matlab. Then, a characterization policy and a sampling strategy have to be set up to generate performance models. Particular attention has to be paid to the interfaces of single blocks so that loading effects are properly considered and modeled. For example, the input stage of the mixer has to be considered when extracting the performance figures for the driving LNA; otherwise the loading effect present in the real implementation would not be estimated. For each block, different architectures should be modeled and merged in the platform stack. This provides a level of abstraction in the platform stack where, for example, two or more LNA topologies are represented by the same functional model and by the union of the respective performance models. When operating at this level, selecting a platform instance intrinsically performs architecture selection, choosing a performance set achieved by one architecture rather than another. At the system level, optimization will be carried out using a non-linear optimization technique based on behavioral simulations. Possible candidates are simulated annealing and genetic algorithms, which are expected to perform well given the limited number of optimization parameters available at the system level. The challenging design of the reconfigurable radio-mobile terminal should expose the strengths and weaknesses of APs in evaluating tradeoffs over a wide range of opportunities.

[1]
L. Carloni, F. De Bernardinis, A. Sangiovanni-Vincentelli, and M. Sgroi, “The Art and Science of Integrated Systems Design,” European Solid-State Circuits Conf., Florence, Italy, September 2002.

More information (http://www.eecs.berkeley.edu/~fdb) or

Send mail to the author : (fdb@eecs.berkeley.edu)

Metropolis SystemC-based Simulator

Guang Yang, Daniele Gasperini1, Douglas Densmore, John Moondanos2, and Yosinori Watanabe3
(Professor Alberto L. Sangiovanni-Vincentelli)
GSRC

Metropolis is a design environment for heterogeneous embedded systems. It is more general than other design environments in the sense that it neither is biased toward any application domain nor favors any particular model of computation. It defines a language called the Metropolis MetaModel (MMM), which sits one layer above particular models of computation; it is therefore possible to express any model of computation in MMM. In Metropolis, MMM serves as both the design input and the internal representation.

Starting from the MMM design input, validating the design is the first and most important step. For this purpose, the SystemC-based simulator is widely used in Metropolis. Currently, the simulator can generate simulation traces out of all the possibilities resulting from the non-deterministic nature of MMM. The simulator is still under active development. The next step is the ability to evaluate the performance and check the properties of a design. This requires handling function-architecture mapping, quantity resolution, and constraint checking. The first two come from the concept of platform-based design, where function and architecture are completely separated: by simulating the mapping between them, it is possible to evaluate the cost of implementing a function on a particular architecture. Beyond simulation, it is sometimes desirable to check properties of a design formally. MMM provides such mechanisms, and we would like to incorporate them into the SystemC-based simulator.

1Visiting Scholar, Italy
2Visiting Industrial Fellow, Intel
3Outside Adviser (non-EECS), Cadence Berkeley Lab

Send mail to the author : (guyang@eecs.berkeley.edu)

Constraint-driven Synthesis of System-on-a-Chip Communication Architectures

Luca Carloni and Alessandro Pinto
(Professor Alberto L. Sangiovanni-Vincentelli)
Semiconductor Research Corporation

The design of the communication architecture is an essential step in the realization of a system-on-a-chip (SOC). The semiconductor industry is experiencing a paradigm shift from "computation-bound design" to "communication-bound design." The number of transistors that can be reached in a clock cycle, and not those that can be integrated on a chip, will drive the design process. In the context of a communication-based design methodology for SOC, we are developing the idea of constraint-driven communication synthesis. As logic synthesis was the key to the success of ASIC design in the VLSI era, we predict that the automatic synthesis of the on-chip communication architecture will be the key to the success of SOC design in the gigascale integration era. The goal of this research is to start with a system-level view of the SOC communication problem and automatically synthesize the best communication architecture among the SOC modules while considering both high level issues, such as communication topologies, protocol interfacing, resource sharing, as well as low level ones, such as metal layer allocation, wire buffering, and wire pipelining.

The first step in constraint-driven communication synthesis is the specification of the key communication parameters that govern the interactions among the SOC modules as a set of constraints for the synthesis problem. Then, the synthesis process proceeds in two stages. First, the optimal high-level architecture structure is derived while considering the different communication topologies, the possibility of mixing and matching them, the role of resource sharing (e.g., time multiplexing), the choice of the various communication protocols, and the needs of interfacing incompatible protocols. Notice that since the notion of optimality may vary from one design to another, depending on the designers' priorities (area, power, performance, complexity), the cost figure of a communication topology must be a configurable parameter. Once the best communication architecture is established, the focus shifts to finding the best low-level implementation based on the features that are available with the given technology process. In other words, this step corresponds to designing the global interconnect among the SOC modules and includes exploring the characteristics of the various metal layers, the insertion of buffers (stateless repeaters) on long wires, and the pipelining of (even) longer wires by means of latches (stateful repeaters).

The present research project relies on several results from our previous work. In particular, we have recently presented a generic theoretical framework to support constrained-driven communication synthesis as well as its application to the case of bus-based communication topologies [1]. Also, we plan to incorporate key results from our research on latency-insensitive design [2,3] and interface-based design [4].

[1]
A. Pinto, L. P. Carloni, and A. L. Sangiovanni-Vincentelli, "Constraint-driven Communication Synthesis," Proc. Design Automation Conf., Montreal, Canada, September-October 2002.
[2]
L. P. Carloni, K. L. McMillan, and A. L. Sangiovanni-Vincentelli, "Theory of Latency-Insensitive Design," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, Vol. 20, No. 9, September 2001.
[3]
L. P. Carloni, K. L. McMillan, A. Saldanha, and A. L. Sangiovanni-Vincentelli, "A Methodology for Correct-by-Construction Latency-Insensitive Design," ed. A. Kuehlmann et al., The Best of ICCAD: 20 Years of Excellence in Computer-Aided Design, Kluwer Academic Publishers, 2003.
[4]
R. Passerone, J. A. Rowson, and A. L. Sangiovanni-Vincentelli, "Automatic Synthesis of Interfaces between Incompatible Protocols," Proc. Design Automation Conf., San Francisco, CA, June 1998.

More information (http://www-cad.eecs.berkeley.edu/~lcarloni) or

Send mail to the author : (lcarloni@eecs.berkeley.edu)

Coping with Latency in System-on-a-Chip Design

Luca Carloni
(Professor Alberto L. Sangiovanni-Vincentelli)
Semiconductor Research Corporation

As commercial demand for system-on-a-chip (SOC)-based products grows, the effective reuse of existing intellectual property design modules (also known as IP cores) is essential to meet the challenges posed by deep-submicron (DSM) technologies and to complete a reliable design within time-to-market constraints. An IP core must be both flexible, to collaborate with other modules within different environments, and independent from the particular details of one-among-many possible implementations. The prerequisite for easy trade, reuse, and assembly of IP cores is the ability to assemble predesigned components with little or no effort. The consequent challenge is addressing the communication and synchronization issues that naturally arise while assembling predesigned components. We believe that the semiconductor industry will experience a paradigm shift from computation- to communication-bound design: The number of transistors that a signal can reach in a clock cycle--not the number that designers can integrate on a chip--will drive the design process. The strategic importance of developing a communication-based design methodology naturally follows.

An essential element of communication-based design is the encapsulation of predesigned functional modules within automatically generated interface structures. Such a strategy ensures a correct-by-construction composition of the system. Latency-insensitive design [1] and the recycling paradigm [2] are a step in this direction. High-end microprocessor designers have traditionally anticipated the challenges that ASIC designers are going to face in working with the next process generation. Latency is increasingly affecting the design of state-of-the-art microprocessors (latency, for example, drove the design of so-called drive stages in the new hyperpipelined Netburst microarchitecture of Intel's Pentium 4) and will become a shaping force for SOC architectures [3]. On the one hand, the increasing impact of latency variations will drive architectures toward modular designs with an explicit global latency mechanism. In the case of multiprocessor architectures, latency variation will lead the designers to expose computation/communication tradeoffs to the software compilers. At the same time, the focus on latency will open the way to new synthesis tools that can automatically generate the hardware interfaces responsible for implementing the appropriate synchronization and communication strategies, such as channel multiplexing, data coherency, and so on. All these considerations lead us to propose a design methodology that guarantees the robustness of the system's functionality and performance with respect to arbitrary latency variations [4].

[1]
L. P. Carloni, K. L. McMillan, and A. L. Sangiovanni-Vincentelli, "Theory of Latency-Insensitive Design," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, Vol. 20, No. 9, September 2001.
[2]
L. P. Carloni and A. L. Sangiovanni-Vincentelli, "Performance Analysis and Optimization of Latency Insensitive Systems," Proc. Design Automation Conf., Los Angeles, CA, June 2000.
[3]
L. P. Carloni and A. L. Sangiovanni-Vincentelli, "Coping with Latency in SOC Design," IEEE Micro, special issue on systems on chip, Vol. 22, No. 5, September-October 2002.
[4]
L. P. Carloni, K. L. McMillan, A. Saldanha, and A. L. Sangiovanni-Vincentelli, "A Methodology for Correct-by-Construction Latency-Insensitive Design," ed. A. Kuehlmann et al., The Best of ICCAD--20 Years of Excellence in Computer-Aided Design, Kluwer Academic Publishers, 2003.

More information (http://www-cad.eecs.berkeley.edu/~lcarloni) or

Send mail to the author : (lcarloni@eecs.berkeley.edu)

The Theory of Latency Insensitive Design

Luca Carloni and Kenneth L. McMillan1
(Professor Alberto L. Sangiovanni-Vincentelli)
Semiconductor Research Corporation

The theory of latency insensitive design is the foundation of a new correct-by-construction methodology for designing complex systems by assembling intellectual property components [1]. Latency insensitive designs are synchronous distributed systems, realized by composing functional modules that exchange data on communication channels according to an appropriate protocol [2]. The protocol works on the assumption that the modules are stallable, a weak condition to impose on them. The goal of the protocol is to guarantee that latency insensitive designs composed of functionally correct modules behave correctly independent of the channel latencies. This increases the robustness of a design implementation, because any delay variation on a channel can be "recovered" by changing the channel latency while the overall system functionality remains unaffected. As a consequence, an important application of the proposed theory is the latency insensitive methodology for designing large digital integrated circuits in deep sub-micron technologies [3].
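
The flavor of the protocol can be sketched in a few lines of Python: a stallable module simply does nothing in cycles where a valid input token is missing, so adding channel latency changes only when outputs appear, not their values. This toy model illustrates the idea and is not the formal protocol of [2].

  VOID = None  # placeholder exchanged when no valid data is available

  def stallable_adder(a, b):
      # Fires only when both inputs carry valid tokens; otherwise stalls.
      out = []
      qa, qb = [], []
      for x, y in zip(a, b):
          if x is not VOID:
              qa.append(x)
          if y is not VOID:
              qb.append(y)
          out.append(qa.pop(0) + qb.pop(0) if qa and qb else VOID)
      return out

  a = [1, 2, 3]
  b_fast = [10, 20, 30]
  b_slow = [VOID, 10, 20, 30]      # same stream, one extra cycle of latency
  print(stallable_adder(a, b_fast))           # [11, 22, 33]
  print(stallable_adder(a + [VOID], b_slow))  # [None, 11, 22, 33]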

[1]
L. P. Carloni, K. L. McMillan, and A. L. Sangiovanni-Vincentelli, "Theory of Latency-Insensitive Design," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, Vol. 20, No. 9, September 2001.
[2]
L. P. Carloni, K. L. McMillan, and A. L. Sangiovanni-Vincentelli, "Latency Insensitive Protocols," Proc. Int. Conf. Computer-Aided Verification, Trento, Italy, July 1999.
[3]
L. P. Carloni, K. L. McMillan, A. Saldanha, and A. L. Sangiovanni-Vincentelli, "A Methodology for Correct-by-Construction Latency-Insensitive Design," Proc. ICCAD, San Jose, CA, November 1999.
1Cadence Berkeley Laboratories

More information (http://www-cad.eecs.berkeley.edu/~lcarloni/) or

Send mail to the author : (lcarloni@eecs.berkeley.edu)

Simulation Techniques for Electromagnetic Interference (EMI) and Signal Integrity (SI) in High-Speed Electronic Circuits

Luca Daniel and Jacob White1
(Professor Alberto L. Sangiovanni-Vincentelli)
Semiconductor Research Corporation and MARCO Interconnect Focus Center

Electromagnetic interference (EMI) is becoming a crucial issue in the design of modern electronic systems. Frequencies are continuously increasing, while integration trends are squeezing entire systems into extraordinarily high circuit densities. In the conventional design methodology, electromagnetic compatibility (EMC) issues are addressed only after a prototype is built. At that stage, the traditional EMC remedies are confined to adding extra components, metal shields, or metal planes, or even redesigning the entire system, with a potentially significant impact both on the cost and on the time-to-market of the products. It is our firm opinion that EMC must instead be addressed as early as possible during the design phase, and not after. EMI simulation could shorten the design time by allowing the analysis of circuits before manufacturing.

Large and expensive electronic equipment can effectively employ standard EMC solutions. But smaller, high-volume, low-priced electronic devices, automotive electronics, and most embedded systems cannot afford those costly solutions. Such devices are the most susceptible to EMI problems and the most sensitive to post-prototype EMC costs. Unfortunately, they are also the most difficult to simulate with respect to EMI problems, since they require "full-board" capabilities, and no tool is available to date for such a task.

Full-board EMI simulation capability would therefore be instrumental in promptly verifying EMC compliance during the very first stages of the design of high-speed PCBs and IC-packages. For this application, we are investigating the use of integral equation methods:

  1. Our research focuses on minimizing the number of unknowns needed to model each conductor in the system. We have found two alternative sets of basis functions [1-3], requiring 16 to 20 times fewer unknowns than the classical thin-filament discretization for the same final accuracy. In our method, we use as basis functions some of the physically admissible solutions of the diffusion equation for the current density in the interior of the conductors.
  2. We are currently implementing our method based on these new basis functions. We solve the resulting linear system using efficient Krylov-subspace iterative methods, as sketched below. The complexity of the algorithm is dominated by an O(N^2) matrix-vector product operation. We propose to further reduce the computational complexity of the iterative method: fast algorithms exist for such products with complexity O(N log N). Such algorithms have already been implemented successfully in tools for the static capacitance extraction problem (FASTCAP) and for the magneto-quasi-static problem (FASTHENRY). We plan to modify and extend one such algorithm, precorrected-FFT, for our full-wave EMI analysis problem.
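
The following SciPy sketch shows the solver structure described in item 2: the system matrix is hidden behind a matrix-vector product (the O(N^2) kernel that a precorrected-FFT scheme would accelerate) and handed to a Krylov-subspace method. The random, diagonally dominant system is a stand-in for the actual discretized integral equation.

  import numpy as np
  from scipy.sparse.linalg import LinearOperator, gmres

  rng = np.random.default_rng(1)
  N = 200
  A = rng.standard_normal((N, N)) + N * np.eye(N)   # stand-in system matrix
  b = rng.standard_normal(N)

  # The Krylov method only ever needs A @ x; a fast O(N log N) scheme such
  # as precorrected-FFT would replace this dense product.
  op = LinearOperator((N, N), matvec=lambda x: A @ x)

  x, info = gmres(op, b)
  print(info == 0, np.linalg.norm(A @ x - b) < 1e-4 * np.linalg.norm(b))
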
Desirable results from an EMI analysis range from the characterization of the spectrum of emissions observed on a measurement sphere at 10 meters, to the generation of a model of the interconnect structures to be used later, for instance, within a non-linear time-domain simulator.

To address the first specification efficiently, we propose to use an adjoint method to calculate a set of transfer functions from the thousands of circuit inputs (e.g., the pins of the ICs) to the tens of observation points on the measurement sphere.

To address the second specification, we are working on the generation of "dynamical linear systems" that mimic the behavior of the original interconnect structure but have a state vector orders of magnitude smaller. In recent papers, we have shown how to generate guaranteed-passive reduced order models from originally passive structures including dielectrics [4], and for distributed systems with frequency-dependent matrices [5] (as generated, for instance, when accounting for substrate effects or full-wave effects). Finally, we are developing "parameterized" model order reduction techniques [6] to enable the optimization of interconnect structures.

[1]
L. Daniel, A. Sangiovanni-Vincentelli, and J. White, "Interconnect Electromagnetic Modeling Using Conduction Modes as Global Basis Functions," IEEE Topical Meeting on Electrical Performance of Electronic Packages, Scottsdale, AZ, October 2000.
[2]
L. Daniel, J. White, and A. Sangiovanni-Vincentelli, "Conduction Modes Basis Functions for Efficient Electromagnetic Analysis of On-Chip and Off-Chip Interconnect," Proc. Design Automation Conf., Las Vegas, NV, June 2001.
[3]
L. Daniel, A. Sangiovanni-Vincentelli, and J. White, "Proximity Templates for Modeling of Skin and Proximity Effects on Packages and High Frequency Interconnect," ICCAD, San Jose, CA, November 2002.
[4]
L. Daniel, A. Sangiovanni-Vincentelli, and J. White, "Techniques for Including Dielectrics when Extracting Passive Low-Order Models of High Speed Interconnect," ICCAD, San Jose, CA, November 2001.
[5]
L. Daniel and J. Phillips, "Guaranteed Passive Model Order Reduction for Strictly Passive and Causal Distributed Systems," Proc. Design Automation Conf., New Orleans, LA, June 2002.
[6]
L. Daniel, C. Ong, S. Low, K. Lee, and J. White, "Geometrically Parameterized Interconnect Performance Models for Interconnect Synthesis," Proc. Int. Symp. Physical Design, San Diego, CA, May 2002.
1Professor, Massachusetts Institute of Technology

More information (http://www.eecs.berkeley.edu/~dluca) or

Send mail to the author : (dluca@eecs.berkeley.edu)

Synthesis of Robust Control Systems with Computation Constraints

Luigi Palopoli1, Claudio Pinello, and Antonio Bicchi2
(Professor Alberto L. Sangiovanni-Vincentelli)
GSRC

We address the problem of synthesizing real-time embedded controllers, taking into account constraints deriving from the implementation platform. Specifically, we address the problem of controlling a set of independent systems using a single shared CPU board. Assuming a time-triggered model of computation for the control processes and a real-time preemptive scheduling policy, we formulate the problem of optimizing the robustness of the controlled systems. The decision variables of the optimization problem are the processes' activation periods and the feedback gains. The analytical formulation of the problem enables an efficient numerical solution based on a branch-and-bound scheme. The resulting design is directly implementable without the performance degradation that may otherwise arise due to scheduling and execution delays.

1Postdoctoral Researcher, ReTiS Lab-Scuola Superiore S. Anna, Italy
2Professor, Università di Pisa, Italy

Send mail to the author : (pinello@eecs.berkeley.edu)

Interaction Between Heterogeneous Models: A Formal Approach

Roberto Passerone and Alessandro Pinto
(Professor Alberto L. Sangiovanni-Vincentelli)
GSRC

Embedded systems often exhibit behaviors that are best described using different formalisms and models of computation. While each model offers specialized and efficient techniques to solve particular problems in its domain of application, the interaction across models is made difficult by their differences. As a result, this interaction is not yet well understood and is often not formally defined. In this project we seek to determine a consistent way to define interactions across different models.

We regard a system as the composition of computational domains. Each computational domain is an island in which the semantics of the coordination and communication among agents is defined, and differs from the semantics of other domains. This definition is generic and covers many possible applications: from communication between models of computation as mathematical domains, to co-simulation of applications described in different programming languages.

We use the framework of trace algebra to describe the semantic domain of each computational domain. We then investigate ways to compose domains in order to obtain additional hybrid domains that exhibit properties of their components. Constructions of this kind include products of algebras and their disjoint sums. We then study the composition of heterogeneous agents (agents drawn from different computational domains) in the context of a hybrid domain, and define identity elements with respect to the operation of composition. The objective of this research is to derive correct conversion mechanisms, or communicators, from the identities in the hybrid domain.

[1]
J. Burch, R. Passerone, and A. Sangiovanni-Vincentelli, "Overcoming Heterophobia: Modeling Concurrency in Heterogeneous Systems," Proc. Application of Concurrency to System Design, Newcastle, UK, June 2001.

Send mail to the author : (apinto@eecs.berkeley.edu)

Semantic Foundations for Heterogeneous Systems

Roberto Passerone and Jerry Burch1
(Professor Alberto L. Sangiovanni-Vincentelli)

System level designs often include a wide range of hardware and software components (e.g., control logic, data paths, sensors, actuators, drivers, and data-flow engines). Such different components can be most naturally formalized using different models of computation (MoCs). Heterogeneous systems are those systems that are best described with two or more different MoCs. Typically, a formal methodology is only applicable to a particular MoC. The objective of this research is to develop semantic foundations that can be applied to heterogeneous systems involving an unlimited number of MoCs. This is crucial for the development of a system integration platform, like the Metropolis Meta-Model, that acts as the central repository of heterogeneous information (similar to the Genesis database for layout information). In particular, the ability to formally reason about the interaction of different MoCs is fundamental for system level design and for platforms to establish a solid foundation for analysis, including formal verification, synthesis, and simulation.

Informally, the semantics of a model of computation are given by a function that maps the model to a set of agents, called the semantic domain. In this work we develop a framework in which we can express semantic domains in a form that is close to their natural formulation (i.e., the form that is most convenient for a given model of computation), and yet structured enough to give us results that apply regardless of the particular domain in question. For each domain, we define functions that formalize concepts such as parallel and sequential composition of agents, scoping, and instantiation that are naturally expressed in the model of computation. Each semantic domain is also equipped with an order on the agents that represents a refinement relation. The structure composed of the set of agents and the functions and relations over the set of agents is called an agent algebra.

An important factor in the design of heterogeneous systems is the ability to flexibly use different levels of abstraction. In our work we emphasize the relationships that can be constructed between different semantic domains at different levels of abstraction, and how they are related to the functions and the refinement relation defined on the domain. The concept of a conservative approximation is introduced as a class of functions that preserve the notion of refinement across different semantic domains. These functions are powerful tools that we can use to study the problem of the integration of heterogeneous domains. In particular, two heterogeneous models can be composed and studied in the context of a third semantic domain that is detailed enough to refine both. Abstraction can be used again to convert the results in the respective domains.

[1]
J. R. Burch, R. Passerone, and A. L. Sangiovanni-Vincentelli, "Using Multiple Levels of Abstraction in Embedded Software Design," Proc. Int. Workshop on Embedded Software, Tahoe City, CA, October 2001.
[2]
J. R. Burch, R. Passerone, and A. L. Sangiovanni-Vincentelli, "Overcoming Heterophobia: Modeling Concurrency in Heterogeneous Systems," Proc. Int. Conf. Application of Concurrency to System Design, Newcastle upon Tyne, UK, June 2001.
[3]
J. R. Burch, R. Passerone, and A. L. Sangiovanni-Vincentelli, "Modeling Techniques in Design-by-Refinement Methodologies," Proc. World Conf. Integrated Design and Process Technology, Pasadena, CA, June 2002.
1Outside Adviser (non-EECS), Cadence Berkeley Laboratories

Send mail to the author : (roby@eecs.berkeley.edu)

Communication-driven Hardware Synthesis: A New Processor Design Methodology

Trevor Meyerowitz
(Professor Alberto L. Sangiovanni-Vincentelli)
SRC (Intel)

Processor design today is very much an art form. The architect comes up with a design to support an instruction set, creates a high-level simulator, and then passes it on to the design team to implement. This process makes it hard for the architect to explore the design space, and requires the design team to rewrite much of the architect’s model. In this research we present a communication-aware methodology for turning a description of an instruction set architecture (ISA) into a potentially complicated microarchitectural implementation. The three major goals of this work are reusability, correct-by-construction designs, and the automatic generation of control and binding logic.

Previous work has produced simple in-order pipelines along with their associated control logic. With this methodology, we aim to support more complicated features such as out-of-order execution, speculation, and caching. Instead of aiming for automatic synthesis, we break the process into a series of steps, allowing design decisions to be considered one at a time and viewed separately from the functionality of the system. Each step is obtained from the previous one through refinement, mapping, or relaxation.
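As a toy illustration of checking one such step (our own example, not the project's actual tool flow), the following Python sketch compares a sequential ISA-level golden model against a refined two-stage pipeline model of a three-instruction ISA; the refined model must produce the same architectural state on every program.

    def isa_step(state, instr):
        """Sequential ISA semantics: execute one instruction atomically."""
        op, dst, a, b = instr
        regs = dict(state)
        if op == "li":                       # load immediate
            regs[dst] = a
        elif op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "mul":
            regs[dst] = regs[a] * regs[b]
        return regs

    def isa_run(program):
        state = {}
        for instr in program:
            state = isa_step(state, instr)
        return state

    def pipeline_run(program):
        """Refined model: decode and execute stages separated by a one-slot
        latch. Instructions still commit in order, so the architectural
        state must match the ISA model exactly."""
        state, latch = {}, None
        for fetched in list(program) + [None]:   # trailing None flushes the pipe
            if latch is not None:                # execute stage
                state = isa_step(state, latch)
            latch = fetched                      # decode stage hands off
        return state

    prog = [("li", "r1", 2, None), ("li", "r2", 3, None),
            ("add", "r3", "r1", "r2"), ("mul", "r4", "r3", "r3")]
    assert isa_run(prog) == pipeline_run(prog)   # the refinement step is checked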


More information (http://www-cad.eecs.berkeley.edu/~tcm) or

Send mail to the author : (tcm@eecs.berkeley.edu)

Communication-based Analog Design

Yanmei Li and Fernando De Bernardinis
(Professor Alberto L. Sangiovanni-Vincentelli)
GSRC

To enable design reusability and system integration, we apply the communication-based design (CBD) methodology to analog design, i.e., CBD-A. In CBD, formal specifications are used, and communication adapters are introduced to resolve mismatches between communicating blocks without modifying the incompatible blocks themselves; such adapters enable the incompatible blocks to interact. In CBD-A, similar adapters are inserted between analog blocks. To derive suitable adapters, a formal description of the analog interface is required, covering DC level, dynamic range, driving/loading capabilities, biasing requirements, and so on. On the other hand, since an analog block acting as an adapter introduces non-ideal effects that degrade the signal (e.g., SNR degradation), we also need the specifications of the adaptation block in terms of inserted noise, distortion, CMRR, etc.
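The sketch below illustrates, under our own assumptions about field names, units, and thresholds, how such an interface description might be encoded and how an adapter specification could be derived from a driver/load mismatch; it is not the project's actual formalization.

    from dataclasses import dataclass

    # Illustrative encoding of an analog interface specification. The field
    # set (DC level, swing, drive/load impedance) follows the text above,
    # but the names, units, and thresholds are our own assumptions.
    @dataclass
    class AnalogPort:
        dc_level: float      # DC operating point / common mode, volts
        swing: float         # usable signal swing, volts peak-to-peak
        impedance: float     # output (driver) or input (load) impedance, ohms

    def adapter_spec(drv: AnalogPort, load: AnalogPort):
        """Derive the specification of the adapter needed between a driver
        and a load: a level shift, a gain, and whether buffering is needed
        to avoid resistive-divider attenuation."""
        level_shift = load.dc_level - drv.dc_level
        gain = load.swing / drv.swing
        divider = load.impedance / (load.impedance + drv.impedance)
        if abs(level_shift) < 1e-3 and abs(gain - 1.0) < 0.01 and divider > 0.99:
            return None      # blocks are directly compatible; no adapter needed
        return {"level_shift_v": level_shift,
                "gain": gain,
                "needs_buffer": divider <= 0.99}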

Currently, we are exploring CBD-A on an available hardware platform, the field programmable analog array (FPAA). Its distinguishing feature is that it implements analog functions in a reconfigurable architecture, making it the analog counterpart of the FPGA. The FPAA is based on CMOS fully differential switched-capacitor technology with an analog switch fabric.

To implement analog interface adapters on the FPAA, we are developing dedicated simulation strategies based on FPAA behavioral models, which provide fast and accurate estimates of the non-ideal effects introduced by a specific FPAA configuration. Another part of this research is to formalize the analog interface specification from an electrical point of view and to formalize the resulting signal degradation. We plan to develop a synthesis mechanism that maps analog system descriptions and interface specifications onto an FPAA platform.
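As a sketch of what such a behavioral model might look like (our own assumption, not the project's actual FPAA models), the following Python fragment models one switched-capacitor gain stage with two of the non-ideal effects mentioned above: first-order finite op-amp-gain error and sampled kT/C noise.

    import math

    K_B, TEMP = 1.380649e-23, 300.0      # Boltzmann constant, temperature (K)

    def sc_gain_stage(vin_rms, c_sample, nominal_gain, opamp_gain_db):
        """Behavioral model of one switched-capacitor gain stage (hypothetical)."""
        a0 = 10.0 ** (opamp_gain_db / 20.0)
        # First-order closed-loop gain error due to finite op-amp gain a0.
        actual_gain = nominal_gain * a0 / (a0 + 1.0 + nominal_gain)
        # Sampled kT/C noise on the input capacitor, referred to the output.
        vnoise_out = math.sqrt(K_B * TEMP / c_sample) * actual_gain
        snr_db = 20.0 * math.log10(vin_rms * actual_gain / vnoise_out)
        return actual_gain, snr_db

    # e.g., a unity-gain adapter stage: 1 pF sampling cap, 60 dB op-amp.
    gain, snr = sc_gain_stage(vin_rms=0.1, c_sample=1e-12,
                              nominal_gain=1.0, opamp_gain_db=60.0)

Because the kT/C noise in this model is sampled at the input, the gain cancels in the SNR figure; the gain error still matters for the level and swing matching discussed above.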


Send mail to the author : (yanmei@eecs.berkeley.edu)