Modern solvers for the Boolean Satisfiability (SAT) problem rely heavily on the analysis of conflicts in a backtrack search to learn useful constraints that guide the search. Other families of constraints, such as linear pseudo-Boolean and cardinality constraints, may more naturally express the constraints found in many EDA applications, making the underlying structure more transparent. These constraints can be handled by generic integer linear programming (ILP) solvers, but such solvers may ignore the Boolean nature of the variables.
We generalize learning schemes from recent SAT solvers to efficiently handle the more expressive constraints. Theoretical results have shown that there may be exponential improvements in runtime in moving from conjunctive normal form to cardinality constraints, which are exhibited in some of our experiments.
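The expressiveness gap can be seen in a small example: a naive clausal encoding of "at most k of n variables are true" needs one clause per (k+1)-subset, while a cardinality constraint states the same condition directly. A minimal Python sketch (illustrative only, not our solver):

```python
from itertools import combinations

def atmost_k_cnf(vars_, k):
    """Naive CNF for 'at most k of vars_ are true': one clause
    forbidding each (k+1)-subset from being simultaneously true."""
    return [[-v for v in subset] for subset in combinations(vars_, k + 1)]

def atmost_k_holds(assignment, vars_, k):
    """The same constraint checked natively as one cardinality
    constraint, in O(n) time."""
    return sum(assignment[v] for v in vars_) <= k

# 'At most 3 of 10' needs C(10, 4) = 210 clauses in the naive CNF
# encoding, but is a single cardinality constraint.
clauses = atmost_k_cnf(list(range(1, 11)), 3)
print(len(clauses))  # 210
```

More compact CNF encodings exist, but the single-constraint form is what allows the learning schemes above to operate on the structure directly.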
Fishbone is a block-level placement and routing scheme. The routing uses a two-layer spine topology. The pin locations are configurable but restricted to certain routing grids in order to ensure full routability and precise predictability. With this scheme, exact net topologies are determined by pin positions alone. Hence, during block placement, net parameters such as wire length and delay can be derived directly. The construction of Fishbone nets is much faster than that of Steiner trees, which enables the integration of block placement and routing: there is no separate routing stage.
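As a rough illustration of why spine-style nets are cheap to evaluate, the following toy model computes wire length directly from pin positions, assuming a horizontal trunk at the source plus vertical branches to each sink. This is a simplification for illustration; the actual Fishbone grid rules are more detailed.

```python
def spine_wirelength(source, sinks):
    """Wire length of a simplified spine net: a horizontal trunk at the
    source's y-coordinate spanning all pins, plus a vertical branch
    dropping to each sink. (A toy model, not the real Fishbone rules.)"""
    xs = [source[0]] + [x for x, _ in sinks]
    trunk = max(xs) - min(xs)                      # horizontal trunk span
    branches = sum(abs(y - source[1]) for _, y in sinks)
    return trunk + branches

# The net topology -- hence wirelength -- follows from pin positions alone.
print(spine_wirelength((0, 0), [(3, 2), (5, -1)]))  # 5 + 3 = 8
```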
We propose two regular circuit structures based on the programmable logic array (PLA). They provide alternatives to the widely used standard cell structure and offer better predictability and simpler design methodologies. A Whirlpool PLA is a cyclic four-level structure with a compact layout. Doppio-ESPRESSO, a four-level logic minimization algorithm, is developed for the synthesis of Whirlpool PLAs. A River PLA is a stack of multiple PLAs, which uses river routing for the interconnections between adjacent PLAs. A synthesis algorithm for River PLAs uses simulated annealing, targeting a combination of minimal area and delay.
MVSIS is an experimental software infrastructure for the optimization and synthesis of sequential multi-valued logic networks. Such a network is a network of nodes, where each node represents a multi-valued relation; it is a structural representation of a non-deterministic finite automaton. The model is general enough for reasoning about formal verification, control synthesis, and synthesis for both hardware and software. It poses new synthesis challenges that have not been addressed before. Some initial study is highlighted in .
Some synthesis results have been obtained for algebraic and Boolean optimizations, as presented in , which are extensions of corresponding algorithms in the binary domain. We intend to continue research in the technology independent optimization of such non-deterministic logic networks.
Technology dependent optimization techniques are also being studied for various applications. Binary encoding techniques have been studied for synchronous hardware implementations. We also research design flows and mapping techniques that produce delay-insensitive asynchronous circuits, which are more power efficient than their synchronous counterparts and are correct by construction. For software synthesis in a hw/sw codesign context, we are studying efficient algorithms to generate fast code from an optimal generalized cofactoring structure.
Algebraic operations were developed for binary logic synthesis and extended later to apply to multi-valued (MV) logic. Operations in the MV domain were considered more complex and slower. We show that MV algebraic operations are essentially as easy as binary ones, with only a slight overhead (linear in the expressions) in transformation into and out of the binary domain.
By introducing co-singleton sets as a new basis, any MV sum-of-products expression can be rewritten and passed to a binary logic synthesizer for fast execution. The optimized results can be directly interpreted in the MV domain. This process, called EBD, reduces MV algebraic operations to binary.
A pure MV operation differs from its corresponding EBD one mainly in that the former possesses "semi-algebraic" generality, which has not been implemented for binary logic. Preliminary experimental results show that the proposed methods are significantly faster, with little or no loss in quality, when run in comparable scripts of logic synthesis operations.
We also show that in principle through the co-singleton transform, all MV semi-algebraic operations could be reproduced in the binary domain if the binary operations had semi-algebraic extensions. This result suggests that semi-algebraic operations in the binary domain should be investigated in the future.
1Postdoctoral Researcher, Portland State University
The state explosion problem limits formal verification on large sequential circuits partly because BDD sizes heavily depend on the number of variables dealt with. In the worst case, a BDD size grows exponentially with the number of variables. Reducing this number can possibly enlarge the verification capacity. In particular, we show how sequential equivalence checking can be done in the sum state space.
Given two finite state machines M1 and M2 with numbers of state variables m1 and m2 respectively, conventional formal methods verify equivalence by traversing the state space of the product machine, with m1 + m2 registers. In contrast, we introduce a different possibility, based on partitioning the state space defined by a multiplexed machine, which can have merely max(m1, m2) + 1 registers. This substantial reduction in state variables potentially enables the verification of larger instances. Experimental results show the approach can verify benchmarks with up to 312 registers, including all of the control outputs of the 8085 microprocessor.
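For contrast, the conventional product-machine traversal that the multiplexed approach improves on can be sketched on toy machines (hypothetical two-state examples, not our benchmarks):

```python
from collections import deque

def equivalent(m1, m2, inputs):
    """Conventional sequential equivalence check: traverse the product
    machine and verify that outputs agree in every reachable joint
    state. Each machine is (initial_state, next_fn, output_fn)."""
    (s1, d1, o1), (s2, d2, o2) = m1, m2
    seen, frontier = {(s1, s2)}, deque([(s1, s2)])
    while frontier:
        a, b = frontier.popleft()
        for i in inputs:
            if o1(a, i) != o2(b, i):
                return False
            nxt = (d1(a, i), d2(b, i))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True

# Two parity machines with different state encodings, same behavior.
m1 = (0, lambda s, i: s ^ i, lambda s, i: s ^ i)
m2 = (1, lambda s, i: s ^ i, lambda s, i: 1 - (s ^ i))
print(equivalent(m1, m2, [0, 1]))  # True
```

The joint state here ranges over m1 + m2 register bits; the multiplexed-machine formulation above reduces this to max(m1, m2) + 1.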
We are currently exploring techniques to extend the verification capacity. One possibility would be using output transformations such that the equivalence classes of states induced by the new outputs are reduced and amortized. Another would be using structural similarities between two sequential circuits to reduce the verification complexity. In addition, we are investigating how to make our verification approach more scalable, especially for high performance IC design methodologies.
Faster on-chip clock rates, increasing delay along global interconnects, and the need for IP reuse have provided an impetus for globally asynchronous design. In such a design environment, increased latency along global interconnects no longer affects functional correctness, but instead affects the performance of the design. Thus, performance estimation becomes a critical part of the floorplanning and physical layout of the design. Our research examines how abstract modeling of system behavior can be used to provide such performance estimation as an integrated part of the floorplanning process.
In this work we are studying logic synthesis techniques that lead to circuits that are "better" for placement and routing. In the current framework, we still separate logic synthesis from physical design. We are trying to characterize properties of Boolean networks that allow for better layout. Thus, in addition to the traditional literal count, we seek other metrics for Boolean networks that may be better indicators of wiring complexity during physical design. We hope to modify the traditional logic synthesis algorithms (which optimize for literal count) to optimize for a different cost function that takes these new metrics into account in addition to literal count.
In a platform-based design context, we target (1) standardized or application specific processing platforms, and (2) automatic software generation from functional models. With a focus on control-intensive embedded systems, for instance, automobile engine, airplane, and network protocol controls, we use a network of extended finite state machines (EFSMs) as the model of computation. The research goal is to derive proper flow and point algorithms for code generation and mapping, given an existing architecture platform. Initial results are highlighted in [1,2]. Further research includes the following.
Generalized cofactoring from multi-valued relations: given a multi-valued relation, we want to find the optimal cofactoring structure that minimizes the functional evaluation cost. The generalized cofactoring structure is a DAG, where each node is associated with a binary cofactoring function, and the two outgoing edges represent the positive and negative cofactors. The goal is to find a set of cofactoring functions that minimizes the average path length of the structure.
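A minimal sketch of the idea, assuming a binary function given as a truth table (the real setting is multi-valued relations and a DAG, not a tree): greedily pick cofactoring variables and measure the average path length of the resulting structure.

```python
from itertools import product

def build_tree(table, vars_):
    """Recursively Shannon-cofactor a function given as a dict from
    input tuples to 0/1 outputs. Greedily pick the variable whose
    cofactors are closest to constant, which tends to shorten the
    average evaluation path. (Toy stand-in for the MV algorithm.)"""
    outs = set(table.values())
    if len(outs) <= 1:
        return ('leaf', outs.pop() if outs else 0)
    best = None
    for v in vars_:
        neg = {k: o for k, o in table.items() if k[v] == 0}
        pos = {k: o for k, o in table.items() if k[v] == 1}
        score = len(set(neg.values())) + len(set(pos.values()))
        if best is None or score < best[0]:
            best = (score, v, neg, pos)
    _, v, neg, pos = best
    rest = [u for u in vars_ if u != v]
    return ('node', v, build_tree(neg, rest), build_tree(pos, rest))

def avg_path_length(tree, depth=0, weight=1.0):
    """Average depth to a leaf, weighted uniformly over inputs."""
    if tree[0] == 'leaf':
        return depth * weight
    _, _, lo, hi = tree
    return avg_path_length(lo, depth + 1, weight / 2) + \
           avg_path_length(hi, depth + 1, weight / 2)

# f = a AND b: one cofactor step on 'a' settles half of the inputs.
table = {bits: bits[0] & bits[1] for bits in product((0, 1), repeat=2)}
tree = build_tree(table, [0, 1])
print(avg_path_length(tree))  # 1.5
```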
State minimization and re-encoding: given a sequential logic network, we want to find sequential and parallel decompositions of the state machine, so that the output evaluation can be made faster.
The standard refinement approach for verifying systems consists of constructing an abstract model of the system, proving properties on the abstract level, and finally showing that the real model is a refinement of the abstract model, i.e., every trace in the actual model is present in the abstract one. In many control systems applications, the ultimate objective is to show that the system stays within some parameters. A slight deviation from the ideal behavior may be acceptable provided some constraints are not violated. We are working on a notion of refinement which considers metrics between traces. The standard refinement relation looks at the Boolean metric on traces--either the two traces are the same, or they are not. Approximate refinement uses metrics for which the distance between traces can have a gradient. It quantifies the degree to which the actual system model is close to the abstract one. The designers can decide what degree of closeness they desire. Exact trace inclusion may not be required as the model of the system may be only an approximation of the real world dynamics.
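A toy sketch of the notion, assuming finite real-valued traces and a sup-norm trace metric (one of many possible metrics):

```python
def trace_distance(t1, t2):
    """Sup-norm distance between two equal-length real-valued traces."""
    return max(abs(a - b) for a, b in zip(t1, t2))

def approx_refines(impl_traces, abs_traces, eps):
    """Approximate refinement: every implementation trace lies within
    eps (in the trace metric) of some abstract trace. With eps = 0 this
    reduces to exact trace inclusion on these finite traces."""
    return all(min(trace_distance(t, a) for a in abs_traces) <= eps
               for t in impl_traces)

abstract = [[0.0, 1.0, 2.0]]
actual   = [[0.0, 1.1, 1.9]]
print(approx_refines(actual, abstract, eps=0.2))   # True
print(approx_refines(actual, abstract, eps=0.05))  # False
```

The designer's choice of eps quantifies the acceptable deviation from the ideal behavior.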
1Former Postdoctoral Researcher
Giotto is a real-time programming language for control systems. A control application typically consists of communicating periodic tasks, together with mode switches that enable and disable tasks. We are pursuing two lines of research relating to Giotto:
(1) Scheduling theory for Giotto. Giotto is platform-independent, and thus does not specify in detail how Giotto programs are to be scheduled. The Giotto compiler's current scheduling algorithm interprets Giotto's operational semantics quite literally. We are investigating how this literal interpretation may be relaxed, allowing more Giotto programs to be scheduled. Our starting point is precedence-constrained scheduling models, drawn from the scheduling theory literature. Single-mode, single-processor Giotto programs may be modeled as periodic instances of 1 | r_j; d_j; prec; pmtn | - (the problem of scheduling dependent tasks with release times, deadlines, and preemption on a single processor). Using a novel extension of standard earliest-deadline-first scheduling algorithms, I have developed an algorithm for schedule synthesis that captures interesting schedule optimizations and runs in polynomial time. One direction in which these results can be generalized is multi-mode programs. I have developed a novel model of conditional scheduling problems that captures the branching behavior of multi-mode programs and permits the synthesis of schedules in polynomial time. Another direction in which these results might be generalized is multi-processor programs. For distributed and multi-CPU settings, almost all precedence-constrained scheduling models are NP-hard, so I am focusing instead on load-balancing approximation algorithms. This work is being done in conjunction with Tom Henzinger.
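The flavor of the single-processor case can be sketched in a few lines: tighten deadlines through the precedence graph, then run preemptive EDF. This is a toy version under simplified assumptions (unit time steps, hypothetical task parameters), not the compiler's algorithm:

```python
def edf_schedule(tasks, prec, horizon):
    """Preemptive EDF on one processor with release times, deadlines,
    and precedence constraints (a toy 1 | r_j; d_j; prec; pmtn | -).
    tasks: {name: (release, deadline, wcet)}; prec: list of (before,
    after). Returns the per-time-unit schedule, or None on a miss."""
    d = {j: tasks[j][1] for j in tasks}
    for _ in tasks:                          # relax deadlines to a fixpoint
        for a, b in prec:
            d[a] = min(d[a], d[b] - tasks[b][2])
    rem = {j: tasks[j][2] for j in tasks}
    done, schedule = set(), []
    for t in range(horizon):
        ready = [j for j in tasks
                 if tasks[j][0] <= t and rem[j] > 0
                 and all(a in done for a, b in prec if b == j)]
        if not ready:
            schedule.append(None)
            continue
        j = min(ready, key=lambda k: d[k])   # earliest (modified) deadline
        rem[j] -= 1
        schedule.append(j)
        if rem[j] == 0:
            if t + 1 > tasks[j][1]:
                return None                  # deadline miss
            done.add(j)
    return schedule if all(r == 0 for r in rem.values()) else None

# B depends on A; both due by time 4, two time units of work each.
tasks = {'A': (0, 4, 2), 'B': (0, 4, 2)}
print(edf_schedule(tasks, [('A', 'B')], horizon=4))  # ['A','A','B','B']
```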
(2) A Giotto-based helicopter control system. In conjunction with this theoretical work, we are implementing a Giotto-based helicopter control system for a Yamaha RMAX helicopter, with two goals in mind. The first goal is to produce a modular implementation. The control computer must be prepared to communicate with a wide variety of sensors--global positioning system, inertial measurement unit, etc.--each with its own data format and timing behavior. To manage the complexity in the helicopter control system, we use platform-based design, which introduces multiple layers of abstraction between application and implementation. The second goal is to produce a testable implementation. Since testing the control system on the actual helicopter is time consuming and somewhat risky, we have built a hardware-in-the-loop simulator for the Yamaha RMAX. The idea is to replace the helicopter itself with a simulated model of the helicopter's dynamics. The control system, however, remains the same, and may be tested against the simulator rather than the helicopter itself. This work is being done in conjunction with Judy Liebman, Cedric Ma (who have both graduated), John Koo, Tom Henzinger, Alberto Sangiovanni-Vincentelli, and Shankar Sastry.
Figure 1: Precedence constraints for a single-mode Giotto program
We consider programs of the so-called embedded (E) machine, which is a virtual machine for coordinating the interaction between physical processes and software processes . The E machine executes E code, the timing and control-flow part of a program that specifies the invocation of software processes relative to events. The software processes form the functional part of the program and among them we distinguish two kinds: drivers and tasks. All information between the environment and the tasks flows through drivers, software processes with negligible worst-case execution times (WCET). On the other hand, tasks model processes with nonnegligible WCETs. One way to generate E code is from Giotto, a language designed for multi-modal control applications, where each mode is specified with a set of tasks and drivers that are periodically invoked.
The project addresses the problem of checking whether all possible executions of a given E code program on a given platform satisfy certain timing requirements. The compiler that generates E code expects sufficient performance from the platform, so that the computation of a task always finishes before drivers update task inputs or read task outputs, and before another invocation of the task is scheduled. A trace that satisfies such implicit deadlines is called time safe because it meets all timing requirements of the E code.
In  we show that time-safety checking of arbitrary E code is computationally hard. However, a large class of real-time programs can be expressed as restricted subclasses of general E code. For example, a Giotto source program compiled into E code in a specific way can be checked in polynomial time. In this project we will try to exploit the E code structure further to construct efficient algorithms for time-safety checking. We envision an algorithm with two parts. It first computes the deadlines and then checks if the set of tasks with given WCETs and computed deadlines can be scheduled by a specified scheduling strategy. The algorithm does not interpret conditionals in the E code: the deadline is conservatively approximated as the minimum of the deadlines from the two branches of a conditional. The check is performed by building a program graph on E code extended with particular scheduler states.
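The conservative deadline approximation can be sketched on a hypothetical E code AST (the node encoding here is invented for illustration; real E code is a lower-level instruction sequence):

```python
def conservative_deadline(node):
    """Conservative deadline of an E code fragment represented as a toy
    AST: ('release', d) releases a task with deadline d, ('seq', a, b)
    sequences fragments, ('if', a, b) is a conditional. Conditionals
    are not interpreted: the deadline is approximated as the minimum
    over both branches, as in the envisioned algorithm."""
    kind = node[0]
    if kind == 'release':
        return node[1]
    if kind in ('seq', 'if'):
        return min(conservative_deadline(node[1]),
                   conservative_deadline(node[2]))
    raise ValueError(kind)

prog = ('if', ('release', 10), ('seq', ('release', 8), ('release', 5)))
print(conservative_deadline(prog))  # 5
```

The second part of the envisioned algorithm would then hand tasks with these deadlines and their WCETs to a schedulability check for the given strategy.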
The embedded system design process is characterized by three challenges. One challenge is the task of modeling heterogeneous concurrent applications. Another is the task of finding a programming model for heterogeneous multiprocessor architectural platforms. Compounding each of these challenges is the task of implementing heterogeneous applications on heterogeneous architectures. We believe that the key problem underlying each of these challenges is the modeling of concurrency, and the key to modeling concurrency is to capture concurrent communication formally in models of computation .
The aim of this research is to develop a disciplined approach to the design and implementation of communication structures in embedded applications. Our approach combines the network-on-chip paradigm with the models of computation paradigm. We use models of computation to capture the communication requirements of an application as well as to abstract the capabilities of a communication architecture. Then, application requirements and architectural capabilities are matched using a discipline based on network-on-chip principles.
Our design methodology is based on the Y-chart . The Y-chart is an iterative strategy for design space exploration that enables the co-evolution of hardware and software. We extend the multiple views methodology  to provide designers with the right abstractions for implementing heterogeneous, concurrent applications on heterogeneous multiprocessor programmable platforms.
Network processors exploit task- and packet-level parallelism to achieve high throughput. This has resulted in a huge diversity of architectures for similar applications . From an ease-of-programmability perspective, these architectures can be classified along three main axes: (1) the type and organization of processing elements; (2) the scheduling schemes between processing elements and other resources, such as I/O interface buffers; and (3) the communication between different resources, including memory operations as well as bus configurations. Each of these aspects significantly influences the ease of programmability.
A primary goal of this work is to implement one or more benchmarks on different network processors and understand the complex interplay between different architectural components and programmability. In the process, we hope to define a broader abstraction for programming such heterogeneous application-specific multi-processors. Such an abstraction can be seen as a hardware abstraction layer across multiple network processors or as a starting point for exploring new heterogeneous ASIPs. It is important to note that this work differs from a typical programming model, where the emphasis is on capturing application requirements rather than on ease of mapping or ease of porting across heterogeneous multiprocessor ASIPs. In this work, we have currently implemented a 16-port IPv4 benchmark  on an Intel IXP1200 and a Motorola C-Port C-5 and are investigating the different architectural influences on both implementations.
Building blocks for application-specific programmable system-on-chip architectures can be classified into (multi-) processors, memory hierarchies, interconnection networks and peripherals. While in the design process a lot of emphasis is put on the first three, peripherals, including communication interfaces as the largest subset, are often underestimated and neglected.
Our analysis  for the multimedia and network domains clearly shows that (1) peripherals and communication interfaces account for a significant share of the system (between 30 and 40 percent of die area), (2) there is a large number of heterogeneous peripheral functions even on a single chip, and (3) a single such function can be even more complex than a processor core.
The main goal of this research is to achieve a reduction in the diversity of peripherals, thereby making the design process more regular and potentially simpler. We propose to achieve the goal by means of a programmable peripheral processor that can be targeted at multiple peripheral functions. The advantages of such a system are not only the unified design process, but also an increase in robustness of the overall system due to a regular software interface for device drivers.
We have carefully examined the speed gap between ASIC and custom processors. Typical ASIC designs are a factor of 3 to 8 slower than comparable custom designs. The contributing factors to this gap are microarchitecture, timing overhead, logic style, logic design, cell design and wire sizing, layout, and process variation and accessibility. With careful attention to reducing these contributions, some ASICs can close the speed gap to only a factor of 2 to 3 .
Power is also a significant concern to designers. The heat dissipated by a chip greatly impacts the cost of packaging and cooling methods (ranging from passive dissipation to refrigeration). The power consumption limits battery life and adds to the cost of running an instrument. Particularly for low cost embedded applications, power consumption directly limits the maximum performance.
Custom logic styles such as dynamic domino logic are perceived as consuming more power than static logic. However, as dynamic circuits achieve higher maximum speeds, they can be downsized more and can run at a reduced supply voltage to meet slower speed targets. Custom logic styles typically have less capacitance in the PMOS pull-up networks of gates, due to alternative logic styles and skewed drive strengths. As the gate loads are smaller, the required drive strengths are smaller, doubly reducing the dynamic power consumed by switching capacitances within the circuit.
Given equivalent speed targets for ASIC and custom methodologies, custom approaches can exploit their speed advantage to substantially reduce power consumption. We quantify the contribution of each factor to the power gap between ASIC and custom designs. We then illustrate approaches to reducing ASIC power consumption.
Some approaches to transistor sizing have used convex posynomial formulations to guarantee a solution that is globally optimal [1,2]. The convex posynomials are fit to experimental data characterizing a gate. More recent circuit optimization methods have generally avoided posynomial models, due to perceived inaccuracies in the models. We wish to find an optimization formulation that includes sizing with threshold voltage and gate voltage assignment.
Previous heuristic approaches [3-5] that assign transistor threshold voltages to minimize power have not considered simultaneous transistor sizing and threshold changes. We extend previous work by including supply voltage and performing simultaneous optimization of all gate parameters.
We chose posynomials to formulate the problem of simultaneous transistor sizing with supply voltage and threshold voltage assignment to combinational circuits. Using convex posynomials guarantees a global minimum, and avoids heuristic approaches where circuit variables are optimized independently. The convex posynomials are used for static timing analysis and power analysis. Power analysis includes all components of circuit power: switching capacitance, short-circuit power through a gate when it switches, and leakage current for the dominant leakage state (primarily subthreshold leakage). Posynomials are fit to a continuous range of supply voltage, threshold voltage, drive strength ratio, and drive strength of a gate for a range of input slew and output load capacitance conditions. The fidelity of the posynomial models for delay and power analysis is verified versus SPICE circuit simulations.
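A minimal sketch of what a posynomial model looks like, with invented coefficients rather than fitted data; in log-transformed variables such expressions become convex, which is what guarantees a global optimum:

```python
import math

def posynomial(terms):
    """A posynomial sum_k c_k * prod_i x_i**a_ki with all c_k > 0 and
    x_i > 0. terms: list of (c, {var: exponent}). Returns f(x)."""
    def f(x):
        return sum(c * math.prod(x[v] ** e for v, e in a.items())
                   for c, a in terms)
    return f

# A toy gate-delay model: delay grows with load, shrinks with drive w.
# (Illustrative coefficients, not fitted characterization data.)
delay = posynomial([(1.0, {'load': 1, 'w': -1}),   # RC-like term
                    (0.5, {'w': 1})])              # self-loading term
print(delay({'load': 4.0, 'w': 2.0}))  # 4/2 + 0.5*2 = 3.0
```

Under the substitution x_i = exp(y_i), each term becomes exp of an affine function of y, so log f is convex in y, the property exploited by geometric programming solvers.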
Given the globally optimal solution from the continuous sizing, continuous supply voltage, and continuous threshold voltage formulation, we discretize supply voltage and threshold voltage. Upper and lower bounds on the minimum power for a circuit can be easily determined from the posynomial formulation. A limited branch and bound during the iterative discretization ensures a discretized solution near the global optimum of the continuous problem.
Low power design methods are becoming more important as power consumption increases. The effects of higher power consumption include: (1) increased costs, because the extra heat generated must be combated by expensive packaging technology; (2) reduced reliability because of physical effects like electromigration; and (3) shortened overall battery life in portable devices. In our methodology, we propose to take advantage of advances in manufacturing technologies. Advanced manufacturing technology allows the production of circuits with dual supply and threshold voltage levels. Multiple voltage levels can be exploited to trade off power consumption and circuit speed. Because most combinational circuits contain many sub-critical paths, the delay of some circuit components can safely be increased, which results in reduced power consumption. Our research aims to develop a set of efficient algorithms to assign threshold and supply voltages to gates and perform gate sizing in a way that preserves the overall performance of the circuit while maximally reducing power. Our current approach is based on a heuristic that uses linear programming (LP) and integer-linear programming (ILP) to perform the assignment.
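The core intuition behind the assignment, stripped of the LP/ILP machinery, is that a sub-critical gate whose slack absorbs the slower low-voltage cell can be downgraded. A greedy sketch with hypothetical gate data (it ignores downstream slack propagation, which the real formulation must handle):

```python
def assign_low_vdd(gates, arrival, required):
    """Greedy sketch of dual-Vdd assignment: a gate whose slack
    (required - arrival time) exceeds the extra delay of the low-Vdd
    cell is moved to low Vdd, trading sub-critical delay for power.
    gates: {name: (delay_hi, delay_lo, power_hi, power_lo)}.
    (The actual project formulates this with LP/ILP.)"""
    choice, power = {}, 0.0
    for g, (d_hi, d_lo, p_hi, p_lo) in gates.items():
        slack = required[g] - arrival[g]
        if d_lo - d_hi <= slack:
            choice[g], power = 'low', power + p_lo
        else:
            choice[g], power = 'high', power + p_hi
    return choice, power

gates = {'g1': (1.0, 1.6, 4.0, 2.0), 'g2': (1.0, 1.6, 4.0, 2.0)}
arrival = {'g1': 3.0, 'g2': 5.0}       # g2 sits on the critical path
required = {'g1': 5.0, 'g2': 5.0}
choice, power = assign_low_vdd(gates, arrival, required)
print(choice, power)  # {'g1': 'low', 'g2': 'high'} 6.0
```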
Nanometer designs are creating a variety of new challenges for the semiconductor industry in the areas of manufacturing, design tools, chip architectures, speed, reliability, power management, variability, and verification. Coupled with these technology challenges, there is an increasing demand for products that are low power, high performance, reliable, and portable. This has resulted not only in increased functionality on a single chip but in a more complex design cycle as many physical design and manufacturing issues must be considered in making design tradeoffs at earlier stages of the design process. Of these tradeoffs, a significant portion of design decisions are concerned with minimizing power consumption and power dissipation.
Recently, a number of circuit design and manufacturing techniques to manage and minimize power have been presented: (1) use of multiple supply voltages to minimize dynamic power; (2) use of multiple threshold voltages to minimize static power; (3) use of post-synthesis gate resizing to recover power on non-critical paths; (4) a combination of (1), (2), and (3); and (5) selective phase-shifting to provide richer design tradeoff space for synthesis and mapping.
The goal of this research project is to orchestrate all of these approaches simultaneously to maximally leverage them in automated design methodologies for application-specific standard products (ASSPs) and application-specific integrated circuits (ASICs). Specifically, the desired results of the research are to arrive at new algorithms and design flows that enable designers to create low power, high performance circuits in an automated fashion. Research is focused on both front-end and back-end VLSI CAD flows.
It is difficult to verify that the components of programmable systems will behave correctly with respect to either high level specifications or the intent of the designers who have implemented them. To focus on a concrete example, microprocessors traditionally use the notion of an instruction set (IS), specified at a high level, to form a boundary between the hardware designers and the compiler authors. While there are many variations on this theme, it has proved a long-lived approach to the complexities of programming and designing microprocessors. However, it is still difficult to verify that a particular implementation of a microprocessor implements a given IS, and the more hardware details the IS hides, the more difficult it is to use the IS as the sole input for an optimizing compiler. Thus, compiler authors must often be familiar with all of the low-level hardware details of a processor, and then carefully construct code using the hidden semantics of the IS with respect to a given processor or processor class to give good performance.
VLIW style instruction sets are an attempt to reduce this semantic gap by exposing more of the behavior of the processor to the compiler. The research question posed here is: "What is required to completely close the semantic gap between the hardware level description of a programmable component and the exported semantics of the same component?" The tentative answer is that, as a first step, it requires a more formal description of the device, its programmability, and its concurrency than is offered by traditional HDLs. Furthermore, given this new description, it is necessary to reformulate much of the abstraction/mapping tool chain from the hardware description to the high-level programmer's model of the device. One way to explore this issue is to attempt to build such a system. The current approach combines a functional hardware description language (sharing properties with other concurrent constraint languages such as Oz and Mercury) with the power of complex propositional reasoning (using high performance SAT solvers) and additional theories (a la CVC), including Presburger arithmetic and any others that prove useful or necessary. Currently, work is underway to develop a high performance SAT+Presburger solver while pushing the design of simply controlled (similar to a horizontal VLIW) heterogeneous processors to the level of automatic compiler generation. It is expected that the solver will be useful for the implementation of an efficient retargetable compiler, because Presburger arithmetic can encode the semantics of the current constraint language that is used to model both the hardware and intermediate forms (at the compiler IR level) of the software.
Current designs of network processors reveal a wide variety of micro-architectures . Even processors targeted at the same application domain show many differences in micro-architecture topology. It is unclear whether any of the current solutions actually represents a natural and efficient implementation of the required functionality for a given application such as IPv4 forwarding.
In order to enable the systematic evaluation of existing and innovative designs in a reasonable time frame, we investigate the feasibility of a recently developed analytical performance estimation approach  to capture inherent characteristics of the design space and to automatically guide the designer toward optimal solutions in the network processing domain.
We focus in particular on the following issues: (1) extending the analytical approach to model and automatically explore heterogeneous communication architectures; (2) comparing results from the analytical model with simulation-based results ; (3) pruning the design space by incorporating technology constraints as well as individual constraints defined by the designer; and (4) incorporating domain-specific design decisions, such as the partitioning of application tasks between micro-processor threads, into an automated exploration framework.
The traditional approach to worst-case static timing analysis uses deterministic delays for gates and wires. These fixed delay values are extracted from best- and worst-case process parameter sets obtained through repeated device simulations. However, with the advent of deep-submicron technology, the intra-chip variations of process components and interconnect geometries are expected to increase by as much as 65%. This, coupled with the presence of variable correlations between delay elements, causes further deviation from the fixed delay model. As a consequence, typical worst-case timing analysis leads to overly conservative solutions, as it assumes worst-case delay characteristics and fails to recognize the inherent statistical variation. The end product suffers due to lost performance and expensive over-design.
In order to reliably design modern ICs, the timing model must be freed from deterministic approximations and the underlying randomness must be exposed through a coherent probabilistic abstraction. The objective of this research is to propose a framework and methodology for statistical timing analysis. In particular, there are two driving questions: (1) choosing an appropriate probabilistic model to describe delays while trading off between accuracy and computational complexity and (2) implementing an efficient algorithm to compute delay distributions of combinational circuits based on the chosen model. The incorporation of such a probabilistic framework for timing analysis reduces the gap between timing constraints predicted by tools and the actual silicon performance, obviates over-design, and relieves the designer of unnecessary iterations to attain timing closure.
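A Monte Carlo sketch illustrates the underlying effect: the delay of a stage is the max over several random gate delays, so the mean path delay exceeds the nominal sum, which corner-based analysis cannot capture. This is a toy model with Gaussian delays and an invented circuit shape, not our proposed algorithm:

```python
import random

def mc_delay(depth, width, mean, sigma, trials=20000, seed=0):
    """Monte Carlo sketch of statistical timing for a rectangular
    combinational array: each stage's delay is the max over 'width'
    Gaussian gate delays, and stages add up. Returns the sample mean of
    the path delay, which exceeds depth*mean because the max of random
    delays is biased upward."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += sum(max(rng.gauss(mean, sigma) for _ in range(width))
                     for _ in range(depth))
    return total / trials

est = mc_delay(depth=10, width=4, mean=1.0, sigma=0.1)
print(est > 10.0)  # True: the mean path delay exceeds the nominal sum
```

An analytical framework replaces such sampling with closed-form propagation of delay distributions, trading accuracy against computational complexity.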
The past five years have witnessed over 30 attempts at programmable solutions for packet processing. With these architectures, network processor designers have employed a large variety of hardware techniques to accelerate packet processing, including parallel processing, special-purpose hardware, memory architectures, on-chip communication mechanisms, and use of peripherals. However, despite this architectural innovation, relatively little effort has been made to make these architectures easily programmable. The current practice in programming network processors is to use assembly language or a subset of C. Such low-level programming places a large burden on the programmer, who must understand fine details of the architecture just to implement a packet processing application, let alone optimize it. We believe the programmer should be presented with an abstraction of the underlying hardware, or programming model, which exposes just enough detail to write efficient code for that platform. This work focuses on a programming model for network processors. Our approach starts with an abstraction based on Click, a domain-specific language for packet processing. We have demonstrated the promise of our programming model by implementing an IPv4 router on the Intel IXP1200, a common network processor.
Detecting overlap of switching windows between coupled nets is a major static technique for accurately locating crosstalk noise. However, due to the mutual dependency between switching windows, the computation requires iterations to converge. In this research, we discuss the issues and provide answers to the important questions involved in convergence and numerical properties, including the effect of coupling models, multiple convergence points, convergence rate, computational complexity, non-monotonicity, continuity, and the effectiveness of bounds. Numerical fixed-point computation results are used to explain these properties. Our contribution builds a theoretical foundation for static crosstalk noise analysis.
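The fixed-point character of the computation can be illustrated with a deliberately simplified two-net model. The window-widening rule and the numbers below are invented for illustration; real coupling models are far richer.

```python
def overlaps(w1, w2):
    # two half-open intervals intersect
    return w1[0] < w2[1] and w2[0] < w1[1]

def converge_windows(base, coupling_delay, max_iters=50):
    """Iteratively recompute the switching windows of two mutually
    coupled nets until a fixed point is reached.  Toy model: a net's
    window end is pushed out by the coupling delay whenever its
    aggressor's current window overlaps it."""
    win = dict(base)
    for it in range(max_iters):
        new = {}
        for net, other in (("a", "b"), ("b", "a")):
            s, e = base[net]
            if overlaps(win[net], win[other]):
                e += coupling_delay   # aggressor may delay the victim
            new[net] = (s, e)
        if new == win:                # fixed point reached
            return win, it
        win = new
    raise RuntimeError("did not converge")

base = {"a": (0.0, 3.0), "b": (2.0, 5.0)}
final, iters = converge_windows(base, coupling_delay=1.0)
```

Here both windows widen once and the iteration stabilizes immediately; the questions studied in this work (multiple fixed points, convergence rate, non-monotonicity) arise when the dependency structure is less benign.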
Coupling effects have become a serious concern in UDSM chip designs. Static timing analysis must take these effects into account to correctly ensure signal integrity. However, conventional approaches give pessimistic results for noise calculation due to an over-conservative assumption on the switching windows.
For crosstalk noise calculation, computing the switching windows of a net helps us identify noise sources accurately. Traditional approaches use a single continuous switching window for a net. Under this model, signal switching is assumed to happen at any time within the window. Although conservative and sound, this model can result in too much pessimism, since in reality the exact timing of signal switching is determined by a path delay up to the net, i.e., the underlying circuit structure does not always allow signal switching at an arbitrary time within the continuous switching window. To address this inherent inaccuracy of the continuous switching window, we propose a refinement of the traditional approaches, where signal switching is characterized by a set of discontinuous switching windows instead of a single continuous window. Each continuous switching window is divided into multiple windows, called time slots, and the signal switching activity of each slot is analyzed separately to calculate the maximum noise with more accuracy. By controlling the size of a time slot we can trade off accuracy and runtime, which makes this approach highly scalable. We have confirmed with experiments on industrial circuits that up to 90% of the noise violations detected by the traditional approach can be spurious.
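A minimal sketch of the time-slot idea, assuming each net's feasible switching times are known from path arrival times (the function names and numbers are hypothetical):

```python
def slot_windows(path_arrivals, slot, lo, hi):
    """Discretize a net's switching activity: instead of one
    continuous window [lo, hi), keep only the time slots into which
    some actual path arrival time falls."""
    occupied = set()
    for t in path_arrivals:
        if lo <= t < hi:
            occupied.add(int((t - lo) / slot))
    return occupied

def may_couple(slots_a, slots_b):
    # noise is only possible when both nets can switch in the same slot
    return bool(slots_a & slots_b)

# Victim switches early, aggressor late: the continuous windows
# [0, 10) overlap entirely, but no slot is shared, so the violation
# reported by the traditional model is spurious.
a = slot_windows([1.0, 2.5], slot=1.0, lo=0.0, hi=10.0)
b = slot_windows([7.2, 8.9], slot=1.0, lo=0.0, hi=10.0)
```

Shrinking `slot` sharpens the analysis at the cost of more slots to examine, which is exactly the accuracy/runtime knob described above.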
We propose a method for describing programmable platforms in a constraint-based pure functional language. This description coupled with a well-defined model of computation allows us to constructively generate and restrict multiple views of the programmable platform. Our multiple views enable designers to abstract and refine the design while our system maintains consistency between the views.
Programmable solutions on network processors for packet processing have relied mostly on assembly or subsets of C as an interface. For practical implementations, programmers must effectively utilize hardware parallelism, arbitrate between shared resources, divide code among threads, lay out shared data, and interface with special function units and peripherals. This is significantly removed from the natural language with which application writers describe their design. To bridge the gap between an application language and the language for a target architecture, a programming model is necessary that hides the unimportant details of the target architecture and exposes those which allow the programmer to claim the full power of the device.
The current programming model we have developed for network processors requires the programmer to specify thread boundaries, lay out shared data, and manually eliminate redundant accesses and computation across elements. Since this presents a potentially enormous design space, this work will attempt to find optimal strategies and heuristics to automate traversing it. Furthermore, there are potential optimizations that may be found in a high-level language that are not easily found once assembler or C is generated. This work will explore the finding of such optimizations automatically.
Ptolemy II is a set of Java packages supporting heterogeneous, concurrent modeling, simulation, and design of component-based systems. The emphasis is on a clean, modular software architecture, divided into a set of coherent, comprehensible packages. The kernel package supports definition and manipulation of clustered hierarchical graphs, which are collections of entities and relations between those entities. The actor package extends the kernel so that entities have functionality and can communicate via the relations. The domains extend the actor package by imposing models of computation on the interaction between entities. Domains that have been created include:
Ptolemy II includes a number of support packages, such as
Figure 1: Claudius Ptolemaeus
Networked embedded systems such as wireless sensor networks are usually designed to be event-driven so that they are reactive and power-efficient. Programming embedded systems with multiple reactive tasks is difficult due to the complex nature of managing the concurrency of execution threads and consistency of shared states.
We have designed a globally asynchronous and locally synchronous model, called TinyGALS, for programming event-driven embedded systems . Software components are composed locally through synchronous method calls to form modules, and asynchronous message passing is used between modules to separate the flow of control. In addition, we have designed a guarded yet synchronous model, TinyGUYS, which allows thread-safe sharing of a global state by multiple modules without explicitly passing messages. Our notions of synchrony and asynchrony, which are consistent with the usage of these terms in distributed programming paradigms, refer to whether the software flow of control is immediately transferred to a component.
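The flavor of the model can be sketched in a few lines of Python. This is an illustrative toy, not the TinyGALS runtime, and `Module`, `post`, and `run` are invented names: components inside a module are invoked through synchronous calls, while inter-module events merely land in a FIFO queue until the scheduler triggers the receiving module.

```python
from collections import deque

class Module:
    """GALS-style module: synchronous handler inside, asynchronous
    FIFO message passing between modules (illustrative sketch)."""
    def __init__(self, name, handler):
        self.name, self.handler = name, handler
        self.inbox = deque()
        self.out = None               # downstream module, if any

    def post(self, event):
        # asynchronous: enqueue and return immediately, no flow of
        # control is transferred to the receiving module
        self.inbox.append(event)

    def run(self):
        # the scheduler triggers the module; inside, the handler is
        # called synchronously for each pending event
        while self.inbox:
            result = self.handler(self.inbox.popleft())
            if self.out is not None and result is not None:
                self.out.post(result)

sensor = Module("sensor", lambda e: e * 2)          # scale a reading
logger_log = []
logger = Module("logger", lambda e: logger_log.append(e))
sensor.out = logger

sensor.post(21)
sensor.run()   # enqueues 42 into logger's inbox; control returns here
logger.run()   # logger processes the asynchronously delivered event
```

The point of the structure is visible even in this sketch: because all inter-module traffic goes through the queues, the message-passing and triggering code is mechanical and can be generated automatically from the specification.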
With this highly structured programming model, all asynchronous message-passing code and module-triggering mechanisms can be automatically generated from a high-level specification. The programming model and code generation facilities have been implemented for a wireless sensor network platform known as the Berkeley motes. TinyOS is an event-based operating system for these networked sensors being developed by the group of Professor David Culler. Our implementation of TinyGALS uses the component model provided by TinyOS, which has an interface abstraction that is consistent with synchronous method calls. The TinyGALS code generator is designed to work with preexisting TinyOS components, thus enabling code reuse.
We are currently working on formalizing the dynamic properties of TinyGALS, as well as developing a more robust implementation for communication and scheduling. TinyGALS is currently intended for running on a single node. We later plan to extend the TinyGALS model to multiple nodes for distributed multi-tasking.
The purpose of this project is to study some properties of synchronous dataflow (SDF), including the abstraction of actors and the composability of SDF models. In an SDF model, actors have static firing rules: they consume and produce a fixed number of tokens in each firing. The token consumption and production rates of an actor are the abstraction of the actor. Based on this abstraction, an efficient, static schedule can be solved with the balance equations between actors. An important constraint on the firing rules of SDF is that an actor cannot fire unless all its inputs have enough tokens. Sometimes this constraint is too conservative, in that schedules cannot be found for some models that are in fact executable [1,2]. This is because the SDF abstraction of the actors is not sufficient.
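As a concrete illustration of the balance equations, the smallest integer repetition vector of a connected SDF graph can be computed by propagating rational firing rates and then scaling away the denominators. This is a minimal sketch, not the Ptolemy scheduler.

```python
from fractions import Fraction
from math import lcm

def repetition_vector(edges, actors):
    """Solve the SDF balance equations r[src]*prod == r[dst]*cons for
    a connected graph, returning the smallest integer solution."""
    r = {actors[0]: Fraction(1)}       # seed one actor with rate 1
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            if src in r and dst not in r:
                r[dst] = r[src] * prod / cons; changed = True
            elif dst in r and src not in r:
                r[src] = r[dst] * cons / prod; changed = True
            elif src in r and dst in r:
                assert r[src] * prod == r[dst] * cons, "inconsistent rates"
    scale = lcm(*(f.denominator for f in r.values()))
    return {a: int(f * scale) for a, f in r.items()}

# A --(produce 2 / consume 3)--> B --(produce 1 / consume 2)--> C
rv = repetition_vector([("A", "B", 2, 3), ("B", "C", 1, 2)], ["A", "B", "C"])
```

For this chain, A fires 3 times, B twice, and C once per iteration, returning every buffer to its initial state. The over-conservatism discussed above is orthogonal to this computation: a repetition vector may exist even when the token-availability firing rule prevents any schedule from executing it.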
This project aims to find a sufficient abstraction by exposing some of the internals of actors. The basic idea is to separate the computation and communication of an actor, and to abstract the communication part. The abstraction focuses on the conditions in the environment that the actor affects. A schedule based on this abstraction reflects the computation of actors better than the general SDF schedule. This abstraction can be regarded as a refinement of the SDF abstraction.
This abstraction can be used to study the composability of SDF models. A composite SDF model has several SDF models embedded inside. The SDF abstraction of this composite model is based on the solution of the balance equations between the actors inside. This abstraction indicates the requirement for the composite model to restore its original state. It also abstracts away some of the internal communication behaviors, which is valuable in the schedulability study of the composition of SDF models. Our new abstraction captures the communication part, and the schedulability analysis can be done without flattening the composite models.
The concept of actors was first introduced by Carl Hewitt as a means of modeling distributed knowledge-based algorithms. Actors have since become widely used.
CAL is a small domain-specific language for writing down the functionality of actors, specifically including their ports, parameters, typing constraints, and firing rules. It is designed to be embedded in an environment providing necessary infrastructure, such as data types, operations, and function libraries. The goal is to provide a concise high-level description of actors by providing statically analyzable information about the behavior of an actor, such as production and consumption rates, to facilitate scheduling, composition, and static checking of actor networks.
Current work includes:
(1) using program transformation techniques for manipulating CAL actor specifications, and compositions of CAL actors;
(2) generating executable code from an actor description, in order to simulate or eventually implement an actor on some target platform;
(3) compatibility checking between connected actors; and
(4) evolving and extending the CAL language itself.
The high level of abstraction possible in component-based modeling offers many advantages, such as simulation speed, the strength of formal models of computation, etc. However, the fundamental weakness of high-level modeling is the difficulty of actual implementation. Traditionally the easiest way to get high performance has been to translate the model by hand into a low-level implementation language. Automatic code generation from the model is sometimes possible by assembling the component specifications, but only with serious performance penalties.
These penalties come from several sources:
We are developing a code generation strategy that attempts to handle these difficulties automatically. The key idea is that we combine code generation from a model with compilation of the code for individual actors. We call this strategy co-compilation. This strategy directly addresses the difficulties above. We parse the code for an actor and specialize it according to its use in a particular model (the types, the particular domain, the values of parameters, and the connections that have been made to it). We can also perform cross-actor optimizations to eliminate or reorganize the structure of a model.
Co-compilation also offers a straightforward path to code generation from heterogeneous models that contain different communication and control strategies organized in a hierarchical structure. We anticipate being able to generate code for a model at one level of the hierarchy and then use the generated code as a component at a higher level of the hierarchy. This can result in reduced overhead as well, since a system designer is not limited to a single model of computation.
We have implemented this code-generation strategy as the Copernicus package, which is part of Ptolemy II. Copernicus parses the Java bytecode for actors, optimizes it, combines it with code generation from the model, and outputs Java bytecode. The resulting generated code is currently useful for high-speed compiled code simulation. We are currently exploring how to generate code for embedded architectures and for FPGAs.
A large number of embedded systems applications require the coordination of physically distributed components or networked embedded sub-systems. Distributing system components across a network can improve the robustness of a system and simplify its architecture by allowing components to run concurrently and independently. However, designing and implementing these systems can be complex and difficult.
In recent years, several middleware technologies, such as the Common Object Request Broker Architecture (CORBA) and the Distributed Component Object Model (DCOM), have been introduced. They are based on object-oriented APIs that offer a level of abstraction and facilitate distributed systems development, but their programming models are too liberal to analyze their formal properties. It is difficult to port an application developed using one particular middleware technology to another.
In this project, we are interested in developing a framework for distributed software systems based on the Ptolemy II actor-oriented infrastructure. We will explore the design of common actor interfaces on top of today's object-oriented middleware: actor components that comply with such interfaces can be deployed in different middleware environments. We will also explore the concept of models of computation in distributed systems and adapt the current Ptolemy II execution model to distributed architectures. By doing so, it will be possible to provide guarantees on the behavior of the distributed systems, rather than having properties emerge as a by-product of a specific middleware implementation.
In this project, we are trying to provide a solution to support different cellular-network standards, namely GSM and 3G standards of WCDMA, TD-SCDMA, and CDMA2000. Our approach is to use an actor-oriented system-on-chip (SoC) architecture, where a number of small DSP cores are developed to support various programmable functions. The idea behind this is to develop a hardware platform for wireless communications, which can be used to derive various SoC products by changing only the software.
We begin by looking at the RAKE receiver, as it is the key part of CDMA systems. We also focus initially on two standards, WCDMA and TD-SCDMA, which appear to be quite popular and promising in both research and the market. The work is planned to progress in the following steps: (1) the study starts with system functional analysis, where we explore a shared block diagram between the two standards; (2) we then develop adaptive algorithms and analyze the system complexity; (3) based on this work, we decide on the system architecture and the software/hardware partitioning, with different levels of simulation at each step to support our ideas; and (4) eventually, the architecture is refined to a fairly optimal point, considering the tradeoffs among factors such as power, speed, and flexibility.
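The core operation of a RAKE receiver is easy to state: despread the received chip stream at each resolvable multipath delay (one "finger" per path) and combine the finger outputs weighted by the path gains (maximal-ratio combining). The toy example below, with an invented 4-chip code and two paths, only illustrates the operation; it is not our system model.

```python
def rake_output(received, code, delays, gains):
    """Toy RAKE combiner: one despreading finger per multipath delay,
    finger outputs combined with maximal-ratio weights."""
    def finger(d):
        seg = received[d:d + len(code)]
        return sum(r * c for r, c in zip(seg, code))   # correlate
    return sum(g * finger(d) for d, g in zip(delays, gains))

code = [1, -1, 1, 1]                     # spreading code, one symbol
symbol = 1
# synthesize the channel output: two paths, delay 0 gain 1.0 and
# delay 2 chips gain 0.5 (noise omitted for clarity)
rx = [0] * 8
for d, g in ((0, 1.0), (2, 0.5)):
    for i, c in enumerate(code):
        rx[d + i] += g * symbol * c

decision = rake_output(rx, code, delays=[0, 2], gains=[1.0, 0.5])
```

A positive decision statistic recovers the transmitted +1 symbol; the second finger adds the energy that a single-path receiver would discard, which is why the RAKE structure is central to both WCDMA and TD-SCDMA.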
This wireless communications project is conducted in cooperation with National Semiconductor.
With the increasing popularity and availability of devices like personal digital assistants (PDAs), pen computing is becoming more familiar to end users. However, many applications are still designed with moded windows, icons, menus, and pointer (WIMP) interfaces that treat pens like pointing devices instead of using them for what they are good for: sketching. We are interested in providing a solid infrastructure and a set of utilities for developing sketch-based user interfaces for a variety of applications.
Sketching is simple and natural and is especially desirable for conceptual design either on an individual basis or in a collaborative environment. By embedding recognition engines in sketch-based programs, the resulting drawings can be interpreted and processed. Various computations can be applied to recognized sketches, therefore fully leveraging the power of electronic design tools while maintaining the ease of sketching.
Currently, we are working on multi-stroke symbol recognition for a class of shapes that are commonly used in slide creation and diagram editing. Our technique is independent of stroke order, number, and direction, as well as invariant to rotation, scaling, translation, and reflection. We take the statistical approach using local features to capture shape information and relative position of strokes, thus utilizing both structural and statistical information learned from examples. Furthermore, the recognition system is adaptive such that it learns upon user correction. It also provides feedback to the user so that the user can better understand why misrecognition took place and can adapt the drawing style accordingly.
System-on-a-chip designs will have a tremendous impact on the efficient management of design data. Time-to-market constraints require hierarchy and component re-use, while the increasing chip complexity, design sizes, and specialization of components will make geographically distributed design teams more likely. While in an ideal world the definition of interfaces would allow these distributed teams to work in parallel on their respective modules with little interaction, in reality a successful design is likely to require intensive interaction and collaboration throughout the design process. Critical to this process will be the ability to build, integrate, and test the distributed design components, which will require a scalable and efficient data caching architecture.
We are working to identify the needs and requirements for an architecture that will provide integration between heterogeneous tools and efficient support for collaborative incremental design. Areas of interest include relaxed transaction models suited towards collaborative work and intelligent, distributed caching of design data. We are combining tool usage metrics, design data characteristics, and likely interactions throughout the design process to build a representative model of EDA data interactions. This model will be used in experiments that will analyze the tradeoffs of different data management techniques in various design scenarios and resource organizations. Our research goal is to find reliable design data management and transactional semantics for large datasets in a distributed, potentially unreliable, network environment.
We aim to develop an efficient OS for complex real-time, power-critical, reactive systems implemented on advanced heterogeneous architectures. An event-driven OS, developed specifically to target reactive event-driven systems, is much more efficient than a traditional general-purpose OS. TinyOS, an existing event-driven OS, offers some very attractive concepts, but is insufficient to fulfill the ambitious software management role demanded. To overcome the limitations of TinyOS, we have proposed an event-driven hierarchical power management framework. The hierarchical structure enhances design scalability, supports concurrency in both the application domain and the architecture, and enables power control at various granularities. The software management framework implements a hybrid power control policy and has the ability to formally devise and implement an optimal power scheduling policy.
Composing intellectual property blocks is an essential element of a design methodology based on re-use. The composition of these blocks when the IPs have been developed by different groups inside the same company, or by different companies, is notoriously difficult. Design rules in the form of interface specifications have been proposed which try to alleviate the problem by forcing the designers to be precise about the behavior of the individual components, and to verify this behavior under a number of assumptions about the environment in which they have to operate. In this work we study the problem of the specification, synthesis, and verification of interfaces for different models of computation and across models of computation. This work supports the Metropolis Meta-Model in the construction of a system integration platform that can integrate heterogeneous models.
Roughly speaking, two agents are compatible if they fit together as they are. In this work we formalize this notion by introducing, for each model of computation, a set of agents that “behave well,” and by deriving a refinement ordering that represents substitutability, called the conformance ordering. We show that under certain assumptions we can represent the set of compatible agents by a single maximal environment, called a mirror. Two agents are compatible if one conforms to the maximal environment of the other.
When components are taken from legacy systems or from third-party vendors, it is likely that their interface communication protocols are not compatible. In this case, an additional adapter can sometimes be introduced to convert from one agent to the other so that their composition is compatible. The notion of a maximal environment allows us to formulate a synthesis algorithm for an adapter that conforms to the specification of a desired communication.
We formulate the problem in particular for synchronous and asynchronous systems. We use an automata-based and a game-theoretic based approach to solving the adaptation problem and to synthesize a converter between two specified protocols.
Static timing analysis is the most widely used approach to critical path detection in combinational circuits. Since traditional criteria used in standard static timing analysis depend on the circuit delay assignment, the analysis has to be repeated after each timing optimization step. A new approach to static timing analysis has recently been proposed. The cornerstone of the method is grouping paths into three categories: strongly perceptible, imperceptible, and weakly perceptible. This categorization allows a definite distinction between paths that impact the circuit delay, and therefore need to be sped up, and those that do not, thus reducing the timing optimization effort. Furthermore, this new approach has the advantage of being independent of the circuit delay assignment; therefore, static timing analysis need not be applied over and over after each timing optimization step. The main drawback of the method is that the criteria for checking which category a path belongs to, as provided by the author, are partially ambiguous and not easy to compute. We have identified new criteria for this path categorization based on the flattening of a circuit into an equivalent normal form (ENF). Our criteria, while being straightforward to check, remove any previous ambiguity, thus allowing the full power of this new path categorization to be exploited.
Sensor networks present many opportunities for changing the way computers interact with the environment. Sensor network devices present major challenges because they must be functional under strict power requirements. Since wireless communication is the largest source of power consumption, we explore efficient algorithms for information gathering that try to minimize the amount of communication while still gathering the requisite amount of data.
Our research focuses on an efficient traversal method to gather information from the network. We assume that nodes have a fixed communication radius. Our model also assumes that every node has complete information about each of its neighbors. In addition, a node can go into sleep mode, and it will not take part in communication while in this state. Given the above constraints, we are attempting to design heuristic methods that will gather information from all nodes while actually visiting a minimal number of nodes.
Our approach involves a graph theoretical formulation coupled with geographic knowledge about the network. In this approach, each node is represented as a vertex and a communication path is represented as an edge. We are investigating the significance of network perimeter nodes, articulation points, and sub-areas as components of this heuristic approach.
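Articulation points, one of the graph components we are investigating, can be found with Tarjan's classic DFS low-link computation. The sketch below applies the standard algorithm to a toy topology; in our setting, such nodes must stay awake, since putting them to sleep would partition the network.

```python
def articulation_points(adj):
    """Tarjan's low-link DFS: return the set of nodes whose removal
    disconnects the graph (adjacency given as dict of lists)."""
    disc, low, cut = {}, {}, set()
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]; timer[0] += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:                       # back edge
                low[u] = min(low[u], disc[v])
            else:                               # tree edge
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent is not None and low[v] >= disc[u]:
                    cut.add(u)                  # u separates v's subtree
        if parent is None and children > 1:
            cut.add(u)                          # root with >1 subtree

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return cut

# two clusters joined only through node 2, with node 4 hanging off 3
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
```

For this topology, nodes 2 and 3 are articulation points: node 2 bridges the {0, 1} cluster to the rest, and node 3 is node 4's only route into the network.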
The complexity of systems-on-a-chip design requires an aggressive re-use of intellectual property (IP) circuit blocks. However, IP blocks can be safely re-used only if they do not affect other sensitive components. The switching activity of CMOS digital circuit blocks typically produces high-frequency current noise both in the Gnd/Vdd system and in the substrate of integrated circuits. Such currents can potentially affect circuit reliability and the performance of other sensitive components. For instance, the Gnd/Vdd currents may produce electromigration, IR voltage drops, voltage oscillations due to resonances, and, possibly, electromagnetic interference. The substrate currents may couple noise to sensitive analog circuitry through the body effect or direct capacitive coupling. An analysis of current injection due to switching activity is needed to properly account for all such effects during the design phase. Different effects require different types of current injection models. For instance, power consumption analysis requires time-domain average current estimation over several clock periods. IR drop, electromigration, and timing performance analysis require a time-domain noise current upper bound with respect to all possible combinations of the inputs. Signal integrity, Gnd/Vdd grid resonances, electromagnetic interference, and substrate coupling on mixed-signal ICs, on the other hand, require an upper bound on the spectrum of the current injected into the Gnd/Vdd system or into the substrate, respectively. Methodologies developed so far do not address this latter kind of analysis. Therefore, for these problems we are developing a methodology for estimating an upper bound, over all possible input combinations, on the spectrum of the noise current injected by the switching activity of digital blocks.
The methodology identifies an upper bound for the spectrum of a circuit noise current by combining the noise current injected by each gate and accounting for the circuit logic functionality.
The design of protocols, i.e., the sets of rules that define how communication among two or more components of a distributed system takes place, requires the use of formal techniques and tools to obtain a correct and efficient implementation. Ulysses is a new approach to protocol synthesis from scenario-based specifications. It allows the designer to specify sequences of interactions, called scenarios, using message sequence nets (MSNs), a model that includes message sequence charts (MSCs) and explicitly shows the relations of ordering, concurrency, or conflict among multiple MSCs. It uses Petri nets (PNs) as an underlying model to formally define the semantics of MSNs and to specify the internal protocol behavior. A covering procedure is defined that uses common-case patterns and derives a PN model consistent with the original MSN. Independently specified scenarios are then composed to derive a minimum-cost implementation.
Some applications are so critical that they need to be resilient to faults in the computing architecture. Typically this is achieved by redundantly scheduling the application on the architecture. Starting from a network of processes, some or all of the processes and the data they exchange are replicated. Additional processes may be needed for voting on the results of different replicas of the same process to establish a common result: the consensus problem. We must then devise an assignment and schedule of the augmented network onto the distributed architecture.
It seems profitable to relieve the designer from the burden of devising a fault-tolerant distributed schedule, and opt for an approach based on synthesis.
In order to use resources efficiently, we want to allow the flexible use of passive replicas (replicas of a process that run only when the main replica undergoes a fault). Preliminary results have shown the usefulness of this technique in achieving higher schedulability by "reclaiming" resources from non-running replicas. A further avenue of improvement may arise in the context of gracefully degrading applications, where replicas are not exact copies of the original process. Rather, they may be simpler versions with reduced functionality and/or accuracy, and likely fewer resource requirements. This exposes an opportunity to achieve higher schedulability by requiring strong fault resilience only of the lightweight versions.
Moreover, we want to allow general architectures, removing the strict boundaries of the modules and busses found in the TTA. This also enables more general fault models for the communication subsystem. The resulting architecture is a full-fledged distributed multiprocessor system, where each node can be used per se and not as a mere duplicate of another. All the parallelism in the hardware can then be exploited to speed up the execution of parallel processes of the application [4,5] without affecting the degree of fault tolerance. We note that most of the results cited above have been derived under very restrictive assumptions on the fault model. We believe some of their founding principles can be rephrased in a more general framework. The expected outcome of this research is a systematization of a set of design techniques that allows easy exploration of the design alternatives arising from different fault models.
A new platform library in Metropolis allows the use of the fault tolerant data flow (FTDF) model of computation. FTDF is amenable to specifying periodic feedback control applications. It is structured in such a way that it is possible to analyze it formally and to use automatic and semi-automatic synthesis techniques for obtaining an efficient fault-tolerant implementation of the application under design. The use of the platform is illustrated by a simple controller example: an inverted pendulum stabilizer.
Redundancy can be easily introduced using the Metropolis refinement mechanism. Finally it is possible to simulate and to inject faults in the design. Some of the properties of FTDF include self-synchronization and automatic degradation in case of faults.
Outside Adviser (non-EECS), Cadence Berkeley Labs
Boolean satisfiability (SAT) has many applications in the area of electronic design automation. In this project, a genuine multi-valued (MV) SAT solver, CAMA, is developed. Unlike encoding approaches, it deals directly with MV variables. We extended the major speed-up techniques used in state-of-the-art binary SAT solvers, such as two-literal-watch-based Boolean constraint propagation (BCP), conflict-based learning with identification of the unique implication point (UIP), and non-chronological backtracking. An efficient conflict analysis algorithm for MV SAT is developed. It makes the learned clauses as compact as possible, which has a dramatic impact on overall performance. Finally, the performance of CAMA is compared to that of Chaff, one of the best binary SAT solvers known so far, using a one-hot encoding scheme. Our experimental results show that for hard MV SAT problems, CAMA outperforms Chaff. Our analysis also reveals some fundamental reasons why an MV SAT solver has inherent advantages over a binary SAT solver.
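For reference, the one-hot encoding used in the comparison maps each MV variable to one Boolean per value, plus at-least-one and pairwise at-most-one clauses; the encoder below is an illustrative sketch (not CAMA or the exact scheme used in the experiments) that makes the clause-count overhead visible.

```python
from itertools import combinations

def one_hot_encode(domains, mv_clauses):
    """Encode an MV SAT instance into binary CNF: one Boolean b[x][v]
    per (variable, value), exactly-one clauses per MV variable, and
    each MV literal 'x takes a value in S' expanded into the
    disjunction of its value bits.  Positive integers are used as
    DIMACS-style literals; negation is arithmetic negation."""
    bit, n = {}, 0
    for x, vals in domains.items():
        for v in vals:
            n += 1
            bit[(x, v)] = n
    cnf = []
    for x, vals in domains.items():
        cnf.append([bit[(x, v)] for v in vals])            # at least one
        cnf += [[-bit[(x, u)], -bit[(x, v)]]               # at most one
                for u, v in combinations(vals, 2)]
    for clause in mv_clauses:    # clause: list of (variable, allowed values)
        cnf.append([bit[(x, v)] for x, vals in clause for v in vals])
    return cnf

# x over {a,b,c}, y over {a,b}; one MV clause: (x in {a,c}) or (y = b)
domains = {"x": "abc", "y": "ab"}
cnf = one_hot_encode(domains, [[("x", "ac"), ("y", "b")]])
```

The at-most-one constraints alone contribute a number of binary clauses quadratic in each domain size, structural overhead that a native MV solver such as CAMA never has to propagate through.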
System-level analog design is a process largely dominated by heuristics. Given a set of specifications/requirements that describe the system to be realized, the selection of an optimal implementation architecture comes mainly out of experience. Usually, what is achieved is just a feasible point at the system level, while optimality is sought locally at the circuit level. The reason for this is the difficulty, in the analog world, of knowing whether something is realizable without actually attempting to design the circuit. The number of effects to consider and their complex interrelations make this problem approachable only through the experience coming from past designs. One of the main problems is that optimization at the system level is very hard because of the difficulties in abstracting single-block behaviors at the system level and in estimating their performance (and feasibility) without further knowledge of the implementation architecture and topology.
The basic idea of the proposed methodology is to extend the concept of platforms developed in the digital world to analog contexts. By an analog platform (AP) we mean an architectural resource used to map analog functionalities during early system-level exploration phases and to constrain exploration to the feasible space of the considered implementation (or set of implementations). Therefore, an AP consists of a set of behavioral models and of matching performance models. Behavioral models are abstracted, parameterized models of the analog functionality which introduce at the functional level a number of non-idealities attributable to the implementation and not intrinsic to the functionality, such as distortion, bandwidth limitations, and noise.
We propose applying the AP paradigm to design cases from the BWRC and extending the methodology to capture more accurate design constraints. The activity should start by encapsulating the block architectures required for the candidate transceiver with the AP abstraction. For each block, a set of behavioral models has to be defined and implemented in some executable form, e.g., AMS languages or Simulink/Matlab. Then, a characterization policy and a sampling strategy have to be set up to generate performance models. Particular attention has to be focused on the interfaces of single blocks so that loading effects are properly considered and modeled. For example, the input stage of the mixer has to be considered when extracting the performance figures for the driving LNA; otherwise, the loading effect present in the real implementation would not be estimated. For each block, different architectures should be modeled and merged in the platform stack. This provides a level of abstraction in the platform stack where, for example, two or more LNA topologies are represented by the same functional model and by the union of the respective performance models. When operating at this level, a platform instance selection intrinsically performs architecture selection, choosing a performance set achieved by architecture a rather than architecture b or c. At the system level, system optimization will be carried out using a non-linear optimization technique based on behavioral simulations. Possible candidates are simulated annealing and genetic algorithms, which are expected to perform well given the limited number of optimization parameters available at the system level. The challenging design of the reconfigurable radio-mobile terminal should emphasize the strengths and/or weaknesses of APs in evaluating tradeoffs over a wide range of opportunities.
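A minimal sketch of the optimization step follows. The block names, performance points, system specification, and cost function are all invented for illustration; a real AP flow would evaluate candidates through behavioral simulation rather than a lookup table. The sketch shows simulated annealing selecting one performance point per block, where each block's table is the union of points from its candidate architectures, so picking a point implicitly picks an architecture.

```python
import math
import random

random.seed(0)  # deterministic for reproducibility

# Hypothetical performance points (gain_dB, power_mW) per block; each list is
# the union of the points achievable by that block's candidate architectures.
perf = {
    "lna":   [(12, 4.0), (15, 6.5), (18, 9.0)],
    "mixer": [(8, 3.0), (10, 5.0)],
}

def cost(choice):
    """Toy cost: heavily penalize missing a 24 dB system gain spec, then
    minimize total power."""
    gain = sum(perf[b][i][0] for b, i in choice.items())
    power = sum(perf[b][i][1] for b, i in choice.items())
    return max(0, 24 - gain) * 100 + power

def anneal(steps=2000, t0=10.0):
    choice = {b: 0 for b in perf}
    best = dict(choice)
    for k in range(steps):
        t = t0 * (1 - k / steps) + 1e-9          # linear cooling schedule
        b = random.choice(list(perf))            # perturb one block's choice
        trial = dict(choice)
        trial[b] = random.randrange(len(perf[b]))
        d = cost(trial) - cost(choice)
        if d < 0 or random.random() < math.exp(-d / t):
            choice = trial                       # accept downhill, or uphill
        if cost(choice) < cost(best):            # with Boltzmann probability
            best = dict(choice)
    return best

best = anneal()
```

Because the search space at the system level is small (a handful of instances per block), even this naive annealer reliably reaches a configuration that meets the gain specification at low power.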
Metropolis is a design environment for heterogeneous embedded systems. It is more general than other design environments in the sense that it neither targets any particular application domain nor favors any particular model of computation. It defines a language called the Metropolis MetaModel (MMM), which sits one layer above individual models of computation; therefore, any model of computation can be modeled with MMM. In fact, in Metropolis, MMM serves as both the design input and the internal representation.
Starting from the MMM design input, validating the design is the first and most important step. For this purpose, the SystemC-based simulator is widely used in Metropolis. Currently, the SystemC-based simulator can generate simulation traces out of all the possibilities resulting from the non-deterministic nature of MMM. The SystemC-based simulator is still under active development. The next step is the ability to evaluate the performance and check the properties of a design. This requires the handling of function-architecture mapping, quantity resolution, and constraint checking. The first two come from the concept of platform-based design, where function and architecture are completely separated. By simulating the mapping between them, it is possible to evaluate the cost of implementing a function on a particular architecture. Beyond simulation, it is also sometimes desirable to check properties of a design formally. MMM provides such mechanisms, and incorporating them into the SystemC-based simulator would be a valuable extension.
1Visiting Scholar, Italy
2Visiting Industrial Fellow, Intel
3Outside Adviser (non-EECS), Cadence Berkeley Lab
The design of the communication architecture is an essential step in the realization of a system-on-a-chip (SOC). The semiconductor industry is experiencing a paradigm shift from "computation-bound design" to "communication-bound design." The number of transistors that can be reached in a clock cycle, and not those that can be integrated on a chip, will drive the design process. In the context of a communication-based design methodology for SOC, we are developing the idea of constraint-driven communication synthesis. As logic synthesis was the key to the success of ASIC design in the VLSI era, we predict that the automatic synthesis of the on-chip communication architecture will be the key to the success of SOC design in the gigascale integration era. The goal of this research is to start with a system-level view of the SOC communication problem and automatically synthesize the best communication architecture among the SOC modules while considering both high level issues, such as communication topologies, protocol interfacing, resource sharing, as well as low level ones, such as metal layer allocation, wire buffering, and wire pipelining.
The first step in constraint-driven communication synthesis is the specification of the key communication parameters that govern the interactions among the SOC modules as a set of constraints for the synthesis problem. Then, the synthesis process proceeds in two stages. First, the optimal high-level architecture structure is derived while considering the different communication topologies, the possibility of mixing and matching them, the role of resource sharing (e.g., time multiplexing), the choice of the various communication protocols, and the needs of interfacing incompatible protocols. Notice that since the notion of optimality may vary from one design to another, depending on the designers' priorities (area, power, performance, complexity), the cost figure of a communication topology must be a configurable parameter. Once the best communication architecture is established, the focus shifts to finding the best low-level implementation based on the features that are available with the given technology process. In other words, this step corresponds to designing the global interconnect among the SOC modules and includes exploring the characteristics of the various metal layers, the insertion of buffers (stateless repeaters) on long wires, and the pipelining of (even) longer wires by means of latches (stateful repeaters).
The present research project relies on several results from our previous work. In particular, we have recently presented a generic theoretical framework to support constraint-driven communication synthesis, as well as its application to the case of bus-based communication topologies. Also, we plan to incorporate key results from our research on latency-insensitive design [2,3] and interface-based design.
As commercial demand for system-on-a-chip (SOC)-based products grows, the effective reuse of existing intellectual property design modules (also known as IP cores) is essential to meet the challenges posed by deep-submicron (DSM) technologies and to complete a reliable design within time-to-market constraints. An IP core must be both flexible, to collaborate with other modules within different environments, and independent from the particular details of one-among-many possible implementations. The prerequisite for easy trade, reuse, and assembly of IP cores is the ability to assemble predesigned components with little or no effort. The consequent challenge is addressing the communication and synchronization issues that naturally arise while assembling predesigned components. We believe that the semiconductor industry will experience a paradigm shift from computation- to communication-bound design: The number of transistors that a signal can reach in a clock cycle--not the number that designers can integrate on a chip--will drive the design process. The strategic importance of developing a communication-based design methodology naturally follows.
An essential element of communication-based design is the encapsulation of predesigned functional modules within automatically generated interface structures. Such a strategy ensures a correct-by-construction composition of the system. Latency-insensitive design and the recycling paradigm are a step in this direction. High-end microprocessor designers have traditionally anticipated the challenges that ASIC designers are going to face in working with the next process generation. Latency is increasingly affecting the design of state-of-the-art microprocessors (latency, for example, drove the design of so-called drive stages in the new hyperpipelined Netburst microarchitecture of Intel's Pentium 4) and will become a shaping force for SOC architectures. On the one hand, the increasing impact of latency variations will drive architectures toward modular designs with an explicit global latency mechanism. In the case of multiprocessor architectures, latency variation will lead the designers to expose computation/communication tradeoffs to the software compilers. At the same time, the focus on latency will open the way to new synthesis tools that can automatically generate the hardware interfaces responsible for implementing the appropriate synchronization and communication strategies, such as channel multiplexing, data coherency, and so on. All these considerations lead us to propose a design methodology that guarantees the robustness of the system's functionality and performance with respect to arbitrary latency variations.
The theory of latency-insensitive design is the foundation of a new correct-by-construction methodology for designing complex systems by assembling intellectual property components. Latency-insensitive designs are synchronous distributed systems, realized by composing functional modules that exchange data on communication channels according to an appropriate protocol. The protocol works on the assumption that the modules are stallable, a weak condition for them to satisfy. The goal of the protocol is to guarantee that latency-insensitive designs composed of functionally correct modules behave correctly independent of channel latencies. This increases the robustness of a design implementation, because any delay variation on a channel can be "recovered" by changing the channel latency while the overall system functionality remains unaffected. As a consequence, an important application of the proposed theory is the latency-insensitive methodology for designing large digital integrated circuits using deep-submicron technologies.
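The key protocol property can be illustrated with a toy simulation (a deliberate simplification, not the actual protocol machinery): a channel of latency k is modeled as k relay stations initially holding "void" tokens, and a stallable module simply ignores void tokens. The doubling module and the input stream below are invented for illustration.

```python
from collections import deque

VOID = None  # a token carrying no data ("stalling" information)

def run(inputs, latency, cycles):
    """Feed `inputs` through a doubling module placed behind a channel of
    the given latency; return the informative (non-void) outputs."""
    chan = deque([VOID] * latency)   # relay stations, initially empty
    feed = deque(inputs)
    outputs = []
    for _ in range(cycles):
        chan.append(feed.popleft() if feed else VOID)  # source emits or stalls
        token = chan.popleft()                         # token leaving the channel
        if token is not VOID:                          # stallable module skips voids
            outputs.append(2 * token)
    return outputs

out_fast = run([1, 2, 3], latency=0, cycles=10)
out_slow = run([1, 2, 3], latency=4, cycles=10)
```

The stream of informative tokens is identical for both latencies; only its timing differs, which is precisely the sense in which functionality is preserved under arbitrary channel-latency variations.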
Electromagnetic interference (EMI) is becoming a critical issue in the design of modern electronic systems. Frequencies are continuously increasing, while integration trends are squeezing entire systems into extraordinarily high circuit densities. In the conventional design methodology, electromagnetic compatibility (EMC) issues are addressed only after a prototype is built. At that stage, the traditional EMC remedies are confined to adding extra components, metal shields, or metal planes, or even redesigning the entire system, with a potentially significant impact both on the cost and on the time-to-market of the products. It is our firm opinion that EMC must instead be addressed as early as possible during the design phase, not after it. EMI simulation could shorten the design time by allowing the analysis of circuits before manufacturing.
Large and expensive electronic equipment can effectively employ standard EMC solutions. But smaller, high-volume, low-priced electronic devices, automotive electronics, and most embedded systems cannot afford those costly solutions. Such devices are the most susceptible to EMI problems and the most sensitive to post-prototype EMC added costs. Unfortunately, they are also the most difficult to simulate with respect to EMI problems, since they require "full-board" capabilities, and no tool is available to date for this task.
Full-board EMI simulation capability would therefore be instrumental in promptly verifying EMC compliance during the very first stages of the design of high-speed PCBs and IC-packages. For this application, we are investigating the use of integral equation methods:
To address the first specification efficiently, we propose to use an adjoint method to calculate a set of transfer functions from the thousands of circuit inputs (e.g., the pins of the ICs) to the tens of observation points on the measurement sphere.
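The computational advantage can be sketched in a few lines. With n_in inputs but only n_out observation points, the transfer matrix C A^{-1} B requires only n_out solves of the transposed (adjoint) system, one per observation point, instead of n_in solves of the direct system, one per input. The matrices below are random stand-ins for the actual circuit matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_in, n_out = 50, 1000, 3                       # states, inputs, observation points
A = rng.standard_normal((n, n)) + n * np.eye(n)    # well-conditioned system matrix
B = rng.standard_normal((n, n_in))                 # input map (e.g., IC pins)
C = rng.standard_normal((n_out, n))                # observation map (field points)

# Direct approach: one linear solve per input column -> n_in solves.
H_direct = C @ np.linalg.solve(A, B)

# Adjoint approach: one solve of A^T x = c_i per observation row -> n_out solves;
# each adjoint solution x gives a full row of the transfer matrix as x^T B.
H_adjoint = np.linalg.solve(A.T, C.T).T @ B
```

Both computations yield the same n_out-by-n_in transfer matrix; the adjoint route is the one that scales when inputs vastly outnumber observation points.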
To address the second specification, we are working on the generation of "dynamical linear systems" that mimic the behavior of the original interconnect structure but have a state vector orders of magnitude smaller. In some recent papers of ours, we show how to generate guaranteed-passive reduced-order models from originally passive structures including dielectrics, and for distributed systems with frequency-dependent matrices (as generated, for instance, when accounting for substrate effects or for full-wave effects). Finally, we are developing "parameterized" model order reduction techniques to enable optimization of interconnect structures.
We address the problem of synthesizing real-time embedded controllers, taking into account constraints deriving from the implementation platform. Specifically, we address the problem of controlling a set of independent systems using a single shared processor board. Assuming a time-triggered model of computation for control processes and a real-time preemptive scheduling policy, we formulate the problem of optimizing the robustness of the controlled systems. The decision variables of the optimization problem are the processes' activation periods and the feedback gains. The analytical formulation of the problem enables an efficient numerical solution based on a branch-and-bound scheme. The resulting design is directly implementable without the performance degradation that may otherwise arise due to scheduling and execution delays.
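The branch-and-bound structure can be sketched on a toy instance. Everything here is invented for illustration: a simple utilization bound stands in for the actual schedulability analysis, robustness is modeled as a placeholder function decreasing in the activation period, and feedback gains are omitted. The sketch only shows the search mechanics: branch on each process's discrete period choice, keep an incumbent, and prune subtrees whose optimistic bound cannot beat it.

```python
WCET = [2, 3]                   # worst-case execution times (ms), invented
PERIODS = [[5, 10, 20]] * 2     # candidate activation periods (ms) per process

def robustness(i, p):
    return 1.0 / p              # placeholder: faster sampling = more robust

def bound(i, chosen):
    """Optimistic bound: processes not yet assigned get their best robustness."""
    r = sum(robustness(j, p) for j, p in enumerate(chosen))
    return r + sum(max(robustness(j, p) for p in PERIODS[j])
                   for j in range(i, len(WCET)))

best = {"r": -1.0, "sol": None}  # incumbent solution

def bnb(i=0, chosen=(), util=0.0):
    if bound(i, chosen) <= best["r"]:
        return                               # prune: cannot beat the incumbent
    if i == len(WCET):                       # leaf: record the incumbent
        best["r"] = sum(robustness(j, p) for j, p in enumerate(chosen))
        best["sol"] = chosen
        return
    for p in PERIODS[i]:                     # branch on process i's period
        u = util + WCET[i] / p
        if u <= 1.0:                         # toy schedulability test (EDF bound)
            bnb(i + 1, chosen + (p,), u)

bnb()
```

On this instance both processes can run at their fastest period (total utilization exactly 1.0), so the search returns periods (5, 5); with tighter execution times the pruning step is what keeps the enumeration tractable.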
1Postdoctoral Researcher, ReTiS Lab-Scuola Superiore S. Anna, Italy
2Professor, Università di Pisa, Italy
Embedded systems often exhibit behaviors that are best described using different formalisms and models of computation. While each model offers specialized and efficient techniques to solve particular problems in their domain of application, the interaction across models is made more difficult by their differences. As a result, this interaction is not yet well understood and is often not formally defined. In this project we seek to determine a consistent way to define interactions across different models.
We regard a system as the composition of computational domains. Each computational domain is an island in which the semantics of the coordination and communication among agents is defined, and differs from the semantics of other domains. This definition is generic and covers many possible applications: from communication between models of computation as mathematical domains, to co-simulation of applications described in different programming languages.
We use the framework of trace algebra to describe the semantic domain of each computational domain. We then investigate ways to compose domains in order to obtain additional hybrid domains that exhibit properties from their components. Constructions of this kind include products of algebras and their disjoint sums. We then study the composition of heterogeneous agents (agents drawn from different computational domains) in the context of a hybrid domain, and define identity elements with respect to the operation of composition. The objective of this research is to derive correct conversion mechanisms, or communicators, from the identities in the hybrid domain.
System level designs often include a wide range of hardware and software components (e.g., control logic, data paths, sensors, actuators, drivers, and data-flow engines). Such different components can be most naturally formalized using different models of computation (MoCs). Heterogeneous systems are those systems that are best described with two or more different MoCs. Typically, a formal methodology is only applicable to a particular MoC. The objective of this research is to develop semantic foundations that can be applied to heterogeneous systems involving an unlimited number of MoCs. This is crucial for the development of a system integration platform, like the Metropolis Meta-Model, that acts as the central repository of heterogeneous information (similar to the Genesis database for layout information). In particular, the ability to formally reason about the interaction of different MoCs is fundamental for system level design and for platforms to establish a solid foundation for analysis, including formal verification, synthesis, and simulation.
Informally, the semantics of a model of computation are given by a function that maps the model to a set of agents, called the semantic domain. In this work we develop a framework in which we can express semantic domains in a form that is close to their natural formulation (i.e., the form that is most convenient for a given model of computation), and yet structured enough to give us results that apply regardless of the particular domain in question. For each domain, we define functions that formalize concepts such as parallel and sequential composition of agents, scoping, and instantiation that are naturally expressed in the model of computation. Each semantic domain is also equipped with an order on the agents that represents a refinement relation. The structure composed of the set of agents and the functions and relations over the set of agents is called an agent algebra.
An important factor in the design of heterogeneous systems is the ability to flexibly use different levels of abstraction. In our work we emphasize the relationships that can be constructed between different semantic domains at different levels of abstraction, and how they are related to the functions and the refinement relation defined on the domain. The concept of a conservative approximation is introduced as a class of functions that preserve the notion of refinement across different semantic domains. These functions are powerful tools that we can use to study the problem of the integration of heterogeneous domains. In particular, two heterogeneous models can be composed and studied in the context of a third semantic domain that is detailed enough to refine both. Abstraction can be used again to convert the results in the respective domains.
Processor design today is very much an art form. The architect comes up with a design to support an instruction set, creates a high-level simulator, and then passes it on to the design team to implement. This process makes it hard for the architect to explore the design space and requires the design team to rewrite much of the architect's model. In this research we present a communication-aware methodology for turning a description of an instruction set architecture (ISA) into a potentially complicated microarchitectural implementation. The three major goals of this work are reusability, correct-by-construction designs, and the automatic generation of control and binding logic.
Previous work has produced simple in-order pipelines along with their associated control logic. Through the methodology, we aim to achieve more complicated features such as out-of-order execution, speculation, and caching. Instead of aiming for automatic synthesis, we break the process into a series of steps, allowing the design decisions to be considered one at a time and viewed separately from the functionality of the system. Each step transforms the design from the previous step through refinement, mapping, or relaxation.
To enable design reusability and system integration, we apply the communication-based design (CBD) methodology to analog design, i.e., CBD-A. In CBD, formal specifications are used, and communication adapters are introduced to deal with mismatches between communicating blocks without modifying the incompatible blocks. Such adapters enable the interaction between the incompatible blocks. In CBD-A, similar adapters are inserted between analog blocks. To provide suitable adapters, a formal description of the analog interface is required, covering DC level, dynamic range, driving/loading capabilities, biasing requirements, and so on. On the other hand, since the analog block acting as an adapter introduces non-ideal effects that degrade the signal (e.g., SNR degradation), we need to know the specifications of the adaptation block in terms of inserted noise, distortion, CMRR, etc.
Currently, we are exploring CBD-A based on an available hardware platform, the field programmable analog array (FPAA). Its distinguishing feature is that it implements analog functions in a reconfigurable architecture, making it the analog equivalent of an FPGA. The FPAA is based on fully differential CMOS switched-capacitor technology with an analog switch fabric.
To implement analog interface adapters with the FPAA, dedicated simulation strategies based on FPAA behavioral models are being developed to ensure fast and accurate estimation of the non-ideal effects introduced by specific FPAA configurations. Another part of this research is to formalize the analog interface specification from the electrical point of view and also to formalize the signal degradation. We plan to develop a synthesis mechanism to map analog system descriptions and interface specifications onto an FPAA platform.