I am a Professor in the Computer Science Division of the EECS Department at the University of California, Berkeley. My main research areas are computer architecture, VLSI design, parallel programming and operating system design. I am Director of the new ASPIRE lab tackling the challenge of improving computational efficiency now that transistor scaling is ending. ASPIRE builds upon the earlier success of the Par Lab, whose goal was to make parallel programming accessible to most programmers. I also lead the Architecture Group at the International Computer Science Institute, am an Associate Director at the Berkeley Wireless Research Center, and hold a joint appointment with the Lawrence Berkeley National Laboratory. Previously at MIT, I led the SCALE group, investigating advanced architectures for energy-efficient high-performance computing.
Active Research Projects
The ASPIRE Lab

ASPIRE is a new 5-year research project that recognizes the shift from transistor-scaling-driven performance improvements to a new post-scaling world where whole-stack co-design is the key to improved efficiency. Building on the success of the soon-to-be-completed Par Lab project, it uses deep hardware and software co-tuning to achieve the highest possible performance and energy efficiency for future mobile and rack computing systems.
The Parallel Computing Laboratory (Par Lab)

With the end of sequential processor performance scaling, multicore processors provide the only path to increased performance and energy efficiency in all platforms from mobile to warehouse-scale computers. The Par Lab was created by a team of Berkeley researchers with the ambitious goal of enabling "most programmers to be productive writing efficient, correct, portable SW for 100+ cores & scale as cores increase every 2 years".
Graph Algorithm Platform

Graph algorithms are becoming increasingly important, from warehouse-scale computers reasoning about vast amounts of data for analytics and recommendation applications to mobile clients running recognition and machine-learning applications. Unfortunately, graph algorithms execute inefficiently on current platforms, either shared-memory systems or distributed clusters. The Berkeley Graph Algorithm Platform (GAP) Project is a Par Lab project that spans the entire stack, aiming to accelerate graph algorithms through software optimization and hardware acceleration.
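One source of the inefficiency noted above is that a single traversal strategy rarely suits every phase of a graph algorithm. The sketch below illustrates the general idea of a breadth-first search that switches between top-down and bottom-up steps depending on frontier size; the heuristic, threshold `alpha`, and function names are illustrative only, not GAP's actual code.

```python
def bfs_hybrid(adj, source, alpha=4):
    """Breadth-first search over an undirected graph (adjacency dict)
    that switches direction per level: when the frontier is large, it
    is cheaper for unvisited vertices to scan for a visited parent
    than for the frontier to scan all its outgoing edges."""
    depth = {source: 0}
    frontier = {source}
    level = 0
    n = len(adj)
    while frontier:
        level += 1
        if len(frontier) * alpha < n - len(depth):
            # Top-down step: expand edges out of the frontier.
            nxt = {w for v in frontier for w in adj[v] if w not in depth}
        else:
            # Bottom-up step: each unvisited vertex looks for a parent
            # already in the frontier (works because edges are symmetric).
            nxt = {v for v in adj if v not in depth
                   and any(w in frontier for w in adj[v])}
        for v in nxt:
            depth[v] = level
        frontier = nxt
    return depth
```

The per-level switch captures why whole-stack tuning matters: the best data structure and memory-access pattern change with the shape of the frontier, not just with the graph.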
A Liquid Thread Environment

Applications built by composing different parallel libraries perform poorly when those libraries interfere with one another by obliviously using the same physical cores, leading to destructive resource oversubscription. Lithe was developed in Par Lab as a low-level substrate that provides basic primitives and a standard interface for composing parallel libraries efficiently. Lithe can be inserted underneath the runtimes of legacy parallel libraries to provide bolt-on composability without needing to change existing application code.
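The oversubscription problem can be made concrete with a toy model: instead of each library greedily spawning one worker per core, libraries ask a shared substrate for cores and return them when done. The class and method names below (`CoreSubstrate`, `request`, `release`) are hypothetical illustrations, not Lithe's real interface.

```python
class CoreSubstrate:
    """Toy model of a shared core-allocation substrate: libraries
    cooperatively request and release physical cores instead of each
    assuming it owns the whole machine."""
    def __init__(self, num_cores):
        self.free = list(range(num_cores))

    def request(self, n):
        # Grant up to n cores; the caller adapts to what it receives.
        grant, self.free = self.free[:n], self.free[n:]
        return grant

    def release(self, cores):
        self.free.extend(cores)

substrate = CoreSubstrate(num_cores=8)

# Two composed libraries share cores instead of each creating 8 threads
# and oversubscribing the machine with 16 workers on 8 cores.
blas_cores = substrate.request(6)   # e.g. a BLAS call inside the app ...
fft_cores = substrate.request(6)    # ... while an FFT library gets the rest
assert len(blas_cores) + len(fft_cores) <= 8
substrate.release(blas_cores)       # cores flow back when the BLAS call ends
```

The second request receives only the two remaining cores, so the machine is never oversubscribed; the library is expected to scale its work to the grant.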
Tessellation OS

Tessellation is a manycore OS developed within Par Lab and targeted at the resource management challenges of emerging client devices. Tessellation is built on two central ideas: Space-Time Partitioning and Two-Level Scheduling.
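The two ideas fit together: the OS carves cores into partitions over time slices, and each partition ("cell") runs its own second-level scheduler. The sketch below is a hypothetical illustration of that split, not Tessellation's actual API.

```python
class Cell:
    """Toy model of a Tessellation-style cell: an isolated container of
    threads that schedules itself on whatever cores it is granted."""
    def __init__(self, name, threads):
        self.name, self.threads = name, threads

    def schedule(self, cores):
        # Second level: the cell maps its own threads onto its cores.
        return {t: cores[i % len(cores)] for i, t in enumerate(self.threads)}

def partition(cells, num_cores, quanta):
    """First level: give each cell a dedicated spatial partition of
    cores for each time quantum (even split for simplicity)."""
    per_cell = num_cores // len(cells)
    plan = []
    for q in range(quanta):
        start = 0
        for cell in cells:
            cores = list(range(start, start + per_cell))
            plan.append((q, cell.name, cell.schedule(cores)))
            start += per_cell
    return plan

cells = [Cell("gui", ["t0", "t1", "t2"]), Cell("audio", ["t3"])]
plan = partition(cells, num_cores=4, quanta=2)
```

Because each cell owns its partition for the quantum, its internal scheduling decisions cannot be disrupted by threads from other cells.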
The RISC-V Instruction Set Architecture

RISC-V is a new instruction set architecture (ISA) developed at UC Berkeley as part of Par Lab. RISC-V is designed to be a realistic, clean, and open ISA that is easy to extend for research or subset for education. A wide variety of implementations have been produced, including silicon fabrications and FPGA emulations, and RISC-V is being used in a number of classes. A full set of software tools for the architecture is also under development and is being prepared for open distribution.
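The "clean and easy to subset" claim is visible in the instruction encoding: all fields sit at fixed bit positions, so a decoder is a handful of shifts and masks. A minimal sketch covering only the RV32I integer arithmetic opcodes:

```python
def decode_rv32i(insn):
    """Decode a 32-bit RV32I instruction word. Field positions follow
    the RISC-V base encoding; only the OP-IMM and OP (R-type) opcodes
    are handled in this sketch."""
    opcode = insn & 0x7F
    rd     = (insn >> 7)  & 0x1F
    funct3 = (insn >> 12) & 0x07
    rs1    = (insn >> 15) & 0x1F
    if opcode == 0x13:                       # OP-IMM, e.g. addi
        imm = insn >> 20
        if imm & 0x800:                      # sign-extend the 12-bit immediate
            imm -= 0x1000
        return ("addi" if funct3 == 0 else "op-imm", rd, rs1, imm)
    if opcode == 0x33:                       # OP, e.g. add/sub
        rs2 = (insn >> 20) & 0x1F
        funct7 = insn >> 25
        name = "sub" if (funct7 == 0x20 and funct3 == 0) else "add"
        return (name, rd, rs1, rs2)
    return ("unknown", opcode)

# 0x00500093 encodes "addi x1, x0, 5"
assert decode_rv32i(0x00500093) == ("addi", 1, 0, 5)
```

Because every base-ISA field lands in the same place, an educational subset or a research extension reuses this same decode skeleton unchanged.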
Resiliency for Extreme Energy Efficiency

Manycore hardware designs can achieve maximum energy efficiency when operated across a broad range of supply voltages, spanning from nominal down to near the transistor threshold. We are working on new circuit and architectural techniques that enable parallel processors to work across this broad supply range while tolerating technology variability and providing immunity to soft and hard errors.
Constructing Hardware in a Scala Embedded Language

Chisel is a new open-source hardware construction language developed at UC Berkeley that supports advanced hardware design using highly parameterized generators and layered domain-specific hardware languages. Chisel is embedded in the Scala programming language, which raises the level of hardware design abstraction by providing concepts including object orientation, functional programming, parameterized types, and type inference.
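Chisel generators are written in Scala; purely to convey the "parameterized generator" idea in a language-neutral way, here is a toy Python function that emits a ripple-carry adder of any width as Verilog text. Real Chisel generators are far richer (typed hardware graphs, not strings), so treat this only as an illustration of a program that produces hardware.

```python
def ripple_adder_verilog(n):
    """Toy hardware generator: emit Verilog for an n-bit ripple-carry
    adder. One parameter (n) yields a whole family of designs."""
    lines = [
        f"module adder{n}(input [{n-1}:0] a, b,"
        f" output [{n-1}:0] s, output cout);",
        f"  wire [{n}:0] c;",
        "  assign c[0] = 1'b0;",
    ]
    for i in range(n):
        # One full adder per bit, unrolled by the generator loop.
        lines.append(f"  assign s[{i}] = a[{i}] ^ b[{i}] ^ c[{i}];")
        lines.append(
            f"  assign c[{i+1}] = (a[{i}] & b[{i}])"
            f" | (c[{i}] & (a[{i}] ^ b[{i}]));")
    lines.append(f"  assign cout = c[{n}];")
    lines.append("endmodule")
    return "\n".join(lines)
```

Calling `ripple_adder_verilog(4)` and `ripple_adder_verilog(64)` yields two correct designs from one description, which is the productivity win that parameterized generators provide.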
Monolithically Integrated CMOS Photonics

In a collaboration with MIT, the University of Colorado at Boulder, and Micron Technology, we are exploring the use of silicon photonics to provide high-bandwidth, energy-efficient links between processors and memory.
DEGAS: Dynamic Exascale Global Address Space Programming Environments

The Dynamic, Exascale Global Address Space programming environment (DEGAS) project will develop the next generation of programming models, runtime systems, and tools to meet the challenges of Exascale systems.
DHOSA: Defending Against Hostile Operating Systems

The DHOSA research project focuses on building systems that will remain secure even when the operating system is compromised or hostile. DHOSA is a collaborative effort among researchers from Harvard, Stony Brook, UC Berkeley, the University of Illinois at Urbana-Champaign, and the University of Virginia.
Earlier Projects at UC Berkeley
RAMP: Research Accelerator for Multi-Processors

The RAMP project was a multi-university project to develop new techniques for efficient FPGA-based emulation of novel parallel architectures, thereby overcoming the multicore simulation bottlenecks facing computer architecture researchers. At Berkeley, prototypes included the 1,008-processor RAMP Blue system and the RAMP Gold manycore emulator.
Earlier Projects from the MIT SCALE Group
The Scale Vector-Thread Microprocessor

The Scale microprocessor introduced a new architectural paradigm, vector-threading, which combines the benefits of vector and threaded execution. The vector-thread unit can smoothly morph its control structure from vector-style to threaded-style execution.
Transactional Memory

In many dynamic thread-parallel applications, lock management is the source of much programming complexity as well as space and time overhead. We are investigating possible practical microarchitectures for implementing transactional memory, which provides a superior solution for atomicity that is much simpler to program than locks, and which also reduces space and time overheads.
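The programming-model contrast is easiest to see in software. The sketch below is a minimal optimistic software-TM (illustrative only; the project above concerns hardware microarchitectures): a transaction runs against a private read/write log, then validates versions and commits atomically, retrying on conflict. Note that the programmer just writes straight-line code with no lock-ordering discipline.

```python
import threading

class TVar:
    """A transactional variable: a value plus a version counter."""
    def __init__(self, value):
        self.value, self.version = value, 0
        self.lock = threading.Lock()

def atomically(txn):
    """Run txn(read, write) atomically: buffer writes, validate read
    versions at commit, retry the whole transaction on conflict."""
    while True:
        reads, writes = {}, {}
        def read(tv):
            if tv in writes:              # read-your-own-writes
                return writes[tv]
            reads[tv] = tv.version
            return tv.value
        def write(tv, v):
            writes[tv] = v
        txn(read, write)
        # Lock involved variables in a global order, validate, commit.
        involved = sorted(set(reads) | set(writes), key=id)
        for tv in involved:
            tv.lock.acquire()
        try:
            if all(tv.version == ver for tv, ver in reads.items()):
                for tv, v in writes.items():
                    tv.value, tv.version = v, tv.version + 1
                return
        finally:
            for tv in involved:
                tv.lock.release()
        # A version changed under us: conflict, so loop and retry.

a, b = TVar(10), TVar(0)
def move(read, write):          # transfer 3 units from a to b
    write(a, read(a) - 3)
    write(b, read(b) + 3)
atomically(move)
```

With locks, `move` would need a deadlock-avoidance order agreed on by every caller; here atomicity is declared once and conflicts are handled by retry.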
Low-Power Microprocessor Design

We have been developing techniques that combine new circuit designs and microarchitectural algorithms to reduce both switching and leakage power in components that dominate energy consumption, including flip-flops, caches, datapaths, and register files.
Energy-Exposed Instruction Sets

Modern ISAs, whether RISC or VLIW, expose to software only those properties of the implementation that affect performance. In this project we are developing new energy-exposed hardware-software interfaces that also give software fine-grain control over energy consumption.
Mondriaan Memory Protection

Mondriaan memory protection (MMP) is a fine-grained protection scheme that allows multiple protection domains to flexibly share memory and export protected services. In contrast to earlier page-based systems, MMP allows arbitrary permissions control at the granularity of individual words.
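Word-granularity permissions can be pictured as a per-domain table indexed by word address rather than by page. The sketch below is a deliberately naive model of that idea; real MMP uses a compressed multi-level permissions table walked by hardware, not a flat dictionary.

```python
NONE, READ, WRITE = 0, 1, 2   # permission levels (WRITE implies READ here)
WORD = 4                      # bytes per word

class PermTable:
    """Toy per-domain, per-word permission table in the spirit of MMP."""
    def __init__(self):
        self.perms = {}       # domain -> {word_index: permission}

    def grant(self, domain, addr, length, perm):
        # Mark every word overlapping [addr, addr+length) for this domain.
        for w in range(addr // WORD, (addr + length + WORD - 1) // WORD):
            self.perms.setdefault(domain, {})[w] = perm

    def check(self, domain, addr, perm):
        return self.perms.get(domain, {}).get(addr // WORD, NONE) >= perm

pt = PermTable()
pt.grant(domain=1, addr=0x1000, length=8, perm=WRITE)  # owner: two words
pt.grant(domain=2, addr=0x1000, length=8, perm=READ)   # shared read-only
assert pt.check(1, 0x1004, WRITE)
assert pt.check(2, 0x1004, READ) and not pt.check(2, 0x1004, WRITE)
assert not pt.check(2, 0x1008, READ)    # one word past the grant
```

The key contrast with page-based protection is in the last assertion: an 8-byte grant really ends at the 8-byte mark, with no rounding up to a 4 KB page that would over-share neighboring data.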
Highly Parallel Memory Systems

We are investigating techniques for building high-performance, low-power memory subsystems for highly parallel architectures.
Mobile Computing Systems

Within the context of MIT Project Oxygen, several projects examine the energy and performance of complete mobile wireless systems.
Heads and Tails: Efficient Variable-Length Instruction Encoding

Existing variable-length instruction formats provide higher code densities than fixed-length formats, but are ill-suited to pipelined or parallel instruction fetch and decode. Heads-and-Tails is a new variable-length instruction format that supports parallel fetch and decode of multiple instructions per cycle, allowing both high code density and rapid execution for high-performance embedded processors.
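The layout idea can be shown with a toy model (heavily simplified from the actual format): each instruction splits into a fixed-length "head" packed from the front of a bundle and a variable-length "tail" packed from the back, so a wide decoder can locate every head at a known fixed offset in parallel.

```python
def pack_bundle(instructions, head_len=1):
    """Toy Heads-and-Tails packing: fixed-size heads at the front,
    variable-size tails stored back-to-front at the end of the bundle.
    (In the real format each head also carries a tail pointer.)"""
    heads = [insn[:head_len] for insn in instructions]
    tails = [insn[head_len:] for insn in instructions]
    return heads, list(reversed(tails))

def unpack_bundle(heads, rev_tails):
    """Reassemble instructions; heads are found in parallel at fixed
    offsets, tails are chased from the back of the bundle."""
    return [h + t for h, t in zip(heads, reversed(rev_tails))]

insns = ["A12", "B", "C4567"]          # variable-length "instructions"
bundle = pack_bundle(insns)
assert bundle[0] == ["A", "B", "C"]    # all heads at fixed positions
assert unpack_bundle(*bundle) == insns
```

Fixed-position heads give the parallel-fetch property of a fixed-length ISA, while the variable tails preserve the code density of a variable-length one.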
Early Projects
IRAM: Intelligent RAM

The Berkeley IRAM project sought to understand the entire spectrum of issues involved in designing general-purpose computer systems that integrate a processor and DRAM onto a single chip, from circuits, VLSI design, and architectures to compilers and operating systems.
PHiPAC: Portable High-Performance ANSI C

PHiPAC was the first autotuning project, automatically generating a high-performance general matrix-multiply (GEMM) routine by using parameterized code generators and empirical search to produce fast code for any platform. Autotuners are now standard in high-performance library development.
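The core autotuning loop is simple to sketch: generate parameterized variants of a kernel, time each on the target machine, and keep the fastest. Below is a toy version with a cache-blocked GEMM whose block size is the only tuning knob; PHiPAC's real generators explore far larger spaces (register blocking, unrolling, software pipelining) and emit ANSI C.

```python
import random
import time

def blocked_gemm(A, B, n, bs):
    """n x n matrix multiply with cache blocking; bs is the tuning knob."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        aik = A[i][k]
                        for j in range(jj, min(jj + bs, n)):
                            C[i][j] += aik * B[k][j]
    return C

def autotune(n, candidates):
    """PHiPAC-style empirical search: time each variant on this machine
    and return the fastest block size (toy version of the idea)."""
    A = [[random.random() for _ in range(n)] for _ in range(n)]
    B = [[random.random() for _ in range(n)] for _ in range(n)]
    best, best_t = None, float("inf")
    for bs in candidates:
        t0 = time.perf_counter()
        blocked_gemm(A, B, n, bs)
        dt = time.perf_counter() - t0
        if dt < best_t:
            best, best_t = bs, dt
    return best
```

Because the search measures real execution time rather than relying on a machine model, the same generator produces well-tuned code on any platform, which is exactly the portability PHiPAC demonstrated.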
The T0 Vector Microprocessor

T0 (Torrent-0) was the first single-chip vector microprocessor. T0 was designed for multimedia, human-interface, neural network, and other digital signal processing tasks. T0 includes a MIPS-II-compatible 32-bit integer RISC core, a 1 KB instruction cache, a high-performance fixed-point vector coprocessor, a 128-bit-wide external memory interface, and a byte-serial host interface. T0 formed the basis of the SPERT-II workstation accelerator.
SPACE: Symbolic Processing in Associative Computing Elements

In the PADMAVATI prototype system, a hierarchy of packaging technologies cascades multiple SPACE chips to form an associative processor array with 170,496 36-bit processors. Primary applications for SPACE are AI algorithms that require fast searching and processing within large, rapidly changing data structures.