# Optimizing the layout and error properties of quantum circuits





John Kubiatowicz kubitron@cs.berkeley.edu

http://qarc.cs.berkeley.edu/ University of California at Berkeley

#### What Do I do?

- Background in Parallel Hardware Design
  - Technically, I'm a computer architect
  - Alewife project at MIT: Parallel Processing • Shared Memory/Message Passing
  - Designed CMMU, Modified SPARC processor
- Background in Operating Systems
  - OS Developer for Project Athena (MIT)
  - Background in High-Availability systems
  - Current OS lead researcher for new Berkeley PARLab (Tessellation OS).
- Background in Peer-to-Peer Systems
  - OceanStore project -Store your data for 1000 years
  - Tapestry and Bamboo -Find you data around globe
- Quantum Computing Architectures
  - Topic of today's lecture
  - Architecture of large-scale Quantum systems
  - Using CAD to study Quantum computers

Quantum Computer Architectures

©2009 John Kubiatowicz/UC Berkeley

### Outline

- Quantum Computer Architecture
   Some Urban legends about Quantum Architecture
- Ion Trap Quantum Computing
- Quantum Computer Aided Design
  - Area-Delay to Correct Result (ADCR) metric
  - Comparison of error correction codes
- Quantum Data Paths
  - QLA, CQLA, Qalypso
  - Ancilla factory and Teleportation Network Design
- Error Correction Optimization ("Recorrection")

©2009 John Kubiatowicz/UC Berkelev

Shor's Factoring Circuit Layout and Design

#### Quantum Computing Architectures

- Why study quantum computing?
  - Interesting, says something about physics
    - Failure to build  $\Rightarrow$  quantum mechanics wrong?
  - Mathematical Exercise (perfectly good reason)
  - Hope that it will be practical someday:
    - $\boldsymbol{\cdot}$  Shor's factoring, Grover's search, Design of Materials
    - Quantum Co-processor included in your Laptop?
- To be practical, will need to hand quantum computer design off to classical designers
  - Baring Adiabatic algorithms, will probably need 100s to 1000s (millions?) of working logical Qubits ⇒
    1000s to millions of physical Qubits working together
  - Current chips: ~1 billion transistors!
- Large number of components is realm of architecture
  - What are optimized structures of quantum algorithms when they are mapped to a physical substrate?
  - Optimization not possible by hand
    - Abstraction of elements to design larger circuits
    - Lessons of last 30 years of VLSI design: USE CAD

QARC:3



#### Things Architect Worries About

- What are the major components in system?
  - Compute Units
  - Ancilla Generators (Entropy Suppression)
  - Memories
  - Wires!
- What are the best architectures for these elements?
  - Adders: Ripple Carry vs Carry Lookahead
  - Ancilla Factories: Pipelined vs Parallel
  - Communication Architectures:
    - Teleportation Network structure
    - EPR Distribution
    - When to choose Ballistic vs Teleportation
- What is the best way to build fault-tolerant architectures?
  - QEC Codes, Layouts, Topology-Specific Error Correction

Quantum Computer Architectures

©2009 John Kubiatowicz/UC Berkeley



OARC:5

## Simple example of Why Architecture Studies are Important (2003)

- Consider Kane-style Quantum Computing Datapath
  - Qubits are embedded P<sup>+</sup> impurities in silicon substrate
  - Manipulate Qubit state by manipulating hyperfine interaction with electrodes above embedded impurities
- Obviously, important to have an efficient *wire* 
  - For Kane-style technology need sequence of SWAPs to communicate quantum state
  - So our group tried to figure out what involved in providing wire
- Results:
  - Swapping control circuit involves complex pulse sequence between every pair of embedded Ions
  - We designed a local circuit that could swap two Qubits (at  $< 4^{\circ}$ K)

electron

P ior

пяп

- Area taken up by control was > 150 x area taken by bits!
- Conclusion: must at least have a practical WIRE! - Not clear that this technology meets basic constraint

©2009 John Kubiatowicz/UC Berkeley Ouantum Computer Architectures

OARC:6

alobal B

electro

P ion

ΠЯП

Si substrate

measurement SETe



- produces classical bits
- Quantum CAD
  - Circuit expressed as netlist



Ouantum Computer Architectures





- Finally, need to perform periodical error correction - Correct after every(?): Gate, Long distance movement, Long Idle Period
- Correction reducing entropy  $\Rightarrow$  Consumes Ancilla bits
- Observation:  $\geq$  90% of QEC gates are used for ancilla production > 70-85% of all gates are used for ancilla production

Quantum Computer Architectures

#### Some Urban Legends for Later

- More powerful QEC codes are better than less powerful QEC ······ codes under all circumstances
- Every Qubit has the same requirements for ancilla bandwidth
- Fault-tolerant Circuits must correct after every gate, long distance movement, long memory storage period
- Quantum Computing Circuits spend all of their time performing error correction

| г | Encoded<br>π/8 (T)<br>Ancilla | ]-/•                    | -SX-              | Correct        |
|---|-------------------------------|-------------------------|-------------------|----------------|
| 7 | / <b>•</b>                    |                         |                   |                |
| 7 | <b>∕-⊕</b>  <br>≁ਸਿਮ          |                         |                   | orrect Correct |
|   |                               | Syndrome<br>Computation | Correct<br>Errors |                |
|   | QEC<br>Ancilla                | rome<br>Itation         | Correc            | t              |

#### Outline

- Quantum Computer Architecture
  - Some Urban legends about Quantum Architecture
- Ion Trap Quantum Computing
- Quantum Computer Aided Design
  - Area-Delay to Correct Result (ADCR) metric
  - Comparison of error correction codes
- Quantum Data Paths
  - QLA, CQLA, Qalypso
  - Ancilla factory and Teleportation Network Design
- Error Correction Optimization ("Recorrection")
- Shor's Factoring Circuit Layout and Design

```
Quantum Computer Architectures
```

©2009 John Kubiatowicz/UC Berkeley

OARC:9

Qubits are atomic ions (e.g. Be<sup>+</sup>)

- Ions suspended in channels

Quantum gates performed by

- Movement (electrode) ops

disturbing state

Demonstrations in the Lab

lasers (either one or two bit ops)

Only at certain trap locations

- Ions move between laser sites to

· Complex pulse sequences to cause Ions to migrate

Care must be taken to avoid

between electrodes

perform gates

- Gate (laser) ops

Classical control

- State is stored in hyperfine levels

Quantum Computer Architectures

-

©2009 John Kubiatowicz/UC Berkeley

OARC:10

### **MEMs-Based Ion Trap Devices**

- Ion Traps: One of the more promising quantum computer implementation technologies
  - Built on Silicon
    - Can bootstrap the vast infrastructure that currently exists in the microchip industry
  - Seems to be on a "Moore's Law" like scaling curve
    - 12 bits exist, 30 promised soon, ...
    - Many researchers working on this problem
  - Some optimistic researchers speculate about room temperature
- Properties:
  - Has a long-distance Wire
    - So-called "ballistic movement"
  - Seems to have relatively long decoherence times
  - Seems to have relatively low error rates for:
    - Memory, Gates, Movement



QARC:11



#### Outline

- Quantum Computer Architecture
  Some Urban legends about Quantum Architecture
- Ion Trap Quantum Computing
- Quantum Computer Aided Design
  - Area-Delay to Correct Result (ADCR) metric
  - Comparison of error correction codes
- Quantum Data Paths
  - QLA, CQLA, Qalypso
  - Ancilla factory and Teleportation Network Design
- Error Correction Optimization ("Recorrection")
- Shor's Factoring Circuit Layout and Design

#### Vision of Quantum Circuit Design



#### **Important Measurement Metrics**

#### Traditional CAD Metrics:

- Area
  - What is the total area of a circuit?
  - Measured in macroblocks (ultimately  $\mu m^2$  or similar)
- Latency (Latency<sub>single</sub>)
  - What is the total latency to compute circuit once
  - Measured in seconds (or μs)
- Probability of Success (P<sub>success</sub>)
  - Not common metric for classical circuits
  - Account for occurrence of errors and error correction
- Quantum Circuit Metric: ADCR
  - Area-Delay to Correct Result: Probabilistic Area-Delay metric

- ADCR = Area  $\times$  E(Latency) =  $\frac{\text{Area} \times \text{Latency}_{\text{single}}}{2}$ 

- ADCR<sub>optimal</sub>: Best ADCR over all configurations
- Optimization potential: Equipotential designs
  - Trade Area for lower latency
- Trade lower probability of success for lower latency ©2009 John Kubiatowicz/UC Berkeley Ouantum Computer Architectures
- OARC·17

#### How to evaluate a circuit?

- First, generate a physical instance of circuit Encode the circuit in one or more QEC codes

  - Partition and layout circuit: Highly dependant of layout heuristics! Create a physical layout and scheduling of bits
    - Yields area and communication cost



- Then, evaluate probability of success
  - Technique that works well for depolarizing errors: Monte Carlo · Possible error points: Operations, Idle Bits, Communications
  - Vectorized Monte Carlo: n experiments with one pass
  - Need to perform hybrid error analysis for larger circuits • Smaller modules evaluated via vector Monte Carlo · Teleportation infrastructure evaluated via fidelity of EPR bits
- Finally Compute ADCR for particular result

| <ul> <li>Repeat as necessary</li> </ul> | v by varying parameters to generate | ADCR <sub>optimal</sub> |
|-----------------------------------------|-------------------------------------|-------------------------|
| Quantum Computer Architectures          | ©2009 John Kubiatowicz/UC Berkeley  | QARC:18                 |

Quantum CAD flow



# Example Place and Route Heuristic: Collapsed Dataflow

- Gate locations placed in dataflow order
  - Qubits flow left to right
  - Initial dataflow geometry folded and sorted
  - Channels routed to reflect dataflow edges
- Too many gate locations, collapse dataflow
  - Using scheduler feedback, identify latency critical edges
  - Merge critical node pairs
  - Reroute channels
- Dataflow mapping allows pipelining of computation!





## Comparing Different QEC Codes Possible to perform a comparison between codes

- - Pick circuit/Run through CAD flow
  - Result depends on goodness of layout and scheduling heuristic
- Layout for CNOT gate (Compare with Cross, et. al)



#### http://garc.cs.berkeley.edu/publications ©2009 John Kubiatowicz/UC Berkeley Quantum Computer Architectures

#### Outline

- Quantum Computer Architecture
  - Some Urban legends about Quantum Architecture
- Ion Trap Quantum Computing
- Quantum Computer Aided Design
  - Area-Delay to Correct Result (ADCR) metric
  - Comparison of error correction codes
- Quantum Data Paths
  - QLA, CQLA, Qalypso
  - Ancilla factory and Teleportation Network Design
- Error Correction Optimization ("Recorrection")
- Shor's Factoring Circuit Layout and Design

```
Ouantum Computer Architectures
```

OARC:21

©2009 John Kubiatowicz/UC Berkeley

OARC:22



#### Details

- Why Regular Array? ٠
  - Distribute Ancilla generation where it is needed
  - Single 2-Qubit storage cell quite large • Concatenated [[7,1,3]] could have 343 or more physical Qubits/ logical Qubit
  - Size of single logical Qubit  $\Rightarrow$ makes sense to teleport between large logical blocks
  - Regularity easier to exploit for CAD tools!
    - Same reason we have ASICs with regular routing channels
- Assumptions: ٠
  - Rate of ancilla consumption constant for every Qubit
  - Ratio of one Teleporter for every two Qubit gate is optimal
  - (Implicit) Error correction after every move or gate is optimal
  - Parallelism of quantum circuits can exploit computation on every Qubit in the system at same time
- Are these assumptions valid???



#### Ancilla Factory Design I



#### Ancilla Factory Design II

- Pipelined ancilla preparation: break into stages
  - Steady stream of encoded ancillae at output port
  - Fully laid out and scheduled to get area and bandwidth estimates



QARC:27

### The Qalypso Datapath Architecture

- Dense data region
  - Data gubits only
  - Local communication
- Shared Ancilla Factories
  - Distributed to data as needed
  - Fully multiplexed to all data
  - Output ports (⇒): close to data
  - Input ports (⇒): may be far from data (recycled state irrelevant)
- Regions connected by teleportation networks



```
Quantum Computer Architectures
```

OARC:29

### Tiled Quantum Datapaths



- Variations include hand-tuned Ancilla generators/factories
- Memory: storage for state that doesn't move much - Less/different requirements for Ancilla

  - Original CQLA paper used different QEC encoding
- Automatic mapping must:
  - Partition circuit among compute and memory regions
  - Allocate Ancilla resources to match demand (at knee of curve)
- Configure and insert teleportation network ©2009 John Kubiatowicz/UC Berkeley Quantum Computer Architectures

OARC:30

### Which Datapath is Best?

- **Random Circuit Generation** 
  - f(Gate Count, Gate Types, Qubit Count, Splitting factor)
  - Splitting factor (r): measures connectivity of the circuit
    - Example: 0.5 splits Qubits in half, adds random gates between two halves, then recursively splits results
    - Closely related to Rent's parameter
- Qalypso clear winner (for all r)
- 4x lower latency than LQLA
- 2x smaller area than CQLA+
- Why Qalypso does well:
- - Shared, matched ancilla generation
  - Automatic network sizing (not one Teleporter for every two Qubits)
  - Automatic Identification of Idle Qubits (memory)
- ٠ LQLA and CQLA+ perform close second
  - Original datapaths supplemented with better ancilla generators. automatic network sizing, and Idle Qubit identification
  - Original QLA and CQLA do very poorly for large circuits

10<sup>16</sup>

10<sup>15</sup>

10<sup>14</sup>

10<sup>13</sup>

10<sup>10</sup>

10<sup>9</sup>

10<sup>8</sup>

10<sup>7</sup>

QARC:31



- What is the architecture of the network?
  - Including Topology, Router design, EPR Generators, etc..

©2009 John Kubiatowicz/UC Berkeley

- What are the details of EPR distribution? •
- What are the practical aspects of routing?
  - When do we set up a channel?
  - What path does the channel take?

CQLA LQLA CQLA+ Qalvpso 100 10000 1000 Gates in random circuit

ADCR vs Number of Gates, r=0.5

QLA





- Precomputed or on-demand start time for setup
- Each EPR gubit has associated classical message

QARC:35

#### **Optimization of Network?**



- EPR Routing Algorithms
  - On Demand using minimal adaptive paths
    - Decisions made at runtime, congestion avoidance picks free path from A->B, delays if no path available
  - Offline, Adaptive
    - Pre-schedules channels to overlap with prior computation
    - Must determine and store full path information for each communication prior to execution
- Scale network to meet circuit needs (after mapping)
  - Size EPR generation, channels, and teleport resources
  - Initial Goal: running all computation at "speed of data" • Causes network to consume 80% of total area if done naively
  - Back off from "at speed" point during ADCR optimization ©2009 John Kubiatowicz/UC Berkeley

#### Reducing QEC Overhead Outline Quantum Computer Architecture - Some Urban legends about Quantum Architecture • Ion Trap Quantum Computing orre Quantum Computer Aided Design - Area-Delay to Correct Result (ADCR) metric - Comparison of error correction codes Standard idea: correct after every gate, and long Quantum Data Paths communication, and long idle time - QLA, CQLA, Qalypso - This is the easiest for people to analyze - Ancilla factory and Teleportation Network Design - Urban Legend? Must do in order to keep circuit fault tolerant! Error Correction Optimization ("Recorrection") ٠ This technique is suboptimal (at least in some domains) · Shor's Factoring Circuit Layout and Design - Not every bit has same noise level! · Different idea: identify critical Qubits - Try to identify paths that feed into noisiest output bits - Place correction along these paths to reduce maximum noise OARC:38 Ouantum Computer Architectures Quantum Computer Architectures ©2009 John Kubiatowicz/UC Berkeley OARC:37 Simple Error Propagation Model **QEC** Optimization Error Distance EDist<sub>MAX</sub> (EDist) Labels iteration QEC Maximum EDist Partitionina Fault Optimization and Input propagation: Analysis EDist<sub>MAX</sub> Layout Circuit 4=max(3,1)+1Optimized 2 Modified version of Lavout retiming algorithm: called recorrection: 1024-bit QRCA and QCLA adders 0.8 1300 Find minimal placement 1200 (10000s) of correction operations 0.7 1100 that meets specified probability 1000 $MAX(EDist) \leq EDist_{MAX}$ 0.6 EDist model of error propagation: 900 operations Probably of success not 0.5 - Inputs start with EDist = 0 800 always reduced for 700 - Each gate propagates max input EDist to outputs 0.4 QRCA prob EDistmax > 1 QCLA prob 600 succe - Gates add 1 unit of EDist, Correction resets EDist to 1 0.3 QCLA opcount Physical gate But, operation count and area drastically reduced 500 QRCA opcount --inal Maximum EDist corresponds to Critical Path 400 0.2 300 Use Actual Layouts and • - Back track critical paths that add to Maximum EDist 0.1 200 Fault Analysis Add correction to keep EDist below critical threshold 100 Optimization pre-layout, 6 8 10 12 2 4 Example: Added correction to keep EDist<sub>MAX</sub> ≤ 2 evaluated post-layout EDist threshold ©2009 John Kubiatowicz/UC Berkeley QARC:39 Quantum Computer Architectures ©2009 John Kubiatowicz/UC Berkeley Quantum Computer Architectures QARC:40



#### Comparison of 1024-bit adders



- 1024-bit Quantum Adder Architectures
  - Ripple-Carry (QRCA)
  - Carry-Lookahead (QCLA)
- Carry-Lookahead is better in all architectures
- QEC Optimization improves ADCR by order of magnitude in some circuit configurations QARC:43

#### Area Breakdown for Adders



- Error Correction is *not* predominant use of area
  - Only 20-40% of area devoted to QEC ancilla
  - For Optimized Qalypso QCLA, 70% of operations for QEC ancilla generation, but only about 20% of area
- T-Ancilla generation is major component
  - Often overlooked
- · Networking is significant portion of area when allowed to optimize for ADCR (30%)
  - CQLA and QLA variants didn't really allow for much flexibility ©2009 John Kubiatowicz/UC Berkelev

#### Investigating 1024-bit Shor's



- Full Layout of all Elements
  - Use of 1024-bit Quantum Adders
  - Optimized error correction
  - Ancilla optimization and Custom Network Layout
- Statistics:
  - Unoptimized version: 1.35×10<sup>15</sup> operations
  - Optimized Version 1000X smaller
  - QFT is only 1% of total execution time

```
Quantum Computer Architectures
```

©2009 John Kubiatowicz/UC Berkeley

#### QARC:45

#### 1024-bit Shor's Continued



#### Conclusion

- Quantum Computer Architecture:
  - Considering details of Quantum Computer systems at larger scale (1000s or millions of components)
- Argued that CAD tools may have a place in Quantum Computing Research
  - Presented Some details of a Full CAD flow (Partitioning, Layout, Simulation, Error Analysis)
  - New Evaluation Metric: ADCR = Area × E(Latency)
  - Full mapping and layout accounts for communication cost
- "Recorrection" Optimization for QEC
  - Simplistic model (EDist) to place correction blocks
  - Validation with full layout
  - Can improve ADCR by factors of 10 or more
    - Improves latency and area significantly, can improve probability under some circumstances as well
- Full analysis of Adder architectures and 1024-bit Shor's
  - Still too long (and too big), but smaller than previous estimates
  - Total circuit size still too big for our error analysis but have hope that we can improve this computer Architectures 2009 John Kubiatowicz/UC Berkeley