# Optimizing the layout and error properties of quantum circuits



Professor John Kubiatowicz University of California at Berkeley

> September 28<sup>th</sup>, 2012 kubitron@cs.berkeley.edu http://qarc.cs.berkeley.edu/

#### Quantum Circuits are Big!

- Some recent (naïve?) estimates for Ground-State Estimation (Level 3 Steane code):
  - 209 logical qubits  $\times$  343 (EC) = 71687 data qubits
  - Total operations: 10<sup>11</sup> to 10<sup>17</sup> (depending on type)
  - $10^{17}$  T gates  $\times$  117,000 ancillas/T gate =  $10^{22}$  ancillas
  - $5 \times 10^{26}$  Operations for SWAP (communication)
  - And on...
- Shor's Algorithm for factoring?
  - $5 \times 10^5$  or more data qubits
  - $1.5\times10^{15}$  operations (or more)
- How can you possibly investigate such circuits?
  - This is the realm of *Computer Architecture* and *Computer Aided Design (CAD)*

Sept 28th, 2012

JIQ Workshop

#### 2

#### Simple example of Why Architecture Studies are Important (2003)

- Consider Kane-style Quantum Computing Datapath
  - Qubits are embedded P<sup>+</sup> impurities in silicon substrate
  - Manipulate Qubit state by manipulating hyperfine interaction with electrodes above embedded impurities
- Obviously, important to have an efficient wire
  - For Kane-style technology need sequence of SWAPs to communicate quantum state
  - So our group tried to figure out what involved in providing wire
- Results:
  - Swapping control circuit involves complex pulse sequence between every pair of embedded Ions
  - We designed a local circuit that could swap two Qubits (at < 4°K)

electron

P ion

Si substrate

measurement

SETs

electroi

P ion

ШЯI

3

- Area taken up by control was > 150 x area taken by bits!
- Conclusion: must at least have a practical WIRE!
   Not clear that this technology meets basic constraint

#### Pushing Limits

- Very interesting problems happen at scale!
  - Small circuits become Computer Architecture
    - Modular design
    - Pipelining
    - Communication Infrastructure
  - Direct analogies to classical chip design apply
    - The physical organization of components matters
    - "Wires are expensive, adders are not"?
- Important Focus Areas for the future:
  - Languages for Describing Quantum Algorithms
  - Optimal partitioning and layout
  - Global communication scheduling
  - Layout-driven error correction

| E:                           | xpressing Quantum<br>Algorithms |   | <ul> <li>Graphically</li> <li>Several a</li> <li>QASM: th</li> <li>Primitives</li> <li>C-like lang</li> <li>Scaffold</li> <li>Embedded</li> <li>Use langu</li> <li>Specific</li> <li>Can build</li> <li>Can intro</li> </ul> | some abstraction, modules, fixed lo | Domain<br>s<br>tors |
|------------------------------|---------------------------------|---|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|---------------------|
| Sept 28 <sup>th</sup> , 2012 | JIQ Workshop                    | 5 | Sept 28 <sup>th</sup> , 2012                                                                                                                                                                                                 | JIQ Workshop                        | 6                   |

# Quantum Circuit Model



- Universal gate set: Sufficient to form all unitary transformations
- Example: Syndrome Measurement (for 3-bit code)
  - Measurement (meter symbol) produces classical bits
- Quantum CAD
  - Circuit expressed as netlist
  - Computer manpulated circuits and implementations

Sept 28<sup>th</sup>, 2012



#### Higher-Level Language: Chisel

How to express

- Scala-based language for digital circuit design
  - High-level functional descriptions of circuits as input
  - Many outputs: for instance direct production on Verilog
  - Used in design of new advanced RISC pipeline
- Features
  - High-level abstraction
  - Hierarchical design
  - Abstractions build up circuit (netlist)
- Inner-Product FIR Digital Filter:  $y[t] = \sum_{i} w_i * x_i[t-i]$

# Quantum Chisel

- Simple additions to Chisel Code base
  - Addition of Classical  $\Rightarrow$  Quantum translation
    - Produce Ancilla, UseToffoli Gates, CNots, etc
    - Reverse Logic to automagically reverse netlists and produce reversible output
    - State machine transformation (using "shift registers" to keep extra state when needed)
  - Because of the way Chisel constructed, can be below the level of syntax (DSL) seen by programmer
    - $\cdot$  With possible exception of explicit REVERSE operator
- Goal? Take classical circuits designed in Chisel and produce quantum equivalents

JIQ Workshop

- Adders, Multipliers
- Floating-Point processors
- Output: Quantum Assembly (QASM)
  - Input to other tools!

Sept 28th, 2012





Sept 28th, 2012

9

JIQ Workshop

10



#### Topological (Surface) Quantum ECC Smooth boundary



JIQ Workshop

# Computation with Topological Codes



- Each logical Qubit represented by a pair of holes
- Layout for Large Algorithm: Tile Lattice with paired holes
- CNOT: move a smooth hole around a rough one
  - Complications: may need to transform a smooth hole into a rough one before performing CNOT
  - Rules for how to move holes (grow and shrink them)
- Again: Some gates easy, some not (Once again, T is messy)

Sept 28th, 2012

JIQ Workshop

Moving to the Realm of Quantum Computer Aided Design



Sept 28th, 2012

OR

cx q1, q0;

cx q1, q2;

correct q1;

cx q3, q4;

correct q4;

zmeasure q3, c3;

(@c3==1) x q4;

Quantum Assembly

(QASM)

h a2;

Sept 28th, 2012

JIQ Workshop

14

# Need for CAD: More than just Size

- Data locality:
  - Where gubits "live" and how they move can make or break the ability of a quantum circuit to function:
    - Movement carries risk and consumes time
    - Ancilla must be created close to where used
    - Communication must be minimized through routing optimization
- Customized (optimal?) data movement ⇒ customized channel structure/quantum data path
  - One-size fits all topology not necessarily the best
- Parallelism:
  - How to exploit parallelism in dataflow graph
     Partitioning and scheduling algorithms
  - Area-Time tradeoff in Ancilla generation
  - Customized circuits for pre-computing non-transversal Ancilla reuse?
- Error Correction:
  - One-size fits all probably not desirable
  - Adapt level of encoding in circuit-dependent way
  - Corrections after every operation may not be necessary

#### Sept 28th, 2012

15

13



Quadence Design Tool

Layout Network Insertion Error Analysis ... Optimization CAD Tool



JIQ Workshop

### **Important Measurement Metrics**

#### Traditional CAD Metrics:

- Area
  - What is the total area of a circuit?
  - Measured in macroblocks (ultimately μm<sup>2</sup> or similar)
- Latency (Latency<sub>single</sub>)
  - What is the total latency to compute circuit once
  - Measured in seconds (or μs)
- Probability of Success (P<sub>success</sub>)
- Not common metric for classical circuits
- · Account for occurrence of errors and error correction
- Quantum Circuit Metric: ADCR
  - Area-Delay to Correct Result: Probabilistic Area-Delay metric



- ADCR<sub>optimal</sub>: Best ADCR over all configurations
- Optimization potential: Equipotential designs
  - Trade Area for lower latency
- Trade lower probability of success for lower latency JIQ Workshop Sept 28th, 2012





# Optimizing Ancilla and Layout



An Abstraction of Ion Traps Basic block abstraction: Simplify Layout



- Evaluation of layout through simulation
  - Movement of ions can be done classically
  - Yields Computation Time and Probability of Success
- Simple Error Model: Depolarizing Errors
  - Errors for every Gate Operation and Unit of Waiting
  - Ballistic Movement Error: Two error Models
    - 1. Every Hop/Turn has probability of error
  - 2. Only Accelerations cause error JIO Workshop

17

#### Example Place and Route Heuristic: **Collapsed Dataflow**

- Gate locations placed in dataflow order
  - Qubits flow left to right
  - Initial dataflow geometry folded and sorted
  - Channels routed to reflect dataflow edges
- Too many gate locations, collapse dataflow
  - Using scheduler feedback, identify latency critical edges
  - Merge critical node pairs
  - Reroute channels
- Dataflow mapping allows pipelining of computation!



# Quantum Logic Array (QLA)





- **Basic Unit:** 
  - Two-Qubit cell (logical)
  - Storage, Compute, Correction
- Connect Units with Teleporters
  - Probably in mesh topology, but details never entirely clear from original papers
- First Serious (Large-scale) Organization (2005)
  - Tzvetan S. Metodi, Darshan Thaker,

Andrew W. Cross, Frederic T. Chong, and Isaac L. Chuang JIQ Workshop Sept 28th, 2012

# Running Circuit at "Speed of Data" • Often, Ancilla gubits are independent of data

- - Preparation may be pulled offline
  - Very clear Area/Delay tradeoff: Suggests Automatic Tradeoffs (CAD Tool)
- Ancilla gubits should be ready "just in time" to avoid ancilla decoherence from idleness



#### How much Ancilla Bandwidth Needed?



Can precompute ancilla for non-transverse gates!

### **Tiled Quantum Datapaths**



- Configure and insert teleportation network

Sept 28th, 2012

JIQ Workshop

# Which Datapath is Best?

ADCR

- Random Circuit Generation
  - Splitting factor (r): measures connectivity of the circuit Related to Rent's factor
- Qalypso clear winner
  - 4x lower latency than LQLA
  - 2x smaller area than CQLA+
- Why Qalypso does well:



ADCR vs Number of Gates, r=0.5

- Shared, matched ancilla factories - Automatic network sizing (rather than fixed teleportation)
- Automatic Identification of Idle Qubits (memory)
- LQLA and CQLA+ perform close second
  - Original supplemented with better ancilla generators, automatic network sizing, and Idle Qubit identification
  - Original QLA and CQLA do very poorly for large circuits

Sept 28th, 2012

JIQ Workshop

26

# Optimizing Error Correction





- · Standard idea: correct after every gate, and long communication, and long idle time
  - This is the easiest for people to analyze
- This technique is suboptimal
  - Not every bit has same noise level!
- Different idea: identify critical Qubits
  - Try to identify paths that feed into noisiest output bits
  - Place correction along these paths to reduce maximum noise

Sept 28th, 2012

27

25

28

Sept 28th, 2012



#### Investigating Larger Circuits



# Recorrection of 500-gate Random Circuit (r=0.5)



- Not all codes do equally well with Recorrection
  - Both [[23,1,7]] and [[7,1,3]] reasonable candidates
  - [[25,1,5]] doesn't seem to do as well
- Cost of communication and Idle errors is clear here!
- However real optimization situation would vary EDist to find optimal point

Sept 28th, 2012

JIQ Workshop

30

### What does Quadence do?

- ECC Insertion and Optimization
  - Logical  $\Rightarrow$  Physical circuits
    - Includes encoding, and correction
  - ECC Recorrection optimization (more later)
- Circuit partitioning
  - Find minimum places to cut large circuit
  - Compute ancilla needs
  - Place physical qubits in proper regions of grid
- · Communication Estimation and insertion
  - Generate Custom Teleportation network
- · Schedule movement of bits
  - Movement within Ancilla generators (Macros)
  - Movement within compute and memory regions
  - Movement two and from teleportation stations
- Simulation of result to get timing for full circuit
- MonteCarlo simulation to get error analysis



### Area Breakdown for Adders



- Error Correction is *not* predominant use of area
  - Only 20-40% of area devoted to QEC ancilla
  - For Optimized Qalypso QCLA, 70% of operations for QEC ancilla generation, but only about 20% of area
- T-Ancilla generation is major component
  - Often overlooked
- Networking is significant portion of area when allowed to optimize for ADCR (30%)
  - CQLA and QLA variants didn't really allow for much flexibility

35

#### Sept 28th, 2012

Direct Comparison:

Concatenated and Topological QECC

- Ground State Estimation
  - Find ground state of Glycine
- Problem Size:
  - 50 Basis Functions
  - Result Calculated with 5 Bits accuracy
  - 60 Qubits, 6.9 x 10<sup>12</sup> gates, Parallelism: 2.5
- Conceptual Primitives
  - Quantum Simulation and Phase Estimation



# Properties of Quantum Technologies: Gate Times and Errors

|           | Supercond.<br>Qubits<br>(Primitive) | Supercond.<br>Qubits<br>(Optimal) | Ion Traps<br>(Primitive) | Lon Traps<br>(Optimal) | Neutral<br>Atoms<br>(Primitive) | Neutral<br>Atoms<br>(Trotter) |
|-----------|-------------------------------------|-----------------------------------|--------------------------|------------------------|---------------------------------|-------------------------------|
| Time (ns) | 25                                  | 28                                | 32,000                   | 32,000                 | 14,818                          | 19,465                        |
| Gate Err  | 1.0x10 <sup>-5</sup>                | 6.6x10 <sup>-4</sup>              | 3.2x10 <sup>-9</sup>     | 2.9x10 <sup>-7</sup>   | 8.1×10 <sup>-3</sup>            | 1.5x10 <sup>-3</sup>          |
| Mem Err   | 1.0×10 <sup>-5</sup>                | 1.0×10 <sup>-5</sup>              | 2.5×10 <sup>-12</sup>    | 2.5x10 <sup>-12</sup>  | 0.0                             | 0.0                           |

- Ion traps slower but more reliable than superconductors
- Neutral atoms unusable with concat. codes

| Sept 28 <sup>th</sup> , 2012 | JIQ Workshop | 37 | Sept 28 <sup>th</sup> , 2012 | JIQ Workshop |
|------------------------------|--------------|----|------------------------------|--------------|
|                              |              |    |                              |              |

#### Ground State Estimation, Multiple Technologies

|    | Multiple recimologies   |                                   |                                     |                                   |                |  |
|----|-------------------------|-----------------------------------|-------------------------------------|-----------------------------------|----------------|--|
|    |                         | 1 x 10 <sup>-3</sup><br>19,000 ns | 1 x 10 <sup>-5</sup><br>25 ns       | 1 x 10 <sup>-9</sup><br>32,000 ns |                |  |
|    |                         | Neutral<br>Atoms<br>(Trotter)     | Supercond.<br>Qubits<br>(Primitive) | Ion Trap<br>(Primitive)           |                |  |
|    | Surface                 | 10,883<br>years                   | 4.5 years                           | 5,588 years                       | Time           |  |
|    | Code                    | $2.0 \times 10^{24}$              | 3.5 x 10 <sup>22</sup>              | 3.9 × 10 <sup>22</sup>            | Gates          |  |
|    |                         | 2.5 × 10 <sup>8</sup>             | $1.7 \times 10^{7}$                 | 4 4 × 10 <sup>7</sup>             | Qubits         |  |
|    |                         | -                                 | 4,229 years                         | (128 years)                       | Time           |  |
|    | Bacon<br>Shor Code      | -                                 | 9.5 x 10 <sup>32</sup>              | 1.5 x 10 <sup>19</sup>            | Gates          |  |
|    |                         | -                                 | 9.4 × 10 <sup>11</sup>              | 1.6 x 10 <sup>5</sup>             | Qubits         |  |
|    |                         | -                                 | 5                                   | 1                                 | Concatenations |  |
| pt | 28 <sup>th</sup> , 2012 |                                   | JIQ Workshop                        |                                   |                |  |

#### Conclusion

- How to express guantum algorithms?
  - Embedded DSLs in higher-order languages
- Size of Quantum Circuits  $\Rightarrow$  Must Optimize Locality
  - Presented Some details of a Full CAD flow (Partitioning, Layout, Simulation, Error Analysis)
  - New Evaluation Metric: ADCR = Area  $\times$  E(Latency)
  - Full mapping and layout accounts for communication cost
- Ancilla Optimization Important
  - Ancilla bandwidth varies widely
  - Custom ancilla factories sized to meet needs of circuit
- "Recorrection" Optimization for QEC
  - Selective placement of error correction blocks
  - Validation with full layout to find optimal level of correction
- Analysis of 1024-bit adder architectures
  - Carry-Lookahead adders better than Ripple Carry adders
  - Error correction not the primary consumer of area! JIQ Workshop

40