#### Design Methodology for On-Chip Power Grid Interconnect: AI/ML Perspective

#### **Sukanta Dey**

#### Ph.D. Defense Seminar, 22<sup>nd</sup> March 2021

Supervisors Prof. Sukumar Nandi Dr. Gaurav Trivedi



Department of Computer Science and Engineering Indian Institute of Technology Guwahati Guwahati – 781039, Assam, India

### Overview

- Motivation
- Introduction and Objective
- Contributions:
  - Power Grid Analysis using Probabilistic Approach
  - Design Space Exploration of PG Interconnect
  - ML Approach for PG Design
  - ML Approach for Aging Prediction
- Conclusion and Future works

# Motivation

- Apple's new iPhone/iPad bionic processor in Sept 2019\*
  - 8.5 billion transistors (A13) /10 billion transistors (A12x)
  - Die area: 98.48 mm<sup>2</sup> (A13) /122 mm<sup>2</sup> (A12x)
  - 0.7 V supply voltage (2<sup>nd</sup> gen TSMC 7nm Tech)
- 20 billion or more transistors in future mobile SoC.
- Challenge is distributing 0.7 V to 20 billion or more transistors
- Better design methodology required.
- EDA/VLSI Design comes to rescue.
- To reduce design cycle time and manpower.

3

# **VLSI** Design objective

- •**Objective:** To create the layout from the design specification.
- •Requirement: To decrease the human effort in the design process and to automate and make the design cycle faster.

#### What is VLSI Physical Design?

- The process of **converting** the **specification of an electrical circuit** called netlist into a **geometric representation** called layout.
- The layout must take optimum silicon area.
- Simultaneously, other issues must be minimized.

# **VLSI** Physical Design Flow

- Floorplanning.
- Placement.
- Routing.



# Floorplan



### **Power Grid Network Connections**



#### Power Grid Network Connections



#### Power Grid Network Connections



# Power Grid Modeling



**Fig: Power Grid Network and its equivalent resistive networks** 

### Power Grid Modeling



Challenge is to solve for large power grid networks. For eg: think of power grid design of Apple A12x SoC with 10 billion transistors

# Design Challenges in Power Grid

- IR drop
- Electromigration

Figure: IR drop map of IBM processor



Courtesy: IBM

# Design Challenges in Power Grid: Electromigration



#### Cathode

Figure: Metal wire cross section



Courtesy: IEEE

Figure: SEM Image of Void (open circuit) and hillock (short circuit)

# **On-Chip Power Planning**



# **Previous Works**

- Early 2000s: Many works on Power Grid Analysis, Few works on Area Optimization of Power Grid. [DAC'00/01/02/03, ICCAD'05/08, DATE'06/08]
- Early 2010s: Works mostly focused on Fast Power Grid Analysis/Verification using Linear Algebric Methods.[DATE'11/13, ICCAD'10/12/13, DAC'11/12].
- Since 2013: Electromigration Issues in Power Grid [DAC'13/14/16/17, ICCAD'13/16/17/19, DATE'17]
- **Recently, 2018/19/20**: Learning-based approaches in different issues in Power Grid Design. [ICCAD'19, DATE'19]
- So Why Learning Approaches now?

# AI/ML History



Since an early flush of optimism in the 1950s, smaller subsets of artificial intellgigence – first machin learning, then deep learning, a subset of machine learning – have created ever large disruptions

# What is AI/ML?

- Artificial Intelligence: Intelligence demonstrated by machines.
- Machine Learning: Inlelligence demonstrated by machines using its previous experiences.
- Supervised Learning: Labelled data.
- Unsupervised Learning: Unlabelled data.



Figure: Difference between Traditional programming and Machine Learning

# Why AI/ML for PG Design? Why Now?

- Designs are getting complex and large.
- Big data is associated with each design.
- Time-consuming design cycle.
- Involves vast human resource.
- New technology nodes evolve in every alternate year. - 14nm in 2014, 10nm in 2016, 7nm in 2018, 5/3nm in 2020.
- AI/ML can help in automation/semi-automation.
- Recently, Placement, Routing problems are also solved using AI/ML.

# Why ML for EDA? Why Now?

- Intelligent Design of Electronic Assets *(IDEA)* program sponsored by DARPA in 2018.
- It's under an initiative of \$1.5 billion.
- Involve many US universities and industry.
- IDEA aims to create a "no human in the loop" 24 hour turnaround layout generator for System-On-Chips, System-In-Packages, and Printed Circuit Boards.

https://spectrum.ieee.org/tech-talk/semiconductors/design/darpa-picks-its-first-set-ofwinners-in-electronics-resurgence-initiative

# Thesis Objective & Problem Statement

- **Objective:** Designing Relibale On-Chip Power Grid.
- Finding the hotspots for a large scale power grid network is challenging task.
- **Problem 1**: Solving power grid network in efficient way with better runtime.
- **Problem 2**: Obtaining optimum power grid design parameters.
- **Problem 3**: Calculating the Electromigration Aging time of the power grid network during the design time.
- Our Main focus is AI & ML-based approaches.

# **Thesis Contributions**

- First Contribution: Power Grid Analysis using Probabilitisc Approach
- Second Contribution: Design Space Exploration of PG Interconnects
- Third Contribution: Machine Learning Approach in PG Design
- Fourth Contribution: Machine Learning Approach in Aging Prediction of PG

# Simulations and Experimental setup

- All the methods are implemented in C/C++, Python.
- OS : Ubuntu 14.04/16.04/CentOS 6.
- Machine: Intel E5-2650/i5.
- Memory: 32/64 GB



# First Contribution: Power Grid Analysis using Probabilitisc Approach

- Power grid network is converted to a unweigthed graph.
- Objective is to speedup power grid analysis process.
- Levy Flight-based Probabilistic Method is used for traversing the graph.
- Results.

#### Single Node of PG Network



#### Previous Work\*



\* Qian et al. "*Random Walks in Supply Network*" in **DAC'03**.

Circuit Equation

$$V_x = \sum_{i=1}^{degree(x)} \frac{g_i}{\sum_{i=1}^{degree(x)} g_i} - \frac{I_x}{\sum_{i=1}^{degree(x)} g_i}$$

• RW Eq. for gain f()

f(x) = E[total money earned| walk starts at node x]

 $f(x) = p_{x,1}f(1) + p_{x,2}f(2) + \dots + p_{x,degree(x)}f(degree(x)) - m_x$ 

Compare both

$$p_{x,i} = \frac{g_i}{\sum_{i=1}^{degree(x)} g_i}, i = 1, 2, \cdots, degree(x)$$

$$m_x = \frac{I_x}{\sum_{i=1}^{degree(x)} g_i}, m_0 = V_{DD}, f(x) = V_x$$
<sup>25</sup>

# **Our Contributions**

- We try to remove the self-loops of Random Walks.
- Further, we adapted jumping strategy to speedup the convergence.
- Levy Flight is used to incorporate the jumping strategy.
- The reward is modified by calculating the effective resistance.
- To validate the method the result is tested for 49M nodes power grid network.

#### **Our Contributions**



# Experimental Results\*

| PG Circuits | $t_{RW}$ (s) | $t_{GS}(s)$ | $t_{HLS}$ (s) | $t_{levy}(s)$ | Speedup                        | Speedup                        | Speedup                         |
|-------------|--------------|-------------|---------------|---------------|--------------------------------|--------------------------------|---------------------------------|
|             |              |             |               |               | $\left(t_{RW}/t_{levy}\right)$ | $\left(t_{GS}/t_{levy}\right)$ | $\left(t_{HLS}/t_{levy}\right)$ |
| pgckt_40K   | 0.30         | 0.36        | 0.82          | 0.17          | 1.76×                          | 2.11×                          | 4.82×                           |
| pgckt_90K   | 0.65         | 1.62        | 1.84          | 0.30          | 2.16×                          | $5.40 \times$                  | 6.13×                           |
| pgckt_250K  | 1.92         | 6.78        | 6.06          | 0.86          | $2.23 \times$                  | 7.88 	imes                     | 7.04×                           |
| pgckt_640K  | 7.98         | 19.31       | 20.00         | 2.19          | $3.64 \times$                  | 8.81 	imes                     | 9.13×                           |
| pgckt_1M    | 18.95        | 27.85       | 39.55         | 3.51          | 5.39×                          | $7.93 \times$                  | 11.26×                          |
| pgckt_4M    | 297.21       | 117.76      | 154.78        | 14.61         | $20.34 \times$                 | $8.06 \times$                  | 10.59×                          |
| pgckt_9M    | 1513.4       | 272.74      | 349.66        | 33.88         | $44.66 \times$                 | 8.05 	imes                     | 10.32×                          |
| pgckt_16M   | 3326.44      | 486.03      | 651.15        | 61.49         | $54.09 \times$                 | 7.90 	imes                     | 10.58×                          |
| pgckt_25M   | 6263.80      | 760.10      | 1034.36       | 112.85        | $55.50 \times$                 | $6.73 \times$                  | 9.16×                           |
| pgckt_36M   | 9800.35      | 1094.01     | 1562.71       | 167.63        | $58.46 \times$                 | $6.52 \times$                  | 9.32×                           |
| pgckt_49M   | 14065.90     | 1498.20     | 2430.38       | 232.91        | $60.39 \times$                 | $6.43 \times$                  | 10.43×                          |

TABLE I Speedup Analysis of Proposed method using Lévy flight on CPU

Max speedup ~60X over RW method

# First Contribution Summary

- Fast power grid method is proposed.
- We have achieved max. ~60X speedup over RW method for 49M node PG circuit.
- Solutions of the proposed method is similar to RW method (3-4% error).

# Second Contribution: Design Space Exploration of PG Interconnects\*

- First IR drop minimization problem using metaheuristics.
- First Multiobjective DSE for PG Design
- Problem Formulation
- Proposed DSE Framework
- Results.

\* Parts of work is published by Dey et al. ISVLSI'18. & Elsevier MICPRO 2021

# Problem Formulation: IR Drop Minimization



$$1 \xrightarrow{R_1} X \xrightarrow{R_2} I \xrightarrow{R_1} X \xrightarrow{R_3} 3$$

$$egin{aligned} arphi &= \sum_{i=1}^b |I_i| R_i \ &= \sum_{i=1}^b |I_i| rac{
ho \, l_i}{w_i}, \end{aligned}$$

/ -

$$\begin{array}{ll} \underset{I_{i},w_{i}}{\text{minimize}} & \upsilon(I_{i},w_{i}) \\ \text{subject to} & |I_{i\in E}|R_{i\in E} \leq \xi, \sum_{i=1}^{b}l_{i}w_{i} \leq \mathscr{A}_{max} \\ & \frac{I_{i\in E}}{w_{i\in E}} \leq I_{m}, w_{i\in E} \geq w_{min} \\ & \sum_{i=1}^{K}I_{j_{i}} + I_{x} = 0 \; \forall j \in V \end{array}$$

31

# Problem Formulation: IR Drop-Area Minimization

• *P* represents the set of nodes and *Q* represents the set of branches

$$\begin{array}{ll} \underset{I_{i},w_{i}}{\text{minimize}} & \begin{cases} v_{\text{IR total}} = \sum_{i=1}^{b} |I_{i}| \frac{\rho l_{i}}{w_{i}} \\ A = \sum_{i=1}^{b} l_{i}w_{i} \end{cases} \\ \text{subject to} & |I_{i}| \frac{\rho l_{i}}{w_{i}} \leq \xi \; \forall i \in Q, \end{cases} \\ & \sum_{i=1}^{b} A_{i} \leq \mathcal{A}_{max}, \\ & \sum_{i=1}^{K} I_{j_{i}} + I_{x} = 0 \; \forall j \in P \; (\text{KCL}), \\ & I_{i} \leq I_{max}. \end{cases} \end{array}$$

#### Proposed Framework



#### Pareto Front of a Subcircuit of ibmpg1



#### Note: Knee Point is selected as the suitable point from the Pareto front

#### Results

~7.31% reduction in worst-case IR drop ~8.51% reduction in metal routing area



Figure: Two instances of overall IR drop map for ibmpg2 benchmark circuit. (a) Before optimization. (b) After using the proposed framework.

# Second Contribution Summary

- DSE framework helps in obtaining IR drop and Area minimization for PG designs.
- These estimations help designer to obtain initial idea about the design parameters.

# Third Contribution: Machine Learning Approach in PG Design\*

- First ML Model for PG Design
- Problem Formulation
- Feature Extraction
- Training Data Generation
- Proposed Learning Model
- Results

#### Overview of ML Approach



## Feature Selection & Training Data

- r<sup>2</sup> score is evaluated
- Power Grid Designs Extracted from the IBM processors are employed.
- Label dataset is prepared for supervised learning.
- Input features: (x,y),  $I_d$
- Output feature: w<sub>i</sub>



# **Problem Formulation**

- Problem 1: Given (x,y) coordinate and I<sub>d</sub>, then predict metal width required which satisfy IR and EM margin.
- Problem 2: Given the width and I<sub>d</sub>, predict IR drop of the PG interconnect.



## ML Model



- 10 hidden layers.
- Adam optimizer.
- Trained with the generated training data.
- Testdata set is created by perturbing training dataset and performance is tested.

#### ML Model



# Experimental Results:

|             | Time (sec)   | Speedup         |                                                                        |
|-------------|--------------|-----------------|------------------------------------------------------------------------|
| PG circuits | Conventional | PowerPlanningDL | $\frac{\mathbf{Time}_{Conventional}}{\mathbf{Time}_{PowerPlanningDL}}$ |
| ibmpg1      | 6.85         | 3.56            | $1.92 \times$                                                          |
| ibmpg2      | 23.46        | 11.88           | $1.97 \times$                                                          |
| ibmpg3      | 29.50        | 8.07            | $3.59 \times$                                                          |
| ibmpg4      | 52.4         | 11.83           | $4.42 \times$                                                          |
| ibmpg5      | 74.80        | 12.74           | $5.87 \times$                                                          |
| ibmpg6      | 97.5         | 17.41           | $5.60 \times$                                                          |
| ibmpgnew1   | 102.58       | 21.50           | $4.77 \times$                                                          |
| ibmpgnew2   | 48.60        | 10.86           | $4.47 \times$                                                          |

~**5-6X** max. speedup

## Experimental Results: IR drop map

Same quality of

results using ML!



Figure: IR Drop Map of IBMPG2 using Conventional



Figure: Predicted IR Drop Map IBMPG2



Figure: Predicted IR Drop Map IBMPG6

Figure: IRDrop Map of IBMPG6 using Conventional

# Third Contribution Summary

- Same quality of results using ML.
- Fast convergence (~5-6X max. Speedup)
- Low overhead (~2% error)

# Fourth Contribution: Machine Learning Approach in Aging Prediction of PG\*

- First ML Model for Aging Prediction of PG
- Problem Formulation
- Feature Extraction
- Training Data Generation
- Proposed Learning Model
- Proposed Failure Criterion
- Results show improvement than SOTA

\* Published by Dey et al. "Machine Learning Approach for Fast Electromigration Aware Aging Prediction in Incremental Design of Large Scale On-Chip Power Grid Network" in ACM TODAES.

### Overview of the ML Model



## Feature Selection & Training Data

- r<sup>2</sup> score is evaluated
- Power Grid Designs Extracted from the IBM processors are employed.
- Label dataset is prepared for supervised learning.
- Input features: J, L, T, IR drop of the interconnect
- Output feature: MTTF or Mean Lifetime.

### **Proposed Method Flow**



## Experimental Results : CPU Runtime

|              | CPU Runtime (t) (Hours) |               |                     |                     |              | Speedup              |                         |                      |                      |
|--------------|-------------------------|---------------|---------------------|---------------------|--------------|----------------------|-------------------------|----------------------|----------------------|
| Methods      | <b>TCAD2016</b> [2]     | ICCAD2017 [3] | <b>TCAD2018</b> [4] | <b>IRPS2019</b> [5] | Proposed     | $\frac{t_H}{t_{ML}}$ | $\frac{t_{Ch}}{t_{ML}}$ | $\frac{t_C}{t_{ML}}$ | $\frac{t_N}{t_{ML}}$ |
|              | $(t_H)$                 | $(t_{Ch})$    | $(t_C)$             | $(t_N)$             | $(t_{ML})$   |                      |                         |                      |                      |
| PG Circuits  |                         |               |                     |                     |              |                      |                         |                      |                      |
| PG1          | 0.02                    | 0.02          | 0.001               | 0.000166            | 0.0001       | $200 \times$         | $200 \times$            | $10 \times$          | $1.66 \times$        |
| ibmpg1       | 0.05                    | 0.03          | 0.003               | 0.01000             | 0.0003       | $166.66\times$       | $100 \times$            | $10 \times$          | $33.33 \times$       |
| ibmpg2       | 0.11                    | 0.31          | 0.04                | 0.02000             | 0.002        | $55 \times$          | $155 \times$            | $20 \times$          | $10 \times$          |
| ibmpg3       | 5.83                    | 4.27          | 0.41                | 0.07000             | 0.009        | $647.77\times$       | $610 \times$            | $45.55 \times$       | $7.77 \times$        |
| ibmpg4       | 14.71                   | 6.81          | 2.31                | 0.11000             | 0.007        | $2101.42\times$      | $972.85 \times$         | $330 \times$         | $15.71 \times$       |
| ibmpg5       | 0.69                    | 0.25          | 0.06                | 0.03000             | 0.006        | $115 \times$         | $41.66 \times$          | $10 \times$          | $5 \times$           |
| ibmpg6       | 1.75                    | 2.07          | 0.79                | 0.23330             | 0.009        | $194.44 \times$      | $230 \times$            | $87.77 \times$       | $25.92 \times$       |
| ibmpgnew1    | 16.78                   | 0.42          | 1.24                | 0.08000             | 0.013        | $1290.76\times$      | $32.06 \times$          | $95.38 \times$       | $6.15 \times$        |
| ibmpgnew2    | 15.32                   | 2.60          | 0.43                | 0.06000             | 0.008        | $1915 \times$        | $325 \times$            | $53.75 \times$       | $7.50 \times$        |
| PG2          | 10.94                   | 1.12          | 1.06                | 0.10166             | 0.010        | $1094 \times$        | $112 \times$            | $106 \times$         | $10.06 \times$       |
| PG3          | -                       | -             | -                   | 0.13666             | 0.04200      | -                    | -                       | -                    | $3.25 \times$        |
| PG4          | -                       | -             | -                   | 0.25666             | 0.10100      | -                    | -                       | -                    | $2.54 \times$        |
| Avg. Speedup |                         |               |                     |                     | $778 \times$ | $277.85 \times$      | $76.84 \times$          | $10.74 \times$       |                      |

#### Our proposed ML-approach achieved significant speedup than all SOTA.

#### Experimental Results : MTTF

|             | MTTF ( $\mu$ ) (years) |               |                     |                     |              |  |  |
|-------------|------------------------|---------------|---------------------|---------------------|--------------|--|--|
| Methods     | TCAD2016 [2]           | ICCAD2017 [3] | <b>TCAD2018</b> [4] | <b>IRPS2019</b> [5] | Proposed     |  |  |
|             | $(\mu_H)$              | $(\mu_{Ch})$  | $(\mu_C)$           | $(\mu_N)$           | $(\mu_{ML})$ |  |  |
| PG Circuits |                        |               |                     |                     |              |  |  |
| PG1         | 14.01                  | 6.10          | 8.51                | 6.5                 | 13.25        |  |  |
| ibmpg1      | 12.55                  | 6.50          | 10.91               | 7.0                 | 12.10        |  |  |
| ibmpg2      | 18.75                  | 6.78          | 10.11               | 12.1                | 12.55        |  |  |
| ibmpg3      | 31.96                  | 6.66          | 9.95                | 6.7                 | 12.25        |  |  |
| ibmpg4      | 33.39                  | 9.83          | 11.95               | 16.7                | 17.48        |  |  |
| ibmpg5      | 25.16                  | 6.54          | 6.63                | 6.3                 | 10.33        |  |  |
| ibmpg6      | 19.87                  | 9.53          | 11.96               | 11.2                | 12.41        |  |  |
| ibmpgnew1   | 25.96                  | 13.24         | 11.64               | 13.2                | 14.56        |  |  |
| ibmpgnew2   | 21.80                  | 5.72          | 6.72                | 7.3                 | 13.24        |  |  |
| PG2         | 17.85                  | 8.32          | 9.32                | 10.3                | 11.21        |  |  |
| PG3         | -                      | -             | -                   | 7.2                 | 10.51        |  |  |
| PG4         | -                      | -             | -                   | 6.8                 | 8.47         |  |  |

### Accuracy wise our ML approach is the closest to accurate method of TCAD2016, and better than all other SOTA results.

# Fourth Contribution Summary

- ML approach can predict EM aging in PG design.
- MTTF value compared with accurate model, and compared to the other SOTA models.
- Proposed ML framework is faster than all SOTA models.
- Speeding up the MTTF prediction process helps in overall design sign-off time

## Conclusions and Future works

- A fast and more effective power grid analysis technique is proposed, reducing the solving time of the circuit.
- We also attempt to design the power grid interconnects more efficiently by obtaining an optimum trade-off.
- This study includes machine learning techniques for power grid design and Electromigration-aware aging prediction of the power grid network.
- Works of the thesis will help the power grid designer to obtain an initial idea of different design metrics and to handle the reliability issues in the process of designing cost-effective as well as reliable chip.