

2014 IEEE International Workshop on **Si**gnal **P**rocessing **S**ystems October 20 – 22 2014, Belfast, UK SIPS 2014





Carlo Sau and Luigi Raffo, Università degli Studi di Cagliari, DIEE – Dept. of Electrical and Electronics Eng., EOLAB - Microelectronics and Bioeng. Lab.

Francesca Palumbo, Università degli Studi di Sassari, PolComIng – Information Engineering Unit





# OUTLINE

- Introduction
  - Problem statement
  - Background
  - The power issue
- Automatic Power-Awareness Strategies
  - Baseline Multi-Dataflow Composer
  - Static Power: Structural Optimization
  - Dynamic Power: Behavior Optimization
- Performance Assessment
  - Design Under Test
  - Structural Evaluation
  - Behavior Evaluation
- Final Remarks and Future Directions


# **PROBLEM STATEMENT**

#### **CONSUMER NEEDS:**

- **HIGH PERFORMANCE** real-time applications:
  - Media players, video calling...
- UP-TO-DATE SOLUTIONS
  - Support for the latest audio/video codecs, file formats...
- MORE INTEGRATED FEATURES in mobile devices:
  - MP3, camera, video, GPS...
- LONG BATTERY LIFE
  - Convenient form factor, affordable price...










#### **POSSIBLE SOLUTION:**

- DATAFLOW MODEL OF COMPUTATION
  - Modularity and parallelism → EASIER INTEGRATION AND FAVOURED RE-USABILITY
- COARSE-GRAINED RECONFIGURABILITY
  - Flexibility and resource sharing → MULTI-APPLICATION PORTABLE DEVICES


#### DATAFLOW FORMALISM

- Directed graph of actors (functional units).
- Actors exchange tokens (data packets) through dedicated channels.

[Figure: a dataflow actor, with its actions and state]

#### CHARACTERISTICS

- Makes the intrinsic parallelism of the application explicit.
- Modularity favours the long-term adaptivity of the model.
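The dataflow formalism can be illustrated with a minimal sketch in Python. It assumes a simple firing rule (an actor fires when each of its input channels holds a token); the `Channel`, `Actor` and `run` names and the two-actor network are hypothetical and are not taken from the presented tool.

```python
from collections import deque


class Channel:
    """Dedicated FIFO channel carrying tokens between two actors."""

    def __init__(self):
        self.tokens = deque()

    def put(self, token):
        self.tokens.append(token)

    def get(self):
        return self.tokens.popleft()

    def __len__(self):
        return len(self.tokens)


class Actor:
    """Functional unit that fires when every input channel holds a token."""

    def __init__(self, name, action, inputs, outputs):
        self.name = name
        self.action = action    # function applied to one token per input
        self.inputs = inputs    # list of input Channel objects
        self.outputs = outputs  # list of output Channel objects

    def can_fire(self):
        return all(len(ch) > 0 for ch in self.inputs)

    def fire(self):
        result = self.action(*(ch.get() for ch in self.inputs))
        for ch in self.outputs:
            ch.put(result)


def run(actors):
    """Keep firing ready actors until none of them can fire."""
    progress = True
    while progress:
        progress = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                progress = True


# Hypothetical network: src -> scale -> mid -> offset -> sink
src, mid, sink = Channel(), Channel(), Channel()
scale = Actor("scale", lambda x: 2 * x, [src], [mid])
offset = Actor("offset", lambda x: x + 1, [mid], [sink])

for value in (1, 2, 3):
    src.put(value)

run([offset, scale])      # scheduling order does not change the result
print(list(sink.tokens))  # [3, 5, 7]
```

Because each actor only depends on the tokens in its own channels, the two actors could run concurrently, which is the parallelism the model exposes.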






#### FINE-GRAINED (FG) RECONFIGURATION

- High flexibility: bit-level reconfiguration
- Slow and memory-expensive configuration phase
- Suitable for applications with high control flow

#### COARSE-GRAINED (CG) RECONFIGURATION

- Medium flexibility: word-level reconfiguration
- Fast configuration phase
- Suitable for applications with a high level of instruction/data parallelism



SiPS 2014 - 2014 October 22<sup>nd</sup> - Belfast (United Kingdom) - Carlo Sau



# THE POWER ISSUE




Modern systems need to take both **STATIC AND DYNAMIC POWER** into consideration from the **EARLY STAGES** of the design flow (architectural level).





























# STATIC POWER MANAGEMENT: STRUCTURAL OPTIMIZATION

[Figure: dataflow networks α, β and γ over actors A, B, C and D; longest chain: 2 SBoxes]
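As a rough illustration of what an SBox does in a merged datapath, here is a minimal sketch in Python. It assumes an SBox behaves like a configurable two-input switch that forwards one of its inputs; the function names, the actor behaviours and the composition of the two networks are hypothetical, not reconstructed from the figure.

```python
def sbox_2x1(select, in0, in1):
    """Assumed SBox behaviour: forward input 0 or input 1 depending on select."""
    return in1 if select else in0


# Hypothetical actor behaviours.
def actor_a(x):
    return x + 1


def actor_b(x):
    return x * 2


def actor_c(x):
    return x - 1


def actor_d(x):
    return x * x


# Hypothetical networks sharing actors B and D:
#   alpha: A -> B -> D
#   beta:  C -> B -> D
CONFIG = {"alpha": 0, "beta": 1}


def merged_datapath(network, x):
    """One SBox in front of B selects which producer feeds the shared chain."""
    select = CONFIG[network]
    return actor_d(actor_b(sbox_2x1(select, actor_a(x), actor_c(x))))


print(merged_datapath("alpha", 3))  # 64
print(merged_datapath("beta", 3))   # 16
```

In this sketch both `actor_a` and `actor_c` are evaluated whichever network is selected, mirroring hardware in which unused resources stay active unless they are explicitly switched off.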









































**POWER IS WASTED** by the **RESOURCES** that are **NOT INVOLVED** in the current computation.
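One way to identify the resources that are idle for a given computation is sketched below in Python. It assumes that a logic region groups the actors activated by exactly the same set of input networks, so that a whole region can be switched off together; the network compositions and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from each input network to the actors it uses.
networks = {
    "alpha": {"A", "B", "D"},
    "beta": {"C", "B", "D"},
    "gamma": {"C", "D"},
}


def logic_regions(networks):
    """Group actors by the exact set of networks that activate them."""
    users = defaultdict(set)
    for net, actors in networks.items():
        for actor in actors:
            users[actor].add(net)
    regions = defaultdict(set)
    for actor, nets in users.items():
        regions[frozenset(nets)].add(actor)
    return dict(regions)


def idle_actors(regions, active_network):
    """Actors whose region is not used while active_network is running."""
    idle = set()
    for nets, actors in regions.items():
        if active_network not in nets:
            idle |= actors
    return idle


regions = logic_regions(networks)
print(len(regions))                           # 4
print(sorted(idle_actors(regions, "alpha")))  # ['C']
print(sorted(idle_actors(regions, "gamma")))  # ['A', 'B']
```

A region used by every network, such as the one holding `D` here, is never idle and gains nothing from being gated.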









### DESIGN UNDER TEST

| APPLICATION | # KERNELS | # ACTORS | # SBOXES |
|-------------|----------|----------|----------|
| zoom        | 7        | 87       | 54       |




[Figure: zoom datapath with a RAM (address_in, data_out, address_out, data_in) and the check_gb, abs_1 and abs_2 blocks]

| DPN                  | # ACTORS | # OCC |
|----------------------|----------|-------|
| Min-Max              | 1        | 1050  |
| Abs                  | 1        | 3150  |
| Sbwlabel             | 17       | 2722  |
| Median               | 9        | 1069  |
| Check_GeneralBilevel | 7        | 3072  |
| Cubic                | 10       | 1070  |
| Cubic_Conv           | 6        | 408   |








# **BEHAVIORAL EVALUATION**

Synthesis trials were performed with the Cadence SoC Encounter synthesizer, targeting a 90 nm CMOS technology.

| DESIGN         | # of LRs | NOCG AREA [µm²]    | CG AREA [µm²]    | CG vs NOCG |
|----------------|----------|--------------------|------------------|------------|
| TOP.f          | 9        | 135819             | 136076           | +0.19%     |
| TOP.p          | 13       | 124026             | 124579           | +0.25%     |
| TOP.p vs TOP.f | +44.44%  | -8.68%             | -8.45%           | +31.58%    |

- **NOCG** = without clock gating implementation
- **AUTO** = with the synthesizer automatic register-level clock gating implementation
- **CG** = with the proposed high-level clock gating implementation
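The "TOP.p vs TOP.f" row of the area table reports relative variations with TOP.f as the reference. A minimal Python sketch of that arithmetic follows, using the values from the table; the `variation` helper and the dictionary keys are illustrative names.

```python
def variation(reference, value):
    """Relative variation of value with respect to reference, in percent."""
    return round(100.0 * (value - reference) / reference, 2)


# Values copied from the area table above.
top_f = {"lrs": 9, "nocg_area": 135819, "cg_area": 136076}
top_p = {"lrs": 13, "nocg_area": 124026, "cg_area": 124579}

for key in top_f:
    print(key, f"{variation(top_f[key], top_p[key]):+.2f}%")
# lrs +44.44%
# nocg_area -8.68%
# cg_area -8.45%
```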

| DESIGN         | DYNAMIC POWER: CG vs NOCG | DYNAMIC POWER: CG vs AUTO |
|----------------|---------------------------|---------------------------|
| TOP.f          | -74.86%                   | -69.06%                   |
| TOP.p          | -71.30%                   | -63.75%                   |
| TOP.p vs TOP.f | -13.75%                   | -14.39%                   |



### FINAL REMARKS AND FUTURE DIRECTIONS
- Power consumption management is a challenging issue in modern embedded system designs
- The Multi-Dataflow Composer aims at:
  - Implementing coarse-grained multi-functional devices
  - Providing efficient power-aware architectures
- MDC now integrates high-level power-aware strategies that reduce both static and dynamic power consumption
- Future developments
  - Power gating on different logic regions
  - Improvements in the estimation models
  - Heuristics for the profiler design space exploration

### ACKNOWLEDGEMENTS

The research leading to these results has received funding from:









- the Region of Sardinia, Young Researchers Grant, POR Sardegna FSE 2007-2013, L.R.7/2007 "Promotion of the scientific research and technological innovation in Sardinia"




**Carlo Sau** 

Università degli Studi di Cagliari DIEE – Dept. of Electrical and Electronics Eng. EOLAB - Microelectronics and Bioeng. Lab.

# Power-Awareness in Coarse-Grained Reconfigurable Designs: a Dataflow Based Strategy

# QUESTIONS

