# **Embedded System Design**

Modeling, Synthesis, Verification

Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner



**Chapter 6: Hardware Synthesis** 

7/8/2009

- Design flow
- RTL architecture
- Input specification
- Specification profiling
- RTL synthesis
  - Variable merging (Storage sharing)
  - Operation Merging (FU sharing)
  - Connection Merging (Bus sharing)
- Chaining and multi-cycling
- Data and control pipelining
- Scheduling
- Component interfacing
- Conclusions





Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

### **Input Specification**

- Programming language (C/C++, ...)
  - Programming semantics requires pre-synthesis optimization
- System description language (SystemC, ...)
  - Simulation semantics requires pre-synthesis optimization
- Control/Data flow graph (CDFG)
  - CDFG generation requires dependence analysis
- Finite state machine with data (FSMD)
  - State interpretation requires some kind of scheduling
- RTL netlist
  - RTL design that requires only input and output logic synthesis
- Hardware description language (Verilog / VHDL)
  - HDL description requires RTL library and logic synthesis


### **C Code for Ones Counter**

- Programming language semantics
  - Sequential execution
  - Coding style to minimize coding
- HW design
  - Parallel execution
  - Communication through signals

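```
/* Function-based C code: counts the 1 bits of a non-negative Data */
int OnesCounter(int Data) {
    int Ocount = 0;
    int Temp, Mask = 1;
    while (Data > 0) {
        Temp = Data & Mask;      /* isolate the least significant bit */
        Ocount = Ocount + Temp;  /* accumulate the count */
        Data >>= 1;              /* move the next bit into position */
    }
    return Ocount;
}
```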

Function-based C code

```
/* RTL-based C code: Start, Input, Output and Done model I/O ports,
   and the outer loop runs forever like a hardware process. */
while (1) {
    while (Start == 0);      /* wait for the start signal */
    Done = 0;
    Data = Input;
    Ocount = 0;
    Mask = 1;
    while (Data > 0) {
        Temp = Data & Mask;
        Ocount = Ocount + Temp;
        Data >>= 1;
    }
    Output = Ocount;
    Done = 1;
}
```

RTL-based C code


### **CDFG for Ones Counter**

- Control/Data flow graph
  - Resembles programming language
    - Loops, ifs, basic blocks (BBs)
  - Explicit dependencies
    - Control dependences between BBs
    - Data dependences inside BBs
  - Missing dependencies: data dependences between BBs are not represented
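A minimal C sketch of how the ones-counter CDFG might be stored, assuming a hypothetical split of the RTL-based C code into five BBs (`wait`, `init`, `test`, `body`, `out`). Control dependences are explicit successor edges, while data dependences exist only as statement order inside each BB. `BasicBlock`, `cdfg`, and `count_control_edges` are illustrative names, not from the source.

```c
#include <stddef.h>

enum { MAX_STMTS = 4, NO_BB = -1 };

/* One basic block (BB): statements run in order, so data dependences are
 * implicit in that order; control dependences are the successor edges. */
typedef struct {
    const char *name;
    const char *stmts[MAX_STMTS];
    int n_stmts;
    const char *cond;   /* branch condition, or NULL if unconditional */
    int succ_true;      /* next BB if cond holds (or always, if cond is NULL) */
    int succ_false;     /* next BB if cond fails, or NO_BB */
} BasicBlock;

/* Hypothetical BB partition of the RTL-based ones counter. */
static const BasicBlock cdfg[] = {
    /* 0 */ {"wait", {NULL}, 0, "Start == 0", 0, 1},
    /* 1 */ {"init", {"Done = 0", "Data = Input", "Ocount = 0", "Mask = 1"}, 4, NULL, 2, NO_BB},
    /* 2 */ {"test", {NULL}, 0, "Data > 0", 3, 4},
    /* 3 */ {"body", {"Temp = Data & Mask", "Ocount = Ocount + Temp", "Data >>= 1"}, 3, NULL, 2, NO_BB},
    /* 4 */ {"out",  {"Output = Ocount", "Done = 1"}, 2, NULL, 0, NO_BB},
};

/* Count the explicit control-dependence edges between BBs. */
static int count_control_edges(const BasicBlock *g, int n) {
    int edges = 0;
    for (int i = 0; i < n; i++) {
        if (g[i].succ_true != NO_BB) edges++;
        if (g[i].succ_false != NO_BB) edges++;
    }
    return edges;
}
```

Nothing in this structure records that `Ocount`, written in `init` and `body`, is read in `out`; recovering such dependences between BBs is the dependence analysis that CDFG generation requires.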




### **FSMD for Ones Counter**

- FSMD is more detailed than CDFG
  - States may represent clock cycles
  - Conditionals and statements are executed concurrently
  - All statements in each state are executed concurrently
  - Control signal and variable assignments are executed concurrently
- FSMD includes scheduling
- FSMD doesn't specify binding or connectivity
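A cycle-level C sketch of one possible FSMD for the ones counter, assuming a six-state schedule (S0 to S5) that is not taken from the source. Each call to `fsmd_step` is one clock cycle: all next-state and register values are computed from the current values and committed together, which models the concurrent execution of all statements in a state. The `Regs` type, the state assignment, and `run_ones_counter` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical state assignment (not from the source):
 * S0 wait for Start, S1 initialize, S2 loop test,
 * S3 extract bit and shift, S4 accumulate, S5 output. */
typedef enum { S0, S1, S2, S3, S4, S5 } State;

typedef struct {
    State state;
    int Data, Ocount, Mask, Temp, Output;
    bool Done;
} Regs;

/* One clock cycle. Every next value is computed from the current values
 * in r and all of them are committed together when the function returns,
 * so the statements of a state execute concurrently. */
static Regs fsmd_step(Regs r, bool Start, int Input) {
    Regs n = r;  /* registers keep their value unless assigned */
    switch (r.state) {
    case S0:
        n.state = Start ? S1 : S0;
        break;
    case S1:
        n.Data = Input; n.Ocount = 0; n.Mask = 1; n.Done = false;
        n.state = S2;
        break;
    case S2:
        n.state = (r.Data > 0) ? S3 : S5;
        break;
    case S3:
        n.Temp = r.Data & r.Mask;  /* reads the old Data ...            */
        n.Data = r.Data >> 1;      /* ... while Data shifts, same cycle */
        n.state = S4;
        break;
    case S4:
        n.Ocount = r.Ocount + r.Temp;
        n.state = S2;
        break;
    case S5:
        n.Output = r.Ocount; n.Done = true;
        n.state = S0;
        break;
    }
    return n;
}

/* Hypothetical testbench: hold Start high, clock until Done,
 * and report the number of cycles used. */
static int run_ones_counter(int Input, int *cycles) {
    Regs r = {S0, 0, 0, 0, 0, 0, false};
    int c = 0;
    do {
        r = fsmd_step(r, true, Input);
        c++;
    } while (!r.Done);
    if (cycles) *cycles = c;
    return r.Output;
}
```

In S3, `Temp` and `Data` are assigned in the same cycle and both read the old `Data`; in sequential C the order of those two statements would matter. For Input = 5 this sketch returns 2 after 13 cycles. It keeps one register per variable, so binding and connectivity stay unspecified, as in the FSMD itself.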








### **HDL Description of Ones Counter**

- HDL description
  - Same as RTL description
  - Several levels of abstraction
    - Variable binding to storage
    - Operation binding to FUs
    - Transfer binding to connections
  - Netlist must be synthesized
  - Partial high-level synthesis (HLS) may be needed

```
// Excerpt: output logic of the Ones Counter module
// (module header, declarations and the other states are not shown)
always @(posedge clk)
begin : output_logic
    case (state)
        S4: begin
            B1 = RF[0];
            B2 = RF[1];
            B3 = alu(B1, B2, l_and);
            RF[3] = B3;
            next_state = S5;
        end
        // ...
        S7: begin
            B1 = RF[2];
            Outport <= B1;
            done <= 1;
            next_state = S0;
        end
    endcase
end
endmodule
```


### **Profiling and Estimation**

- Pre-synthesis optimization
- Preliminary scheduling
  - Simple scheduling algorithm
- Profiling
  - Operation usage
  - Variable lifetimes
  - Connection usage
- Estimation
  - Performance
  - Cost
  - Power
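Variable lifetimes from a preliminary schedule can be profiled by counting, for each state, how many values must be held in registers on entry to that state; the maximum is a lower bound on the registers needed. A minimal sketch, assuming each variable is written in state `def` and last read in state `last_use`; the `Lifetime` type, `max_live`, and the five-variable schedule below are hypothetical.

```c
#include <assert.h>

/* Lifetime of one variable in a preliminary schedule: written in state
 * `def`, last read in state `last_use` (states numbered from 0). */
typedef struct {
    const char *name;
    int def;
    int last_use;
} Lifetime;

/* For each state s, count the variables whose value must be held on
 * entry to s (def < s <= last_use). The maximum over all states is a
 * lower bound on the number of registers. */
static int max_live(const Lifetime *v, int n, int n_states) {
    int best = 0;
    for (int s = 0; s < n_states; s++) {
        int live = 0;
        for (int i = 0; i < n; i++) {
            assert(v[i].def <= v[i].last_use);
            if (v[i].def < s && s <= v[i].last_use)
                live++;
        }
        if (live > best) best = live;
    }
    return best;
}
```

With the lifetimes a:(0,1), b:(0,2), t1:(1,2), t2:(2,3), t3:(1,3), the variables b, t1 and t3 are all live on entry to state 2, so at least three registers are needed.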


### **Square-Root Algorithm (SRA)**

- SQR = max((0.875x + 0.5y), x)
- x = max(|a|, |b|)
- y = min(|a|, |b|)

Figure: SRA block diagram with a Controller and a Datapath linked by control signals; ports Start, In1, In2, Done, Out.
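The SRA approximates sqrt(a^2 + b^2) without multipliers or a square-root unit. A minimal C sketch: `sra` follows the formulas directly in floating point, and `sra_int` is an assumed integer datapath variant that replaces the constant multiplications with shifts, 0.875x = x - (x >> 3) and 0.5y = y >> 1, truncating low-order bits. Both function names and the helpers are hypothetical.

```c
/* Helpers written out to avoid depending on libm. */
static double abs_d(double v) { return v < 0 ? -v : v; }
static double max_d(double p, double q) { return p > q ? p : q; }
static double min_d(double p, double q) { return p < q ? p : q; }

/* SQR = max(0.875x + 0.5y, x), x = max(|a|, |b|), y = min(|a|, |b|). */
static double sra(double a, double b) {
    double x = max_d(abs_d(a), abs_d(b));
    double y = min_d(abs_d(a), abs_d(b));
    return max_d(0.875 * x + 0.5 * y, x);
}

/* Assumed integer datapath variant: constant multiplications become
 * shifts and a subtraction. Assumes |a| and |b| do not overflow int. */
static int sra_int(int a, int b) {
    int ua = a < 0 ? -a : a;
    int ub = b < 0 ? -b : b;
    int x = ua > ub ? ua : ub;
    int y = ua > ub ? ub : ua;
    int t = (x - (x >> 3)) + (y >> 1);
    return t > x ? t : x;
}
```

For a = 3, b = 4 both versions give 5, the exact magnitude. The shift form hints at why the datapath needs only absolute value, min/max, shift, add and subtract units.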






### **Datapath Synthesis**

- Variable Merging (Storage Sharing)
- Operation Merging (FU Sharing)
- Connection Merging (Bus Sharing)
- Register merging (RF sharing)
- Chaining and Multi-Cycling
- Data and Control Pipelining
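Variable merging places variables with non-overlapping lifetimes in the same register. The slides do not name an algorithm here; the sketch below swaps in the left-edge algorithm, a common choice for this step, assuming each variable is written at the end of control step `start` and last read in step `end`. The `Interval` type and `left_edge` function are hypothetical.

```c
#include <assert.h>

enum { MAX_VARS = 16 };

/* Lifetime of a variable in control steps: written at the end of step
 * `start`, last read in step `end`. A later variable may reuse the same
 * register if it is written no earlier than the end step of the last
 * variable placed there. */
typedef struct { int start, end; } Interval;

/* Left-edge algorithm: sort variables by start step, then fill one
 * register at a time with non-overlapping variables. reg[i] receives the
 * register assigned to variable i; the return value is the register count. */
static int left_edge(const Interval *v, int n, int *reg) {
    int order[MAX_VARS];
    assert(n <= MAX_VARS);
    for (int i = 0; i < n; i++) {
        assert(v[i].start >= 0 && v[i].start <= v[i].end);
        order[i] = i;
        reg[i] = -1;
    }

    /* Insertion sort of variable indices by start step. */
    for (int i = 1; i < n; i++) {
        int k = order[i], j = i - 1;
        while (j >= 0 && v[order[j]].start > v[k].start) {
            order[j + 1] = order[j];
            j--;
        }
        order[j + 1] = k;
    }

    int regs = 0, assigned = 0;
    while (assigned < n) {
        int last_end = -1;  /* end step of the last variable in this register */
        for (int i = 0; i < n; i++) {
            int k = order[i];
            if (reg[k] == -1 && v[k].start >= last_end) {
                reg[k] = regs;
                last_end = v[k].end;
                assigned++;
            }
        }
        regs++;
    }
    return regs;
}
```

For the lifetimes a:[0,1], b:[0,2], t1:[1,2], t2:[2,3], t3:[1,3], the sketch merges a, t1 and t2 into one register and uses three registers in total.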


























### **Pipelining**

- Functional unit pipelining
  - Two or more operations executing at the same time
- Datapath pipelining
  - Two or more register transfers executing at the same time
- Control pipelining
  - Two or more instructions generated at the same time
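Functional unit pipelining can be illustrated with a cycle-by-cycle C model of a hypothetical 3-stage FU computing (2x + 3)^2. One operation is issued per cycle, and every occupied stage works on a different operation in the same cycle. The stage split, `stage_op`, and `run_pipeline` are assumptions for illustration, and the model assumes the inputs are independent.

```c
#include <stdbool.h>

enum { STAGES = 3 };

/* Work done by each stage of the hypothetical FU:
 * stage 0 doubles, stage 1 adds 3, stage 2 squares. */
static int stage_op(int s, int v) {
    switch (s) {
    case 0:  return 2 * v;
    case 1:  return v + 3;
    default: return v * v;
    }
}

/* Issue one input per cycle; each cycle every occupied stage hands its
 * result to the next stage, and the operation in the last stage completes
 * at the end of the cycle. Returns the number of clock cycles;
 * *max_in_flight reports the most operations executing at once. */
static int run_pipeline(const int *in, int n, int *out, int *max_in_flight) {
    bool valid[STAGES] = {false};
    int  val[STAGES] = {0};
    int issued = 0, done = 0, cycles = 0;
    *max_in_flight = 0;
    while (done < n) {
        cycles++;
        /* Shift from the last stage backwards so each stage reads the
         * value its predecessor held in the previous cycle. */
        for (int s = STAGES - 1; s > 0; s--) {
            valid[s] = valid[s - 1];
            if (valid[s]) val[s] = stage_op(s, val[s - 1]);
        }
        valid[0] = issued < n;
        if (valid[0]) val[0] = stage_op(0, in[issued++]);

        int in_flight = 0;
        for (int s = 0; s < STAGES; s++) in_flight += valid[s];
        if (in_flight > *max_in_flight) *max_in_flight = in_flight;

        if (valid[STAGES - 1]) out[done++] = val[STAGES - 1];
    }
    return cycles;
}
```

With inputs {1, 2, 3, 4}, the model finishes in 6 cycles, matching k + n - 1 for k = 3 stages and n = 4 operations, with up to three operations in flight at once. Under the same assumptions a non-pipelined 3-cycle unit would need n * k = 12 cycles.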


### **Scheduling**

- Scheduling assigns clock cycles to register transfers
- Non-constrained scheduling
  - ASAP (as soon as possible) scheduling
  - ALAP (as late as possible) scheduling
- Constrained scheduling
  - Resource constrained (RC) scheduling
    - Given resources, minimize metrics (time, power, ...)
  - Time constrained (TC) scheduling
    - Given time, minimize resources (FUs, storage, connections)
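ASAP and ALAP scheduling can be sketched on a data-flow graph for the SRA formulas. The decomposition into nine single-cycle operations is an assumption, as are the `Op` type and the `asap`/`alap` names. ASAP places each operation one cycle after its latest predecessor; ALAP places it one cycle before its earliest successor, counting back from a given latency.

```c
#include <assert.h>

enum { MAX_PREDS = 3 };

/* One data-flow operation; preds[] holds the indices of the operations
 * whose results it reads (tK denotes the result of operation K). Every
 * operation is assumed to take one cycle; ops are in topological order. */
typedef struct {
    const char *name;
    int n_preds;
    int preds[MAX_PREDS];
} Op;

/* Hypothetical DFG for the SRA formulas. */
static const Op sra_dfg[] = {
    /* 0 */ {"|a|",        0, {0}},
    /* 1 */ {"|b|",        0, {0}},
    /* 2 */ {"x = max",    2, {0, 1}},
    /* 3 */ {"y = min",    2, {0, 1}},
    /* 4 */ {"x >> 3",     1, {2}},
    /* 5 */ {"y >> 1",     1, {3}},
    /* 6 */ {"x - t4",     2, {2, 4}},
    /* 7 */ {"t6 + t5",    2, {6, 5}},
    /* 8 */ {"max(t7, x)", 2, {7, 2}},
};

/* ASAP: schedule each operation one cycle after its latest predecessor.
 * Returns the schedule length in cycles. */
static int asap(const Op *g, int n, int *cyc) {
    int len = 0;
    for (int i = 0; i < n; i++) {
        cyc[i] = 1;
        for (int p = 0; p < g[i].n_preds; p++) {
            int q = g[i].preds[p];
            assert(q < i);  /* topological order required */
            if (cyc[q] + 1 > cyc[i]) cyc[i] = cyc[q] + 1;
        }
        if (cyc[i] > len) len = cyc[i];
    }
    return len;
}

/* ALAP: start every operation at `latency` and pull each predecessor to
 * one cycle before its earliest successor. Visiting operations in reverse
 * topological order means cyc[i] is final before it constrains its
 * predecessors. latency must be at least the ASAP length. */
static void alap(const Op *g, int n, int latency, int *cyc) {
    for (int i = 0; i < n; i++) cyc[i] = latency;
    for (int i = n - 1; i >= 0; i--) {
        for (int p = 0; p < g[i].n_preds; p++) {
            int q = g[i].preds[p];
            if (cyc[i] - 1 < cyc[q]) cyc[q] = cyc[i] - 1;
        }
    }
}
```

ASAP yields a 6-cycle schedule. With latency 6, ALAP moves `y = min` from cycle 2 to 3 and `y >> 1` from cycle 3 to 4, so those two operations have one cycle of mobility while the others lie on the critical path; RC and TC schedulers can exploit exactly this slack.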










### Conclusion

- Synthesis techniques
  - Variable Merging (Storage Sharing)
  - Operation Merging (FU Sharing)
  - Connection Merging (Bus Sharing)
- Architecture techniques
  - Chaining and Multi-Cycling
  - Data and Control Pipelining
  - Forwarding and Caching
- Scheduling
  - Metric constrained scheduling
- Interfacing
  - Part of HW component
  - Bus interface unit
- If too complex, use partial order
