
MATEC Web of Conferences 139, 00085 (2017)

DOI: 10.1051/matecconf/201713900085
ICMITE 2017

A Method to Detect Hazards in Pipeline Processor


Yihui He1, Han Wan1,*, Bo Jiang1 and Xiaopeng Gao1
1 School of Computer Science and Engineering, Beihang University, Beijing, China
{zy1506229, wanhan, jiangbo, gxp}@buaa.edu.cn

Abstract. To improve processor throughput, the pipeline technique is widely used to implement instruction-level parallelism. However, pipelining also leads to data hazards, which have a great influence on performance. This paper proposes a method called supply-matching to detect and resolve data hazards efficiently, with which the bypassing and stalling logic can be easily realized. Furthermore, an RTL description of instructions is introduced to reduce resource utilization. A case study was conducted on a five-stage microprocessor based on the PowerPC architecture using different approaches. Experimental results show that our method requires fewer resources and achieves better performance.

* Corresponding author: wanhan@buaa.edu.cn
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).

1 Introduction

Pipelining is widely used in microprocessor design to improve performance. In a pipelined processor, multiple instructions are in different stages of execution within the same clock cycle. However, pipelining introduces hazards, including data hazards, control hazards and structural hazards, and the problem becomes more complex as the pipeline depth increases. Forwarding and stalling are effective ways to resolve hazards in processors for embedded systems. However, the complexity of detecting hazards increases rapidly with the number of instructions, because many combinations of instructions may lead to hazards in unanticipated ways. A highly effective method is therefore needed to resolve the large number of hazards that arise from deep pipelines and large instruction sets.

Motivated by these issues, many studies have proposed methods for pipelined processor design. Amit Pandey and Yu Qiaoyan used a class-based method to detect data hazards [1, 2]. This method divides the whole instruction set into several classes, so that a big problem is broken up into smaller ones. P. Bernardi and D. Boyang proposed an SBST algorithm [3]. Jiajing Lu designed a dynamic scheduling algorithm that improves pipeline efficiency while adding only a single-instruction buffer and some combinational logic [4]. In addition, J. Schönherr, I. Schreiber and E. Fordran proposed a method that uses symbolic model checking to detect hazards in pipelined processors [5].

In this work, we aim to resolve the large number of hazards that arise from deep pipelines and large instruction sets. Section 2 briefly introduces the architecture and the controllers and describes our method of generating the datapath from the RTL description. Section 3 introduces a method called supply-matching that detects and resolves data hazards completely and efficiently. Section 4 discusses control hazards and structural hazards. Finally, Section 5 presents an experiment that verifies our method.

2 Preliminary

2.1 Architecture

The processor in this paper adopts a five-stage pipeline architecture, as shown in Fig. 1 (italic words denote the names of core units; other words denote the names of interfaces and data).

[Figure: five-stage pipeline datapath. Stages: IF (1. Instruction Fetch), ID (2. Decode/Register Read), EXE (3. Execute), MEM (4. Memory), WB (5. Write Back). Core units: PC, NPC, Instruction Memory, Register File, EXT, ALU, Data Memory and multiplexers. Pipeline registers: IF_ID, ID_EXE, EXE_MEM, MEM_WB. Controllers: Function Controller (function-signals), Stall Controller (stall-signal) and Bypass Controller (bypass-signal).]
Fig. 1. The architecture of a 5-stage-pipeline microprocessor

The IF stage fetches instructions from program memory; the PC, Next PC (NPC) and Instruction Memory (IM) belong to the IF stage. The ID stage decodes the instruction and reads or writes the register file, and it also performs immediate extension. The EXE stage executes arithmetic and logic operations. The MEM stage reads or writes data memory. Finally, in the WB stage, the execution result of the
instruction is written back to the register file. Four pipeline registers are distributed along the pipeline to hold data temporarily.

2.2 Notations and datapath

An RTL description of instructions is also introduced in this paper to reduce resource utilization. The notation is defined as follows:
(1) A.B denotes port B of core unit A.
(2) X_Y denotes a pipeline register; for example, IF_ID is the pipeline register between the IF stage and the ID stage.
(3) B@X_Y denotes data B latched in pipeline register X_Y.
(4) Z[z] denotes field z of unit Z; for example, Ins@IF_ID[Imm16] is the 16-bit immediate operand of the instruction stored in IF_ID.
(5) A.B → C.D denotes a data transfer from port B of unit A to port D of unit C.

The RTL description makes the datapath easier to construct. A datapath is a collection of functional units (such as ALUs or multipliers), registers and buses. Following the RTL rules, the data flow of each instruction becomes explicit and all units are linked. The data flows of all instructions are then merged vertically to remove transfers that repeat across the instruction set, and a MUX is added wherever a core unit has multiple inputs. The MUX control signals are generated by the controller described in Section 2.3. Fig. 2 shows the result of applying this method to the ADD, SUBF, STW, LWZ and B instructions of the PowerPC instruction set.

IF:
  PC.PC → IM.Addr
  PC.PC + 4 → NPC.PC4
  IM.Ins → IF_ID
  NPC.NPC → PC.NPC
  EXT.Imm32 → NPC.Imm32
ID:
  Ins@IF_ID[11:15] → RF.Addr1
  RF.RD1 → ID_EXE
  Ins@IF_ID[16:20] → RF.Addr2
  RF.RD2 → ID_EXE
  Ins@IF_ID → ID_EXE
  Ins@IF_ID[6:10] → RF.Addr3
  RF.RS → ID_EXE
  Ins@IF_ID[16:31] → EXT.Imm16
  Ins@IF_ID[6:29] → EXT.Imm24
EXE:
  RD1@ID_EXE → ALU.A
  RD2@ID_EXE → ALU.B
  ALU.Cout → EXE_MEM
  Ins@ID_EXE → EXE_MEM
  (Ins@ID_EXE[11:15]==0)?32'd0:RD1@ID_EXE → ALU.A
  Imm32@ID_EXE → ALU.B
  RS@ID_EXE → EXE_MEM
MEM:
  Cout@EXE_MEM → MEM_WB
  Ins@EXE_MEM → MEM_WB
  Cout@EXE_MEM → DM.Addr
  RS@EXE_MEM → DM.Data
  DM.Dout → MEM_WB
WB:
  Cout@MEM_WB → RF.WD
  Ins@MEM_WB[6:10] → RF.WD_Addr
  Dout@MEM_WB → RF.WD
Fig. 2. RTL description of five instructions

2.3 Controller design

This paper uses a three-controller architecture consisting of a function-controller, a bypass-controller and a stall-controller, as shown in Fig. 1. The function-controller decodes the instruction and generates function-signals that indicate what each core unit should do. For example, for an ADD instruction, the function-controller generates signals such as DM_Wr, which indicates whether the Data Memory is written. The function-controller also determines the data source of each core unit by generating the select signals of the function-multiplexers. These function-signals are fixed, because they appear directly in the RTL description of every instruction, so their final logic can be obtained by directly integrating the RTL descriptions of all instructions.

The bypass-controller is mainly in charge of the bypass-signals, which select the correct data source for the bypass-multiplexers. Bypass-controller design is the key to realizing the bypassing technique. The logic that generates the bypass-signals is more complicated than the logic of the function-signals, because hazard detection becomes more difficult as the number of instructions or the pipeline depth increases. Section 3 introduces an effective method for detecting hazards, which makes the bypass-signal logic much easier to derive. The stall-controller is responsible for stalling the pipeline, since some hazards cannot be resolved by bypassing. In such situations, the stall-controller generates stall-signals that stall the IF and ID stages: the ID_EXE pipeline register clears all its data, and the IF_ID pipeline register remains unchanged. In addition, the PC should keep its current value (the PC+4 already computed) so that instructions are still executed in the correct order.

3 Data hazard

3.1 Problem definition

Data hazards are the most frequently occurring hazards in pipelined processors, and forwarding and stalling are effective solutions to them. However, the complexity of detecting hazards increases rapidly with the number of instructions, because many combinations of instructions may lead to hazards in unanticipated ways. Complete detection of data hazards is therefore essential to ensure that every hazardous combination of instructions is considered.

3.2 Solution

This paper proposes a method called supply-matching to detect and resolve data hazards completely and efficiently. The method consists of two steps.
• Build the Tuse-Tnew matrix of all instructions in the specified instruction set. All Tuse values of each register and all Tnew values of the processor are then obtained by synthesizing all records of the Tuse-Tnew matrix.
• Build the register strategy matrix from the Tuse-Tnew matrix; for a specific register, its Tuse-Tnew records are uniquely determined. According to the supply-matching model described in Section 3.2.1, any data hazard can then be detected and resolved using formula (1).

3.2.1 Supply-matching model

Data hazards occur when data-dependent instructions read and write the same data in different pipeline stages. Data hazard detection can therefore be transformed into detecting the relationship between data demand and data supply. A provider is a pipeline register that holds the execution result of a preceding instruction; for example, the EXE_MEM and MEM_WB pipeline registers are the providers for all operation instructions. A demander is a component that needs the most up-to-date value currently held in a provider; for example, the ALU is the demander for all operation instructions.

Two basic principles are defined in this method.
• Instruction decoding must not happen earlier than the ID stage, regardless of whether the centralized or the distributed decoding mode is adopted.
• When both forwarding and stalling can resolve a data hazard, forwarding takes priority.

In addition, two parameters, Tuse and Tnew, are defined.
• Tuse is the number of clock cycles, counted from when the instruction enters the ID stage, after which a given functional unit uses the value held in a register. Tuse is a static value, and an instruction can have several Tuse values, one for each register operand. Tuse ≥ 0.
• Tnew is the minimum number of clock cycles after which an instruction in a stage after the ID stage produces the result that will be written back to the register file. Tnew is a dynamic value: it decreases by 1 each time the instruction advances one pipeline stage and stays unchanged once it reaches 0, so an instruction has different Tnew values in different stages. Tnew ≥ 0. The management of Tnew and Tuse in the pipelined processor is shown in Fig. 3.

[Figure: the datapath of Fig. 1 annotated with "Set Value of Tnew", two subsequent "−1" decrement steps along the later pipeline registers, and "Get Value of Tuse".]
Fig. 3. The management of Tnew and Tuse in pipeline processor

Based on the above, resolving data hazards reduces to comparing numbers. By definition, each instruction has a set of numbers representing its Tuse and Tnew values, and for a specific instruction set, comparing the Tuse and Tnew values of different instructions reflects their data dependence. If Tnew > Tuse, the result becomes available too late, so the hazard can only be resolved by stalling. If Tnew ≤ Tuse, the hazard can be resolved by forwarding:

$$
\text{Operation} =
\begin{cases}
\text{Stall}, & T_{\mathrm{use}} < T_{\mathrm{new}} \\
\text{Forward}, & T_{\mathrm{use}} \ge T_{\mathrm{new}}
\end{cases}
\qquad (1)
$$

3.2.2 Tuse-Tnew matrix

The Tuse-Tnew matrix collects the Tuse and Tnew values of all instructions. As long as the pipeline architecture stays the same, the Tuse-Tnew record of each instruction follows directly from the definitions of Tuse and Tnew, so the Tuse-Tnew matrix is also fixed once all instructions of a specific instruction set are taken into account. All the work therefore focuses on the execution semantics of each instruction and on completing the matrix row by row, rather than on a specific classification of the instructions. This greatly reduces the complexity of hazard detection even as the number of instructions grows. For example, suppose the instruction set includes ADD, SUB, ANDI, ORI, LW, SW and BEQ. For the five-stage pipeline, the Tuse-Tnew matrix is shown in Table 1, whose last row collects the set of values appearing in each column.

Table 1. Tuse-Tnew matrix
Ins     Tuse_1 (RS)   Tuse_2 (RT)   Tnew (EXE)   Tnew (MEM)   Tnew (WB)
ADD     1             1             1            0            0
SUB     1             1             1            0            0
ANDI    1             Null          1            0            0
ORI     1             Null          1            0            0
LW      1             Null          2            1            0
SW      1             2             Null         Null         Null
BEQ     0             0             Null         Null         Null
All     {0,1}         {0,1,2}       {1,2}        {0,1}        {0}

The supply-matching model makes the Tuse-Tnew matrix easy to build, and the procedure also applies to processors with deeper pipelines or larger instruction sets.

3.2.3 Register strategy matrix

The register strategy matrix gives the resolution strategy for data hazards on a specific register. Because the Tuse-Tnew matrix covers all instructions and all pipeline stages except those before the ID stage, a complete strategy matrix can be built from it for a specific register. (This follows from basic principle 2. If there is more than one stage before the ID stage, additional work is needed to detect and resolve the data hazards in that part.) Formula (1) is used to construct the strategy matrix. Taking the RT register as an example, the result is shown in Table 2; the Tuse values of RT and the Tnew values of each stage are taken from the last row of Table 1.

Table 2. Strategy matrix for RT register
Tuse \ Tnew   EXE: 1   EXE: 2   MEM: 0   MEM: 1   WB: 0
0             Stall    Stall    Bypass   Stall    Bypass
1             Bypass   Stall    Bypass   Bypass   Bypass
2             Bypass   Bypass   Bypass   Bypass   Bypass

The correctness of this strategy matrix can be proved as follows:
1. By definition, Tuse ≥ 0 and Tnew ≥ 0, so there are only three possible relationships between Tuse and Tnew: Tuse > Tnew, Tuse < Tnew and Tuse = Tnew.
2. If Tuse > Tnew, the preceding instruction has already finished computing the write-back data that the current instruction needs, so bypassing can be established correctly.
3. If Tuse < Tnew, the current instruction cannot obtain the dependent data from the subsequent stages in time, so stalling is used to ensure that the processor executes correctly.

4. If Tuse = Tnew, it shows that current instruction can get multiplexers were implemented with Verilog HDL.
related data immediately as the instruction before Meanwhile, hazards in pipeline are resolved based on our
finish computing at the same time. Forwarding method. Finally, binary codes were got from compiling
technology can be used here. the programs written in C or assembly by GCC. The result
A complete stalling control signal can be created by of registers after executing every instruction is obtained
using logic operation OR to integrate all the stalling from QEMU in single-step mode. The correctness of the
conditions. In other cases, data hazards can be resolved by processor can be determined by comparing the value of
forwarding. However, there may have many forwarding registers with the result from QEMU during the
sources represented data from multiple pipeline stages simulation. Spartan6-6SLX150FGG484 made by Xilinx
when using forwarding technology. In this situation, the is the FPGA used to synthesize and implement with ISE
priority of forwarding source should be set. This paper in our experiment.
adopts a forwarding strategy based on the pipeline priority.
The strategy sets the stage which Tuse stages behind ID Design Realization Simulation
Verilog
stage has the highest priority among the whole pipeline Architecture specification
MUX Hazards GCC
stages and the priority of other stages behind it are
decreasing in turn. Based above, just selecting forwarding Instruction set Controllers QEMU
source which has the highest priority when there are Core Units
Core Unit
modules
multiple forwarding sources. Pipeline
Architecture
Datapath EDA

Fig. 4. The experiment framework


4 Control hazard and structural hazard
Control hazards (branch hazards) cause by branch 5.2 Result and analysis
instructions. There are two main techniques to resolve
branch hazards, including branch prediction and branch By comparing the value of registers with the result from
delay slot. Other studies have already done this part QEMU, the correctness of the processor has been proved.
efficiently. Please refer to [6, 7, 8] for details. This also indicates the correctness of the supply-matching
Structural hazards occur when a part of the processor’s method. Meanwhile, we have compared the synthesize
hardware is needed by two or more instructions at the reports of two microprocessors which implemented the
same time. The strategy for resolving this hazards is supply-matching method using different decoding modes
simple. Just stalling the pipeline or copying the basic unit. in three aspects: clock frequency (MHz), the number of
Flip-Flops (FF), the number of BELs (which includes all
basic logic primitives like LUT, MUXCY, etc). The result
5 Experiment is shown in Table 3.
Table 3. Comparisons of method implementation using
5.1 Framework different decoding mode

The methods mentioned above were adopted to Parameter


Clock FF BEL
implement a five-stage PowerPC microprocessor which Mode
supports 72 instructions. It is significant to realize the Using centralized
8.46 3525 12844
whole microprocessor rather than the logic of hazards decoding mode
detection alone because microprocessor cannot work Using hybrid
59.46 3481 10380
normally if the system only supports the logic of hazards decoding mode
detection.
It is obvious to conclude that our method which using
There are two decoding modes which are centralized
hybrid decoding mode can gain faster clock frequency
decoding and distributed decoding in processor design. In
with less resources.
centralized decoding mode, controllers are assigned at ID
stage. Then control signals created by controllers transfer
through the pipeline. But in distributed decoding mode, 6 Conclusions
controllers are distributed in multiple stages and
controllers only create the control signal which related to In this paper, an efficient method called supply-matching
the core units in the same stage [9, 10]. In this paper, the to completely detect and resolve data hazards has been
supply-matching method uses a hybrid decoding mode proposed. The logic of bypassing and stalling can be
which means Tuse uses centralized decoding mode and easily integrated and implemented due to this method.
Tnew uses distributed decoding mode. However, in order Finally, we conducted extensive experiments based on a
to set up a control experiment, we also implemented the state-of-art microprocessor, PowerPC architecture with
supply-matching method only using centralized decoding five stage pipeline. Experiment results proof that our
mode. method can achieve faster clock frequency with less
The framework for the experiment as shown in Fig.4. resource than the well-known method called stage-
Firstly, instruction set, core units and pipeline architecture decoding.
were determined according to architecture specification.
After that, core unit modules, controllers, datapath, and

4
MATEC Web of Conferences 139, 00085 (2017) DOI: 10.1051/matecconf/201713900085
ICMITE 2017
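
The two steps of the supply-matching method in Section 3.2, formula (1), and the stall and forwarding rules of Section 3.2.3 can be checked in software. The following is a minimal Python sketch under stated assumptions, not the authors' Verilog implementation: it encodes Table 1 as data (None stands for Null), derives the value sets in the last row of Table 1, and applies formula (1) to regenerate the RT strategy matrix of Table 2. All function names, the hypothetical InFlight tuple (stage, destination register, current Tnew) used to model older instructions, and the rule in forward_source that the stage nearest the demander wins are assumptions chosen here only for illustration.

```python
"""Minimal sketch of supply-matching (Section 3.2); illustrative only."""
from typing import Dict, List, Optional, Set, Tuple

# Table 1 as data; None encodes "Null" (operand not read / no result).
# Column order: Tuse of RS, Tuse of RT, Tnew in EXE, MEM and WB.
Record = Tuple[Optional[int], ...]
COLUMNS = ("RS", "RT", "EXE", "MEM", "WB")
STAGES = ("EXE", "MEM", "WB")
TUSE_TNEW: Dict[str, Record] = {
    "ADD": (1, 1, 1, 0, 0),
    "SUB": (1, 1, 1, 0, 0),
    "ANDI": (1, None, 1, 0, 0),
    "ORI": (1, None, 1, 0, 0),
    "LW": (1, None, 2, 1, 0),
    "SW": (1, 2, None, None, None),
    "BEQ": (0, 0, None, None, None),
}

# Hypothetical model of an older in-flight instruction:
# (stage it occupies, destination register number, its current Tnew).
InFlight = Tuple[str, int, int]


def operation(t_use: int, t_new: int) -> str:
    """Formula (1): stall if Tuse < Tnew, otherwise forward (bypass)."""
    return "Stall" if t_use < t_new else "Bypass"


def value_sets(matrix: Dict[str, Record]) -> Dict[str, Set[int]]:
    """Step 1: collect the non-Null values of each column (last row of Table 1)."""
    sets: Dict[str, Set[int]] = {col: set() for col in COLUMNS}
    for record in matrix.values():
        for col, value in zip(COLUMNS, record):
            if value is not None:
                sets[col].add(value)
    return sets


def strategy_matrix(register: str, sets: Dict[str, Set[int]]) -> Dict[int, List[str]]:
    """Step 2: apply formula (1) to every Tuse of `register` and every stage Tnew.

    Row entries follow the column order of Table 2: EXE, MEM, WB, with the
    Tnew values of each stage in ascending order.
    """
    return {
        t_use: [operation(t_use, t_new)
                for stage in STAGES
                for t_new in sorted(sets[stage])]
        for t_use in sorted(sets[register])
    }


def stall_signal(src_reg: int, t_use: int, in_flight: List[InFlight]) -> bool:
    """OR of all stall conditions: an older producer of src_reg has Tnew > Tuse."""
    return any(dest == src_reg and t_use < t_new
               for _stage, dest, t_new in in_flight)


def forward_source(src_reg: int, in_flight: List[InFlight]) -> Optional[str]:
    """Select one source when several stages hold a newer src_reg value.

    Assumption: in_flight is ordered from the stage nearest the demander to the
    farthest one, so the first match is the highest-priority (most recent) value.
    """
    for stage, dest, _t_new in in_flight:
        if dest == src_reg:
            return stage
    return None


if __name__ == "__main__":
    sets = value_sets(TUSE_TNEW)
    print(sets)
    for t_use, row in strategy_matrix("RT", sets).items():
        print(t_use, row)
    # 0 ['Stall', 'Stall', 'Bypass', 'Stall', 'Bypass']
    # 1 ['Bypass', 'Stall', 'Bypass', 'Bypass', 'Bypass']
    # 2 ['Bypass', 'Bypass', 'Bypass', 'Bypass', 'Bypass']

    # ADD reads r3 as RS (Tuse = 1) while an LW writing r3 sits in EXE (Tnew = 2).
    print(stall_signal(3, 1, [("EXE", 3, 2)]))  # True
    # Two older writers of r3: the one in MEM is more recent than the one in WB.
    print(forward_source(3, [("MEM", 3, 0), ("WB", 3, 0)]))  # MEM
```

Running it prints the value sets of Table 1 and the three rows of Table 2. Because the matrix is plain data, supporting another instruction only means adding a row to TUSE_TNEW, which mirrors the point in Section 3.2.2 that the work reduces to completing the matrix row by row.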

Acknowledgment

This work has been supported by a project of the Professional Group Construction of Beijing Educational Committee.

References
1. Pandey A. Study of data hazard and control hazard resolution techniques in a simulated five stage pipelined RISC processor. International Conference on Inventive Computation Technologies. IEEE, 1-4 (2017).
2. Qiao-Yan Y, Peng L, Qing-Dong Y. A data hazard detection method for DSP with heavily compressed instruction set. (2004).
3. Bernardi P, Boyang D, Ciganda L, et al. A functional test algorithm for the register forwarding and pipeline interlocking unit in pipelined microprocessors. Design and Test Symposium. IEEE, 1-6 (2014).
4. Lu J, Zhou X, Wang J. A novel dynamic scheduling algorithm of data hazard for embedded processor. International Conference on ASIC. IEEE, 28-31 (2007).
5. Schönherr J, Schreiber I, Fordran E, et al. Hazard checking in pipelined processor designs using symbolic model checking. Euromicro Conference, 1999. Proceedings. IEEE, 1, 75-78 (1999).
6. Nurvitadhi E, Hoe JC, Kam T, Lu S. Automatic pipelining from transactional datapath specifications. Design, Automation and Test in Europe (DATE 2010), Dresden, Germany, March 8-12, 2010, 1001-1004 (2010).
7. Nurvitadhi E. Automatic pipeline synthesis and formal verification from transactional datapath specifications. Terapevticheski Arkhiv, 80(4), 73-6 (2008).
8. Nurvitadhi E, Hoe JC, Kam T, Lu S. Automatic pipelining from transactional datapath specifications. IEEE Trans. on CAD of Integrated Circuits and Systems, 30(3), 441-454 (2011).
9. Yiannacouras P, Steffan JG, Rose J. Exploration and customization of FPGA-based soft processors. IEEE Trans. on CAD of Integrated Circuits and Systems, 266-277 (2007).
10. Yiannacouras P, Rose J, Steffan JG. The microarchitecture of FPGA-based soft processors. Proceedings of the 2005 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, 202-212 (2005).
