A Method To Detect Hazards in Pipeline Processor: Yihui He
DOI: 10.1051/matecconf/201713900085
ICMITE 2017
Abstract. To improve processor throughput, pipelining is widely used to implement
instruction-level parallelism. However, this technique also leads to data hazards, which have a
great influence on performance. This paper proposes a method called supply-matching to detect and resolve
data hazards efficiently. The bypassing and stalling logic can be derived easily with this method.
Furthermore, an RTL description of instructions is introduced to reduce resource
utilization. A case study was conducted on a five-stage microprocessor based on the PowerPC
architecture, implemented with different approaches. Experimental results show that our method requires
fewer resources and achieves better performance.
* Corresponding author: [email protected]
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution
License 4.0 (http://creativecommons.org/licenses/by/4.0/).
MATEC Web of Conferences 139, 00085 (2017) DOI: 10.1051/matecconf/201713900085
... instruction will be written back to the register file. Four pipeline registers are distributed in the pipeline architecture to store data temporarily.

2.2 Notations and datapath

An RTL description of instructions is also introduced in this paper to reduce resource utilization. The rules are defined as follows: (1) A.B means port B of core unit A. (2) X_Y means a pipeline register. For example, IF_ID means the pipeline register between the IF stage and the ID stage. (3) B@X_Y means data B latched in pipeline register X_Y. (4) Z[z] means field z of unit Z. For example, Ins@IF_ID[Imm16] means the 16-bit immediate operand of the instruction stored in IF_ID. (5) A.B → C.D means a data transfer from port B of unit A to port D of unit C.

It is easier to structure the datapath by using the RTL description. A datapath is a collection of functional units (such as ALUs or multipliers), registers and buses. Following the RTL rules, the data flow of each instruction is clear and all units are linked. The data flows are then merged in the vertical direction to remove the data flows repeated across the whole instruction set. A MUX unit is added wherever an input of a core unit has multiple sources. The MUX control signal is generated by the controller, which is described in Section 2.3. Fig. 2 shows the result of applying this method to the ADD, SUBF, STW, LWZ and B instructions of the PowerPC instruction set.

IF stage:
  PC.PC → IM.Addr
  PC.PC + 4 → NPC.PC4
  IM.Ins → IF_ID
  NPC.NPC → PC.NPC
ID stage:
  Ins@IF_ID[11:15] → RF.Addr1
  RF.RD1 → ID_EXE
  Ins@IF_ID[16:20] → RF.Addr2
  RF.RD2 → ID_EXE
  Ins@IF_ID → ID_EXE
  Ins@IF_ID[6:10] → RF.Addr3
  RF.RS → ID_EXE
  Ins@IF_ID[16:31] → EXT.Imm16
  EXT.Imm32 → NPC.Imm32
  Ins@IF_ID[6:29] → EXT.Imm24
EXE stage:
  RD1@ID_EXE → ALU.A
  RD2@ID_EXE → ALU.B
  ALU.Cout → EXE_MEM
  Ins@ID_EXE → EXE_MEM
  (Ins@ID_EXE[11:15]==0)?32'd0:RD1@ID_EXE → ALU.A
  Imm32@ID_EXE → ALU.B
  RS@ID_EXE → EXE_MEM
MEM stage:
  Cout@EXE_MEM → MEM_WB
  Ins@EXE_MEM → MEM_WB
  Cout@EXE_MEM → DM.Addr
  RS@EXE_MEM → DM.Data
  DM.Dout → MEM_WB
WB stage:
  Cout@MEM_WB → RF.WD
  Ins@MEM_WB[6:10] → RF.WD_Addr
  Dout@MEM_WB → RF.WD

Fig. 2. RTL description of five instructions

2.3 Controller design

A three-controller architecture, which includes a function-controller, a bypass-controller and a stall-controller, is used in this paper, as shown in Fig. 1. The function-controller is responsible for decoding the instruction and creating the function-signals that indicate what the core units should do. For example, if the instruction is ADD, the function-controller creates signals such as DM_Wr to denote whether the Data Memory is written or not. The function-controller also determines the data source of the core units by creating the select signals of the function-multiplexers. These function-signals are certain since they appear directly in the RTL description of every instruction. So the final logic of the signals can be created by integrating the RTL of all instructions directly.

The bypass-controller is mainly in charge of the bypass-signals, which choose the right data source as input to the bypass-multiplexers. Bypass-controller design is the key to realizing the bypassing technique. The logic that generates the bypass-signals is more complicated than the logic of the function-signals, as hazard detection becomes more difficult when the number of instructions or the pipeline depth increases. This paper introduces an effective method to detect hazards in the next section, so the logic of the bypass-signals becomes much easier to derive. The stall-controller is responsible for pipeline stalling, as some hazard situations cannot be resolved by the bypass technique. In these situations, the stall-controller generates stall-signals to stall the IF stage and the ID stage. For the pipeline registers, ID_EXE clears all data and IF_ID remains unchanged. Also, PC should retain the value PC+4 to ensure that instructions are executed in the correct order.

3 Data hazard

3.1 Problem definition

Data hazards are the hazards that occur most frequently in a pipeline processor. Forwarding and stalling are effective solutions to this problem. However, the complexity of detecting hazards increases rapidly as the number of instructions increases. In this situation, many combinations of instructions may lead to hazards in unanticipated ways. It is imperative to make data hazard detection complete, so that all hazard combinations are considered.

3.2 Solution

This paper proposes a method called supply-matching to detect and resolve data hazards completely and efficiently. The method can be divided into two steps.
1. Build the Tuse-Tnew matrix of all instructions for the specified instruction set. All values of Tuse of the registers and all values of Tnew for the processor can then be obtained by synthesizing all records of the Tuse-Tnew matrix.
2. Build the register strategy matrix according to the Tuse-Tnew matrix. A Tuse-Tnew record can be uniquely determined for a specific register. According to the supply-matching model described in Section 3.2.1, any data hazard can be detected and resolved by using formula (1).

3.2.1 Supply-matching model

Data hazards occur when instructions that exhibit data dependence modify data in different stages of a pipeline. Therefore, data hazard detection can be transformed into the detection of the relationship between data demand and data supply. In this case, the provider is the pipeline register that holds the execution result of an earlier instruction. For example, the EXE_MEM and MEM_WB pipeline registers are the providers for all operation instructions. The demander is the component that needs the most up-to-date value currently held in a provider. For example, the ALU is the demander for all operation instructions.
Two basic principles are defined in this method.
1. The stage of instruction decoding must not be earlier than the ID stage, whether the centralized decoding mode or the distributed decoding mode is adopted.
2. Forwarding has higher priority when both forwarding and stalling can be used to resolve a data hazard.

Besides, two parameters called Tuse and Tnew are defined.

Tuse means the number of clock cycles, counted from the moment the instruction enters the ID stage, after which a certain functional unit uses the value saved in the register. Tuse is a static value, and an instruction can have multiple Tuse values according to its number of operands. Meanwhile, Tuse ≥ 0.

Tnew means the minimum number of clock cycles after which an instruction at a stage after the ID stage produces the result that will be written back to the registers. Tnew is a dynamic value. It decreases by 1 as the instruction flows through each pipeline stage and no longer changes once it reaches 0. So an instruction has a different Tnew at different stages. Meanwhile, Tnew ≥ 0.

The management of Tnew and Tuse in the pipeline processor is shown in Fig. 3.

[Figure: five-stage pipeline datapath (1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back) with pipeline registers IF_ID, ID_EXE, EXE_MEM and MEM_WB, annotated with "Get Value of Tuse", "Set Value of Tnew" and two "−1" decrements.]

Fig. 3. The management of Tnew and Tuse in pipeline processor

Based on the above, the resolution of data hazards becomes numerical. According to the definition of Tuse and Tnew, each instruction has a set of numbers which represent its Tuse and Tnew. For a specific instruction set, the comparison between the Tuse and Tnew of different instructions reflects data dependence. If Tnew > Tuse, the data hazard can only be resolved by stalling, as the result is produced too late. If Tnew ≤ Tuse, the data hazard can be resolved by forwarding. The formula is as follows.

$$
\text{Operation} =
\begin{cases}
\text{Stall},   & T_{use} < T_{new} \\
\text{Forward}, & T_{use} \ge T_{new}
\end{cases}
\qquad (1)
$$

3.2.2 Tuse-Tnew matrix

The Tuse-Tnew matrix is used to find all the values of Tuse and Tnew of all instructions. The Tuse-Tnew record of an instruction is fixed by the definition of Tuse and Tnew as long as the pipeline architecture stays the same. Therefore, the Tuse-Tnew matrix is fixed too once all instructions of the specific instruction set are taken into account. In this situation, all the work focuses on the execution semantics of each instruction and on completing the matrix line by line, rather than on the specific classification of the instructions. This method greatly reduces the complexity of hazard detection even if the number of instructions increases. For example, suppose the instruction set includes the instructions ADD, SUB, ANDI, ORI, LW, SW and BEQ. For the five-stage pipeline, the Tuse-Tnew matrix is shown in Table 1.

Table 1. Tuse-Tnew matrix

Ins     Tuse_1 (RS)   Tuse_2 (RT)   Tnew (EXE)   Tnew (MEM)   Tnew (WB)
ADD     1             1             1            0            0
SUB     1             1             1            0            0
ANDI    1             Null          1            0            0
ORI     1             Null          1            0            0
LW      1             Null          2            1            0
SW      1             2             Null         Null         Null
BEQ     0             0             Null         Null         Null
Set     {0,1}         {0,1,2}       {1,2}        {0,1}        {0}

Using the supply-matching model, the Tuse-Tnew matrix can be built easily. The procedure also applies to processors with a deeper pipeline or a larger instruction set.

3.2.3 Register strategy matrix

The register strategy matrix provides the resolution strategy of data hazards for a specific register. Based on the Tuse-Tnew matrix, a complete strategy matrix can be built for a specific register, as the Tuse-Tnew matrix considers all instructions and all pipeline stages except the stages before the ID stage. (This exclusion follows from the basic principle that instructions are decoded no earlier than the ID stage. If there is more than one stage before the ID stage, some additional work is needed to detect and resolve the data hazards for that part.) Formula (1) is used to construct the strategy matrix. Taking the RT register as an example, the result is shown in Table 2.

Table 2. Strategy matrix for RT register

Tnew stage   EXE      EXE      MEM      MEM      WB
Tnew value   1        2        0        1        0
Tuse = 0     Stall    Stall    Bypass   Stall    Bypass
Tuse = 1     Bypass   Stall    Bypass   Bypass   Bypass
Tuse = 2     Bypass   Bypass   Bypass   Bypass   Bypass

This strategy matrix is correct, which can be proved as follows:
1. According to the definition of Tuse and Tnew, Tuse ≥ 0 and Tnew ≥ 0. So there are just three possible relationships between Tuse and Tnew: Tuse > Tnew, Tuse < Tnew and Tuse = Tnew.
2. If Tuse > Tnew, the earlier instruction will have finished computing the write-back data before the current instruction needs it. So bypassing can be established correctly.
3. If Tuse < Tnew, the current instruction cannot get the data it depends on from the subsequent stages in time. Thus stalling is used to ensure that the processor performs correctly.
4. If Tuse = Tnew, the current instruction can get the related data just in time, as the earlier instruction finishes computing at the moment the data is needed. Forwarding can be used here.

A complete stall control signal can be created by combining all the stalling conditions with a logical OR. In all other cases, data hazards can be resolved by forwarding. However, there may be several forwarding sources, holding data from multiple pipeline stages, when forwarding is used. In this situation, the priority of the forwarding sources must be set. This paper adopts a forwarding strategy based on pipeline priority. The strategy gives the highest priority to the stage that is Tuse stages behind the ID stage, and the priority of the stages behind it decreases in turn. On this basis, the forwarding source with the highest priority is selected when there are multiple forwarding sources.

... multiplexers were implemented with Verilog HDL. Meanwhile, hazards in the pipeline are resolved based on our method. Finally, binary codes were obtained by compiling programs written in C or assembly with GCC. The register values after executing every instruction are obtained from QEMU in single-step mode. The correctness of the processor can be determined by comparing the register values with the results from QEMU during the simulation. The FPGA used for synthesis and implementation with ISE in our experiment is the Spartan6-6SLX150FGG484 made by Xilinx.

[Figure: experiment flow with three phases (Design, Realization, Simulation); labels include Architecture specification, Instruction set, Pipeline Architecture, Datapath, Core Units, Verilog Core Unit modules, MUX, Controllers, Hazards, GCC, QEMU and EDA.]
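As an illustration of how the two steps of Section 3.2 fit together, the following Python sketch derives the RT strategy matrix from the Tuse-Tnew records by applying formula (1). It is only a model written for this text, not the Verilog implementation used in the experiment; the names TABLE_1, column_values, resolve and strategy_matrix are hypothetical, and None is used here to stand for "Null".

```python
# Tuse-Tnew records transcribed from Table 1; None stands for "Null".
# Per instruction: (Tuse of RS, Tuse of RT, Tnew in EXE, Tnew in MEM, Tnew in WB).
TABLE_1 = {
    "ADD":  (1, 1, 1, 0, 0),
    "SUB":  (1, 1, 1, 0, 0),
    "ANDI": (1, None, 1, 0, 0),
    "ORI":  (1, None, 1, 0, 0),
    "LW":   (1, None, 2, 1, 0),
    "SW":   (1, 2, None, None, None),
    "BEQ":  (0, 0, None, None, None),
}
STAGES = ("EXE", "MEM", "WB")  # Tnew columns start at index 2


def column_values(index):
    """Collect the distinct non-Null values of one Table 1 column."""
    return sorted({row[index] for row in TABLE_1.values() if row[index] is not None})


def resolve(tuse, tnew):
    """Formula (1): stall if Tuse < Tnew, otherwise forward (bypass)."""
    return "Stall" if tuse < tnew else "Bypass"


def strategy_matrix(tuse_index):
    """Map (Tuse, stage, Tnew) to a strategy for the register in column tuse_index."""
    matrix = {}
    for tuse in column_values(tuse_index):
        for offset, stage in enumerate(STAGES):
            for tnew in column_values(2 + offset):
                matrix[(tuse, stage, tnew)] = resolve(tuse, tnew)
    return matrix


rt_matrix = strategy_matrix(1)  # column 1 holds Tuse_2 (RT)
for tuse in column_values(1):
    row = [rt_matrix[(tuse, stage, tnew)]
           for offset, stage in enumerate(STAGES)
           for tnew in column_values(2 + offset)]
    print(tuse, row)
```

Each printed row lists the strategies in the column order of Table 2 (EXE with Tnew 1 and 2, MEM with Tnew 0 and 1, WB with Tnew 0). Calling strategy_matrix(0) would give the corresponding matrix for the RS register under the same assumptions.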
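The OR-combined stall signal and the choice among several forwarding sources can be sketched in the same way. The Python model below is a simplification under stated assumptions: the Producer record, the register numbers and the function names are hypothetical, and the forwarding choice simply prefers the provider nearest to the ID stage, which is the conventional rule and does not model the Tuse-based starting offset of the priority strategy described above.

```python
from typing import NamedTuple, Optional

STAGES = ("EXE", "MEM", "WB")  # providers, ordered from nearest to ID to farthest


class Producer(NamedTuple):
    dest: Optional[int]  # register written by the instruction in this stage, None if none
    tnew: int            # remaining cycles until its result is available


def stall_signal(sources, pipeline):
    """OR together every stall condition: a matching register with Tuse < Tnew."""
    return any(
        prod.dest is not None and prod.dest == reg and tuse < prod.tnew
        for reg, tuse in sources
        for prod in pipeline.values()
    )


def forward_source(reg, pipeline):
    """Return the nearest stage that writes reg, or None to read the register file."""
    for stage in STAGES:
        prod = pipeline[stage]
        if prod.dest is not None and prod.dest == reg:
            return stage
    return None


# Demander in ID: an ADD reading r3 and r4, both with Tuse = 1 (Table 1).
sources = [(3, 1), (4, 1)]

# Cycle n: LW writing r3 is in EXE (Tnew = 2), an older ADD writing r3 is in MEM.
cycle_n = {"EXE": Producer(3, 2), "MEM": Producer(3, 0), "WB": Producer(None, 0)}
# Cycle n+1 after a stall: a bubble is in EXE, LW moved to MEM (Tnew = 1).
cycle_n1 = {"EXE": Producer(None, 0), "MEM": Producer(3, 1), "WB": Producer(3, 0)}

print(stall_signal(sources, cycle_n))    # True: Tuse = 1 < Tnew = 2 for r3
print(stall_signal(sources, cycle_n1))   # False: Tuse = 1 >= Tnew = 1
print(forward_source(3, cycle_n1))       # MEM: the LW result, newer than the one in WB
print(forward_source(4, cycle_n1))       # None: no provider writes r4
```

The example follows a load-use dependence: the demander stalls for one cycle while LW is in EXE, then takes r3 from the MEM-side provider rather than from the older value in WB.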