Basic Performance Equation and Problems
Basic Performance Equation and Problems
where ‘T’ is the execution time or the processor time, ‘N’ is the instruction count
(IC), ‘S’ is the number of clock cycles per instruction or CPI, ‘R’ is the clock rate
or clock frequency and ‘P’ is the Clock Time (CT) which is the reciprocal of the
clock frequency, i.e., P=1/R.
For example, a 1 GHz processor has a cycle time of 1.0 ns and a 4 GHz processor
has a cycle time of 0.25 ns.
This equation remains valid if the time units are changed on both sides of the
equation. The left-hand side and the factors on the right-hand side are discussed in
the following sections.
The three factors are, in order, known as the instruction count (IC), clocks per
instruction (CPI), and clock time (CT). CPI is computed as an effective value.
Instruction Count
Computer architects can reduce the instruction count by adding more powerful
instructions to the instruction set. However, this can increase either CPI or clock
time, or both.
Clock Time
Clock time depends on transistor speed and the complexity of the work done in a
single clock. Clock time can be reduced when transistor sizes decrease. However,
power consumption increases when clock time is reduced. This increase the
amount of heat generated.
Instruction Count
For predicting the effects of incremental changes, architects use execution traces of
benchmark programs to get instruction counts. If the incremental change does not
change the instruction set then the instruction count normally does not change. If
there are small changes in the instruction set then trace information can be used to
estimate the change in the instruction count.
For comparison purposes, two machines with different instruction sets can be
compared based on compilations of the same high-level language code on the two
machines.
Clocks per instruction (CPI) is an effective average. It is averaged over all of the
instruction executions in a program.
For computing clocks per instruction as an effective average, the cases are
categories of instructions, such as branches, loads, and stores. Frequencies for the
categories can be extracted from execution traces. Knowledge of how the
architecture handles each category yields the clocks per instruction for that
category.
Clock Time
Clock time (CT) is the period of the clock that synchronizes the circuits in a
processor. It is the reciprocal of the clock frequency.
For example, a 1 GHz processor has a cycle time of 1.0 ns and a 4 GHz processor
has a cycle time of 0.25 ns.
Clock time is affected by circuit technology and the complexity of the work done
in a single clock. Logic gates do not operate instantly. A gate has a propagation
delay that depends on the number of inputs to the gate (fan in) and the number of
other inputs connected to the gate's output (fan out). Increasing either the fan in or
the fan out slows down the propagation time. Cycle time is set to be the worst-case
total propagation time through gates that produce a signal required in the next
cycle. The worst-case total propagation time occurs along one or more signal paths
through the circuitry. These paths are called critical paths.
For the past 35 years, integrated circuit technology has been greatly affected by a
scaling equation that tells how individual transistor dimensions should be altered as
the overall dimensions are decreased. The scaling equations predict an increase in
speed and a decrease in power consumption per transistor with decreasing size.
Technology has improved so that about every 3 years, linear dimensions have
decreased by a factor of 2. Transistor power consumption has decreased by a
similar factor. Speed increased by a similar factor until about 2005. At that time,
power consumption reached the point where air cooling was not sufficient to keep
processors cool if the ran at the highest possible clock speed.
Problem Statement 1
Solution
We have the instruction count: 109 instructions. The clock time can be computed
quickly from the clock rate to be 0.5×10-9 seconds. So we only need to to compute
clocks per instruction as an effective value:
Then we have
Problem Statement 2
Suppose the processor in the previous example is redesigned so that all instructions
that initially executed in 5 cycles now execute in 4 cycles. Due to changes in the
circuitry, the clock rate has to be decreased from 2.0 GHz to 1.9 GHz. No changes
are made to the instruction set. What is the overall percentage improvement?
Solution Form
We can determine the percentage improvement quickly by first finding the ratio
between before and after performance. The performance equation implies that this
ratio will be a product of three factors: a performance ratio for instruction count, a
performance ratio for CPI or its reciprocal, instruction throughput, and a
performance ratio for clock time or its reciprocal, clock frequency. We can ignore
the first factor in this problem since it is 1.0: the instruction count has not changed.
We are left with determining the performance ratio for CPI and, since we are given
clock frequencies, the performance ratio for clock frequencies.
Solution
The performance ratio for frequencies must be less than 1.0: if other factors are the
same then a slower clock rate implies worse performance. So this factor of the
improvement ratio must be 1.9/2.0.
For the clocks per instruction, we had a value of 3.7 before the change. We
compute clocks per instruction after the change as an effective value:
Now, lower clocks per instruction means higher instruction throughput and thus
better performance, so we expect this part of the performance ratio to be greater
than 1.0; that is, 3.7/3.5.
Then we have