40% found this document useful (5 votes)
5K views

Basic Performance Equation and Problems

The performance equation analyzes execution time as a product of three factors: instruction count, clocks per instruction, and clock time. It is given by the equation T=(N*S)/R, where T is execution time, N is instruction count, S is clocks per instruction, and R is clock rate. The document then discusses each of the three factors in the performance equation and provides examples of how changing processor design can impact each factor.

Uploaded by

biswajit biswal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
40% found this document useful (5 votes)
5K views

Basic Performance Equation and Problems

The performance equation analyzes execution time as a product of three factors: instruction count, clocks per instruction, and clock time. It is given by the equation T=(N*S)/R, where T is execution time, N is instruction count, S is clocks per instruction, and R is clock rate. The document then discusses each of the three factors in the performance equation and provides examples of how changing processor design can impact each factor.

Uploaded by

biswajit biswal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

The Performance Equation

The performance equation analyzes execution time as a product of three factors


that are relatively independent of each other. It is given by :

T=(N*S)/R = N*S*P = IC*CPI*CT

where ‘T’ is the execution time or the processor time, ‘N’ is the instruction count
(IC), ‘S’ is the number of clock cycles per instruction or CPI, ‘R’ is the clock rate
or clock frequency and ‘P’ is the Clock Time (CT) which is the reciprocal of the
clock frequency, i.e., P=1/R.

For example, a 1 GHz processor has a cycle time of 1.0 ns and a 4 GHz processor
has a cycle time of 0.25 ns.

This equation remains valid if the time units are changed on both sides of the
equation. The left-hand side and the factors on the right-hand side are discussed in
the following sections.

The three factors are, in order, known as the instruction count (IC), clocks per
instruction (CPI), and clock time (CT). CPI is computed as an effective value.

Instruction Count

Computer architects can reduce the instruction count by adding more powerful
instructions to the instruction set. However, this can increase either CPI or clock
time, or both.

Clocks Per Instruction

Computer architects can reduce CPI by exploiting more instruction-level


parallelism. If they add more complex instructions it often increases CPI.

Clock Time

Clock time depends on transistor speed and the complexity of the work done in a
single clock. Clock time can be reduced when transistor sizes decrease. However,
power consumption increases when clock time is reduced. This increase the
amount of heat generated.

Instruction Count

Instruction (IC) count is a dynamic measure: the total number of instruction


executions involved in a program. It is dominated by repetitive operations such as
loops and recursions.
Instruction count is affected by the power of the instruction set. Different
instruction sets may do different amounts of work in a single instruction. CISC
processor instructions can often accomplish as much as two or three RISC
processor instructions. Some CISC processor instructions have built-in looping so
that they can accomplish as much as several hundred RISC instruction executions.

For predicting the effects of incremental changes, architects use execution traces of
benchmark programs to get instruction counts. If the incremental change does not
change the instruction set then the instruction count normally does not change. If
there are small changes in the instruction set then trace information can be used to
estimate the change in the instruction count.

For comparison purposes, two machines with different instruction sets can be
compared based on compilations of the same high-level language code on the two
machines.

Clocks Per Instruction

Clocks per instruction (CPI) is an effective average. It is averaged over all of the
instruction executions in a program.

CPI is affected by instruction-level parallelism and by instruction complexity.


Without instruction-level parallelism, simple instructions usually take 4 or more
cycles to execute. Instructions that execute loops take at least one clock per loop
iteration. Pipelining (overlapping execution of instructions) can bring the average
for simple instructions down to near 1 clock per instruction. Superscalar pipelining
(issuing multiple instructions per cycle) can bring the average down to a fraction of
a clock per instruction.

For computing clocks per instruction as an effective average, the cases are
categories of instructions, such as branches, loads, and stores. Frequencies for the
categories can be extracted from execution traces. Knowledge of how the
architecture handles each category yields the clocks per instruction for that
category.

Clock Time

Clock time (CT) is the period of the clock that synchronizes the circuits in a
processor. It is the reciprocal of the clock frequency.

For example, a 1 GHz processor has a cycle time of 1.0 ns and a 4 GHz processor
has a cycle time of 0.25 ns.

Clock time is affected by circuit technology and the complexity of the work done
in a single clock. Logic gates do not operate instantly. A gate has a propagation
delay that depends on the number of inputs to the gate (fan in) and the number of
other inputs connected to the gate's output (fan out). Increasing either the fan in or
the fan out slows down the propagation time. Cycle time is set to be the worst-case
total propagation time through gates that produce a signal required in the next
cycle. The worst-case total propagation time occurs along one or more signal paths
through the circuitry. These paths are called critical paths.

For the past 35 years, integrated circuit technology has been greatly affected by a
scaling equation that tells how individual transistor dimensions should be altered as
the overall dimensions are decreased. The scaling equations predict an increase in
speed and a decrease in power consumption per transistor with decreasing size.
Technology has improved so that about every 3 years, linear dimensions have
decreased by a factor of 2. Transistor power consumption has decreased by a
similar factor. Speed increased by a similar factor until about 2005. At that time,
power consumption reached the point where air cooling was not sufficient to keep
processors cool if the ran at the highest possible clock speed.

Problem Statement 1

Suppose a program (or a program task) takes 1 billion instructions to execute on a


processor running at 2 GHz. Suppose also that 50% of the instructions execute in 3
clock cycles, 30% execute in 4 clock cycles, and 20% execute in 5 clock cycles.
What is the execution time for the program or task?

Solution

We have the instruction count: 109 instructions. The clock time can be computed
quickly from the clock rate to be 0.5×10-9 seconds. So we only need to to compute
clocks per instruction as an effective value:

Value Frequency Product


3 0.5 1.5
4 0.3 1.2
5 0.2 1.0
CPI = 3.7

Then we have

Execution time = 1.0×109 × 3.7 × 0.5×10-9 sec = 1.85 sec.

Problem Statement 2

Suppose the processor in the previous example is redesigned so that all instructions
that initially executed in 5 cycles now execute in 4 cycles. Due to changes in the
circuitry, the clock rate has to be decreased from 2.0 GHz to 1.9 GHz. No changes
are made to the instruction set. What is the overall percentage improvement?
Solution Form

We can determine the percentage improvement quickly by first finding the ratio
between before and after performance. The performance equation implies that this
ratio will be a product of three factors: a performance ratio for instruction count, a
performance ratio for CPI or its reciprocal, instruction throughput, and a
performance ratio for clock time or its reciprocal, clock frequency. We can ignore
the first factor in this problem since it is 1.0: the instruction count has not changed.
We are left with determining the performance ratio for CPI and, since we are given
clock frequencies, the performance ratio for clock frequencies.

Solution

The performance ratio for frequencies must be less than 1.0: if other factors are the
same then a slower clock rate implies worse performance. So this factor of the
improvement ratio must be 1.9/2.0.

For the clocks per instruction, we had a value of 3.7 before the change. We
compute clocks per instruction after the change as an effective value:

Value Frequency Product


3 0.5 1.5
4 0.3 1.2
4 0.2 0.8
CPI = 3.5

Now, lower clocks per instruction means higher instruction throughput and thus
better performance, so we expect this part of the performance ratio to be greater
than 1.0; that is, 3.7/3.5.

Then we have

3.7 1.9 7.03


performance ratio = × = =1.0043
3.5 2.0 7.0

This is a 0.43% improvement, which is probably not worth the effort.

You might also like