Introduction To QLY MGT Module
Statistical quality assurance is built around the normal distribution, which is the theoretical basis
of statistical process control and troubleshooting. However, the normal distribution is not a good
representation of every statistical phenomenon in quality assurance. Some things are not normally
distributed; therefore, other distributions, such as the binomial, the Poisson, and the exponential,
are more appropriate.
Binomial Distribution
This model is a discrete probability distribution that is commonly used in quality assurance.
Decision makers often face dichotomous situations, that is, situations with only two possible
outcomes. For example, a survey undertaken before the marketing of a new product can have either
a favourable or an unfavourable outcome from each individual sampled, and inspection of the
output of a production run results in either defective or non-defective items.
Requirements:
The binomial distribution is a reasonable representation of reality under a rather restrictive pair
of conditions. First, each trial must be independent and must have exactly two mutually exclusive
outcomes, such as a coin toss or any other event that can occur in only two ways. Product inspection
generally fits this requirement if, as a result of the inspection, the unit inspected is either
"acceptable" or "unacceptable." Second, the probability p should not change as a result of earlier
events. This implies an infinitely large population, as coin tosses or die throws would be. In
practical applications to quality control, we are usually satisfied if there is no significant change
in the population as the result of finding or failing to find an unacceptable unit. As a rule of
thumb, we may be justified in assuming the binomial distribution if the population is at least 10
times the size of the random sample.
James Bernoulli undertook the earliest careful studies of the probability aspects of a dichotomy
in the 17th century. Binomial, or Bernoulli, probabilities are calculated from
$P(x) = {}^{n}C_{x} \, p^{x} q^{n-x}$
where:
p = the probability of success on a single trial of the experiment
q = the probability of failure, which is 1-p
n = total number of trials
x = the specified number of outcomes we are looking for
${}^{n}C_{x}$ = the total number of different sequences of n outcomes, x of which are successes
and (n-x) of which are failures, which is $n! / (x! \, (n-x)!)$
Example: Calculate the probability that two out of five people surveyed at the launch of a new
product will like the product, given that the probability that any one of them will like it is 70%.
Solution:
n = total number of people surveyed = 5
x = number of outcomes we are looking for = 2
p = probability of success of any one of them liking the product = 70% = 0.70
q = probability of failure, not liking the product = 1-p = 1-0.70 = 0.30
$P(x) = {}^{n}C_{x} \, p^{x} q^{n-x}$
$= \frac{n!}{x! \, (n-x)!} \, p^{x} q^{n-x}$
$= \frac{5!}{2! \, (5-2)!} \times 0.70^{2} \times 0.30^{5-2}$
$= 10 \times 0.70^{2} \times 0.30^{3}$
$= 0.13$
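The survey calculation above can be checked numerically. The following is a minimal sketch using only Python's standard library; the function name binomial_pmf is illustrative and does not come from the source.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    q = 1.0 - p  # probability of failure on a single trial
    return comb(n, x) * p**x * q ** (n - x)


# Survey example: n = 5 people, x = 2 like the product, p = 0.70
prob = binomial_pmf(2, 5, 0.70)
print(round(prob, 4))  # 0.1323, which rounds to the 0.13 found by hand
```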
Many of the situations in which binomial random variables are used involve descriptions like at
most, at least, not more than, more than, and so on. Care is needed in dealing with these
descriptions, for instance:
- Less than 2 refers to 0 and 1, 2 not included
- At most 2, or 2 or less, is 0, 1, 2; 2 is the maximum
- More than 2 is 3, 4, 5, ...; 2 is excluded
- At least 2 is the same as 2 or more; 2 is the minimum
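The four descriptions listed above translate into sums of binomial probabilities. The sketch below applies them to the earlier survey example (n = 5, p = 0.70); the helper names are hypothetical and chosen only to mirror the wording of the list.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


def prob_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k): k is the maximum and is included."""
    return sum(binomial_pmf(x, n, p) for x in range(0, k + 1))


def prob_less_than(k: int, n: int, p: float) -> float:
    """P(X < k): k itself is not included."""
    return prob_at_most(k - 1, n, p)


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k): k is the minimum and is included."""
    return 1.0 - prob_at_most(k - 1, n, p)


def prob_more_than(k: int, n: int, p: float) -> float:
    """P(X > k): k is excluded."""
    return 1.0 - prob_at_most(k, n, p)


n, p = 5, 0.70
print("less than 2:", round(prob_less_than(2, n, p), 5))  # 0.03078
print("at most 2:  ", round(prob_at_most(2, n, p), 5))    # 0.16308
print("more than 2:", round(prob_more_than(2, n, p), 5))  # 0.83692
print("at least 2: ", round(prob_at_least(2, n, p), 5))   # 0.96922
```

Note that "less than 2" and "at least 2" are complements, as are "at most 2" and "more than 2", so each pair sums to 1.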
Exercise:
1.0 A firm has found from experience that 5 in 100 transactions are incorrectly processed in a
client's financial records. An auditor randomly draws a sample of 8 transactions from this
client's accounting records. What is the probability that:
a) No transaction will be incorrectly processed?
b) At least one transaction will be incorrectly processed?
c) No more than two transactions will be incorrectly processed?
2.0 Suppose an auditor randomly draws a sample of 100 transactions from the client's accounting
records in exercise 1 above. How many, on average, would be incorrectly processed?
Solution:
The question asks for the mean; the standard deviation is given in addition.
Mean = $\mu = np = 100 \times 0.05 = 5$
Standard deviation = $\sigma = \sqrt{npq} = \sqrt{100 \times 0.05 \times 0.95} = 2.1794$
3.0 Calculate the mean number of defective chairs manufactured by Nikii, and the standard
deviation, if the probability that a randomly selected chair is defective is 10% and Nikii
produces 400 chairs in a fortnight.
Solution:
Mean = $\mu = np = 400 \times 0.1 = 40$
Standard deviation = $\sigma = \sqrt{npq} = \sqrt{400 \times 0.1 \times 0.9} = 6$
This means that out of 400 chairs produced by Nikii, one would expect 40 to be defective, but
this figure is likely to vary between 34 and 46 defective chairs (one standard deviation either
side of the mean).
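The mean and standard deviation used in the two solutions above follow directly from n and p. A minimal sketch, with an illustrative function name that is not part of the source:

```python
from math import sqrt


def binomial_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Return (mean, standard deviation) of a binomial distribution."""
    q = 1.0 - p
    mean = n * p
    sd = sqrt(n * p * q)
    return mean, sd


# Chair example: 400 chairs, 10% chance that any one is defective
mean, sd = binomial_mean_sd(400, 0.10)
print(round(mean, 4), round(sd, 4))                 # 40.0 6.0
print(round(mean - sd, 4), round(mean + sd, 4))     # 34.0 46.0
```

The same function reproduces the auditor example when called with n = 100 and p = 0.05.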
Exponential distribution
How much time will elapse before an earthquake occurs in a given region? How long do we need
to wait before a customer enters our shop? How long will it take before a call center receives the
next phone call? How long will a piece of machinery work without breaking down?
Questions such as these are often answered in probabilistic terms using the exponential
distribution, a continuous curve representing a continuum of opportunities for a random event
to occur. The exponential, then, might represent the probability that some randomly occurring
event happens before an indicated time x:
$P(X \le x) = 1 - e^{-x/\mu}$
where µ is a constant and x ranges over all positive abscissa values from zero to infinity. We
may redefine x as a particular set of positive values of interest to us. The constant e = 2.71828.
The phrase exponential distribution commonly refers to the distribution that might be better
known as the negative exponential, or $e^{-x/\mu}$.
Exponential examples
The exponential distribution has numerous applications in operations research, particularly in
queuing theory. If it can be shown that events such as arrivals at a facility occur randomly, the
distribution of times between arrivals can be shown to be exponential.
In quality assurance, the exponential distribution is often used in connection with c charts and
reliability. One of the uses in reliability occurs when a failure rate is assumed to be constant:
µ in the expression $1 - e^{-x/\mu}$ would be the expected life and x would be the time to
failure. If the mean, or expected time to failure, is 200 hours, the probability of a failure
occurring before 20 hours would be
$1 - e^{-x/\mu} = 1 - e^{-20/200} = 1 - 2.71828^{-0.1} = 0.095$
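The reliability figure above can be reproduced in a few lines. This is a sketch under the stated assumption of a constant failure rate; the function name is hypothetical.

```python
from math import exp


def prob_failure_before(x: float, expected_life: float) -> float:
    """P(time to failure <= x) for an exponential distribution
    whose mean (expected life) is expected_life."""
    return 1.0 - exp(-x / expected_life)


# Expected time to failure of 200 hours, failure before 20 hours
p = prob_failure_before(20, 200)
print(round(p, 4))  # 0.0952, i.e. about 0.095
```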
Poisson distribution
The Poisson is a distribution of probabilities of discrete events, such as the number of event
occurrences per minute or per hour. The discrete random variable x may take on any of the
values 0, 1, 2, and so on. Depending upon the property to be emphasized, the same phenomenon
might be represented by an exponential distribution of the opportunity elapsed until an event
occurs or by a Poisson distribution of probabilities of numbers of occurrences in a given
opportunity set.
The Poisson and exponential distributions are appropriately used to describe random event
occurrence. Errors in a newspaper production process, for example, may occur other than
randomly. We can test the hypothesis of randomness by an appropriate goodness-of-fit test of
either the exponential distribution of pages or words between errors or the Poisson distribution
of error occurrences per page or other appropriate unit.
Let us assume you have satisfied yourself as to the random nature of error occurrence in pages of
a daily newspaper. Let us also assume you have no reason to believe there has been a substantial
change in this propensity since the basic data were taken. Then the Poisson model may provide
some kind of reasonable control device. If the mean of our data turns out to be $\bar{c}$ = 2 errors per
page, then the figure below represents a set of probabilities of occurrence.
[Figure: Poisson distribution, $\bar{c}$ = 2; vertical axis P(c), horizontal axis c]
Note that the occurrence of, say, seven or more errors (probability = 1 - 0.995 = 0.005) would be
an event very unlikely to happen by chance alone.
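The probabilities for a mean of 2 errors per page, including the tail probability of seven or more errors, can be tabulated as follows. This is a minimal sketch; the function names are illustrative rather than taken from the source.

```python
from math import exp, factorial


def poisson_pmf(c: int, mean: float) -> float:
    """Probability of exactly c occurrences when the mean count is mean."""
    return exp(-mean) * mean**c / factorial(c)


def poisson_tail(c: int, mean: float) -> float:
    """Probability of c or more occurrences."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(c))


mean_errors = 2.0  # errors per page, as in the newspaper example

# Table of P(c) for the first few counts
for c in range(8):
    print(c, round(poisson_pmf(c, mean_errors), 4))

tail = poisson_tail(7, mean_errors)
print("P(7 or more errors) =", round(tail, 4))  # about 0.0045
```

A page with seven or more errors would therefore be a reasonable signal that something other than chance is at work.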
Notation
The following notation is helpful, when we talk about the Poisson distribution.
e: A constant equal to approximately 2.71828. (Actually, e is the base of the natural
logarithm system.)
μ: The mean number of successes that occur in a specified region.
x: The actual number of successes that occur in a specified region.
P(x; μ): The Poisson probability that exactly x successes occur in a Poisson experiment,
when the mean number of successes is μ.
Poisson distribution
A Poisson random variable is the number of successes that result from a Poisson experiment.
The probability distribution of a Poisson random variable is called a Poisson distribution. Given
the mean number of successes (μ) that occur in a specified region, we can compute the Poisson
probability based on the following formula:
Poisson Formula. Suppose we conduct a Poisson experiment, in which the average number of
successes within a given region is μ. Then, the Poisson probability is:
$P(x; \mu) = e^{-\mu} \mu^{x} / x!$
where x is the actual number of successes that result from the experiment, and e is
approximately equal to 2.71828.
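Example:
1.0 Suppose an average of 2 homes are sold per day. What is the probability that exactly 3
homes will be sold tomorrow?
Solution:
This is a Poisson experiment in which we know the following:
μ = 2; since 2 homes are sold per day, on average.
x = 3; since we want to find the likelihood that 3 homes will be sold tomorrow.
e = 2.71828; since e is a constant equal to approximately 2.71828.
Substituting these values into the Poisson formula:
$P(3; 2) = e^{-2} \times 2^{3} / 3! = 0.1353 \times 8 / 6 = 0.180$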
Example:
2.0 Suppose the average number of lions seen on a 1-day safari is 5. What is the probability that
tourists will see fewer than four lions on the next 1-day safari?
Solution:
This is a Poisson experiment in which we know the following:
μ = 5; since 5 lions are seen per safari, on average.
x = 0, 1, 2, or 3; since we want to find the likelihood that tourists will see fewer than 4
lions; that is, we want the probability that they will see 0, 1, 2, or 3 lions.
e = 2.71828; since e is a constant equal to approximately 2.71828.
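To solve this problem, we need to find the probability that tourists will see 0, 1, 2, or 3 lions.
Thus, we need to calculate the sum of four probabilities: P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5). To
compute this sum, we use the Poisson formula:
$P(x \le 3; 5) = e^{-5} \left( \frac{5^{0}}{0!} + \frac{5^{1}}{1!} + \frac{5^{2}}{2!} + \frac{5^{3}}{3!} \right)$
$= 0.0067 + 0.0337 + 0.0842 + 0.1404 = 0.2650$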
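The sum of the four Poisson probabilities for the safari example can be evaluated directly. A minimal sketch, with a hypothetical function name:

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """P(x; mu) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)


mu = 5  # average number of lions seen per 1-day safari

# "Fewer than four lions" means x = 0, 1, 2 or 3
terms = [poisson_pmf(x, mu) for x in range(4)]
for x, term in enumerate(terms):
    print(f"P({x}; {mu}) = {term:.4f}")

fewer_than_four = sum(terms)
print(f"P(fewer than 4 lions) = {fewer_than_four:.4f}")  # 0.2650
```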
Exercise:
One of the tools in a large shipment turns out to be defective. What is the probability that, among
400 randomly inspected tools:
a) only 3 will be defective
b) at most 3 will be defective
c) at least 3 will be defective
A tailor has established that her sewing machine stops randomly due to thread breakages at an
average rate of 5 stoppages per hour. What is the probability that:
a) no stoppage occurs in a given hour
b) at least 1 stoppage occurs in a 30 minute period
c) more than 1 stoppage occurs in three quarters of an hour period
d) at most 1 stoppage in 2 hours
The Normal Distribution
The normal distribution is extremely important in quality assurance. The control chart is based
on the idea that sample means taken from a production process will tend to form a normal
distribution. Some of the run theory and control charts stemming from it are based on the normal
distribution. Some of the work of obtaining evidence of specific problems through the analysis of
means (ANOM) uses the normal distribution.
Probably the most useful single property of the normal distribution is its symmetry and the
predictability of accumulated areas beneath the curve. Manufacturing processes and natural
occurrences frequently create this type of distribution, a unimodal bell curve. The distribution is
spread symmetrically around the central location. This happens when values are equally likely to
fall above and below an average.
About 68%, 95%, and 99.7% of population data fall within 1, 2, and 3 standard deviations of the
mean of a normal distribution. These three figures are often referred to as the Empirical Rule or
the 68-95-99.7 Rule.
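The three proportions of the Empirical Rule can be recovered from the standard normal cumulative distribution. The sketch below assumes Python 3.8 or later, where statistics.NormalDist is available; the helper name proportion_within is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def proportion_within(k: float) -> float:
    """Proportion of a normal population within k standard
    deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
# prints 0.6827, 0.9545 and 0.9973 for k = 1, 2 and 3
```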
Over time, after making numerous calculations of the cumulative distribution function and
z-scores with these three approximations in mind, you will be able to quickly estimate the
proportion of a population and the percentage of area that should be under a curve.
Many naturally occurring events and processes with "common cause" variation exhibit a normal
distribution (when they do not, this is another way to help identify "special cause"). This
distribution is frequently used to estimate the proportion of the process that will perform within
specification limits or a specification limit (NOT control limits; recall that specification limits
and control limits are different).
However, when the data does not meet the assumptions of normality the data will require a
transformation to provide an accurate capability analysis.
The mean is used to define the central location in a normal data set, and the median, mode, and
mean are nearly equal. The area under the curve represents all of the observations or measurements.
In a normality test, a p-value below the alpha risk, set at 0.05, indicates a non-normal
distribution, although normality assumptions may still apply. The level of confidence assumed
throughout is 95%. A p-value above the alpha risk of 0.05 indicates that the data are consistent
with a normal distribution.
The Z-statistic can be derived from any variable point of interest (X) with the mean and standard
deviation. The z-statistic can be referenced to a table that will estimate a proportion of the
population that applies to the point of interest.
Recall that one of two important implications of the Central Limit Theorem is that, regardless of
distribution type (unimodal, bi-modal, skewed, symmetric), the distribution of the sample means
will take the shape of a normal distribution as the sample size increases. The greater the sample
size, the more safely normality can be assumed.
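The behaviour described above can be illustrated with a small simulation. This is only a sketch: the skewed population (exponential with mean 1), the sample sizes, the number of samples, and the fixed seed are all assumptions chosen for illustration, not values from the source.

```python
import random
import statistics


def simulate_sample_means(sample_size: int, num_samples: int, seed: int = 42) -> list[float]:
    """Draw num_samples samples from a skewed population
    (exponential, mean 1) and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


def fraction_within_one_sd(values: list[float]) -> float:
    """Fraction of values within one standard deviation of their mean.
    For a normal distribution this is about 0.68."""
    centre = statistics.fmean(values)
    spread = statistics.stdev(values)
    inside = sum(1 for v in values if abs(v - centre) <= spread)
    return inside / len(values)


for n in (1, 5, 30):
    means = simulate_sample_means(sample_size=n, num_samples=2000)
    print(
        f"n={n:2d}  sd of sample means={statistics.stdev(means):.3f}  "
        f"fraction within 1 sd={fraction_within_one_sd(means):.3f}"
    )
```

As the sample size grows, the spread of the sample means shrinks roughly as the population standard deviation divided by the square root of n, and the fraction within one standard deviation moves toward the 68% expected of a normal distribution.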
Some tables and software programs compute the z-statistic differently, but all will give the
correct result if interpreted correctly.
Some tables incorporate single-tail probability while others incorporate double-tail
probability. Examine each table carefully to reach the correct conclusion.
The bell curve theoretically spreads from negative infinity to positive infinity and approaches the
x-axis without ever touching it, in other words it is asymptotic to the x-axis.
The area under the curve represents the probabilities, and the whole area is equal to 1.0 or
100%.
The normal distribution is described by the mean and the standard deviation. The formula for the
normal distribution density function is (e = 2.71828):
$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x-\mu)^{2} / (2\sigma^{2})}$
Because finding the area under the normal curve from this formula requires time-consuming
integral calculus, it is usually easier to reference tables. With pre-populated values, the
probabilities for a given value "x" can be assessed using a conversion formula to the
z-distribution, also known as the standardized normal curve:
$z = \frac{x - \mu}{\sigma}$
A Z-score is the number of standard deviations that a given value "x" is above or below the mean
of the normal distribution.
Example
A machining process has produced widgets with a mean length of 12.5 mm and variance of
0.0625 mm².
A customer has indicated that the upper specification limit (USL) is 12.65 mm. What proportion
of the widgets will be shorter than 12.65 mm?
The standard deviation is $\sigma = \sqrt{0.0625} = 0.25$ mm, so $z = (12.65 - 12.5) / 0.25 = 0.60$.
From the standard normal table, z = 0.60 corresponds to 0.7257, so 72.57% of the area
under the curve lies below the point x = 12.65 mm.
This means that 72.57% of the widgets will be below the customer's USL. This result is
unlikely to meet the voice of the customer.
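The table lookup in the widget example can be replaced by a direct calculation. A minimal sketch, assuming Python 3.8 or later for statistics.NormalDist; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mean_length = 12.5   # mm
variance = 0.0625    # mm squared
usl = 12.65          # upper specification limit, mm

sd = sqrt(variance)            # standard deviation, 0.25 mm
z = (usl - mean_length) / sd   # number of standard deviations above the mean

# Area under the standard normal curve to the left of z
proportion_below_usl = NormalDist().cdf(z)

print(round(z, 2))                     # 0.6
print(round(proportion_below_usl, 4))  # 0.7257
```

The same proportion can be obtained without standardizing by calling NormalDist(mu=12.5, sigma=0.25).cdf(12.65).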
Exercise:
1.0 Your company produces aluminum siding. For a given width, the expected number of
randomly occurring minor paint blemishes averages 1.5 occurrences in 30 m. You randomly
select six 3-m units from a large lot of such units and examine them for paint blemishes.
a) What is the probability that no blemishes will be found?
b) What is the probability that there will be no more than two blemishes?
c) What is the probability of three or more blemishes in the six units?
d) If two blemishes were found, what is the probability that they were both on a single unit?
2.0 An ordinary 100-W light bulb is said to have a constant failure rate and an expected life of
1200 hr.
a) What is the probability that it will expire before 600hr?
b) What is the probability that it will last longer than 1500 hr.?
c) Of 10 such bulbs, what is the probability that at least 2 are still burning after 2400 hr.?
3.0 A certain process creates, on average, 3 defectives per lot shipped. Estimate the
following:
a) Probability of exactly 1 defective in a lot
b) Probability of 4 or more defective in a lot
4.0 Given the following probability distribution, calculate its mean and variance:
Variable (X):       4     5     6     7     8
Probability p(x):   0.06  0.30  0.40  0.16  0.08
5.0 Given the following sample of 80 data points, estimate the mean, and the standard deviation.
Tabulate the values into appropriate cells and plot the cumulative values on normal
probability paper to determine if the sample could have come from a normal population.
237.3 241.3 243.3 235.1 234.0 242.2 244.3 339.9 244.2 236.4
233.2 234.7 236.0 238.3 237.9 236.9 224.3 246.0 233.7 240.1
228.3 234.8 228.1 239.3 243.0 234.9 242.6 234.5 231.2 242.2
247.4 237.0 237.0 241.8 238.9 257.3 232.7 236.6 237.2 246.0
237.6 229.5 214.1 244.7 249.2 241.4 238.1 232.5 234.5 231.5
246.5 233.0 233.9 238.6 235.1 232.2 239.4 240.6 237.3 235.6
241.9 242.2 230.8 235.8 232.1 233.7 236.5 240.8 232.4 238.5
236.4 238.9 235.9 249.2 238.2 250.9 233.6 236.1 233.1 229.5
6.0 Assume Mean = 238 and s = 6 for the data in question five.
a) How many of the 80 data points would you expect to be less than 242?
b) How many would you expect to be greater than 226?
c) How many would you expect to be between 229 and 247?
Parametric and Non Parametric Tests
Once the data is determined to take on a normal distribution (or is assumed to be normal), the
center value for the distribution of data is the mean.
For nonparametric tests, the measure of central tendency for the distribution of data is the median.
Parametric tests such as ANOVA and the t-test are generally more powerful than nonparametric
tests, assuming the same amount of data. It is easier (fewer samples are needed) to determine a
significant difference using parametric tests.