Neural Network and Fuzzy System
Fuzzy logic allows definite decisions to be made from imprecise or ambiguous data, whereas
an artificial neural network (ANN) tries to incorporate the human thinking process to solve
problems without modelling them mathematically. Although both of these methods can be used
to solve nonlinear problems, and problems that are not properly specified, they are not
otherwise related. An ANN includes a learning process that involves learning algorithms and
requires training data; it learns from the data in a way modelled on the biological neural
network. Fuzzy logic, in contrast, makes decisions directly from the raw, ambiguous data
given to it.
Fuzzy logic is used in the following application areas −
Aerospace
Automotive
Business
Defense
Electronics
Finance
Industrial Sector
Manufacturing
Marine
Medical
Securities
Transportation
In Pattern Recognition and Classification, fuzzy logic is used in the following areas −
Psychology
Human Reasoning was dominated for centuries by the fundamental “Laws of Thought”
(Korner, 1967), introduced by Aristotle (384-322 BC) and the philosophers that
preceded him, which include:
• The principle of identity
• The law of the excluded middle
• The law of contradiction
In particular, the second of the above laws, stating that every proposition has to be
either "True" or "False", was the basis for the genesis of Aristotle's bi-valued logic.
The precision of traditional mathematics undoubtedly owes a large part of its success to
this logic.
However, when Parmenides proposed, around 400 BC, the first version of the law of the
excluded middle, there were strong and immediate objections. For example, Heraclitus
countered that things could be simultaneously true and not true, whereas the Buddha
Siddhartha Gautama, who lived in India a century earlier, had already indicated that
almost every notion contains elements of its opposite. The ancient Greek philosopher
Plato (427-347 BC) laid the foundation of what was later called fuzzy logic (FL) by
claiming that there exists a third area beyond "True" and "False", where these two
opposite notions can exist together. More modern philosophers like Hegel, Marx, Engels
and others adopted and further cultivated this belief of Plato's.
The Polish philosopher Jan Lukasiewicz (1878-1956) was the first to propose a systematic
alternative to bi-valued logic, introducing in the early 1900s a three-valued logic by
adding the term "Possible" between "True" and "False" (Lejewski, 1967). Eventually he
developed an entire notation and axiomatic system from which he hoped to derive modern
mathematics. Later he also proposed four- and five-valued logics.
Zadeh (1921-2017) (Wikipedia, retrieved from the Web on February, 2012) was born in
Baku, Azerbaijan, then part of the USSR, to a Russian Jewish mother (Fanya Koriman), who
was a pediatrician, and an Iranian Azeri father (Rahim Aleskerzade), who was a journalist
on assignment from Iran.
At the age of 10, when Stalin introduced the collectivization of farms in the USSR, the
Zadeh family moved to Iran. In 1942 Zadeh graduated from the University of Tehran with a
degree in electrical engineering and moved to the USA in 1944. He received an MS from
MIT in 1946 and a Ph.D. in electrical engineering from Columbia University in 1949.
He taught for ten years at Columbia, being promoted to Full Professor in 1957, before
moving to Berkeley in 1959. Among other contributions, he introduced jointly with J.R.
Ragazzini in 1952 the pioneering z-transform method used today in digital signal
analysis (Brule, 2016), whereas his more recent works include computing with words and
perceptions (Zadeh, 1984; 2005a) and an outline towards a generalized theory of
uncertainty (Zadeh, 2005b). It has been estimated that Zadeh, who died in Berkeley on
6 September 2017, aged 96, had by 2011 been cited more than 950,000 times by other
researchers!
As was expected, the far-reaching theory of fuzzy systems aroused objections in the
scientific community. While there have been generic complaints about the fuzziness of
assigning values to linguistic terms, the most cogent criticisms came from Haack (1979).
She argued that there are only two areas – the nature of Truth and Falsity and the fuzzy
systems' utility – in which FL could possibly be needed, and then maintained that in
both cases it can be shown that FL is unnecessary.
Fox (1981) responded to her objections, indicating that FL is useful in three areas: to
handle real-world relationships which are inherently fuzzy, to handle the fuzzy data that
frequently arise in real-world situations, and to describe the operation of some
inferential systems which are inherently fuzzy. His most powerful arguments were that
traditional logic and FL need not be seen as competitive but as complementary, and that
FL, despite the objections of classical logicians, has found its way into practical
applications and has proved very successful there.
Real-life situations frequently arise in which definitions have no clear boundaries,
such as "the young people of a city", "the good players of a team", or "the diligent
students of a class". In the fuzzy set A of "young people", for example, the age of a
recently born baby has membership degree μA(0) = 1, whereas the age of 25 years has a
smaller, intermediate membership degree.
Roster Form
In this form, a set is represented by listing all the elements comprising it. The
elements are enclosed within braces and separated by commas.
Set Builder Form
In this form, the set is defined by specifying a property that elements of the set have
in common. The set is described as A = {x : p(x)}.
Cardinality of a Set
Cardinality of a set S, denoted by |S|, is the number of elements of the set. The
number is also referred to as the cardinal number. If a set has an infinite number of
elements, its cardinality is ∞.
If there are two sets X and Y, |X| = |Y| denotes two sets X and Y having same
cardinality. It occurs when the number of elements in X is exactly equal to the number
of elements in Y. In this case, there exists a bijective function ‘f’ from X to Y.
|X| ≤ |Y| denotes that set X’s cardinality is less than or equal to set Y’s cardinality. It
occurs when the number of elements in X is less than or equal to that of Y. Here, there
exists an injective function ‘f’ from X to Y.
|X| < |Y| denotes that set X’s cardinality is less than set Y’s cardinality. It occurs when
the number of elements in X is less than that of Y. Here, the function ‘f’ from X to Y is
injective function but not bijective.
If |X| ≤ |Y| and |Y| ≤ |X| then |X| = |Y|. The sets X and Y are commonly referred to as
equivalent sets.
Types of Sets
Sets can be classified into many types, some of which are finite, infinite, subset,
universal, proper, singleton, etc.
Infinite Set
A set that contains an infinite number of elements is called an infinite set.
Subset
A set X is a subset of a set Y (written as X ⊆ Y) if every element of X is an element
of Y.
Example 1 − Let X = {1,2,3,4,5,6} and Y = {1,2}. Here set Y is a subset of set X, as all
the elements of set Y are in set X. Hence, we can write Y ⊆ X.
Example 2 − Let X = {1,2,3} and Y = {1,2,3}. Here set Y is a subset (but not a proper
subset) of set X, as all the elements of set Y are in set X. Hence, we can write Y ⊆ X.
Proper Subset
The term “proper subset” can be defined as “subset of but not equal to”. A Set X is a
proper subset of set Y (Written as X ⊂ Y) if every element of X is an element of set Y
and |X| < |Y|.
Example − Let X = {1,2,3,4,5,6} and Y = {1,2}. Here Y ⊂ X, since all elements in Y are
contained in X too, and X has at least one element that is not in set Y.
Universal Set
It is a collection of all elements in a particular context or application. All the sets in that
context or application are essentially subsets of this universal set. Universal sets are
represented as U.
Example − We may define U as the set of all animals on earth. In this case, a set of all
mammals is a subset of U, a set of all fishes is a subset of U, a set of all insects is a
subset of U, and so on.
Singleton Set
A singleton set or unit set contains only one element. A singleton set is denoted by {s}.
Equal Set
If two sets contain the same elements, they are said to be equal.
Example − If A = {1,2,6} and B = {6,1,2}, they are equal as every element of set A is
an element of set B and every element of set B is an element of set A.
Equivalent Set
If the cardinalities of two sets are same, they are called equivalent sets.
Overlapping Set
Two sets that have at least one common element are called overlapping sets. In case of
overlapping sets −
n(A∪B)=n(A)+n(B)−n(A∩B)
n(A∪B)=n(A−B)+n(B−A)+n(A∩B)
n(A)=n(A−B)+n(A∩B)
n(B)=n(B−A)+n(A∩B)
Example − Let, A = {1,2,6} and B = {6,12,42}. There is a common element ‘6’, hence
these sets are overlapping sets.
Disjoint Set
Two sets A and B are called disjoint sets if they do not have even one element in
common. Therefore, disjoint sets have the following properties −
n(A∩B) = 0
n(A∪B) = n(A) + n(B)
Example − Let A = {1,2,6} and B = {7,9,14}. There is not a single common element, hence
these sets are disjoint sets.
Set Operations include Set Union, Set Intersection, Set Difference, Complement of Set,
and Cartesian Product.
Union
The union of sets A and B (denoted by A ∪ B) is the set of elements which are in A, in
B, or in both A and B. Hence, A ∪ B = {x | x ∈ A OR x ∈ B}.
Intersection
The intersection of sets A and B (denoted by A ∩ B) is the set of elements which are in
both A and B. Hence, A ∩ B = {x|x∈ A AND x ∈ B}.
Set Difference
The set difference of sets A and B (denoted by A − B) is the set of elements which are
only in A but not in B. Hence, A − B = {x | x ∈ A AND x ∉ B}.
Complement of a Set
The complement of a set A (denoted by A′) is the set of elements which are not in set A.
Hence, A′ = {x|x∉ A}.
More specifically, A′ = (U−A) where U is a universal set which contains all objects.
Example − If A = {x | x belongs to the set of odd integers} then A′ = {y | y does not
belong to the set of odd integers}.
Properties of sets play an important role in obtaining solutions. Following are the
different properties of classical sets −
Commutative Property
A∪B=B∪A
A∩B=B∩A
Associative Property
A∪(B∪C)=(A∪B)∪C
A∩(B∩C)=(A∩B)∩C
Distributive Property
A∪(B∩C)=(A∪B)∩(A∪C)
A∩(B∪C)=(A∩B)∪(A∩C)
Idempotency Property
A∪A=A
A∩A=A
Identity Property
A∪φ = A
A∩U = A (where U is the universal set)
Transitive Property
If A ⊆ B ⊆ C, then A ⊆ C.
De Morgan's Law
It is a very important law and helps in proving tautologies and contradictions. This
law states −
(A ∩ B)′ = A′ ∪ B′
(A ∪ B)′ = A′ ∩ B′
Introduction:
Fuzzy logic starts with and builds on a set of user-supplied human language rules. The
fuzzy systems convert these rules to their mathematical equivalents. This simplifies the
job of the system designer and the computer, and results in much more accurate
representations of the way systems behave in the real world.
Additional benefits of fuzzy logic include its simplicity and its flexibility. Fuzzy logic
can handle problems with imprecise and incomplete data, and it can model nonlinear
functions of arbitrary complexity. "If you don't have a good plant model, or if the
system is changing, then fuzzy will produce a better solution than conventional control
techniques," says Bob Varley, a Senior Systems Engineer at Harris Corp., an aerospace
company in Palm Bay, Florida.
You can create a fuzzy system to match any set of input-output data. The Fuzzy Logic
Toolbox makes this particularly easy by supplying adaptive techniques such as adaptive
neuro-fuzzy inference systems (ANFIS) and fuzzy subtractive clustering.
Fuzzy logic models, called fuzzy inference systems, consist of a number of conditional
"if-then" rules. For the designer who understands the system, these rules are easy to
write, and as many rules as necessary can be supplied to describe the system adequately
(although typically only a moderate number of rules are needed).
In fuzzy logic, unlike standard conditional logic, the truth of any statement is a matter
of degree. (How cold is it? How high should we set the heat?) We are familiar with
inference rules of the form p -> q (p implies q). With fuzzy logic, it's possible to say
(.5* p ) -> (.5 * q). For example, for the rule if (weather is cold) then (heat is on), both
variables, cold and on, map to ranges of values. Fuzzy inference systems rely on
membership functions to explain to the computer how to calculate the correct value
between 0 and 1. The degree to which any fuzzy statement is true is denoted by a value
between 0 and 1.
Not only do the rule-based approach and flexible membership function scheme make
fuzzy systems straightforward to create, but they also simplify the design of systems
and ensure that you can easily update and maintain the system over time.
A paradigm is a set of rules and regulations which defines boundaries and tells us what
to do to be successful in solving problems within these boundaries. For example the use
of transistors instead of vacuum tubes is a paradigm shift - likewise the development of
Fuzzy Set Theory from conventional bivalent set theory is a paradigm shift.
The most obvious limiting feature of bivalent sets that can be seen clearly from the
diagram is that they are mutually exclusive - it is not possible to have membership of
more than one set (opinion would widely vary as to whether 50 degrees Fahrenheit is
'cold' or 'cool' hence the expert knowledge we need to define our system is
mathematically at odds with the humanistic world). Clearly, it is not accurate to define a
transition from a quantity such as 'warm' to 'hot' by the application of one degree
Fahrenheit of heat. In the real world a smooth (unnoticeable) drift from warm to hot
would occur.
This natural phenomenon can be described more accurately by Fuzzy Set Theory. Fig.2
below shows how fuzzy sets quantifying the same information can describe this natural
drift.
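To make this concrete, here is a minimal Python sketch (not from the original text) of
three overlapping fuzzy temperature sets built from triangular membership functions;
the set names and breakpoints are illustrative assumptions only:

def triangular(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Illustrative, overlapping temperature sets (breakpoints are assumptions)
def mu_cold(t): return triangular(t, -10.0, 30.0, 55.0)
def mu_warm(t): return triangular(t, 45.0, 65.0, 85.0)
def mu_hot(t):  return triangular(t, 75.0, 95.0, 130.0)

t = 50.0  # 50 degrees Fahrenheit belongs partly to 'cold' and partly to 'warm'
print(mu_cold(t), mu_warm(t), mu_hot(t))  # 0.2 0.25 0.0

Because the sets overlap, 50 degrees has nonzero membership in both 'cold' and 'warm',
which is exactly the smooth drift described above.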
Although the concept of fuzzy logic had been studied since the 1920s, the term "fuzzy
logic" was first used in 1965 by Lotfi Zadeh, a professor at UC Berkeley in California.
He observed that conventional computer logic was not capable of manipulating data
representing subjective or unclear human ideas.
Fuzzy logic has been applied to various fields, from control theory to AI. It was
designed to allow the computer to make distinctions among data which are neither true
nor false, something similar to the process of human reasoning, such as "a little dark"
or "some brightness". However, fuzzy logic is not a cure-all, so it is equally important
to understand where we should not use it.
Fuzzy logic architecture has four main parts, as shown in the diagram:
Rule Base:
It contains all the rules and the if-then conditions offered by the experts to control
the decision-making system. Recent updates in fuzzy theory provide various methods for
the design and tuning of fuzzy controllers. These updates significantly reduce the
number of fuzzy rules.
Fuzzification:
It converts crisp inputs, i.e., the exact values measured by sensors, into fuzzy sets.
Inference Engine:
It determines the degree of match between the fuzzy input and the rules. Based on this
degree of match, it determines which rules need to be implemented according to the given
input field. After this, the applied rules are combined to develop the control actions.
Defuzzification:
At last, the defuzzification process is performed to convert the fuzzy sets into a crisp
value. Many types of techniques are available, and you need to select the one best
suited to your expert system.
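The four parts can be illustrated with a minimal Python sketch of a toy heater
controller; the membership functions, rules and crisp output levels are illustrative
assumptions, and the defuzzifier used here is a simple weighted average:

# A sketch of the four-part architecture (rule base, fuzzification,
# inference engine, defuzzification). All values are illustrative assumptions.
def tri(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Fuzzification: crisp temperature -> degrees of 'cold' and 'warm'
def fuzzify(temp):
    return {"cold": tri(temp, -10, 30, 60), "warm": tri(temp, 40, 70, 100)}

# Rule base: IF temp is cold THEN heat is high; IF temp is warm THEN heat is low
rules = [("cold", "high"), ("warm", "low")]
heat_levels = {"low": 20.0, "high": 80.0}   # representative crisp outputs

def infer_and_defuzzify(temp):
    inputs = fuzzify(temp)
    # Inference: each rule fires to the degree its antecedent matches
    firing = [(inputs[a], heat_levels[c]) for a, c in rules]
    # Defuzzification: weighted average of rule outputs
    num = sum(w * z for w, z in firing)
    den = sum(w for w, _ in firing)
    return num / den if den > 0 else 0.0

print(infer_and_defuzzify(35.0))  # blends 'high' and 'low' heat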
Fuzzy logic: Tom's degree of membership within the set of old people is 0.90.
Probability: There is a 90% chance that Tom is old.
In crisp logic, the laws of the excluded middle and non-contradiction hold; in fuzzy
logic, they may or may not hold.
A classical set is defined by crisp boundaries, i.e., there is no uncertainty about the
location of the set boundaries; a fuzzy set has ambiguous boundaries, i.e., there may be
uncertainty about the location of the set boundaries.
See the below-given diagram. It shows that in fuzzy systems the values are denoted by a
number between 0 and 1. In this example, 1.0 means absolute truth and 0.0 means absolute
falseness.
The below-given table shows how famous companies use fuzzy logic in their products.
Product | Company | Fuzzy logic use
Copy machine | Canon | Adjusting drum voltage based on picture density
Cruise control | Nissan, Isuzu, Mitsubishi | Adjusting the throttle setting to set car speed and acceleration
Golf diagnostic system | Maruman Golf | Selecting a golf club based on golfer's swing and physique
Microwave oven | Mitsubishi Chemical | Setting power and cooking strategy
Fuzzy logic is not always accurate, and its results are perceived as being based on
assumptions, so it may not be widely accepted.
Fuzzy systems do not have the machine-learning capability of neural networks, nor
neural-network-type pattern recognition.
Core
For any fuzzy set Ã, the core of a membership function is the region of the universe
that is characterized by full membership in the set. Hence, the core consists of all
those elements y of the universe of information such that
μÃ(y) = 1
Support
For any fuzzy set Ã, the support of a membership function is the region of the universe
that is characterized by nonzero membership in the set. Hence, the support consists of
all those elements y of the universe of information such that
μÃ(y) > 0
Boundary
For any fuzzy set Ã, the boundary of a membership function is the region of the universe
that is characterized by nonzero but incomplete membership in the set. Hence, the
boundary consists of all those elements y of the universe of information such that
1 > μÃ(y) > 0
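A small Python sketch (the fuzzy set values are illustrative assumptions) that computes
these three regions for a discrete fuzzy set:

# Core, support and boundary of a discrete fuzzy set given as {element: grade}.
A = {"y1": 0.0, "y2": 0.3, "y3": 1.0, "y4": 0.7}

core     = {y for y, mu in A.items() if mu == 1.0}
support  = {y for y, mu in A.items() if mu > 0.0}
boundary = {y for y, mu in A.items() if 0.0 < mu < 1.0}

print(core, support, boundary)
# {'y3'}  {'y2', 'y3', 'y4'}  {'y2', 'y4'}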
2.5.1 Fuzzification
It may be defined as the process of transforming a crisp set into a fuzzy set, or a
fuzzy set into a fuzzier set. Basically, this operation translates accurate crisp input
values into linguistic variables.
Support fuzzification (s-fuzzification) method
In this method, the fuzzified set can be expressed with the help of the following
relation −
Ã = μ1Q(x1) + μ2Q(x2) + ... + μnQ(xn)
Here the fuzzy set Q(xi) is called the kernel of fuzzification. This method is
implemented by keeping μi constant and transforming xi to a fuzzy set Q(xi).
Grade fuzzification (g-fuzzification) method
It is quite similar to the above method, but the main difference is that it keeps xi
constant and expresses μi as a fuzzy set.
2.5.2 Defuzzification
It may be defined as the process of reducing a fuzzy set into a crisp set or converting
a fuzzy member into a crisp member.
We have already studied that the fuzzification process involves conversion from crisp
quantities to fuzzy quantities. In a number of engineering applications, it is necessary
to defuzzify the result, or rather the "fuzzy result", so that it is converted to a
crisp result. Mathematically, the process of defuzzification is also called "rounding
it off".
Max-Membership Method
This method is limited to peaked output functions and is also known as the height
method. Mathematically it can be represented as follows −
μÃ(x*) ≥ μÃ(x) for all x ∈ X
Centroid Method
This method is also known as the center of area or the center of gravity method.
Mathematically, the defuzzified output x* will be represented as −
x* = ∫ μÃ(x)·x dx / ∫ μÃ(x) dx
Weighted Average Method
This method is valid for symmetrical output membership functions. Mathematically, the
defuzzified output x* will be represented as −
x* = Σ μÃ(xi)·xi / Σ μÃ(xi)
Mean-Max Membership Method
This method is also known as the middle of the maxima. Mathematically, the defuzzified
output x* (the mean of the n maximizing points xi) will be represented as −
x* = (Σ_{i=1}^{n} xi) / n
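These formulas can be sketched in Python on a sampled output fuzzy set; the particular
membership function used here is an illustrative assumption:

import numpy as np

# Three defuzzification methods on a sampled membership function mu over x.
x  = np.linspace(0.0, 10.0, 101)
mu = np.exp(-0.5 * ((x - 6.0) / 1.5) ** 2)   # bell-shaped output fuzzy set

# Centroid (center of gravity): x* = integral(mu*x) / integral(mu)
centroid = np.trapz(mu * x, x) / np.trapz(mu, x)

# Weighted average over the sample points: x* = sum(mu*x) / sum(mu)
weighted_avg = np.sum(mu * x) / np.sum(mu)

# Mean of maxima: average of the points where mu attains its maximum
mean_max = x[mu == mu.max()].mean()

print(centroid, weighted_avg, mean_max)   # all near 6.0 here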
Having two fuzzy sets Ã and B̃, the universe of information U and an element y of the
universe, the following relations express the union, intersection and complement
operations on fuzzy sets.
Union/Fuzzy 'OR'
Let us consider the following representation to understand how the Union/Fuzzy 'OR'
relation works −
μÃ∪B̃(y) = μÃ(y) ∨ μB̃(y), ∀y ∈ U
Intersection/Fuzzy 'AND'
Similarly, the Intersection/Fuzzy 'AND' relation works as −
μÃ∩B̃(y) = μÃ(y) ∧ μB̃(y), ∀y ∈ U
Complement/Fuzzy 'NOT'
μ¬Ã(y) = 1 − μÃ(y), ∀y ∈ U
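A minimal Python sketch of these three operations on sampled membership grades (the
grades themselves are illustrative assumptions):

import numpy as np

# Standard fuzzy union (max), intersection (min) and complement
# on two fuzzy sets sampled over the same universe.
mu_A = np.array([0.0, 0.3, 0.7, 1.0])
mu_B = np.array([0.5, 0.4, 0.2, 0.9])

union        = np.maximum(mu_A, mu_B)   # mu_{A∪B}(y) = mu_A(y) ∨ mu_B(y)
intersection = np.minimum(mu_A, mu_B)   # mu_{A∩B}(y) = mu_A(y) ∧ mu_B(y)
complement_A = 1.0 - mu_A               # mu_{¬A}(y) = 1 − mu_A(y)

print(union, intersection, complement_A)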
Definition (normal fuzzy set). A fuzzy subset A of a classical set X is called normal if
there exists an x ∈ X such that A(x) = 1. Otherwise A is subnormal.
Definition (α-cut). An α-level set of a fuzzy set A of X is a non-fuzzy set denoted by
[A]^α and is defined by
[A]^α = {t ∈ X | A(t) ≥ α} if α > 0, and [A]^α = cl(supp A) if α = 0,
where cl(supp A) denotes the closure of the support of A.
Definition (convex fuzzy set). A fuzzy set A of X is called convex if [A]^α is a convex
subset of X for all α ∈ [0, 1]. (Figure: an α-cut of a triangular fuzzy number.)
Definition (fuzzy number). A fuzzy number A is a fuzzy set of the real line with a
normal, (fuzzy) convex and continuous membership function of bounded support. The family
of fuzzy numbers will be denoted by F.
Let A be a fuzzy number. Then [A]^γ is a closed convex (compact) subset of R for all
γ ∈ [0, 1]. Let us introduce the notations a1(γ) = min [A]^γ and a2(γ) = max [A]^γ.
In other words, a1(γ) denotes the left-hand side and a2(γ) denotes the right-hand side
of the γ-cut. It is easy to see that if α ≤ β then [A]^α ⊇ [A]^β.
Furthermore, the left-hand side function a1 : [0, 1] → R is monotone increasing and
lower semi-continuous, and the right-hand side function a2 : [0, 1] → R is monotone
decreasing and upper semi-continuous.
We shall use the notation [A]^γ = [a1(γ), a2(γ)]. The support of A is the open interval
(a1(0), a2(0)).
If A is not a fuzzy number then there exists a γ ∈ [0, 1] such that [A]^γ is not a
convex subset of R.
Definition (triangular fuzzy number). A fuzzy set A is called a triangular fuzzy number
with peak (or center) a, left width α > 0 and right width β > 0 if its membership
function has the following form:
A(t) = 1 − (a − t)/α if a − α ≤ t ≤ a; A(t) = 1 − (t − a)/β if a ≤ t ≤ a + β;
A(t) = 0 otherwise.
Definition (trapezoidal fuzzy number). A fuzzy set A is called a trapezoidal fuzzy
number with tolerance interval [a, b], left width α and right width β if its membership
function has the following form:
A(t) = 1 − (a − t)/α if a − α ≤ t ≤ a; A(t) = 1 if a ≤ t ≤ b;
A(t) = 1 − (t − b)/β if b ≤ t ≤ b + β; A(t) = 0 otherwise,
and we use the notation A = (a, b, α, β). It can easily be shown that
[A]^γ = [a − (1 − γ)α, b + (1 − γ)β], ∀γ ∈ [0, 1]. The support of A is (a − α, b + β).
A trapezoidal fuzzy number may be seen as a fuzzy quantity "x is approximately in the
interval [a, b]".
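A short Python sketch of a trapezoidal fuzzy number and its γ-cut, following the
definitions above (the particular parameters are illustrative):

# Trapezoidal fuzzy number A = (a, b, alpha, beta) and its gamma-cut
# [a − (1−γ)α, b + (1−γ)β].
def trapezoidal(t, a, b, alpha, beta):
    if a - alpha <= t <= a:
        return 1.0 - (a - t) / alpha
    if a <= t <= b:
        return 1.0
    if b <= t <= b + beta:
        return 1.0 - (t - b) / beta
    return 0.0

def gamma_cut(gamma, a, b, alpha, beta):
    return (a - (1 - gamma) * alpha, b + (1 - gamma) * beta)

# "x is approximately in [2, 4]" with left width 1 and right width 2
print(trapezoidal(1.5, 2, 4, 1, 2))   # 0.5
print(gamma_cut(0.5, 2, 4, 1, 2))     # (1.5, 5.0)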
Definition (subsethood). Let A and B be fuzzy subsets of a classical set X. We say that
A is a subset of B if A(t) ≤ B(t), ∀t ∈ X.
The complement of a fuzzy set A is defined by (¬A)(t) = 1 − A(t).
The first area Haack defines is that of the nature of Truth and Falsity: if it could be
shown, she maintains, that these are fuzzy values and not discrete ones, then a need for
fuzzy logic would have been demonstrated. The other area she identifies is that of fuzzy
systems' utility: if it could be demonstrated that generalizing classic logic to encompass
fuzzy logic would aid in calculations of a given sort, then again a need for fuzzy logic
would exist.
In regards to the first statement, Haack argues that True and False are discrete terms.
For example, "The sky is blue" is either true or false; any fuzziness to the statement
arises from an imprecise definition of terms, not out of the nature of Truth. As far as
fuzzy systems' utility is concerned, she maintains that no area of data manipulation is
made easier through the introduction of fuzzy calculus; if anything, she says, the
calculations become more complex. Therefore, she asserts, fuzzy logic is unnecessary.
Fox has responded to her objections, indicating that there are three areas in which fuzzy
logic can be of benefit: as a "requisite" apparatus (to describe real-world relationships
which are inherently fuzzy); as a "prescriptive" apparatus (because some data is fuzzy,
and therefore requires a fuzzy calculus); and as a "descriptive" apparatus (because some
inferencing systems are inherently fuzzy).
His most powerful arguments come, however, from the notion that fuzzy and classic
logics need not be seen as competitive, but complementary. He argues that many of
Haack's objections stem from a lack of semantic clarity, and that ultimately fuzzy
statements may be translatable into phrases which classical logicians would find
palatable.
A fuzzy relation generalizes the 0/1 membership degrees of a crisp relation to
intermediate membership grades. So, a crisp relation is a restricted case of a fuzzy
relation.
Crisp relation:
Cartesian product: ×_{i∈Nn} Xi = {(x1, …, xn) | xi ∈ Xi, i ∈ Nn}, where
Nn = {1, 2, …, n}.
n-ary relation: a subset of ×_{i∈Nn} Xi, i.e., R(X1, X2, …, Xn) ⊆ X1 × X2 × … × Xn.
Characteristic function:
χR(x1, …, xn) = 1 if (x1, …, xn) ∈ R, and 0 otherwise.
According to their arity, relations are called binary, ternary, quaternary, quinary,
…, n-ary relations.
Fuzzy relation:
A fuzzy relation assigns to each tuple a membership grade, 0 ≤ R(x1, …, xn) ≤ 1.
Definition of Relation
A relation among crisp sets X1, …, Xn is a subset of X1 × … × Xn, denoted R(X1, …, Xn)
or R(Xi | 1 ≤ i ≤ n). So the relation R(X1, …, Xn) ⊆ X1 × … × Xn is a set, too, and the
basic concepts of sets – containment, subset, union, intersection, complement – can also
be applied to relations. The membership of (x1, …, xn) in R signifies that the elements
of (x1, …, xn) are related to each other.
Each crisp relation can be represented by a membership array: if (x1, …, xn) ∈
X1 × … × Xn corresponds to the entry r_{i1,…,in}, then r_{i1,…,in} = 1 if and only if
(x1, …, xn) ∈ R, and 0 otherwise.
Fuzzy Relations
The characteristic function of a crisp relation can be generalized to allow tuples to
have degrees of membership (recall the generalization of the characteristic function of
a crisp set). A fuzzy relation is then a fuzzy set defined on tuples (x1, …, xn) that
may have varying degrees of membership within the relation. The membership grade
indicates the strength of the relation present between the elements of the tuple. A
fuzzy relation can also be represented by an n-dimensional membership array.
The Cartesian product of fuzzy sets A1, …, An is the fuzzy relation
µ_{A1×…×An}(x1, …, xn) = ⊤(µ_{A1}(x1), …, µ_{An}(xn)),
where xi ∈ Xi, 1 ≤ i ≤ n. Usually ⊤ is the minimum (sometimes also the product).
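A minimal Python sketch of a binary fuzzy relation obtained as the Cartesian product of
two fuzzy sets with the minimum t-norm (the membership values are illustrative
assumptions):

import numpy as np

mu_A = np.array([1.0, 0.6])          # fuzzy set on X = {x1, x2}
mu_B = np.array([0.3, 0.8, 0.5])     # fuzzy set on Y = {y1, y2, y3}

# R(x, y) = min(mu_A(x), mu_B(y)) -> a 2x3 membership matrix
R = np.minimum.outer(mu_A, mu_B)
print(R)
# [[0.3 0.8 0.5]
#  [0.3 0.6 0.5]]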
Subsequences
A tuple y defined on the sets of a family Y = {Xj | j ∈ J ⊆ Nn} is a subsequence of a
tuple x defined on X1, …, Xn (written x ≻ y) iff yj = xj for all j ∈ J.
Projection
Given a relation R(x1, …, xn), let [R ↓ Y] denote the projection of R on Y. It
disregards all sets in X except those in the family Y = {Xj | j ∈ J ⊆ Nn}, and is
defined by
[R ↓ Y](y) = max_{x ≻ y} R(x).
Under special circumstances, this projection can be generalized by replacing the max
operator by another t-conorm.
Cylindric Extension
Another operation on relations is called cylindric extension. Let X and Y denote the
same families of sets as used for projection. Let R be a relation defined on the
Cartesian product of the sets in family Y, and let [R ↑ X−Y] denote the cylindric
extension of R into the sets Xi (i ∈ Nn) which are in X but not in Y. It follows that
for each x with x ≻ y,
[R ↑ X−Y](x) = R(y).
The cylindric extension
• produces the largest fuzzy relation that is compatible with the projection,
• is the least specific of all relations compatible with the projection,
• guarantees that no information not included in the projection is used to determine
the extended relation.
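A small Python sketch of projection and cylindric extension for a binary fuzzy relation
stored as a membership matrix (the values are illustrative assumptions):

import numpy as np

R = np.array([[0.9, 0.0],
              [0.4, 1.0]])

# Projection onto X: [R ↓ X](x) = max over y of R(x, y)
proj_X = R.max(axis=1)          # [0.9, 1.0]

# Cylindric extension of proj_X back into X x Y: value depends on x only
cyl = np.repeat(proj_X[:, None], R.shape[1], axis=1)
print(proj_X)
print(cyl)                 # the largest relation compatible with the projection
print((cyl >= R).all())    # True: the extension contains R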
Example
Consider again the example for the projection. The membership functions of the cylindric
extensions of all projections are already shown in the table, under the assumption that
their arguments are extended to (x1, x2, x3).
In this example, none of the cylindric extensions is equal to the original fuzzy
relation, even though each is compatible with the respective projection. Some
information was lost when the given relation was replaced by any one of its projections.
Example
Consider again the example for the projection. The cylindric closures of three families of the
projections are shown below:
Binary relations are significant among n-dimensional relations; they are (in some sense)
generalized mathematical functions. In contrast to functions from X to Y, binary
relations R(X,Y) may assign to each element of X two or more elements of Y. Some basic
operations on functions, e.g. inverse and composition, are applicable to binary
relations as well.
Given a fuzzy relation R(X,Y), its domain dom R is the fuzzy set on X whose membership
function is defined for each x ∈ X as
dom R(x) = max_{y∈Y} R(x,y),
i.e., each element of X belongs to the domain of R to a degree equal to the strength of
its strongest relation to any y ∈ Y.
The range ran R of R(X,Y) is a fuzzy set on Y whose membership function is defined for
each y ∈ Y as
ran R(y) = max_{x∈X} R(x,y),
i.e., the strength of the strongest relation which each y ∈ Y has to an x ∈ X equals
the degree of membership of y in the range of R. The height h of R(X,Y) is a number
defined by
h(R) = max_{y∈Y} max_{x∈X} R(x,y).
The inverse is defined by R⁻¹(y,x) = R(x,y), ∀x ∈ X, ∀y ∈ Y, and (R⁻¹)⁻¹ = R, ∀R.
Standard Composition
Consider the binary relations P(X,Y) and Q(Y,Z) with common set Y. The standard
composition of P and Q is defined as
[P ∘ Q](x,z) = sup_{y∈Y} min{P(x,y), Q(y,z)}, ∀x ∈ X, ∀z ∈ Z.
If Y is finite, the sup operator is replaced by max; the standard composition is then
also called max-min composition.
Note that the standard composition is not commutative. Matrix notation:
[rij] = [pik] ∘ [qkj] with rij = max_k min(pik, qkj).
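A minimal Python sketch of the max-min composition of two membership matrices (the
matrices are illustrative assumptions):

import numpy as np

P = np.array([[0.3, 0.5, 0.8],
              [0.0, 0.7, 1.0]])
Q = np.array([[0.9, 0.5],
              [0.3, 0.2],
              [1.0, 0.0]])

def max_min_compose(P, Q):
    # r_ij = max_k min(p_ik, q_kj)
    return np.maximum.reduce(np.minimum(P[:, :, None], Q[None, :, :]), axis=1)

R = max_min_compose(P, Q)
print(R)
# [[0.8 0.3]
#  [1.  0.2]]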
Consider the following fuzzy relations for airplanes: • relation A between maximal speed and
maximal height, • relation B between maximal height and the type.
Relational Join
A similar operation on two binary relations is the relational join. It yields triples
(whereas composition returns pairs). For P(X,Y) and Q(Y,Z), the relational join P ∗ Q is
defined by
[P ∗ Q](x,y,z) = min{P(x,y), Q(y,z)}, ∀x ∈ X, ∀y ∈ Y, ∀z ∈ Z.
The max-min composition is then obtained by aggregating the join over y by the maximum:
[P ∘ Q](x,z) = max_{y∈Y} [P ∗ Q](x,y,z), ∀x ∈ X, ∀z ∈ Z.
Example
The join S = P ∗ Q of the relations P and Q has the membership function shown below on
the left-hand side. To convert this join into its corresponding composition R = P ∘ Q
(shown on the right-hand side), the indicated pairs of values S(x,y,z) are aggregated
using max.
Example
An example of R(X,X) defined on X = {1, 2, 3, 4}; two different representations are
shown below.
Note that a fuzzy binary relation that is reflexive, symmetric and transitive is called
a fuzzy equivalence relation.
○ Example 3.1:
A ternary crisp relation R(X, Y, Z) among languages X = {English, French} (X1, X2),
currencies Y = {Dollar, Pound, Franc, Mark} (Y1–Y4) and a set Z = {Z1, …, Z5} can be
given by one membership array per value of X (rows Y1–Y4, columns Z1–Z5):
For X1 = English:
Dollar 1 0 1 0 0
Pound  0 0 0 1 0
Franc  0 0 0 0 0
Mark   0 0 0 0 0
For X2 = French:
Dollar 0 0 1 0 0
Pound  0 0 0 0 0
Franc  0 1 0 0 0
Mark   0 0 0 0 0
Its projections (e.g. R1,2 on X × Y, R1,3 on X × Z, R2,3 on Y × Z) and their cylindric
extensions are formed as defined above.
A fuzzy relation assigns to each tuple (x1, x2, …, xn) a membership grade
0 ≤ R(x1, x2, …, xn) ≤ 1.
Example with Y = {Beijing, New York, London}: a binary fuzzy relation in list/matrix
notation,
R(X × Y) =
         NY   Paris
Beijing  1    0.9
NY       0    0.7
L-fuzzy relation: a fuzzy relation whose membership grades are taken from a lattice L
instead of [0, 1].
Let x = (xi | i ∈ Nn) be a tuple on the family of sets X = {Xi | i ∈ Nn}, and let
y = (yj | j ∈ J) be a tuple on Y = {Xj | j ∈ J}, where J ⊆ Nn and |J| = r. Then y is a
subsequence of x (written y ≺ x) iff yj = xj for all j ∈ J.
Given a relation R(X1, X2, …, Xn) and the family Y = {Xj | j ∈ J ⊆ Nn}, the projection
of R on Y is
[R ↓ Y](y) = max_{x ≻ y} R(x).
For the worked example, the projections are:
R1,3 on (X1, X3), over (X,*), (X,$), (Y,*), (Y,$): 0.9, 0, 1, 0.8
R1 on X1 = {X, Y}: 0.9, 1
R2 on X2 = {a, b}: 1, 0.8
R3 on X3 = {*, $}: 1, 0.8
X−Y: the sets Xi that are in X but are not in Y.
If R is a relation defined on Y, its cylindric extension satisfies, for each x with
x ≻ y,
[R ↑ X−Y](x) = R(y).
For instance, if R = R1,2 (defined on {X1, X2}), then X−Y = {X3} = {*, $} and the
extension is [R1,2 ↑ {X3}]. In the same way one forms
[R1,2 ↑ {X3}], [R1,3 ↑ {X2}], [R2,3 ↑ {X1}], [R1 ↑ {X2, X3}], [R2 ↑ {X1, X3}],
[R3 ↑ {X1, X2}].
Consider [R ↑ X−Y] = [R2 ↑ {X1, X3}], where R2 is defined on X2 = {a, b} with
R2(a) = 1 and R2(b) = 0.8. The extension assigns to every triple the value of R2 at its
X2-component:
(X,a,*), (X,a,$), (Y,a,*), (Y,a,$) ↦ 1; (X,b,*), (X,b,$), (Y,b,*), (Y,b,$) ↦ 0.8.
The cylindric closure of a family of projections {Pi} is
cyl{Pi}(x) = min_{i∈I} [Pi ↑ X−Yi](x) ⊇ R.
‧ Example: cyl{R1,2, R1,3, R2,3} is obtained by taking, for each triple, the minimum of
the three cylindric extensions; its values can be compared with the original relation
R(X1, X2, X3) of Example 3.3.
A relation on X × Y can be depicted as a bipartite graph; a relation on X × X as a
directed graph.
‧ Representations
i, matrices R = [rij], where rij = R(xi, yj).
Example:
     y1  y2  y3  y4  y5
x1   .9   1   0   0   0
x2    0  .4   0   0   0
x3    0   0   1  .2   0
x4    0   0   0   0  .4
x5    0   0   0   0  .5
x6    0   0   0   0  .2
ii, sagittal diagrams (figure omitted).
‧ Domain: dom R
Crisp case – dom R = {x ∈ X | (x, y) ∈ R for some y ∈ Y}.
The domain of a fuzzy relation R(x,y) is a fuzzy set on X; dom R(x) is its membership
function.
‧ Range: ran R
Crisp case – ran R = {y ∈ Y | (x, y) ∈ R for some x ∈ X}.
‧ Inverse: R⁻¹(y, x) = R(x, y); in matrix form R⁻¹ = Rᵀ, and (R⁻¹)⁻¹ = R.
e.g.
R =
0.3 0.2
0   1
0.6 0.4
R⁻¹ = Rᵀ =
0.3 0   0.6
0.2 1   0.4
‧ Composition: R(X, Z) = P(X, Y) ∘ Q(Y, Z)
R(x, z) = [P ∘ Q](x, z) = max_{y∈Y} min[P(x, y), Q(y, z)]   (max-min composition)
Properties: P ∘ Q ≠ Q ∘ P, (P ∘ Q)⁻¹ = Q⁻¹ ∘ P⁻¹, (P ∘ Q) ∘ R = P ∘ (Q ∘ R).
Replacing min by the product gives the max-product composition:
R(x, z) = [P • Q](x, z) = max_{y∈Y} [P(x, y) · Q(y, z)].
‧ Example (max-min composition):
P =
0.3 0.5 0.8
0.0 0.7 1.0
0.4 0.6 0.5
Q =
0.9 0.5 0.7 0.7
0.3 0.2 0.0 0.9
1.0 0.0 0.5 0.5
P ∘ Q =
0.8 0.3 0.5 0.5
1.0 0.2 0.5 0.7
0.5 0.4 0.5 0.6
‧ Properties of crisp relations R(X, X):
i, reflexive; irreflexive; antireflexive.
ii, symmetric; asymmetric;
antisymmetric: (x, y) ∈ R and (y, x) ∈ R imply x = y;
strictly antisymmetric: for x ≠ y, (x, y) ∉ R or (y, x) ∉ R.
iii, transitive; nontransitive; antitransitive.
◎ Fuzzy relations R(X, X):
i, reflexive --- ∀x, R(x, x) = 1
irreflexive --- ∃x, R(x, x) ≠ 1
antireflexive --- ∀x, R(x, x) ≠ 1
ε-reflexive --- ∀x, R(x, x) ≥ ε
ii, symmetric --- ∀x, y, R(x, y) = R(y, x)
asymmetric --- ∃x, y, R(x, y) ≠ R(y, x)
antisymmetric --- for x ≠ y, R(x, y) > 0 implies R(y, x) = 0
iii, transitive --- R(x, z) ≥ max_{y} min[R(x, y), R(y, z)] for all (x, z)
nontransitive --- R(x, z) < max_{y} min[R(x, y), R(y, z)] for some (x, z)
antitransitive --- R(x, z) < max_{y} min[R(x, y), R(y, z)] for all (x, z)
◎ Summary
(Figure 3.6: some important types of binary relation R(X, X), classified by the
properties reflexive, antireflexive, symmetric, antisymmetric and transitive: crisp
equivalence (fuzzy: similarity); quasi-equivalence; compatibility or tolerance; partial
ordering; preordering or quasi-ordering; strict ordering.)
◎ Transitive closure R_T(X):
1. R′ = R ∪ (R ∘ R)
2. If R′ ≠ R, let R = R′ and go to step 1.
3. Stop: R_T = R′.
◎ Example 3.8
Step 1 is applied; since R′ ≠ R, R is replaced by R′ and step 1 is repeated. After the
second repetition R′ = R, and the algorithm stops with R_T = R′.
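A minimal Python sketch of this transitive-closure algorithm, using max-min composition
for R ∘ R and elementwise max for the union (the example matrix is an illustrative
assumption):

import numpy as np

def max_min(P, Q):
    return np.maximum.reduce(np.minimum(P[:, :, None], Q[None, :, :]), axis=1)

def transitive_closure(R):
    while True:
        R_next = np.maximum(R, max_min(R, R))   # step 1: R' = R ∪ (R ∘ R)
        if np.array_equal(R_next, R):           # steps 2/3: stop when R' = R
            return R_next
        R = R_next

R = np.array([[0.7, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [0.4, 0.0, 0.0]])
print(transitive_closure(R))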
A crisp equivalence relation partitions X into equivalence classes; the partition is
denoted X/R.
◎ Example 3.9:
X = {1, 2, …, 10},
R(X × X) = {(x, y) | x, y have the same remainder when divided by 3}.
R is an equivalence relation, and the partition is
X/R = {(1, 4, 7, 10), (2, 5, 8), (3, 6, 9)}.
。Fuzzy case:
A similarity relation is a binary fuzzy relation defined on X that is reflexive,
symmetric and (max-min) transitive.
If R is a similarity relation, then every α-cut R_α is a crisp equivalence relation
on X.
。Let π(R_α) denote the partition of X with respect to R_α. The family
{π(R_α) | α ∈ (0, 1]} is nested: if α ≥ β, then π(R_α) is a refinement of π(R_β).
Proof that each R_α is an equivalence relation. Since R is a similarity relation:
i, R is reflexive, i.e., R(x, x) = 1 for all x ∈ X; hence for every α ∈ [0, 1],
(x, x) ∈ R_α, so R_α is reflexive.
ii, R is symmetric, i.e., R(x, y) = R(y, x) for all x, y ∈ X; hence (x, y) ∈ R_α
implies R(y, x) = R(x, y) ≥ α, so (y, x) ∈ R_α and R_α is symmetric.
iii, R is max-min transitive. If (x, y) ∈ R_α and (y, z) ∈ R_α, i.e., R(x, y) ≥ α and
R(y, z) ≥ α, then R(x, z) ≥ min[R(x, y), R(y, z)] ≥ α, so (x, z) ∈ R_α and R_α is
transitive.
Therefore R_α is a crisp equivalence relation for every α.
The similarity class of each element is the fuzzy set defined by the row of the
membership matrix corresponding to that element. For example, for c:
 a    b    c    d    e    f    g
 0    0    1    0    1   0.9  0.5
A binary fuzzy relation that is reflexive and symmetric is called a compatibility (or
tolerance, or proximity) relation.
Crisp case:
Maximal compatibility classes are compatibility classes not properly contained within
any other compatibility class.
Fuzzy case:
An α-compatibility class is a subset A of X such that R(x, y) ≥ α for all x, y ∈ A; the
family of all maximal α-compatibility classes forms a complete α-cover of X.
In the example below, R is reflexive and symmetric, hence a compatibility relation.
Since its level set is Λ_R = {0.0, 0.4, 0.5, 0.7, 0.8, 1.0}, there is a complete
α-cover for each of these values of α.
e.g., α = 0.4:
R(0.4) =
1 1 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0
0 0 1 1 1 0 0 0 0
0 0 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 0
0 0 0 1 1 1 1 0 0
0 0 0 1 1 1 1 0 0
0 0 0 0 1 0 0 1 0
0 0 0 0 0 0 0 0 1
Maximal compatibility classes (complete 0.4-cover): (1,2), (3,4,5), (4,5,6,7), (5,8),
(9).
e.g., α = 0.5:
R(0.5) =
1 1 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0
0 0 1 1 1 0 0 0 0
0 0 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 0
0 0 0 1 1 1 0 0 0
0 0 0 1 1 0 1 0 0
0 0 0 0 1 0 0 1 0
0 0 0 0 0 0 0 0 1
Maximal compatibility classes (complete 0.5-cover): (1,2), (3,4,5), (4,5,6), (4,5,7),
(5,8), (9).
Orderings: if x ≻ y, we say x precedes y; x is the predecessor and y the successor.
˙Properties: a fuzzy partial ordering is reflexive, antisymmetric and transitive.
˙For A ⊆ X:
If x ∈ X and x ⪯ y for all y ∈ A, then x is a lower bound of A on X.
If x ∈ X and x ⪰ y for all y ∈ A, then x is an upper bound of A on X.
※ Any fuzzy partial ordering can be resolved into a series of crisp partial orderings
(its α-cuts).
For a fuzzy partial ordering R, the dominating class of x is R⪰[x](y) = R(x, y) and the
dominated class of x is R⪯[x](y) = R(y, x).
˙x is undominated iff R(x, y) = 0 for all y ≠ x; x is undominating iff R(y, x) = 0 for
all y ≠ x.
For a crisp subset A of X, the fuzzy upper bound of A is U(R, A) = ∩_{x∈A} R⪰[x], and
the least upper bound LUB(A) is taken from the support of U(R, A).
˙Example 3.13
Fuzzy partial ordering R:
   a    b    c    d    e
a  1   0.7   0    1   0.7
b  0    1    0   0.9   0
c 0.5  0.7   1    1   0.8
d  0    0    0    1    0
e  0   0.1   0   0.9   1
2. d is undominated and c is undominating.
3. For A = {a, b}, U(R, A) is the intersection of the dominating classes of a and b:
U(R, A) = 0.7/b + 0.9/d.
4. LUB(A) = b.
e.g., the α = 0.5 cut of R is the crisp partial ordering
1 1 0 1 1
0 1 0 1 0
1 1 1 1 1
0 0 0 1 0
0 0 0 1 1
A strict ordering is antireflexive, antisymmetric and transitive.
3.8 Morphisms
‧ Crisp homomorphism: a function h : X → Y is a homomorphism from R(X, X) to Q(Y, Y) if
(x1, x2) ∈ R ⇒ (h(x1), h(x2)) ∈ Q.
‧ Fuzzy homomorphism h:
If R(X, X) and Q(Y, Y) are fuzzy binary relations, h is a homomorphism if
R(x1, x2) ≤ Q[h(x1), h(x2)].
‧ Strong homomorphism:
If (x1, x2) ∈ R ⇒ (h(x1), h(x2)) ∈ Q and (y1, y2) ∈ Q ⇒ (h⁻¹(y1), h⁻¹(y2)) ∈ R;
h imposes a partition πh on X.
For fuzzy relations R, Q, h is a strong homomorphism when, with
A = {a1, a2, …, an} such that y1 = h(ai), ai ∈ A, and
B = {b1, b2, …, bn} such that y2 = h(bj), bj ∈ B,
Q(y1, y2) = max{R(ai, bj) | ai ∈ A, bj ∈ B}.
‧ Example 3.14
R(X, X):
0   0.5 0   0
0   0   0.9 0
1   0   0   0.5
0   0.6 0   0
Q(Y, Y):
0.5 0.9 0
1   0   0.9
1   0.9 0
A second illustration:
R(X, X):
0.8 0.4 0   0   0   0
0   0.5 0   0.7 0   0
0   0   0.3 0   0   0
0   0.5 0   0   0.9 0.5
0   0   0   1   0   0
0   0   0   0   1   0.8
Q(Y, Y):
0.7 0   0.9
0.4 0.8 0
1   0   1
※ Q represents a simplification of R.
‧ Isomorphism (congruence): h is one-to-one and onto, X → Y.
Endomorphism (subgraph): h : X → Y with Y ⊆ X.
Automorphism: an isomorphism of a relation onto itself (both of the above).
‧ sup-i composition: given a t-norm i (with sup as the aggregating t-conorm), the sup-i
composition of fuzzy relations P(X, Y) and Q(Y, Z) is defined by
[P ∘ⁱ Q](x, z) = sup_{y∈Y} i[P(x, y), Q(y, z)].   (3.13)
‧ Properties (where P(X, Y) and Q(Y, Z) are fuzzy relations):
1. (P ∘ⁱ Q) ∘ⁱ R = P ∘ⁱ (Q ∘ⁱ R)
2. P ∘ⁱ (∪_j Q_j) = ∪_j (P ∘ⁱ Q_j)
3. P ∘ⁱ (∩_j Q_j) ⊆ ∩_j (P ∘ⁱ Q_j)
4. (∪_j P_j) ∘ⁱ Q = ∪_j (P_j ∘ⁱ Q)
5. (∩_j P_j) ∘ⁱ Q ⊆ ∩_j (P_j ∘ⁱ Q)
6. (P ∘ⁱ Q)⁻¹ = Q⁻¹ ∘ⁱ P⁻¹
Proof sketch of 2: from Eq. (3.13),
[P ∘ⁱ (∪_j Q_j)](x, z) = sup_{y∈Y} i[P(x, y), (∪_j Q_j)(y, z)].
Let Q = ∪_j Q_j; then Q ⊇ Q_1, Q ⊇ Q_2, …, i.e., for every (y, z),
Q(y, z) ≥ Q_j(y, z) for all j. Since i is monotonically increasing,
sup_{y∈Y} i[P(x, y), Q(y, z)] = sup_j sup_{y∈Y} i[P(x, y), Q_j(y, z)],
so [P ∘ⁱ (∪_j Q_j)](x, z) = [∪_j (P ∘ⁱ Q_j)](x, z) for all (x, z), i.e.,
P ∘ⁱ (∪_j Q_j) = ∪_j (P ∘ⁱ Q_j).
Monotonicity: if Q_1 ⊆ Q_2, then
P ∘ⁱ Q_1 ⊆ P ∘ⁱ Q_2   (5.20)
Q_1 ∘ⁱ P ⊆ Q_2 ∘ⁱ P   (5.21)
。Identity of ∘ⁱ: the crisp equality relation
E(x, y) = 1 if x = y, 0 if x ≠ y,
satisfies E ∘ⁱ P = P ∘ⁱ E = P.
。A relation R on X² is i-transitive iff
R(x, z) ≥ i[R(x, y), R(y, z)] for all x, y, z ∈ X,
i.e., iff R ⊇ R ∘ⁱ R.
。The i-transitive closure R_T(i) of R is the smallest i-transitive relation containing
R. Writing R(1) = R and R(n) = R ∘ⁱ R(n−1), one has
R_T(i) = ∪_{k≥1} R(k),
and R_T(i) is indeed i-transitive (R_T(i) ∘ⁱ R_T(i) ⊆ R_T(i)). That it is the smallest
such relation follows by mathematical induction: if S is i-transitive (S ∘ⁱ S ⊆ S) and
R ⊆ S, then R(2) = R ∘ⁱ R ⊆ S ∘ⁱ S ⊆ S; if R(n) ⊆ S, then
R(n+1) = R ∘ⁱ R(n) ⊆ S ∘ⁱ S ⊆ S. Hence R(k) ⊆ S for all k, and
R_T(i) = ∪_{k≥1} R(k) ⊆ S, i.e., R_T(i) is the smallest i-transitive relation
containing R.
。Theorem 3.2: Let R be a reflexive fuzzy relation on X², |X| = n. Then
R(m) = R(n−1) for all m ≥ n − 1, and R_T(i) = R(n−1).
Proof sketch: by reflexivity, R ⊆ R(2) ⊆ … ⊆ R(m) ⊆ R(m+1) ⊆ …. If x = y, then
R(n−1)(x, x) = 1 by reflexivity. If x ≠ y, any chain x = z_0, z_1, …, z_m = y with
m ≥ n must contain at least two identical elements, say z_r = z_s (r < s); removing the
cycle shortens the chain without decreasing its strength, so R(n) ⊆ R(n−1). Hence
R(n) = R(n−1) and R_T(i) = R(n−1).
。The wi operation (the residuum associated with the t-norm i) is defined by
wi(a, b) = sup{x ∈ [0, 1] | i(a, x) ≤ b}.
For i = min this gives wi(a, b) = 1 if a ≤ b, and wi(a, b) = b if a > b. Indeed,
i(wi(a, b), a) = min(b, a) = b when a > b, and i(wi(a, b), a) = min(1, a) = a ≤ b when
a ≤ b, so in both cases i(wi(a, b), a) ≤ b.
。Theorem 3.3 (properties of wi), among others:
(i) i(wi(a, b), a) ≤ b;
(ii) if a ≤ b, then wi(d, a) ≤ wi(d, b) and wi(b, d) ≤ wi(a, d) (monotonicity);
(iii) wi(i(a, b), d) = wi(a, wi(b, d));
(iv) wi(inf_j a_j, b) ≥ sup_j wi(a_j, b);
(v) wi(sup_j a_j, b) = inf_j wi(a_j, b).
For (v): writing s = sup_j a_j, monotonicity gives wi(s, b) ≤ wi(a_j, b) for every j,
hence wi(s, b) ≤ inf_j wi(a_j, b); the reverse inequality follows from the definition of
wi as a supremum, so wi(sup_j a_j, b) = inf_j wi(a_j, b).
For (iii): by the definition of wi and the associativity and commutativity of i,
wi(a, wi(b, d)) = sup{x | i(a, x) ≤ wi(b, d)} = sup{x | i(b, i(a, x)) ≤ d}
= sup{x | i(i(a, b), x) ≤ d} = wi(i(a, b), d).
Let P ⊙ Q denote the inf-wi composition,
[P ⊙ Q](x, z) = inf_{y∈Y} wi[P(x, y), Q(y, z)].
Theorem 3.4:
(1) P ∘ⁱ Q ⊆ R ⟺ Q ⊆ P⁻¹ ⊙ R ⟺ P ⊆ (Q ⊙ R⁻¹)⁻¹;
(2) (P ∘ⁱ Q) ⊙ S = P ⊙ (Q ⊙ S).
Theorem 3.5:
(∪_j P_j) ⊙ Q = ∩_j (P_j ⊙ Q)
(∩_j P_j) ⊙ Q ⊇ ∪_j (P_j ⊙ Q)
P ⊙ (∩_j Q_j) = ∩_j (P ⊙ Q_j)
P ⊙ (∪_j Q_j) ⊇ ∪_j (P ⊙ Q_j)
Theorem 3.6: if Q_1 ⊆ Q_2, then P ⊙ Q_1 ⊆ P ⊙ Q_2 and Q_1 ⊙ R ⊇ Q_2 ⊙ R.
Proof: Q_1 ⊆ Q_2 implies Q_1 ∩ Q_2 = Q_1 and Q_1 ∪ Q_2 = Q_2. Hence
(P ⊙ Q_1) ∩ (P ⊙ Q_2) = P ⊙ (Q_1 ∩ Q_2) = P ⊙ Q_1, so P ⊙ Q_1 ⊆ P ⊙ Q_2; and
(Q_1 ⊙ R) ∩ (Q_2 ⊙ R) = (Q_1 ∪ Q_2) ⊙ R = Q_2 ⊙ R, so Q_1 ⊙ R ⊇ Q_2 ⊙ R.
Further corollaries of the adjunction between ∘ⁱ and ⊙ include:
1. Q ⊆ P⁻¹ ⊙ (P ∘ⁱ Q);
2. R ⊇ P ∘ⁱ (P⁻¹ ⊙ R);
3. corresponding inverted forms obtained from (P ∘ⁱ Q)⁻¹ = Q⁻¹ ∘ⁱ P⁻¹.
Proof of 1 and 2: putting R = P ∘ⁱ Q in Theorem 3.4(1), the left-hand condition holds
trivially, so Q ⊆ P⁻¹ ⊙ (P ∘ⁱ Q); putting Q = P⁻¹ ⊙ R, the right-hand condition holds
trivially, so P ∘ⁱ (P⁻¹ ⊙ R) ⊆ R.
Single Layer Perceptron: Perceptron convergence theorem, method of steepest descent,
least mean square algorithms.
Single-Layer NN Systems
Here, a simple perceptron model and an ADALINE network model are presented.
Single-Layer Perceptron
Definition: An arrangement of one input layer of neurons feeding forward to one output
layer of neurons is known as a single-layer perceptron.
The output of neuron j is computed with a step function:
y_j = f(net_j) = 1 if net_j > 0, and 0 if net_j < 0, where net_j = Σ_{i=1}^{n} x_i w_ij.
• Learning Algorithm: Training the Perceptron
The training of the perceptron is a supervised learning algorithm in which the weights
are adjusted to minimize the error whenever the computed output does not match the
desired output.
- If the output is correct, then no adjustment of weights is done, i.e.
W_ij^{K+1} = W_ij^K.
- If the output is incorrect, then the weights are adjusted, i.e.
W_ij^{K+1} = W_ij^K + α·x_i,
where K is the iteration index, x_i is the input and α is the learning-rate parameter.
A small α leads to slow learning and a large α leads to fast learning.
• Perceptron and Linearly Separable Tasks
The perceptron cannot handle tasks which are not linearly separable.
- Definition: Sets of points in 2-D space are linearly separable if the sets can be
separated by a straight line.
- Generalizing, a set of points in n-dimensional space is linearly separable if there is
a hyperplane of (n − 1) dimensions that separates the sets.
Example: the Exclusive-OR (XOR) operation.
Even parity means an even number of 1 bits in the input; odd parity means an odd number
of 1 bits in the input.
- There is no way to draw a single straight line so that the circles are on one side of
the line and the dots on the other side.
- The perceptron is unable to find a line separating even-parity input patterns from
odd-parity input patterns.
• Perceptron Learning Algorithm
* Step 1:
Create a perceptron with (n + 1) input neurons x0, x1, …, xn, where x0 = 1 is the bias
input. Let O be the output neuron.
* Step 2:
Initialize the weights W = (w0, w1, …, wn) to random values.
* Step 3:
Iterate through the input patterns Xj of the training set using the weight set, i.e.
compute the weighted sum of inputs net_j = Σ_{i=1}^{n} x_i w_i for each input pattern j.
* Step 4:
Compute the output y_j using the step function
y_j = f(net_j) = 1 if net_j > 0, 0 if net_j < 0, where net_j = Σ_{i=1}^{n} x_i w_ij.
* Step 5:
Compare the computed output y_j with the target output; if they match, no weights are
updated.
* Step 6:
Otherwise, update the weights as given below:
If the computed output y_j is 1 but should have been 0, then
w_i = w_i − α·x_i, i = 0, 1, 2, …, n.
If the computed output y_j is 0 but should have been 1, then
w_i = w_i + α·x_i, i = 0, 1, 2, …, n,
where α is the learning parameter and is constant.
* Step 7:
Go to Step 3.
* END
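A minimal Python sketch of this perceptron learning algorithm on the linearly separable
AND task (the task, learning rate and epoch count are illustrative assumptions):

import numpy as np

X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])  # x0 = 1 is the bias
d = np.array([0, 0, 0, 1])                                   # AND targets
w = np.zeros(3)
alpha = 0.5

for epoch in range(20):
    for x, target in zip(X, d):
        y = 1 if np.dot(w, x) > 0 else 0     # step function (Step 4)
        if y == 1 and target == 0:           # Step 6: output 1, should be 0
            w = w - alpha * x
        elif y == 0 and target == 1:         # Step 6: output 0, should be 1
            w = w + alpha * x

print(w)
print([1 if np.dot(w, x) > 0 else 0 for x in X])   # [0, 0, 0, 1]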
An ADALINE consists of a single neuron of the McCulloch-Pitts type, whose weights are
determined by the normalized least mean square (LMS) training law. The LMS learning rule
is also referred to as the delta rule. It is a well-established supervised training
method that has been used over a wide range of diverse applications.
• Architecture of a simple ADALINE
The basic structure of an ADALINE is similar to a neuron with a linear activation
function and a feedback loop. During the training phase of the ADALINE, the input vector
as well as the desired output are presented to the network.
* The basic structure of an ADALINE is similar to a linear neuron with an extra feedback
loop.
* During the training phase of the ADALINE, the input vector X = [x1, x2, …, xn]ᵀ as
well as the desired output are presented to the network.
* The weights are adaptively adjusted based on the delta rule.
* After the ADALINE is trained, an input vector presented to the network with fixed
weights will result in a scalar output.
* Thus, the network performs an n-dimensional mapping to a scalar value.
* The activation function is not used during the training phase. Once the weights are
properly adjusted, the response of the trained unit can be tested by applying various
inputs which are not in the training set. If the network produces responses consistent
to a high degree with the test inputs, it is said that the network can generalize. The
processes of training and generalization are two important attributes of this network.
Usage of ADALINE:
In practice, an ADALINE is used to
- make binary decisions: the output is sent through a binary threshold;
- realize logic gates such as AND, NOT and OR;
- realize only those logic functions that are linearly separable.
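A minimal Python sketch of LMS (delta-rule) training for an ADALINE-style linear unit;
the synthetic data, learning rate and epoch count are illustrative assumptions:

import numpy as np

# LMS / delta rule: w <- w + eta * (d - y) * x, with linear output y = w.x
# during training. The unit learns a noisy linear target.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
d = 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.01 * rng.standard_normal(200)

w = np.zeros(2)
eta = 0.1
for epoch in range(50):
    for x, target in zip(X, d):
        y = np.dot(w, x)              # linear output
        w += eta * (target - y) * x   # delta-rule / LMS update

print(w)   # approaches [1.5, -0.8]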
* Clustering:
A clustering algorithm explores the similarity between patterns and places similar
patterns in a cluster. Best-known applications include data compression and data mining.
* Classification / pattern recognition:
The task of pattern recognition is to assign an input pattern (like a handwritten
symbol) to one of many classes. This category includes algorithmic implementations such
as associative memory.
* Function approximation:
The task of function approximation is to find an estimate of an unknown function subject
to noise. Various engineering and scientific disciplines require function approximation.
* Prediction systems:
The task is to forecast some future values of time-sequenced data. Prediction has a
significant impact on decision-support systems. Prediction differs from function
approximation by considering the time factor: the system may be dynamic and may produce
different results for the same input data depending on the system state (time).
In the previous section we showed that by adding an extra hidden unit, the XOR problem
can be solved. For binary units, one can prove that this architecture is able to perform
any transformation given the correct connections and weights. The most primitive
construction is the following. For a given transformation y = d(x), we can divide the
set of all possible input vectors into two classes:
X⁺ = {x | d(x) = 1} and X⁻ = {x | d(x) = −1}.
Since there are N input units, the total number of possible input vectors x is 2^N. For
every x^p ∈ X⁺ a hidden unit h can be reserved whose activation y_h is 1 if and only if
the specific pattern p is present at the input: we can choose its weights w_ih equal to
the specific pattern x^p and the bias θ_h equal to 1 − N, such that
y_h^p = sgn( Σ_i w_ih x_i^p − N + ½ )
is equal to 1 for x^p = w_h only. Similarly, the weights to the output neuron can be
chosen such that the output is one as soon as one of the M predicate neurons is one:
y_o^p = sgn( Σ_{h=1}^{M} y_h^p + M − ½ ).
This perceptron will give y_o = 1 only if x ∈ X⁺: it performs the desired mapping. The
problem is the large number of predicate units, which is equal to the number of patterns
in X⁺, which is maximally 2^N. Of course we can do the same trick for X⁻, and we will
always take the minimal number of mask units, which is maximally 2^{N−1}. A more elegant
proof is given by Minsky and Papert, but the point is that for complex transformations
the number of required units in the hidden layer is exponential in N.
Back-Propagation
As we have seen in the previous chapter, a single-layer network has severe restrictions:
the class of tasks that can be accomplished is very limited. In this chapter we will
focus on feed-forward networks with layers of processing units.
Minsky and Papert showed in 1969 that a two-layer feed-forward network can overcome many
restrictions, but did not present a solution to the problem of how to adjust the weights
from input to hidden units. An answer to this question was presented by Rumelhart,
Hinton and Williams in 1986, and similar solutions appeared to have been published
earlier (Parker, 1985; Cun, 1985).
Since we are now using units with nonlinear activation functions, we have to generalize
the delta rule, which was presented in chapter 11 for linear functions, to the set of
nonlinear activation functions. The activation is a differentiable function of the total
input, given by
y_k^p = F(s_k^p), in which s_k^p = Σ_j w_jk y_j^p + θ_k.
To get the correct generalization of the delta rule as presented in the previous
chapter, we must set
Δ_p w_jk = −γ ∂E^p/∂w_jk.   (12.3)
The error E^p is defined as the total quadratic error for pattern p at the output units:
E^p = ½ Σ_{o=1}^{N_o} (d_o^p − y_o^p)².   (12.4)
We can write
∂E^p/∂w_jk = (∂E^p/∂s_k^p)(∂s_k^p/∂w_jk),   (12.5)
and from the definition of s_k^p we see that the second factor is
∂s_k^p/∂w_jk = y_j^p.   (12.6)
When we define
δ_k^p = −∂E^p/∂s_k^p,   (12.7)
we will get an update rule which is equivalent to the delta rule as described in the
previous chapter, resulting in a gradient descent on the error surface if we make the
weight changes according to:
Δ_p w_jk = γ δ_k^p y_j^p.   (12.8)
The trick is to figure out what δ_k^p should be for each unit k in the network. The
interesting result, which we now derive, is that there is a simple recursive computation
of these δ's which can be implemented by propagating error signals backward through the
network.
To compute δ_k^p we apply the chain rule to write this partial derivative as the product
of two factors, one factor reflecting the change in error as a function of the output of
the unit and one reflecting the change in the output as a function of changes in the
input. Thus, we have
δ_k^p = −∂E^p/∂s_k^p = −(∂E^p/∂y_k^p)(∂y_k^p/∂s_k^p).   (12.9)
To compute the second factor of equation (12.9), we see that
∂y_k^p/∂s_k^p = F′(s_k^p),   (12.10)
which is simply the derivative of the squashing function F for the k-th unit. To compute
the first factor, assume first that k is an output unit k = o. From the definition of
E^p,
∂E^p/∂y_o^p = −(d_o^p − y_o^p),   (12.11)
which is the same result as we obtained with the standard delta rule. Substituting this
and equation (12.10) in equation (12.9), we get
δ_o^p = (d_o^p − y_o^p) F′(s_o^p)   (12.12)
for any output unit o. Secondly, if k is not an output unit but a hidden unit k = h, we
do not readily know the contribution of the unit to the output error of the network.
However, the error measure can be written as a function of the net inputs from the
hidden to the output layer, E^p = E^p(s_1^p, s_2^p, …, s_{N_o}^p), and we use the chain
rule to write
∂E^p/∂y_h^p = Σ_{o=1}^{N_o} (∂E^p/∂s_o^p)(∂s_o^p/∂y_h^p)
            = Σ_{o=1}^{N_o} (∂E^p/∂s_o^p) w_ho = −Σ_{o=1}^{N_o} δ_o^p w_ho.   (12.13)
Substituting this in equation (12.9) yields
δ_h^p = F′(s_h^p) Σ_{o=1}^{N_o} δ_o^p w_ho.   (12.14)
Equations (12.12) and (12.14) give a recursive procedure for computing the δ's for all
units in the network, which are then used to compute the weight changes according to
equation (12.8). This procedure constitutes the generalized delta rule for a
feed-forward network of nonlinear units.
12.3.1 Understanding Back-Propagation
The equations derived in the previous section may be mathematically correct, but what do
they actually mean? Is there a way of understanding back-propagation other than reciting
the necessary equations?
The answer is, of course, yes. In fact, the whole back-propagation process is
intuitively very clear. What happens in the above equations is the following. When a
learning pattern is clamped, the activation values are propagated to the output units,
and the actual network output is compared with the desired output values; we usually end
up with an error in each of the output units. Let's call this error e_o for a particular
output unit o. We have to bring e_o to zero.
The simplest method to do this is the greedy method: we strive to change the connections
in the neural network in such a way that, next time around, the error e_o will be zero
for this particular pattern. We know from the delta rule that, in order to reduce an
error, we have to adapt its incoming weights according to
Δw_ho = (d_o − y_o) y_h.   (12.15)
That is step one. But it alone is not enough: when we only apply this rule, the weights
from input to hidden units are never changed, and we do not have the full
representational power of the feed-forward network as promised by the universal
approximation theorem.
In order to adapt the weights from input to hidden units, we again want to apply the
delta rule. In this case, however, we do not have a value for δ for the hidden units.
This is solved by the chain rule, which does the following: distribute the error of an
output unit o to all the hidden units that it is connected to, weighted by this
connection. Differently put, a hidden unit h receives a delta from each output unit o
equal to the delta of that output unit weighted with (i.e. multiplied by) the weight of
the connection between those units. However, the delta must still be multiplied by the
derivative of the activation function of the hidden unit; F′ has to be applied to the
delta before the back-propagation process can continue.
The application of the generalized delta rule thus involves two phases: during the first
phase the input x is presented and propagated forward through the network to compute the
output values y_o^p for each output unit. This output is compared with its desired value
d_o, resulting in an error signal δ_o^p for each output unit.
The second phase involves a backward pass through the network during which the error
signal is passed to each unit in the network and appropriate weight changes are
calculated.
12.4.1 Weight Adjustments with Sigmoid Activation Function
The results from the previous section can be summarized in three equations:
* The weight of a connection is adjusted by an amount proportional to the product of an
error signal δ on the unit k receiving the input and the output of the unit j sending
this signal along the connection:
Δ_p w_kj = γ δ_k^p y_j^p.   (12.16)
* If the unit is an output unit, the error signal is given by
δ_o^p = (d_o^p − y_o^p) F′(s_o^p).   (12.17)
Take as the activation function F the sigmoid function
y^p = F(s^p) = 1 / (1 + e^{−s^p}).   (12.18)
In this case the derivative is equal to
F′(s^p) = ∂/∂s^p (1 + e^{−s^p})^{−1} = e^{−s^p} / (1 + e^{−s^p})² = y^p (1 − y^p),   (12.19)
such that the error signal for an output unit can be written as:
δ_o^p = (d_o^p − y_o^p) y_o^p (1 − y_o^p).   (12.20)
* The error signal for a hidden unit is determined recursively in terms of the error
signals of the units to which it directly connects and the weights of those connections.
For the sigmoid activation function:
δ_h^p = F′(s_h^p) Σ_{o=1}^{N_o} δ_o^p w_ho = y_h^p (1 − y_h^p) Σ_{o=1}^{N_o} δ_o^p w_ho.   (12.21)
The learning procedure requires that the change in weight is proportional to ∂E^p/∂w.
True gradient descent requires that infinitesimal steps are taken. The constant of
proportionality is the learning rate γ. For practical purposes we choose a learning rate
that is as large as possible without leading to oscillation. One way to avoid
oscillation at large γ is to make the change in weight dependent on the past weight
change by adding a momentum term:
Δw_jk(t + 1) = γ δ_k^p y_j^p + α Δw_jk(t),   (12.22)
where t indexes the presentation number and α is a constant which determines the effect
of the previous weight change.
The role of the momentum term is shown in Fig. 12.2. When no momentum term is used, it
takes a long time before the minimum has been reached with a low learning rate, whereas
for high learning rates the minimum is never reached because of the oscillations. When
adding the momentum term, the minimum will be reached faster.
Fig. 12.2: The descent in weight space. (a) for small learning rate; (b) for large
learning rate: note the oscillations; and (c) with large learning rate and momentum term
added.
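A minimal Python sketch of the generalized delta rule with sigmoid units and a momentum
term (equations 12.16-12.22), applied to the XOR task; the network size, random seed and
hyper-parameters are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0.0, 1.0, 1.0, 0.0])            # XOR targets

def sigmoid(s): return 1.0 / (1.0 + np.exp(-s))

W1 = rng.normal(0, 0.5, (2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(0, 0.5, 4);      b2 = 0.0           # hidden -> output
gamma, alpha = 0.5, 0.9                             # learning rate, momentum
dW1 = np.zeros_like(W1); dW2 = np.zeros_like(W2)

for epoch in range(5000):
    for x, t in zip(X, d):
        yh = sigmoid(x @ W1 + b1)                   # forward pass
        yo = sigmoid(yh @ W2 + b2)
        delta_o = (t - yo) * yo * (1 - yo)          # eq. (12.20)
        delta_h = yh * (1 - yh) * (delta_o * W2)    # eq. (12.21)
        dW2 = gamma * delta_o * yh + alpha * dW2    # eq. (12.22) with momentum
        dW1 = gamma * np.outer(x, delta_h) + alpha * dW1
        W2 += dW2; b2 += gamma * delta_o
        W1 += dW1; b1 += gamma * delta_h

# should approach [0, 1, 1, 0]
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))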
Although, theoretically, the back-propagation algorithm performs gradient descent on the
total error only if the weights are adjusted after the full set of learning patterns has
been presented, more often than not the learning rule is applied to each pattern
separately, i.e., a pattern p is applied, E^p is calculated, and the weights are adapted
(p = 1, 2, …, P). There exists empirical indication that this results in faster
convergence. Care has to be taken, however, with the order in which the patterns are
taught. For example, when using the same sequence over and over again the network may
become focused on the first few patterns. This problem can be overcome by using a
permuted training method.
Example 12.1: A feed-forward network can be used to approximate a function from
examples. Suppose we have a system (for example a chemical process or a financial
market) of which we want to know the characteristics. The input of the system is given
by the two-dimensional vector x and the output is given by the one-dimensional vector d.
We want to estimate the relationship d = f(x) from 80 examples {x^p, d^p} as depicted in
Fig. 12.3 (top left). A feed-forward network was programmed with two inputs, 10 hidden
units with sigmoid activation function and an output unit with a linear activation
function. Check for yourself how equation (12.20) should be adapted for the linear
instead of sigmoid activation function. The network weights are initialized to small
values and the network is trained for 5,000 learning iterations with the
back-propagation training rule described in the previous section. The relationship
between x and d as represented by the network is shown in Fig. 12.3 (top right), while
the function which generated the learning samples is given in Fig. 12.3 (bottom left).
The approximation error is depicted in Fig. 12.3 (bottom right). We see that the error
is higher at the edges of the region within which the learning samples were generated.
The network is considerably better at interpolation than extrapolation.
Fig. Example of function approximation with a feed forward network. Top left: The original learning
samples; Top right: The approximation with the network; Bottom left: The function which
generated the learning samples; Bottom right: The error in the approximation.
Exercise:
Q1. What is meant by single-layer NN systems?
Q2. Explain the architecture of a simple ADALINE.
Q3. What are the uses of ADALINE?
Q4. What are the applications of neural networks?
Q5. What is meant by a learning algorithm?
Q6. Explain MULTI-LAYER PERCEPTRONS.
Q7. What is the process of back-propagation?
Q8. Write a short note on MULTI-LAYER FEED-FORWARD NETWORKS.
Q9. State the GENERALISED DELTA RULE.
Q10. How can we understand back-propagation?
Radial Basis and Recurrent Neural Networks: RBF network structure, theorem and the
separability of patterns, RBF learning strategies, K-means and LMS algorithms,
comparison of RBF and MLP networks: energy function, spurious states, error performance.
Radial basis function (RBF) networks are feed-forward networks trained using a
supervised training algorithm. They are typically configured with a single hidden layer
of units whose activation function is selected from a class of functions called basis
functions. While similar to back-propagation in many respects, radial basis function
networks have several advantages. They usually train much faster than back-propagation
networks. They are less susceptible to problems with non-stationary inputs because of
the behaviour of the radial basis function hidden units.
Popularized by Moody and Darken (1989), RBF networks have proven to be a useful neural
network architecture. The major difference between RBF networks and back-propagation
networks (that is, multilayer perceptrons trained by the back-propagation algorithm) is
the behaviour of the single hidden layer. Rather than using the sigmoidal or S-shaped
activation function as in back-propagation, the hidden units in RBF networks use a
Gaussian or some other basis kernel function. Each hidden unit acts as a locally tuned
processor that computes a score for the match between the input vector and its
connection weights or centres. In effect, the basis units are highly specialized pattern
detectors. The weights connecting the basis units to the outputs are used to take linear
combinations of the hidden units to produce the final classification or output. In this
chapter, first the structure of the network will be introduced and it will be explained
how it can be used for function approximation and data interpolation; then it will be
explained how it can be trained.
Radial basis functions were first introduced in the solution of real multivariate
interpolation problems. Broomhead and Lowe (1988), and Moody and Darken (1989), were the
first to exploit the use of radial basis functions in the design of neural networks.
The structure of an RBF network in its most basic form involves three entirely different layers.
The input layer is made up of source nodes (sensory units) whose number is equal to the dimension p of the input vector u.
The second layer is the hidden layer, which is composed of nonlinear units that are connected directly to all of the nodes in the input layer. It is of high enough dimension, and it serves a different purpose from that in a multilayer perceptron.
Each hidden unit takes its input from all the nodes at the input layer. As mentioned above, the hidden units contain a basis function, which has the parameters centre and width. The centre of the basis function for a node i at the hidden layer is a vector c_i whose size is the same as the input vector u, and there is normally a different centre for each unit in the network.
First, the radial distance d_i between the input vector u and the centre of the basis function c_i is computed for each unit i in the hidden layer as

d_i = ||u − c_i||
The output h_i of each hidden unit i is then computed by applying the basis function G to this distance:

h_i = G(d_i)

As shown in Figure 5.2, the basis function is a curve (typically a Gaussian function, with the width corresponding to the variance σ_i) which has a peak at zero distance and decreases as the distance from the centre increases.
Figure 5.3 Response of a hidden unit on the input space for u in R^2.
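The hidden-layer computation just described can be sketched in a few lines of NumPy; the centres, width and input below are illustrative values, not taken from the text.

import numpy as np

def rbf_hidden(u, centers, sigma):
    """u: input vector (p,); centers: (L, p); returns hidden outputs h (L,)."""
    d = np.linalg.norm(centers - u, axis=1)        # radial distances d_i = ||u - c_i||
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))  # Gaussian basis h_i = G(d_i)

centers = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]])
h = rbf_hidden(np.array([0.2, 0.1]), centers, sigma=0.5)
print(h)   # close to 1 for centres near the input, falling off with distance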
5.1.2 Output layer
The transformation from the input space to the hidden-unit space is nonlinear, whereas the transformation from the hidden-unit space to the output space is linear.
Mathematical model
In summary, the mathematical model of the RBF network can be expressed as

x = f(u),   f: R^N → R^M   (5.1.4)

y = g(u) = Σ_i w_i G(||u − c_i||)   (5.2.2)
The aim is to minimize the error by setting the parameters of G_i appropriately. A possible choice for the error definition is the L2 norm of the residual function r(u), which is defined below.
Now, consider the single-input single-output RBF network shown in Figure 5.4. Then x can be written as

x = f(u) = Σ_t w_t G(||u − c_t||)

By the use of such a network, y can be written as

y = f(u) + r(u)

where f(u) is the output of the RBFNN given in Figure 5.4 and r(u) is the residual. By setting the centres c_i, the variances σ_i, and the weights w_i appropriately, the error can be minimized.
Whatever we discussed here for g: R → R can be generalized to g: R^N → R^M easily by using the N-input, M-output RBFNN given in Figure 5.1 previously.
5.2.2 Data Interpolation
Given input-output training patterns (u_k, y_k), k = 1, 2, ..., K, the aim of data interpolation is to approximate the function y from which the data is generated. Since the function y is unknown, the problem can be stated as a minimization problem which takes only the sample points into consideration.
Table 5.1: 13 data points generated by using the sum of three Gaussians with c1 = 0.2000, c2 = 0.6000

data no:  1       2       3       4       5       6       7       9       10      11      12      13
x:        0.0500  0.2000  0.2500  0.3000  0.4000  0.4300  0.4800  0.6000  0.7000  0.8000  0.9000  0.9500
f(x):     0.863   0.2662  0.2362  0.1687  0.1260  0.1756  0.3290  0.6694  0.4573  0.3320  0.4063  0.3535
Figure 5.5 Output of the RBF network trained to fit the data points given in Table 5.1.
The training of an RBF network can be formulated as a nonlinear unconstrained optimization problem given below:
Given input-output training patterns (u_k, y_k), k = 1, 2, ..., K, choose w_ij and c_i, i = 1, 2, ..., L, j = 1, 2, ..., M, so as to minimize

J = Σ_{k=1..K} || y_k − f(u_k) ||^2
In its simplest form, all hidden units in the RBF network have the same width or degree of sensitivity to inputs. However, in portions of the input space where there are few patterns, it is sometimes desirable to have hidden units with a wide area of reception. Likewise, in portions of the input space which are crowded, it might be desirable to have very highly tuned processors with narrow reception fields. Computing these individual widths increases the performance of the RBF network at the expense of a more complicated training process.
Remember that in a back-propagation network, all weights in all of the layers are adjusted at the same time. In radial basis function networks, however, the weights into the hidden-layer basis units are usually set before the second layer of weights is adjusted. As the input moves away from the connection weights, the activation value falls off. This behavior leads to the use of the term "centre" for the first-layer weights. These centre weights can be computed using Kohonen feature maps, statistical methods such as K-means clustering, or some other means. In any case, they are then used to set the areas of sensitivity for the RBF network's hidden units, which then remain fixed. Once the hidden-layer weights are set, a second phase of training is used to adjust the output weights. This process typically uses the standard steepest descent algorithm. Note that the training problem becomes quadratic once the c_i's (radial basis function centres) are known.
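The two-phase scheme just described can be sketched as follows. The target function, number of centres and shared width are illustrative assumptions; a few Lloyd (k-means) iterations stand in for phase one, and linear least squares exploits the quadratic phase-two problem.

import numpy as np

rng = np.random.default_rng(1)
u = rng.uniform(0, 1, (100, 1))                  # 1-D training inputs
y = np.sin(2 * np.pi * u[:, 0])                  # assumed target function

# Phase 1: place L centres with a few k-means iterations, then fix them.
L = 8
c = u[rng.choice(len(u), L, replace=False)]
for _ in range(10):
    labels = np.argmin(np.abs(u - c.T), axis=1)  # nearest centre (1-D data)
    for i in range(L):
        if np.any(labels == i):
            c[i] = u[labels == i].mean(axis=0)
sigma = 0.1                                      # shared width (a design choice)

# Phase 2: with centres known, the problem is quadratic -> linear least squares.
H = np.exp(-((u - c.T) ** 2) / (2 * sigma ** 2)) # hidden activations, (100, L)
w, *_ = np.linalg.lstsq(H, y, rcond=None)        # output weights

print("train MSE:", np.mean((H @ w - y) ** 2))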
k-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centres, one for each cluster. These centres should be placed in a cunning way, because a different location causes a different result. So the better choice is to place them as far away from each other as possible. The next step is to take each point belonging to a given data set and associate it to the nearest centre. When no point is pending, the first step is completed and an early grouping is done. At this point we need to re-calculate k new centroids as barycenters of the clusters resulting from the previous step. After we have these k new centroids, a new binding has to be done between the same data set points and the nearest new centre. A loop has been generated. As a result of this loop, we may notice that the k centres change their location step by step until no more changes are done, or, in other words, the centres do not move any more. Finally, this algorithm aims at minimizing an objective function known as the squared error function, given by:
J(V) = Σ_{i=1..c} Σ_{j=1..c_i} ( ||x_i − v_j|| )^2

where
'||x_i − v_j||' is the Euclidean distance between x_i and v_j,
'c_i' is the number of data points in the i-th cluster, and
'c' is the number of cluster centres. The new centre of the i-th cluster is recomputed as

v_i = (1/c_i) Σ_{j=1..c_i} x_ij
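A direct sketch of the procedure described above, minimizing the squared-error objective J(V); the synthetic Gaussian blobs are only illustrative data, and variable names follow the text (x for data, v for centres).

import numpy as np

def kmeans(x, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    v = x[rng.choice(len(x), k, replace=False)]          # k initial centres
    for _ in range(iters):
        # assign every point to its nearest centre (Euclidean distance)
        labels = np.argmin(np.linalg.norm(x[:, None] - v[None], axis=2), axis=1)
        new_v = np.array([x[labels == i].mean(axis=0) if np.any(labels == i)
                          else v[i] for i in range(k)])  # barycenters
        if np.allclose(new_v, v):                        # centres stop moving
            break
        v = new_v
    J = sum(np.sum((x[labels == i] - v[i]) ** 2) for i in range(k))
    return v, labels, J

x = np.vstack([np.random.default_rng(1).normal(m, 0.1, (50, 2)) for m in (0, 1, 2)])
v, labels, J = kmeans(x, k=3)
print("J(V) =", J)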
The algorithmic steps are:
1) Randomly select k cluster centres.
2) Calculate the distance between each data point and the cluster centres.
3) Assign each data point to the cluster whose centre is nearest to it.
4) Recalculate the new cluster centres as the barycenters of the clusters.
5) Recalculate the distance between each data point and the newly obtained cluster centres.
6) If no data point was reassigned, stop; otherwise repeat from step 3.
Advantages
1) Fast, robust and easy to understand.
2) Relatively efficient: O(tknd), where n is the number of objects, k is the number of clusters, d is the dimension of each object, and t is the number of iterations. Normally k, t, d << n.
3) Gives the best result when the data sets are distinct or well separated from each other.
Disadvantages
1) The learning algorithm requires a priori specification of the number of cluster centres.
2) The use of exclusive assignment: if there are two highly overlapping groups of data, then k-means will not be able to resolve that there are two clusters.
3) The learning algorithm is not invariant to non-linear transformations, i.e. with a different representation of the data we get different results (data represented in the form of Cartesian co-ordinates and polar co-ordinates will give different results).
4) Euclidean distance measures can unequally weight underlying factors.
5) The learning algorithm provides only a local optimum of the squared error function.
References:
1) An Efficient k-means Clustering Algorithm: Analysis and Implementation, by Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman and Angela Y. Wu.
2) Research Issues on K-means Algorithm: An Experimental Trial Using Matlab, by Joaquin Perez Ortega, Ma Del Rocio Boone Rojas and Maria J. Somodevilla Garcia.
3) The k-means algorithm, notes by Tan, Steinbach, Kumar, Ghosh.
4) http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html
5) k-means clustering, by Ke Chen.
6) https://sites.google.com/site/dataclusteringalgorithm/k-means-clustering-algorithm
The use of a hat in p̂ and r̂ is intended to signify that these quantities are "estimates." The definitions introduced in Eqs. (6.20) and (6.21) have been generalized to include a nonstationary environment, in which case all the sensory signals and the desired response assume time-varying forms too. Thus, substituting the instantaneous estimates r̂_x(j, k; n) = x_j(n) x_k(n) and r̂_xd(k; n) = x_k(n) d(n) in place of r_x(j, k) and r_xd(k) in Eq. (6.17), we get

ŵ_k(n+1) = ŵ_k(n) + η [ x_k(n) d(n) − x_k(n) Σ_{j=1..p} ŵ_j(n) x_j(n) ],   k = 1, 2, ..., p   (6.22)
where y(n) is the output of the spatial filter computed at iteration n in accordance with the LMS algorithm; that is,

y(n) = Σ_{j=1..p} ŵ_j(n) x_j(n)   (6.23)
Note that in Eq. (6.22) we have used ŵ_k(n) in place of w_k(n) to emphasize the fact that Eq. (6.22) involves "estimates" of the weights of the spatial filter.
Figure 6.3 illustrates the operational environment of the LMS algorithm, which is completely described by Eqs. (6.22) and (6.23). A summary of the LMS algorithm is presented in Table 6.1, which clearly illustrates the simplicity of the algorithm. As indicated in this table, for the initialization of the algorithm, it is customary to set all the initial values of the weights of the filter equal to zero.
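A minimal sketch of the algorithm as summarized above, applied to an assumed system-identification toy problem: the weights start at zero and are updated from the instantaneous error, per Eqs. (6.22) and (6.23). The "unknown" target filter and the noise level are illustrative.

import numpy as np

rng = np.random.default_rng(0)
p = 4                                      # number of taps / sensory inputs
w_true = np.array([0.5, -0.3, 0.2, 0.1])   # unknown system to be identified
w_hat = np.zeros(p)                        # customary all-zero initialization
eta = 0.05                                 # learning-rate parameter

for n in range(2000):
    x = rng.normal(size=p)                 # input vector x(n)
    d = w_true @ x + 0.01 * rng.normal()   # desired response (noisy)
    y = w_hat @ x                          # filter output, Eq. (6.23)
    w_hat += eta * (d - y) * x             # stochastic-gradient update, Eq. (6.22)

print("estimate:", np.round(w_hat, 3))     # a random walk about the optimum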
In the method of steepest descent applied to a "known" environment, the weight vector w(n), made up of the weights w_1(n), w_2(n), ..., w_p(n), starts at some initial value w(0), and then follows a precisely defined trajectory (along the error surface) that eventually terminates on the optimum solution w_o, provided that the learning-rate parameter η is chosen properly. In contrast, in the LMS algorithm applied to an "unknown" environment, the weight vector ŵ(n), representing an "estimate" of w(n), follows a random trajectory. For this reason, the LMS algorithm is sometimes referred to as a "stochastic gradient algorithm." As the number of iterations in the LMS algorithm approaches infinity, ŵ(n) performs a random walk (Brownian motion) about the optimum solution w_o; see Appendix D.
Another way of stating the basic difference between the method of steepest descent and the LMS algorithm is in terms of the error calculations involved. At any iteration n, the method of steepest descent minimizes the mean-squared error J(n). This cost function involves ensemble averaging, the effect of which is to give the method of steepest descent an "exact" gradient vector that improves in pointing accuracy with increasing n. The LMS algorithm, on the other hand, minimizes an instantaneous estimate of the cost function J(n). Consequently, the gradient vector in the LMS algorithm is "random," and its pointing accuracy improves "on the average" with increasing n.
The basic difference between the method of steepest descent and the LMS algorithm may also be stated in terms of time-domain ideas, emphasizing other aspects of the adaptive filtering problem. The method of steepest descent minimizes the sum of error squares ε(n), integrated over all previous iterations of the algorithm up to and including iteration n, and requires estimates of the autocorrelation function r_x and the cross-correlation function r_xd. In contrast, the LMS algorithm simply minimizes the instantaneous error squared ε(n), defined as (1/2) e^2(n), thereby reducing the storage requirement to the minimum possible. In particular, it does not require storing any more information than is present in the weights of the filter.
Equation (6.22) provides a complete description of the time evolution of the weights in the LMS algorithm. Rewriting the second line of this equation in matrix form, we may express it as follows:

ŵ(n+1) = ŵ(n) + η [ d(n) − x^T(n) ŵ(n) ] x(n)   (6.24)

where

ŵ(n) = [ ŵ_1(n), ŵ_2(n), ..., ŵ_p(n) ]^T   (6.25)

and

x(n) = [ x_1(n), x_2(n), ..., x_p(n) ]^T   (6.26)

Equivalently,

ŵ(n+1) = [ I − η x(n) x^T(n) ] ŵ(n) + η x(n) d(n)   (6.27)

where I is the identity matrix. In using the LMS algorithm, we note that

ŵ(n) = z^{-1} [ ŵ(n+1) ]   (6.28)

where z^{-1} is the unit-delay operator implying storage. Using Eqs. (6.27) and (6.28), we may thus represent the LMS algorithm by the signal-flow graph depicted in Fig. 6.4.
The signal-flow graph of Fig. 6.4 reveals that the LMS algorithm is an example of a stochastic feedback system. The presence of feedback has a profound impact on the convergence behaviour of the LMS algorithm, as discussed next.
Hopfield Network
The Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It consists of a single layer which contains one or more fully connected recurrent neurons. The Hopfield network is commonly used for auto-association and optimization tasks.
A discrete Hopfield network operates in a discrete fashion; in other words, the input and output patterns are discrete vectors, which can be either binary (0, 1) or bipolar (+1, -1) in nature. The network has symmetrical weights with no self-connections, i.e., w_ij = w_ji and w_ii = 0.
Architecture
Following are some important points to keep in mind about the discrete Hopfield network:
* This model consists of neurons with one inverting and one non-inverting output.
* The output of each neuron should be the input of other neurons but not the input of itself.
* Weight/connection strength is represented by w_ij.
* Connections can be excitatory as well as inhibitory. A connection is excitatory if the output of the neuron is the same as the input, otherwise inhibitory.
* Weights should be symmetrical, i.e. w_ij = w_ji.
During training of the discrete Hopfield network, weights will be updated. As we know, we can have binary input vectors as well as bipolar input vectors. Hence, in both cases, weight updates can be done with the following relations.
For binary input patterns s(p), p = 1, ..., P, the weight matrix is given by

w_ij = Σ_{p=1..P} [2 s_i(p) − 1] [2 s_j(p) − 1]   for i ≠ j

For bipolar input patterns,

w_ij = Σ_{p=1..P} s_i(p) s_j(p)   for i ≠ j
Step 1 - Initialize the weights, which are obtained from the training algorithm by using the Hebbian principle.
Step 2 - Perform steps 3-9, if the activations of the network have not yet converged.
Step 3 - For each input vector x, perform steps 4-8.
Step 4 - Make the initial activation of the network equal to the external input vector x as follows:
y_i = x_i   for i = 1 to n
Step 5 - For each unit y_i, perform steps 6-9.
Step 6 - Calculate the net input of the network as follows:

y_in_i = x_i + Σ_j y_j w_ji

Step 7 - Apply the activation as follows over the net input to calculate the output:

y_i = 1     if y_in_i > θ_i
y_i = y_i   if y_in_i = θ_i
y_i = 0     if y_in_i < θ_i

Here θ_i is the threshold.
Step 8 - Broadcast the obtained output y_i to all the other units.
Step 9 - Test the network for convergence.
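A sketch of the recall procedure above, here for the bipolar variant (the steps above are written for binary units; the thresholded asynchronous update is the same in spirit). The stored patterns are illustrative.

import numpy as np

patterns = np.array([[1, -1, 1, -1, 1, -1],      # illustrative stored patterns
                     [1, 1, 1, -1, -1, -1]])
n = patterns.shape[1]
W = patterns.T @ patterns                        # Step 1: Hebbian weights
np.fill_diagonal(W, 0)                           # no self-connections (w_ii = 0)

def recall(x, steps=20, theta=0.0, seed=0):
    rng = np.random.default_rng(seed)
    y = x.copy()                                 # Step 4: activations = input
    for _ in range(steps):
        changed = False
        for i in rng.permutation(n):             # update one unit at a time
            net = x[i] + W[i] @ y                # Step 6: net input
            new = 1 if net > theta else (-1 if net < theta else y[i])
            changed |= (new != y[i])
            y[i] = new
        if not changed:                          # Step 9: converged
            break
    return y

noisy = np.array([1, -1, 1, -1, -1, -1])         # corrupted first pattern
print(recall(noisy))                             # settles on a stored pattern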
Energy Function Evaluation
An energy function is defined as a function that is a bounded and non-increasing function of the state of the system. The energy function, also called a Lyapunov function, determines the stability of the discrete Hopfield network, and is characterized as follows:
E_f = − (1/2) Σ_{i=1..n} Σ_{j=1..n} y_i y_j w_ij − Σ_{i=1..n} x_i y_i + Σ_{i=1..n} θ_i y_i
Condition: In a stable network, whenever the state of a node changes, the above energy function will decrease.
Suppose node i has changed state from y_i(k) to y_i(k+1); then the energy change ΔE_f is given by the following relation:

ΔE_f = E_f( y_i(k+1) ) − E_f( y_i(k) )
     = − ( Σ_{j=1..n} w_ij y_j(k) + x_i − θ_i ) ( y_i(k+1) − y_i(k) )
     = − (net_i) Δy_i

The change in energy relies on the fact that only one unit can update its activation at a time.
Continuous Hopfield Network
In comparison with the discrete Hopfield network, the continuous network has time as a continuous variable. It is also used in auto-association and optimization problems such as the travelling salesman problem.
Model: The model or architecture can be built up by adding electrical components such as amplifiers which can map the input voltage to the output voltage over a sigmoid activation function.
E_f = (1/2) Σ_{i=1..n} Σ_{j=1..n, j≠i} y_i y_j w_ij − Σ_{i=1..n} x_i y_i + (1/λ) Σ_{i=1..n} Σ_{j=1..n, j≠i} w_ij g_ri ∫_0^{y_i} a^{-1}(y) dy
Boltzmann Machine
* They use a recurrent structure.
* They consist of stochastic neurons, which have one of two possible states, either 1 or 0.
* Some of the neurons in this are adaptive (free state) and some are clamped (frozen state).
* If we apply simulated annealing on a discrete Hopfield network, then it would become a Boltzmann machine.
Reference :
https://www.tutorialspoint.com/artificial_neural_network/artificial_neural_network_hopfield.htm
Spurious states
Spurious states are patterns x_s ∉ P, where P is the set of patterns to be memorized. In other words, they correspond to local minima in the energy function that shouldn't be there. They can be composed of various combinations of the original patterns, or simply the negation of any pattern in the original pattern set. These tend to become present when α = |P| / N (where N is the number of neurons) becomes too high for a certain learning rule.
Here is some hand-wavy intuition. The learning rules project the current configuration of the network into the subspace spanned by the pattern vectors and then calculate the pattern vector that lies closest to the projected configuration vector. But even if you had completely orthogonal patterns, you cannot specify more patterns than the number of neurons (because then either you duplicate a pattern or the next pattern you add isn't orthogonal).
The real problem is that most learning rules give a maximum capacity α << 1 (e.g. the Hebb rule gives α ≈ 0.138 using mean-field derivations) because the projection into the subspace is not orthogonal. This is not an issue if the patterns themselves are orthogonal (i.e. completely uncorrelated), but that is very rarely the case in practice.
There are ways to "unlearn" these spurious minima too. Good references exist, especially the Rojas book, which is available for free online. Also, if you can get your hands on the Hertz book, look at Eq. (10.22), which is the mean-field equation whose solutions give the possible states, including spurious ones (they also give an explanation for how to find them specifically).
Exercise:
Q1. Write a short note on the Hopfield network.
Q2. Explain the discrete Hopfield network.
Q3. What are the important points of the discrete Hopfield network?
Q4. Write a note on the training algorithm with its cases.
Q5. Explain energy function evaluation.
Q6. What is meant by a continuous Hopfield network?
The starting point of statistical mechanics is an energy function. We consider a physical system with a set of probabilistic states x = {x}, each of which has energy E(x). For a system at a temperature T > 0, its state x varies with time, and quantities such as E that depend on the state fluctuate. Although there must be some driving mechanism for these fluctuations, part of the idea of temperature involves treating them as random. When a system is first prepared, or after a change of parameters, the fluctuations have on average a definite direction such that the energy E decreases. However, some time later, any such trend ceases and the system just fluctuates around a constant average value. Then we say that the system is in thermal equilibrium.
A fundamental result from physics tells us that in thermal equilibrium each of the possible states x occurs with a probability determined according to the Boltzmann-Gibbs distribution,

P(x) = (1/Z) e^(−E(x)/T)   (8.1.1)

where Z = Σ_x e^(−E(x)/T) is called the partition function; it is independent of the state x but depends on the temperature. In physical systems the temperature is T = k_B T_a, where the coefficient k_B is Boltzmann's constant, having the value 1.38 × 10^(-16) erg/K, and T_a is the absolute temperature. Interestingly enough, the same distribution can also be achieved from the viewpoint of information theory. Although the temperature T has no physical meaning in information theory, it is interpreted as a pseudo-temperature in an abstract manner.
Therefore, in equilibrium the Boltzmann-Gibbs distribution given by equation (8.1.1) results in:

P(x_j | x_i) = 1 / (1 + e^(ΔE/T)),   where ΔE = E(x_j) − E(x_i)
The Metropolis algorithm provides a simple method for simulating the evolution of a physical system in a heat bath to thermal equilibrium [Metropolis et al]. It is based on the Monte Carlo simulation technique, which aims to approximate the expected value <g(x)> of some function g(x) of a random vector x with a given density function f_d(x). For this purpose several x vectors, say x = X_k, k = 1..K, are randomly generated according to the density function f_d(x), and then Y_k is found as Y_k = g(X_k). By using the strong law of large numbers,

lim_{K→∞} (1/K) Σ_{k=1..K} Y_k = <Y> = <g(x)>

the average of the generated Y values can be used as an estimate of <g(x)> [Sheldon 1989].
In each step of the Metropolis algorithm, an atom (unit) of the system is subjected to a small random displacement, and the resulting change ΔE in the energy of the system is observed. If ΔE < 0, then the displacement is accepted, and the new system configuration with the displaced atom is used as the starting point for the next step of the algorithm. If ΔE > 0, the displacement is accepted with probability

P(ΔE) = e^(−ΔE/T)
Referring to Eq. (8.1.5), notice that P(x_i) > P(x_j) implies E(x_i) < E(x_j), and vice versa. So maximizing the probability function is equivalent to minimizing the energy function. Furthermore, notice that this property is independent of the temperature, although the discrimination becomes more apparent as the temperature decreases (Figure 8.1). Therefore, the temperature parameter T provides a new free parameter for steering the step size toward the global optimum. With a high temperature, the equilibrium can be reached more rapidly. However, if the temperature is too high, all the states will have a similar level of probability. On the other hand, when T → 0, the average state becomes very close to the global minimum. This idea, though very attractive at first glance, cannot be implemented directly in practice. In fact, with a low temperature, it will take a very long time to reach equilibrium and, more seriously, the state is more easily trapped in local minima. Therefore, it is necessary to start at a high temperature and then decrease it gradually. Correspondingly, the probable states then gradually concentrate around the global minimum (Figure 8.2).
Figure 8.2 The energy levels adjusted for high and low temperature.
This has an analogy with metallurgical annealing, in which a body of metal is heated near to its melting point and is then slowly cooled back down to room temperature. This process eliminates dislocations and other crystal lattice disruptions by thermal agitation at high temperature. Furthermore, it prevents the formation of new dislocations by cooling the metal very slowly. This provides the necessary time to repair any dislocations that occur as the temperature drops. The essence of this process is that the global energy function of the metal will eventually reach an absolute minimum value. If the material is cooled rapidly, its atoms are often captured in unfavorable locations in the lattice. Once the temperature has dropped far below the melting point, these defects survive forever, since any local rearrangement of atoms costs more energy than whatever is available in thermal fluctuations. The atomic lattice thus remains captured in a local energy minimum. In order to escape from local minima and to have the lattice in the global energy minimum, the thermal fluctuations can be enhanced by reheating the material until energy-consuming local rearrangements occur at a reasonable rate. The lattice imperfections then start to move and annihilate, until the atomic lattice is free of defects, except for those caused by thermal fluctuations. These can be gradually reduced if the temperature is decreased so slowly that thermal equilibrium is maintained at all times during the cooling process. How much time must be spent on the cooling process depends on the specific situation. A great deal of experience is required to perform the annealing in an optimal way. If the temperature is decreased quickly, some thermal fluctuations are frozen in. On the other hand, if one proceeds too slowly, the process never ends.
The amazing thing about annealing is that the statistical process of thermal agitation leads to approximately the same final energy state. This result is independent of the initial condition of the metal and any of the details of the statistical annealing process. The mathematical concept of simulated annealing derives from an analogy with this physical behavior.
The simulated annealing algorithm is a variant of the Metropolis algorithm in which the temperature is time dependent. In analogy with metallurgical annealing, it starts with a high temperature and gradually decreases it. At each temperature, it applies the update rule given by Eq. (8.1.8) several times. An annealing schedule specifies a finite sequence of temperature values and a finite number of transitions attempted at each value of the temperature. The annealing schedule developed by [Kirkpatrick et al 1983] is as follows.
The initial value T_0 of the temperature is chosen high enough to ensure that virtually all proposed transitions are accepted by the simulated annealing algorithm. Then the cooling is performed. At each temperature, enough transitions are attempted so that there is a predetermined number of transitions per experiment on the average. At the end, the system is frozen and annealing stops if the desired number of acceptances is not achieved at a predetermined number of successive temperatures. In the following, we provide the annealing procedure in more detail:
SIMULATED ANNEALING
Step 1. Set initial values: assign a high value to the temperature as T(0) = T_0; decide on constants K_T, K_A and K_S, typical values for which are 0.8 < K_T < 0.99, K_A = 10 and K_S = 3.
Step 2. Decrement the temperature: T(k) = K_T T(k−1), where K_T is a constant smaller than but close to unity.
Step 3. Attempt enough transitions at each temperature, so that there are K_A accepted transitions per experiment on the average.
Step 4. Stop if the desired number of acceptances is not achieved at K_S successive temperatures; else repeat steps 2 and 3.
A very important property of simulated annealing is its asymptotic convergence. It has been proved in [Geman and Geman 84] that if T(k) at iteration k is chosen such that it satisfies

T(k) > T_0 / log(1 + k)

provided the initial temperature T_0 is high enough, then the system will converge to the minimum energy configuration.
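A compact sketch of the annealing loop described above: Metropolis acceptance with a geometric cooling schedule T(k) = K_T T(k−1). The one-dimensional energy landscape (with local minima), the step size and the number of transitions per temperature are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
E = lambda x: x ** 2 + 3.0 * np.sin(5.0 * x)     # assumed energy with local minima

x, T = 3.0, 10.0                                 # initial state, high T0
K_T = 0.95                                       # within the typical 0.8 < K_T < 0.99
while T > 1e-3:
    for _ in range(50):                          # several transitions per temperature
        x_new = x + rng.normal(0, 0.5)           # small random displacement
        dE = E(x_new) - E(x)
        # Metropolis criterion: accept downhill always, uphill with prob e^(-dE/T)
        if dE < 0 or rng.random() < np.exp(-dE / T):
            x = x_new
    T *= K_T                                     # cool down
print("final state:", round(x, 3), "energy:", round(float(E(x)), 3))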
The main drawback of simulated annealing is the large amount of computational time necessary for stochastic relaxation.
The Boltzmann machine [Hinton et al 83] is a connectionist model having a stochastic nature. The structure of the Boltzmann machine is similar to the Hopfield network, but it adds a probabilistic component to the output function. It uses simulated annealing concepts, in spite of the deterministic nature of state transition in the Hopfield network [Hinton et al 83, Aarts et al 1986, Allwright and Carpenter 1989, Laarhoven and Aarts 1987].
A Boltzmann machine can be viewed as a recurrent neural network consisting of N two-state units. Depending on the purpose, the states can be chosen from binary space, that is x ∈ {0,1}^N, or from bipolar space, x ∈ {−1,1}^N.
The energy function of the Boltzmann machine is:

E(x) = − (1/2) Σ_i Σ_j w_ij x_i x_j − Σ_i θ_i x_i

The connections are symmetrical by definition, that is w_ij = w_ji. Furthermore, in the bipolar case the convergence of the machine requires w_ii = 0 (or equivalently θ_i = 0). However, in the binary case self-loops are allowed.
The objective of a Boltzmann machine is to reach the global minimum of its energy function, which is the state having minimum energy. Similar to the simulated annealing algorithm, the state transition mechanism of the Boltzmann machine uses a stochastic acceptance criterion, thus allowing it to escape from its local minima. In a sequential Boltzmann machine, units change their states one by one, while they change state all together in a parallel Boltzmann machine.
Note that the contribution of the connections w_km, k, m ≠ j, to E(x) and E(x^j) is identical; furthermore w_ij = w_ji. For the binary case, by using equations (9.2.1) and (9.2.2), we obtain the energy change ΔE(x^j | x) from local quantities alone.
Therefore, the change in energy can be computed by considering only local information. In a sequential Boltzmann machine, a trial for a state transition is a two-step process. Given a state x, first a unit j is selected as a candidate to change state. The selection probability usually has a uniform distribution over the units. Then a probabilistic function determines whether a state transition will occur or not. The state x^j is accepted with probability

P(x^j | x) = 1 / (1 + e^(ΔE(x^j | x) / T))

where T is a control parameter having an analogy with temperature. Initially the temperature is set large enough to accept almost all state transitions, with probability close to 0.5, and then T is decreased in time to zero (Figure 9.3). With a proper cooling schedule, the sequential Boltzmann machine converges asymptotically to a state having minimum energy.
A Boltzmann machine starts execution with a random initial configuration. Initially, the value of T is very large. A cooling schedule determines how and when to decrement the control parameter. As T → 0, fewer and fewer state transitions occur. If no state transitions occur for a specified number of trials, it is decided that the Boltzmann machine has reached the final state.
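One sequential trial loop can be sketched as follows: pick a unit uniformly at random, compute the energy change of flipping it, and accept with probability 1 / (1 + e^(ΔE/T)). The random symmetric weights, the cooling constant and the floor temperature are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
N = 5
W = rng.normal(0, 1, (N, N)); W = (W + W.T) / 2  # symmetric connections
np.fill_diagonal(W, 0)                           # bipolar case: w_ii = 0
theta = np.zeros(N)
x = rng.choice([-1, 1], N)                       # random initial configuration

def energy(x):
    return -0.5 * x @ W @ x - theta @ x

T = 5.0
for step in range(500):
    j = rng.integers(N)                          # candidate unit, uniform selection
    x_j = x.copy(); x_j[j] = -x_j[j]             # proposed state transition
    dE = energy(x_j) - energy(x)                 # computable from local info
    if rng.random() < 1.0 / (1.0 + np.exp(dE / T)):
        x = x_j
    T = max(0.05, T * 0.99)                      # cooling toward T -> 0
print("final configuration:", x, "energy:", round(float(energy(x)), 3))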
Use of the Boltzmann machine as a neural optimizer involves two phases, as explained for the Hopfield network in Chapter 4. In the first phase, the connection weights are determined. For this purpose, an energy function for the given application is decided. In non-constrained optimization applications, the energy function can be directly obtained by using the cost function. However, in the case of constrained optimization, the energy function must be derived using both the original cost function and the constraints. The next step is to determine the connection weights {w_ij} by considering this energy function. Then, in the second phase, the machine searches for the global minimum through the annealing procedure.
Exercise:
Bidirectional Associative Memory (BAM)
Several versions of the heteroassociative recurrent neural network, or bidirectional associative memory (BAM), were developed by Kosko (1988, 1992a).
- A bidirectional associative memory [Kosko, 1988] stores a set of pattern associations by summing bipolar correlation matrices (an n by m outer product matrix for each pattern to be stored).
- The architecture of the net consists of two layers of neurons, connected by directional weighted connection paths.
- The net iterates, sending signals back and forth between the two layers until all neurons reach equilibrium (i.e., until each neuron's activation remains constant for several steps).
- Bidirectional associative memory neural nets can respond to input to either layer.
- Because the weights are bidirectional and the algorithm alternates between updating the activations for each layer, we shall refer to the layers as the X-layer and the Y-layer (rather than the input and output layers).
Discrete BAM
The two bivalent (binary or bipolar) forms of BAM are closely related. In each, the weights are found from the sum of the outer products of the bipolar form of the training vector pairs. Also, the activation function is a step function, with the possibility of a nonzero threshold.
* The weight matrix W = {w_ij} stores a set of input and target vectors s(p) : t(p), p = 1, ..., P.
* The formulas for the entries depend on whether the training vectors are binary or bipolar.
For binary input vectors, the weight matrix W = {w_ij} is given by

w_ij = Σ_{p=1..P} (2 s_i(p) − 1) (2 t_j(p) − 1)

For bipolar input vectors, the weight matrix W = {w_ij} is given by

w_ij = Σ_{p=1..P} s_i(p) t_j(p)
Activation function: The activation function for the discrete BAM is the appropriate step function, depending on whether binary or bipolar vectors are used.
For binary input vectors, the activation function for the Y-layer is

y_j = 1     if y_in_j > 0
y_j = y_j   if y_in_j = 0
y_j = 0     if y_in_j < 0

and the activation function for the X-layer is

x_i = 1     if x_in_i > 0
x_i = x_i   if x_in_i = 0
x_i = 0     if x_in_i < 0
For bipolar input vectors, the activation function for the Y-layer is

y_j = 1     if y_in_j > θ_j
y_j = y_j   if y_in_j = θ_j
y_j = −1    if y_in_j < θ_j

and the activation function for the X-layer is

x_i = 1     if x_in_i > θ_i
x_i = x_i   if x_in_i = θ_i
x_i = −1    if x_in_i < θ_i
Note that if the net input is exactly equal to the threshold value, the activation function "decides" to leave the activation of that unit at its previous value.
The activations of all units are initialized to zero.
The first signal is to be sent from the X-layer to the Y-layer. However, if the input signal for the X-layer is the zero vector, the input signal to the Y-layer will be unchanged by the activation function, and the process will be the same as if the first piece of information had been sent from the Y-layer to the X-layer.
* Signals are sent only from one layer to the other at any step of the process, not simultaneously in both directions.
Algorithm
1. Initialize the weights to store a set of P vectors; initialize all activations to 0.
2. For each testing input, do Steps 3-7.
3a. Present input pattern x to the X-layer (i.e., set the activations of the X-layer to the current input pattern).
3b. Present input pattern y to the Y-layer. (Either of the input patterns may be the zero vector.)
4. While activations are not converged, do Steps 5-7.
5. Update the activations of units in the Y-layer:
Compute net inputs: y_in_j = Σ_i w_ij x_i
Compute activations: y_j = f(y_in_j)
Send the signal to the X-layer.
6. Update the activations of units in the X-layer:
Compute net inputs: x_in_i = Σ_j w_ij y_j
Compute activations: x_i = f(x_in_i)
Send the signal to the Y-layer.
7. Test for convergence: if the activation vectors x and y have reached equilibrium, stop; otherwise continue.
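A sketch of the discrete bipolar BAM just described: weights are the sum of outer products s(p)^T t(p), and recall bounces between the layers until equilibrium. The two stored pattern pairs are illustrative.

import numpy as np

S = np.array([[1, -1, 1, -1, 1], [1, 1, -1, -1, 1]])   # X-layer patterns (bipolar)
T = np.array([[-1, 1], [1, 1]])                        # associated Y-layer patterns
W = sum(np.outer(s, t) for s, t in zip(S, T))          # w_ij = sum_p s_i(p) t_j(p)

step = lambda net, prev: np.where(net > 0, 1, np.where(net < 0, -1, prev))

def bam_recall(x, iters=10):
    y = np.zeros(W.shape[1])                           # activations initialized to 0
    for _ in range(iters):
        y_new = step(x @ W, y)                         # X-layer -> Y-layer
        x_new = step(W @ y_new, x)                     # Y-layer -> X-layer (W^T)
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break                                      # equilibrium reached
        x, y = x_new, y_new
    return x, y

x, y = bam_recall(S[0].astype(float).copy())
print("recalled association:", y)                      # expected: T[0]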
Continuous BAM
A continuous bidirectional associative memory [Kosko, 1988] transforms input smoothly and continuously into output in the range [0, 1], using the logistic sigmoid function as the activation function for all units.
For binary input vectors (s(p), t(p)), p = 1, 2, ..., P, the weights are determined by the formula

w_ij = Σ_{p=1..P} (2 s_i(p) − 1) (2 t_j(p) − 1)

The activation function is the logistic sigmoid

f(y_in_j) = 1 / (1 + exp(−y_in_j))

where a bias is included in calculating the net input to any unit,

y_in_j = b_j + Σ_i w_ij x_i

and corresponding formulas apply for the units in the X-layer.
A number of other forms of BAMs have been developed. In some, the activations change based on a differential equation known as Cohen-Grossberg activation dynamics (Cohen & Grossberg, 1983).
Application
Consider the possibility of using a (discrete) BAM network (with bipolar vectors) to map two simple letters (given by 5x3 patterns) to the following bipolar codes: the target output vector t for the letter A is (-1, 1) and for the letter C is (1, 1).
To illustrate the use of a BAM, we first demonstrate that the net gives the correct y vector when presented with the x vector for either the pattern A or the pattern C:
INPUT PATTERN A
INPUT PATTERN C
To see the bidirectional nature of the net, observe that the Y vectors can also be used as input. For signals sent from the Y-layer to the X-layer, the weight matrix is the transpose of the matrix W, i.e. W^T.
For the input vector associated with pattern A, namely (-1, 1), we have

(-1, 1) W^T = (-2, 2, -2, 2, -2, 2, 2, 2, 2, 2, -2, 2, 2, 2, -2, 2)

This is pattern A.
Fuzzy Logic
Fuzzy set theory was developed by Lotfi A. Zadeh [Zadeh, 1965], professor for computer science
at the University of California in Berkeley, to provide a mathematical tool for dealing with the concepts
used in natural language (linguistic variables). Fuzzy Logic is basically a multivalued logic that allows
intermediate values to be defined between conventional evaluations.
However, the story of fuzzy logic started much earlier. To devise a concise theory of logic, and later mathematics, Aristotle posited the so-called "Laws of Thought". One of these, the "Law of the Excluded Middle," states that every proposition must either be True (T) or False (F). Even when Parmenides proposed the first version of this law (around 400 BC) there were strong and immediate objections: for example, Heraclitus proposed that things could be simultaneously True and not True. It was Plato who laid the foundation for what would become fuzzy logic, indicating that there was a third region (beyond T and F) where these opposites "tumbled about." A systematic alternative to the bi-valued logic of Aristotle was first proposed by Lukasiewicz around 1920, when he described a three-valued logic, along with the mathematics to accompany it. The third value he proposed can best be translated as the term "possible," and he assigned it a numeric value between T and F. Eventually, he proposed an entire notation and axiomatic system from which he hoped to derive modern mathematics.
Later, he explored four-valued logics, five-valued logics, and then declared that in principle there was nothing to prevent the derivation of an infinite-valued logic. Lukasiewicz felt that three- and infinite-valued logics were the most intriguing, but he ultimately settled on a four-valued logic because it seemed to be the most easily adaptable to Aristotelian logic. It should be noted that Knuth also proposed a three-valued logic similar to Lukasiewicz's, from which he speculated that mathematics would become even more elegant than in traditional bi-valued logic. The notion of an infinite-valued logic was introduced in Zadeh's seminal work "Fuzzy Sets", where he described the mathematics of fuzzy set theory and, by extension, fuzzy logic. This theory proposed making the membership function (or the values F and T) operate over the range of real numbers [0, 1].
New operations for the calculus of logic were proposed, and shown to be in principle at least a generalization of classic logic. Fuzzy logic provides an inference morphology that enables approximate human reasoning capabilities to be applied to knowledge-based systems. The theory of fuzzy logic provides a mathematical strength to capture the uncertainties associated with human cognitive processes, such as thinking and reasoning. The conventional approaches to knowledge representation lack the means for representing the meaning of fuzzy concepts. As a consequence, the approaches based on first-order logic and classical probability theory do not provide an appropriate conceptual framework for dealing with the representation of commonsense knowledge, since such knowledge is by its nature both lexically imprecise and noncategorical. The development of fuzzy logic was motivated in large measure by the need for a conceptual framework which can address the issue of uncertainty and lexical imprecision. Some of the essential characteristics of fuzzy logic relate to the following (Zadeh, 1992): in fuzzy logic, exact reasoning is viewed as a limiting case of approximate reasoning; in fuzzy logic, everything is a matter of degree; in fuzzy logic, knowledge is interpreted as a collection of elastic or, equivalently, fuzzy constraints on a collection of variables; inference is viewed as a process of propagation of elastic constraints; any logical system can be fuzzified. There are two main characteristics of fuzzy systems that give them better performance for specific applications. Fuzzy systems are suitable for uncertain or approximate reasoning, especially for systems with a mathematical model that is difficult to derive. Fuzzy logic allows decision making with estimated values under incomplete or uncertain information.
The theory has been attacked several times during its existence. For example, in 1972 Zadeh's colleague R. E. Kalman (the inventor of the Kalman filter) commented on the importance of fuzzy logic: "...Zadeh's proposal could be severely, ferociously, even brutally criticized from a technical point of view. This would be out of place here. But a blunt question remains: Is Zadeh presenting important ideas or is he indulging in wishful thinking?..."
The heaviest critique has been presented by probability theoreticians, and that is the reason why many fuzzy logic authors (Kosko, Zadeh and Klir) have included a comparison between probability and fuzzy logic in their publications. Fuzzy researchers try to separate fuzzy logic from probability theory, whereas some probability theoreticians consider fuzzy logic a probability in disguise.
Fuzzy Set
Since set theory forms a base for logic, we begin with fuzzy set theory in order to "pave the way" for fuzzy logic. In classical set theory the membership of an element x in a set A (A is a crisp subset of the universe X) is defined by the characteristic function

χ_A(x) = 0, if x ∉ A
χ_A(x) = 1, if x ∈ A   (6.1)
Definition 6.1.1 (fuzzy set, membership function) Let X be a nonempty set, for example X = R^n, called the universe of discourse. A fuzzy set A in X is characterized by its membership function

μ_A: X → [0, 1]   (6.2)

From the definition we can see that fuzzy set theory is a generalized set theory that includes classical set theory as a special case. Since {0,1} ⊂ [0,1], crisp sets are fuzzy sets. The membership function (6.2) can also be viewed as a distribution of truth of a variable. In the literature, a fuzzy set A is often presented as a set of ordered pairs:

A = { (x, μ_A(x)) | x ∈ X }   (6.3)
where the first part determines the element and the second part determines the grade of membership.
Another way to describe a fuzzy set has been presented by Zadeh, Dubois and Prade. If X is infinite, the fuzzy set A can be expressed by

A = ∫ μ_A(x) / x   (6.4)

and if X is finite, by

A = Σ_i μ_A(x_i) / x_i   (6.5)

Note: The symbol ∫ in (6.4) has nothing to do with an integral (it denotes an uncountable enumeration), and / denotes a tuple. The plus sign represents the union. Also note that fuzzy sets are membership functions. Nevertheless, we may still use set-theoretic notations like A ∪ B; this is the name of the fuzzy set given by μ_{A∪B}.
Example 6.1.1 A triangular fuzzy set centred at c with width parameter h can be given, in the continuous case, by

A(x) = 1 − (c − x)/h,   x ∈ [c − h, c]
A(x) = 1 − (x − c)/h,   x ∈ [c, c + h]
A(x) = 0,               otherwise
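This example and the notions defined next (support, core, α-cut) are easy to compute numerically. A sketch, with an illustrative centre and width:

import numpy as np

def tri(x, c=0.0, h=1.0):
    """Triangular fuzzy set: peak 1 at centre c, reaching 0 at distance h."""
    return np.maximum(0.0, 1.0 - np.abs(x - c) / h)

xs = np.linspace(-2, 2, 401)
mu = tri(xs, c=0.0, h=1.0)

support = xs[mu > 0]                   # supp(A) = {x : mu_A(x) > 0}
core = xs[np.isclose(mu, 1.0)]         # core(A) = {x : mu_A(x) = 1}
alpha_cut = xs[mu >= 0.5]              # A_0.5 = {x : mu_A(x) >= 0.5}

print("support ~ (%.2f, %.2f)" % (support.min(), support.max()))
print("core:", np.round(core, 3))      # a single point -> A is a fuzzy number
print("0.5-cut ~ [%.2f, %.2f]" % (alpha_cut.min(), alpha_cut.max()))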
Definition 6.1.2 (support) The support of a fuzzy set A is the crisp set that contains all elements of A with non-zero membership grade:

supp(A) = { x ∈ X | μ_A(x) > 0 }   (6.6)

If the support is finite, it is called compact support. If the support of a fuzzy set A consists of only one point, it is called a fuzzy singleton. If the membership grade of this fuzzy singleton is one, A is called a crisp singleton [Zimmermann, 1985].
Definition 6.1.3 (core) The core (nucleus, center) of a fuzzy set A is defined by

core(A) = { x ∈ X | μ_A(x) = 1 }

Definition 6.1.4 (height) The height of a fuzzy set A is defined by

hgt(A) = sup_{x ∈ X} μ_A(x)

and A is called normal if hgt(A) = 1, and subnormal if hgt(A) < 1.
Note: A non-empty fuzzy set can be normalized by dividing μ_A(x) by sup_x μ_A(x). Normalizing of A can be regarded as a mapping from fuzzy sets to possibility distributions:
The relation between the fuzzy set membership function μ, the possibility distribution and the probability distribution p: the definition p_A(x) ≡ μ_A(x) could hold if μ is additively normal. Additively normal means here that the stochastic normalization

∫ μ_A(x) dx = 1

would have to be satisfied. So it can be concluded that any given fuzzy set could define either a probability distribution or a possibility distribution, depending on the properties of μ. Both probability distributions and possibility distributions are special cases of fuzzy sets. All general distributions are in fact fuzzy sets [Joslyn, 1994].
Definition 6.1.6 (width of a convex fuzzy set) The width of a convex fuzzy set A is defined by

width(A) = sup(supp(A)) − inf(supp(A))

Definition 6.1.7 (α-cut) The α-cut of a fuzzy set A is defined by

A_α = { x ∈ X | μ_A(x) ≥ α }

Definition 6.1.8 (fuzzy partition) A set of fuzzy sets {A_1, ..., A_N} is called a fuzzy partition if

Σ_{i=1..N} μ_{A_i}(x) = 1   for all x ∈ X
Definition 6.1.9 (fuzzy number) A fuzzy set A (a subset of the real line R) is a fuzzy number if the fuzzy set is convex and normal, its membership function is piecewise continuous, and the core consists of one value only. The family of fuzzy numbers is denoted by F. In many situations people are only able to characterize numeric information imprecisely. For example, people use terms such as about 5000, near zero, or essentially bigger than 5000. These are examples of what are called fuzzy numbers. Using the theory of fuzzy subsets we can represent these fuzzy numbers as fuzzy subsets of the set of real numbers.
Note: A fuzzy number is always a fuzzy set, but a fuzzy set is not always a fuzzy number.
Definition 6.1.10 (fuzzy interval) A fuzzy interval is a fuzzy set with the same restrictions as in Definition 6.1.9, except that the core may consist of more than one value.
Definition 6.1.11 (LR-representation of fuzzy numbers) Any fuzzy number can be described by

A(x) = L((a − x)/α),   x ∈ [a − α, a]
A(x) = 1,              x ∈ [a, b]
A(x) = R((x − b)/β),   x ∈ [b, b + β]
A(x) = 0,              otherwise

where [a, b] is the core of A, and L: [0,1] → [0,1], R: [0,1] → [0,1] are shape functions (called briefly s-functions) that are continuous and non-increasing such that L(0) = R(0) = 1 and L(1) = R(1) = 0, where L stands for the left-hand side and R stands for the right-hand side of the membership function [Zimmermann, 1993].
Definition 6.1.12 (LR-representation of quasi fuzzy numbers) Any quasi fuzzy number can be described by

A(x) = L((a − x)/α),   x ≤ a
A(x) = 1,              x ∈ [a, b]
A(x) = R((x − b)/β),   x ≥ b

where L and R are shape functions that are continuous and non-increasing such that L(0) = R(0) = 1. For example, f(x) = e^(−x), f(x) = e^(−x^2) and f(x) = max(0, 1 − x) are such shape functions. In the following, the classical set-theoretic operations are extended to fuzzy sets.
¬¬A = A (involution)
A ∪ B = B ∪ A, A ∩ B = B ∩ A (commutativity)
A ∪ A = A, A ∩ A = A (idempotence)
A ∪ (A ∩ B) = A, A ∩ (A ∪ B) = A (absorption)
¬(A ∪ B) = ¬A ∩ ¬B
¬(A ∩ B) = ¬A ∪ ¬B (De Morgan's laws)

Proof: The above properties can be proved by simple direct calculations. For example, ¬¬A = 1 − (1 − μ_A) = μ_A.
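These identities are pointwise statements about membership grades, so they can be checked numerically with the standard min/max/complement choices; the grades below are illustrative.

import numpy as np

A = np.array([0.0, 0.3, 0.7, 1.0])         # membership grades of A over X
B = np.array([0.2, 0.9, 0.4, 0.6])         # membership grades of B over X

union = np.maximum(A, B)                   # A OR B
inter = np.minimum(A, B)                   # A AND B
comp = lambda m: 1.0 - m                   # NOT

print(np.allclose(comp(union), np.minimum(comp(A), comp(B))))  # first De Morgan law
print(np.allclose(comp(inter), np.maximum(comp(A), comp(B))))  # second De Morgan law
print(np.allclose(comp(comp(A)), A))                           # involution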
Fuzzy Relations
A fuzzy relation R on X_1 × X_2 is a fuzzy set of ordered pairs with membership function μ_R(u, v), where u ∈ X_1 and v ∈ X_2. Two fuzzy relations are combined by a so-called sup-* or max-min composition, which will be given in Definition 2.1.19.
Note: Fuzzy relations are fuzzy sets, and so the operations of fuzzy sets (union, intersection, etc.) can be applied to them.
Example 2.1.2 Let the fuzzy relation R = "approximately equal" correspond to the equality of two numbers. The intensity of the cell μ_R(u, v) of the following matrix can be interpreted as the degree of membership of the ordered pair in R. The numbers to be compared are {1, 2, 3, 4} and {3, 4, 5, 6}.
u \ v   1    2    3    4
3       .6   .8   1    .8
4       .4   .6   .8   1
5       .2   .4   .6   .8
6       .1   .2   .4   .6
The matrix shows that the pair (4, 4) is approximately equal with intensity 1 and the pair (1, 6) is approximately equal with intensity 0.1.
Definition 2.1.18 (Cartesian product) Let A_i ⊆ X_i be fuzzy sets. Then the Cartesian product is defined by

μ_{A_1 × ... × A_n}(u_1, ..., u_n) = min{ μ_{A_1}(u_1), ..., μ_{A_n}(u_n) }

In the probabilistic version, min is replaced by the product and the max in (2.15) is replaced by the sum. If S is just a fuzzy set (not a relation) in V, then (2.15) simplifies accordingly.
Example 2.1.3 Let X = {1, 2, 3, 4}, let the fuzzy set A = small = {(1, 1), (2, 0.6), (3, 0.2), (4, 0)}, and let the fuzzy relation R = "approximately equal" be

R = 1   .5  0   0
    .5  1   .5  0
    0   .5  1   .5
    0   0   .5  1

The interpretation of the example: x is small. If x and y are approximately equal, then y is more or less small.
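The computation behind this interpretation is the sup-min (here max-min) composition B = A ∘ R, with μ_B(y) = max_x min(μ_A(x), μ_R(x, y)). A sketch reproducing the example's numbers:

import numpy as np

A = np.array([1.0, 0.6, 0.2, 0.0])         # "small" over X = {1,2,3,4}
R = np.array([[1.0, 0.5, 0.0, 0.0],        # "approximately equal"
              [0.5, 1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0, 0.5],
              [0.0, 0.0, 0.5, 1.0]])

B = np.max(np.minimum(A[:, None], R), axis=0)   # max over x of min(A(x), R(x, y))
print(B)                                        # [1.0, 0.6, 0.5, 0.2]: "more or less small"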
Exercise:
The extension principle is said to be one of the most important tools in fuzzy logic. It gives means
to generalize non-fuzzy concepts, e.g., mathematical operations, to fuzzy sets. Any fuzzifying
generalization must be consistent with the crisp cases.
Definition 2.1.20 (extension principle) Let A_1, ..., A_n be fuzzy sets defined on X_1, ..., X_n, and let f be a function f: X_1 × ... × X_n → V. The extension of f operating on A_1, ..., A_n gives a membership function (fuzzy set F)

μ_F(v) = sup_{(x_1,...,x_n) ∈ f^{-1}(v)} min{ μ_{A_1}(x_1), ..., μ_{A_n}(x_n) }

when the inverse image f^{-1}(v) is nonempty; otherwise define μ_F(v) = 0. The function f is called the inducing mapping.
If the domain is either discrete or compact, sup-min can be replaced by max-min. On continuous domains the sup-operation can be paired with the operation S_w that satisfies the criterion

S_w(x, y) = x,  if y = 0
S_w(x, y) = y,  if x = 0
S_w(x, y) = 1,  otherwise
Fuzzy Rules
Fuzzy logic was originally meant to be a technique for modeling the human thinking and reasoning, which is
done by fuzzy rules. This idea has been replaced by the thought that the fuzzy rules form an interface between humans
and computers [Brown & Harris, 1994]. Humans explain their actions and knowledge using linguistic rules and fuzzy logic
is used to represent this knowledge on computers. There are three principal ways to obtain these rules:
1) the rules are given by human experts;
2) the rules are extracted from numerical data by training;
3) a combination of the two previous ways.
In practice human experts may not provide a sufficient number of rules, and especially in the case of complex systems the amount of knowledge may be very small or even non-existent. Thus the second way must be used instead of the first one (provided the data is available). The third way is suited for the cases when some knowledge exists and a sufficient amount of data for training is available. In this case fuzzy rules obtained from experts roughly
approximate the behavior of the system, and by applying training this approximation is made more precise. Rules provided by the experts form an initial point for the training and thus exclude the necessity of random initialization and diminish the risk of getting stuck in a local minimum (provided the expert knowledge is good enough).
It has been shown in [Mouzouris, 1996] that linguistic information is important in the absence of
sufficient numerical data but it becomes less important as more data become available.
Fuzzy rules define the connection between input and output fuzzy linguistic variables and they
can be seen to act as associative memories. Resembling inputs are converted to resembling outputs.
Rules have a structure of the form:

R_i: IF x_1 is A_i^1 AND ... AND x_d is A_i^d THEN y is B_i   (2.34)

where A_i^j and B_i are fuzzy sets (they define complete fuzzy partitions) in U ⊂ R^d and V ⊂ R, respectively. The linguistic variable x is a vector of dimension d in U = U_1 × ... × U_d, and the linguistic variable y takes values in V. The vector x is an input to the fuzzy system and y is an output of the fuzzy system. Note that B_i can also be a singleton (the consequence part becomes: ...THEN y is z_i). Further, if the fuzzy system is used as a classifier, the consequence part becomes: ...THEN class is c.
A fuzzy rule base consists of a collection of rules {R_1, R_2, ..., R_M}, where each rule R_i can be considered to be of the form (2.34). This does not cause a loss of generality, since a multi-input-multi-output (MIMO) fuzzy logic system can always be decomposed into a group of multi-input-single-output (MISO) fuzzy logic systems. Furthermore, (2.34) includes the following types of rules as a special case [Wang, 1994]: non-fuzzy rules, and rules of the form

IF <process state> THEN <control action>   (2.35)

where the <process state> part contains a description of the process output at the k-th sampling instant. Usually this description contains values of error and change-of-error. The <control action> part describes the control output (change-in-control) which should be produced given the particular <process state>. For example, a fuzzified PI-controller is of the type (2.35).
Important properties for a set of rules are completeness, consistency and continuity. These are defined in the following.
Definition 6.1.27 (completeness) A rule base is said to be complete if any combination of input values results in a nonzero output membership:

∀x ∈ X: hgt(out(x)) > 0

Definition 6.1.28 (inconsistency) A rule base is inconsistent if there are two rules with the same antecedents but non-overlapping output sets. This means that two rules that have the same antecedent map to two non-overlapping fuzzy output sets. When the output sets are non-overlapping, then there is something wrong with the output variables, or the rules are inconsistent or discontinuous.
Definition 6.1.29 (continuity) A rule base is continuous if the neighboring rules do not have fuzzy output sets that have empty intersection.
Exercise:
Q1. What is meant by the extension principle?
Q2. What are fuzzy rules?
Q3. Write the definition of completeness.
Q4. Write the definition of inconsistency.
Fuzzifier and Defuzzifier
The fuzzifier maps a crisp input to a fuzzy set. The simplest choice is

μ_A'(x) = 1, if x = x'
μ_A'(x) = 0, otherwise   (2.43)

where x' is the input. A fuzzifier of the form (2.43) is called a singleton fuzzifier. If the input contains noise, it can be modeled by using a fuzzy number; such a fuzzifier could be called a nonsingleton fuzzifier.
Figure 6.15 Fuzzy logic controller. Figure 2.16 Fuzzy singleton as fuzzifier.
The defuzzifier maps fuzzy sets to a crisp point. Several defuzzification methods have been suggested. The following five are the most common:
Center of Gravity (CoG): In the case of 1-dimensional fuzzy sets it is often called the Center of Area (CoA) method. Some authors (for example, [Driankov et al., 1993]) regard CoG and CoA as the same method, while others (for example, [Jager, 1995]) give them different forms. If CoA is calculated by dividing the area of the combination of output membership functions by two and then taking from the left so much that we get an area equal to the right one, then it is clearly a distinct method. CoG determines the center of gravity of the mass which is formed as a combination of the clipped or scaled output fuzzy membership functions. The intersection part of these membership functions can be taken once or twice into the calculation. Driankov [1993] separates the Center of Gravity methods such that, if the overlapping parts are counted twice, the method is called Center of Sums (CoS). In Fig. 2.17 the defuzzified value obtained by CoS is slightly smaller than that obtained by the CoA method.
Height Method (HM): Can be considered as a special case of CoG whose output membership functions are singletons. If symmetric output sets are used in CoG, they have the same centroid no matter how wide the set is, and CoG reduces to HM. HM calculates a normalized weighted sum of the clipped or scaled singletons. HM is computationally very light.
Middle of Maxima (MoM): Takes the center value of the interval where the combined output membership function reaches its maximum.
First of Maxima (FoM): As MoM, but takes the leftmost value instead of the center value.
Defuzzification methods can be compared by some criteria, such as the continuity of the output and the computational complexity. HM, CoA and CoS produce continuous outputs. The simplest and quickest method of these is the HM method, and for large problems it is the best choice. The fuzzy systems using it have a close relation to some well-known interpolation methods (we will return to this relation later).
The maximum methods (MoM, FoM) have been widely used. The underlying idea of MoM (with max-min
inference) can be explained as follows. Each input variable is divided into a number of intervals, which means that the
whole input space is divided into a large number of d-dimensional boxes. If a new input point is given, the corresponding
value for y is determined by finding which box the point falls in and then returning the average value of the
corresponding y-interval associated with that input box. Because of the piecewise constant output, MoM is inefficient
for approximating nonlinear continuous functions. Kosko has shown [Kosko, 1997] that if there are many rules that fire
simultaneously, the maximum function tends to approach a constant function. This may cause problems especially in
control.
A property of CoS is that the shape of the final membership function used as a basis for defuzzification resembles a normal density function more and more as the number of functions used in the summation grows. Systems of this type, which sum up the output membership functions to form the final output set, are called additive fuzzy systems.
Fuzzy logic is widely used in machine control. The term “fuzzy” refers to the fact that the logic
involved can deal with concepts that cannot be expressed as the “true” or “false” but rather as “partially
true”. Although alternative approaches such as genetic algorithms and neural networks can perform just
as well as fuzzy logic in many cases, fuzzy logic has the advantage that the solution to the problem can be
cast in terms that human operators can understand, so that their experience can be used in the design of
the controller. This makes it easier to mechanize tasks that are already successfully performed by
humans.
Fuzzy controllers are very simple conceptually. They consist of an input stage, a processing stage,
and an output stage. The input stage maps sensor or other inputs, such as switches, thumbwheels, and
Unedited Version: Neural Network and Fuzzy System
3
so on, to the appropriate membership functions and truth values. The processing stage invokes each
appropriate rule and generates a result for each, then combines the results of the rules. Finally, the
output stage converts the combined result back into a specific control output value.
The most common shape of membership functions is triangular, although trapezoidal and bell curves are also
used, but the shape is generally less important than the number of curves and their placement. From three to seven
curves are generally appropriate to cover the required range of an input value, or the “universe of
discourse” in fuzzy jargon.
As discussed earlier, the processing stage is based on a collection of logic rules in the form of IF
THEN statements, where the IF part is called the “antecedent” and the THEN part is called the
“consequent”. Typical fuzzy control systems have dozens of rules.
Consider a rule such as: IF temperature IS cold THEN heater IS high. This rule uses the truth value of the "temperature" input, which is some truth value of "cold", to generate a result in the fuzzy set for the "heater" output, which is some value of "high". This result is used with the results of
other rules to finally generate the crisp composite output. Obviously, the greater the truth value of “cold”, the
higher the truth value of “high”, though this does not necessarily mean that the output itself will be set to “high”
since this is only one rule among many. In some cases, the membership functions can be modified by “hedges” that
are equivalent to adverbs. Common hedges include “about”, “near”, “close to”, “approximately”, “very”, “slightly”,
“too”, “extremely”, and “somewhat”. These operations may have precise definitions, though the definitions can
vary considerably between different implementations. “Very”, for one example, squares membership functions;
since the membership values are always less than 1, this narrows the membership function. “Extremely” cubes the
values to give greater narrowing, while “somewhat” broadens the function by taking the square root.
In practice, the fuzzy rule sets usually have several antecedents that are combined using fuzzy operators, such as
AND, OR, and NOT, though again the definitions tend to vary: AND, in one popular definition, simply uses the minimum
weight of all the antecedents, while OR uses the maximum value. There is also a NOT operator that subtracts a
membership function from 1 to give the “complementary” function.
There are several ways to define the result of a rule, but one of the most common and simplest is the
“max-min” inference method, in which the output membership function is given the truth value generated by the
premise.
Rules can be solved in parallel in hardware, or sequentially in software. The results of all the rules
that have fired are “defuzzified” to a crisp value by one of several methods. There are dozens, in theory,
each with various advantages or drawbacks.
The “centroid” method is very popular, in which the “center of mass” of the result provides the
crisp value. Another approach is the “height” method, which takes the value of the biggest contributor.
The centroid method favors the rule with the output of greatest area, while the height method obviously
favors the rule with the greatest output value.
Notice how each rule provides a result as a truth value of a particular membership function for the output
variable. In centroid defuzzification the values are OR’d, that is, the maximum value is used and values are not
added, and the results are then combined using a centroid calculation. Fuzzy control system design is based on
empirical methods, basically a methodical approach to trial-and-error. The general process is as follows:
* Document the system's operational specifications and inputs and outputs.
* Document the fuzzy sets for the inputs.
* Document the rule set.
* Determine the defuzzification method.
* Run through a test suite to validate the system, adjusting details as required.
* Complete the document and release to production.
As a general example, consider the design of a fuzzy controller for a steam turbine. The block
diagram of this control system appears as follows :
The input and output variables map into the following fuzzy set :
N3 : Large negative.
N2 : Medium negative.
N1 : Small negative.
Z: Zero.
P1 : Small positive.
P2 : Medium positive.
P3 : Large positive.
In practice, the controller accepts the inputs and maps them into their membership functions and truth values.
These mappings are then fed into the rules. If the rule specifies an AND relationship between the mappings of the two
input variables, as the examples above do, the minimum of the two is used as the combined truth value; if an OR is
specified, the maximum is used. The appropriate output state is selected and assigned a membership value at the truth
level of the premise. The truth values are then defuzzified. For an example, assume the temperature is in the “cool”
state, and the pressure is in the "low" and "ok" states. The pressure values ensure that only rules 2 and 3 fire. The output value will adjust the throttle, and the control cycle will then begin again to generate the next value.
A fuzzy set is defined for the input error variable “e”, and the derived change in error, “delta”,
as well as the “output”, as follows:
LP : large positive
SP : small positive
ZE : zero
SN : small negative
LN : large negative
If the error ranges from -1 to +1, with the analog-to-digital converter used having a resolution of 0.25,
then the input variable’s fuzzy set (which, in this case, also applies to the output variable) can be
described very simply as a table, with the error / delta / output values in the top row and the truth
values for each membership function arranged in rows beneath :
where:
mu(1): Truth value of the result membership function for rule 1. In terms of a centroid calculation, this is the "mass" of this result for this discrete case.
output(1): Value (for rule 1) where the result membership function (ZE) is maximum over the output variable fuzzy set range. That is, in terms of a centroid calculation, the location of the "center of mass" for this individual result. This value is independent of the value of "mu". It simply identifies the location of ZE along the output range.
for the final control output. Simple. Of course, the hard part is figuring out what rules actually work correctly in practice.
If you have problems figuring out the centroid equation, remember that a centroid is defined by summing all the moments (location times mass) around the center of gravity and equating the sum to zero. So if X̄ is the center of gravity, X_i is the location of each mass, and M_i is each mass, this gives:

Σ_i M_i (X_i − X̄) = 0,   i.e.   X̄ = Σ_i M_i X_i / Σ_i M_i
In our example, the values of mu correspond to the masses, and the values of X to the locations of the masses. (mu, however, only 'corresponds to the masses' if the initial 'mass' of the output functions are all the same/equivalent. If they are not the same, i.e. some are narrow triangles while others may be wide trapezoids or shouldered triangles, then the mass or area of the output function must be known or calculated. It is this mass that is then scaled by mu and multiplied by its location X_i.)
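A sketch of that computation for two hypothetical fired rules (the names and values are illustrative, and equal-area output singletons are assumed so that mu can stand in for the masses):

import numpy as np

mu = np.array([0.4, 0.6])            # truth values of the fired rules (masses M_i)
out = np.array([0.0, 0.5])           # centres of the rules' output sets (locations X_i)

crisp = np.sum(mu * out) / np.sum(mu)   # X-bar = sum(M_i * X_i) / sum(M_i)
print(crisp)                            # 0.3: weighted toward the stronger rule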
This system can be implemented on a standard microprocessor, but dedicated fuzzy chips are now available. For
example, Adaptive Logic INC of San Jose, California, sells a “fuzzy chip”, the AL220, that can accept four analog inputs
and generate four analog outputs. A block diagram of the chip is shown below :
(In the block diagram, SH denotes a sample/hold.)
Antilock brakes
As a first example, consider an anti-lock braking system, directed by a microcontroller chip. The microcontroller
has to make decisions based on brake temperature, speed, and other variables in the system.
The variable “temperature” in this system can be subdivided into a range of “states”: “cold”,
“cool”, “moderate”, “warm”, “hot”, “very hot”. The transition from one state to the next is hard
to define.
An arbitrary static threshold might be set to divide “warm” from “hot”. For example, at exactly 90
degrees, warm ends and hot begins. But this would result in a discontinuous change when the
input value passed over that threshold. The transition wouldn’t be smooth, as would be required
in braking situations.
The way around this is to make the states fuzzy. That is, allow them to change gradually from one
state to the next. In order to do this there must be a dynamic relationship established between
different factors.
With this scheme, the input variable’s state no longer jumps abruptly from one state to the next. Instead, as the
temperature changes, it loses value in one membership function while gaining value in the next. In other words,
its ranking in the category of cold decreases as it becomes more highly ranked in the warmer category.
At any sampled timeframe, the “truth value” of the brake temperature will almost always be in some degree
part of two membership functions: i.e.: ‘0.6 nominal and 0.4 warm’, or ‘0.7 nominal and 0.3 cool’, and so on.
The above example demonstrates a simple application, using the abstraction of values from multiple membership functions. This only represents one kind of data, however; in this case, temperature.
Exercise:
Q1. Explain the fuzzifier and defuzzifier.
Q2. Write a short note on fuzzification.
Q3. Explain the concept of defuzzification methods.
Q4. What is the process of building a fuzzy controller?
Q5. Write a short note on the centroid computation.
Q6. Write a short note on antilock brakes.