Module 5
Course Code:
CSE1008
By:
Dr. Monali Bordoloi,
Assistant Professor Senior Grade 2
1 SCOPE.
02/07/2025 08:18 AM
Module 5
Normal Forms of
CFG
Normal forms for CFGs: CNF and GNF, Closure properties of CFLs, Decision
Properties of CFLs: Emptiness, Finiteness and Membership, Pumping lemma
for CFLs
Normal Forms of CFGs
CFGs can be transformed into certain standardized forms, known as Normal Forms. Normal forms:
• simplify the CFG’s structure (less complexity)
• make the CFG easier to analyze or use in algorithms.
Popular Normal Forms of CFG: Chomsky Normal Form (CNF), Greibach Normal Form (GNF)
Others: Kuroda Normal Form, Extended Chomsky Normal Form
Each normal form imposes specific restrictions on the production rules of the grammar.
Every CFG that does not generate the empty string can be transformed into an
equivalent CNF or GNF.
"Equivalent" here means that the two grammars generate the same language.
Chomsky Normal Forms
Goal
To show that every CFL (without ε) is generated by a CFG in which all productions are of the form:
A → BC or
A → a,
where A, B, C are variables and a is a terminal, such that for every production A → BC:
• the RHS consists only of variables
• the RHS is of length 2
• S does not occur in the RHS
Simplification of CFG
1. The elimination of useless symbols: variables or terminals that do not appear in any derivation of a terminal string from the start symbol.
2. The elimination of ε-productions, those of the form A → ε for some variable A.
3. The elimination of unit productions, those of the form A → B for variables A and B.
Note: The eliminations are applied in a fixed order. Applying the eliminations in a different order may
result in a grammar not having all the desired properties.
Chomsky Normal Forms
Theorem: If G is a CFG that generates a language containing at least one string other than ε, then there is another CFG G1 such that:
L(G1) = L(G) − {ε},
G1 has no ε-productions,
and G1 has neither unit productions nor useless symbols.
Proposition
For any non-empty context-free language L, there is a grammar G such that L(G) = L and each rule in G has one of the forms shown below:
1. S → ε, where S is the start symbol (present iff ε ∈ L)
2. A → a, where a ∈ Σ and |a| = 1: a non-terminal generating a single terminal (the case |a| = 0, i.e. ε, is allowed only for the start symbol, as in form 1)
3. A → BC: a non-terminal generating two non-terminals, where neither B nor C is the start symbol (the start symbol cannot be on the RHS)
Also, G doesn’t contain any useless symbols.
Example:
1. G1 = {S → AB, S → c, A → a, B → b}
2. G2 = {S → aA, A → a, B → c}
The production rules of Grammar G1 satisfy the rules specified for CNF, so the grammar G1 is in CNF.
The production rules of Grammar G2 do not satisfy the rules specified for CNF, as S → aA contains a terminal followed by a non-terminal. So, the grammar G2 is not in CNF.
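The three allowed forms can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `is_cnf` and the encoding (a production is a `(head, body)` pair, the body is a string with one character per symbol, `""` stands for ε, and every symbol that occurs as a head is a variable) are assumptions made for illustration.

```python
def is_cnf(productions, start="S"):
    """Return True if every production fits one of the three CNF forms.

    productions: list of (head, body) pairs; body is a string with one
    character per symbol and "" stands for ε (illustrative encoding).
    """
    # Assumption: the variables are exactly the symbols used as heads.
    variables = {head for head, _ in productions}
    for head, body in productions:
        if body == "":
            ok = head == start                      # form 1: S → ε
        elif len(body) == 1:
            ok = body not in variables              # form 2: A → a
        elif len(body) == 2:
            # form 3: A → BC, with the start symbol absent from the RHS
            ok = all(s in variables and s != start for s in body)
        else:
            ok = False
        if not ok:
            return False
    return True


G1 = [("S", "AB"), ("S", "c"), ("A", "a"), ("B", "b")]
G2 = [("S", "aA"), ("A", "a"), ("B", "c")]
print(is_cnf(G1), is_cnf(G2))
```

For the two example grammars the check agrees with the slide: G1 passes, and G2 fails on S → aA.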
CFG to CNF
Algorithm:
Step 1: Eliminate the start symbol from the RHS.
If the start symbol S is at the right-hand side of any production, create a new production
as:
S1 → S
where S1 is the new start symbol.
Step 2: In the grammar, remove the null, unit and useless productions. [Strictly follow
the order of eliminations] [Refer to Module 4 ppt]
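Step 3: Eliminate terminals from the RHS of a production if they appear together with other non-terminals or terminals.
For example, production S → aA can be decomposed as:
S → RA
R → a
Step 4: Eliminate RHS with more than two non-terminals.
For example, production A → XBB can be decomposed as:
A → RB
R → XB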
CFG to CNF
Example 1: Production Rules of the CFG are:
S → ASB
A → aAS|a|ε
B → SbS|A|bb
Step 1: We create a new production S1 → S, as the start symbol S appears on the RHS. The grammar becomes:
S1 → S
S → ASB
A → aAS|a|ε
B → SbS|A|bb
Step 2: As the grammar contains the null production A → ε, its removal from the grammar yields:
S1 → S
S → ASB|SB
A → aAS|aS|a
B → SbS|A|ε|bb
This creates the null production B → ε; its removal from the grammar yields:
S1 → S
S → AS|ASB|SB|S
A → aAS|aS|a
B → SbS|A|bb
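The null-production removal in Example 1 can be reproduced with a short sketch. It is not the slides' procedure verbatim: it first computes all nullable variables and then removes them in one pass, whereas the slides remove A → ε and B → ε one after the other. The function name `remove_epsilon` and the dict-of-strings encoding (one character per symbol, `""` for ε) are assumptions.

```python
from itertools import product


def remove_epsilon(grammar):
    """grammar: dict mapping a variable to a set of bodies (strings)."""
    # 1. Find the nullable variables (those that can derive ε).
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    # 2. For every body, keep or drop each nullable symbol in all possible ways.
    result = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            options = [(s, "") if s in nullable else (s,) for s in body]
            for choice in product(*options):
                if any(choice):              # skip the empty body (ε)
                    new_bodies.add("".join(choice))
        result[head] = new_bodies
    return result


example1 = {
    "S1": {"S"},
    "S": {"ASB"},
    "A": {"aAS", "a", ""},
    "B": {"SbS", "A", "bb"},
}
for head, bodies in remove_epsilon(example1).items():
    print(head, "→", "|".join(sorted(bodies)))
```

On Example 1 this yields the same grammar as the end of Step 2 above: S → AS|ASB|SB|S, A → aAS|aS|a, B → SbS|A|bb.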
Step 2 (continued): Now we have the unit production B → A. Its removal from the grammar yields:
S1 → S
S → AS|ASB|SB|S
A → aAS|aS|a
B → SbS|bb|aAS|aS|a
Also, remove the unit production S1 → S. Its removal from the grammar yields:
S1 → AS|ASB|SB|S
S → AS|ASB|SB|S
A → aAS|aS|a
B → SbS|bb|aAS|aS|a
Also, removal of the unit productions S → S and S1 → S from the grammar yields:
S1 → AS|ASB|SB
S → AS|ASB|SB
A → aAS|aS|a
B → SbS|bb|aAS|aS|a
Step 3: In the production rules A → aAS|aS and B → SbS|aAS|aS, terminals a and b exist on the RHS with non-terminals. Removing them from the RHS yields:
S1 → AS|ASB|SB
S → AS|ASB|SB
A → XAS|XS|a
B → SYS|bb|XAS|XS|a
X → a
Y → b
Step 4: Similarly, B → SYS has more than two symbols; removing it from the grammar yields:
S1 → AS|PB|SB
S → AS|QB|SB
A → RS|XS|a
B → TS|VV|XAS|XS|a
X → a
Y → b
V → b
P → AS
Q → AS
R → XA
T → SY
Similarly, B → XAS has more than two symbols; removing it from the grammar yields:
S1 → AS|PB|SB
S → AS|QB|SB
A → RS|XS|a
B → TS|VV|US|XS|a
X → a
Y → b
V → b
P → AS
Q → AS
R → XA
T → SY
U → XA
This is the required CNF.
CFG to CNF
Example 2: Production Rules of the CFG are:
S → a | aA | B
A → aBB | ε
B → Aa | b
Step 1: Eliminate the start symbol from the RHS.
Step 2: In the grammar, remove the null, unit and useless productions.
Step 3: Eliminate terminals from the RHS of the production if they exist with other non-terminals or terminals.
Step 4: Eliminate RHS with more than two non-terminals.
Step 4: In the production rule A → XBB, the RHS has more than two symbols, removing it from the
grammar yields:
S → a | XA | AX | b
A → RB
B → AX | b | a
X→a
R → XB
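Steps 3 and 4 are purely mechanical once the null and unit productions are gone. The sketch below is an illustration under stated assumptions: the input is Example 2 after Step 2 (S → a | aA | Aa | b, A → aBB, B → Aa | a | b), bodies are tuples of symbols, and the helper names `T_a` and `N1` are hypothetical stand-ins for the slide's X and R (they are assumed not to clash with existing variables).

```python
def lift_and_binarize(grammar):
    """grammar: dict variable -> list of bodies (tuples), no ε/unit rules."""
    variables = set(grammar)
    new_rules = {}               # helper variable -> its single body
    term_var, pair_var = {}, {}

    def var_for_terminal(t):     # Step 3: a becomes T_a with T_a → a
        if t not in term_var:
            term_var[t] = f"T_{t}"
            new_rules[term_var[t]] = (t,)
        return term_var[t]

    def var_for_pair(pair):      # Step 4: the leading pair XY becomes N_k → XY
        if pair not in pair_var:
            pair_var[pair] = f"N{len(pair_var) + 1}"
            new_rules[pair_var[pair]] = pair
        return pair_var[pair]

    result = {}
    for head, bodies in grammar.items():
        result[head] = []
        for body in bodies:
            if len(body) >= 2:   # terminals may stay only in bodies of length 1
                body = tuple(
                    s if s in variables else var_for_terminal(s) for s in body
                )
            while len(body) > 2:  # shorten from the left until length 2
                body = (var_for_pair(body[:2]),) + body[2:]
            result[head].append(body)
    for helper, body in new_rules.items():
        result[helper] = [body]
    return result


after_step2 = {
    "S": [("a",), ("a", "A"), ("A", "a"), ("b",)],
    "A": [("a", "B", "B")],
    "B": [("A", "a"), ("a",), ("b",)],
}
for head, bodies in lift_and_binarize(after_step2).items():
    print(head, "→", " | ".join(" ".join(b) for b in bodies))
```

Up to renaming (T_a for X, N1 for R), the result matches the grammar shown at the end of Example 2.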
A CFG is in Greibach Normal Form (GNF) if every production is of the form A → aα, where a is a terminal and α is a (possibly empty) string of non-terminals; only the start symbol may generate ε.
For example:
1. G1 = {S → aAB | aB, A → aA | a, B → bB | b}
2. G2 = {S → aAB | aB, A → aA | ε, B → bB | ε}
The production rules of Grammar G1 satisfy the rules specified for GNF, so the grammar G1 is in GNF.
The production rules of Grammar G2 do not satisfy the rules specified for GNF, because A → ε and B → ε are ε-productions (only the start symbol can generate ε). So the grammar G2 is not in GNF.
For a given grammar, there can be more than one GNF.
Every CFG can be converted to an equivalent grammar in GNF, i.e. a GNF grammar that generates the same language as the original CFG.
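The GNF condition used in these two examples can be tested with a few lines of code. This is a minimal sketch under assumptions: the function name `is_gnf` is illustrative, a body is a string with one character per symbol, `""` stands for ε, and the variables are the symbols that occur as heads.

```python
def is_gnf(productions, start="S"):
    """Return True if every production is A → aα (a terminal, α variables)."""
    variables = {head for head, _ in productions}
    for head, body in productions:
        if body == "":
            if head != start:        # only the start symbol may generate ε
                return False
            continue
        first, rest = body[0], body[1:]
        if first in variables:       # the body must begin with a terminal
            return False
        if any(s not in variables for s in rest):
            return False             # the rest must be variables only
    return True


G1 = [("S", "aAB"), ("S", "aB"), ("A", "aA"), ("A", "a"), ("B", "bB"), ("B", "b")]
G2 = [("S", "aAB"), ("S", "aB"), ("A", "aA"), ("A", ""), ("B", "bB"), ("B", "")]
print(is_gnf(G1), is_gnf(G2))
```

G1 is accepted, and G2 is rejected because of A → ε and B → ε.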
Greibach Normal Form
Importance of GNF
1. Simplified Grammars: GNF transforms CFGs into a simpler form where all rules
start with a terminal symbol, followed by zero or more non-terminal symbols.
2. Efficient Parsing: This structure makes it easier to build top-down parsers, which
analyze the input string from the start symbol, making it suitable for compiler design
and other parsing algorithms.
3. Real-time PDA: The conversion to GNF is crucial in proving that every CFL can be
recognized by a real-time PDA, a specific type of automaton that reads an input
symbol for every transition.
4. Elimination of Left Recursion: GNF, by its nature, eliminates left recursion, a
common problem encountered in parsing.
5. Understanding Language Structure: By standardizing the grammar structure, GNF
helps in understanding the language's characteristics and facilitates analysis of the
language's properties.
6. Foundation for Compiler Design: GNF provides a solid foundation for compiler
design by facilitating the development of efficient parsing algorithms, ensuring easier
analysis and optimization of compilers.
Greibach Normal Form
To transform a CFG to GNF, we have to eliminate left recursion.
Original Formula (left-recursive productions):
A → Aα | β
We can eliminate left recursion by replacing the pair of productions with:
A → βZ
Z → αZ | ε
(Right Recursive Grammar)
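The replacement rule above can be applied to all productions of one variable at once. The sketch below is an illustration only; the function name and the string encoding (one character per symbol, `""` for ε, `Z` as the new variable) are assumptions. Note that it produces Z → ε exactly as in the formula above, whereas GNF conversions normally use the ε-free variant A → β | βZ, Z → α | αZ.

```python
def eliminate_immediate_left_recursion(head, bodies, new_var="Z"):
    """Rewrite A → Aα1 | ... | β1 | ... as A → βZ, Z → αZ | ε."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # from A → Aα
    betas = [b for b in bodies if not b or b[0] != head]     # from A → β
    if not alphas:                    # no left recursion: nothing to change
        return {head: list(bodies)}
    return {
        head: [beta + new_var for beta in betas],
        new_var: [alpha + new_var for alpha in alphas] + [""],
    }


# A → Aa | b generates b, ba, baa, ...
print(eliminate_immediate_left_recursion("A", ["Aa", "b"]))
```

For A → Aa | b this gives A → bZ and Z → aZ | ε, which generates the same strings b, ba, baa, … from the right instead of the left.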
Step 1: Eliminate null productions, unit productions and useless symbols from the grammar G and then
construct a G0 = (V0, T, R0, S) in CNF generating the language L(G0) = L(G) − {ε}
Step 5: Modify the Ai → Ajγ for i = n−1, n−2, …, 1 in the desired form and at the same time change the Z production rules.
Greibach Normal Form
Example 1: Convert the following grammar G into GNF.
S → XA|BB
B → b|SB
X→b
A→a
1. Rewrite G in CNF
It is already in CNF
3. Identify all productions which do not conform to any of the types listed below:
Ai → Ajγ such that j > i
Zi → Ajγ such that j ≤ n
Ai → aγ such that γ ∈ V* and a ∈ T
A4 → A1A4|b.
The above two productions still do not conform to any of the types in step 3
Substituting for A2 → b
A4 → bA3A4|A4A4A4|b
Greibach Normal Form
Example 1: Convert the following grammar G into GNF.
S → XA|BB
B → b|SB
X→b
A→a
Final GNF:
A1 → b A3 A2 A1 A3 | a A1 A3| b A3 A2 B3 A1 A3 | a B3 A1 A3 | b A3
A2 → b A3 A2 A1 | a A1 | b A3 A2 B3 A1 | a B3 A1 | b
A3 → b A3 A2 | a | b A3 A2 B3 | a B3
B3 → b A3 A2 A1 A3 A3 A2 | a A1 A3 A3 A2 | b A3 A2 B3 A1 A3 A3 A2 | a B3 A1 A3 A3
A2 | b A3 A3 A2 | b A3 A2 A1 A3 A3 A2 B3 | a A1 A3 A3 A2 B3 | b A3 A2 B3 A1 A3
A3 A2 B3 | a B3 A1 A3 A3 A2 B3 | b A3 A3 A2 B3
Greibach Normal Form
Algorithm:
Step 1: Eliminate null productions, unit productions and useless symbols from the grammar G and then construct a G0 = (V0, T, R0, S) in CNF generating the language L(G0) = L(G) − {ε}
Step 5: Modify the Ai → Ajγ for i = n−1, n−2, …, 1 in the desired form and at the same time change the Z production rules.
Try Yourself!!!
Example 3: Convert the following grammar G into GNF.
S → AA | 0
A → SS | 1
Example 5: Convert the following grammar G into GNF.
S → SS | (S) | a
Context-Free Languages- Properties
Closure Properties
Closed Under:
1. Union operation
2. Concatenation
3. Kleene closure
4. Reversal operation
5. Homomorphism
6. Inverse Homomorphism
7. Substitution
8. init or prefix operation
9. Quotient with regular language
10. Cycle operation
11. Union with regular language
12. Intersection with regular language
13. Difference with regular language
Not Closed Under:
1. Intersection
2. Complement
3. Subset
4. Superset
5. Infinite Union
6. Difference, symmetric difference (xor, nand, nor, or any other operation which gets reduced to intersection and complement)
Decision Properties:
1. Test for Membership: Decidable
2. Test for Emptiness: Decidable
3. Test for Finiteness: Decidable
The rest of the decision properties that are decidable for regular languages (for example, equivalence of two grammars) are undecidable for CFLs.
Context-Free Languages- Properties
Closure Properties
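Suppose G1 = (V1, T, R1, S1) and G2 = (V2, T, R2, S2).
Example: For G1 we have
S1 → aS1b | ε
For G2 we have
S2 → cS2d | ε
Then L(G1) = {aⁿbⁿ : n ≥ 0}. Also, L(G2) = {cⁿdⁿ : n ≥ 0}.
Union: add a new start symbol S with the productions S → S1 | S2. The resulting grammar generates L(G1) ∪ L(G2).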
Context-Free Languages- Properties
Closure Properties
With G1 and G2 as before (start symbols S1 and S2, L(G1) = {aⁿbⁿ : n ≥ 0}, L(G2) = {cⁿdⁿ : n ≥ 0}):
Concatenation: add a new start symbol S with the production S → S1S2. The resulting grammar generates L(G1)L(G2).
Context-Free Languages- Properties
Closure Properties
With G1 as before (start symbol S1, L(G1) = {aⁿbⁿ : n ≥ 0}):
Kleene Star: add a new start symbol S with the productions S → S1S | ε. The resulting grammar generates L(G1)*.
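The standard grammar constructions behind closure under union, concatenation and Kleene star each add one new start symbol. The sketch below illustrates them on the two example grammars (S1 → aS1b | ε and S2 → cS2d | ε). The function names, the tuple encoding of bodies (`()` for ε) and the new start symbol `S` are assumptions, and the two grammars are assumed to have disjoint variable sets.

```python
def union(g1, s1, g2, s2, new_start="S"):
    """Grammar for L(G1) ∪ L(G2): S → S1 | S2."""
    g = {**g1, **g2}
    g[new_start] = [(s1,), (s2,)]
    return g


def concatenation(g1, s1, g2, s2, new_start="S"):
    """Grammar for L(G1)L(G2): S → S1 S2."""
    g = {**g1, **g2}
    g[new_start] = [(s1, s2)]
    return g


def star(g1, s1, new_start="S"):
    """Grammar for L(G1)*: S → S1 S | ε (the empty tuple encodes ε)."""
    g = dict(g1)
    g[new_start] = [(s1, new_start), ()]
    return g


G1 = {"S1": [("a", "S1", "b"), ()]}   # L(G1) = {a^n b^n : n >= 0}
G2 = {"S2": [("c", "S2", "d"), ()]}   # L(G2) = {c^n d^n : n >= 0}

print(union(G1, "S1", G2, "S2")["S"])
print(concatenation(G1, "S1", G2, "S2")["S"])
print(star(G1, "S1")["S"])
```

In each case the original productions are kept unchanged, so the new grammar is again context-free.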
1. Emptiness Problem
Check whether the language generated by the CFG is empty or not
OR
Check whether the given CFG can generate any string or not
If no string of terminals can be derived from the start symbol, then the Grammar is said to be an Empty Grammar.
Procedure:
1. Simplify the CFG by finding its useless symbols.
2. If the start symbol is in the set of useless symbols, then the Grammar is empty.
3. If the start symbol is not in the set of useless symbols, try to generate a string from the Grammar after removing all useless symbols.
4. If it can generate a string, then the Grammar is non-empty; otherwise, it is said to be an empty grammar.
Context-Free Languages- Properties
Decision Properties
1. Emptiness Problem
Example 1: Check whether the given CFG is empty or not.
S → AB | a
A→ a
B → bB
C→ a
Solution: Consider S → AB. With A → a this gives
S → aB
B → bB never reaches a string of terminals, so B is a useless symbol. A is reachable from the start symbol only through S → AB, so once that production is removed A is also a useless symbol. C is not reachable from S, so it is useless as well.
S → a
is the only useful production.
The start symbol S does not belong to the set of useless symbols; hence, this Grammar is non-empty.
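The emptiness test comes down to asking whether the start symbol is generating, that is, whether it can derive some string of terminals. The following is a minimal sketch of that check, with assumed names (`generating_variables`, `is_empty`) and an assumed encoding: a dict from variable to a list of bodies, one character per symbol, where every variable has at least one production and so appears as a key.

```python
def generating_variables(grammar):
    """Return the variables that can derive some string of terminals."""
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in generating:
                continue
            for body in bodies:
                # A body works if each symbol is a terminal (not a key of
                # the grammar) or a variable already known to be generating.
                if all(s not in grammar or s in generating for s in body):
                    generating.add(head)
                    changed = True
                    break
    return generating


def is_empty(grammar, start="S"):
    return start not in generating_variables(grammar)


example = {"S": ["AB", "a"], "A": ["a"], "B": ["bB"], "C": ["a"]}
print(sorted(generating_variables(example)), is_empty(example))
```

For the example grammar, B is the only non-generating variable and S is generating, so the grammar is non-empty, in line with the solution above.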
Context-Free Languages- Properties
Decision Properties
2. Finiteness Problem
Procedure:
1. Convert the Grammar into CNF (with no useless symbols).
2. Draw the CNF graph of the converted Grammar.
3. Make every non-terminal (variable) a node of the graph.
4. For every production A → BC, add directed edges from A to B and from A to C.
5. Do not repeat an edge once it has been added to the graph.
6. After constructing the whole graph, check whether it contains a cycle.
7. If there is a cycle in the graph, then the language generated by that Grammar is not finite; otherwise, it is finite.
Context-Free Languages- Properties
Decision Properties
2. Finiteness Problem
Example: Check whether the given Grammar is finite or not?
S → AB/ a
A → BC / a
B → CC / b
C→a
Solution:
There are no epsilon (ε) productions and no unit productions. All the non-terminals in the above CFG are useful; not a single variable is useless. The given grammar is in CNF.
Now draw the CNF graph: each non-terminal becomes a node, and each production A → BC contributes the edges A → B and A → C. The edges are S → A, S → B, A → B, A → C and B → C.
NO LOOP or CYCLES!!
The CFG generates a finite language.
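The graph test can be sketched as a depth-first search for a cycle. This is an illustration under assumptions: the function name `is_finite_cnf` is hypothetical, the grammar is already in CNF with no useless symbols (so every symbol in a two-symbol body is a key of the dict), and bodies are strings with one character per symbol.

```python
def is_finite_cnf(grammar):
    """True if the CNF graph (edges A→B and A→C per A → BC) has no cycle."""
    edges = {
        head: {s for body in bodies if len(body) == 2 for s in body}
        for head, bodies in grammar.items()
    }
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on the stack / done
    colour = {v: WHITE for v in edges}

    def has_cycle(v):
        colour[v] = GREY
        for w in edges[v]:
            # Reaching a GREY node means we walked back onto the current path.
            if colour[w] == GREY or (colour[w] == WHITE and has_cycle(w)):
                return True
        colour[v] = BLACK
        return False

    return not any(colour[v] == WHITE and has_cycle(v) for v in edges)


example = {"S": ["AB", "a"], "A": ["BC", "a"], "B": ["CC", "b"], "C": ["a"]}
print(is_finite_cnf(example))
```

The example grammar has edges S → A, S → B, A → B, A → C, B → C and no cycle, so it is reported as finite. A grammar such as S → AB, A → AB | a, B → b has the self-loop A → A and is reported as infinite.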
Context-Free Languages- Properties
Decision Properties
3. Membership Problem
Check whether a given string is a member of the language generated by a given CFG.
After applying the CYK Algorithm to the CNF form of the given CFG, look at the last field of the table and check whether one of the variables (non-terminals) in that set is the Grammar's start symbol.
If one of the variables is the start symbol, then conclude that the given string is a member of the given CFG. Otherwise, the given string is not a member of the given CFG.
Context-Free Languages- Properties
Decision Properties
c. Membership Problem
CYK Algorithm
1. The CYK Algorithm is a bottom-up parsing algorithm.
2. As the height of the table increases, the number of products (multiplications of fields) increases.
3. For the nth row, we are required to compute n-1 products per field.
Procedure:
1. Check the length of the given string.
2. Construct the table using that length; let the length of the string be ' n '.
3. Make ' n ' columns and ' n ' rows in the table, noting that the height of the table is ' n ' for the first column, ' n-1 ' for the second column, and so on; in the last column, the height of the table is one.
4. After constructing the outline of the table, write the corresponding terminals of the string on the top of the table.
5. Start from the first row and look up the string's first terminal in the CNF form of the CFG.
6. If you find that terminal on the right-hand side of a production, take the variable on its left-hand side and mark it in the first field of the first row.
Context-Free Languages- Properties
Decision Properties
c. Membership Problem
Procedure:
7. If two or more variables produce the same terminal, mark all of them as a set in that field.
8. Fill the first row in the same way as mentioned above.
9. For the first field of the second row, multiply the two fields, i.e., the first field is just above the
current field, and the second field is adjacent to the first field.
10. Do not change the order of variables after multiplication. Check each multiplied value in the CFG
of CNF form and find the variables present in the Grammar.
11. Then, mark all the Corresponding variables from the left-hand side in CFG of CNF form in the
specific field of the second row.
12. Similarly, in the way mentioned above, fill all the fields of the second row in the table.
13. You need to multiply two times for the first field of the third row because two levels are increased.
14. For first multiplication, multiply the first field elements of the first row with the second field
elements of the second row.
15. For second multiplication, multiply the elements of the first field of the second row with the
element of the second field of the first row, moving in a diagonal direction.
16. After that, match all the non-terminals with the CNF form of CFG and if you find that any of the
elements matches, then write the corresponding variables in the specified field.
17. Similarly, fill the remaining fields of the third row.
Context-Free Languages- Properties
Decision Properties
c. Membership Problem
Procedure:
18. As you move downwards and the level increases, fill all the fields in the same way.
19. As shown in the figure, fill all the fields of the respective rows.
20. In the circles, place the respective results after the multiplication of the corresponding fields.
Context-Free Languages- Properties
Decision Properties
c. Membership Problem
Example 1: Verify whether the string ' bab ' is a member of the given CFG or not?
S → AB / a
A → BC / a
B → CC / b
C→a
The grammar is already in CNF. Now, applying the CYK Algorithm –
For Row 1 –
• Now for first field x11, finding ' b ' in the CNF form of given CFG. We can find the ‘ b ' on the right-hand side
with the corresponding variable ' B ' present on the left-hand side.
• Now similarly for second field x12, finding ' a ' in the above CFG. We can find the ' a ' on the right-hand side
with corresponding variables ' S ', ' A ' and ' C ' present on the left-hand side.
• And same for third field x13, finding ' b 'in the above CFG. Similarly, as the first field, we were able to find the
corresponding Variable, i.e., ‘ B ’.
For Row 2-
• For the first field, i.e., x21, we need to multiply the just above field, i.e., the first field of the first row ( x11 ),
with the adjacent field, i.e., the second field of the first row
• { B } x { S , A , C } results out as –
• { BS, BA, BC }
• Now finding the above set of elements in the above Grammar, we observe that only BC matches the
corresponding variable ‘ A ’ on the left-hand side.
• Similarly, for the second field, i.e., x22, we need to multiply the just above field, i.e., the second field of the
first row ( x12 ), with the adjacent field, i.e., the third field of the first row ( x13 ).
• { S , A , C } x { B } results out as –{ SB, AB, CB }
• Now finding the above set of elements in the above Grammar, we observe that only AB matches the
corresponding variable ‘ S ’ on the left-hand side.
Context-Free Languages- Properties
Decision Properties
c. Membership Problem
Example 1:
For Row 3 –
For the remaining field, i.e., x31, we need two products. The first is the product of the first field of the first row, i.e., x11, with the second field of the second row, i.e., x22. The second is the product of the first field of the second row, i.e., x21, with the third field of the first row, i.e., x13.
The two products are as follows –
{ B } x { S } = { BS }
{ A } x { B } = { AB }
Now, finding the above set of elements in the above Grammar, we observe that only AB matches, with the corresponding variable ‘ S ’ on the left-hand side.
Here we find that the variable present at the last field of the last row is the start symbol.
Hence the given string, i.e., ‘ bab ', is a member of the given Context-free Grammar.
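The table for ' bab ' can be reproduced with a compact CYK sketch. The function name `cyk_table` and the encoding are assumptions: the grammar is a dict from variable to a list of bodies in CNF, one character per symbol. Row k of the returned table (0-based index k-1) holds the fields xk1, xk2, … of the slides.

```python
def cyk_table(grammar, word):
    """table[length-1][i] = variables deriving word[i : i+length]."""
    n = len(word)
    table = [[set() for _ in range(n - length + 1)] for length in range(1, n + 1)]
    # Row 1: variables with a production A → a for each terminal of the word.
    for i, ch in enumerate(word):
        table[0][i] = {head for head, bodies in grammar.items() if ch in bodies}
    # Higher rows: combine a left part and a right part for every split point.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[split - 1][i]
                right = table[length - split - 1][i + split]
                for head, bodies in grammar.items():
                    if any(len(b) == 2 and b[0] in left and b[1] in right
                           for b in bodies):
                        table[length - 1][i].add(head)
    return table


grammar = {"S": ["AB", "a"], "A": ["BC", "a"], "B": ["CC", "b"], "C": ["a"]}
table = cyk_table(grammar, "bab")
for row in table:
    print(row)
print("member:", "S" in table[-1][0])
```

The last field contains the start symbol S, so ' bab ' is accepted, in agreement with the worked example.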
Pumping Lemma for CFL
The pumping lemma gives us a technique to show that certain languages are not context free.
• Just like we used the pumping lemma to show certain languages are not regular
• But the pumping lemma for CFL’s is a bit more complicated than the pumping lemma for
regular languages
The pumping lemma can be used to construct a proof by contradiction that a specific language is not context-free.
Conversely, the pumping lemma does not suffice to guarantee that a language is context-free;
there are other necessary conditions, such as Ogden's lemma, or the Interchange lemma.
Informally
• The pumping lemma for CFL’s states that for sufficiently long strings in a CFL, we can
find two, short, nearby substrings that we can “pump” in tandem and the resulting string
must also be in the language.
Game View
Game between Defender, who claims L satisfies the
pumping condition, and Challenger, who claims L
does not.
1. Assume L is context-free:
Start by assuming that the language you want to prove is not context-free is, in fact, context-free.
2. Find a suitable string:
Choose a string z in L where |z| >= p (the pumping length).
3. Divide into uvwxy:
Consider every way to divide z into five parts, z = uvwxy, that satisfies the conditions of the pumping lemma (|vwx| <= p and |vx| >= 1).
4. Show that pumping/unpumping fails:
Demonstrate that for any such division of z, pumping or unpumping (changing the number of times v and x are repeated) gives, for some i, a string u vⁱ w xⁱ y that is not in L.
5. Contradiction:
Since the pumping lemma conditions cannot be satisfied, the initial assumption that L is context-free must be false. Therefore, L is not context-free.
Pumping Lemma for CFL
Example 1:
Let L be the language {aⁿbⁿcⁿ | n ≥ 1}. Show that this language is not a CFL.
• Assume L is a CFL with pumping-lemma constant n, and let z = aⁿbⁿcⁿ = uvwxy with |vwx| <= n and |vx| >= 1.
• The string vwx cannot contain a’s, b’s, and c’s all together, because it is not long enough to span all three symbols.
• Now “pump down” with i=0. This results in the string uwy, which can no longer contain an equal number of a’s, b’s, and c’s, because the strings v and x together contain at most two of these three symbols.
Note: As |vwx| <= n, vwx cannot contain both "a"s and "c"s.
The language in Example 2 is similar to this one, except that proving it is not context-free requires the examination of more cases.
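For a small n, the case analysis can be checked by brute force. The sketch below enumerates every split z = uvwxy of z = aⁿbⁿcⁿ with |vwx| <= n and |vx| >= 1 and confirms that pumping with i = 0 or i = 2 always leaves the language. It only illustrates the argument for one fixed string and is not a proof; the function names are assumptions.

```python
def in_language(s):
    """Membership in L = {a^n b^n c^n | n >= 1}."""
    n = len(s) // 3
    return n >= 1 and s == "a" * n + "b" * n + "c" * n


def every_split_fails(z, n):
    """True if no split z = uvwxy (|vwx| <= n, |vx| >= 1) survives i = 0 and i = 2."""
    size = len(z)
    for a in range(size + 1):
        for b in range(a, size + 1):
            for c in range(b, size + 1):
                for d in range(c, size + 1):
                    u, v, w, x, y = z[:a], z[a:b], z[b:c], z[c:d], z[d:]
                    if len(v + w + x) > n or len(v + x) == 0:
                        continue      # not a split the lemma allows
                    if all(in_language(u + v * i + w + x * i + y) for i in (0, 2)):
                        return False  # this split can be pumped for i = 0 and 2
    return True


n = 3
print(every_split_fails("a" * n + "b" * n + "c" * n, n))
```

For n = 3 the check succeeds: every admissible split already fails when pumped down, matching the argument that uwy cannot keep the three counts equal.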
Pumping Lemma for CFL
Example 2:
Show that the language L = {aⁱbʲcᵏ | 0 < i < j < k} is not a context-free language.
Let z = aⁿbⁿ⁺¹cⁿ⁺² = uvwxy.
As |vwx| <= n, there are five possible descriptions of vwx:
1. vwx is a^p for some p<=n, p>=1
2. vwx is a^p b^q for some p+q<=n, p+q>=1
3. vwx is b^p for some p<=n, p>=1
4. vwx is b^p c^q for some p+q<=n, p+q>=1
5. vwx is c^q for some q<=n, q>=1
Case 1: vwx is entirely within the a’s (vwx = a^p)
Example: Let’s say vwx = a² ⇒ v = a, x = a
Then u = aⁿ⁻², w = ε, y = bⁿ⁺¹cⁿ⁺²
Now pump i = 2:
Result: aⁿ⁺² bⁿ⁺¹ cⁿ⁺²
Now the number of a’s is n+2 and the number of b’s is n+1 ⇒ more a’s than b’s, which violates the condition i < j
Not in L
Case 2: vwx is in a’s and b’s (e.g., vwx = a^p b^q)
Suppose vwx = a¹b² ⇒ v = a, w = b, x = b
Then u = aⁿ⁻¹, y = bⁿ⁻¹cⁿ⁺²
Pump i = 2 (pumping down with i = 0 would give aⁿ⁻¹bⁿcⁿ⁺², which can still be in L):
Result: aⁿ⁺¹ bⁿ⁺² cⁿ⁺²
⇒ the exponents are i = n+1, j = n+2, k = n+2
⇒ j = k, violates j < k
Not in L
Pumping Lemma for CFL
Example 2:
Show that the language L = {aⁱbʲcᵏ | 0 < i < j < k} is not a context-free language.
Let z = aⁿbⁿ⁺¹cⁿ⁺² = uvwxy with |vwx| <= n. The five cases refer to where vwx lies: (1) in the a’s, (2) in the a’s and b’s, (3) in the b’s, (4) in the b’s and c’s, (5) in the c’s. E.g. for n = 2, z = aabbbcccc.
In case 1, if i=2 we add at least one a to the string, making the number of "a"s at least n+1, which is no longer less than the number of "b"s, and thus the string is not in the language.
The same argument holds for case 3, in which the number of "b"s will be greater than or equal to the number of "c"s.
A similar argument holds in case 5. In case 5, if i=0 then the number of "c"s will be less than or equal to the number of "b"s.
In case 2, when i=2 either the number of "a"s will be greater than or equal to the number of "b"s or the number of "b"s will be greater than or equal to the number of "c"s (depending on the distribution of v and x).
In case 4, when i=0 either the number of "b"s will be less than or equal to the number of "a"s or the number of "c"s will be less than or equal to the number of "b"s (depending on the distribution of v and x).
For all of these cases, the pumped string u vⁱ w xⁱ y does not belong to the language L.
This is a contradiction to our assumption.
So, our assumption is wrong.
L is not a CFL.
Pumping Lemma for CFL
Practice Questions:
Example3:
Show that the language L = {: i is a prime} is not a context-free language.
Example 4:
Is the language L = { : w is in {a,b}*} a context-free language? Prove or disprove your
answer.
Example 5:
Show that the language L = {|n} is not a context-free language.
By: Dr. Monali Bordoloi, Asst. Prof. Sr. Grade 1, VIT-AP