Chapter 5
Context-free Languages
Context-free Grammars (CFG)
(1)
Definition
A grammar G = (V, T, S, P) is said to be context-free if all production rules in P have the form
A → x
where A ∈ V and x ∈ (V ∪ T)*.
A language L is said to be context-free iff there is a context-free grammar G such that L = L(G).
Context-free Grammars (CFG)
(2)
Context-free means that there is a single
variable on the left side of each grammar rule.
Example of a rule where this condition does not hold:
1Z1 → 101
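The condition on left-hand sides can be checked mechanically. The following is a minimal Python sketch, assuming each production is stored as an (lhs, rhs) pair of strings with one character per symbol; the function name `is_context_free`, the variable set and the sample rule lists are illustrative choices, not part of the slides.

```python
def is_context_free(productions, variables):
    """Return True if every left-hand side is exactly one variable."""
    return all(lhs in variables for lhs, _ in productions)


variables = {"S", "Z"}  # hypothetical variable set for this illustration

# Rules with a single variable on the left; "" stands for λ.
single_variable_rules = [("S", "aSa"), ("S", "bSb"), ("S", "")]

# The rule 1Z1 -> 101 has terminals around the variable on the left.
rule_with_context = [("1Z1", "101")]

print(is_context_free(single_variable_rules, variables))
print(is_context_free(rule_with_context, variables))
```

The second call reports False because the left side 1Z1 is not a single variable: Z may only be rewritten when it stands between two 1s.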
Non-regular languages
There are non-regular languages that can be
generated by CFG.
• The grammar G = ({S}, {a, b}, S, P), with production rules
S → aSa | bSb | λ
is context-free.
• This grammar is linear (at most a single variable on the RHS of each rule), but it is neither right-linear nor left-linear, so it is not a regular grammar.
• Example: The language {a^n b^n : n ≥ 0} is not regular, but it is context-free.
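The linear, right-linear and left-linear conditions can be tested rule by rule. Here is a small Python sketch under the assumption that every symbol is a single character and every left-hand side is one variable; the name `classify` and the dictionary keys are hypothetical.

```python
def classify(productions, variables):
    """Classify a grammar given as (lhs, rhs) pairs of one-character symbols."""

    def var_positions(rhs):
        # Indices of the variables that occur in a right-hand side.
        return [i for i, sym in enumerate(rhs) if sym in variables]

    # Linear: at most one variable on every right-hand side.
    linear = all(len(var_positions(rhs)) <= 1 for _, rhs in productions)
    # Right-linear: the variable, if present, is the last symbol.
    right_linear = linear and all(
        not var_positions(rhs) or var_positions(rhs) == [len(rhs) - 1]
        for _, rhs in productions
    )
    # Left-linear: the variable, if present, is the first symbol.
    left_linear = linear and all(
        not var_positions(rhs) or var_positions(rhs) == [0]
        for _, rhs in productions
    )
    return {"linear": linear, "right_linear": right_linear, "left_linear": left_linear}


rules = [("S", "aSa"), ("S", "bSb"), ("S", "")]  # "" stands for λ
result = classify(rules, {"S"})
print(result)
```

For S → aSa | bSb | λ the variable sits in the middle of the right-hand side, so the sketch reports a linear grammar that is neither right-linear nor left-linear.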
Example of a CFL:
Palindromes
Palindromes are strings that are spelled the same way backwards and forwards. The language of palindromes, PAL, is not regular.
Given the grammar G = ({S}, {a, b}, S, P), with production rules
S → aSa | bSb | λ
A typical derivation in this grammar might be:
S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa
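The derivation above can be run in reverse to test membership: strip matching outer symbols until nothing is left. The sketch below is a hypothetical helper (the name `derives` is not from the slides) that mirrors the three rules directly. Note that the grammar as written derives only even-length palindromes, since every rule other than S → λ adds two symbols.

```python
def derives(w: str) -> bool:
    """Return True if S =>* w using S -> aSa | bSb | λ."""
    if w == "":
        return True  # S -> λ
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "ab":
        return derives(w[1:-1])  # undo S -> aSa or S -> bSb
    return False


print(derives("aabbaa"))
print(derives("aba"))
```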
Regular vs. context-free
Are regular languages context-free?
• Yes, because context-free means that there is a single variable on the LHS of each rule. All regular languages are generated by grammars that have a single variable on the LHS of each grammar rule.
• But, as we have seen, not all context-free grammars are regular.
Derivation
Given the grammar
S → aaSB | λ
B → bB | b
the string aab can be derived in different ways.
Parse tree
• The tree structure shows the rule that is applied to each nonterminal, without showing the order of rule applications.
• Each internal node of the tree corresponds to a nonterminal, and the leaves of the derivation tree represent the string of terminals.
Both derivations of aab shown below (leftmost and rightmost) correspond to the same parse (or derivation) tree: the root S has the children a, a, S, B; the inner S has the single child λ; and B has the single child b.
In the derivation
S ⇒ aaSB ⇒ aaB ⇒ aab
• after applying S → aaSB, we replaced S with λ and then replaced B with b.
• we moved from left to right, replacing the leftmost variable at each step.
• this is called a leftmost derivation.
Similarly, the derivation
S ⇒ aaSB ⇒ aaSb ⇒ aab
• is called a rightmost derivation.
Leftmost (rightmost) derivation
(2)
Definition
In a leftmost derivation, the leftmost nonterminal
is replaced at each step. In a rightmost derivation,
the rightmost nonterminal is replaced at each step.
• Many derivations are neither leftmost nor rightmost.
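To make the definition concrete, here is a short Python sketch that replays a list of rules, always rewriting the leftmost (or rightmost) variable of the current sentential form. The function `derive`, its parameters and the one-character-per-symbol encoding are assumptions made for this illustration.

```python
def derive(start, rules, variables, leftmost=True):
    """Apply (lhs, rhs) rules in order to the leftmost or rightmost variable."""
    form = start
    steps = [form]
    for lhs, rhs in rules:
        positions = [i for i, sym in enumerate(form) if sym in variables]
        if not positions:
            raise ValueError("no variable left to rewrite")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"rule for {lhs} does not apply to {form[i]}")
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps


V = {"S", "B"}
# Grammar: S -> aaSB | λ, B -> bB | b ("" stands for λ).
left = derive("S", [("S", "aaSB"), ("S", ""), ("B", "b")], V, leftmost=True)
right = derive("S", [("S", "aaSB"), ("B", "b"), ("S", "")], V, leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both runs end in aab; only the order in which S and B are rewritten differs.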
Parse (derivation) trees
(2)
A partial derivation tree is one in which property 1 does not necessarily hold and in which property 2 is replaced by:
Every leaf has a label from V ∪ T ∪ {λ}
The yield of the tree is the string of leaf symbols in the order they are encountered when the tree is traversed in a depth-first manner, always taking the leftmost unexplored branch.
A partial derivation tree yields a sentential form of the grammar G that the tree is associated with.
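The yield can be computed with a depth-first, left-to-right walk over the leaves. The sketch below assumes a tree is encoded as a (label, children) pair and that λ leaves contribute nothing to the yield; the encoding and the name `tree_yield` are illustrative, not from the slides.

```python
def tree_yield(node):
    """Concatenate leaf labels left to right, skipping λ leaves."""
    label, children = node
    if not children:
        return "" if label == "λ" else label
    return "".join(tree_yield(child) for child in children)


# Derivation tree for aab under S -> aaSB | λ, B -> bB | b.
aab_tree = ("S", [("a", []), ("a", []), ("S", [("λ", [])]), ("B", [("b", [])])])

# Partial derivation tree: the leaves S and B are still variables.
partial_tree = ("S", [("a", []), ("a", []), ("S", []), ("B", [])])

print(tree_yield(aab_tree))
print(tree_yield(partial_tree))
```

The complete tree yields the terminal string aab, while the partial tree yields the sentential form aaSB.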
Parse (derivation) trees
(3)
Theorem
Let G = (V, T, S, P) be a context-free grammar. Then for every w ∈ L(G) there exists a derivation tree of G whose yield is w. Conversely, the yield of any derivation tree of G is in L(G).
If t_G is any partial derivation tree for G whose root is labeled S, then the yield of t_G is a sentential form of G.
Any w ∈ L(G) has a leftmost and a rightmost derivation.
• The leftmost derivation is obtained by always expanding the leftmost variable in the derivation tree at each step.
Ambiguity
A grammar is ambiguous if there is a string with at least two distinct parse trees.
A string has more than one parse tree if and only if it has more than one leftmost derivation.
Example:
V = {S}
T = {+, *, (, ), 0, 1}
• This parse corresponds to: compute
Example
Our string is still 0 * 0 + 1
V = {S}
T = {+, *, (, ), 0, 1}
P = {S → S + S | S * S | (S) | 1 | 0}
• But there is another, different parse tree that also generates the string 0 * 0 + 1. The derivation begins from S, so the leftmost variable is S. We can replace it with S + S or S * S or (S) or 1 or 0. Pick another one of these at random, say S * S.
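The ambiguity can be confirmed by counting parse trees. The following Python sketch is written specifically for the rules S → S + S | S * S | (S) | 1 | 0 and counts the distinct parse trees of a string by trying every operator as the root split; the function name and the substring-index scheme are assumptions made for illustration.

```python
from functools import lru_cache


def count_parse_trees(text: str) -> int:
    """Count parse trees of text under S -> S+S | S*S | (S) | 1 | 0."""
    s = text.replace(" ", "")

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of parse trees deriving s[i:j] from S.
        if j - i <= 0:
            return 0
        if j - i == 1:
            return 1 if s[i] in "01" else 0  # S -> 0 | 1
        total = 0
        if s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)  # S -> (S)
        for k in range(i + 1, j - 1):
            if s[k] in "+*":
                # S -> S + S or S -> S * S with the operator at position k
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(s))


print(count_parse_trees("0 * 0 + 1"))
```

A count greater than one for a single string is enough to show that the grammar is ambiguous.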
Equivalent grammars
Here is a non-ambiguous grammar that generates the same language.
S → S + A | A
A → A * B | B
B → (S) | 1 | 0
Two grammars that generate the same language are said to be equivalent.
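As an illustration of why the layered grammar S, A, B leaves only one way to group a string, here is a Python sketch of a hand-written parser that follows its three levels. It is an assumption-laden sketch: the left-recursive rules S → S + A and A → A * B are unrolled into loops (a common implementation device, not something the slides describe), and the result is a nested tuple showing the grouping rather than a full parse tree.

```python
def parse(text):
    """Parse text with S -> S + A | A, A -> A * B | B, B -> (S) | 1 | 0."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_S():  # S -> S + A | A, unrolled as A (+ A)*
        node = parse_A()
        while peek() == "+":
            advance()
            node = ("+", node, parse_A())
        return node

    def parse_A():  # A -> A * B | B, unrolled as B (* B)*
        node = parse_B()
        while peek() == "*":
            advance()
            node = ("*", node, parse_B())
        return node

    def parse_B():  # B -> (S) | 1 | 0
        tok = peek()
        if tok == "(":
            advance()
            node = parse_S()
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return node
        if tok in ("0", "1"):
            advance()
            return tok
        raise ValueError(f"unexpected token {tok!r}")

    tree = parse_S()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return tree


print(parse("0 * 0 + 1"))
```

For 0 * 0 + 1 the only grouping produced is the one in which the product is formed first and then added to 1.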
To make parsing easier,
we prefer grammars
that
Ambiguous grammars & equivalent grammars
There is no general algorithm for determining whether a given CFG is ambiguous.
There is no general algorithm for determining whether two given CFGs are equivalent.
Dangling else
What value does x have at the end?
Ambiguous grammar
<statement> := IF <expression> THEN <statement> |
IF <expression> THEN <statement> ELSE <statement> |
Ambiguous grammars
Definition
If L is a context-free language for
Parsing
(1)
In practical applications, it is usually not enough to decide whether a string belongs to a language. It is also important to know how to derive the string from the grammar.
Parsing
(2)
Let G be a context-free grammar for C++.
Let the string w be a C++ program.
One thing a compiler does - in particular, the part of the
compiler called the “parser” - is determine whether w is a
syntactically correct C++ program. It also constructs a
parse tree for the program that is used in code generation.
The Decision question for CFL’s
(1) If a string w belongs to L(G) for some CFG G, can we always decide that it does belong to L(G)?
The Decision question for CFL’s
What we need to do is to restrict the kinds of rules in our CFG’s so that each rule, when it is applied, is guaranteed to either increase the length of the sentential form generated or to increase the number of terminals in the sentential form.
That means that we don’t want rules of the following two forms in our CFG’s:
A → λ    A → B
The Decision question for CFL’s
Consider the grammar G = ({S}, {a, b}, S, P), where P is:
S → SS | aSb | bSa | ab | ba
Looking at the production rules, it is easy to see that the length of the sentential form produced by the application of any rule grows by at least one symbol during each derivation step.
Thus, within |w| derivation steps, G will produce either a string of all terminals, which may be compared directly to w, or a sentential form too long to be capable of producing w.
The Decision question for CFL’s
Theorem:
Assume that G = (V, T, S, P) is a context-free grammar with no rules of the form A → λ or A → B, where A, B ∈ V. Then the exhaustive search parsing technique can be made into an algorithm which, for any w ∈ T*, either produces a parsing of w or tells us that no parsing is possible.
The Decision question for CFL’s
Since we don’t know ahead of time which derivation sequences to try, we have to try all of the possible applications of rules which result in one of two conditions:
a string of all terminals of length |w|, or a sentential form of length |w| + 1.
The application of any one rule must result in either: replacing a variable with one or more terminals, or
increasing the length of a sentential form by one or more characters.
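The search described above can be sketched in Python as a breadth-first exploration of leftmost derivations. This is a minimal sketch, assuming one-character symbols and a grammar with no rules of the form A → λ or A → B, so that forms longer than |w| can be discarded safely; the function name, the `seen` set and the prefix check are implementation choices added here, not part of the slides.

```python
from collections import deque


def exhaustive_parse(w, productions, start="S"):
    """Return a leftmost derivation of w as a list of forms, or None."""
    variables = {lhs for lhs, _ in productions}
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        form, history = queue.popleft()
        if form == w:
            return history
        # Position of the leftmost variable, if any.
        idx = next((i for i, s in enumerate(form) if s in variables), None)
        if idx is None:
            continue  # all terminals, but not equal to w
        if not w.startswith(form[:idx]):
            continue  # the terminal prefix already disagrees with w
        for lhs, rhs in productions:
            if lhs != form[idx]:
                continue
            new_form = form[:idx] + rhs + form[idx + 1:]
            # Forms never shrink, so anything longer than w is hopeless.
            if len(new_form) <= len(w) and new_form not in seen:
                seen.add(new_form)
                queue.append((new_form, history + [new_form]))
    return None


P = [("S", "SS"), ("S", "aSb"), ("S", "bSa"), ("S", "ab"), ("S", "ba")]
print(exhaustive_parse("abab", P))
print(exhaustive_parse("aab", P))
```

Because only finitely many sentential forms have length at most |w|, the loop always terminates, either with a derivation or with None.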
The Decision question for CFL’s
How many sentential forms will we have to examine?
Restricting ourselves to leftmost derivations, it is obvious that, with |P| production rules, applying each rule one time to S gives us |P| sentential forms. Example:
Given the 5 production rules
S → SS | aSb | bSa | ab | ba,
one round of leftmost derivations produces 5 sentential forms:
S ⇒ SS
S ⇒ aSb
S ⇒ bSa
S ⇒ ab
S ⇒ ba
The Decision question for CFL’s
The second round of leftmost derivations produces 15 sentential forms:
SS ⇒ SSS    SS ⇒ aSbS    SS ⇒ bSaS    SS ⇒ abS    SS ⇒ baS
aSb ⇒ aSSb    aSb ⇒ aaSbb    aSb ⇒ abSab    aSb ⇒ aabb    aSb ⇒ abab
bSa ⇒ bSSa    bSa ⇒ baSba    bSa ⇒ bbSaa    bSa ⇒ baba    bSa ⇒ bbaa
ab and ba don’t produce any new sentential forms, since they consist of all terminals. If they had contained variables, then the second round of leftmost derivations would have produced 25, or |P|^2, sentential forms.
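The counts for the first two rounds can be reproduced with a few lines of Python. The sketch assumes one-character symbols; `leftmost_round` is a hypothetical helper that applies every applicable rule to the leftmost variable of every form from the previous round.

```python
def leftmost_round(forms, productions):
    """Apply every rule to the leftmost variable of every sentential form."""
    variables = {lhs for lhs, _ in productions}
    result = []
    for form in forms:
        idx = next((i for i, s in enumerate(form) if s in variables), None)
        if idx is None:
            continue  # all terminals: nothing left to rewrite
        for lhs, rhs in productions:
            if lhs == form[idx]:
                result.append(form[:idx] + rhs + form[idx + 1:])
    return result


P = [("S", "SS"), ("S", "aSb"), ("S", "bSa"), ("S", "ab"), ("S", "ba")]
round1 = leftmost_round(["S"], P)
round2 = leftmost_round(round1, P)
print(len(round1), len(round2))
```

The forms ab and ba from the first round contribute nothing to the second, which is why the second round has 15 entries rather than 25.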
The Decision question for CFL’s
We know from our worst case scenario that we never have to run through more than 2|w| rounds of rule applications in any one
derivation sequence before being able to stop the derivation.
Therefore, the total number of sentential forms that we may have to generate to decide whether string w belongs to L(G) generated by grammar G = (V, T, S, P) is
|P| + |P|^2 + ... + |P|^(2|w|)
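To get a feel for the size of this bound, the sum can be evaluated directly. The Python sketch below uses a hypothetical helper `max_sentential_forms`; the sample values (5 rules, strings of length 2 and 4) are chosen only for illustration.

```python
def max_sentential_forms(num_rules: int, length: int) -> int:
    """Evaluate |P| + |P|^2 + ... + |P|^(2|w|) for |P| = num_rules, |w| = length."""
    return sum(num_rules ** k for k in range(1, 2 * length + 1))


print(max_sentential_forms(5, 2))  # 5 + 25 + 125 + 625
print(max_sentential_forms(5, 4))
```

Since the sum is geometric, it equals (|P|^(2|w|+1) - |P|) / (|P| - 1) whenever |P| > 1, so the bound grows exponentially in |w|.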
Unfortunately, this means that the work we might have to do to
The Decision question for CFL’s
It can be shown that some more efficient parsing techniques for CFG’s exist.
Theorem 5.3: For every context-free grammar there exists an algorithm that parses any w ∈ L(G) in a number of steps proportional to |w|^3.
Your textbook does not offer a proof for this theorem.
S-grammars
Definition 5.5: A context-free grammar G = (V, T, S, P) is said to be a simple grammar or s-grammar if all of its productions are of the form
A → ax,
where A ∈ V, a ∈ T, x ∈ V*, and any pair (A, a) occurs at most once in P.
Example: The following grammar is an s-grammar:
S → aS | bSS | c
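Because each pair (A, a) picks out at most one production, a string can be parsed with one rule choice per input symbol. The following Python sketch illustrates this with a stack of pending variables; the function name, the lookup table and the one-character-per-symbol encoding are assumptions made for this example.

```python
def s_grammar_accepts(w, productions, start="S"):
    """Decide membership for an s-grammar given as (lhs, rhs) string pairs."""
    table = {}
    for lhs, rhs in productions:
        key = (lhs, rhs[0])  # (variable, leading terminal)
        if key in table:
            raise ValueError(f"pair {key} occurs twice: not an s-grammar")
        table[key] = rhs[1:]  # the remaining variables x in A -> ax
    stack = [start]  # pending variables; the next one is at the end
    for symbol in w:
        if not stack:
            return False  # input left over but nothing to expand
        var = stack.pop()
        rest = table.get((var, symbol))
        if rest is None:
            return False  # no production for this (variable, terminal) pair
        stack.extend(reversed(rest))  # leftmost variable ends up on top
    return not stack  # accept only if every variable was rewritten


P = [("S", "aS"), ("S", "bSS"), ("S", "c")]
print(s_grammar_accepts("abcc", P))
print(s_grammar_accepts("bc", P))
```

Each input symbol triggers exactly one table lookup, so the number of steps is proportional to |w|.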