Chap 5 Context free Languages (3)

Academic year: 2019

(1)

Chapter 5

Context-free Languages

(2)

Context-free Grammars (CFG) (1)

Definition

A grammar G = (V, T, S, P) is said to be context-free if all production rules in P have the form

A → x

where A ∈ V and x ∈ (V ∪ T)*.

A language L is said to be context-free iff there is a context-free grammar G such that L = L(G).

(3)

Context-free Grammars (CFG) (2)

Context-free means that there is a single variable on the left side of each grammar rule.

Example of a rule where this condition does not hold:

1Z1 → 101

(4)

Non-regular languages

There are non-regular languages that can be generated by a CFG.

The grammar G = ({S}, {a, b}, S, P), with production rules

S → aSa | bSb | λ

is context-free.

This grammar is linear (at most a single variable on the RHS of each rule), but it is neither right-linear nor left-linear, so it is not a regular grammar.

Example: The language {a^n b^n : n ≥ 0} is not regular, but it is context-free.

(5)

Example of a CFL: Palindromes

Palindromes are strings which are spelled the same way backwards and forwards. The language of palindromes, PAL, is not regular.

Given the grammar G = ({S}, {a, b}, S, P), with production rules:

S → aSa | bSb | λ

A typical derivation in this grammar might be:

S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa
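The derivation above can be reproduced mechanically. The following is a minimal sketch, not part of the original slides: the function name derive_even_palindrome is hypothetical, and λ is represented by the empty string. Each loop step applies S → aSa or S → bSb, and the last step applies S → λ.

```python
def derive_even_palindrome(half):
    """Sentential forms of a derivation of half + reversed(half)
    using S -> aSa | bSb | λ (λ is modelled as the empty string)."""
    forms = ["S"]
    left = ""
    for ch in half:
        if ch not in "ab":
            raise ValueError("terminals must be 'a' or 'b'")
        left += ch
        # Apply S -> aSa or S -> bSb, depending on ch.
        forms.append(left + "S" + left[::-1])
    # Apply S -> λ to remove the remaining variable.
    forms.append(left + left[::-1])
    return forms


print(" => ".join(derive_even_palindrome("aab")))
# S => aSa => aaSaa => aabSbaa => aabbaa
```

Note that every string derived this way has even length, because each rule other than S → λ adds two terminals.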

(6)

Regular vs. context-free

Are regular languages context-free?

• Yes, because context-free means that there is a single variable on the LHS of each rule. All regular languages are generated by grammars that have a single variable on the LHS of each grammar rule.

• But, as we have seen, not all context-free grammars are regular.

(7)

Derivation

Given the grammar

S → aaSB | λ
B → bB | b

the string aab can be derived in different ways.

(8)

Parse tree

Both derivations on the previous slide correspond to the following parse (or derivation) tree.

S
├── a
├── a
├── S
│   └── λ
└── B
    └── b

• The tree structure shows the rule that is applied to each nonterminal, without showing the order of rule applications.

• Each internal node of the tree corresponds to a nonterminal, and the leaves of the derivation tree represent the string of terminals.

(9)

In the derivation

S ⇒ aaSB ⇒ aaB ⇒ aab

the first step was to replace S with aaSB, the second to replace S with λ, and the third to replace B with b. We moved from left to right, replacing the leftmost variable at each step. This is called a leftmost derivation.

Similarly, the derivation

S ⇒ aaSB ⇒ aaSb ⇒ aab

is called a rightmost derivation.

(10)

Leftmost (rightmost) derivation (2)

Definition

In a leftmost derivation, the leftmost nonterminal is replaced at each step. In a rightmost derivation, the rightmost nonterminal is replaced at each step.

Many derivations are neither leftmost nor rightmost.
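As an illustration, the two derivations of aab in the grammar S → aaSB | λ, B → bB | b can be replayed in code. This is a sketch under stated assumptions: the names RULES, step and derive are hypothetical, each symbol is a single character, and λ is the empty string.

```python
# Grammar from the slides: S -> aaSB | λ, B -> bB | b (λ is the empty string).
RULES = {"S": ["aaSB", ""], "B": ["bB", "b"]}


def step(form, rhs, leftmost=True):
    """Replace the leftmost (or rightmost) variable of `form` with `rhs`."""
    positions = [i for i, sym in enumerate(form) if sym in RULES]
    if not positions:
        raise ValueError("no variable left to replace")
    i = positions[0] if leftmost else positions[-1]
    if rhs not in RULES[form[i]]:
        raise ValueError(f"{form[i]} -> {rhs!r} is not a rule of the grammar")
    return form[:i] + rhs + form[i + 1:]


def derive(rhs_sequence, leftmost=True):
    """Apply the given right-hand sides in order, starting from S."""
    forms = ["S"]
    for rhs in rhs_sequence:
        forms.append(step(forms[-1], rhs, leftmost))
    return forms


print(derive(["aaSB", "", "b"], leftmost=True))   # ['S', 'aaSB', 'aaB', 'aab']
print(derive(["aaSB", "b", ""], leftmost=False))  # ['S', 'aaSB', 'aaSb', 'aab']
```

Both calls end in the same string; only the order in which the variables are replaced differs.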

(11)
(12)

Parse (derivation) trees (2)

A partial derivation tree is one in which property 1 does not necessarily hold and in which property 2 is replaced by:

Every leaf has a label from V ∪ T ∪ {λ}

The yield of the tree is the string of leaf symbols in the order they are encountered when the tree is traversed in a depth-first manner, always taking the leftmost unexplored branch.

A partial derivation tree yields a sentential form of the grammar G that the tree is associated with.
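The yield can be computed with a short depth-first traversal. The sketch below is illustrative only: the tuple encoding (label, children) and the helper names leaf and tree_yield are assumptions, and leaves labeled λ are dropped from the yield, which is a common convention rather than something stated on the slide.

```python
def leaf(label):
    """A leaf is a node with no children."""
    return (label, [])


def tree_yield(node):
    """Concatenate leaf labels depth-first, leftmost branch first.
    Leaves labeled λ contribute nothing (assumed convention)."""
    label, children = node
    if not children:
        return "" if label == "λ" else label
    return "".join(tree_yield(child) for child in children)


# Derivation tree for aab in the grammar S -> aaSB | λ, B -> bB | b.
full = ("S", [leaf("a"), leaf("a"), ("S", [leaf("λ")]), ("B", [leaf("b")])])

# Partial derivation tree: the inner S has not been expanded yet.
partial = ("S", [leaf("a"), leaf("a"), leaf("S"), ("B", [leaf("b")])])

print(tree_yield(full))     # aab
print(tree_yield(partial))  # aaSb
```

The full tree yields a string of terminals, while the partial tree yields a sentential form that still contains a variable.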

(13)

Parse (derivation) trees (3)

Theorem

Let G = (V, T, S, P) be a context-free grammar. Then for every w ∈ L(G) there exists a derivation tree of G whose yield is w. Conversely, the yield of any derivation tree of G is in L(G).

• If tG is any partial derivation tree for G whose root is labeled S, then the yield of tG is a sentential form of G.

• Any w ∈ L(G) has a leftmost and a rightmost derivation.

• The leftmost derivation is obtained by always expanding the leftmost variable in the derivation tree at each step.

(14)

Ambiguity

A grammar is ambiguous if there is a string with at least two different parse trees.

A string has more than one parse tree if and only if it has more than one leftmost derivation.

Example:

V = {S}
T = {+, *, (, ), 0, 1}

This parse corresponds to: compute

(15)

Example

Our string is still 0 * 0 + 1.

V = {S}
T = {+, *, (, ), 0, 1}
P = {S → S + S | S * S | (S) | 1 | 0}

• But there is another, different parse tree that also generates the string 0 * 0 + 1. The derivation begins from S, so the leftmost variable is S. We can replace it with S + S, S * S, (S), 1, or 0. Pick another one of these at random, say S * S.
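One way to check the ambiguity claim is to count the parse trees of a string directly. The following is a rough sketch written for this specific grammar only; the function name count_parse_trees is hypothetical. It counts trees for each substring by trying every rule: a single digit, a parenthesized S, or a split at a + or * operator.

```python
from functools import lru_cache


def count_parse_trees(text):
    """Count parse trees of `text` in S -> S + S | S * S | (S) | 1 | 0."""
    w = text.replace(" ", "")  # spaces are not terminals of the grammar

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of parse trees deriving w[i:j] from S."""
        s = w[i:j]
        total = 0
        if s in ("0", "1"):                     # S -> 0 | 1
            total += 1
        if len(s) >= 3 and s[0] == "(" and s[-1] == ")":
            total += count(i + 1, j - 1)        # S -> (S)
        for k in range(i + 1, j - 1):           # S -> S + S | S * S
            if w[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(w))


print(count_parse_trees("0 * 0 + 1"))  # 2
```

The two trees correspond to grouping the string as (0 * 0) + 1 and as 0 * (0 + 1).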

(16)

Equivalent grammars

Here is a non-ambiguous grammar that generates the same language.

S → S + A | A
A → A * B | B
B → (S) | 1 | 0

Two grammars that generate the same language are said to be equivalent.

To make parsing easier, we prefer grammars that are not ambiguous.

(17)

Ambiguous grammars & equivalent grammars

There is no general algorithm for determining whether a given CFG is ambiguous.

There is no general algorithm for determining whether two given CFGs are equivalent.

(18)

Dangling else

What value does x have at the end?

Ambiguous grammar

<statement> := IF <expression> THEN <statement> |
IF <expression> THEN <statement> ELSE <statement> |

(19)

Ambiguous grammars

Definition

If L is a context-free language for

(20)

Parsing (1)

In practical applications, it is usually not enough to decide whether a string belongs to a language. It is also important to know how to derive the string from the grammar.

(21)

Parsing (2)

Let G be a context-free grammar for C++. Let the string w be a C++ program.

One thing a compiler does (in particular, the part of the compiler called the “parser”) is determine whether w is a syntactically correct C++ program. It also constructs a parse tree for the program that is used in code generation.

(22)

The Decision question for CFL’s (1)

If a string w belongs to L(G) for some CFG G, can we always decide that it does belong to L(G)?

(23)

The Decision question for CFL’s

What we need to do is to restrict the kinds of rules in our CFG’s so that each rule, when it is applied, is guaranteed to either increase the length of the sentential form generated or to increase the number of terminals in the sentential form.

That means that we don’t want rules of the following two forms in our CFG’s:

A → λ
A → B

(24)

The Decision question for CFL’s

(25)

The Decision question for CFL’s

Consider the grammar G = ({S}, {a, b}, S, P), where P is:

S → SS | aSb | bSa | ab | ba

Looking at the production rules, it is easy to see that the length of the sentential form grows by at least one symbol during each derivation step.

Thus, within |w| derivation steps, G will produce either a string of all terminals, which may be compared directly to w, or a sentential form too long to be capable of producing w.

(26)

The Decision question for CFL’s

Theorem:

Assume that G = (V, T, S, P) is a context-free grammar with no rules of the form A → λ or A → B, where A, B ∈ V. Then the exhaustive search parsing technique can be made into an algorithm which, for any w ∈ T*, either produces a parsing of w or tells us that no parsing is possible.

(27)

The Decision question for CFL’s

Since we don’t know ahead of time which derivation sequences to try, we have to try all of the possible applications of rules which result in one of two conditions:

a string of all terminals of length |w|, or a sentential form of length |w| + 1.

The application of any one rule must result in either: replacing a variable with one or more terminals, or

increasing the length of a sentential form by one or more characters.
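The search described above can be sketched as a breadth-first exploration of leftmost derivations. This is a minimal illustration for the example grammar S → SS | aSb | bSa | ab | ba, not a general-purpose parser; the names RULES and exhaustive_parse are hypothetical. Because every rule lengthens the sentential form, any form longer than w is discarded, which guarantees termination.

```python
from collections import deque

# Example grammar from the slides; it has no rules A -> λ or A -> B.
RULES = {"S": ["SS", "aSb", "bSa", "ab", "ba"]}


def exhaustive_parse(w, start="S"):
    """Return a leftmost derivation of w as a list of sentential forms,
    or None if w is not in the language."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        derivation = queue.popleft()
        form = derivation[-1]
        if form == w:
            return derivation
        # Position of the leftmost variable, if any.
        i = next((k for k, sym in enumerate(form) if sym in RULES), None)
        if i is None:
            continue  # all terminals, but not equal to w
        for rhs in RULES[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            # Forms longer than w can never shrink back to w.
            if len(new) <= len(w) and new not in seen:
                seen.add(new)
                queue.append(derivation + [new])
    return None


print(exhaustive_parse("abab"))
print(exhaustive_parse("aab"))  # None
```

The length check is what relies on the absence of λ-rules and unit rules; with such rules present, a long sentential form could still shrink and the pruning would be unsound.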

(28)

The Decision question for CFL’s

How many sentential forms will we have to examine?

Restricting ourselves to leftmost derivations, with |P| production rules, applying each rule one time to S gives us |P| sentential forms.

Example: Given the 5 production rules

S → SS | aSb | bSa | ab | ba,

one round of leftmost derivations produces 5 sentential forms:

S ⇒ SS
S ⇒ aSb
S ⇒ bSa
S ⇒ ab
S ⇒ ba

(29)

The Decision question for CFL’s

The second round of leftmost derivations produces 15 sentential forms:

SS ⇒ SSS     SS ⇒ aSbS     SS ⇒ bSaS     SS ⇒ abS     SS ⇒ baS
aSb ⇒ aSSb   aSb ⇒ aaSbb   aSb ⇒ abSab   aSb ⇒ aabb   aSb ⇒ abab
bSa ⇒ bSSa   bSa ⇒ baSba   bSa ⇒ bbSaa   bSa ⇒ baba   bSa ⇒ bbaa

ab and ba don’t produce any new sentential forms, since they consist of all terminals. If they had contained variables, then the second round of leftmost derivations would have produced 25, or |P|^2, sentential forms.

(30)

The Decision question for CFL’s

We know from our worst case scenario that we never have to run through more than 2|w| rounds of rule applications in any one

derivation sequence before being able to stop the derivation.

Therefore, the total number of sentential forms that we may have to generate to decide whether string w belongs to L(G) generated by grammar G = (V, T, S, P) is

|P| + |P|^2 + ... + |P|^(2|w|)

Unfortunately, this means that the work we might have to do to decide membership can grow exponentially with |w|.
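To get a feel for how quickly the bound |P| + |P|^2 + ... + |P|^(2|w|) grows, it can be evaluated for small inputs. The helper name sentential_form_bound below is hypothetical, and |P| = 5 is taken from the running example grammar.

```python
def sentential_form_bound(num_rules, word_length):
    """Evaluate |P| + |P|^2 + ... + |P|^(2|w|)."""
    return sum(num_rules ** k for k in range(1, 2 * word_length + 1))


for n in (1, 2, 3, 4):
    print(n, sentential_form_bound(5, n))
# 1 30
# 2 780
# 3 19530
# 4 488280
```

Each additional symbol in w multiplies the bound by roughly |P|^2, here a factor of about 25.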

(31)

The Decision question for CFL’s

It can be shown that some more efficient parsing techniques for CFG’s exist.

Theorem 5.3: For every context-free grammar there exists an algorithm that parses any w ∈ L(G) in a number of steps proportional to |w|^3.

Your textbook does not offer a proof for this theorem.
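One well-known technique with cubic running time in |w| is the CYK algorithm, which the slides do not name; it is shown here only as an example of such a method. The sketch assumes the grammar is already in Chomsky normal form (rules A → BC or A → a) and only decides membership rather than building a parse tree. The example grammar for {a^n b^n : n ≥ 1} and the names cyk, UNARY and BINARY are assumptions made for illustration.

```python
def cyk(w, unary, binary, start="S"):
    """Decide membership of w for a grammar in Chomsky normal form.
    unary:  list of (A, a) for rules A -> a
    binary: list of (A, (B, C)) for rules A -> BC"""
    n = len(w)
    if n == 0:
        return False  # this sketch does not handle the empty string
    # table[i][l] holds the variables that derive the substring w[i:i+l].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = {A for A, a in unary if a == ch}
    for length in range(2, n + 1):              # substring length
        for i in range(n - length + 1):         # substring start
            for split in range(1, length):      # size of the left part
                left = table[i][split]
                right = table[i + split][length - split]
                for A, (B, C) in binary:
                    if B in left and C in right:
                        table[i][length].add(A)
    return start in table[0][n]


# Hypothetical CNF grammar for {a^n b^n : n >= 1}:
# S -> AB | AC, C -> SB, A -> a, B -> b
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", ("A", "B")), ("S", ("A", "C")), ("C", ("S", "B"))]

print(cyk("aabb", UNARY, BINARY))  # True
print(cyk("abab", UNARY, BINARY))  # False
```

The three nested loops over length, start position and split point give the |w|^3 factor; the innermost loop over rules adds a factor that depends only on the grammar.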

(32)

S-grammars

Definition 5.5: A context-free grammar G = (V, T, S, P) is said to be a simple grammar or s-grammar if all of its productions are of the form

A → ax,

where A ∈ V, a ∈ T, x ∈ V*, and any pair (A, a) occurs at most once in P.

Example: The following grammar is an s-grammar:

S → aS | bSS | c

(33)

S-grammars

(34)

S-grammars

Let’s consider the grammar expressed by the following production rules:

S → aS | bSS | c

Since G is an s-grammar, all rules have the form A → ax. Assume that w = abcc.

Due to the restrictive condition that any pair (A, a) may occur at most once in P, we know immediately which production rule must have generated the a in abcc: the rule S → aS.
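Because each pair (A, a) selects at most one rule, parsing with an s-grammar needs no search: every input symbol determines the next rule. The sketch below illustrates this for S → aS | bSS | c. The names S_RULES and s_grammar_parse are hypothetical, and the stack of pending variables is one possible way to organize the leftmost derivation.

```python
# Rules of the s-grammar S -> aS | bSS | c, indexed by (variable, terminal).
# The value is the string x of variables that follows the terminal.
S_RULES = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}


def s_grammar_parse(w, start="S"):
    """Return the rules used in the leftmost derivation of w, or None."""
    pending = [start]  # variables still to expand, leftmost on top
    used = []
    for ch in w:
        if not pending:
            return None            # input left over, nothing to expand
        A = pending.pop()
        x = S_RULES.get((A, ch))
        if x is None:
            return None            # no rule A -> ch x exists
        used.append(f"{A} -> {ch}{x}")
        pending.extend(reversed(x))  # keep the leftmost variable on top
    return used if not pending else None


print(s_grammar_parse("abcc"))
# ['S -> aS', 'S -> bSS', 'S -> c', 'S -> c']
```

Each input symbol is processed once, so the number of steps is proportional to |w|.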

(35)

Exercise

Let G be the grammar

S → abSc | A
A → cAd | cd

1) Give a derivation of ababccddcc.

(36)

Programming languages

• Programming languages are context-free, but not regular.

• Programming languages have the following features that require infinite “stack memory”:

– matching parentheses in algebraic expressions

(37)

Programming languages

• Programming languages are often defined using a convention for specifying grammars called Backus-Naur form, or BNF.

Example:

(38)

Programming languages

Backus-Naur form is very similar to the standard CFG grammar form, but variables are listed within angle brackets, ::= is used instead of →, and {X} is used to mean 0 or more occurrences of X. The | is still used to mean “or”.

Pascal’s if statement:

(39)

Programming languages

S-grammars are not sufficiently powerful to handle all the syntactic features of a typical programming language.

LL grammars and LR grammars (see next chapter) are normally used for specifying programming languages. They are more complicated than s-grammars, but still permit parsing in linear time.

(40)

Example of a Non-linear Context-free Grammar

Consider the grammar G = ({S}, {a, b}, S, P), with production rules:

S → aSa | SS | λ

This grammar is context-free. Why?
