THE METALANGUAGE: LANGUAGE PROCESSING
5.2 FEATURES
as objects of the LWL type char; thus, the non-terminal vocabulary is made to correspond to unprintable characters of EBCDIC. These do not overlap the codes for the four characters defined above or the char constants described in the section on List Processing in Chapter IV.
Turning to Chomsky's second question, we understand "structural description" by the description introducing functions in the previous chapter. The structural description is the phrase marker, representing both the sentence's syntactic analysis
and the composition of functions required to compute its meaning.
Corresponding to Chomsky's third question, about "generative grammar", we need to define our complete notion of "analytical grammar". The fundamentals of the rule statement appear above; in the succeeding sections, we elaborate the definition of rules by introducing useful extensions.
Current techniques in the syntactic description of programming languages often rely on the invention of new parts of speech to assure the unambiguous parsing of certain phrases. For instance, PASCAL guarantees that
a + b * c
cannot be interpreted as
(a + b) * c
by introducing parts of speech <factor>, <term>, <simple expression> and <expression>, each of which may represent objects of the same type.
This practice clutters the connection between syntax and semantics.
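The stratification described above can be sketched as a small recursive-descent parser. This is only an illustration under simplifying assumptions: it is not PASCAL's full grammar (the relational level <expression> is omitted), and the names tokenize, Parser and parse are hypothetical. Each precedence level gets its own part of speech and its own method, although every level yields the same kind of object.

```python
import re

def tokenize(text):
    # Identifiers, plus single-character operators and parentheses.
    return re.findall(r"[A-Za-z_]\w*|[-+*/()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def factor(self):
        # <factor> ::= identifier | '(' <simple expression> ')'
        if self.peek() == "(":
            self.take()
            node = self.simple_expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        tok = self.take()
        if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
            raise SyntaxError("expected an identifier")
        return tok

    def term(self):
        # <term> ::= <factor> { ('*' | '/') <factor> }
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def simple_expression(self):
        # <simple expression> ::= <term> { ('+' | '-') <term> }
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.simple_expression()
    if parser.peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return tree

print(parse("a + b * c"))    # ('+', 'a', ('*', 'b', 'c'))
print(parse("(a + b) * c"))  # ('*', ('+', 'a', 'b'), 'c')
```

The three methods differ only in which operators they accept, which is the clutter the text objects to: the syntactic categories multiply while the semantic type stays the same.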
A similar difficulty arises in BIBLIO. Just as we introduced the category <q_subject> above to represent a list of subjects involved in a query, we need a category <q_publication>. As we defined BIBLIO, the manner of expressing <q_publication>s must be far more flexible than that for <q_subject>s. Indeed, the ability to handle queries like
Works by Wegbreit or about parsing and C-relevant to extensible languages?
requires a reasonably complex syntax. We begin with
<query>::= <q_publication>
which is obvious. Since the above query must be treated as an expression, with the possibility of "operators" of different precedence, a grammar like the following is plausible:
<q_factor> ::= 'by' <q_author>
<q_factor> ::= 'about' <q_subject>
<q_factor> ::= <rating> '-relevant to' <q_subject>
<q_term> ::= <q_factor> | <q_term> 'and' <q_factor>
<q_expression> ::= <q_term> | <q_expression> 'or' <q_term>
<q_publication> ::= 'works' <q_expression>
It is never good practice to hide knowledge that the programmer uses and depends on; yet, the above does exactly that, by failing to show that each of the phrases actually represents a list of publications. It does violence to a notion of structuring that identifies syntactic with semantic structures.
Features provide a syntactic subcategorization of the categories introduced by the language writer. Each feature is a binary flag which qualifies the part of speech of a phrase. Features may be tested for presence or absence on phrases in the right hand side of a rule; they may be carried over, set, reset or reversed on phrases of the left hand side. Using three features to subcategorize <q_publication>, conjuncted, disjuncted and completed, we rewrite the above rules as
<q_publication> ::= 'by' <q_author> : (works_by)
<q_publication> ::= 'about' <q_subject> : (works_about)
<q_publication> ::= <rating> '-relevant to' <q_subject> : (works_relevant)
<q_publication,+conjuncted> ::= <q_publication,-disjuncted-completed> 'and' <q_publication,-disjuncted-conjuncted-completed> : (q_and)
<q_publication,+disjuncted> ::= <q_publication,-completed> 'or' <q_publication,-disjuncted-completed> : (q_or)
<q_publication,+completed> ::= 'works' <q_publication,-completed> : (q_identity)
<query> ::= <q_publication,+completed> : (format_publications)
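To see that the feature tests alone reproduce the intended precedence, the rules can be run through a small bottom-up chart parser. The following is a sketch under stated assumptions, not the LWL parser: the tuple encoding of rules, the helper names (word, lex, lit, cat, matches, expand, parse) and the pre-tokenized QUERY are all hypothetical, lexical lookup of authors, subjects and ratings is assumed to have been done elsewhere, and the 'about' rule is taken to expect a <q_subject>, as in the first grammar.

```python
PUB = "q_publication"
NONE = frozenset()

def word(w):                 # chart item for a literal word of the query
    return ("lit", w, NONE, w)

def lex(category, value):    # chart item assumed to come from lexical lookup
    return ("cat", category, NONE, value)

def lit(w):                  # rhs pattern: a literal word
    return ("lit", w, NONE, NONE)

def cat(name, present=(), absent=()):   # rhs pattern: category plus feature tests
    return ("cat", name, frozenset(present), frozenset(absent))

# (lhs category, features set on the lhs, rhs patterns, semantic function)
RULES = [
    (PUB, (), [lit("by"), cat("q_author")], "works_by"),
    (PUB, (), [lit("about"), cat("q_subject")], "works_about"),
    (PUB, (), [cat("rating"), lit("-relevant to"), cat("q_subject")], "works_relevant"),
    (PUB, ("conjuncted",),
     [cat(PUB, absent=("disjuncted", "completed")), lit("and"),
      cat(PUB, absent=("disjuncted", "conjuncted", "completed"))], "q_and"),
    (PUB, ("disjuncted",),
     [cat(PUB, absent=("completed",)), lit("or"),
      cat(PUB, absent=("disjuncted", "completed"))], "q_or"),
    (PUB, ("completed",), [lit("works"), cat(PUB, absent=("completed",))], "q_identity"),
    ("query", (), [cat(PUB, present=("completed",))], "format_publications"),
]

def matches(item, pattern):
    kind, name, present, absent = pattern
    ikind, iname, feats, _ = item
    return ikind == kind and iname == name and present <= feats and not (absent & feats)

def expand(chart, rhs, start, end):
    """Yield tuples of chart items that match rhs exactly over tokens[start:end]."""
    if not rhs:
        if start == end:
            yield ()
        return
    first, rest = rhs[0], rhs[1:]
    for mid in range(start + 1, end - len(rest) + 1):
        for item in chart.get((start, mid), ()):
            if matches(item, first):
                for tail in expand(chart, rest, mid, end):
                    yield (item,) + tail

def parse(tokens):
    n = len(tokens)
    chart = {(i, i + 1): [tok] for i, tok in enumerate(tokens)}
    for length in range(1, n + 1):
        for start in range(n - length + 1):
            end = start + length
            cell = chart.setdefault((start, end), [])
            for lhs, sets, rhs, sem in RULES:
                # Materialize the matches before adding to the cell being read.
                for children in list(expand(chart, rhs, start, end)):
                    tree = (sem,) + tuple(c[3] for c in children if c[0] == "cat")
                    item = ("cat", lhs, frozenset(sets), tree)
                    if item not in cell:
                        cell.append(item)
    return [item[3] for item in chart.get((0, n), []) if item[1] == "query"]

QUERY = [word("works"), word("by"), lex("q_author", "Wegbreit"), word("or"),
         word("about"), lex("q_subject", "parsing"), word("and"),
         lex("rating", "C"), word("-relevant to"),
         lex("q_subject", "extensible languages")]

for tree in parse(QUERY):
    print(tree)
```

Under these assumptions the sample query yields a single analysis in which q_and is nested inside q_or. The competing reading, with 'and' applied to the whole disjunction, is rejected because the left operand of 'and' must be -disjuncted.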
Features to be carried over are represented by a constituent number in the lhs (e.g., +1 means to copy over all features now on the first rhs non-terminal phrase). Checking for the presence of a feature and setting it are indicated by the + sign (e.g., +completed). Checking for absence and resetting are shown by the - sign, and reversing the setting of a feature in the lhs is shown by a *.
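The operations just listed can be mimicked with ordinary sets. The sketch below is an assumption-laden illustration rather than LWL's implementation: the function names satisfies and lhs_features, the (operation, argument) encoding, and the left-to-right order of application are all hypothetical choices made for the example.

```python
def satisfies(features, present=(), absent=()):
    """Rhs test: every `present` feature is on and no `absent` feature is on."""
    features = set(features)
    return set(present) <= features and not (set(absent) & features)

def lhs_features(ops, rhs_features):
    """Compute the features of the lhs phrase.

    ops is a list of (operation, argument) pairs, applied left to right
    (the ordering is an assumption of this sketch):
      ("carry", k)    +k : copy all features of the k-th rhs non-terminal (1-based)
      ("set", f)      +f : turn feature f on
      ("reset", f)    -f : turn feature f off
      ("reverse", f)  *f : flip feature f
    """
    result = set()
    for op, arg in ops:
        if op == "carry":
            result |= set(rhs_features[arg - 1])
        elif op == "set":
            result.add(arg)
        elif op == "reset":
            result.discard(arg)
        elif op == "reverse":
            result ^= {arg}
        else:
            raise ValueError("unknown feature operation: %r" % (op,))
    return result

# Right operand of 'and': must be -disjuncted-conjuncted-completed.
print(satisfies({"conjuncted"}, absent=("disjuncted", "conjuncted", "completed")))  # False

# Carry over constituent 1, then set, reset and reverse individual features.
ops = [("carry", 1), ("set", "completed"), ("reset", "disjuncted"), ("reverse", "conjuncted")]
print(lhs_features(ops, [{"disjuncted", "conjuncted"}]))  # {'completed'}
```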
If the verbosity of such long descriptions is an objection, the
LWL definition mechanism may be used to provide a convenient shorthand.
For instance,
define q_term = q_publication,+conjuncted-disjuncted-completed
With such definitions, a language writer committed to a syntax like the first proposed above can write it in exactly that way; however, it is preferable to retain the visible correspondence between syntactic category and semantic data type that is possible through the use of features.
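The effect of such a shorthand can be pictured as plain textual substitution applied to a rule before it is processed. This is only a sketch of the idea: how the LWL definition mechanism actually expands names is not described here, and DEFINITIONS and expand_shorthand are hypothetical names.

```python
import re

# Shorthand names mapped to the feature-qualified categories they stand for.
DEFINITIONS = {"q_term": "q_publication,+conjuncted-disjuncted-completed"}

def expand_shorthand(rule, definitions=DEFINITIONS):
    """Replace each <name> that has a definition by its expansion."""
    def replace(match):
        name = match.group(1)
        return "<" + definitions.get(name, name) + ">"
    # Only bare names match; categories already carrying features are left alone.
    return re.sub(r"<(\w+)>", replace, rule)

print(expand_shorthand("<q_term> 'and' <q_factor>"))
# <q_publication,+conjuncted-disjuncted-completed> 'and' <q_factor>
```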
The above example shows how features may be used to simulate a
precedence mechanism in the parser. However, features are also useful in
many other instances. In natural language processing, for example, they have been used to record essentially semantic distinctions in the syntax, to aid disambiguation by the grammar (e.g., the animate feature, discussed in [Oostert 1972]). Features are somewhat like the mechanism Knuth describes as the semantics of context free languages [Knuth 1968]. They are not as general, because their values are restricted to one Boolean, and their dependence is expressible only in a bottom-to-top direction.