
Compilers: Principles, Techniques, and Tools

Nguyễn Gia Hào

Academic year: 2023



Full text

It will take at least two quarters or even two semesters to cover all or most of the material in this book. We include some observations on the relationship between compiler design and computer-science theory and an outline of the applications of compiler technology that go beyond compilation.

Language Processors

Exercises for Section 1.1

The Structure of a Compiler

  • Lexical Analysis
  • Syntax Analysis
  • Semantic Analysis
  • Intermediate Code Generation
  • Code Optimization
  • Code Generation
  • Symbol-Table Management
  • The Grouping of Phases into Passes
  • Compiler-Construction Tools

A syntax tree for the token stream (1.2) is shown as the output of the syntax analyzer in Fig. 1.7. The code generator takes as input an intermediate representation of the source program and maps it into the target language.

Figure 1.6: Phases of a compiler

The symbol table, which stores information about the entire source program, is used by all phases of the compiler.

The Evolution of Programming Languages

The Move to Higher-level Languages

The first step towards more human-friendly programming languages was the development of mnemonic assembly languages in the early 1950s. Functional languages such as ML and Haskell and constraint logic languages such as Prolog are often considered declarative languages.

Impacts on Compilers

Programs written in scripting languages are often much shorter than equivalent programs written in languages such as C.

Exercises for Section 1.3

Awk, JavaScript, Perl, PHP, Python, Ruby and Tcl are popular examples of scripting languages.

The Science of Building a Compiler

Modeling in Compiler Design and Implementation

The Science of Code Optimization

One of the most important skills in compiler design is the ability to formulate the right problem to solve. Finally, a compiler is a complex system; we need to keep the system simple to ensure that the engineering and maintenance costs of the compiler are manageable.

Applications of Compiler Technology

Implementation of High-Level Programming Languages

Example 1.2: The register keyword in the C programming language is an early example of the interaction between compiler technology and language evolution. Thus, compiler optimizations must be able to perform well across the procedural boundaries of the source program.

Optimizations for Computer Architectures

A system's performance is often limited not by processor speed, but by the performance of the memory subsystem. It is possible to improve the effectiveness of the memory hierarchy by changing the layout of the data or by changing the order of the instructions that access the data.
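The effect of access order can be sketched as follows. This is a minimal illustration, not code from the book: the class `LoopOrder` and both method names are hypothetical. Java stores a two-dimensional array as an array of rows, so the first method walks each row in layout order while the second strides across rows. Both return the same sum; any speed difference depends on the hardware and the JIT compiler, so none is claimed here.

```java
// Hypothetical illustration; class and method names are assumptions.
class LoopOrder {
    // Visits elements in the order they are laid out within each row.
    static long sumRowMajor(int[][] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                sum += a[i][j];
        return sum;
    }

    // Same result, but jumps from row to row on every access.
    // Assumes a rectangular array (every row has the same length).
    static long sumColumnMajor(int[][] a) {
        long sum = 0;
        int cols = a.length == 0 ? 0 : a[0].length;
        for (int j = 0; j < cols; j++)
            for (int i = 0; i < a.length; i++)
                sum += a[i][j];
        return sum;
    }
}
```

Swapping the two loops is the kind of reordering a compiler can apply when it can show that the result is unchanged.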

Design of New Computer Architectures

RISC

Program Translations

In particular, because of the dominance of x86 in the personal-computer market, most software titles are available as x86 code. Compiled simulation is used in many modern tools that simulate designs written in Verilog or VHDL.

Software Productivity Tools

Type checking can be used to catch errors, for example, where an operation is applied to the wrong type of object, or where parameters passed to a procedure do not match the procedure's signature. An attacker can manipulate the input data, causing the program to misbehave and compromise the security of the system.

Programming Language Basics

  • The Static/Dynamic Distinction
  • Environments and States
  • Static Scope and Block Structure
  • Explicit Access Control
  • Dynamic Scope
  • Parameter Passing Mechanisms
  • Aliasing
  • Exercises for Section 1.6

The value is placed in the location belonging to the corresponding formal parameter of the called procedure. In call-by-reference, the address of the actual parameter is passed to the callee as the value of the corresponding formal parameter.
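A small sketch may help contrast the two mechanisms. Java itself always passes by value, so call-by-reference is only simulated here by passing a one-element array that stands in for the address of the actual parameter. The class `ParamPassing` and its methods are hypothetical names chosen for illustration.

```java
// Hypothetical illustration; Java has no true call-by-reference.
class ParamPassing {
    // Call-by-value: n is a copy, so the caller's variable is unchanged.
    static void incrementCopy(int n) {
        n = n + 1;
    }

    // Simulated call-by-reference: the callee receives a location
    // (a one-element array) and updates the caller's storage through it.
    static void incrementThroughReference(int[] cell) {
        cell[0] = cell[0] + 1;
    }

    static int demoByValue() {
        int x = 5;
        incrementCopy(x);
        return x; // still 5
    }

    static int demoByReference() {
        int[] x = {5};
        incrementThroughReference(x);
        return x[0]; // now 6
    }
}
```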

Figure 1.8: Two-stage mapping from names to values

Summary of Chapter 1

Parameters are passed from a calling procedure to the callee either by value or by reference. When parameters are (effectively) passed by reference, two formal parameters can refer to the same object.
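The aliasing case can be sketched in a few lines. Java passes array references by value, which is effectively call-by-reference for the array contents. The class `AliasingDemo` and the method `q` are hypothetical names.

```java
// Hypothetical illustration of two formal parameters aliasing one object.
class AliasingDemo {
    // If x and y refer to the same array, the write through x
    // is visible when reading through y.
    static int q(int[] x, int[] y) {
        x[0] = 2;
        return y[0];
    }
}
```

Calling `q(a, a)` returns 2 whatever `a[0]` held before, while `q(c, d)` with two distinct arrays returns the original `d[0]`. An optimizer may treat `x[0]` and `y[0]` as independent only if it can rule out the first situation.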

This chapter is an introduction to the compiling techniques in Chapters 3 through 6 of this book. It illustrates the techniques by developing a working Java program that translates representative programming-language statements into three-address code, an intermediate representation.

Introduction

One form, called abstract syntax trees or simply syntax trees, represents the hierarchical syntactic structure of the source program. The left child of the root represents the body of the loop, which consists of only the assignment i = i + 1;.

Figure 2.3: A model of a compiler front end

Syntax Definition

  • Definition of Grammars
  • Derivations
  • Parse Trees
  • Ambiguity
  • Associativity of Operators
  • Precedence of Operators
  • Exercises for Section 2.2

We say that a production is for a nonterminal if the nonterminal is the head of the production. Example 2.2: The language defined by the grammar of Example 2.1 consists of lists of digits separated by plus and minus signs. The ten productions for the nonterminal digit allow it to stand for any of the terminals 0, 1, ..., 9.
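As a rough sketch, membership in the language described in Example 2.2 can be checked directly. The class `DigitList` and the method `isList` are hypothetical, and the check follows the shape digit ((+|-) digit)* rather than the productions themselves.

```java
// Hypothetical recognizer for lists of digits separated by + and - signs.
class DigitList {
    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns true when s has the form: digit ((+|-) digit)*
    static boolean isList(String s) {
        if (s.isEmpty() || !isDigit(s.charAt(0))) return false;
        int i = 1;
        while (i < s.length()) {
            char op = s.charAt(i);
            if (op != '+' && op != '-') return false;
            // every operator must be followed by exactly one digit
            if (i + 1 >= s.length() || !isDigit(s.charAt(i + 1))) return false;
            i += 2;
        }
        return true;
    }
}
```

Under this sketch, `9-5+2` and `9` are accepted, while `9-`, `95`, and the empty string are rejected.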

The left child of the root is similar to the root, with a child labeled - instead of +. Another definition of the language generated by a grammar is the set of strings that can be generated by some parse tree. A term (that is not also a factor) is an expression that can be broken apart by the highest-precedence operators, * and /, but not by the lower-precedence operators.
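The layering of expressions, terms, and factors can be sketched as a small evaluator. This is an illustrative assumption, not the book's code: the class `PrecedenceEval` is hypothetical, operands are single digits or parenthesized expressions, and division is integer division. Each precedence level gets its own method, and the loops make the operators left-associative.

```java
// Hypothetical evaluator: one method per precedence level.
class PrecedenceEval {
    private final String in;
    private int pos = 0;

    PrecedenceEval(String in) { this.in = in; }

    static int eval(String s) {
        PrecedenceEval p = new PrecedenceEval(s);
        int v = p.expr();
        if (p.pos != s.length())
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        return v;
    }

    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    // expr -> term { (+|-) term }   lowest precedence
    private int expr() {
        int v = term();
        while (peek() == '+' || peek() == '-') {
            char op = in.charAt(pos++);
            int r = term();
            v = (op == '+') ? v + r : v - r;
        }
        return v;
    }

    // term -> factor { (*|/) factor }   higher precedence
    private int term() {
        int v = factor();
        while (peek() == '*' || peek() == '/') {
            char op = in.charAt(pos++);
            int r = factor();
            v = (op == '*') ? v * r : v / r;
        }
        return v;
    }

    // factor -> digit | ( expr )   cannot be broken apart by any operator
    private int factor() {
        char c = peek();
        if (c >= '0' && c <= '9') { pos++; return c - '0'; }
        if (c == '(') {
            pos++;
            int v = expr();
            if (peek() != ')') throw new IllegalArgumentException("expected )");
            pos++;
            return v;
        }
        throw new IllegalArgumentException("expected digit or ( at " + pos);
    }
}
```

With this structure `9+5*2` evaluates to 19 because `5*2` is grouped as a term before the addition is applied, and `9-5+2` evaluates to 6 because both operators associate to the left.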

Figure 2.5: Parse tree for 9-5+2 according to the grammar in Example 2.1

Syntax-Directed Translation

  • Postfix Notation
  • Synthesized Attributes
  • Simple Syntax-Directed Definitions
  • Tree Traversals
  • Translation Schemes
  • Exercises for Section 2.3

A syntax-directed definition associates with each production a set of semantic rules for computing the values of the attributes associated with the symbols appearing in the production. The production expr → expr1 + term derives an expression containing a plus operator. The left operand of the plus operator is given by expr1 and the right operand by term. A tree traversal starts at the root and visits each node of the tree in some order.

Again, the subscript in rest1 distinguishes this instance of the nonterminal rest in the production body from the instance of rest at the head of the production. For example, part of the parse tree for the above production and action is shown in Fig. In a postorder traversal, we first perform all the actions in the leftmost subtree of the root, for the left operand, which is also labeled expr like the root.
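The link between postorder traversal and postfix notation can be sketched on a syntax tree. The class `PostfixTree` below is a hypothetical stand-in: leaves hold a digit, interior nodes hold a binary operator, and the translation of a node is the translation of its left child, then its right child, then its own label.

```java
// Hypothetical syntax-tree node; leaves are digits, interior nodes are operators.
class PostfixTree {
    final char label;
    final PostfixTree left, right;

    PostfixTree(char digit) { this(digit, null, null); }

    PostfixTree(char op, PostfixTree left, PostfixTree right) {
        this.label = op;
        this.left = left;
        this.right = right;
    }

    // Postorder: left subtree, then right subtree, then the node itself.
    static String postfix(PostfixTree n) {
        if (n.left == null) return String.valueOf(n.label); // leaf
        return postfix(n.left) + postfix(n.right) + n.label;
    }
}
```

For the tree of 9-5+2, whose root is + with left child (9 - 5) and right child 2, this yields `95-2+`.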

Figure 2.9: Attribute values at nodes in a parse tree

Parsing

  • Top-Down Parsing
  • Predictive Parsing
  • When to Use ε-Productions
  • Designing a Predictive Parser
  • Left Recursion
  • Exercises for Section 2.4

At node N, labeled with nonterminal A, select one of the productions for A and construct children at N for the symbols in the production body. For some grammars, the above steps can be implemented during a single left-to-right scan of the input string. Initially, the terminal for is the lookahead symbol, and the known part of the parse tree consists of the root, labeled with the starting nonterminal stmt in Fig.

The goal is to construct the remainder of the parse tree so that the string generated by the parse tree matches the input string. In Fig. 2.18(c), the arrow in the parse tree has advanced to the next child of the root, and the arrow in the input has advanced to the next terminal. Otherwise, if an action appears at the beginning of the production, it is copied just before the code for the production body.
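A predictive parser of this kind can be sketched with one procedure per nonterminal and a `match` helper that advances the lookahead. The grammar assumed here is stmt → expr ; | if ( expr ) stmt | for ( optexpr ; optexpr ; optexpr ) stmt | other, with optexpr → expr | ε, where expr and other are treated as single tokens. The class `PredictiveParser` and the token spelling are illustrative assumptions.

```java
import java.util.List;

// Hypothetical predictive recognizer; tokens are plain strings.
class PredictiveParser {
    private final List<String> tokens;
    private int pos = 0;

    PredictiveParser(List<String> tokens) { this.tokens = tokens; }

    static boolean accepts(List<String> tokens) {
        PredictiveParser p = new PredictiveParser(tokens);
        try {
            p.stmt();
            return p.pos == tokens.size(); // all input must be consumed
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // "$" marks the end of the input.
    private String lookahead() { return pos < tokens.size() ? tokens.get(pos) : "$"; }

    private void match(String t) {
        if (!lookahead().equals(t))
            throw new IllegalStateException("expected " + t + " but saw " + lookahead());
        pos++;
    }

    // The lookahead symbol alone selects the production for stmt.
    private void stmt() {
        switch (lookahead()) {
            case "expr":
                match("expr"); match(";"); break;
            case "if":
                match("if"); match("("); match("expr"); match(")"); stmt(); break;
            case "for":
                match("for"); match("(");
                optexpr(); match(";"); optexpr(); match(";"); optexpr();
                match(")"); stmt(); break;
            case "other":
                match("other"); break;
            default:
                throw new IllegalStateException("syntax error at " + lookahead());
        }
    }

    // optexpr -> expr | epsilon: take the epsilon-production when
    // the lookahead is not expr, consuming no input.
    private void optexpr() {
        if (lookahead().equals("expr")) match("expr");
    }
}
```

The token sequence `for ( ; expr ; expr ) other` is accepted: the first optexpr uses the ε-production because the lookahead at that point is `;`.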

Figure 2.16: A grammar for some statements in C and Java

A Translator for Simple Expressions

  • Abstract and Concrete Syntax
  • Adapting the Translation Scheme
  • Procedures for the Nonterminals
  • Simplifying the Translator
  • The Complete Program

Right-recursive productions lead to trees that grow down to the right, as in Fig. For example, subexpressions of the addition operator are given by expr and term in the production body expr+term. Since each of these productions generates a digit and prints it, the same code in Fig.

Before showing a complete program, we need to make two simplifying transformations to the code in Fig. When the last statement executed in a procedure body is a recursive call to the same procedure, the call is said to be tail recursive. Those unfamiliar with Java may find the following notes about Java useful in reading the code in Fig.
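The effect of removing a tail-recursive call can be sketched in a small infix-to-postfix translator. This is an illustration under assumptions, not the book's program: the class `PostfixTranslator` is hypothetical, it reads from a string rather than standard input, and it handles only single digits joined by + and -. The procedure for rest would end by calling itself, so it is folded into a loop inside `expr`.

```java
// Hypothetical translator for digit ((+|-) digit)* into postfix notation.
class PostfixTranslator {
    private final String in;
    private int pos = 0;
    private final StringBuilder out = new StringBuilder();

    private PostfixTranslator(String in) { this.in = in; }

    static String translate(String s) {
        PostfixTranslator t = new PostfixTranslator(s);
        t.expr();
        if (t.pos != s.length()) throw new IllegalArgumentException("syntax error");
        return t.out.toString();
    }

    // expr -> term rest, with the tail-recursive rest() rewritten as a loop.
    private void expr() {
        term();
        while (pos < in.length()) {
            char op = in.charAt(pos);
            if (op != '+' && op != '-') return; // rest -> epsilon
            pos++;                              // match the operator
            term();
            out.append(op);                     // action: print the operator
        }
    }

    // term -> digit, with the action of printing the digit.
    private void term() {
        if (pos < in.length() && in.charAt(pos) >= '0' && in.charAt(pos) <= '9') {
            out.append(in.charAt(pos++));
        } else {
            throw new IllegalArgumentException("syntax error");
        }
    }
}
```

`PostfixTranslator.translate("9-5+2")` produces `95-2+`, since each operator is emitted only after both of its operands.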

Figure 2.21: Actions for translating into postfix notation

Lexical Analysis

  • Removal of White Space and Comments
  • Reading Ahead
  • Constants
  • Recognizing Keywords and Identifiers
  • A Lexical Analyzer
  • Exercises for Section 2.6

Otherwise, > by itself forms the "greater than" operator, and the lexical analyzer has read one character too many. The lexical analyzer in this section reads ahead one character while it collects digits for numbers or characters for identifiers; for example, it reads past 1 to distinguish between 1 and 10, and it reads past t to distinguish between t and true. In such cases, peek is set to a blank, which will be skipped when the lexical analyzer is called to find the next token.

The invariant assertion in this section is that when the lexical analyzer returns a token, variable peek either holds the character beyond the lexeme for the current token or holds a blank. When a sequence of digits appears in the input stream, the lexical analyzer passes to the parser a token consisting of the terminal num along with an integer-valued attribute computed from the digits. When the lexical analyzer reads a string or lexeme that could form an identifier, it first checks whether the lexeme is in the string table.
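The one-character read-ahead and the string table can be sketched together. Everything below is an illustrative assumption rather than the book's code: the class `MiniLexer`, the textual token spellings such as `NUM(10)` and `ID(x)`, the use of `'\0'` as an end-of-input marker, and the choice of `true` and `false` as the only reserved words. Integer overflow in numbers is not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical scanner that keeps one character of read-ahead in peek.
class MiniLexer {
    private final String input;
    private int pos = 0;
    private char peek = ' ';   // invariant: next unread character or a blank
    private final Map<String, String> words = new HashMap<>(); // string table

    MiniLexer(String input) {
        this.input = input;
        words.put("true", "TRUE");   // reserved words are entered up front
        words.put("false", "FALSE");
    }

    private char read() { return pos < input.length() ? input.charAt(pos++) : '\0'; }

    // Returns the next token as text: NUM(n), ID(s), TRUE, FALSE, GE, GT,
    // a single character, or EOF.
    String scan() {
        while (peek == ' ' || peek == '\t' || peek == '\n') peek = read();
        if (peek == '\0') return "EOF";
        if (Character.isDigit(peek)) {
            int v = 0;
            do {
                v = 10 * v + Character.digit(peek, 10);
                peek = read();               // reads one character past the number
            } while (Character.isDigit(peek));
            return "NUM(" + v + ")";
        }
        if (Character.isLetter(peek)) {
            StringBuilder b = new StringBuilder();
            do {
                b.append(peek);
                peek = read();               // reads one character past the lexeme
            } while (Character.isLetterOrDigit(peek));
            // Look the lexeme up in the string table; enter it if it is new.
            return words.computeIfAbsent(b.toString(), k -> "ID(" + k + ")");
        }
        if (peek == '>') {
            peek = read();
            if (peek == '=') { peek = ' '; return "GE"; }
            return "GT";                     // peek already holds the extra character
        }
        String t = String.valueOf(peek);
        peek = ' ';                          // nothing was read ahead
        return t;
    }

    static List<String> tokens(String s) {
        MiniLexer lex = new MiniLexer(s);
        List<String> out = new ArrayList<>();
        for (String t = lex.scan(); !t.equals("EOF"); t = lex.scan()) out.add(t);
        return out;
    }
}
```

On the input `x >= 10 > true1 true`, the sketch yields `ID(x)`, `GE`, `NUM(10)`, `GT`, `ID(true1)`, `TRUE`: reading past `true` is what separates the reserved word from the identifier `true1`.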

Figure 2.33: Subclasses Num and Word of Token

Symbol Tables

  • Symbol Table Per Scope

With its knowledge of the syntactic structure of a program, a parser is often in a better position than the lexical analyzer to distinguish among different declarations of an identifier. We also see the uses of x and y in the outer block, with their types, as given by declarations of the outer block: integer and character, respectively. The term "scope of identifier x" really refers to the scope of a particular declaration of x.

If blocks can be nested, several declarations of the same identifier can appear within a single block. Symbol-table implementations for blocks can take advantage of the most-closely nested rule. The most-closely nested rule for blocks is that an identifier x is in the scope of the most-closely nested declaration of x; that is, the declaration of x found by examining blocks inside out, starting with the block in which x appears.
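The most-closely nested rule can be sketched with one table per block, each chained to the table of the enclosing block. The class below is a simplified assumption: it maps names directly to type names as strings, where a fuller implementation would store symbol objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical chained symbol table: one Env per block.
class Env {
    private final Map<String, String> table = new HashMap<>(); // name -> type
    private final Env prev;                                    // enclosing block, or null

    Env(Env prev) { this.prev = prev; }

    void put(String name, String type) { table.put(name, type); }

    // Most-closely nested rule: search this block first, then work outward.
    String get(String name) {
        for (Env e = this; e != null; e = e.prev) {
            String found = e.table.get(name);
            if (found != null) return found;
        }
        return null; // no declaration is in scope
    }
}
```

If an outer block declares `x` as `int` and `y` as `char`, and an inner block redeclares `y` as `bool`, then looking up `y` from the inner table finds `bool`, while looking up `x` falls through to the outer declaration.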

  • The Use of Symbol Tables
  • Intermediate Code Generation
    • Two Kinds of Intermediate Representations
    • Construction of Syntax Trees
    • Static Checking

We don't need to go into all the fields of a symbol object, but we assume that there is a field type that indicates the type of the symbol. The retrieved record contains all necessary information about the identifier, such as the type of the identifier. By doing so, the parts of the syntax tree needed to construct the three-address code are available when needed, but disappear when no longer needed.

Each construct is represented by a node, with children for the semantically meaningful components of the construct. All nonterminals in the translation scheme have an attribute n, which is a node of the syntax tree. In general, the grouping of operators in the abstract syntax is based on the needs of later stages of the compiler.

Example 2.16: Figure 2.36 shows symbol tables for the pseudocode in Example 2.15.
  • Three-Address Code
  • Exercises for Section 2.8
  • Summary of Chapter 2
  • The Role of the Lexical Analyzer
    • Lexical Analysis Versus Parsing
    • Tokens, Patterns, and Lexemes
    • Attributes for Tokens
    • Lexical Errors
    • Exercises for Section 3.1
  • Input Buffering
    • Buffer Pairs
    • Sentinels
  • Specification of Tokens
    • Strings and Languages
    • Operations on Languages
    • Regular Expressions
    • Regular Definitions
    • Extensions of Regular Expressions
    • Exercises for Section 3.3
  • Recognition of Tokens
    • Transition Diagrams
    • Recognition of Reserved Words and Identifiers
    • Completion of the Running Example
    • Architecture of a Transition-Diagram-Based Lexical Analyzer
    • Exercises for Section 3.4
    • The Lexical-Analyzer Generator Lex
    • Finite Automata
    • From Regular Expressions to Automata
    • Efficiency of NFA Simulation
    • Construction of an NFA from a Regular Expression
    • Efficiency of String-Processing Algorithms
    • Exercises for Section 3.7
    • Design of a Lexical-Analyzer Generator
    • Optimization of DFA-Based Pattern Matchers

This driver and the specification of the automaton form the nucleus of the lexical analyzer. The normal strategy is to take the longest prefix of the input that matches any pattern. Of course, the next character of the text string must be b_{f(s)+1}, or else we still have a problem and must consider a yet shorter prefix, which will be b_{f(f(s))}.
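The failure function f referred to above can be sketched as follows, assuming it is the usual one from Knuth-Morris-Pratt string matching: for a keyword b1 b2 ... bn, f(s) is the length of the longest proper prefix of b1 ... bs that is also a suffix of b1 ... bs. The class `FailureFunction` is a hypothetical name, and the array is indexed from 1 to match the subscripts in the text.

```java
// Hypothetical computation of the failure function for a keyword b1...bn.
class FailureFunction {
    // Returns f with f[s] defined for 1 <= s <= n; f[0] is unused.
    static int[] compute(String b) {
        int n = b.length();
        int[] f = new int[n + 1];
        int t = 0;                       // length of the current matched prefix
        for (int s = 1; s < n; s++) {
            // b.charAt(s) is b_{s+1}; b.charAt(t) is b_{t+1}.
            // On a mismatch, fall back to the next shorter candidate prefix.
            while (t > 0 && b.charAt(s) != b.charAt(t)) t = f[t];
            if (b.charAt(s) == b.charAt(t)) t++;
            f[s + 1] = t;
        }
        return f;
    }
}
```

For the keyword `ababaa`, the values f(1) through f(6) are 0, 0, 1, 2, 3, 1. The last value shows the fallback chain at work: after the mismatch, the candidate prefix shrinks from length 3 to f(3) = 1 and then to f(1) = 0 before a match is found.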

Finally, we examine some of the patterns and rules in the middle section of the figure. If the first letter is a consonant, move it to the end of the word and then add ay. This means that the time taken is proportional to the length of the input multiplied by the size (nodes plus edges) of the transition graph.

The rest of the lexical analyzer consists of components that are created from the Lex program by Lex itself. The important states of the NFA correspond directly to the positions in the regular expression that hold symbols of the alphabet.

Figure 2.42: Code layout for if-statements

Figures

Figure 1.5: A language-processing system
Figure 1.7: Translation of an assignment statement
Figure 1.10: Blocks in a C++ program