Customized Product Copies
2. Foundations
2.4. Software Maintenance and Evolution
2.4.6. Software Model Extraction
An important part of reverse engineering is the extraction of software models. A software model provides a representation at a higher level of abstraction, as described by Chikofsky and Cross [32, page 15]. Extracting a model representation of a software implementation is, in general, a very common task that every compiler performs. However, these models are typically optimized for program compilation. In the context of this thesis, software models are used for the purpose of program comprehension and analysis.
[Figure: the abstraction levels Requirements (constraints, objectives, business rules), Design, and Implementation, related by forward engineering, reverse engineering, design recovery, restructuring, redocumentation, and reengineering (renovation).]
Figure 2.7.: Reengineering overview by Chikofsky and Cross [32]
Thus, the following subsections introduce software model extraction with a focus on models related to this purpose.
2.4.6.1. Parsing and Resolving
To analyze software to its full extent, an extraction requires two phases: parsing and resolving. These phases are the same whether software models are extracted by a compiler or for program comprehension and analysis.
Parsing Parsing extracts software elements from a textual representation. First, a lexer identifies sequences of characters that form lexical units, each representing a token defined in a grammar. Afterwards, a parser builds the actual elements of a software model. At this point in the process, containment relationships between these elements can be detected if they are defined in the grammar. Thus, the parsing phase provides a model of the software elements represented in the source code.
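The two parsing steps can be illustrated with a minimal sketch. The toy grammar, the `Element` class, and all other names below are assumptions made for illustration only and are not part of the thesis or of any existing tool. The lexer groups characters into tokens, and a recursive descent parser builds model elements whose containment relationships follow directly from the grammar.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy grammar (an assumption for illustration):
#   unit   := class*
#   class  := "class" NAME "{" member* "}"
#   member := "field" NAME ";" | "method" NAME "{" "}"
TOKEN = re.compile(r"\s*(?:(?P<word>[A-Za-z_]\w*)|(?P<punct>[{};]))")


def lex(source):
    """Lexer: group characters into the tokens defined by the grammar."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(match.group("word") or match.group("punct"))
        pos = match.end()
    return tokens


@dataclass
class Element:
    """A software model element; `children` holds contained elements."""
    kind: str
    name: str
    children: list = field(default_factory=list)


class Parser:
    """Recursive descent parser that builds the element model."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def take_name(self):
        token = self.take()
        if not token.isidentifier():
            raise SyntaxError(f"expected a name, got {token!r}")
        return token

    def parse_unit(self):
        unit = Element("unit", "<root>")
        while self.peek() is not None:
            unit.children.append(self.parse_class())
        return unit

    def parse_class(self):
        self.take("class")
        cls = Element("class", self.take_name())
        self.take("{")
        while self.peek() != "}":
            # Containment: members become children of the class element.
            cls.children.append(self.parse_member())
        self.take("}")
        return cls

    def parse_member(self):
        kind = self.take()
        if kind not in ("field", "method"):
            raise SyntaxError(f"expected a member, got {kind!r}")
        member = Element(kind, self.take_name())
        if kind == "field":
            self.take(";")
        else:
            self.take("{")
            self.take("}")
        return member


model = Parser(lex("class Account { field balance; method deposit { } }")).parse_unit()
```

The resulting `model` contains only containment relationships (unit, class, members). References between elements, such as a method using a field, are not part of it and are left to the resolving phase.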
Resolving Resolving is done after parsing has finished. It is the process of identifying references between software elements other than the containment relationships defined in the grammar (i.e., cross references). Resolving such references typically requires evaluating the scope of an element and thus technology-specific logic. For example, in the Java programming language, resolving a reference to a variable requires taking the current context of the variable identifier, such as a method body or conditional statement, into account. Accordingly, resolving requires more processing effort than the preceding parsing.
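Scope-based resolving can be sketched as a lookup through nested scopes. The `Scope` class and the example declarations below are hypothetical and strongly simplified. Real Java resolution additionally has to consider imports, inheritance, overloading, and visibility, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """A lexical scope that knows its enclosing scope (hypothetical model)."""
    name: str
    parent: Optional["Scope"] = None
    declarations: dict = field(default_factory=dict)

    def declare(self, identifier, element):
        self.declarations[identifier] = element

    def resolve(self, identifier):
        """Search from the innermost scope outwards for a declaration."""
        scope = self
        while scope is not None:
            if identifier in scope.declarations:
                return scope.declarations[identifier]
            scope = scope.parent
        raise LookupError(f"cannot resolve {identifier!r} from {self.name!r}")


# Class body, method body, and a conditional statement nested in each other.
class_scope = Scope("class Account")
class_scope.declare("balance", "field Account.balance")

method_scope = Scope("method deposit", parent=class_scope)
method_scope.declare("amount", "parameter deposit.amount")

if_scope = Scope("if statement", parent=method_scope)
if_scope.declare("balance", "local variable balance (if block)")

# The same identifier resolves differently depending on its context.
inside_if = if_scope.resolve("balance")
inside_method = method_scope.resolve("balance")
```

Here, `inside_if` refers to the local variable while `inside_method` refers to the field. This shows why the context of an identifier must be evaluated before a reference can be set.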
Partial Program Analysis To cope with the processing effort, Dagenais and Hendren [39] proposed Partial Program Analysis as a technique to analyze parts of a program without resolving all dependencies. In particular, they propose not only a lazy resolving strategy but also a way to cope with bindings that cannot be resolved unambiguously. Whether this technique can be applied depends on the individual analysis.
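The following sketch illustrates only the two ideas named above: resolving on demand, and tolerating bindings that cannot be resolved. It is not the algorithm of Dagenais and Hendren [39]. The `Binding` and `LazyResolver` classes and the example declarations are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Binding:
    """Result of a resolution attempt; `target` is None if unknown."""
    identifier: str
    target: Optional[str]

    @property
    def resolved(self):
        return self.target is not None


class LazyResolver:
    """Resolves identifiers only when an analysis asks for them."""

    def __init__(self, known_declarations):
        self._known = dict(known_declarations)
        self._cache = {}
        self.lookups = 0  # counts the resolution attempts actually performed

    def resolve(self, identifier):
        if identifier not in self._cache:
            self.lookups += 1
            # An unknown identifier yields an unresolved binding
            # instead of aborting the whole analysis.
            target = self._known.get(identifier)
            self._cache[identifier] = Binding(identifier, target)
        return self._cache[identifier]


# Only part of the program is available: `Logger` has no known declaration.
resolver = LazyResolver({"Account": "bank.Account", "List": "java.util.List"})

account = resolver.resolve("Account")
logger = resolver.resolve("Logger")
resolver.resolve("Account")  # answered from the cache, no new lookup
```

An analysis using such a resolver has to decide for itself how to treat unresolved bindings, for example by skipping them or by reporting them as uncertain. This corresponds to the statement above that the applicability depends on the individual analysis.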
2.4.6.2. Software Models
Software models as used in this thesis conform to traditional Abstract Syntax Tree (AST) models. For a consistent use of models throughout the SPLEVO approach, software models conforming to the EMOF/Ecore specifications (see Section 2.1.3) are used. Existing solutions for such Ecore-based models are designed in three different ways, as summarized in Table 2.7.
OMG KDM standardization initiative The OMG Architecture-Driven Modernization Task Force has developed the KDM standard for software reverse engineering [139]. The KDM standard includes a metamodel for Abstract Syntax Tree Models [138]. As shown in Figure 2.8, the OMG defined a metamodel system to represent language-independent ASTs (i.e., Generic Abstract Syntax Tree Model - GAST), language-specific ASTs (i.e., Specific Abstract Syntax Tree Model - SAST), and proprietary ASTs (i.e., Proprietary Abstract Syntax Tree Model - PAST). Table 2.8, taken from the specification [138, page 10], summarizes the purpose of these different metamodels.
Approach | Description | Example
-------- | ----------- | -------
Top Down | The model is designed first. The textual syntax is either derived or the extraction needs to translate between them. | OMG KDM [139, 138]
Bottom Up | Parser oriented. The textual syntax exists first and the model is designed according to the grammar of the language. | JaMoPP [78]
IDE Oriented | The model is derived from the internal model used within an IDE and is aligned neither to a grammar nor to the purpose of the model. | MoDisco [26]
Table 2.7.: Ecore software model design approaches
Available implementations The OMG provided only the specification for the metamodels. However, it refers to the Eclipse MoDisco project [26] as the de facto reference implementation of the OMG KDM specification [139].
The MoDisco project provides an Ecore-based implementation of the overarching KDM metamodel. However, this model is not integrated with the AST model provided by the MoDisco model extractor. As shown in Table 2.7, the model extracted by MoDisco is aligned to the IDE-internal model (i.e., Eclipse JDT AST) and does not conform to the OMG AST metamodel. Thus, no implementation conforming to the OMG AST specification is available yet.
Name | Title | Description
---- | ----- | -----------
GASTM | Generic AST Model | A generic set of language modeling elements common across numerous languages establishes a common core for language modeling, called the Generic Abstract Syntax Trees. In this specification, the GASTM model elements are expressed as UML class diagrams.
SASTM | Language Specific AST Models | Metamodels for particular languages such as Ada, C, Fortran, Java, etc. are modeled in Meta Object Facility (MOF) or MOF compatible forms and expressed as the GASTM along with modeling element extensions sufficient to capture the language.
PASTM | Proprietary AST Models | Metamodels that express ASTs for languages such as Ada, C, COBOL, etc. modeled in formats that are not consistent with MOF, the GASTM, or SASTM. For such proprietary AST, this specification defines the minimum conformance specifications needed to support model interchange.
Table 2.8.: AST Models of the OMG AST specification [138, page 10]
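The relation between a generic core and language-specific extensions can be sketched with plain class inheritance. The class names below are hypothetical and are not taken from the OMG specification [138]. They only illustrate that an analysis written against the generic level also works on models containing language-specific elements.

```python
from dataclasses import dataclass, field


@dataclass
class GenericNode:
    """Generic level: an element common across languages (hypothetical)."""
    name: str
    children: list = field(default_factory=list)


@dataclass
class GenericTypeDeclaration(GenericNode):
    """Generic level: a type declaration without language specifics."""


@dataclass
class JavaEnumDeclaration(GenericTypeDeclaration):
    """Language-specific level: extends the generic element for Java."""
    constants: list = field(default_factory=list)


def type_names(node):
    """Analysis written against the generic level only."""
    names = [node.name] if isinstance(node, GenericTypeDeclaration) else []
    for child in node.children:
        names.extend(type_names(child))
    return names


unit = GenericNode("unit", [
    GenericTypeDeclaration("Shape"),
    JavaEnumDeclaration("Color", constants=["RED", "GREEN"]),
])
found = type_names(unit)
```

The analysis finds both declarations although it knows nothing about Java enumerations. An analysis that needs the enumeration constants would have to be written against the language-specific level instead.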
[Figure: a source code repository (Source Code 1 to Source Code n) with associated PASTs, and three ASTM levels: the ASTM Meta-Meta Model (ASTM Core), the ASTM Meta-Model (GASTM Core, SASTM Pkg 1 to SASTM Pkg n), and the ASTM Model (GAST Core, SAST Pkg 1 to SAST Pkg n).]
Figure 2.8.: OMG AST metamodel structure [138]