Extracting Architectural Features from Source Code*
6. Experience
128 D.R. HARRIS, H.B. REUBENSTEIN, A.S. YEH
In summary, recognizer authors build the RRL descriptions using the RRL language constructs and special analysis functions. They set pre-conditions and environment attributes to link the recognizer into the library. At this time they may add the new recognizer's name to default recognizer lists for the style-level entities/relations.
Subsequently, during an investigation, an analyst retrieves the recognizer by selecting an entity/relation with a default, by recognizer name, by indicating a text fragment of the description, or by indicating the effect desired. The implementation recursively runs recognizers in the pre-condition attribute, asks the analyst to set any of the required parameters, and interprets the RRL code in the recognizer's method.
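The retrieval and execution steps just described can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `Recognizer`, `RecognizerLibrary`, `find_by_text`, and `run` are hypothetical, recognizer methods are plain Python callables rather than interpreted RRL, and retrieval by desired effect is omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Sequence

Results = Dict[str, Any]  # recognizer name -> result already computed
Params = Dict[str, Any]   # parameter name -> value supplied by the analyst


@dataclass
class Recognizer:
    """Hypothetical library entry; the field names are illustrative."""
    name: str
    description: str
    method: Callable[[Results, Params], Any]
    preconditions: List[str] = field(default_factory=list)  # recognizers to run first
    parameters: List[str] = field(default_factory=list)     # values the analyst must set


class RecognizerLibrary:
    def __init__(self) -> None:
        self.by_name: Dict[str, Recognizer] = {}
        self.defaults: Dict[str, List[str]] = {}  # style entity/relation -> recognizer names

    def add(self, rec: Recognizer, default_for: Sequence[str] = ()) -> None:
        self.by_name[rec.name] = rec
        for style_item in default_for:
            self.defaults.setdefault(style_item, []).append(rec.name)

    def find_by_text(self, fragment: str) -> List[Recognizer]:
        """Retrieve recognizers whose description contains a text fragment."""
        needle = fragment.lower()
        return [r for r in self.by_name.values() if needle in r.description.lower()]

    def run(self, name: str, ask: Callable[[str], Any],
            results: Optional[Results] = None) -> Any:
        """Run pre-conditions recursively, ask for parameters, then run the method."""
        results = {} if results is None else results
        if name not in results:  # each recognizer runs at most once per investigation
            rec = self.by_name[name]
            for pre in rec.preconditions:
                self.run(pre, ask, results)
            params = {p: ask(p) for p in rec.parameters}
            results[name] = rec.method(results, params)
        return results[name]


# Hypothetical usage: one utility recognizer feeding a style-level recognizer.
lib = RecognizerLibrary()
lib.add(Recognizer("calls", "all procedure calls",
                   lambda results, params: ["main->open_conn", "main->log_msg"]))
lib.add(Recognizer("service-calls", "calls whose text matches a pattern",
                   lambda results, params: [c for c in results["calls"]
                                            if params["pattern"] in c],
                   preconditions=["calls"], parameters=["pattern"]),
        default_for=["client/server"])

default_name = lib.defaults["client/server"][0]
found = lib.run(default_name, ask=lambda param: "open_conn")
```

The sketch caches results by recognizer name so that a utility recognizer shared by several pre-condition lists runs only once. It assumes the pre-condition graph is acyclic and performs no cycle check.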
If the analyst employed the recognizer in architecture recovery, the results are added to the as-built architecture with respect to some style. We provide additional support via specialization hierarchies among the architectural entities and relations. Upon finding that few examples of an architectural feature are recognized, the analyst has the option of expanding a search by following generalization and specialization links and searching for architecturally related information. This capability complements the recognizer indexing scheme based on code level relationships.
tasks. Figure 12 is actually a view of a thinned-out XSN: several tasks and data files have been removed to reduce diagram clutter. In the view's legend, "query" is another term for "recognizer".
We next looked for layering structure. We attempted an approach that bundled up cycles within the procedure calling hierarchy but otherwise used the procedure calling hierarchy in its entirety. This approach led to little reduction over the basic structure chart report.
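One way to picture the cycle-bundling step is to collapse each group of mutually recursive procedures into a single node and then assign layers by call depth. The Python sketch below does this with Tarjan's strongly connected components algorithm; the choice of algorithm, the function names `bundle_cycles` and `layer_bundles`, and the toy call graph are all assumptions made for illustration, since the text does not say how the bundling was implemented.

```python
from typing import Dict, FrozenSet, List, Set


def bundle_cycles(calls: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Group mutually recursive procedures (Tarjan's SCC algorithm).

    Bundles are returned callees-first: a bundle appears only after
    every bundle it can reach through calls.
    """
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    bundles: List[FrozenSet[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in calls.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a bundle
            members = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                members.add(w)
                if w == v:
                    break
            bundles.append(frozenset(members))

    nodes = set(calls) | {w for ws in calls.values() for w in ws}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return bundles


def layer_bundles(calls: Dict[str, Set[str]]) -> Dict[FrozenSet[str], int]:
    """Layer 0 holds bundles that call nothing outside themselves."""
    bundles = bundle_cycles(calls)
    owner = {p: b for b in bundles for p in b}
    level: Dict[FrozenSet[str], int] = {}
    for b in bundles:  # callee bundles are already levelled
        callees = {owner[w] for p in b for w in calls.get(p, ()) if owner[w] != b}
        level[b] = 1 + max((level[c] for c in callees), default=-1)
    return level


# Hypothetical call graph: parse and lex call each other.
call_graph = {
    "main": {"parse", "draw"},
    "parse": {"lex"},
    "lex": {"parse"},
    "draw": {"xlib_call"},
}
levels = layer_bundles(call_graph)
```

In this toy graph only `parse` and `lex` are merged, so the layered view has nearly as many nodes as the call graph itself, which is consistent with the small reduction reported above. The recursive `visit` is adequate for small examples; a large call graph would need an iterative version.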
We felt that additional clustering was possible using either deeper dominance analysis or domain knowledge, but we did not pursue these approaches. We did build some preliminary capabilities based on advertised APIs for commercial subsystems or program layers. These capabilities found portions of the code identified as users of some API. One predominant example, particularly informative for XSN, was the code that accesses the underlying X window system. We have not yet been able to implement a method that would combine such bottom-up recognition with more globally-based layering recovery methods.
XSN acts as a client (sometimes a server) in its interactions with network services such as sendmail or ftp. A service-invocation recognizer shown in Figure 13 recovered elements of this style successfully. Over time we made several enhancements to the recognizer to improve its explanatory power. First, we refined its ability to identify the source of an interaction. The notion we settled on was to identify the procedures that set port numbers (i.e., indicate the service to be contacted) rather than the procedure containing the service invocation call. Second, we enhanced the recognizer so that it would recognize a certain pattern of complex but stereotypical client/server interaction. In this pattern, we see the client setting up a second communication channel in which it now acts as the server. It was necessary to recognize this pattern in order to identify the correct external program associated with the second channel.
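The core loop of the service-invocation recognizer in Figure 13 can be paraphrased in Python as follows. This is a sketch under assumptions rather than a translation of the RRL semantics: each element returned by `where-ports-set` is read as a pair (statement that sets the port, port number), `prepend` is read as inserting at the front of the list, and the analysis functions are passed in as callables backed by hypothetical lookup tables. The second-channel pattern described above is not modeled.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed reading of a Figure 13 "port": (statement that sets it, port number).
PortSetting = Tuple[str, int]


def recognize_service_invocations(
    calls: Iterable[str],
    where_ports_set: Callable[[str], Iterable[PortSetting]],
    service_at: Callable[[int], str],
    enclosing_procedure: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (call, target service, procedure that set the port) triples."""
    result: List[Tuple[str, str, str]] = []
    for call in calls:
        for stmt, port in where_ports_set(call):
            target = service_at(port)
            proc = enclosing_procedure(stmt)
            result.insert(0, (call, target, proc))  # mirrors prepend(result, ...)
    return result


# Hypothetical analysis results standing in for the real code database.
service_calls = ["connect@mail.c:88", "connect@xfer.c:140"]
port_settings = {
    "connect@mail.c:88": [("mail.c:61", 25)],
    "connect@xfer.c:140": [("xfer.c:97", 21)],
}
procedure_of = {"mail.c:61": "init_mail", "xfer.c:97": "open_transfer"}
well_known = {21: "ftp", 25: "smtp"}

triples = recognize_service_invocations(
    service_calls,
    where_ports_set=lambda call: port_settings.get(call, []),
    service_at=lambda port: well_known.get(port, "unknown"),
    enclosing_procedure=lambda stmt: procedure_of[stmt],
)
```

Reporting the procedure that sets the port, rather than the one containing the invocation call, follows the first enhancement described above: it attributes the interaction to the code that chooses which service is contacted.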
At this point, we inspected the code to see if there were any obvious gaps in system coverage by the as-built architecture we had found. We discovered that there were several large blocks of code that did not participate in any of the styles. Examining the code made it clear that the developers had implemented several abstract data types (tables and lists).
Thus, we set about building and applying OBAD to the XSN system. These table and list abstractions were recognized interactively by our OBAD sub-tool (see Section 4.3).
We developed over sixty recognizers for this analysis. Thirteen were used for client/server recovery, seven for task spawning, nine for some form of layering, four for repository, seven for code level features, two for ADT recovery, and one for implicit invocation.
Thirteen of the recognizers were utilities producing intermediate results that could be used for recovering features of multiple styles. The library also contains seven recognizers that make some simplifying assumptions in order to approximate the results of more computationally intensive recognizers. These recognizers proved to be particularly useful in situations where it was not possible to obtain a complete program slice (Section 4.1).
Since the above profile of recognizers is based on recognition adequacy with respect to only a few systems, the numbers should be taken in context. What is important is that they indicate the need for serious recognition library management of the form we have described in this paper.
Figure 12. Task spawning view of (a thinned-out version of) XSN
let (result = [])
  for-every call in invocations-of-type('service-invocations)
    for-each port in where-ports-set(call)
      let (target = service-at(second(port)))
        let (proc = enclosing-procedure(first(port)))
          (result <- prepend(result, [call, target, proc])),
  result
Figure 13. A service recognizer uses the invocations-of-type construct.
We feel that we have gone a long way toward recognizing standard C/Unix idioms for encoding architectural features. We are still at a stage where each new system we analyze requires some modifications to our implementation, but the number of required modifications is decreasing. In one case, we encoded a new architecture style called "context" (showing the relationship between system processes and the connections to external files and devices) as a means to best describe a new system's software architecture. We were able to recognize all features of this style by just authoring one new recognizer and reusing several others. More frequently, we have found that the set of recognizers is adequate but we need to refine existing recognizers to account for subtleties that we had not seen before.
Table 2 summarizes the amount of code in XSN covered when viewed with respect to the various styles. The first row gives the percentage of the lines of code used in the connectors for that style. The second row gives the percentage of the procedures covered by that style.
A procedure is covered if it is included in some component in that style.
Table 2. Code coverage measures for XSN
Style:             ADT    API    C/S    Repository   Task Spawning
% connector LOC:   0      0      0.3    2.2          0.7
% of procedures:   39.3   13.9   3.3    13.1         2.5
Combining all the styles in Table 2 yields a total connector coverage of about 3% of the lines of code and a procedure coverage of over 47%. The procedure coverage total is less than the sum of the per-style figures in the table because the same procedure may be covered by multiple styles.
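The arithmetic behind these totals can be checked with a short Python sketch. The per-style percentages are the ones in Table 2; the function `procedure_coverage` and the toy procedure sets are hypothetical and serve only to show why coverage combined across styles is computed over a union of procedures rather than by adding percentages. Treating connector LOC as additive assumes that connectors of different styles do not share lines, which the text does not state.

```python
from typing import Dict, Set

# Percentages from Table 2.
connector_loc = {"ADT": 0.0, "API": 0.0, "C/S": 0.3,
                 "Repository": 2.2, "Task Spawning": 0.7}
procedures = {"ADT": 39.3, "API": 13.9, "C/S": 3.3,
              "Repository": 13.1, "Task Spawning": 2.5}

connector_total = round(sum(connector_loc.values()), 1)  # 3.2, i.e. about 3%
naive_procedure_sum = round(sum(procedures.values()), 1)  # 72.1, above the reported 47%


def procedure_coverage(components_by_style: Dict[str, Set[str]],
                       all_procedures: Set[str]) -> float:
    """Percentage of procedures included in some component of any style."""
    covered = set().union(*components_by_style.values())
    return 100.0 * len(covered & all_procedures) / len(all_procedures)


# Hypothetical system of four procedures; "b" is covered by both styles.
system = {"a", "b", "c", "d"}
by_style = {"ADT": {"a", "b"}, "Repository": {"b", "c"}}
combined = procedure_coverage(by_style, system)            # 75.0
per_style = {s: procedure_coverage({s: ps}, system)        # 50.0 each
             for s, ps in by_style.items()}
```

In the toy system each style covers 50% of the procedures, yet together they cover 75% rather than 100%, because the shared procedure is counted once. The same effect accounts for the gap between the 72.1% column sum and the reported combined figure.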
We offer these statistics as elementary examples of architectural recovery metrics. This endeavor is important both to determine the effectiveness of the style representations (e.g., what is the value-added of authoring a new style) and to provide an indicator for analysts of how well they have done in understanding the system under analysis.
The measures we provide are potentially subject to some misinterpretation. It is difficult to determine how strongly a system exhibits a style and how predictive that style is of the entire body of code. As an extreme example, one could fit an entire system into one layer. This style mapping is perfectly legal and covers the whole system, but provides no abstraction of the system and no detailed explanation of the components.
In spite of these limits, there are experimental and programmatic advantages to defining code coverage metrics. The maintenance community can benefit from discussion on establishing reasonable measures of progress toward understanding large systems.