
Understanding Interleaved Code

1. Introduction

Automated Software Engineering, 3, 47-76 (1996)

© 1996 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

48 RUGABER, STIREWALT, AND WILLS

the problem being solved may not be explicitly stated in the program text; nor is the rationale the programmer had for choosing the particular solution usually visible.

This paper is concerned with a specific difficulty that arises when trying to answer "why" questions about computer programs. In particular, it is concerned with the phenomenon of interleaving, in which one section of a program accomplishes several purposes, and disentangling the code responsible for each purpose is difficult. Unraveling interleaved code involves discovering the purpose of each strand of computation, as well as understanding why the programmer decided to interleave the strands. To demonstrate this problem, we examine an example program in a step-by-step fashion, trying to answer the questions "why is this program the way it is?" and "what makes it difficult to understand?"

1.1. NPEDLN

The Fortran program, called NPEDLN, is part of the SPICELIB library obtained from the Jet Propulsion Laboratory and intended to help space scientists analyze data returned from space missions. The acronym NPEDLN stands for Nearest Point on Ellipsoid to Line. The ellipsoid is specified by the lengths of its three semi-axes (A, B, and C), which are oriented with the x, y, and z coordinate axes. The line is specified by a point (LINEPT) and a direction vector (LINEDR). The nearest point is contained in a variable called PNEAR. The full program consists of 565 lines; an abridged version can be found in the Appendix with a brief description of subroutines it calls and variables it uses. The executable statements, with comments and declarations removed, are shown in Figure 1.

The lines of code in NPEDLN that actually compute the nearest point are somewhat hard to locate. One reason for this has to do with error checking. It turns out that SPICELIB includes an elaborate mechanism for reporting and recovering from errors, and roughly half of the code in NPEDLN is used for this purpose. We have indicated those lines by shading in Figure 2.

The important point to note is that although it is natural to program in a way that intersperses error checks with computational code, it is not necessary to do so. In principle, an entirely separate routine could be constructed to make the checks, and NPEDLN called only when all the checks are passed. Although this approach would require redundant computation and potentially more total lines of code, the resultant computations in NPEDLN would be shorter and easier to follow.
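The separation described above can be sketched as follows. This is a minimal illustration in Python rather than Fortran, and the names `InvalidInput`, `check_inputs`, `unit_direction`, and `checked_unit_direction` are hypothetical; none of them belongs to SPICELIB. The two tests mirror NPEDLN's zero-vector and semi-axis checks, and the magnitude is deliberately computed twice to show the redundant computation that the separation costs.

```python
import math


class InvalidInput(ValueError):
    """Hypothetical stand-in for SPICELIB's error signalling."""


def check_inputs(a, b, c, linedr):
    """Error-checking plan, kept apart from the computation."""
    mag = math.sqrt(sum(x * x for x in linedr))  # computed here ...
    if mag == 0.0:
        raise InvalidInput("Line direction vector is the zero vector.")
    if a <= 0.0 or b <= 0.0 or c <= 0.0:
        raise InvalidInput(f"Semi-axes: A = {a}, B = {b}, C = {c}.")


def unit_direction(linedr):
    """Computational plan: assumes the inputs already passed check_inputs."""
    mag = math.sqrt(sum(x * x for x in linedr))  # ... and again here
    return [x / mag for x in linedr]


def checked_unit_direction(a, b, c, linedr):
    """Run the checks first, then the unguarded computation."""
    check_inputs(a, b, c, linedr)
    return unit_direction(linedr)
```

The computational routine contains no error handling at all, which is what makes it shorter and easier to read; the price is the repeated magnitude calculation.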

In some sense, the error handling code and the rest of the routine realize independent plans. We use the term plan to denote a description or representation of a computational structure that the designers have proposed as a way of achieving some purpose or goal in a program. This definition is distilled from definitions in (Letovsky and Soloway, 1986, Rich and Waters, 1990, Selfridge et al., 1993). Note that a plan is not necessarily stereotypical or used repeatedly; it may be novel or idiosyncratic. Following (Rich and Waters, 1990, Selfridge et al., 1993), we reserve the term cliche for a plan that represents a standard, stereotypical form, which can be detected by recognition techniques, such as (Hartman, 1991, Letovsky, 1988, Kozaczynski and Ning, 1994, Quilici, 1994, Rich and Wills, 1990, Wills, 1992). Plans can occur at any level of abstraction from architectural overviews to code. By extracting the error checking plan from NPEDLN, we get the much smaller and, presumably, more understandable program shown in Figure 3.


      SUBROUTINE NPEDLN ( A, B, C, LINEPT, LINEDR, PNEAR, DIST )
      IF ( RETURN () ) THEN
         RETURN
      ELSE
         CALL CHKIN ( 'NPEDLN' )
      END IF
      CALL UNORM ( LINEDR, UDIR, MAG )
      IF ( MAG .EQ. 0 ) THEN
         CALL SETMSG ( 'Line direction vector is the zero vector. ' )
         CALL SIGERR ( 'SPICE(ZEROVECTOR)' )
         CALL CHKOUT ( 'NPEDLN' )
         RETURN
      ELSE IF (      ( A .LE. 0.D0 )
     .          .OR. ( B .LE. 0.D0 )
     .          .OR. ( C .LE. 0.D0 ) ) THEN
         CALL SETMSG ( 'Semi-axes: A = #, B = #, C = #. ' )
         CALL ERRDP  ( '#', A )
         CALL ERRDP  ( '#', B )
         CALL ERRDP  ( '#', C )
         CALL SIGERR ( 'SPICE(INVALIDAXISLENGTH)' )
         CALL CHKOUT ( 'NPEDLN' )
         RETURN
      END IF
      SCALE = MAX ( DABS(A), DABS(B), DABS(C) )
      SCLA  = A / SCALE
      SCLB  = B / SCALE
      SCLC  = C / SCALE
      IF (      ( SCLA**2 .LE. 0.D0 )
     .     .OR. ( SCLB**2 .LE. 0.D0 )
     .     .OR. ( SCLC**2 .LE. 0.D0 ) ) THEN
         CALL SETMSG ( 'Semi-axis too small: A = #, B = #, C = #. ' )
         CALL ERRDP  ( '#', A )
         CALL ERRDP  ( '#', B )
         CALL ERRDP  ( '#', C )
         CALL SIGERR ( 'SPICE(DEGENERATECASE)' )
         CALL CHKOUT ( 'NPEDLN' )
         RETURN
      END IF
C     Scale LINEPT.
      SCLPT(1) = LINEPT(1) / SCALE
      SCLPT(2) = LINEPT(2) / SCALE
      SCLPT(3) = LINEPT(3) / SCALE
      CALL VMINUS ( UDIR, OPPDIR )
      CALL SURFPT ( SCLPT, UDIR, SCLA, SCLB, SCLC, PT(1,1), FOUND(1) )
      CALL SURFPT ( SCLPT, OPPDIR, SCLA,
     .              SCLB, SCLC, PT(1,2), FOUND(2) )
      DO 50001 I = 1, 2
         IF ( FOUND(I) ) THEN
            DIST = 0.0D0
            CALL VEQU   ( PT(1,I), PNEAR )
            CALL VSCL   ( SCALE, PNEAR, PNEAR )
            CALL CHKOUT ( 'NPEDLN' )
            RETURN
         END IF
50001 CONTINUE
C
      NORMAL(1) = UDIR(1) / SCLA**2
      NORMAL(2) = UDIR(2) / SCLB**2
      NORMAL(3) = UDIR(3) / SCLC**2
      CALL NVC2PL ( NORMAL, 0.D0, CANDPL )
      CALL INEDPL ( SCLA, SCLB, SCLC, CANDPL,
     .              CAND, XFOUND )
      IF ( .NOT. XFOUND ) THEN
         CALL SETMSG ( 'Candidate ellipse could not be found.' )
         CALL SIGERR ( 'SPICE(DEGENERATECASE)' )
         CALL CHKOUT ( 'NPEDLN' )
         RETURN
      END IF
      CALL NVC2PL ( UDIR, 0.D0, PRJPL )
      CALL PJELPL ( CAND, PRJPL, PRJEL )
      CALL VPRJP  ( SCLPT, PRJPL, PRJPT )
      CALL NPELPT ( PRJPT, PRJEL, PRJNPT )
      DIST = VDIST ( PRJNPT, PRJPT )
      CALL VPRJPI ( PRJNPT, PRJPL, CANDPL, PNEAR, IFOUND )
      IF ( .NOT. IFOUND ) THEN
         CALL SETMSG ( 'Inverse projection could not be found.' )
         CALL SIGERR ( 'SPICE(DEGENERATECASE)' )
         CALL CHKOUT ( 'NPEDLN' )
         RETURN
      END IF
      CALL VSCL ( SCALE, PNEAR, PNEAR )
      DIST = SCALE * DIST
      CALL CHKOUT ( 'NPEDLN' )
      RETURN
      END

Figure 1. NPEDLN minus comments and declarations.


      SUBROUTINE NPEDLN ( A, B, C, LINEPT, LINEDR, PNEAR, DIST )
>     IF ( RETURN () ) THEN
>        RETURN
>     ELSE
>        CALL CHKIN ( 'NPEDLN' )
>     END IF
      CALL UNORM ( LINEDR, UDIR, MAG )
>     IF ( MAG .EQ. 0 ) THEN
>        CALL SETMSG ( 'Line direction vector is the zero vector. ' )
>        CALL SIGERR ( 'SPICE(ZEROVECTOR)' )
>        CALL CHKOUT ( 'NPEDLN' )
>        RETURN
>     ELSE IF (      ( A .LE. 0.D0 )
>    .          .OR. ( B .LE. 0.D0 )
>    .          .OR. ( C .LE. 0.D0 ) ) THEN
>        CALL SETMSG ( 'Semi-axes: A = #, B = #, C = #. ' )
>        CALL ERRDP  ( '#', A )
>        CALL ERRDP  ( '#', B )
>        CALL ERRDP  ( '#', C )
>        CALL SIGERR ( 'SPICE(INVALIDAXISLENGTH)' )
>        CALL CHKOUT ( 'NPEDLN' )
>        RETURN
>     END IF
      SCALE = MAX ( DABS(A), DABS(B), DABS(C) )
      SCLA  = A / SCALE
      SCLB  = B / SCALE
      SCLC  = C / SCALE
>     IF (      ( SCLA**2 .LE. 0.D0 )
>    .     .OR. ( SCLB**2 .LE. 0.D0 )
>    .     .OR. ( SCLC**2 .LE. 0.D0 ) ) THEN
>        CALL SETMSG ( 'Semi-axis too small: A = #, B = #, C = #. ' )
>        CALL ERRDP  ( '#', A )
>        CALL ERRDP  ( '#', B )
>        CALL ERRDP  ( '#', C )
>        CALL SIGERR ( 'SPICE(DEGENERATECASE)' )
>        CALL CHKOUT ( 'NPEDLN' )
>        RETURN
>     END IF
C     Scale LINEPT.
      SCLPT(1) = LINEPT(1) / SCALE
      SCLPT(2) = LINEPT(2) / SCALE
      SCLPT(3) = LINEPT(3) / SCALE
      CALL VMINUS ( UDIR, OPPDIR )
      CALL SURFPT ( SCLPT, UDIR, SCLA, SCLB, SCLC, PT(1,1), FOUND(1) )
      CALL SURFPT ( SCLPT, OPPDIR, SCLA,
     .              SCLB, SCLC, PT(1,2), FOUND(2) )
      DO 50001 I = 1, 2
         IF ( FOUND(I) ) THEN
            DIST = 0.0D0
            CALL VEQU   ( PT(1,I), PNEAR )
            CALL VSCL   ( SCALE, PNEAR, PNEAR )
>           CALL CHKOUT ( 'NPEDLN' )
            RETURN
         END IF
50001 CONTINUE
C
      NORMAL(1) = UDIR(1) / SCLA**2
      NORMAL(2) = UDIR(2) / SCLB**2
      NORMAL(3) = UDIR(3) / SCLC**2
      CALL NVC2PL ( NORMAL, 0.D0, CANDPL )
      CALL INEDPL ( SCLA, SCLB, SCLC, CANDPL,
     .              CAND, XFOUND )
>     IF ( .NOT. XFOUND ) THEN
>        CALL SETMSG ( 'Candidate ellipse could not be found.' )
>        CALL SIGERR ( 'SPICE(DEGENERATECASE)' )
>        CALL CHKOUT ( 'NPEDLN' )
>        RETURN
>     END IF
      CALL NVC2PL ( UDIR, 0.D0, PRJPL )
      CALL PJELPL ( CAND, PRJPL, PRJEL )
      CALL VPRJP  ( SCLPT, PRJPL, PRJPT )
      CALL NPELPT ( PRJPT, PRJEL, PRJNPT )
      DIST = VDIST ( PRJNPT, PRJPT )
      CALL VPRJPI ( PRJNPT, PRJPL, CANDPL, PNEAR, IFOUND )
>     IF ( .NOT. IFOUND ) THEN
>        CALL SETMSG ( 'Inverse projection could not be found.' )
>        CALL SIGERR ( 'SPICE(DEGENERATECASE)' )
>        CALL CHKOUT ( 'NPEDLN' )
>        RETURN
>     END IF
      CALL VSCL ( SCALE, PNEAR, PNEAR )
      DIST = SCALE * DIST
>     CALL CHKOUT ( 'NPEDLN' )
      RETURN
      END

Figure 2. Code with error handling highlighted (highlighted lines are marked with > in column 1).

The structure of an understanding process begins to emerge: detect a plan, such as error checking, in the code and extract it, leaving a smaller and more coherent residue for further analysis; document the extracted plan independently; and note the ways in which it interacts with the rest of the code.
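The detect-and-extract step can be approximated, very crudely, by splitting a listing on the names of the routines each statement calls. The sketch below is an assumption-laden illustration in Python and not the analysis used in this paper: `split_plan` and `ERROR_ROUTINES` are hypothetical names, and the match is purely textual. It finds only the direct calls to the error-reporting routines that appear in Figure 1; the IF tests, RETURN statements, and END IF lines that guard those calls would need control-flow analysis to attribute to the plan.

```python
import re

# Error-reporting routines called in the NPEDLN listing.
ERROR_ROUTINES = {"CHKIN", "CHKOUT", "SETMSG", "SIGERR", "ERRDP"}


def split_plan(lines, routines=ERROR_ROUTINES):
    """Split a listing into (extracted plan lines, residual lines).

    A line goes to the extracted plan when it is a CALL to one of the
    named routines; every other line stays in the residue.
    """
    extracted, residue = [], []
    for line in lines:
        match = re.match(r"\s*CALL\s+(\w+)", line, flags=re.IGNORECASE)
        if match and match.group(1).upper() in routines:
            extracted.append(line)
        else:
            residue.append(line)
    return extracted, residue


listing = [
    "      CALL CHKIN ( 'NPEDLN' )",
    "      CALL UNORM ( LINEDR, UDIR, MAG )",
    "      CALL SETMSG ( 'Candidate ellipse could not be found.' )",
    "      CALL SIGERR ( 'SPICE(DEGENERATECASE)' )",
    "      CALL VSCL ( SCALE, PNEAR, PNEAR )",
    "      CALL CHKOUT ( 'NPEDLN' )",
]
extracted, residue = split_plan(listing)
```

Here `residue` keeps the UNORM and VSCL calls, while `extracted` collects the four error-reporting calls for separate documentation.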

      SUBROUTINE NPEDLN ( A, B, C, LINEPT, LINEDR, PNEAR, DIST )
      CALL UNORM ( LINEDR, UDIR, MAG )
      SCALE = MAX ( DABS(A), DABS(B), DABS(C) )
      SCLA  = A / SCALE
      SCLB  = B / SCALE
      SCLC  = C / SCALE
      SCLPT(1) = LINEPT(1) / SCALE
      SCLPT(2) = LINEPT(2) / SCALE
      SCLPT(3) = LINEPT(3) / SCALE
      CALL VMINUS ( UDIR, OPPDIR )
      CALL SURFPT ( SCLPT, UDIR, SCLA, SCLB, SCLC, PT(1,1), FOUND(1) )
      CALL SURFPT ( SCLPT, OPPDIR, SCLA,
     .              SCLB, SCLC, PT(1,2), FOUND(2) )
      DO 50001 I = 1, 2
         IF ( FOUND(I) ) THEN
            DIST = 0.0D0
            CALL VEQU ( PT(1,I), PNEAR )
            CALL VSCL ( SCALE, PNEAR, PNEAR )
            RETURN
         END IF
50001 CONTINUE
      NORMAL(1) = UDIR(1) / SCLA**2
      NORMAL(2) = UDIR(2) / SCLB**2
      NORMAL(3) = UDIR(3) / SCLC**2
      CALL NVC2PL ( NORMAL, 0.D0, CANDPL )
      CALL INEDPL ( SCLA, SCLB, SCLC, CANDPL, CAND, XFOUND )
      CALL NVC2PL ( UDIR, 0.D0, PRJPL )
      CALL PJELPL ( CAND, PRJPL, PRJEL )
      CALL VPRJP  ( SCLPT, PRJPL, PRJPT )
      CALL NPELPT ( PRJPT, PRJEL, PRJNPT )
      DIST = VDIST ( PRJNPT, PRJPT )
      CALL VPRJPI ( PRJNPT, PRJPL, CANDPL, PNEAR, IFOUND )
      CALL VSCL ( SCALE, PNEAR, PNEAR )
      DIST = SCALE * DIST
      RETURN
      END

Figure 3. The residual code without the error handling plan.

We can apply this approach further to NPEDLN's residual code in Figure 3. NPEDLN has a primary goal of computing the nearest point on an ellipsoid to a specified line. It also has a related goal of ensuring that the computations involved have stable numerical behavior; that is, that the computations are accurate in the presence of a wide range of numerical inputs. A standard trick in numerical programming for achieving stability is to scale the data involved in a computation, perform the computation, and then unscale the results. The code responsible for doing this in NPEDLN is scattered throughout the program's text. It is highlighted in the excerpt shown in Figure 4.

The delocalized nature of this "scale-unscale" plan makes it difficult to gather together all the pieces involved for consistent maintenance. It also gets in the way of understanding the rest of the code, since it provides distractions that must be filtered out. Letovsky and Soloway's cognitive study (Letovsky and Soloway, 1986) shows the deleterious effects of delocalization on comprehension and maintenance.
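The scale, compute, unscale trick itself can be illustrated on a much smaller problem than NPEDLN's. The following Python sketch computes a vector's Euclidean length; it is an illustration under our own assumptions, and `scaled_norm` and `naive_norm` are hypothetical names, not SPICELIB routines. Dividing by the largest component keeps the squares near 1, so inputs whose squares would overflow a double are still handled.

```python
import math


def naive_norm(v):
    """Euclidean length computed directly; the squares can overflow."""
    return math.sqrt(sum(x * x for x in v))


def scaled_norm(v):
    """Euclidean length using the scale / compute / unscale pattern."""
    scale = max(abs(x) for x in v)
    if scale == 0.0:
        return 0.0                                   # zero vector: nothing to scale
    scaled = [x / scale for x in v]                  # scale the data
    length = math.sqrt(sum(x * x for x in scaled))   # compute on scaled data
    return scale * length                            # unscale the result


big = [3e200, 4e200]
```

For `big`, `naive_norm` overflows to infinity because each square exceeds the double-precision range, while `scaled_norm` returns a value close to 5e200. Note that even in this tiny function the plan has three separated parts (the division, the computation, the final multiplication), which is the same delocalization seen in NPEDLN.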

When we extract the scale-unscale code from NPEDLN, we are left with the smaller code segment shown in Figure 5 that more directly expresses the program's purpose: computing the nearest point.

There is one further complication, however. It turns out that NPEDLN not only computes the nearest point from a line to an ellipsoid, it also computes the shortest distance between the line and the ellipsoid. This additional output (DIST) is convenient to construct because it can make use of intermediate results obtained while computing the primary output (PNEAR).

This is illustrated in Figure 6. (The computation of DIST using VDIST is actually the last computation performed by the subroutine NPELPT, which NPEDLN calls; we have pulled this computation out of NPELPT for clarity of presentation.)

Note that an alternative way to structure SPICELIB would be to have separate routines for computing the nearest point and the distance. The two routines would each be more coherent, but the common intermediate computations would have to be repeated, both in the code and at runtime.
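The trade-off between one routine with two outputs and two separate routines can be sketched on a simpler geometric problem. The Python example below uses a sphere centred at the origin and an external point instead of NPEDLN's ellipsoid and line, so it is an analogy under our own assumptions and not the SPICELIB algorithm; all function names are hypothetical, and the point is assumed to be nonzero.

```python
import math


def nearest_and_distance(p, r):
    """One routine, two outputs: the magnitude is computed once and shared."""
    mag = math.sqrt(sum(x * x for x in p))   # shared intermediate result
    pnear = [r * x / mag for x in p]         # primary output
    dist = abs(mag - r)                      # secondary output, reuses mag
    return pnear, dist


def nearest_only(p, r):
    """Separate routine for the nearest point; recomputes the magnitude."""
    mag = math.sqrt(sum(x * x for x in p))
    return [r * x / mag for x in p]


def distance_only(p, r):
    """Separate routine for the distance; recomputes the magnitude again."""
    mag = math.sqrt(sum(x * x for x in p))
    return abs(mag - r)
```

Each of the separate routines serves a single purpose and is easier to read, but the magnitude calculation now appears twice in the code and is executed twice when a caller needs both results.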

The "pure" nearest point computation is shown in Figure 7. It is now much easier to see the primary computational purpose of this code.


      SUBROUTINE NPEDLN ( A, B, C, LINEPT, LINEDR, PNEAR, DIST )
      CALL UNORM ( LINEDR, UDIR, MAG )
>     SCALE = MAX ( DABS(A), DABS(B), DABS(C) )
>     SCLA  = A / SCALE
>     SCLB  = B / SCALE
>     SCLC  = C / SCALE
>     SCLPT(1) = LINEPT(1) / SCALE
>     SCLPT(2) = LINEPT(2) / SCALE
>     SCLPT(3) = LINEPT(3) / SCALE
      CALL VMINUS ( UDIR, OPPDIR )
      CALL SURFPT ( SCLPT, UDIR, SCLA, SCLB, SCLC, PT(1,1), FOUND(1) )
      CALL SURFPT ( SCLPT, OPPDIR, SCLA,
     .              SCLB, SCLC, PT(1,2), FOUND(2) )
      DO 50001 I = 1, 2
         IF ( FOUND(I) ) THEN
            DIST = 0.0D0
            CALL VEQU ( PT(1,I), PNEAR )
>           CALL VSCL ( SCALE, PNEAR, PNEAR )
            RETURN
         END IF
50001 CONTINUE
      NORMAL(1) = UDIR(1) / SCLA**2
      NORMAL(2) = UDIR(2) / SCLB**2
      NORMAL(3) = UDIR(3) / SCLC**2
      CALL NVC2PL ( NORMAL, 0.D0, CANDPL )
      CALL INEDPL ( SCLA, SCLB, SCLC, CANDPL,
     .              CAND, XFOUND )
      CALL NVC2PL ( UDIR, 0.D0, PRJPL )
      CALL PJELPL ( CAND, PRJPL, PRJEL )
      CALL VPRJP  ( SCLPT, PRJPL, PRJPT )
      CALL NPELPT ( PRJPT, PRJEL, PRJNPT )
      DIST = VDIST ( PRJNPT, PRJPT )
      CALL VPRJPI ( PRJNPT, PRJPL, CANDPL, PNEAR, IFOUND )
>     CALL VSCL ( SCALE, PNEAR, PNEAR )
>     DIST = SCALE * DIST
      RETURN
      END

Figure 4. Code with scale-unscale plan highlighted (highlighted lines are marked with > in column 1).

      SUBROUTINE NPEDLN ( A, B, C, LINEPT, LINEDR, PNEAR, DIST )
C
      CALL UNORM ( LINEDR, UDIR, MAG )
C
      CALL VMINUS ( UDIR, OPPDIR )
      CALL SURFPT ( SCLPT, UDIR, SCLA, SCLB, SCLC, PT(1,1), FOUND(1) )
      CALL SURFPT ( SCLPT, OPPDIR, SCLA,
     .              SCLB, SCLC, PT(1,2), FOUND(2) )
      DO 50001 I = 1, 2
         IF ( FOUND(I) ) THEN
            DIST = 0.0D0
            CALL VEQU ( PT(1,I), PNEAR )
            RETURN
         END IF
50001 CONTINUE
      NORMAL(1) = UDIR(1) / SCLA**2
      NORMAL(2) = UDIR(2) / SCLB**2
      NORMAL(3) = UDIR(3) / SCLC**2
      CALL NVC2PL ( NORMAL, 0.D0, CANDPL )
      CALL INEDPL ( SCLA, SCLB, SCLC, CANDPL, CAND, XFOUND )
      CALL NVC2PL ( UDIR, 0.D0, PRJPL )
      CALL PJELPL ( CAND, PRJPL, PRJEL )
C
      CALL VPRJP  ( SCLPT, PRJPL, PRJPT )
      CALL NPELPT ( PRJPT, PRJEL, PRJNPT )
      DIST = VDIST ( PRJNPT, PRJPT )
C
      CALL VPRJPI ( PRJNPT, PRJPL, CANDPL, PNEAR, IFOUND )
      RETURN
      END

Figure 5. The residual code without the scale-unscale plan.


      SUBROUTINE NPEDLN ( A, B, C, LINEPT, LINEDR, PNEAR, DIST )
C
      CALL UNORM ( LINEDR, UDIR, MAG )
C
      CALL VMINUS ( UDIR, OPPDIR )
      CALL SURFPT ( SCLPT, UDIR, SCLA, SCLB, SCLC, PT(1,1), FOUND(1) )
      CALL SURFPT ( SCLPT, OPPDIR, SCLA,
     .              SCLB, SCLC, PT(1,2), FOUND(2) )
      DO 50001 I = 1, 2
         IF ( FOUND(I) ) THEN
>           DIST = 0.0D0
            CALL VEQU ( PT(1,I), PNEAR )
            RETURN
         END IF
50001 CONTINUE
      NORMAL(1) = UDIR(1) / SCLA**2
      NORMAL(2) = UDIR(2) / SCLB**2
      NORMAL(3) = UDIR(3) / SCLC**2
      CALL NVC2PL ( NORMAL, 0.D0, CANDPL )
      CALL INEDPL ( SCLA, SCLB, SCLC, CANDPL, CAND, XFOUND )
      CALL NVC2PL ( UDIR, 0.D0, PRJPL )
      CALL PJELPL ( CAND, PRJPL, PRJEL )
      CALL VPRJP  ( SCLPT, PRJPL, PRJPT )
      CALL NPELPT ( PRJPT, PRJEL, PRJNPT )
>     DIST = VDIST ( PRJNPT, PRJPT )
      CALL VPRJPI ( PRJNPT, PRJPL, CANDPL, PNEAR, IFOUND )
      RETURN
      END

Figure 6. Code with distance plan highlighted (highlighted lines are marked with > in column 1).

      SUBROUTINE NPEDLN ( A, B, C, LINEPT, LINEDR, PNEAR )
      CALL UNORM ( LINEDR, UDIR, MAG )
      CALL VMINUS ( UDIR, OPPDIR )
      CALL SURFPT ( SCLPT, UDIR, SCLA, SCLB, SCLC, PT(1,1), FOUND(1) )
      CALL SURFPT ( SCLPT, OPPDIR, SCLA,
     .              SCLB, SCLC, PT(1,2), FOUND(2) )
      DO 50001 I = 1, 2
         IF ( FOUND(I) ) THEN
            CALL VEQU ( PT(1,I), PNEAR )
            RETURN
         END IF
50001 CONTINUE
      NORMAL(1) = UDIR(1) / SCLA**2
      NORMAL(2) = UDIR(2) / SCLB**2
      NORMAL(3) = UDIR(3) / SCLC**2
      CALL NVC2PL ( NORMAL, 0.D0, CANDPL )
      CALL INEDPL ( SCLA, SCLB, SCLC, CANDPL,
     .              CAND, XFOUND )
      CALL NVC2PL ( UDIR, 0.D0, PRJPL )
      CALL PJELPL ( CAND, PRJPL, PRJEL )
      CALL VPRJP  ( SCLPT, PRJPL, PRJPT )
      CALL NPELPT ( PRJPT, PRJEL, PRJNPT )
      CALL VPRJPI ( PRJNPT, PRJPL, CANDPL, PNEAR,
     .              IFOUND )
      RETURN
      END

Figure 7. The residual code without the distance plan.


The production version of NPEDLN contains several interleaved plans. Intermediate Fortran computations are shared by the nearest point and distance plans. A delocalized scaling plan is used to improve numerical stability, and an independent error handling plan is used to deal with unacceptable input. Knowledge of the existence of the several plans, how they are related, and why they were interleaved is required for a deep understanding of NPEDLN.

1.2. Contributions

In this paper, we present a characterization of interleaving, incorporating three aspects that make interleaved code difficult to understand: independence, delocalization, and resource sharing. We have distilled this characterization from an empirical examination of existing software, primarily SPICELIB. Secondary sources of existing software which we also examined are a Cobol database report writing system from the US Army and a program for finding the roots of functions, presented and analyzed in (Basili and Mills, 1982) and (Rugaber et al., 1990). We relate our characterization of interleaving to existing concepts in the literature, such as delocalized plans (Letovsky and Soloway, 1986), coupling (Yourdon and Constantine, 1979), and redistribution of intermediate results (Hall, 1990, Hall, 1991).

We then describe the context in which we are exploring and applying these ideas. Our driving program comprehension problem is to elaborate and validate existing partial specifications of the JPL library routines to facilitate the automation of specification-driven generation of programs using these routines. We have developed analysis tools, based on the Software Refinery, to detect interleaving. We describe the analyses that we have formulated to detect specific classes of interleaving that are particularly useful in elaborating specifications. We then discuss open issues concerning requirements on software and plan representations that detection imposes, the role of application knowledge in addressing the interleaving problem, scaling up the scope of interleaving, and the feasibility of building tools to assist interleaving detection and extraction. We conclude with a description of how related research in cliche recognition as well as non-recognition techniques can play a role in addressing the interleaving problem.
