and An

(1)

An examination of the relation between architecture and compiler design leads to several principles which can simplify compilers

and improve the object code they produce.

William A. Wulf Carnegie-Mellon University

Theinteractions between the designofacomputer's instructionsetandthedesignofcompilersthatgenerate code for that computer have serious

implications

for overall computational cost and efficiency. This article, which investigates those interactions, should ideallybe based oncomprehensive data; unfortunately, thereis a paucityof suchinformation. Andwhilethere is data on the use ofinstructionsets, therelationofthisdata to com- pilerdesignislacking. This is,

therefore,

afranklyper- sonalstatement,but onewhichisbasedonextensiveex- perience.

Mycolleagues andIare in themidstof aresearch effort aimed at automating the construction of production- qualitycompilers. (Tolimitthescopeofwhatisalready anambitiousproject,wehaveconsidered only algebraic languagesandconventional computers.)Inbrief, unlike manycompiler-compiler effortsofthepast, oursinvolves automatically generating all of the

phases

of a compiler-including the optimization and code generation phases found inoptimizing compilers. The only inputto thisgenerationprocessis aformaldefinitionofthe source language and target computer. Theformulation of com- pilationalgorithms that,withsuitableparameters, areef- fective acrossabroad classofcomputerarchitectures has beenfundamentalto thisresearch.Inturn,findingthese algorithmshasledus tocritically examine many architectures andtheproblems theypose. Muchofthe opinion that followsisbased on ourexperiences intrying todo this,withnotes on thedifficultiesweencountered.

Articles of this sortcommonly begin by observing that the costofhardware is

falling

rapidly

while

thecost of software is rising. The

inevitable

conclusionis that we oughttofindways forthe hardwaretosimplifythe software task.Oneareainwhichthismightbedoneis in the designofinstruction sets that better reflecttheneeds of high-level languages. Better instructionsets would both

simplify compilers

and

improve

thesizeandspeedofthe programs

they

generate.This observation and conclusion are

absolutely

correct. Treated too

simplistically,

however,

they

leadtomistakeninferences.Forinstance,many

people

haveconcluded that

efficiency

ofthe objectcode fromacotnpilerisnolonger important-or,atleast,that itwillnotbeinthenearfuture. This is patently false. To suggest

why

Ibelieveit isfalse andtolay the

groundwork

for much of what

follows,

letmeillustratewith anexam-

ple.

TodayI use atimesharingsystem that isfivetimes faster thantheoneI usedadecadeago;italsohas

eight

times the

primary

memory and vastly more

secondary

storage.

Yet,

it supports aboutthesamenumber-of people,and.response is worse. The reason for thisanomaly is

simply

thatour

aspirations

have grown much fasterthan technologyhas been ableto satisfy them. It now takes more cycles, more memory, more disk-more every- thing-to do what a typical user wants and expects. I believethis is a

good thing;

despiteitsdegraded

perfor-

mance, the system is more responsive to my overall needs-it's more humane. Nonetheless, there is a

premium

onthe

efficiency

oftheprogramsit executes.

Thepastis alwaysoursafest predictor ofthefuture,at leastifweinterpret itproperly. Althoughhardwarecosts will continuetofalldramaticallyandmachine speeds will increase

equally

dramatically, wemustassumethat our

aspirations

willriseevenmore.Becauseof thiswearenot

aboutto face eitheracycle ormemorysurplus. For the near-termfuture, the dominant effect will not be machine cost orspeed alone,

but

tatheracontinuingattempt toin- creasethereturn

from

afiniteresource-that is, a par- ticularcomputeratourdisposal. Thisisn't asimplemat- ter of "yankee thrift." Given any system, people will want toimproveitandmakeit moreresponsive tohuman needs; inefficienciesthatthwartthoseaspirations willnot betolerated.

I

(2)

The

cost

equation

Before discussing the mutual impact of compilers and target-machine architectures, ^we needtoclarifytheob- jectivesoftheexercise.A number ofcosts,andconversely benefits,areinvolved. Theyinclude

* designing(writing) compilers,

* designingthehardware architecture,

* designingthe hardwareimplementation ofthat ar-

chitecture,

* manufacturing the hardware (i.e., replicating theimplementation),

* executingthecompiler, and

* executingthecompiledprograms.

The relative importance of these costs is, of course,

installation- and application-dependent-one cannot makestatementsabouttheirrelative importance without

morespecific information. Nonetheless, therearesome

generalobservations that bear^onwhatfollows.

First, notethat only the fourth item, thecost ofthe replication of hardware, has decreased dramatically.

While standardized MSI and LSI chips haveworked to decrease thecostof hardwaredesign,theengineeringof completeprocessors on asingle chip has worked toin-

creaseitagain. Designing irregularstructuresatthechip level isveryexpensive.

Designing irregular structures at the chip level

isveryexpensive.

Second,all of thedesignactivities(compilers, architectures, andimplementations)areone-timecosts. Fromthe customer's perspective, they are amortized over the number ofunits sold. Thus,itmakes sensetodesignanar- chitecturemoreresponsiveto

compilation

problems only ifthe reduction in the cost of writing thecompiler, ex- ecutingit, orexecutingthe codegenerated byit,offsets the increase in thecostofdesign.Often this iseasily done, aswillbediscussedlater. However, we mustremember that it is stillmoreexpensivetodesignhardware thanto designsoftwarethat does thesamejob,unlessthe jobis verysimple.This explains why it doesn't make sense, for example,to movethe entire

compiler

intohardware.

Third, both software and architectures have a life substantially longerthanthat of the hardwaretechnology for which they are initially implemented.

Despite

the predictable declineinhardware costs,toooften architec- tureshavebeen

designed

to caterto the anomalies of the technologyprevalentatthe timetheyweredesigned.Inef- fect, there has beenaconfusion between thesecond and thirdcostslisted above.

Finally,the last twoitems-the costs ofcompilingand executingprograms-arenotstrictlycomparabletothose preceding them. Dollar costscanbeassignedtothese,of course. Often, however, the correct measure is not in termsofdollars but intermsofthingsthat cannot be done as aconsequenceof

compilation

orexecution inefficiencies.Atypicalinstallationcanonlyoccasionallyacquirea newcomputer(andsuffer thetraumaofdoingso). Be-

tweenacquisitions the available computer is a fixed and finiteresource. Hence inefficiencies in compilation and executipnmanifest themselves as decreased productivity, unavailable functionality, and similar costs. Although

difficult

tomeasure,

these

costs are the first-order effects noticed by users once amachine hasbeen acquired.

Some general principles

Principles that would improve the impedence match between compilers and a target computer have been followed in many contemporaryarchitectures:

* Regularity. If something is done in one way inone place, it ought to bedonethe same wayeverywhere.

This principle has also been

called

the "law of least astonishment"inthelanguage design community.

* Orthogonality. It should be possible to divide the machinedefinition (or language definition) itito a set of separate concerns and define each in isolation fromthe others.Forexample,it

ought

to bepossible todiscuss data types, addressing, and instruction sets independently.

* Composability. If theprinciples ofregularityand orthogonality have been followed, then it

should

also bepossible tocomposethe

orthogonqi,

regular notions inarbitrary ways. Itoughttobepossible, for example, to use every addressing mode with every operatorand every data type.

However, these principles have not been completely followed, and the deviations from them present the compiler writer withsomeofhis worstproblems.

I will have a few more principles to suggest below.

Before doing so, however, let me comment on the preceding ones.

Although many compiler optimizations are cast in terms of esoteric-sounding operations such as flow analysisandalgebraic

simplification-in

fact these tech- niquesaresimply aimedatperforming an enormous case analysis. The objective is to determine the best object code foragivensourceprogram. Becausethe caseanalysis is soenormous, the variousoptimizationalgorithms attempttogleaninformation and/orshape the intermedi- ate

form(s)

ofthe program ina mannerthat allowsthe

finai

caseanalysis and code selection to be donerapidly andthoroughly.

Viewing acompiler's task as alargecaseanalysishelpsto explain whyregularity, orthogonality,and

cornposability

are soimportanttosimplifyingthe task.Every deviation

from

theseprinciples manifestsitselfasanad hoc caseto beconsidered. And, alas, theconsiderationofaspecial caseisnotlimited to thecode selectionprocess;because everypreparatoryphaserequirescertain information in- termediate representations, the manifestations of the anomalycreepbackthrough nearlyall of them. Iwill try toillustratethis withmoreexamplesin thenextsection.

Forthemoment, however,consider the genre ofgeneral- register machines. Thenamesuggeststhat theregistersare all

"general"-i.e.,

thatthey^canbeused for any of the purposes to which registers are put on the machine. In fact, however,almostno machine

actually

treatsall the

(3)

registers uniformly-multiplicands must be in "even"

registers, or double precision operands must be in even- odd pairs, or a zeroinanindexingfield of an instruction denotes noindexing (and hence the zeroth register cannot be used forindexing),orsomeoperationscanbe performed only between registers while others can be performed only between a register and memory, or any of manyother variations. Each of thesedistinctions is a violation of one ormore oftheprinciples aboveandresults inadditional complexity.

Someadditional, more specific principlesare

* One vs.all. Thereshould beprecisely one way to do something,orall waysshould be possible.

* Provideprimitives, not solutions. It is far better to provide good primitives from which solutions to codegenerationproblemscanbesynthesizedthanto provide the solutionsthemselves.

Both ofthese also relatetothesimplicity(orcomplexity) of the compiler's case analysis. Consider the "one-vs.- all" principle. Either oftheseextreme

positions

implies that thecompiler need not do any case analysis. If, for example, the only conditional branchinginstructions are onesthat test forEQUALITY and LESS THAN,there is onlyone waytogeneratecode foreach ofthesixrelations.

Alternatively,ifthere is a directimplementationofallsix relations, there isanobviouscoding for each.Difficulties arise, however,ifonlythreeorfourofthe sixareprovided.

Forexample, the compilermustdecidewhether,bycom- muting operands, there is a cheaper implementation of some oftheremaining relations. Unfortunately, this is notasimple decision-it mayimplydetermining whether side-effectsemantics areviolated,whether there isanin- teraction withregisterallocation, andsoon.

Nowconsiderthe "provideprimitives, notsolutions"

principle.

InwhatIbelievetobeanhonest attempttohelp compilerwriters, somemodern architectures have provided directimplementationsofhigh-levelconceptssuch asFORand CASEstatementsandPROCEDURE calls.

Inmany,ifnotmostcases,these turn outtobe more trou- blethantheyareworth.Invariably theyeither supporton- ly onelanguage well, or are so general that they are ineffi- cientforspecialcases-thusforcingthecompilertoper- formeven moreanalysis ^to discover thecommoncases wherea moreefficient implementation ispossible. The problem arises from a "semantic clash" between the languageandhigh-levelinstructions;bygiving too much semanticcontent totheinstruction,the machine

designer

hasmadeitpossibletousetheinstructiononly in limited contexts.Again, Iwill trytoillustrate theprobleminthe followingsection.

The last three

principles

are even more

blatantly

my opinionthan those listed earlier:

* Addressing. Address computations are paths! Ad- dressingisnotlimitedtosimplearrayand recordac- cesses!The

addressing

modes ofamachine should be designedtorecognizethesefacts.

* Environmentsupport.Allmodern architectures sup- portarithmetic andlogical computations reasonably well. Theydonotdonearly^aswell in

supporting

the run-timeenvironmentsforprograms-stackframes, displays or

static/dynamic links, exceptions,

pro-

cesses, and so on. Thewritershould provide such run-time support.

* Deviations. The writer should deviate from these principles only in ways that are implementation-

independent.

The firsttwo ofthese principles, addressing and environment support, are among the most difficult to deal with-and among the most likely to run afoul of the earlier "primitives, not solutions" principle. The con- scientious designer must remember that different lan- guagesimposedifferent constraintsonthenotions of procedures,tasks, and exceptions. Even things as mundane as caseanditerationstatementsand arrayrepresentations

aredictatedbythe semantics of thelanguageand may differ fromonelanguagetoanother.

I will have more to say about these pointslater, but let meillustratesomeof the issues with theproblem of addressing.Inmyexperience,effectiveuseofimplicit addres- singisthemostimportant aspect of generating good code.

Itisoften themostdifficulttoachieve.Ingeneral,accessing adatum from^adata structureinvolves following a pathwhoselengthisarbitrary (but is knownatcompile time).Each stepalongthepath isanaddition(indexingin- toarrays orrecords)oranindirection (throughpointers ofvarious sorts). Typical computers supply a fixed and finite menu-a collection of special cases-of suchpath

Some architectures have provided direct implementations of high-level

concepts.

In

many cases

these turn

out

to be

more

trouble than they

are

worth.

steps. Thecompiler's problem is how to choose effective- ly from among this menu; I know of notechnique for doing this exceptexhaustivespecial-case analysis. Even when the target computer'saddressing modes are well- suited to the most commoncases, its compiler remains complex since itmustbereadytohandle thegeneralcase.

Thelastprinciple is, I suppose, more in thenatureof a plea. At any time there aretechnological anomalies that canbeexploitedtomakeamachine faster orcheaper. It is tempting toallowthese factors to influence the architecture-manyexamples abound. It takes the greatest re- straint to lookbeyond thecurrentstateoftechnology,to realizethat thearchitecturewilloutlivethatstate,andto design for the future. I think thatmostof the violations of notions likeorthogonality andregularitycanbe tracedto ashortsighted view of costs. Adheringto theprinciples presented here has ameasurablehardwarecost tobesure, but one which decreases exponentially as technology changes.

Kudos and gripes

From the compiler writer's perspective, various machines have bothgoodand badfeatures which illustrate- or violate-the principles discussed above. This is not

(4)

meant to be anexhaustive list of such features, nor is it in- tended to criticize particular computers or manufacturers.

On regularity. Asacompiler writer,Imustapplaud the trend in many recent machinestoallow eachinrstruction operandtobespecified by any of the addressing modes.

Theability of thecompiler to treat registers and memory aswell as source and destination symmetrically is an ex- cellentexampleof the benefits ofregularity.Thecompiler issimpler and theobject code is better.

Not all aspects ofmodernmachines have been designed regularly, however. Most machines support several data types(includingfixed and floating-point forms), several typesof words (essentially boolean vectors), and several typesofaddresses-often with several sizes of each and sometimes with variations suchas signed and unsigned andnormalized orunnormalized.Itis rare for the operators onthesetypes tobedefined regularly, even when it would make sense for them to be. A compiler, for example, wouldlike to represent small integers as addresses or bytes. Yet, onemachineprovidesacompletesetofbyte

The familiar arithmetic

shift instructions are

another

example of

irregularity.

operations thataresymmetricwith word(integer)operations except that the ADD and SUBbytesaremissing, and another defines thesettingofcondition codesslightlydif- ferentlyforbyteoperationsthan for full-wordoperations.

Such differences preventsimplecompilers from using the obvious byte representations. In more ambitious compilers, substantial analysis must be done to determine whetherthedifferencesmatter in the particular program beingcompiled.

The familiar arithmetic shift instructionsareanother example of irregularity. I trust everyone realizes that arithmetic-right-shiftisnot adivisionbyapoweroftwo on mostmachines.

A particularly annoying violation ofregularityarises from the instructions of machinesthat makespecial pro- visionfor"immediate mode" arithmetic.Weknowfrom analysisofsourceprogramsthatcertain constants,not- ably0, ^±1,and the number ofbytesperword,appearfre- quently. It makesgoodsense toprovidespecial handling for them inaninstruction set. Yet it seems that many machine designers feel that suchinstructionsare useful onlyinformingaddresses-oratleast thattheyneednot haveeffects identicaltotheirmoreexpensive equivalents.

Manifestations include

* condition codes thatare notsetin thesameway,

* carries that donotpropagatebeyondthe size ofan

"address," and

* operationsthatarerestrictedtooperateon aselected setof"index

registers."

In

practice,

of course, i=i+1 (and itsimplicit counterpart initeration statements) isoneof themostcommon

source programstatements. Irregularities such as those above precludesimple compilersfrom using immediate mode arithmetic for these common cases and add substantial complexitytomoreambitiouscompilers.

Similar remarks apply to the floating-point instructions of many machines. In addition to providing an even morerestricted set of

operations,

these machines often failtosupporttheabstractionof"real" numbersintend- edby many higher-level languages. (Someone once ob- served that the best characterization of the floating-point hardware of most machines is as "unreal" numbers!) Mostlanguagedesignersand

pr'ogrammers

want to think ofreal numbers as anabstraction of real arithmetic (ex- ceptfortheir finite precision)-they would like, for example, for them to becommutative and associative. Floating- point representation isoftenneither, andhencethe compiler writer is constrained from exploiting some obvious optimizations. The cost is both increased compiler complexity and slowerprograms-and the sad thing is that the caseswhere theoptimizationsareillegalare rarein practice.

On orthogonality. ByorthogonalityImeantheability todefine separate concerns-such as addressing, operations, and datatypes-separately. This property is closely allied withregularity andcomposability. Thefailure of general-register machinesto treatall theirregisters alike could becharacterizedasafailure of any of these properties. Several machinescontainbothlong and short forms ofbranch instructions, for example, in which theshort formis takenas aconstant displacementrelativetothe program counterand is anaddressing mode notavailable in anyother kind ofinstruction. Some machines include instructions whose effectdepends on theaddressing mode used.Forexample,onsomemachinessignextension is (or is not) donedepending on thedestination location.

Some

othermachinescreatelongorshort forms of theresultof amultiplication, depending ontheeven-oddness of the destinationregister. Some of themostpopularmachines contain different instructionsetsforregister-to-register, memory-to-register, and memory-to-memoryoperations (andworse,theseinstructionsetspartially-butnotcom- pletely-overlap).

Itshouldbeclear that thecompilershouldperforma separateanalysisforeachofthesecases.Andunlikesome of myprevious examples,this requirement shouldapply tosimpleaswellasambitiouscompilers.

Oncomposability.From thecompilerwriter's perspective,the ideal machinewould,amongotherthings,make available the full cross-product of operations and ad- dressingmodesonsimilardata types. The ADD instruction should have identical effects whether it addsliterals, bytes,orwords, forexample. Moreover, anyaddressing mode available inonevariantshould be available in

the

others.

More germanetothe presentpointis the notion ofcon- version. Many machines failbadly in this respect. Forex- ample,mostonly providerelational operators(i.e.,^conditional branch instructions) that affect control

flow,

while source languages generally allow relational expres- sionstoappearincontextswhereabooleanvaluemustbe

(5)

mademanifest (e.g., allow them tobestored into a user's booleanvariable). In addition, many machines donot provide for conversion between data types such as integer andfloatingpoint, nordotheyprovide aconversion that differsfrom thatspecified in source languages.

The root of theproblemlies in the fact that programming languages view type as a property of data (or variables), while machines view type as a property of operators. Because the number ofmachine data types is moderately large, the number of operation codes needed toimplement afull cross-product isunreasonable. For example, it is usuallyimpossible toadd a byteto aword without firstconverting the byte to full-word form. The need for suchexplicit conversions makes it difficult to determine when overall cost is reduced by choosing a particular representation. Admittedly,wheretheconversion is asignificant one (asinconverting integer tofloating- pointrepresentation)this doesn'tfeelsobad but it adds complexitytothecompileraswell asslowing bothcom- pilation and execution in trivial cases.

Iwillendthisdiscussion with an example of the complexity imposed on a compiler by the lack of regularity, orthogonality, and composability. Consider a simple statementsuchasA: = B*Cand supposewe arecompil- ing for amachine on which the operand of a multiply must be in an odd register. A simple compiler generally al- locates registers"onthefly"asit generatescode. In this example, suchastrategyappearstoworkwellenough;the allocatorrequiresonlyminorcomplexitytoknowabout even/oddregisters, and the code generatormustspecify its needsoneach request. Butinatriviallymorecomplex expression suchasA = (B +D)*C, the strategy breaks down. Addition canbe done in either an even or odd register; a simple"onthefly"allocationisaslikelytoget an evenregisteras anoddoneforB +D-and,of course, the choice ofan even onewill necessitateanextradata moveto implement themultiplicationby C. More ambitious compilersmustthereforeanalyze acompleteex- pressiontreebeforemakinganyallocations. Intreesin- volvingconflicting requirements, anassignmentmustbe found thatminimizesdatamovements.

Ambitiouscompilerscanevenhaveaproblemwith the simple assignentA: +B*C.Suchcompilersoftenemploy atechniquecalled "load/store motion" in whichtheyat- tempttomovevariables accessedfrequentlyinaprogram region into registers over that region; this tends to eliminate loads andstores.Thesimple assignmentabove suggeststhat it would begoodtohaveAallocatedtoan oddregister,since the entireright-handside could then be evaluated in A and another data move eliminated.

Whether this is desirable, however, depends on all the otherusesofAinthe programregionunder theregisterin which A resides. This involves more complex analysis than thatforsingleexpressions and, again,may

require

trade-offs inthenumber ofdatamovesneeded.

Note that such complexities arise in other contexts besideseven/oddregisterpairs.Atleastonemachinehas distinct accumulators and index registers plus a few elements thatcanbeusedaseither.Preciselythesamesort ofcompiler difficultiesarise indecidingwherethe result of anarithmetic expression or user variable should go;the

compilermustexamine alluses todeterminewhetherthe

result is used in an indexing context, an arithmetic context, orboth.

On one vs. all.Iamsure mostreaderswho are familiar with a number ofmachines can supply examples of violations of this principle. My favorite can be found in one of the newest machines-one with a generally excellent instruction set. This set includes reasonably complete boolean operations, but does not provide AND, instead providing only AND NOT. AND is commutative and associative, but AND NOT is not, so it requires a truly bewildering analysistodeterminewhich operand to com- plement and when to apply DeMorgan's law in order to generateanoptimalcode sequence.

Onprimitives vs. solutions. Mostpeoplewould agree thatPascal is more powerful than Fortran. The precise meaning of "more powerful" may be a bit unclear, but certainly anyalgorithmthan can becoded in Fortran can also be coded inPascal-but not conversely. However, this does not mean that itis easy, or even possible, to translate Fortran programs into Pascal. Features such as

A machine that attempts to support

al

implementation requirements will probably fail to support

any

of them efficiently.

COMMON andEQUIVALENCE arenotpresentin Pas- cal-andsome usesofthemare evenprohibited by that language. There is a "semantic clash" between the languages.

The same phenomenon can beobservedinmachine de- signs. Among the common higher-level languages one finds many different views ofessentiallysimilar concepts.

The detailed semantics associated with parameter passing, FOR statements, type conversions, and so on are often quite different. These differences can lead to significantly different implementation requirements. A machine that builds-ininstructions satisfying one set of these requirements cannot support other languages. A machine thatattemptstosupportalltherequirementswill probably fail to support any ofthem efficiently-and hencewill provoke additionalspecial-case analysisinthe compiler. Examples of the misplaced enthusiasm of machinedesignersinclude

* subroutine call instructions thatsupport, forexam-

ple, only

^some

parameter

passing

mechanisms,

* looping instructions that support only certain models of initialization, test and increment, and recomputation,

* addressingmodesthat presumecertainstackframe layouts-orevenpresumeparticularrepresentations of arrays,

* caseinstructions thatonlydoordon't

implicitly

test theboundaryconditions of thecaseindex and

only

doordon'tassumesuchboundsarestaticat

compile

time,

(6)

* instructions that support high-level data structures (such as queues) and makeassumptions that differ from some commonimplementations of these structures,and

* elaborate stringmanipulation.

In many ofthese cases, thehigh-levelinstructions are synthesizedfrommoreprimitive operations which,if the compiler writer couldaccessthem, could be recomposed tomoreclosely model the features actuallyneeded. An ideal solution to this problem would provideamodest arpount ofwritable microstore foruse bythe run-time system.Thecompilerwritercould then tailor the instruction settotheneeds ofaparticular language.

On addressing. Modern programming languagesper- mit the definition ofdata structures that are arbitrary compositionsofscalars,arrays,records,andpointers. At least in principle it is possible to define an array of records, a componentofwhich is a pointer to a record containing an array of arrays of yet another kind of record. Accessing a component atthe bottom level of this structureinvolvesalengthysequenceofoperations.Fur- ther, because ofthe interactions among

block

structure, recursive procedures, and "by reference" parameter passing, findingthe base address of such a structure can beequallycomplex-possibly involving indexingthrough the

"display,"

various sortsof"dope" (descriptor) information, and several levels of indirection

through

reference parameter values. Inpractice, of course, data structures areseldomthiscomplex,andmostvariablesac- cessedareeitherlocaltothecurrentprocedureorglobal tothe entireprogram-but thecompilermustbeprepared to handle the general case. And, surprisingly, even relatively simple constructs such as accessing an element of an array in the current stackframe can give risetomuch ofthis complexity.

Thealgorithmtoaccessanelement ofadata structure canbeviewedaswalkingapathfrom thecurrentinvoca- tion recordtotheelement;theinitial portionof thepath locatesthe base address of the structure via thedisplay, etc.,andthefinal portionis definedbythestructureitself.

Each step of thepath involves indirection (followinga pointer), computing^arecord element

displacement,

orin- dexing (byanarraysubscript)-allof which may involve multiplicationofasubscriptbythesize(inaddressunits) of the array component. For manylanguages,constraint checks onarraysubscriptsand nilpointersmustalso be

performed

alongthepath.

Itisclearly advantageoustousethe targetcomputer's effective addresscomputations toimplicitly perform as manyofthepathstepsaspossible.Unfortunately,most contemporary machines were designed with a simpler data structure model inmind. Rather than supporting stepsalongageneral path,these machines

generally

pro- videanadhoc collectionofindexing andindirectionthat canbeusedforonlyasubset of suchsteps-thereisno notion oftheircomposition.Forexample,

* oneofthemostpopular machines hasnoindirection atall, thus forcinganexplicit instruction eachtime a pointer^mustbefollowed;

* mostmachines that provide both indexing and in- directiondefine a fixed order-e.g., first dothein- dexing, then the indirection, although the opposite orderis just as common inpractice;

* somemodern machinesprovideimplicit multiplica- tion ofoneoperand of anindexingoperation, but only by the size of scalars (whereas ingeneral the ele- mentof an array may notbe ascalar); and

* manymachines limit the size of theliteral value in an indexing mode, thus forcing contorted code for largerdisplacements or for cases wherethe displace- mentisn't knownuntil link time.

Theaddressingmodes areusefulforsome, but neverall, ofthese cases. Worse, sometimes more than one mode can be used and the compiler must make a nontrivial choice.Again, the compilermustbemade more complex toexploit the hardware. It would be far better (and probably simpler in the hardware as well) to have a more generaland consistent model.

On environments. Ibelieve thatmostpeoplenowap- preciate hardware support for recursive procedure in- vocation,dynamic storage allocation, and synchroniza- tionandcommunication processes.Iwon't comment on this except to note thedanger of providing

such

support at toohigh a level and henceintroducing a semantic clash with some languages. Instead, I will point out some neglected areasof the supportenvironment thatinduce significant overheads in theimplementation of at least somelanguages.Theseinclude:

* Uninitialized variables. Manylanguages

define

a program to be erroneous if the value of a variable is fetched before being set. The codeand datastruc- tures tosupportthis are expensive. Yetat leastone old machine provided a way to check these "for free" by settingbadparity onuninitialized

variables.

* Constraintchecks.Manylanguagesspecify

that

certain properties of a value be checked before use.

Subscriptrange

checking

isaparticularinstance of this,buttherearemanyother casesaswell.Generally, machinesprovide no direct implementation ofthis, thus forcing explicit tests in the code and often eliminatingcleverusesof theeffectiveaddress hardware.

* Exceptions. Exceptions (analogous to PL/I's ON

conditions)

are a,part of manylanguages. Yet few machines support them, and an efficient software implementationofthemoften violates the hardware assumptions

of

thesupport

provided

for

procedures.

* Debugging support. Ifmost programmingis tobe doneinhigh-level languages,then it is axiomatic that debuggingshouldbeatthesame level. Becausemost machines donotsupportthis, however,thedesigner of the debugging system is usually facedwith two choices, both unpalatable.

The

firstistoforce the user todebugat alowlevel.

The

second isto create a specialdebuggingmode inwhichextracode

provides

the run-time debuggerwith the necessary information-thisin turn makes itdifficult to debug the pro- ductionversion ofasystem.

(7)

A note on stacks. A common

belief

is

that

allcompiler writers prefer a stackmachine.Iam anexceptiontothat belief-at leastinsofarasstacks forexpressionevaluation areconcerned. (Stackstosupport

environments

are a different matter.) It is certainly truethat there is a trivial mapping from parsetrees to postfix, and hence simple compilerscan bemadeevensimpleriftheir

target

hasan expression stack. For more ambitious compilers, however,stack machinespose almost all the same optimiza- tionproblems as register machines. A common subexpression is still a common subexpression.

A

loop in- variantexpression isstill invariant. Expression reorder- ing,doneon aregister machinetominimize the numberof registers used, also reducesthe depthoftheevaluation stack-thus increasingthelikelihood thatthe

entire

com- putation can be held in the fast top-of-stack registers.

Evenregister allocation hasa

counterpart-i.e.,

allocat- ingvariablestothe

stack

frameso that

the

mostfrequent- ly accessed ones can utilize short addressing forms on machinesthatprovidesuchafeature.Moreover,deciding whetheranoptimizationisdesirablecanbemoredifficult on a stackmachine.Onaregistermachine,forexample,it isalmostalways

desirable

tosavethevalueof acommon

subexpression,

whileonastack

machine

it isnecessaryto determinewhether theaddedcost ofstoringandretrieving thevalue is

offset

by thatvalue'suses.Thus,whileexpres- sionstacks are nice for simple compilers, theyareinno sense a solution.

Anote oninterpretersandwritablemicrocode.Anin- creasingly popular approach, especially in

the

micro- processor domain, is tomicroprograma

special-purpose

interpreter for a given

source

language. This has been doneextensivelyfor

Pascal,

forexample.Asnotedabove, ingeneralIprefer"pritnitives,not

solutions"

(especially thewrong

solutions);

taiIoring theinstructionsetthrough

microprogramming

isone waytoachieve that.

There areproblems with this

approach,

however.It implies that we must find waystoensure thatuser-written microcode cannot subvert operating system

protectioni.

Similarly, we must provide a means for

dynamically

associating

the

rightinterpreter withagiven process,

anq

this mayimply substantialcontext-swap overheads unless one is very

careful.

(I ammaking anassumption here.

Namely, I

believe

that

multiprogramming, multiproces-

sing, andprotectionwillallbecome commoninthemicro- processor

World-essentially

informs

evolved

fromwhat we seeincurrent large

systems.

Ialso believe that single- language systems arenot asolution; we mustassumea multilanguage environment.The

rationale

formybeliefs isnot

appropriate

tothisarticle, but the most recentan- nouncements from several chip manufacturers cor- roboratemy

views.)

Aboveall, the approachis not apanacea!Inatleastone caseIhaveexamined, thecode

produced

forsuch an in- terpreteris roughly two timesslower andlarger than it needs to be. Inanother case,a

sophisticated optimizing

compiler had to bebuilt just to get reasonable perfor- mance.

Both

the compiler writer and themachine designer have multiple objectives.

The

machine designer should

certainly

attend tothe concerns of

high-level-language

implementation-since

mostprogramswill bewrittenin one-but hemustalso attend to cost, reliability, com-

patibility,

customer

acceptance,

andso on.The

compiler

writer must

faithfully implement

the semantics ofthe

source

language,

^must

provide

^a

friendly interface

tothe user,andmusttrytoreacha

reasonab!e

compromisebe- tweenthe

speed

of

compilation

andthe

quality

of the^re-

sulting object

code.

Being realistic,

both

designer

and writer

know

that

they

canaffect

only

asubsetofthese

objectives

^oneachside.

However, they

alsoknowthat

they

canstill doagreatdealto

improve

the

impedence

match between

languages

and

machines

and so reduce total

costs.*

Acknowledgments

Thisresearchwassponsored

by

theDefense

Advanced

Research

Projects Agency

and monitored

by

the Air Force Avionics

Laboratory.

The viewsandconclusions

contained

in thisartfclearethose ofthe author and

should

not be

interpreted

as representing

the

official

policies,

either

expressed

^or

implied,

ofthe Defense Advanced Research

Projects Agency

ortheUS government.

William A. Wulf is a professor ofcom-

C

puter science at Carnegie-Mellon Uni-

; versity. PriortdjoiningCMU in1968,^he was aninstructorofappliedmathematics aidcomputerscienceattheUniversityof Virginia. His research interests span the fields traditionally called "programming systems" and

"computler

architecture."

He is especially interested in the construction oflargesystems, notably compilers andoperating systems,and inthewaytheconstructionof these systems interacts^withthearchitectureofthemachine on which

theyrun.Wulf'sresearchactivities haveincludedthedesignand implementation ofthe Bliss system

implemnpntation

language, participationinthe PDP-1 Idesign,constructionofC.mmp-a sixteen-processor multiprocessor computer,thedesignand im-

plementationoftheHydraoperating systemforC.mnp,thede- signof theAlphard programming language,andparticipationin thedevelopmentofAda,theDoDlanguageforembeddedcom-

puterapplications.

Wulf holds the BS in physics and the MSEE from theUniversi- tyof IllinoisandthePhDfromtheUniversity ofVirginia.