An examination of the relation between architecture and compiler design leads to several principles which can simplify compilers
and improve the object code they produce.
William A. Wulf Carnegie-Mellon University
Theinteractions between the designofacomputer's instructionsetandthedesignofcompilersthatgenerate code for that computer have serious
implications
for overall computational cost and efficiency. This article, which investigates those interactions, should ideallybe based oncomprehensive data; unfortunately, thereis a paucityof suchinformation. Andwhilethere is data on the use ofinstructionsets, therelationofthisdata to com- pilerdesignislacking. This is,therefore,
afranklyper- sonalstatement,but onewhichisbasedonextensiveex- perience.Mycolleagues andIare in themidstof aresearch effort aimed at automating the construction of production- qualitycompilers. (Tolimitthescopeofwhatisalready anambitiousproject,wehaveconsidered only algebraic languagesandconventional computers.)Inbrief, unlike manycompiler-compiler effortsofthepast, oursinvolves automatically generating all of the
phases
of a com- piler-including the optimization and code generation phases found inoptimizing compilers. The only inputto thisgenerationprocessis aformaldefinitionofthe source language and target computer. Theformulation of com- pilationalgorithms that,withsuitableparameters, areef- fective acrossabroad classofcomputerarchitectures has beenfundamentalto thisresearch.Inturn,findingthese algorithmshasledus tocritically examine many architec- tures andtheproblems theypose. Muchofthe opinion that followsisbased on ourexperiences intrying todo this,withnotes on thedifficultiesweencountered.Articles of this sortcommonly begin by observing that the costofhardware is
falling
rapidlywhile
thecost of software is rising. Theinevitable
conclusionis that we oughttofindways forthe hardwaretosimplifythe soft- ware task.Oneareainwhichthismightbedoneis in the designofinstruction sets that better reflecttheneeds of high-level languages. Better instructionsets would bothsimplify compilers
andimprove
thesizeandspeedofthe programsthey
generate.This observation and conclusion areabsolutely
correct. Treated toosimplistically,
how- ever,they
leadtomistakeninferences.Forinstance,manypeople
haveconcluded thatefficiency
ofthe objectcode fromacotnpilerisnolonger important-or,atleast,that itwillnotbeinthenearfuture. This is patently false. To suggestwhy
Ibelieveit isfalse andtolay thegroundwork
for much of whatfollows,
letmeillustratewith anexam-ple.
TodayI use atimesharingsystem that isfivetimes faster thantheoneI usedadecadeago;italsohaseight
times theprimary
memory and vastly moresecondary
storage.Yet,
it supports aboutthesamenumber-of peo- ple,and.response is worse. The reason for thisanomaly issimply
thatouraspirations
have grown much fasterthan technologyhas been ableto satisfy them. It now takes more cycles, more memory, more disk-more every- thing-to do what a typical user wants and expects. I believethis is agood thing;
despiteitsdegradedperfor-
mance, the system is more responsive to my overall needs-it's more humane. Nonetheless, there is apremium
ontheefficiency
oftheprogramsit executes.Thepastis alwaysoursafest predictor ofthefuture,at leastifweinterpret itproperly. Althoughhardwarecosts will continuetofalldramaticallyandmachine speeds will increase
equally
dramatically, wemustassumethat ouraspirations
willriseevenmore.Becauseof thiswearenotaboutto face eitheracycle ormemorysurplus. For the near-termfuture, the dominant effect will not be machine cost orspeed alone,
but
tatheracontinuingattempt toin- creasethereturnfrom
afiniteresource-that is, a par- ticularcomputeratourdisposal. Thisisn't asimplemat- ter of "yankee thrift." Given any system, people will want toimproveitandmakeit moreresponsive tohuman needs; inefficienciesthatthwartthoseaspirations willnot betolerated.I
I
The
costequation
Before discussing the mutual impact of compilers and target-machine architectures, we needtoclarifytheob- jectivesoftheexercise.A number ofcosts,andconversely benefits,areinvolved. Theyinclude
* designing(writing) compilers,
* designingthehardware architecture,
* designingthe hardwareimplementation ofthat ar-
chitecture,
* manufacturing the hardware (i.e., replicating theim- plementation),
* executingthecompiler, and
* executingthecompiledprograms.
The relative importance of these costs is, of course,
installation- and application-dependent-one cannot makestatementsabouttheirrelative importance without
morespecific information. Nonetheless, therearesome
generalobservations that bearonwhatfollows.
First, notethat only the fourth item, thecost ofthe replication of hardware, has decreased dramatically.
While standardized MSI and LSI chips haveworked to decrease thecostof hardwaredesign,theengineeringof completeprocessors on asingle chip has worked toin-
creaseitagain. Designing irregularstructuresatthechip level isveryexpensive.
Designing irregular structures at the chip level
isveryexpensive.
Second,all of thedesignactivities(compilers, architec- tures, andimplementations)areone-timecosts. Fromthe customer's perspective, they are amortized over the number ofunits sold. Thus,itmakes sensetodesignanar- chitecturemoreresponsiveto
compilation
problems only ifthe reduction in the cost of writing thecompiler, ex- ecutingit, orexecutingthe codegenerated byit,offsets the increase in thecostofdesign.Often this iseasily done, aswillbediscussedlater. However, we mustremember that it is stillmoreexpensivetodesignhardware thanto designsoftwarethat does thesamejob,unlessthe jobis verysimple.This explains why it doesn't make sense, for example,to movethe entirecompiler
intohardware.Third, both software and architectures have a life substantially longerthanthat of the hardwaretechnology for which they are initially implemented.
Despite
the predictable declineinhardware costs,toooften architec- tureshavebeendesigned
to caterto the anomalies of the technologyprevalentatthe timetheyweredesigned.Inef- fect, there has beenaconfusion between thesecond and thirdcostslisted above.Finally,the last twoitems-the costs ofcompilingand executingprograms-arenotstrictlycomparabletothose preceding them. Dollar costscanbeassignedtothese,of course. Often, however, the correct measure is not in termsofdollars but intermsofthingsthat cannot be done as aconsequenceof
compilation
orexecution inefficien- cies.Atypicalinstallationcanonlyoccasionallyacquirea newcomputer(andsuffer thetraumaofdoingso). Be-tweenacquisitions the available computer is a fixed and finiteresource. Hence inefficiencies in compilation and executipnmanifest themselves as decreased productivity, unavailable functionality, and similar costs. Although
difficult
tomeasure,these
costs are the first-order effects noticed by users once amachine hasbeen acquired.Some general principles
Principles that would improve the impedence match between compilers and a target computer have been followed in many contemporaryarchitectures:
* Regularity. If something is done in one way inone place, it ought to bedonethe same wayeverywhere.
This principle has also been
called
the "law of least astonishment"inthelanguage design community.* Orthogonality. It should be possible to divide the machinedefinition (or language definition) itito a set of separate concerns and define each in isolation fromthe others.Forexample,it
ought
to bepossible todiscuss data types, addressing, and instruction sets independently.* Composability. If theprinciples ofregularityand or- thogonality have been followed, then it
should
also bepossible tocomposetheorthogonqi,
regular no- tions inarbitrary ways. Itoughttobepossible, for example, to use every addressing mode with every operatorand every data type.However, these principles have not been completely followed, and the deviations from them present the com- piler writer withsomeofhis worstproblems.
I will have a few more principles to suggest below.
Before doing so, however, let me comment on the preceding ones.
Although many compiler optimizations are cast in terms of esoteric-sounding operations such as flow analysisandalgebraic
simplification-in
fact these tech- niquesaresimply aimedatperforming an enormous case analysis. The objective is to determine the best object code foragivensourceprogram. Becausethe caseanaly- sis is soenormous, the variousoptimizationalgorithms attempttogleaninformation and/orshape the intermedi- ateform(s)
ofthe program ina mannerthat allowsthefinai
caseanalysis and code selection to be donerapidly andthoroughly.Viewing acompiler's task as alargecaseanalysishelpsto explain whyregularity, orthogonality,and
cornposability
are soimportanttosimplifyingthe task.Every deviation
from
theseprinciples manifestsitselfasanad hoc caseto beconsidered. And, alas, theconsiderationofaspecial caseisnotlimited to thecode selectionprocess;because everypreparatoryphaserequirescertain information in- termediate representations, the manifestations of the anomalycreepbackthrough nearlyall of them. Iwill try toillustratethis withmoreexamplesin thenextsection.Forthemoment, however,consider the genre ofgeneral- register machines. Thenamesuggeststhat theregistersare all
"general"-i.e.,
thattheycanbeused for any of the purposes to which registers are put on the machine. In fact, however,almostno machineactually
treatsall theregisters uniformly-multiplicands must be in "even"
registers, or double precision operands must be in even- odd pairs, or a zeroinanindexingfield of an instruction denotes noindexing (and hence the zeroth register cannot be used forindexing),orsomeoperationscanbe performed only between registers while others can be performed only between a register and memory, or any of manyother variations. Each of thesedistinctions is a violation of one ormore oftheprinciples aboveandresults inadditional complexity.
Someadditional, more specific principlesare
* One vs.all. Thereshould beprecisely one way to do something,orall waysshould be possible.
* Provideprimitives, not solutions. It is far better to provide good primitives from which solutions to codegenerationproblemscanbesynthesizedthanto provide the solutionsthemselves.
Both ofthese also relatetothesimplicity(orcomplexity) of the compiler's case analysis. Consider the "one-vs.- all" principle. Either oftheseextreme
positions
implies that thecompiler need not do any case analysis. If, for ex- ample, the only conditional branchinginstructions are onesthat test forEQUALITY and LESS THAN,there is onlyone waytogeneratecode foreach ofthesixrelations.Alternatively,ifthere is a directimplementationofallsix relations, there isanobviouscoding for each.Difficulties arise, however,ifonlythreeorfourofthe sixareprovided.
Forexample, the compilermustdecidewhether,bycom- muting operands, there is a cheaper implementation of some oftheremaining relations. Unfortunately, this is notasimple decision-it mayimplydetermining whether side-effectsemantics areviolated,whether there isanin- teraction withregisterallocation, andsoon.
Nowconsiderthe "provideprimitives, notsolutions"
principle.
InwhatIbelievetobeanhonest attempttohelp compilerwriters, somemodern architectures have pro- vided directimplementationsofhigh-levelconceptssuch asFORand CASEstatementsandPROCEDURE calls.Inmany,ifnotmostcases,these turn outtobe more trou- blethantheyareworth.Invariably theyeither supporton- ly onelanguage well, or are so general that they are ineffi- cientforspecialcases-thusforcingthecompilertoper- formeven moreanalysis to discover thecommoncases wherea moreefficient implementation ispossible. The problem arises from a "semantic clash" between the languageandhigh-levelinstructions;bygiving too much semanticcontent totheinstruction,the machine
designer
hasmadeitpossibletousetheinstructiononly in limited contexts.Again, Iwill trytoillustrate theprobleminthe followingsection.The last three
principles
are even moreblatantly
my opinionthan those listed earlier:* Addressing. Address computations are paths! Ad- dressingisnotlimitedtosimplearrayand recordac- cesses!The
addressing
modes ofamachine should be designedtorecognizethesefacts.* Environmentsupport.Allmodern architectures sup- portarithmetic andlogical computations reasonably well. Theydonotdonearlyaswell in
supporting
the run-timeenvironmentsforprograms-stackframes, displays orstatic/dynamic links, exceptions,
pro-cesses, and so on. Thewritershould provide such run-time support.
* Deviations. The writer should deviate from these principles only in ways that are implementation-
independent.
The firsttwo ofthese principles, addressing and en- vironment support, are among the most difficult to deal with-and among the most likely to run afoul of the earlier "primitives, not solutions" principle. The con- scientious designer must remember that different lan- guagesimposedifferent constraintsonthenotions of pro- cedures,tasks, and exceptions. Even things as mundane as caseanditerationstatementsand arrayrepresentations
aredictatedbythe semantics of thelanguageand may dif- fer fromonelanguagetoanother.
I will have more to say about these pointslater, but let meillustratesomeof the issues with theproblem of ad- dressing.Inmyexperience,effectiveuseofimplicit addres- singisthemostimportant aspect of generating good code.
Itisoften themostdifficulttoachieve.Ingeneral,access- ing adatum fromadata structureinvolves following a pathwhoselengthisarbitrary (but is knownatcompile time).Each stepalongthepath isanaddition(indexingin- toarrays orrecords)oranindirection (throughpointers ofvarious sorts). Typical computers supply a fixed and finite menu-a collection of special cases-of suchpath
Some architectures have provided direct implementations of high-level
concepts.In
many cases
these turn
outto be
moretrouble than they
areworth.
steps. Thecompiler's problem is how to choose effective- ly from among this menu; I know of notechnique for doing this exceptexhaustivespecial-case analysis. Even when the target computer'saddressing modes are well- suited to the most commoncases, its compiler remains complex since itmustbereadytohandle thegeneralcase.
Thelastprinciple is, I suppose, more in thenatureof a plea. At any time there aretechnological anomalies that canbeexploitedtomakeamachine faster orcheaper. It is tempting toallowthese factors to influence the architec- ture-manyexamples abound. It takes the greatest re- straint to lookbeyond thecurrentstateoftechnology,to realizethat thearchitecturewilloutlivethatstate,andto design for the future. I think thatmostof the violations of notions likeorthogonality andregularitycanbe tracedto ashortsighted view of costs. Adheringto theprinciples presented here has ameasurablehardwarecost tobesure, but one which decreases exponentially as technology changes.
Kudos and gripes
From the compiler writer's perspective, various ma- chines have bothgoodand badfeatures which illustrate- or violate-the principles discussed above. This is not
meant to be anexhaustive list of such features, nor is it in- tended to criticize particular computers or manufac- turers.
On regularity. Asacompiler writer,Imustapplaud the trend in many recent machinestoallow eachinrstruction operandtobespecified by any of the addressing modes.
Theability of thecompiler to treat registers and memory aswell as source and destination symmetrically is an ex- cellentexampleof the benefits ofregularity.Thecompiler issimpler and theobject code is better.
Not all aspects ofmodernmachines have been designed regularly, however. Most machines support several data types(includingfixed and floating-point forms), several typesof words (essentially boolean vectors), and several typesofaddresses-often with several sizes of each and sometimes with variations suchas signed and unsigned andnormalized orunnormalized.Itis rare for the opera- tors onthesetypes tobedefined regularly, even when it would make sense for them to be. A compiler, for exam- ple, wouldlike to represent small integers as addresses or bytes. Yet, onemachineprovidesacompletesetofbyte
The familiar arithmetic
shift instructions areanother
example ofirregularity.
operations thataresymmetricwith word(integer)opera- tions except that the ADD and SUBbytesaremissing, and another defines thesettingofcondition codesslightlydif- ferentlyforbyteoperationsthan for full-wordoperations.
Such differences preventsimplecompilers from using the obvious byte representations. In more ambitious com- pilers, substantial analysis must be done to determine whetherthedifferencesmatter in the particular program beingcompiled.
The familiar arithmetic shift instructionsareanother example of irregularity. I trust everyone realizes that arithmetic-right-shiftisnot adivisionbyapoweroftwo on mostmachines.
A particularly annoying violation ofregularityarises from the instructions of machinesthat makespecial pro- visionfor"immediate mode" arithmetic.Weknowfrom analysisofsourceprogramsthatcertain constants,not- ably0, ±1,and the number ofbytesperword,appearfre- quently. It makesgoodsense toprovidespecial handling for them inaninstruction set. Yet it seems that many machine designers feel that suchinstructionsare useful onlyinformingaddresses-oratleast thattheyneednot haveeffects identicaltotheirmoreexpensive equivalents.
Manifestations include
* condition codes thatare notsetin thesameway,
* carries that donotpropagatebeyondthe size ofan
"address," and
* operationsthatarerestrictedtooperateon aselected setof"index
registers."
In
practice,
of course, i=i+1 (and itsimplicit counter- part initeration statements) isoneof themostcommonsource programstatements. Irregularities such as those above precludesimple compilersfrom using immediate mode arithmetic for these common cases and add substantial complexitytomoreambitiouscompilers.
Similar remarks apply to the floating-point instruc- tions of many machines. In addition to providing an even morerestricted set of
operations,
these machines often failtosupporttheabstractionof"real" numbersintend- edby many higher-level languages. (Someone once ob- served that the best characterization of the floating-point hardware of most machines is as "unreal" numbers!) Mostlanguagedesignersandpr'ogrammers
want to think ofreal numbers as anabstraction of real arithmetic (ex- ceptfortheir finite precision)-they would like, for exam- ple, for them to becommutative and associative. Floating- point representation isoftenneither, andhencethe com- piler writer is constrained from exploiting some obvious optimizations. The cost is both increased compiler com- plexity and slowerprograms-and the sad thing is that the caseswhere theoptimizationsareillegalare rarein prac- tice.On orthogonality. ByorthogonalityImeantheability todefine separate concerns-such as addressing, opera- tions, and datatypes-separately. This property is closely allied withregularity andcomposability. Thefailure of general-register machinesto treatall theirregisters alike could becharacterizedasafailure of any of these proper- ties. Several machinescontainbothlong and short forms ofbranch instructions, for example, in which theshort formis takenas aconstant displacementrelativetothe program counterand is anaddressing mode notavailable in anyother kind ofinstruction. Some machines include instructions whose effectdepends on theaddressing mode used.Forexample,onsomemachinessignextension is (or is not) donedepending on thedestination location.
Some
othermachinescreatelongorshort forms of theresultof amultiplication, depending ontheeven-oddness of the destinationregister. Some of themostpopularmachines contain different instructionsetsforregister-to-register, memory-to-register, and memory-to-memoryoperations (andworse,theseinstructionsetspartially-butnotcom- pletely-overlap).Itshouldbeclear that thecompilershouldperforma separateanalysisforeachofthesecases.Andunlikesome of myprevious examples,this requirement shouldapply tosimpleaswellasambitiouscompilers.
Oncomposability.From thecompilerwriter's perspec- tive,the ideal machinewould,amongotherthings,make available the full cross-product of operations and ad- dressingmodesonsimilardata types. The ADD instruc- tion should have identical effects whether it addsliterals, bytes,orwords, forexample. Moreover, anyaddressing mode available inonevariantshould be available in
the
others.More germanetothe presentpointis the notion ofcon- version. Many machines failbadly in this respect. Forex- ample,mostonly providerelational operators(i.e.,con- ditional branch instructions) that affect control
flow,
while source languages generally allow relational expres- sionstoappearincontextswhereabooleanvaluemustbemademanifest (e.g., allow them tobestored into a user's booleanvariable). In addition, many machines donot provide for conversion between data types such as integer andfloatingpoint, nordotheyprovide aconversion that differsfrom thatspecified in source languages.
The root of theproblemlies in the fact that program- ming languages view type as a property of data (or variables), while machines view type as a property of operators. Because the number ofmachine data types is moderately large, the number of operation codes needed toimplement afull cross-product isunreasonable. For ex- ample, it is usuallyimpossible toadd a byteto aword without firstconverting the byte to full-word form. The need for suchexplicit conversions makes it difficult to determine when overall cost is reduced by choosing a par- ticular representation. Admittedly,wheretheconversion is asignificant one (asinconverting integer tofloating- pointrepresentation)this doesn'tfeelsobad but it adds complexitytothecompileraswell asslowing bothcom- pilation and execution in trivial cases.
Iwillendthisdiscussion with an example of the com- plexity imposed on a compiler by the lack of regularity, orthogonality, and composability. Consider a simple statementsuchasA: = B*Cand supposewe arecompil- ing for amachine on which the operand of a multiply must be in an odd register. A simple compiler generally al- locates registers"onthefly"asit generatescode. In this example, suchastrategyappearstoworkwellenough;the allocatorrequiresonlyminorcomplexitytoknowabout even/oddregisters, and the code generatormustspecify its needsoneach request. Butinatriviallymorecomplex expression suchasA = (B +D)*C, the strategy breaks down. Addition canbe done in either an even or odd register; a simple"onthefly"allocationisaslikelytoget an evenregisteras anoddoneforB +D-and,of course, the choice ofan even onewill necessitateanextradata moveto implement themultiplicationby C. More am- bitious compilersmustthereforeanalyze acompleteex- pressiontreebeforemakinganyallocations. Intreesin- volvingconflicting requirements, anassignmentmustbe found thatminimizesdatamovements.
Ambitiouscompilerscanevenhaveaproblemwith the simple assignentA: +B*C.Suchcompilersoftenemploy atechniquecalled "load/store motion" in whichtheyat- tempttomovevariables accessedfrequentlyinaprogram region into registers over that region; this tends to eliminate loads andstores.Thesimple assignmentabove suggeststhat it would begoodtohaveAallocatedtoan oddregister,since the entireright-handside could then be evaluated in A and another data move eliminated.
Whether this is desirable, however, depends on all the otherusesofAinthe programregionunder theregisterin which A resides. This involves more complex analysis than thatforsingleexpressions and, again,may
require
trade-offs inthenumber ofdatamovesneeded.Note that such complexities arise in other contexts besideseven/oddregisterpairs.Atleastonemachinehas distinct accumulators and index registers plus a few elements thatcanbeusedaseither.Preciselythesamesort ofcompiler difficultiesarise indecidingwherethe result of anarithmetic expression or user variable should go;the
compilermustexamine alluses todeterminewhetherthe
result is used in an indexing context, an arithmetic con- text, orboth.
On one vs. all.Iamsure mostreaderswho are familiar with a number ofmachines can supply examples of viola- tions of this principle. My favorite can be found in one of the newest machines-one with a generally excellent in- struction set. This set includes reasonably complete boolean operations, but does not provide AND, instead providing only AND NOT. AND is commutative and associative, but AND NOT is not, so it requires a truly bewildering analysistodeterminewhich operand to com- plement and when to apply DeMorgan's law in order to generateanoptimalcode sequence.
Onprimitives vs. solutions. Mostpeoplewould agree thatPascal is more powerful than Fortran. The precise meaning of "more powerful" may be a bit unclear, but certainly anyalgorithmthan can becoded in Fortran can also be coded inPascal-but not conversely. However, this does not mean that itis easy, or even possible, to translate Fortran programs into Pascal. Features such as
A machine that attempts to support
alimplementation requirements will probably fail to support
anyof them efficiently.
COMMON andEQUIVALENCE arenotpresentin Pas- cal-andsome usesofthemare evenprohibited by that language. There is a "semantic clash" between the lan- guages.
The same phenomenon can beobservedinmachine de- signs. Among the common higher-level languages one finds many different views ofessentiallysimilar concepts.
The detailed semantics associated with parameter pass- ing, FOR statements, type conversions, and so on are often quite different. These differences can lead to significantly different implementation requirements. A machine that builds-ininstructions satisfying one set of these requirements cannot support other languages. A machine thatattemptstosupportalltherequirementswill probably fail to support any ofthem efficiently-and hencewill provoke additionalspecial-case analysisinthe compiler. Examples of the misplaced enthusiasm of machinedesignersinclude
* subroutine call instructions thatsupport, forexam-
ple, only
someparameter
passingmechanisms,
* looping instructions that support only certain models of initialization, test and increment, and recomputation,
* addressingmodesthat presumecertainstackframe layouts-orevenpresumeparticularrepresentations of arrays,
* caseinstructions thatonlydoordon't
implicitly
test theboundaryconditions of thecaseindex andonly
doordon'tassumesuchboundsarestaticatcompile
time,* instructions that support high-level data structures (such as queues) and makeassumptions that differ from some commonimplementations of these struc- tures,and
* elaborate stringmanipulation.
In many ofthese cases, thehigh-levelinstructions are synthesizedfrommoreprimitive operations which,if the compiler writer couldaccessthem, could be recomposed tomoreclosely model the features actuallyneeded. An ideal solution to this problem would provideamodest arpount ofwritable microstore foruse bythe run-time system.Thecompilerwritercould then tailor the instruc- tion settotheneeds ofaparticular language.
On addressing. Modern programming languagesper- mit the definition ofdata structures that are arbitrary compositionsofscalars,arrays,records,andpointers. At least in principle it is possible to define an array of records, a componentofwhich is a pointer to a record containing an array of arrays of yet another kind of record. Accessing a component atthe bottom level of this structureinvolvesalengthysequenceofoperations.Fur- ther, because ofthe interactions among
block
structure, recursive procedures, and "by reference" parameter passing, findingthe base address of such a structure can beequallycomplex-possibly involving indexingthrough the"display,"
various sortsof"dope" (descriptor) in- formation, and several levels of indirectionthrough
reference parameter values. Inpractice, of course, data structures areseldomthiscomplex,andmostvariablesac- cessedareeitherlocaltothecurrentprocedureorglobal tothe entireprogram-but thecompilermustbeprepared to handle the general case. And, surprisingly, even relatively simple constructs such as accessing an element of an array in the current stackframe can give risetomuch ofthis complexity.Thealgorithmtoaccessanelement ofadata structure canbeviewedaswalkingapathfrom thecurrentinvoca- tion recordtotheelement;theinitial portionof thepath locatesthe base address of the structure via thedisplay, etc.,andthefinal portionis definedbythestructureitself.
Each step of thepath involves indirection (followinga pointer), computingarecord element
displacement,
orin- dexing (byanarraysubscript)-allof which may involve multiplicationofasubscriptbythesize(inaddressunits) of the array component. For manylanguages,constraint checks onarraysubscriptsand nilpointersmustalso beperformed
alongthepath.Itisclearly advantageoustousethe targetcomputer's effective addresscomputations toimplicitly perform as manyofthepathstepsaspossible.Unfortunately,most contemporary machines were designed with a simpler data structure model inmind. Rather than supporting stepsalongageneral path,these machines
generally
pro- videanadhoc collectionofindexing andindirectionthat canbeusedforonlyasubset of suchsteps-thereisno no- tion oftheircomposition.Forexample,* oneofthemostpopular machines hasnoindirection atall, thus forcinganexplicit instruction eachtime a pointermustbefollowed;
* mostmachines that provide both indexing and in- directiondefine a fixed order-e.g., first dothein- dexing, then the indirection, although the opposite orderis just as common inpractice;
* somemodern machinesprovideimplicit multiplica- tion ofoneoperand of anindexingoperation, but only by the size of scalars (whereas ingeneral the ele- mentof an array may notbe ascalar); and
* manymachines limit the size of theliteral value in an indexing mode, thus forcing contorted code for largerdisplacements or for cases wherethe displace- mentisn't knownuntil link time.
Theaddressingmodes areusefulforsome, but neverall, ofthese cases. Worse, sometimes more than one mode can be used and the compiler must make a nontrivial choice.Again, the compilermustbemade more complex toexploit the hardware. It would be far better (and prob- ably simpler in the hardware as well) to have a more generaland consistent model.
On environments. Ibelieve thatmostpeoplenowap- preciate hardware support for recursive procedure in- vocation,dynamic storage allocation, and synchroniza- tionandcommunication processes.Iwon't comment on this except to note thedanger of providing
such
support at toohigh a level and henceintroducing a semantic clash with some languages. Instead, I will point out some neglected areasof the supportenvironment thatinduce significant overheads in theimplementation of at least somelanguages.Theseinclude:* Uninitialized variables. Manylanguages
define
a pro- gram to be erroneous if the value of a variable is fetched before being set. The codeand datastruc- tures tosupportthis are expensive. Yetat leastone old machine provided a way to check these "for free" by settingbadparity onuninitializedvariables.
* Constraintchecks.Manylanguagesspecify
that
cer- tain properties of a value be checked before use.Subscriptrange
checking
isaparticularinstance of this,buttherearemanyother casesaswell.Generally, machinesprovide no direct implementation ofthis, thus forcing explicit tests in the code and often eliminatingcleverusesof theeffectiveaddress hard- ware.* Exceptions. Exceptions (analogous to PL/I's ON
conditions)
are a,part of manylanguages. Yet few machines support them, and an efficient software implementationofthemoften violates the hardware assumptionsof
thesupportprovided
forprocedures.
* Debugging support. Ifmost programmingis tobe doneinhigh-level languages,then it is axiomatic that debuggingshouldbeatthesame level. Becausemost machines donotsupportthis, however,thedesigner of the debugging system is usually facedwith two choices, both unpalatable.
The
firstistoforce the user todebugat alowlevel.The
second isto create a specialdebuggingmode inwhichextracodeprovides
the run-time debuggerwith the necessary informa- tion-thisin turn makes itdifficult to debug the pro- ductionversion ofasystem.A note on stacks. A common
belief
isthat
allcompiler writers prefer a stackmachine.Iam anexceptiontothat belief-at leastinsofarasstacks forexpressionevaluation areconcerned. (Stackstosupportenvironments
are a dif- ferent matter.) It is certainly truethat there is a trivial mapping from parsetrees to postfix, and hence simple compilerscan bemadeevensimpleriftheirtarget
hasan expression stack. For more ambitious compilers, how- ever,stack machinespose almost all the same optimiza- tionproblems as register machines. A common subex- pression is still a common subexpression.A
loop in- variantexpression isstill invariant. Expression reorder- ing,doneon aregister machinetominimize the numberof registers used, also reducesthe depthoftheevaluation stack-thus increasingthelikelihood thattheentire
com- putation can be held in the fast top-of-stack registers.Evenregister allocation hasa
counterpart-i.e.,
allocat- ingvariablestothestack
frameso thatthe
mostfrequent- ly accessed ones can utilize short addressing forms on machinesthatprovidesuchafeature.Moreover,deciding whetheranoptimizationisdesirablecanbemoredifficult on a stackmachine.Onaregistermachine,forexample,it isalmostalwaysdesirable
tosavethevalueof acommonsubexpression,
whileonastackmachine
it isnecessaryto determinewhether theaddedcost ofstoringandretrieving thevalue isoffset
by thatvalue'suses.Thus,whileexpres- sionstacks are nice for simple compilers, theyareinno sense a solution.Anote oninterpretersandwritablemicrocode.Anin- creasingly popular approach, especially in
the
micro- processor domain, is tomicroprogramaspecial-purpose
interpreter for a givensource
language. This has been doneextensivelyforPascal,
forexample.Asnotedabove, ingeneralIprefer"pritnitives,notsolutions"
(especially thewrongsolutions);
taiIoring theinstructionsetthroughmicroprogramming
isone waytoachieve that.There areproblems with this
approach,
however.It im- plies that we must find waystoensure thatuser-written microcode cannot subvert operating systemprotectioni.
Similarly, we must provide a means for
dynamically
associatingthe
rightinterpreter withagiven process,anq
this mayimply substantialcontext-swap overheads unless one is very
careful.
(I ammaking anassumption here.Namely, I
believe
thatmultiprogramming, multiproces-
sing, andprotectionwillallbecome commoninthemicro- processorWorld-essentially
informsevolved
fromwhat we seeincurrent largesystems.
Ialso believe that single- language systems arenot asolution; we mustassumea multilanguage environment.Therationale
formybeliefs isnotappropriate
tothisarticle, but the most recentan- nouncements from several chip manufacturers cor- roboratemyviews.)
Aboveall, the approachis not apanacea!Inatleastone caseIhaveexamined, thecode
produced
forsuch an in- terpreteris roughly two timesslower andlarger than it needs to be. Inanother case,asophisticated optimizing
compiler had to bebuilt just to get reasonable perfor- mance.Both
the compiler writer and themachine designer have multiple objectives.The
machine designer shouldcertainly
attend tothe concerns ofhigh-level-language
implementation-since
mostprogramswill bewrittenin one-but hemustalso attend to cost, reliability, com-patibility,
customeracceptance,
andso on.Thecompiler
writer must
faithfully implement
the semantics ofthesource
language,
mustprovide
afriendly interface
tothe user,andmusttrytoreachareasonab!e
compromisebe- tweenthespeed
ofcompilation
andthequality
of there-sulting object
code.Being realistic,
bothdesigner
and writerknow
thatthey
canaffectonly
asubsetoftheseob- jectives
oneachside.However, they
alsoknowthatthey
canstill doagreatdealto
improve
theimpedence
match betweenlanguages
andmachines
and so reduce totalcosts.*
Acknowledgments
Thisresearchwassponsored
by
theDefenseAdvanced
ResearchProjects Agency
and monitoredby
the Air Force AvionicsLaboratory.
The viewsandconclusionscontained
in thisartfclearethose ofthe author andshould
not be
interpreted
as representingthe
officialpolicies,
eitherexpressed
orimplied,
ofthe Defense Advanced ResearchProjects Agency
ortheUS government.William A. Wulf is a professor ofcom-
C
puter science at Carnegie-Mellon Uni-; versity. PriortdjoiningCMU in1968,he was aninstructorofappliedmathematics aidcomputerscienceattheUniversityof Virginia. His research interests span the fields traditionally called "programming systems" and
"computler
architecture."He is especially interested in the construc- tion oflargesystems, notably compilers andoperating systems,and inthewaytheconstructionof these systems interactswiththearchitectureofthemachine on which
theyrun.Wulf'sresearchactivities haveincludedthedesignand implementation ofthe Bliss system
implemnpntation
language, participationinthe PDP-1 Idesign,constructionofC.mmp-a sixteen-processor multiprocessor computer,thedesignand im-plementationoftheHydraoperating systemforC.mnp,thede- signof theAlphard programming language,andparticipationin thedevelopmentofAda,theDoDlanguageforembeddedcom-
puterapplications.
Wulf holds the BS in physics and the MSEE from theUniversi- tyof IllinoisandthePhDfromtheUniversity ofVirginia.