Software Fault Tolerance
Sorav Bansal
CSE Department, IIT Delhi
sbansal@cse.iitd.ernet.in
Cloud computing environments rely heavily on the virtualization
layer to provide resource consolidation, dynamic load balancing,
hardware fault tolerance, mobility, and homogeneity. While the
current generation of virtualization engines is extremely good at
what it does, we believe that a virtualization engine could do
more. We are interested in looking at the virtualization layer as
a platform for improving performance of the software stack running
above it, enhancing software security, and providing better fault
tolerance.
At IIT Delhi, we are developing a virtualization layer for x86 from
the ground up to better address these issues. While developing our
virtualization layer, we made many design decisions to make it
easier to add dynamic optimization rules ([3], [4]), allow the
addition of configurable low-overhead security checks ([7]), and
have mechanisms to allow software to tolerate certain classes of
bugs ([6]). Our virtualization engine runs directly on hardware
(bare-metal hypervisor) and runs an unmodified guest operating
system above it. We employ binary translation to virtualize the
guest operating system, and do not rely on hardware support for
virtualization available on recent processors (Intel VT, AMD-V).
We chose to implement a binary translator, instead of a
trap-and-emulate style hypervisor, to have greater control and
visibility of the guest software, with minimal performance loss
([2]). We shall discuss our experiences in developing our binary
translation based hypervisor, and present some early performance
results. We describe how we intend to use our virtualization layer
for improving performance, security, and software fault tolerance
of applications running on the cloud.
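As a rough illustration of what translating guest code (rather than
trapping on it) involves, the sketch below models a translation
cache over a toy instruction set. It is not the hypervisor's actual
design: the opcode set PRIVILEGED, the emulate_* call targets, and
the TranslationCache class are all hypothetical names introduced
here for illustration.

```python
# Toy model only: instruction names and emulate_* targets are
# hypothetical, not taken from the hypervisor described in the text.
PRIVILEGED = {"cli", "sti", "hlt"}  # assumed set of sensitive opcodes


def translate_block(block):
    """Rewrite one guest basic block: copy safe instructions as-is and
    replace privileged ones with a call into an emulation routine."""
    out = []
    for insn in block:
        opcode = insn.split()[0]
        if opcode in PRIVILEGED:
            out.append(f"call emulate_{opcode}")
        else:
            out.append(insn)
    return out


class TranslationCache:
    """Maps a guest block address to its translated code, so each
    block is translated once and reused on later executions."""

    def __init__(self, guest_code):
        self.guest_code = guest_code  # address -> list of instructions
        self.cache = {}
        self.translations = 0         # number of blocks translated so far

    def lookup(self, addr):
        if addr not in self.cache:
            self.cache[addr] = translate_block(self.guest_code[addr])
            self.translations += 1
        return self.cache[addr]


guest = {0x1000: ["mov eax, 1", "cli", "add eax, 2"]}
tc = TranslationCache(guest)
first = tc.lookup(0x1000)   # translated on first use
second = tc.lookup(0x1000)  # served from the cache
```

Because every guest instruction passes through translate_block
before it runs, the translator sees the whole instruction stream,
which is the control and visibility argument made above.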
Performance Optimizations
Runtime optimization of code has been shown to be effective in
software for stand-alone programs (e.g., Dynamo [3]) and in
hardware (e.g., Transmeta Crusoe [8]). However, neither of these
approaches finds popular use in today's systems. There are two
major problems with using a dynamic optimization engine at the
application layer:
1. The optimizations are limited in scope to the application
   being optimized.
2. The optimizer needs to be aware of many OS-specific structures
   and libraries to be able to seamlessly support all
   applications, which increases its complexity significantly.
   This is perhaps the single most significant hurdle to wide
   adoption of this technique.
Similarly, using a dynamic optimizer at the hardware level suffers
from the obvious problem of having a very narrow view of the
instruction stream, making it hard to obtain significant
performance improvement. We argue that using a dynamic
optimization engine at the virtualization layer combines the best
of both worlds:
1. Because a virtualization layer provides the small and
   unbreakable abstraction of the machine architecture, the
   complexity of an optimization engine inside a virtualization
   layer is much more [...]
2. [Because the virtualization layer sits at the] bottom of the
   software stack, it provides the optimizer with a much wider
   view of the entire software stack running above it. A dynamic
   optimizer can now perform cross-layer optimizations (e.g.,
   optimizations transcending OS boundaries) which were not
   possible with an optimizer for a stand-alone application.
We have started by implementing peephole optimizations in our
virtualization engine ([5]), and plan to implement cross-layer
trace optimizations and runtime value specializations ([3], [9])
in the near future. Our initial performance measurements are
encouraging, and compute-intensive applications run at near-native
speeds on the hypervisor. Through these optimizations, we aim to
reach a situation of negative overhead, where one could reasonably
expect applications to run faster in virtual environments than in
real environments.
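To make the idea of a peephole optimization concrete, here is a
minimal sketch of a rule-driven peephole pass over a list of
textual instructions. The rule table RULES and the function
peephole are hypothetical and are not the rules used in our engine.
Both sample rules are assumed to apply to 32-bit code, and the
second is valid only if the stack slot below the stack pointer is
dead.

```python
# Hypothetical rule table: each entry maps a short instruction pattern
# to a cheaper replacement. Real rules must be proven equivalent.
RULES = [
    # A register self-move is a no-op in 32-bit mode.
    (("mov eax, eax",), ()),
    # Assumes the stack slot written by the push is never read again.
    (("push eax", "pop ebx"), ("mov ebx, eax",)),
]


def peephole(code, rules=RULES):
    """Slide a small window over the code and apply the first matching
    rule, repeating until a full pass makes no change."""
    out = list(code)
    changed = True
    while changed:
        changed = False
        i = 0
        while i < len(out):
            for pattern, replacement in rules:
                n = len(pattern)
                if tuple(out[i:i + n]) == pattern:
                    out[i:i + n] = replacement  # splice in the rewrite
                    changed = True
                    break
            else:
                i += 1  # no rule matched here; move the window forward
    return out


optimized = peephole(["push eax", "pop ebx", "mov eax, eax", "add ebx, 1"])
```

Every rule here shrinks the code, so the pass terminates. A rule
set containing a replacement that matches its own pattern would
need an explicit iteration bound.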
Security
The small footprint of a virtualization layer (our current
hypervisor is only 50K lines of code) makes it easier to check and
verify for security holes. To keep it small, we avoid the need to
implement all the different device drivers by passing devices
through to the guest OS.
The small size of the virtualization engine allows us to make it
fairly robust and error-free, and hence provides a good place to
enforce security policies on the guest OS and its applications.
Typical security policies that are implemented either in the OS
(e.g., SFI [10]) or in hardware (e.g., NX-bit [1]) can be more
efficiently and generally implemented at the virtualization layer.
We have added support in our binary translator to allow the user
to specify arbitrary security policies which are enforceable at
instruction-level granularity.
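One way to picture instruction-level policy enforcement is as a
translation-time pass that inserts a check before every instruction
a policy selects. The sketch below is an assumption-laden toy, not
our translator's interface: instrument, is_indirect_jump, and the
check_jump_target call target are hypothetical names.

```python
def is_indirect_jump(insn):
    """Hypothetical matcher: selects jumps through a memory operand."""
    return insn.startswith("jmp [")


def guard_jump(insn):
    """Hypothetical check emitted ahead of a selected instruction."""
    return "call check_jump_target"


def instrument(block, policies):
    """Emit each instruction of the block, preceded by the check of
    every policy whose matcher selects that instruction."""
    out = []
    for insn in block:
        for matches, make_check in policies:
            if matches(insn):
                out.append(make_check(insn))
        out.append(insn)
    return out


# A policy is a (matcher, check generator) pair supplied by the user.
policies = [(is_indirect_jump, guard_jump)]
guarded = instrument(["mov eax, 1", "jmp [eax]"], policies)
```

Since the check is added when the block is translated, instructions
that no policy selects run with no added cost in this model.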
Software Fault Tolerance
We will discuss how it is possible to tolerate some classes of
probabilistic software faults (such as race conditions) in
software by using a partial record-[...] in our hypervisor, and
will discuss its performance implications.
References
[1] IA-32 Intel Architecture Software Developer's Manual Volume 3A:
    System Programming Guide. Tech. rep., Intel Corp.
[2] Adams, K., and Agesen, O. A comparison of software and hardware
    techniques for x86 virtualization. In ASPLOS-XII: Proceedings
    of the 12th international conference on Architectural support
    for programming languages and operating systems (New York, NY,
    USA, 2006), ACM, pp. 2-13.
[3] Bala, V., Duesterwald, E., and Banerjia, S. Dynamo: a
    transparent dynamic optimization system. ACM SIGPLAN Notices
    35, 5 (2000), 1-12.
[4] Bansal, S., and Aiken, A. Binary translation using peephole
    superoptimizers. In OSDI (2008), pp. 177-192.
[5] Bansal, S., and Aiken, A. Automatic generation of peephole
    superoptimizers. In Proceedings of the 12th International
    Conference on Architectural Support for Programming Languages
    and Operating Systems (October 21-25, 2006), pp. 394-403.
[6] Bartlett, W., Society, I. C., and Spainhower, L. Commercial
    fault tolerance: A tale of two systems. IEEE Transactions on
    Dependable and Secure Computing 1 (2004), 2004.
[7] Bruening, D. Efficient, Transparent and Comprehensive Runtime
    Code Manipulation. PhD thesis, Massachusetts Institute of
    Technology, 2004.
[8] Dehnert, J. C., Grant, B. K., Banning, J. P., Johnson, R.,
    Kistler, T., Klaiber, E., and Mattson, J. The Transmeta Code
    Morphing Software: Using speculation, recovery, and adaptive
    retranslation to address real-life challenges. IEEE Computer
    Society, pp. 15-24.
[9] Poletto, M., Engler, D. R., and Kaashoek, M. F. tcc: a system
    for fast, flexible, and high-level dynamic code generation.
    SIGPLAN Not. 32, 5 (1997), 109-121.
[10] Wahbe, R., Lucco, S., Anderson, T. E., and Graham, S. L.
    Efficient software-based fault isolation. In Proceedings of
    the 14th ACM Symposium on Operating Systems Principles (1993),
    pp. 203-216.