Dependability
in the
Outline
• The glorious past (Availability Progress)
• The dark ages (current scene)
Preview
The Last 5 Years: Availability Dark Ages
Ready for a Renaissance?
• Things got better, then things got a lot worse!
[Chart: Availability, on a scale of 9s up to 99.999%, over time for Computer Systems, Telephone Systems, Cellphones, and the Internet]
DEPENDABILITY: The 3 ITIES
• RELIABILITY / INTEGRITY: Does the right thing.
  (Also $MTTF \gg 1$)
• AVAILABILITY: Does it now.
  (Also $1 \gg \frac{MTTR}{MTTF+MTTR}$)
  System Availability: if 90% of terminals are up and 99% of the DB is up, then 0.90 × 0.99 ≈ 89% of transactions are serviced on time.
• Holistic vs. Reductionist view
[Diagram: Security, Integrity, Reliability]
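The availability definitions above can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, availability is taken as MTTF/(MTTF+MTTR), and terminals and database are assumed to fail independently, so the availabilities of everything a transaction needs simply multiply.

```python
def availability(mttf: float, mttr: float) -> float:
    """Steady-state fraction of time a module is up: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def series_availability(*parts: float) -> float:
    """Availability of a path that needs every part up.

    Assumes the parts fail independently, so their availabilities multiply.
    """
    result = 1.0
    for a in parts:
        result *= a
    return result


# Hypothetical module: fails about once a year (8760 h), 8.76 h to repair.
print(f"{availability(8760.0, 8.76):.4f}")  # 0.9990

# The slide's example: terminals 90% up, database 99% up.
print(f"{series_availability(0.90, 0.99):.1%}")  # 89.1%
```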
Fail-Fast is Good, Repair is Needed
Improving either MTTR or MTTF gives benefit
Simple redundancy does not help much.
Lifecycle of a module: Fault → Detect → Repair → Return
• fail-fast gives short fault latency
• High Availability is low UN-Availability:
  $\text{Unavailability} \approx \frac{MTTR}{MTTF}$
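Why simple redundancy helps little while repair helps a lot shows up in the textbook model of a duplexed pair. This is a sketch under stated assumptions: independent failures (as in the fault model), exponentially distributed failure and repair times, and hypothetical module numbers (a one-year MTTF and a one-day MTTR).

```python
HOURS_PER_YEAR = 8760.0


def pair_mttf_no_repair(mttf: float) -> float:
    """Duplexed pair, nothing repaired: with exponential lifetimes the first
    failure comes after MTTF/2 on average, then the survivor lasts another
    MTTF, for 1.5 x MTTF in total."""
    return 1.5 * mttf


def pair_mttf_with_repair(mttf: float, mttr: float) -> float:
    """Duplexed pair where a failed module is repaired while its twin runs.

    Assumes independent, exponentially distributed failure and repair times
    (rates lam = 1/MTTF and mu = 1/MTTR). The pair is lost only if the twin
    fails during a repair; the Markov-chain result is (3*lam + mu) / (2*lam**2),
    which is close to MTTF**2 / (2 * MTTR) when MTTR << MTTF.
    """
    lam = 1.0 / mttf
    mu = 1.0 / mttr
    return (3 * lam + mu) / (2 * lam * lam)


module_mttf = 1 * HOURS_PER_YEAR  # hypothetical: each module fails about once a year
module_mttr = 24.0                # hypothetical: repaired within a day

print(f"{pair_mttf_no_repair(module_mttf) / HOURS_PER_YEAR:.1f} years")  # 1.5 years
print(f"{pair_mttf_with_repair(module_mttf, module_mttr) / HOURS_PER_YEAR:.1f} years")  # 184.0 years
```

With these numbers the unrepaired pair lasts only 1.5 years, while the repaired pair lasts about 184 years: the gain comes from fast repair, not from the spare alone.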
Fault Model
• Failures are independent
So, single fault tolerance is a big win
• Hardware fails fast
(dead disk, blue-screen)
• Software fails-fast (or goes to sleep)
• Software often repaired by reboot:
– Heisenbugs
• Operations tasks: major source of outages
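Repair by reboot works because Heisenbugs are transient: rerunning the failed work from a clean state usually succeeds. A minimal sketch, assuming a hypothetical `TransientFault` exception marks such failures and that the step is safe to rerun (no partial side effects):

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientFault(Exception):
    """Hypothetical stand-in for a Heisenbug: a fault that usually vanishes on retry."""


def run_with_restart(step: Callable[[], T], max_attempts: int = 3) -> T:
    """Run a fail-fast step; on a transient fault, discard its state and rerun it."""
    last_error: Optional[TransientFault] = None
    for _ in range(max_attempts):
        try:
            return step()
        except TransientFault as err:
            # The step stopped at the first sign of trouble (fail-fast);
            # "repair by reboot" here just means starting the step again.
            last_error = err
    raise RuntimeError(f"still failing after {max_attempts} attempts") from last_error


calls = {"n": 0}


def flaky_lookup() -> str:
    """Fails on its first call only, like a timing-dependent race."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise TransientFault("race lost on first attempt")
    return "ok"


print(run_with_restart(flaky_lookup))  # ok
```

A deterministic bug that fails on every attempt exhausts the retries and still surfaces as an error, which keeps the overall behavior fail-fast.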
Disks (RAID): the BIG Success Story
• Duplex or Parity: masks faults
• Disks @ 1M hours MTTF (~100 years)
• But
– controllers fail, and
– we have 1,000s of disks.
• Duplexing or parity, and dual path, gives “perfect disks”
• Wal-Mart never lost a byte
(thousands of disks, hundreds of failures).
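Parity masks a single disk failure because XOR is its own inverse: the parity block is the XOR of the data blocks, so XOR-ing the parity with the surviving blocks yields the missing one. A simplified sketch, assuming equal-sized blocks and at most one lost block per stripe; real RAID controllers do much more (striping, parity rotation, rebuild scheduling):

```python
from functools import reduce


def parity(blocks: list[bytes]) -> bytes:
    """XOR all blocks together byte by byte (RAID-style parity)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and the same size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving: list[bytes], parity_block: bytes) -> bytes:
    """Recover the single missing block: XOR of the parity and all survivors."""
    return parity(surviving + [parity_block])


data = [b"disk", b"fail", b"fast"]  # three equal-sized blocks on three disks
p = parity(data)                    # stored on a fourth disk
lost = data.pop(1)                  # disk 1 dies
print(rebuild(data, p))             # b'fail'
```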
Fault Tolerance vs Disaster Tolerance
• Fault-Tolerance: masks local faults
– RAID disks
– Uninterruptible Power Supplies
– Cluster Failover
• Disaster Tolerance: masks site failures
– Protects against fire, flood, sabotage, ...
– Redundant system and service
Case Study - Japan
"Survey on Computer Security", Japan Info Dev Corp., March 1986. (trans: Eiichi Watanabe).
Outage source                      MTTF
Vendor (hardware and software)     5 months
Application software               9 months
Communications lines               1.5 years
Operations                         2 years
Environment                        2 years
All sources combined               10 weeks
1,383 institutions reported (6/84 - 7/85)
7,517 outages, MTTF ~ 10 weeks, avg duration ~ 90 MINUTES
To Get 10 Year MTTF, Must Attack All These Areas
[Pie chart: outages by source: Vendor 42%, Application Software 25%, Tele Comm 12%; Operations and Environment account for the remaining 9.3% and 11.2%]
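The 10-week total follows from the per-source MTTFs if outage sources are independent with constant failure rates: rates add, so the combined MTTF is the reciprocal of the summed rates. A sketch under that assumption (1.5 years taken as 18 months, and 52/12 weeks per month):

```python
from typing import Iterable

# MTTF per outage source from the Japanese survey, in months.
MTTF_MONTHS = {
    "Vendor": 5.0,
    "Application software": 9.0,
    "Communications lines": 18.0,
    "Operations": 24.0,
    "Environment": 24.0,
}
WEEKS_PER_MONTH = 52.0 / 12.0


def combined_mttf(mttfs: Iterable[float]) -> float:
    """Independent sources with constant rates: rates add, MTTF = 1 / sum(rates)."""
    return 1.0 / sum(1.0 / m for m in mttfs)


total = combined_mttf(MTTF_MONTHS.values())
print(f"all sources: {total:.2f} months = {total * WEEKS_PER_MONTH:.1f} weeks")  # 2.22 months = 9.6 weeks

# Each source's share of outages is its rate divided by the total rate.
for source, m in MTTF_MONTHS.items():
    print(f"{source:22s} {total / m:6.1%}")

without_vendor = combined_mttf(m for source, m in MTTF_MONTHS.items() if source != "Vendor")
print(f"without vendor outages: {without_vendor:.1f} months")  # 4.0 months
```

Eliminating vendor outages entirely would only raise the MTTF to about 4 months, which is the point of the slide: every area has to be attacked. The shares this model implies (about 44%, 25%, 12%, 9%, 9%) are roughly in line with the survey's pie chart.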
Case Studies - Tandem Trends
MTTF improved
Shift from Hardware & Maintenance (from 50% to 10%) to Software (62%) & Operations (15%)
NOTE: Systematic under-reporting of Environment, Operations errors, and Application Software
[Charts, 1985 to 1989: share of outages by cause (software, hardware, maintenance, operations, environment, unknown), in percent; and Outages/1000 System Years by cause]
Dependability Status circa 1995
• ~4-year MTTF => 5 9s for well-managed systems.
Fault Tolerance Works.
• Hardware is GREAT (maintenance and MTTF).
• Software masks most hardware faults.
• Many hidden software outages in operations:
– New software.
– Utilities.
• Make all hardware/software changes ONLINE.
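Using the approximation Unavailability ≈ MTTR/MTTF, one can ask how short repairs must be for a ~4-year MTTF to yield five 9s, and express that MTTF in the Outages/1000 System Years unit of the Tandem charts. A sketch assuming an 8760-hour year; the helper name is hypothetical:

```python
HOURS_PER_YEAR = 8760.0


def max_mttr_for(nines: int, mttf_hours: float) -> float:
    """Unavailability ~ MTTR / MTTF (valid when MTTR << MTTF),
    so reaching `nines` 9s needs MTTR <= MTTF * 10**-nines."""
    return mttf_hours * 10.0 ** (-nines)


mttf = 4 * HOURS_PER_YEAR
print(f"{max_mttr_for(5, mttf) * 60:.0f} minutes")  # 21 minutes
print(f"{1000 * HOURS_PER_YEAR / mttf:.0f} outages per 1000 system-years")  # 250 outages per 1000 system-years
```

So a 4-year MTTF reaches five 9s only if the average outage is repaired in roughly 20 minutes.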
What’s Happened Since Then?
• Hardware got better
• Software got better
(even though it is more complex)
• RAID is standard; snapshots are becoming standard
Availability
• well-managed nodes
– Masks some hardware failures
• well-managed packs & clones
– Masks hardware failures, operations tasks (e.g. software upgrades)
– Masks some software failures
• well-managed GeoPlex
– Masks site failures (power, network, fire, move,…)
– Masks some operations failures
[Chart: availability, measured in 9s, rising from nodes to packs & clones to GeoPlex]
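Each additional 9 cuts the allowed downtime tenfold. A small table, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year at `nines` 9s of availability (e.g. 3 -> 99.9%)."""
    return MINUTES_PER_YEAR * 10.0 ** (-nines)


for n in range(1, 6):
    availability = 1.0 - 10.0 ** (-n)
    print(f"{availability:8.3%}  {downtime_minutes_per_year(n):9.1f} minutes/year")
```

At 99.999% a system may be down only about 5 minutes a year; at the 98% quoted below for web sites, it is about 175 hours, more than a week.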
Progress?
• MTTF improved from 1950-1995
• MTTR has not improved much since 1970s failover
• Hardware and software online change (pNp) is now standard
• Then the Internet arrived:
– No project can take more than 3 months.
– Time to market is everything
The Internet Changed Expectations
1990
• Phones delivered 99.999%
• ATMs delivered 99.99%
• Failures were front-page news.
• Few hackers
• Outages last an “hour”
2000
• Cellphones deliver 90%
• Web sites deliver 98%
• Failures are business-page news
• Many hackers.
• Outages last a “day”
Why (1) Complexity
• Internet sites are MUCH more complex.
– NAP
– Firewall/proxy/ipsprayer
– Web
– DMZ
– App server
– DB server
– Links to other sites
One of the Data Centers
(500 servers)
[Network diagram: Microsoft.com data center, with Cisco 7000 routers, Catalyst 5000 and 2926 switches, ASX-1000 switches, and farms of IIS, SQL, search, FTP, SMTP, build, staging, and monitoring servers. Drawn by Matt Groshong, last updated April 12, 2000. IP addresses removed by Jim Gray to protect security.]
Microsoft.com Server Count
  FTP                    6
  Build Servers         32
  IIS                  210
  Application            2
  Exchange              24
  Network/Monitoring    12
  SQL                  120
  Search                 2
  NetShow                3
  NNTP                  16
  SMTP                   6
  Stagers               26
  Total                459
A Schematic of HotMail
• ~7,000 servers
• 100 backend stores with 120TB (cooked)
• 3 data centers
• Links to
– Passport
– Ad-rotator
– Internet Mail gateways
– …
• ~1B messages per day
• 150M mailboxes, 100M active
• ~400,000 new mailboxes per day.
[Schematic labels: Switched Ethernet, Internet, Telnet Management]
Why (2) Velocity
• No project can take more than 13 weeks.
• Time to market is everything
• Functionality is everything
• Faster, cheaper, badder
[Diagram: trade-off among Schedule, Quality, and Functionality]
Why (3) Hackers
• Hackers are a new and increasing threat
• Any site can be attacked from anywhere
• Motives include ego, malice, and greed.
• Complexity makes it hard to protect sites.
• Concentration of wealth makes an attractive target:
– Why did you rob banks?
– Willie Sutton: ’Cause that’s where the money is!
Note: Eric Raymond’s How to Become a Hacker: http://www.tuxedo.org/~esr/faqs/hacker-howto.html
How Bad Is It?
http://www-iepm.slac.stanford.edu/
Microsoft.Com
• Operations mis-configured a router
• Took a day to diagnose and repair.
• DOS attacks cost a fraction of a day.
BackEnd Servers are More Stable
• Generally deliver 99.99%
• TerraServer, for example:
– single back-end failed after 2.5 years
• Went to a 4-node cluster
• Fails every 2 months
– Transparent failover in 30 sec.
– Online software upgrades
Year 1                      Time          %
Total Up Time               8754:07:22    99.93%
Total Down Time             5:52:38       0.07%
Total Time                  8760:00:00    100.00%
Scheduled Down              2:50:45
Scheduled Availability      8757:09:15    99.97%
Un-Scheduled Down           3:01:53

Through 18 Months           Time          %
Up Time                     12888:21:49   99.519%
Scheduled Down              4:00:25       0.031%
Unscheduled Down            58:20:46      0.451%
Total Time                  12950:43:00
Total Down                  62:21:11      0.48%
(99.52% available through 18 months)
Down 30 hours in July (hardware stop, auto restart failed,
operations failure)
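The TerraServer percentages can be recomputed from the H:MM:SS durations in the tables. A small sketch; the helper names are hypothetical:

```python
def hours(hms: str) -> float:
    """Convert an 'H:MM:SS' duration (hours may exceed 24) to hours."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h + m / 60 + s / 3600


def percent(part: str, whole: str) -> float:
    """Share of `whole` taken by `part`, both given as H:MM:SS strings."""
    return 100.0 * hours(part) / hours(whole)


# Year 1: up time over total time.
print(f"{percent('8754:07:22', '8760:00:00'):.2f}%")   # 99.93%
# Through 18 months: up time and unscheduled down time.
print(f"{percent('12888:21:49', '12950:43:00'):.3f}%")  # 99.519%
print(f"{percent('58:20:46', '12950:43:00'):.3f}%")     # 0.451%
```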
eBay: A very honest site
• Publishes operations log.
• Has 99% of scheduled uptime
• Schedules about 2 hours/week down.
• Has had some operations outages
• Has had some DOS problems.
Not to throw stones but…
• Everyone has a serious problem.
• The BEST people publish their stats.
• The others HIDE their stats (check Netcraft to see who I mean).
• We have good NODE-level availability: 5-9s is reasonable.
Recommendation #1
• Continue progress on back-ends.
– Make management easier
(AUTOMATE IT!!!)
– Measure
– Compare best practices
– Continue to look for better algorithms.
• Live in fear
– We are at 10,000-node servers
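At fleet scale even very reliable parts fail constantly: with N independent nodes, failures arrive N times as often. A sketch assuming the ~4-year node MTTF quoted for well-managed systems circa 1995 and, for comparison, the 1M-hour disk MTTF from the RAID slide; the helper name is hypothetical:

```python
HOURS_PER_YEAR = 8760.0


def hours_between_failures(nodes: int, node_mttf_hours: float) -> float:
    """Independent nodes: fleet failure rate is nodes / MTTF,
    so the mean gap between failures is MTTF / nodes."""
    return node_mttf_hours / nodes


gap = hours_between_failures(10_000, 4 * HOURS_PER_YEAR)
print(f"a node fails about every {gap:.1f} hours")  # a node fails about every 3.5 hours

disk_gap = hours_between_failures(1_000, 1_000_000.0)
print(f"a disk fails about every {disk_gap:.0f} hours")  # a disk fails about every 1000 hours
```

So 10,000 such nodes produce a failure every three and a half hours or so, and 1,000 disks lose one roughly every six weeks.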
Recommendation #2
• Current security approach is unworkable:
– Anonymous clients
– Firewall is clueless
– Incredible complexity
• We can't win this game!
• So change the rules (redefine the problem):
– No anonymity
– Unified authentication/authorization model
References
Adams, E. (1984). “Optimizing Preventative Service of Software Products.” IBM Journal of Research and Development. 28(1): 2-14.
Anderson, T. and B. Randell. (1979). Computing Systems Reliability.
Garcia-Molina, H. and C. A. Polyzois. (1990). Issues in Disaster Recovery. 35th IEEE Compcon 90. 573-577.
Gray, J. (1986). Why Do Computers Stop and What Can We Do About It. 5th Symposium on Reliability in Distributed Software and Database Systems. 3-12.
Gray, J. (1990). “A Census of Tandem System Availability between 1985 and 1990.” IEEE Transactions on Reliability. 39(4): 409-418.
Gray, J. N. and A. Reuter. (1993). Transaction Processing Concepts and Techniques. San Mateo: Morgan Kaufmann.
Lampson, B. W. (1981). Atomic Transactions. Distributed Systems -- Architecture and Implementation: An Advanced Course. ACM, Springer-Verlag.
Laprie, J. C. (1985). Dependable Computing and Fault Tolerance: Concepts and Terminology. 15th FTCS. 2-11.
Long, D. D., J. L. Carroll, and C. J. Park. (1991). A Study of the Reliability of Internet Sites. Proc. 10th Symposium on Reliable Distributed Systems, Pisa, September 1991. 177-186.
Long, D., A. Muir, and R. Golding. (1995). “A Longitudinal Study of Internet Host Reliability.” Proceedings of the Symposium on Reliable Distributed Systems, Bad Neuenahr, Germany: IEEE, September 1995. 2-9.
http://www.netcraft.com/
Netcraft also has even better for-fee data, but the free data is really excellent.
http://www2.ebay.com/aw/announce.shtml#top
eBay is an excellent benchmark of best Internet practices.