6.89 Software typically goes through modification cycles, called updates or upgrades, to fix existing errors in the code or to enhance its functionality. One of the major causes of software failure is that each modification to the code is capable of increasing the risk of failure. Changes intended only to fix errors may introduce new ones, altering the probability of failure for better or worse. Where a vendor releases a significant number of new features or a major redesign, there is typically a sudden increase in the probability of failure; the risk then declines over time as subsequent updates resolve the errors that are discovered.
6.90 It is useful to observe that when safety-related software code is modified, there is usually documentation to explain how the risk has been reduced, although this applies only to dangerous failures, and not necessarily to all failures. By way of example, consider the case of Saphena Computing Limited v Allied Collection Agencies Limited, in which Mr Recorder Havery QC commented:
In the present case, on the other hand, once the software is fit for its purpose, it stays fit for its purpose. If by any chance a flaw is discovered showing that it is unfit for purpose (which is hardly likely after prolonged use)1 there is a remedy in damages against the supplier, if solvent, until the expiry of the period of limitation.2
1 Professor Thomas has indicated that even in 1995 there was plenty of evidence that this was not correct.
2 [1995] FSR 616, 639.
6.91 The problem with this remark is that proprietary software code can be (and indeed often is) affected by updates, which means it does not necessarily stay ‘fit for purpose’. Flaws can become manifest at any time, and some can remain undetected for years, which means that if they are discovered by a malicious person or state agency, they can be exploited for purposes other than those the users intend. There is a more fundamental flaw in this statement. If the software is used unchanged for a different purpose, which may be no more than the original purpose applied to different data, it may still fail.
6.92 This is illustrated by the Heartbleed vulnerability.1 Cryptographic protocols are used to provide for the security and privacy of communications over the Internet, such as the World Wide Web, email, instant messaging and some virtual private networks.
The current protocol is called Transport Layer Security (TLS). To implement this protocol, a developer will use a cryptographic library. One such library, which is open source, is OpenSSL. In 2011, a doctoral student wrote the Heartbeat Extension for OpenSSL and requested that his implementation be included in the library. One of the developers (there were four) reviewed the proposal, but failed to notice that the code was flawed. The code was added to the repository on 31 December 2011 and released in OpenSSL version 1.0.1. The defect allowed anyone on the Internet to read the memory of any system running the flawed versions of the OpenSSL software. It was possible for a hacker using this flaw to steal user names and passwords, instant messages, emails and business documents, and no trace of the attack would be left. The attack did not rely on access to privileged information or credentials such as usernames and passwords.
Taking into account the length of exposure, the ease with which it could be exploited, the fact that an attack left no trace, and the estimate that it affected up to two-thirds of the Internet’s web servers, this weakness was taken seriously. A new version of OpenSSL that fixed the flaw was released on 7 April 2014, the same day the Heartbleed vulnerability was publicly disclosed.
1 Jane Wakefield, ‘Heartbleed bug: what you need to know’, BBC News Technology (10 April 2014) <www.bbc.co.uk/news/technology-26969629>; Brian Krebs, ‘Heartbleed Bug: What Can You Do?’ (14 April 2014) <http://krebsonsecurity.com/2014/04/heartbleed-bug-what-can-you-do/>; <https://en.wikipedia.org/wiki/Heartbleed>. A more important error was discovered in GNU Bash in September 2014, for which see ‘Bourne-Again Shell (Bash) Remote Code Execution Vulnerability’ (original release date: 24 September 2014; last revised: 30 September 2014), at <https://www.us-cert.gov/ncas/current-activity/2014/09/24/Bourne-Again-Shell-Bash-Remote-Code-Execution-Vulnerability>.
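The coding error at the heart of Heartbleed belongs to a well-understood class: trusting a length field supplied by the remote peer. The following sketch in C illustrates that class of error rather than reproducing the actual OpenSSL source; the structure and function names (heartbeat_msg, echo_heartbeat_flawed, echo_heartbeat_fixed) are hypothetical.

```c
/* Simplified illustration of the class of defect behind Heartbleed:
 * trusting a length field supplied by the peer. Names are hypothetical,
 * not taken from the actual OpenSSL source. */
#include <stdlib.h>
#include <string.h>

struct heartbeat_msg {
    unsigned char *payload;  /* bytes actually received from the peer */
    size_t received_len;     /* how many bytes really arrived */
    size_t claimed_len;      /* length field stated inside the message */
};

/* Flawed handler: echoes back 'claimed_len' bytes without checking that
 * the peer actually sent that many, so memcpy reads past the end of the
 * receive buffer into adjacent process memory. */
unsigned char *echo_heartbeat_flawed(const struct heartbeat_msg *m)
{
    unsigned char *reply = malloc(m->claimed_len);
    if (reply == NULL)
        return NULL;
    memcpy(reply, m->payload, m->claimed_len); /* over-read if claimed_len > received_len */
    return reply;
}

/* Fixed handler: discard messages whose stated length exceeds what was
 * actually received, which is in essence what the April 2014 fix required. */
unsigned char *echo_heartbeat_fixed(const struct heartbeat_msg *m)
{
    if (m->claimed_len > m->received_len)
        return NULL; /* silently drop the malformed message */
    unsigned char *reply = malloc(m->claimed_len);
    if (reply == NULL)
        return NULL;
    memcpy(reply, m->payload, m->claimed_len);
    return reply;
}
```

The point of the sketch is that the flawed and fixed versions differ by a single bounds check, which is consistent with the observation in 6.92 that a reviewer could read the proposed code and fail to notice the defect.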
6.93 Software can also be affected by changes in the environment, such as the operating system or other components, rather than by any specific application, although it is necessary to distinguish between modification of software in situ and the reuse of software in an environment that is presumed to be similar. An example is the Ariane 5 incident, where the malfunction arose from a changed environment and poorly understood assumptions, rather than from a defect in the original development. Where the software is modified in situ, the environment does not change; where software is reused in an environment that is presumed to be similar, the software has not changed, but the environment has. The result in either case is that there may be a mismatch where there was none before.
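The reuse failure mode can be made concrete with a minimal sketch. The widely reported cause of the Ariane 5 malfunction was the conversion of a quantity that had always been small on the earlier vehicle into a 16-bit integer too narrow for the larger values produced by the new vehicle’s trajectory. The sketch below, in C, is illustrative only: the numbers and names are hypothetical, and the actual flight software was written in Ada.

```c
/* Illustration of reusing unchanged software in a changed environment:
 * a conversion that was always safe for the old vehicle overflows for
 * the new one. Values and names are illustrative. */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Conversion written on the assumption that the input always fits in a
 * 16-bit signed integer, an assumption that held in the old environment. */
short convert_horizontal_bias(double bias)
{
    if (bias > SHRT_MAX || bias < SHRT_MIN) {
        /* The reused system treated this case as impossible; here it simply
         * aborts, standing in for an unhandled run-time error. */
        fprintf(stderr, "operand out of range: %f\n", bias);
        abort();
    }
    return (short)bias;
}

int main(void)
{
    double old_trajectory_bias = 12000.0; /* within range: conversion succeeds */
    double new_trajectory_bias = 64000.0; /* exceeds SHRT_MAX (32767): fails */

    printf("%d\n", convert_horizontal_bias(old_trajectory_bias));
    printf("%d\n", convert_horizontal_bias(new_trajectory_bias)); /* aborts */
    return 0;
}
```

The code itself is unchanged between the two calls; only the data supplied by the new environment differs, which is the mismatch described above.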
6.94 Generally speaking, programmers who modify someone else’s code often do not fully understand the software, and may also be less well trained than the people who wrote it. Software can be relied upon to produce verifiably correct results, but to have such a degree of certainty it is necessary to be assured that the operating conditions remain identical and that nothing else malfunctions. Peter G. Neumann has indicated that even though the utmost care and attention might be devoted to the design of a system, it may still have significant flaws.1 This was illustrated in a 1970 report edited by Willis H. Ware.2 In that report, which is now freely available, the authors noted under ‘Failure Prediction’ in section V, ‘System Characteristics’, that:
In the present state of computer technology, it is impossible to completely anticipate, much less specify, all hardware failure modes, all software design errors or omissions, and, most seriously, all failure modes in which hardware malfunctions lead to software malfunctions. Existing commercial machines have only a minimum of redundancy and error-checking circuits, and thus for most military applications there may be unsatisfactory hardware facilities to assist in the control of hardware/software malfunctions. Furthermore, in the present state of knowledge, it is very difficult to predict the probability of failure of complex hardware and software configurations; thus, redundancy [is] an important design concept.
1 Neumann, Computer Related Risks, 4; see his text generally for this topic.
2 Security Controls for Computer Systems: Report of Defense Science Board Task Force on Computer Security – RAND Report R-609-1 <www.rand.org/pubs/reports/R609-1/index2.html>.
6.95 The authors of the report went on to observe the following in Part C, Technical Recommendations:
(a) It is virtually impossible to verify that a large software system is completely free of errors and anomalies.
(b) The state of system design of large software systems is such that frequent changes to the system can be expected.
(c) Certification of a system is not a fully developed technique nor are its details thoroughly worked out.
(d) System failure modes are not thoroughly understood, cataloged, or protected against.
(e) Large hardware complexes cannot be absolutely guaranteed error-free.