Assessment? Who is the Forensic Engineer and what is his Role?
3.1 Investigation
Investigation of incidents and near misses, together with the subsequent related activities, is one of the most valid methods to improve the safety and reliability of process plants and, by extension, of the entire process industry. It has already been pointed out that other methods, such as hazard identification or the management of change, share the same objective as investigation (safety improvement), but they are predictive methods. This characteristic implies the following limitations [1]:
The analyses are speculative. Therefore, it is quite possible that not all the plausible events are identified;
it is difficult to predict the real level of risk, because it is usually based on approximated likelihood and consequence assessment;
it is difficult to identify multiple-cause events, which accidents typically are; and
it is difficult to predict human error and to take it into account in the risk assessment (human error is involved in many incidents).
On the other hand, the investigation of actual incidents provides useful information, even if it is hard to extract, cutting through the prejudice, ignorance, and misunderstandings that may affect the theoretical preventive analysis.
Incident investigation is a core element of a Safety Management System (SMS) [2], whose main goal is to prevent incidents. Indeed, a fundamental assumption of incident investigation is that a malfunction in the SMS can be found as the root cause. In other words, it is always possible to find some aspects of the SMS that, if properly organised and applied, would have prevented the incident. That malfunction can be related to a lack of planning, organisation, implementation, or control. Taking inspiration from [3], when the management system for incidents is developed, evaluated, or improved, it is essential to:
Involve competent personnel, define an appropriate scope,
implement the program consistently throughout the company, and monitor the effectiveness of incident investigation to maintain a dependable investigation practice;
identify the potential incidents for investigation, monitoring all possible sources, and ensuring the reporting activities;
adopt proper methods to investigate, collecting appropriate data, being rigorous, providing expertise and tools to the investigation personnel;
report the incident investigation results, developing recommendations that are clearly linked to the identified causes;
follow up the results of investigations, resolving recommendations, sharing the findings externally and internally; and
analyse data to identify a trend in the recurrence of similar incidents.
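The last item in the list above, trend analysis, can be illustrated with a minimal sketch. It assumes a hypothetical incident log of (year, equipment, immediate cause) records; the field layout, the sample data, and the function name `recurring` are illustrative assumptions, not part of any cited guideline.

```python
from collections import Counter

# Hypothetical incident log: (year, equipment, immediate cause).
incidents = [
    (2021, "pump P-101", "seal leak"),
    (2021, "valve V-7", "wrong position"),
    (2022, "pump P-101", "seal leak"),
    (2023, "pump P-101", "seal leak"),
    (2023, "tank T-3", "overfill"),
]

def recurring(records, threshold=2):
    """Return the (equipment, cause) pairs seen at least `threshold` times."""
    counts = Counter((equipment, cause) for _, equipment, cause in records)
    return {pair: n for pair, n in counts.items() if n >= threshold}

print(recurring(incidents))
```

In this made-up log only the pump seal leak recurs, which would suggest looking beyond the single repair towards a common underlying cause.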
Performing an accident investigation requires the use of an investigation method (some are discussed in Chapter ). However, an effective accident investigation does not simply stop at the application of the selected method. Indeed, it also requires the personnel involved:
To establish trust and confidence, thus a favourable environment to discuss the incident;
to be willing to listen to what people say, and to base all findings on verifiable facts;
to establish a clear cause-effect link, based on sound evidence, together with a timeline;
to be assisted by technical experts when dealing with specialised issues;
to understand the identified root causes; and
to manage the accident investigation like any ordinary project (scheduling activities, budgeting).
The fundamental basis for investigation is the scientific method. Its rigorous approach is the key to carrying out effective investigations, capable of looking beyond the widget, as already discussed in Chapter . But it is also systematic, thorough and intellectually honest. The accident investigation process consists of many activities. In a simplified approach, it is possible to identify the three phases shown in Figure 3.1. The first is data collection. It is carried out immediately, together with the analysis of the evidence, which may steer the collection of further evidence towards new objectives, generating new hypotheses or rejecting others. At the end of the investigation, once the findings have been established, comes the last phase: the development of recommendations.
Figure 3.1 Phases in accident investigation.
Incident investigation is a process that may be required for different purposes by different entities [4]. The main purpose of the investigation is to determine the cause of the accident, exploring both immediate and root causes, and to develop recommendations to avoid its recurrence. As already stated, its purpose is not to assign blame. However, there may be other goals in conducting an accident investigation, such as checking compliance with laws and standards or resolving issues of insurance liability for compensation [5]. For example, after an industrial accident, the State authorities may suspect that a crime has been committed and decide to run an investigation. In this case, the investigation is carried out to evaluate the basis for potential criminal prosecution, so blame finding is legitimate. Very often, the investigation is commissioned by the same company that experienced the accident. This happens not only to comply with internal rules established at the corporate level but also to ensure that the lessons offered are understood, thus preventing future recurrences. Investigating accidents is also a good way to demonstrate the company's positive attitude towards health and safety, regardless of whether actual litigation arises. It is common to investigate why a part, component, material, procedure, or management system fails. In theory, the reasons why they were successful before the incident should be investigated as well, but the reasons for successful results are generally taken for granted and attention is focused only on the undesired outcomes [6]. Even if the main purpose of an investigation is clear, very frequently an in-house investigator or an external consultant is diverted to serve other ends, like blaming or exonerating certain people or things. Obviously, this tends to introduce bias, since some positions are defended or attacked a priori, strengthening the speculative approach even before any evidence is collected.
A formal definition of investigation is given by [7], recalling [8]:
“An investigation can be defined as the management process by which underlying causes of undesirable events are uncovered and steps are taken to prevent similar occurrences.”
Another interesting definition is provided by [9]:
“A structured process of uncovering the sequence of events that produced or had the potential to produce injury, death, or property damage to determine the causal factors and corrective actions.”
This definition recalls that of the root cause. According to the definition provided in [7], a root cause is a fundamental system-related reason why an accident occurred that identifies a correctable failure or failures in management systems. There is typically more than one root cause of every process safety incident.
Incident investigation groups a series of activities. It is a process for reporting, tracking, and investigating incidents, including a formal process for investigating them and their trending to identify recurring events [3].
Incident investigations usually start from the end of the story: once the fire, the explosion, the collapse, or the toxic release has occurred, people ask how it happened. Starting from the chronological endpoint, the investigator begins his/her work, attempting to determine who, what, when, where, why, and how it happened. Only when the event has been explained, its sequence reconstructed, and the main causes found can the investigation be considered complete. The investigative analysis is based on physical evidence and verifiable facts. The investigator then uses scientific principles and selected methodologies to collect, recognise, organise, and analyse evidence. These topics are discussed in depth in Chapter . At this stage, it is sufficient to understand that the incident investigation is structured like a pyramid [10] (Figure 3.2). The collected facts and physical evidence form the large base of the investigation pyramid. They are the basis for the analysis, which is carried out in adherence to scientific principles. Finally, the analysis supports a small number of conclusions (the apex of the pyramid).
Figure 3.2 The Conclusion Pyramid. Source: Adapted from [10]. Reproduced with permission.
Figure 3.3 A damaged item under investigation.
Conclusions should be self-evident, as in Figure 3.3. Usually, this requirement is met automatically when facts are logically arranged in chronological order and with clear cause-effect relations.
Conclusions must not be based on other conclusions or hypotheses, otherwise the investigation pyramid collapses.
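The rule that conclusions must rest on facts and analysis, never on other conclusions or hypotheses, can be checked mechanically. The following is only a sketch under assumed names: the item labels (F1, A1, H1, C1, C2), the two dictionaries, and the function `unsound_conclusions` are hypothetical and simply illustrate the pyramid idea.

```python
# Hypothetical items in an investigation file and their kind.
KINDS = {
    "F1": "fact", "F2": "fact", "A1": "analysis",
    "H1": "hypothesis", "C1": "conclusion", "C2": "conclusion",
}

# What each conclusion is claimed to rest on.
SUPPORT = {
    "C1": ["A1", "F1"],   # rests on analysis and facts: acceptable
    "C2": ["C1", "H1"],   # rests on a conclusion and a hypothesis: not acceptable
}

def unsound_conclusions(kinds, support):
    """Map each conclusion to the supporting items that are not facts or analysis."""
    unsound = {}
    for conclusion, basis in support.items():
        weak = [item for item in basis if kinds[item] in ("conclusion", "hypothesis")]
        if weak:
            unsound[conclusion] = weak
    return unsound

print(unsound_conclusions(KINDS, SUPPORT))
```

Here C2 is flagged because it leans on C1 and H1, which is exactly the situation in which the pyramid would collapse.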
Even if it is possible to classify the accidental scenarios and to find similar characteristic consequences, each accident is a stand-alone case. The uniqueness must be sought in the progress of facts, which strongly depends on the context and the intrinsic features of the plant. The goal of the scientific investigation, in the industrial context, is to reconstruct the dynamics of the incident, finding all the causes and their interconnections, as well as highlighting any lack of technical compliance regarding plants, procedures, and machinery.
Different guidelines identify the crucial aspects to be considered during the investigation; however, they rarely identify a single methodology to be followed, thus providing only general information. This is because a technical investigation cannot be approached like the solution of a general scientific or mathematical problem. Indeed, a “problem” is a question where the aim is to find unknown data that are logically obtainable from the known ones. From this point of view, if a problem is well posed, then the solution lies in its definition and only needs to be extracted, through a quantifiable method, by the person in charge of solving it. Conversely, writing a technical report involves a high level of uncertainty (the consequences of introducing complexity theory into the investigation context have already been discussed). Uncertainty is given by:
The peculiarity of each incident;
the complexity of the problem;
the lack, from the beginning, of all the data needed for the resolution; and
the subjectivity, given by the personal contribution of the technical consultant.
Therefore, the complex problem to be faced is defined step by step, as the learning process proceeds, and develops on different levels at the same time. A further difficulty is implicit in the required balance between flexibility and rigour. On the one hand, to prove the assertions made, it is necessary to comply with the scientific literature and the laws; on the other hand, it is not advisable to approach an investigation only from a pragmatic point of view. The reason lies in the following citation:
“When you go looking for something specific, your chances of finding it are very bad. Because, of all the things in the world, you're only looking for one of them. When you go looking for anything at all, your chances of finding it are very good. Because, of all the things in the world, you're sure to find some of them”.
Daryl Zero in the film “Zero Effect” (USA, 1998).
It has already been pointed out that older investigations were superficial, since they only identified obvious causes and developed poor recommendations. In the more modern layered approach, a deeper analysis is carried out and additional layers of recommendations are developed: immediate technical recommendations, recommendations to avoid the hazards, and recommendations to improve the management system.
To sum up, the research into the causes of an incident spans three different levels [11]:
Immediate cause. It is the most obvious reason why an adverse event happens (e.g. the valve is in the incorrect position). A single adverse event may have several immediate causes;
underlying cause. It is the less obvious reason found at the end of the investigation, and it concerns the system. Examples are: preliminary checks not carried out by supervisors; a risk assessment that is not robust; excessive production pressure; poor safety culture, and so on;
root cause. It is the initiating event from which all other causes come. Root causes are generally related to management, planning or organisational failings.
Generally, recommendations are also developed over these three different levels, reflecting the distinction presented for the causes.
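The three levels can be represented as a simple data structure, which also makes it easy to see how deep an investigation has actually gone. This is a sketch only: the pairing of each cause level with one of the recommendation layers mentioned above is an assumption made for illustration, and the names `Level`, `RECOMMENDATION_LAYER`, `deepest_level`, and `missing_levels` are hypothetical.

```python
from enum import Enum

class Level(Enum):
    IMMEDIATE = 1
    UNDERLYING = 2
    ROOT = 3

# Assumed pairing of cause levels with recommendation layers (illustrative only).
RECOMMENDATION_LAYER = {
    Level.IMMEDIATE: "immediate technical recommendation",
    Level.UNDERLYING: "recommendation to avoid the hazard",
    Level.ROOT: "recommendation to improve the management system",
}

# Example findings, reusing the examples given in the text.
causes = [
    ("valve in the incorrect position", Level.IMMEDIATE),
    ("preliminary checks not carried out by supervisors", Level.UNDERLYING),
    ("organisational failing in planning", Level.ROOT),
]

def deepest_level(found):
    """Return the deepest cause level reached by the findings."""
    return max((level for _, level in found), key=lambda level: level.value)

def missing_levels(found):
    """Return the cause levels that the findings have not reached yet."""
    reached = {level for _, level in found}
    return [level for level in Level if level not in reached]

print(deepest_level(causes).name)                       # deepest level reached
print([lvl.name for lvl in missing_levels(causes[:1])])  # levels still to explore
```

An investigation that only lists the first finding would show the underlying and root levels as still missing, which is the signature of the superficial approach described earlier.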
Having clarified the reasons to investigate, the information gained from an investigation, and the benefits arising from it, one may ask which events should be investigated [11]. Indeed, the injuries suffered cannot alone determine the level of investigation, since the potential consequences and the likelihood of recurrence should also guide how deep the investigation should go. The severity and the immediacy of the risk involved also determine the urgency of an investigation. It is suggested that adverse events be investigated as soon as possible, also to benefit from the best recollection and motivation.
In this sense, an important question that may arise is: “How thoroughly should the accident be investigated?” [12]. Rasmussen, in [13], answered the question by identifying the so-called stop rules. Reason, in [14], suggests that the investigation stops when the identified causes are no longer controllable. This rule of thumb actually identifies different stopping points for various parties. For example, companies should go back to their own management systems to develop effective preventive measures. Supervisory authorities, like national commissions of inquiry or permanent investigation boards, should look at regulatory systems to understand whether legal weaknesses could have contributed to the accident. Instead, the police and the prosecutors are generally interested in the outer layer to evaluate the basis of a potential crime. Insurance companies are focused on the liability for compensation, so their investigation stops at yet another level with respect to the previously listed cases. Stopping at the root cause level is a recurring challenge. Increasing the depth of the analysis implies an increasing level of learning, which results in an increasing scope of the corrective actions. In simpler words, solving the management system issues is much more effective than repairing the failed equipment or blaming human error.
A common error is to consider an event as a root cause [5]. But events are not root causes; they are the consequences of the underlying causes. For example, a loss of primary containment (LOPC) or a malfunctioning safety instrumented system (SIS) is not a root cause, but an event. Similarly, a lack of knowledge or insufficient skill is not a root cause. It is therefore fundamental to push the investigation down to the root cause level, even if the stopping point cannot be easily identified; otherwise ineffective recommendations will be produced. It is undoubtedly true that finding deep causes is a real challenge. Depending on the depth of the analysis, it is possible to develop recommendations that also prevent similar incidents, not only the very same ones [3]. One of the problems affecting the analysis of what can be called an “organizational incident” is also the socio-cultural environment surrounding the analyst, as discussed in [15].
Some common terms of the art are now described, drawing on [6] and [4], to help the reader in the learning process and towards a wider comprehension of the topic:
Failure analysis. It is the determination of how a specific part, piece of equipment, machine, or component has failed. It also concerns the design, the adopted materials, methods of production, and product usage;
evidence. From a legal point of view, it is information used and accepted by the court to resolve disputed issues of fact. The different sources of evidence are presented in Chapter . Fundamentally, there are two types of evidence: direct and circumstantial. The difference between them is that direct evidence proves with certainty that a fact happened, while circumstantial evidence carries only a level of probability. Generally, direct evidence is accepted by the courts. Circumstantial evidence is taken into account only if it is not ruled irrelevant, was not obtained illegally, is not hearsay, and is proved by at least one logical step;
Root Cause Analysis (RCA). It connotes the determination of the managerial and human performance aspects of failure. It is
discussed in depth in Chapter ;
forensic. This modifier connotes that something is related to the law, the courts, the debate, and so on;
contributing cause. It is a factor that does not cause the event to occur, not triggering the incident sequence, but it significantly gives its contribution in increasing the magnitude of the event or the likelihood of its occurrence;
causative factor. It is a pre-existing condition that increases the likelihood of the event. It can be:
Direct cause. It existed immediately before the occurrence of the event and directly allowed or promoted it; or
indirect cause. It is the same as a contributing cause;
root cause. It is a type of direct cause. It is defined by some as the fundamental cause, that is to say, the cause that, once removed or modified, would have prevented the event from occurring (or recurring). This definition implies that only a single root cause exists: this was a conviction of the past, when the “one event one cause” tenet was extremely appealing. However, even if some incidents may have a single root cause, the current definition of root cause allows for the simultaneous presence of other root causes. Indeed, an incident investigation rarely finds a single root cause: more than one root cause typically exists. A cause that cannot be controlled by a person is not a root cause (e.g. lightning);
apparent cause. It is also named “immediate cause”. It is the cause found by a limited investigation. It usually concerns failures in equipment or human error, without considering the managerial context. An investigation stops at the immediate cause when the problem is small or limited in scope and there is no risk in performing a limited inquiry. This is not the scope of this book, which intends to go deeper into root cause analysis;
programmatic cause. It is a deficiency in a managerial construct (like procedures and training) that increases the likelihood that human error will occur;
reconstruction. It is the explanation of a failure, a crime, an incident, or, more generally, an event;
Human Performance Evaluation Process (HPEP). It is a method to evaluate how people's behaviors and actions contribute to causing the incident. The human factor is discussed in Chapter ;
corrective action. At the end of the investigation, it is the developed recommendation to fix the problems or weaknesses that are
identified in the root cause. How to develop recommendations is discussed in Chapter ;
extent of condition. It is the speculative effort to evaluate whether similar incidents can occur elsewhere. Thus, the knowledge gained from the experienced incident is used to prevent further events; and
falsification. It is a principle used when applying the scientific method to the incident investigation. It simply means that the working hypothesis must provide the predicted outputs (facts and collected data are consistent with the hypothesis) and, at the same time, must not be proven incorrect (no fact or collected datum contradicts the hypothesis). Falsification is important in incident investigation: it is not the quantity of evidence supporting a hypothesis that counts, nor the authority of the people supporting it. What counts is the quality of the collected evidence and of those facts that falsify (or fail to falsify) a hypothesis. The value of falsification is dealt with in depth in [6].
Falsification is extremely important to avoid an unwanted bias during the incident investigation: confirmation bias. Briefly, it occurs when the investigator tends to favour one hypothesis solely because “there cannot be another explanation”, and the reconstruction of the event is carried out by selecting, even unconsciously, only those pieces of evidence that confirm the prejudice in the mind of the investigator. Falsification is a strategy that tends to eliminate this bias, which is difficult to detect because investigators are usually unaware of being affected.
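The falsification principle can be sketched in a few lines: a hypothesis survives only if no verified fact contradicts it, however many facts appear to support it. The scenario, the `Evidence` record, and the function `surviving` are hypothetical and serve only to illustrate the idea; real evidence assessment is, of course, far less mechanical.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    description: str
    verified: bool          # only verifiable facts are allowed to count
    contradicts: frozenset  # hypotheses this item is incompatible with

def surviving(hypotheses, evidence):
    """Keep the hypotheses that no verified piece of evidence contradicts."""
    falsified = set()
    for item in evidence:
        if item.verified:
            falsified |= item.contradicts
    return [h for h in hypotheses if h not in falsified]

# Made-up working hypotheses for a vessel rupture.
hypotheses = ["overpressure", "corrosion", "external impact"]

# Made-up evidence; the third item is an unverified statement and is ignored.
evidence = [
    Evidence("pressure record below design value", True, frozenset({"overpressure"})),
    Evidence("no impact marks on the shell", True, frozenset({"external impact"})),
    Evidence("rumour that the wall was recently inspected", False, frozenset({"corrosion"})),
]

print(surviving(hypotheses, evidence))
```

Note that the count of supporting items plays no role in the sketch: a single verified contradiction is enough to discard a hypothesis, while an unverified statement discards nothing.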
In order to prevent the recurrence of similar incidents, it is necessary to [5]:
Identify and understand the scenario (what happened and how it happened);
identify the underlying and contributing causes (why it happened).
Rejection of proposed hypotheses should be based on physical evidence;
develop recommendations (identify preventive measures); and
implement recommendations and share the lessons learnt.
There are some decisions to be made before an investigation begins [9]. They will be discussed further in this book and concern:
The level of the investigation, that is to say how much detail the investigation should uncover;
the decision about who will investigate. Usually, a team approach is encouraged;
the decision about how much time will be dedicated to the