Common Cause Failure (CCF) - Creating a Reliability Model

Reliability Modeling Techniques

5.2 Creating a Reliability Model

5.2.2 Common Cause Failure (CCF)

Whereas simple models of redundancy assume that failures are both random and independent, common cause failure (CCF) modeling takes account of failures which are linked, due to some dependency, and therefore occur simultaneously or, at least, within a sufficiently short interval as to be perceived as simultaneous.

Two examples are:

(a) The presence of water vapor in gas causing two valves to seize due to icing. In this case the interval between the two failures might be of the order of days. However, if the proof-test interval for this dormant failure is two months then the two failures will, to all intents and purposes, be simultaneous.

(b) Inadequately rated rectifying diodes on identical twin printed circuit boards failing simultaneously due to a voltage transient.

Typically, causes arise from:

(a) Requirements: incomplete or conflicting

(b) Design: common power supplies, software, EMC, noise

(c) Manufacturing: batch-related component deficiencies

(d) Maintenance/operations: human induced or test equipment problems

(e) Environment: temperature cycling, electrical interference, etc.

Defenses against CCF involve design and operating features which form the assessment criteria given in Appendix 3.

CCFs often dominate the unreliability of redundant systems by virtue of defeating the random coincident failure feature of redundant protection. Consider the duplicated system in Figure 5.2. The failure rate of the redundant element (in other words the coincident failures) can be calculated using the formula developed in Table 5.1, namely 2λ²MDT, where λ is the failure rate of a single unit and MDT is the mean down time.

Typical failure rate figures of 10 per million hours (10⁻⁵ per hr) and 24 hrs down time lead to a failure rate of 2 × 10⁻¹⁰ × 24 = 4.8 × 10⁻⁹ per hr, i.e., 0.0048 per million hours. However, if only one failure in 20 is of such a nature as to affect both channels and thus defeat the redundancy, it is necessary to add the series element, shown as λ2 in Figure 5.3, whose failure rate is 5% × 10⁻⁵ = 5 × 10⁻⁷ per hr, i.e., 0.5 per million hours, being two orders of magnitude more frequent. The 5%, used in this example, is known as a BETA factor. The effect is to swamp the redundant part of the prediction and it is thus important to include CCF in reliability models. This sensitivity of system failure to CCF places emphasis on the credibility of CCF estimation and thus justifies efforts to improve the models.

Figure 5.3: Reliability block diagram showing CCF.
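The arithmetic in this example can be checked with a short script. This is a minimal sketch rather than part of the original method: the names are illustrative, and it assumes that the redundant element and the CCF element, being in series, have failure rates that simply add.

```python
# Worked example: duplicated (one out of two) system with a CCF series element.
# All names are illustrative; the numbers are those quoted in the text.

LAMBDA = 10e-6   # failure rate of a single unit, per hour (10 per million hours)
MDT = 24.0       # mean down time, hours
BETA = 0.05      # one failure in 20 affects both channels


def redundant_pair_rate(lam: float, mdt: float) -> float:
    """Rate of random coincident failures of a duplicated pair: 2 * lam^2 * MDT."""
    return 2.0 * lam ** 2 * mdt


def ccf_rate(lam: float, beta: float) -> float:
    """Rate of the common cause (series) element: beta * lam."""
    return beta * lam


random_rate = redundant_pair_rate(LAMBDA, MDT)
common_rate = ccf_rate(LAMBDA, BETA)
# Assumption: series elements with constant failure rates are summed.
system_rate = random_rate + common_rate

PER_MILLION_HOURS = 1e6
print(f"Random coincident failures: {random_rate * PER_MILLION_HOURS:.4f} per million hours")
print(f"Common cause failures:      {common_rate * PER_MILLION_HOURS:.4f} per million hours")
print(f"Ratio CCF / random:         {common_rate / random_rate:.0f}")
```

With the figures quoted above, the CCF element is roughly a hundred times more frequent than the random coincident failures, which is why it dominates the prediction.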

In Figure 5.3, (λ1) is the failure rate of a single redundant unit and (λ2) is the CCF rate such that (λ2) = β(λ1) for the BETA model, which assumes that a fixed proportion of the failures arise from a common cause. The contributions to BETA are split into groups of design and operating features which are believed to influence the degree of CCF. Thus the BETA multiplier is made up by adding together the contributions from each of a number of factors within each group. This Partial BETA model (as it is therefore known) involves the following groups of factors, which represent defenses against CCF:

- Similarity (Diversity between redundant units reduces CCF)

- Separation (Physical distance and barriers reduce CCF)

- Complexity (Simpler equipment is less prone to CCF)

- Analysis (FMEA and field data analysis will help to reduce CCF)

- Procedures (Control of modifications and of maintenance activities can reduce CCF)

- Training (Designers and maintainers can help to reduce CCF by understanding root causes)

- Control (Environmental controls can reduce susceptibility to CCF, e.g., weather proofing of duplicated instruments)

- Tests (Environmental tests can remove CCF prone features of the design, e.g., EMC testing)

The Partial BETA model is assumed to be made up of a number of partial βs, each contributed to by the various groups of causes of CCF. β is then estimated by reviewing and scoring each of the contributing factors (e.g., diversity, separation).

The BETAPLUS model has been developed from the Partial Beta method because:

- It is objective and maximizes traceability in the estimation of BETA. In other words the choice of checklist scores, when assessing the design, can be recorded and reviewed.

- It is possible for any user of the model to develop the checklists further to take account of any relevant failure causal factors that may be perceived.

- It is possible to calibrate the model against actual failure rates, albeit with very limited data.

- There is a credible relationship between the checklists and the system features being analyzed. The method is thus likely to be acceptable to the nonspecialist.

- The additive scoring method allows the partial contributors to β to be weighted separately.

- The β method acknowledges a direct relationship between (λ2) and (λ1) as depicted in Figure 5.3.

- It permits an assumed “nonlinearity” between the value of β and the scoring over the range of β.

The BETAPLUS model includes the following enhancements:

(a) Categories of factors

Whereas existing methods rely on a single subjective judgment of score in each category, the BETAPLUS method provides specific design and operationally related questions to be answered in each category.

(b) Scoring

The maximum score for each question has been weighted by calibrating the results of assessments against known field operational data.

(c) Taking account of diagnostic coverage

Since the failures in a CCF may not occur at precisely the same moment, an increase in auto-test or proof-test frequency will reduce β.

(d) Subdividing the checklists according to the effect of diagnostics

Two columns are used for the checklist scores. Column (A) contains the scores for those features of CCF protection which are perceived as being enhanced by an increase in diagnostic frequency. Column (B), however, contains the scores for those features believed not to be enhanced by an improvement in diagnostic frequency. In some cases the score has been split between the two columns, where it is thought that some, but not all, aspects of the feature are affected (See Appendix 3).

(e) Establishing a model

The model allows the scoring to be modified by the frequency and coverage of diagnostic test.

The (A) column scores are modified by multiplying by a factor (C) derived from diagnostic related considerations. This (C) score is based on the diagnostic frequency and coverage. (C) is in the range 1 to 3. A factor “S,” used to derive BETA, is then estimated from the RAW SCORE:

$$S = \text{RAW SCORE} = \Big(\sum A \times C\Big) + \sum B$$

(f) Nonlinearity

There are currently no CCF data to justify departing from the assumption that, as BETA decreases (i.e., improves), successive improvements become proportionately harder to achieve. Thus the relationship of the BETA factor to the RAW SCORE [(ΣA × C) + ΣB] is assumed to be exponential and this nonlinearity is reflected in the equation which translates the raw score into a BETA factor.

(g) Equipment type

The scoring has been developed separately for programmable and non-programmable equipment, in order to reflect the slightly different criteria which apply to each type of equipment.

(h) Calibration

The model has been calibrated against field data.

Scoring criteria were developed to cover each of the categories (i.e., separation, diversity, complexity, assessment, procedures, competence, environmental control, and environmental test). Questions have been assembled to reflect the likely features which defend against CCF. The scores were then adjusted to take account of the relative contributions to CCF in each area, as shown in the author’s data. The score values have been weighted to calibrate the model against the data.

When addressing each question (in Appendix 3) a score less than the maximum of 100% may be entered. For example, in the first question, if the judgment is that only 50% of the cables are separated then 50% of the maximum scores (15 and 52) may be entered in each of the (A) and (B) columns (7.5 and 26).
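The partial scoring and the raw score calculation can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the only checklist values used are the maxima quoted above for the first question (15 and 52), and the value chosen for (C) is an arbitrary example within the stated range.

```python
# Sketch of checklist scoring: S = (sum of A scores * C) + sum of B scores.
# Function names are hypothetical; only the first-question maxima (15, 52)
# come from the text. The C value used below is an arbitrary example.

def scaled_scores(max_a: float, max_b: float, fraction: float) -> tuple[float, float]:
    """Return the (A, B) scores when only a fraction of the maximum is judged to apply."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return max_a * fraction, max_b * fraction


def raw_score(answers: list[tuple[float, float]], c_factor: float) -> float:
    """Combine (A, B) score pairs into the raw score S."""
    # Assumption: the text gives the range of C as 1 to 3.
    if not 1.0 <= c_factor <= 3.0:
        raise ValueError("C is expected to lie in the range 1 to 3")
    sum_a = sum(a for a, _ in answers)
    sum_b = sum(b for _, b in answers)
    return sum_a * c_factor + sum_b


# First question: only 50% of the cables are judged to be separated.
first_question = scaled_scores(15, 52, 0.5)
print(first_question)

# Raw score if this were the only question answered, with C = 2 (example value).
print(raw_score([first_question], c_factor=2.0))
```

In a real assessment the list passed to `raw_score` would hold one (A, B) pair per checklist question in Appendix 3.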

The checklists are presented in two forms (listed in Appendix 3) because the questions applicable to programmable equipment will be slightly different from those necessary for non-programmable items (e.g., field devices and instrumentation).

The headings (expanded with scores in Appendix 3) are:

(1) Separation/Segregation

(2) Diversity

(3) Complexity/Design/Application/Maturity/Experience

(4) Assessment/Analysis and Feedback of Data

(5) Procedures/Human Interface

(6) Competence/Training/Safety Culture

(7) Environmental Control

(8) Environmental Testing

Assessment of the diagnostic interval factor (C)

In order to establish the (C) score, it is necessary to address the effect of diagnostic frequency.

The diagnostic coverage, expressed as a percentage, is an estimate of the proportion of failures which would be detected by the proof test or auto test. This can be estimated by judgment or, more formally, by applying FMEA at the component level to decide whether each failure would be revealed by the diagnostics.
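One way to picture the FMEA-based estimate is to tabulate each failure mode with a flag showing whether the diagnostics would reveal it. The sketch below is purely illustrative: the failure modes and rates are invented, and weighting each mode by its failure rate (rather than simply counting modes) is an assumption not stated in the text.

```python
# Sketch of a diagnostic coverage estimate from an FMEA-style table.
# The failure modes and rates are hypothetical. Weighting by failure rate
# is an assumption made here for illustration.

from typing import NamedTuple


class FailureMode(NamedTuple):
    name: str
    rate_per_million_hours: float
    detected_by_diagnostics: bool


def diagnostic_coverage(modes: list[FailureMode]) -> float:
    """Proportion of the total failure rate that the diagnostics would reveal."""
    total = sum(m.rate_per_million_hours for m in modes)
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    detected = sum(m.rate_per_million_hours for m in modes if m.detected_by_diagnostics)
    return detected / total


example_modes = [
    FailureMode("output stuck high", 2.0, True),
    FailureMode("output stuck low", 1.0, True),
    FailureMode("slow drift", 1.0, False),
]

coverage = diagnostic_coverage(example_modes)
print(f"Estimated diagnostic coverage: {coverage:.0%}")
```

The resulting percentage, together with the diagnostic interval, is what feeds the choice of the (C) factor.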

An exponential model is used to reflect the increasing difficulty in further reducing BETA as the score increases. This is reflected in the following equation which is developed in Smith D J, 2000, “Developments in the use of failure rate data”:

$$\beta = 0.3 \exp\left(-\frac{3.4\,S}{2624}\right)$$
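The exponential relationship can be explored numerically. The following is a minimal sketch with a hypothetical function name. It assumes the exponent is negative, which is what the surrounding text implies, since BETA must fall as the score rises.

```python
# Sketch of the raw-score-to-BETA conversion: beta = 0.3 * exp(-3.4 * S / 2624).
# Assumption: the exponent is negative, so that beta decreases as S increases.

import math


def beta_from_score(s: float) -> float:
    """Translate a raw score S into a BETA factor."""
    if s < 0:
        raise ValueError("the raw score cannot be negative")
    return 0.3 * math.exp(-3.4 * s / 2624.0)


# Each equal step in score reduces beta by the same ratio, so the
# absolute improvement shrinks as the score grows.
for score in (0, 500, 1000, 1500, 2000, 2624):
    print(f"S = {score:4d}  beta = {beta_from_score(score):.4f}")
```

A score of zero gives a BETA of 0.3, and at S = 2624 the exponent reaches -3.4, giving a BETA of about 0.01. The same gain in score therefore buys a progressively smaller absolute reduction in BETA.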

The basic BETA model, however, applies to simple “one out of two” redundancy. In other words, with a pair of redundant items the “top event” is the failure of both items. As the number of voted items increases (in other words N > 2) the proportion of common cause failures varies and the value of β needs to be modified. The reason for this can be understood by thinking about two extreme cases:

1 out of 6

In this case only one out of the six items is required to work and up to five failures can be tolerated. Thus, in the event of a common cause failure, five more failures need to be provoked by the common cause. This is less likely than the “one out of two” case and β will be smaller (see tables below).

5 out of 6

In this case five out of the six items are required to work and only one failure can be tolerated. Thus, in the event of a common cause failure, there are five items to which the common cause failures could apply. This is more likely than the “one out of two” case and β will be greater (see tables below).

This is an area of much debate. There are no empirical data and the models are a matter of conjecture based on the opinions of various contributors. There is not a great deal of consistency between the various suggestions. It is thus a highly controversial and uncertain area. The original suggestions were from a SINTEF paper (in 2006) and were the MooN factors originally used in the Technis BETAPLUS package version 3.0. The SINTEF paper was revised in 2010 and again in 2013. The IEC 61508 (2010) guidance is similar but not identical (Table 5.10). The SINTEF (2013) values are shown in Table 5.11. The BETAPLUS (now Version 4.0) compromise is shown in Appendix 3.

Table 5.10: BETA(MooN) factor IEC 61508.

         M = 1   M = 2   M = 3   M = 4
N = 2    1
N = 3    0.5     1.5
N = 4    0.3     0.6     1.75
N = 5    0.2     0.4     0.8     2
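The table can be applied as a simple lookup. The sketch below is illustrative only: the names are hypothetical, and it assumes that each factor is a multiplier applied to the “one out of two” β value.

```python
# Sketch of applying the Table 5.10 (IEC 61508) MooN factors.
# Assumption: each factor multiplies the "one out of two" beta.
# Keys are (M, N) pairs: M items required to work out of N.

IEC_61508_MOON_FACTOR = {
    (1, 2): 1.0,
    (1, 3): 0.5, (2, 3): 1.5,
    (1, 4): 0.3, (2, 4): 0.6, (3, 4): 1.75,
    (1, 5): 0.2, (2, 5): 0.4, (3, 5): 0.8, (4, 5): 2.0,
}


def beta_moon(beta_1oo2: float, m: int, n: int) -> float:
    """Adjust a one-out-of-two beta for an M-out-of-N voted configuration."""
    try:
        factor = IEC_61508_MOON_FACTOR[(m, n)]
    except KeyError:
        raise ValueError(f"no factor tabulated for {m} out of {n}") from None
    return beta_1oo2 * factor


# Example: a 5% beta assessed for a pair, applied to other voting arrangements.
for m, n in [(1, 2), (1, 3), (2, 3), (4, 5)]:
    print(f"{m} out of {n}: beta = {beta_moon(0.05, m, n):.4f}")
```

The factors fall as more failures can be tolerated (small M for a given N) and rise as fewer can be tolerated, in line with the two extreme cases discussed above.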

From: The Safety Critical Systems Handbook (pages 133-138)