Chapter 4: Methodology
4.9 Bayesian Networks
4.9.1 Modelling requirements
A number of issues need to be considered in specifying the modelling requirements:
• In the narratives about the success or failure of interventions, conflicting information is often encountered, and there is therefore a need to handle multiple and conflicting hypotheses;
• Some information is more reliable than other information (for example statements in official reports by reputable experts as opposed to statements made over lunch by a government official), and so there is a need to allow for taking this diversity in information reliability into account;
• Assessments are made at particular points in time and their relevance diminishes as they age. This freshness of information needs to be taken into account; and
• The structure of the cause and effect relationships is often unclear, and there is a need to allow for structural learning of models to occur in response to new information.
It is also noted that BNs do not support feedback loops, as only acyclic cause and effect networks can be modelled. This is a shortcoming of this methodology, and it is hoped that the use of ABMs will help overcome such shortcomings.
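The acyclicity constraint can be illustrated with a minimal sketch that checks whether a set of cause and effect links forms a valid BN structure. The topic domain names below are hypothetical and chosen only for illustration; the check uses a standard topological-sort procedure (Kahn's algorithm) and is not taken from any of the BN tools discussed in this chapter.

```python
from collections import deque


def is_acyclic(edges):
    """Return True if the (cause, effect) pairs form a graph without cycles."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1

    # Kahn's algorithm: repeatedly remove nodes with no remaining causes.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Nodes that could never be removed sit on (or behind) a cycle.
    return removed == len(nodes)


# Hypothetical topic domains, for illustration only.
chain = [("acceptability", "ownership"), ("ownership", "maintenance")]
feedback = chain + [("maintenance", "acceptability")]  # adds a feedback loop

print(is_acyclic(chain))     # a plain causal chain can be modelled as a BN
print(is_acyclic(feedback))  # the feedback loop cannot
```

A structure such as `feedback` would have to be simplified, or handled by another method, before it could be expressed as a BN.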
4.9.2 Design process
The design of a BN model has three main components, namely its:
• Ontological component, which is the definition of terminology;
• Qualitative component, which is the definition of the causal relationships between topic domains;
• Quantitative component, which consists of judgments of the state of topic domains.
The steps that have been taken in the design of Bayesian Networks are as follows:
• Choice of scenario / hypothesis for exploration from among options (management, technology or otherwise) that, on the basis of knowledge elicitation, are perceived as important and are either already being considered, already being implemented or being planned for;
• Identification of ontology and topic domains (i.e. terminology) and of causal relationships between topic domains using literature and trip reports, generating influence diagrams which are translated into a model structure. It is noted, however, that this involves filtering out what is important from what is not, which introduces a further bias from me as a researcher;
• Choice of data sources (such as literature, trip reports and personal memory), including assessments, judgments and opinions regarding the state of the various topic domains, and accepting multiple and conflicting hypotheses regarding the same topic domain. This also involves a subjective assessment of the reliability of each information source.
Validity of the BNs has been explored by analysing the risks of historical projects, and seeing whether it is possible to predict the failure or success of such intervention strategies. There has also been dialogue and social validation with stakeholders with regard to the influence diagrams. The decision about which intervention strategies to analyse has been made on the basis of those that are perceived as important by key stakeholders and experts.
4.9.3 Software environment
The following software environments have been used in other BN modelling studies:
• Varis (1997), Varis and Kuikka (1997) and Varis and Somlyody (1997) used a self-generated spreadsheet toolkit, ‘F. C. BeNe’, which is not widely accessible;
• Bromley and colleagues (2005) used the HUGIN EXPERT (Hugin Expert A/S 2009) software environment which is not free, but which has been widely used in many different areas ranging from climate and environment to finance and intelligence;
• Saravanan (2008) used the software tool NETICA (Norsys Software Corp. 2009) which is also a commercial software product with an impressive list of clients worldwide;
• Pollino and colleagues (2007) have used a methodology called Knowledge Engineering of Bayesian Networks (KEBN) (Woodberry 2009) and the support tools associated with this approach. The tool was developed at Monash University and, while promising, seems to be at a relatively early stage of development;
• Ha and Stenström (2003) used a Neural Network software tool, Neural Connection 2.1, which would have had to be adapted for use in Bayesian Network modelling;
• Ticehurst and colleagues (2007) have programmed their own software environment based on the Interactive Component Modelling System (ICMS);
• Pope and colleagues (2008) have used the software tool Sheba which has been developed at the Defence Science and Technology Organisation (DSTO) of Australia, primarily for intelligence purposes.
Based on this list, four realistic options for BN analysis have been identified: Sheba, HUGIN EXPERT, NETICA and KEBN. Based on an initial investigation of these software tools, the criteria for assessing which tool is most suitable are as follows: quality of the user interface and the ability to build large-scale applications; the ability to accept multiple and conflicting hypotheses regarding the state of a particular topic domain; given the wide range of information types that can be used, the ability to include information about the reliability of information; and the ability to model both deductive and abductive reasoning. Whilst most tools are able to capture deductive reasoning, only a few allow for abductive reasoning (i.e. considering the symptoms of a hypothesis being correct).
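The difference between the two directions of reasoning can be illustrated with Bayes' theorem for a single binary hypothesis and a single symptom. This is a minimal sketch with hypothetical numbers; it does not reproduce how any of the listed tools implement either form of reasoning.

```python
def symptom_probability(prior_h, p_s_given_h, p_s_given_not_h):
    """Deductive direction: how likely is the symptom, given what is believed about H?"""
    return p_s_given_h * prior_h + p_s_given_not_h * (1.0 - prior_h)


def hypothesis_given_symptom(prior_h, p_s_given_h, p_s_given_not_h):
    """Abductive direction: how likely is H, given that the symptom was observed?"""
    p_s = symptom_probability(prior_h, p_s_given_h, p_s_given_not_h)
    return p_s_given_h * prior_h / p_s


# Hypothetical values, for illustration only.
prior_success = 0.3             # P(H): prior belief that the intervention succeeds
p_symptom_if_success = 0.9      # P(S | H)
p_symptom_if_failure = 0.2      # P(S | not H)

p_s = symptom_probability(prior_success, p_symptom_if_success, p_symptom_if_failure)
p_h = hypothesis_given_symptom(prior_success, p_symptom_if_success, p_symptom_if_failure)
print(f"P(symptom) = {p_s:.3f}")
print(f"P(success | symptom observed) = {p_h:.3f}")
```

Observing the symptom raises the probability of the hypothesis above its prior whenever the symptom is more likely under the hypothesis than without it.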
Applying the first criterion eliminates the KEBN software environment from our list of options.
The second criterion eliminates NETICA from our list, as the available documentation appears to say nothing about how to handle multiple and conflicting information.
HUGIN EXPERT, on the other hand, appears not to support abductive reasoning, leaving Sheba as the only remaining option. Sheba is not yet commercially available, but for the purposes of this study it can be used at no charge after approval from its inventor.
The following description of Sheba has been provided:
Sheba is a cross-platform software application for performing predictive analysis under conditions of uncertainty. It runs on a desktop computer and provides a framework for users to structure and analyse categorical reasoning problems. It was designed for use with strategic intelligence tasks within the National Security and Intelligence domains, and is well-suited to various competitive intelligence and decision-support applications within the Business and Commerce domains. Sheba uses the Analysis of Competing Hypotheses using Subjective Logic (Pope and Jøsang 2005) to reason about the effect on the projected outcomes created by different categories of information, even where data may be sparse. Various techniques are used within Sheba to calculate the effect of out-of-date information, and unreliable sources and observations (Pope et al. 2006). This enables Sheba to be used in situations where the availability and quality of the data varies to create usable assessments that are reflective of the available data, and which also highlight the uncertainty created by any lack of relevant, reliable data (personal communication, Pope 2009).
4.9.4 Model formulation in Sheba
In an overview of the Sheba software (Pope 2008), it is described how the software has been developed as a framework for structuring and analysing intelligence problems, with initial applications in the National Security Domain, but also in the Business and Commerce domain.
Sheba is also described as a software framework for developing analytical models that represent the complexity and subtleties of real world problems. This is achieved by:
• Decomposing such problems into fragments that can more easily be understood and managed;
• Storing information relating to the relevant problem at hand, using multiple sources and considering the time aspects;
• Adding qualifiers relating to the reliability and credibility of information sources;
• Providing reasoning with uncertain and/or incomplete information using a number of analytical models;
• Setting up complete inference chains using subjective logic formalised in a probabilistic framework, from information to assessment;
• Providing output in terms of likelihoods and certainty of particular outcomes; and in this way giving measures of not only the uncertainty of outcomes but also how much is really known about the problem, and where critical knowledge gaps are.
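To make the ideas of reliability qualifiers, conflicting sources and "certainty" of an outcome more concrete, the following is a minimal sketch based on binomial opinions as they are commonly presented in the subjective logic literature. The operator forms (projected probability, trust discounting and cumulative fusion) are an assumption here; the source does not state which operators Sheba applies, and all names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """Binomial opinion: belief + disbelief + uncertainty = 1."""
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5

    def __post_init__(self):
        total = self.belief + self.disbelief + self.uncertainty
        if abs(total - 1.0) > 1e-9:
            raise ValueError("belief + disbelief + uncertainty must equal 1")

    def projected_probability(self):
        # Uncertainty mass is distributed according to the base rate.
        return self.belief + self.base_rate * self.uncertainty


def discount(trust, opinion):
    """Weaken a source's opinion according to the analyst's trust in that source."""
    return Opinion(
        belief=trust.belief * opinion.belief,
        disbelief=trust.belief * opinion.disbelief,
        uncertainty=trust.disbelief + trust.uncertainty
        + trust.belief * opinion.uncertainty,
        base_rate=opinion.base_rate,
    )


def fuse(x, y):
    """Cumulative fusion of two opinions (assumed to share a base rate)."""
    k = x.uncertainty + y.uncertainty - x.uncertainty * y.uncertainty
    if k == 0:
        raise ValueError("cannot fuse two opinions that both have zero uncertainty")
    return Opinion(
        belief=(x.belief * y.uncertainty + y.belief * x.uncertainty) / k,
        disbelief=(x.disbelief * y.uncertainty + y.disbelief * x.uncertainty) / k,
        uncertainty=(x.uncertainty * y.uncertainty) / k,
        base_rate=x.base_rate,
    )


# Hypothetical example: an official report and a lunch conversation disagree.
report = discount(Opinion(0.9, 0.0, 0.1), Opinion(0.8, 0.1, 0.1))
lunch = discount(Opinion(0.3, 0.2, 0.5), Opinion(0.1, 0.7, 0.2))
combined = fuse(report, lunch)
print(combined, combined.projected_probability())
```

In this sketch the less trusted source contributes mostly uncertainty after discounting, so the fused opinion stays close to the more reliable one while still reporting how much remains unknown.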
Whilst initial applications have been in unrelated domains, it can clearly be seen how the Sheba framework may provide considerable support in the sphere of uncertain and incomplete information encountered in the urban water domain. To specify a model in Sheba, an analyst will need to provide certain information, relating to:
• Choosing, in the language of Sheba, the Hypothesis, i.e. a particular issue to explore, for example ‘successful installation, operation, performance and maintenance of a desalination plant in a particular location operated by a particular organisation, and based on a certain technology, etc?’;
• Identifying the Topic Domains, i.e. the terms relevant to the particular Hypothesis, covering all factors or items of information that relate to it, including the Hypothesis itself. For each topic domain, possible states are defined; for example, if the term is ‘environmental acceptability’, the states may be ‘high environmental acceptability’ and ‘low environmental acceptability’;
• Setting up the Model, defining the relationship between the Topic domains and the Hypothesis. This is done by specifying a system of conditional probabilities (i.e. applications of Bayes’ formula):
• A topic domain can be defined as either a Symptom/Indicator or an Influence/Driver for the chosen Hypothesis;
• Each of the conditional probabilities is specified on the basis of existing information, can be classified as Simple opinion, Probability or Bayesian, and is accompanied by the Certainty that the estimate is judged to have, expressed as a percentage of trustworthiness;
• Linking Topic domain states, for example in the case of an Influence/Driver ‘The probability of weak ownership of technology assuming that there is a low social and cultural acceptability of the solution’. For Symptoms/Indicators the statement is reversed.
• Entering the Data sources which are to be used in the Assessment, identifying documents and other sources of information that are relevant for the Hypothesis at hand. These Data sources contain different information at different times, and with different levels of Certainty. They are entered as Simple opinions, Probabilities, or Bayesian statements referring to the states of Topic domains. Conflicting information is acceptable, and is dealt with using a probabilistic formalism.
• Setting up an Assessor, which specifies which information sources to use and which models to link up, and which provides output results such as:
• The likelihoods and certainty of different outcome states;
• Graphical output containing a number of different aspects.
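The conditional probability statements for an Influence/Driver can be read as the rows of a small conditional probability table. The sketch below uses the ownership and acceptability example from the list above with hypothetical numbers, and applies the law of total probability to obtain the likelihood of each ownership state. It uses plain probabilities only and does not represent Sheba's Certainty values or its internal calculations.

```python
# Hypothetical P(ownership state | social and cultural acceptability state).
p_ownership_given_acceptability = {
    "low": {"weak": 0.7, "strong": 0.3},
    "high": {"weak": 0.2, "strong": 0.8},
}

# Hypothetical assessment of the driver's own state.
p_acceptability = {"low": 0.4, "high": 0.6}


def marginal(p_child_given_parent, p_parent):
    """Law of total probability: P(child) = sum over parent of P(child | parent) P(parent)."""
    result = {}
    for parent_state, p_parent_state in p_parent.items():
        for child_state, p_cond in p_child_given_parent[parent_state].items():
            result[child_state] = result.get(child_state, 0.0) + p_cond * p_parent_state
    return result


p_ownership = marginal(p_ownership_given_acceptability, p_acceptability)
for state, probability in p_ownership.items():
    print(f"P(ownership = {state}) = {probability:.2f}")
```

For a Symptom/Indicator the table would instead hold the probability of each indicator state given each state of the Hypothesis, and the Hypothesis would be inferred from the observed indicator.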
The layout of the Sheba software is shown in Figure 4-10, where a project is entered.
Figure 4-10: User Interface for an example BN Model in Sheba