
4.4 Human factors tools

4.4.1 Subjective methods

Subjective methods focus on asking the user about their experience using the product or system. Thus, they are indirect methods of finding out about the design of the product since the user reports on their expectations and perceptions of using the interface. Some examples include heuristic evaluation, questionnaires, interviews, checklists and focus groups. Questionnaires and interviews (from open-ended through to semi-structured and closed questions) are commonly used tools for data collection, and the reader is referred to Reference 9 for an excellent review.

4.4.1.1 Heuristic evaluation

Heuristic evaluation is a method that has arisen within the human factors community over the last couple of decades. It is essentially a systematic inspection of a user interface in order to check the extent to which it meets recognised design principles, e.g. compatibility with the user’s mental models, internal consistency of operations, availability of reversion techniques, etc. Nielsen [10] developed a set of 10 heuristics for use in this context. These included visibility of system status, match between system and the real world, user control and freedom, consistency and standards, error prevention, recognition rather than recall, flexibility and efficiency of use, aesthetic and minimalist design, facilitating users to recognise, diagnose and recover from errors, and help and documentation. In its simplest form, a heuristic evaluation works by having an expert evaluator carry out a number of typical tasks using the product (or a mock-up, if the product is not fully developed) and reporting on how it conforms to the design heuristics mentioned above. For a fuller explanation of the methodology for carrying out a heuristic evaluation, see Reference 11. Heuristic evaluations fall into the so-called ‘quick and dirty’ group of methods. They can be carried out very quickly, but they do need an expert evaluator, and it is preferable that this person has a human factors background. A further point to note is that a heuristic evaluation will highlight design problems but not their solutions. However, it is generally thought that once problems have been identified, ways of solving them are often apparent.
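As a minimal, hedged sketch of how such an inspection might be recorded (the data structure and the 0–4 severity scale are illustrative assumptions; only the heuristic names come from Nielsen [10]), the findings can be logged per task and tallied per heuristic:

```python
# Sketch only: logging findings from a heuristic evaluation.
# The Finding structure and the 0-4 severity scale are assumptions;
# the heuristic names are Nielsen's ten [10].
from collections import Counter
from dataclasses import dataclass

NIELSEN_HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognise, diagnose and recover from errors",
    "Help and documentation",
]

@dataclass
class Finding:
    task: str          # the typical task the evaluator was performing
    heuristic: str     # which heuristic was violated
    description: str   # what the problem was
    severity: int      # 0 (cosmetic) .. 4 (severe), assumed scale

def summarise(findings):
    """Count problems per heuristic so the worst-offending principles stand out."""
    return Counter(f.heuristic for f in findings)

findings = [
    Finding("Buy a ticket", NIELSEN_HEURISTICS[4], "No warning before discarding input", 3),
    Finding("Buy a ticket", NIELSEN_HEURISTICS[0], "No feedback while payment is processed", 2),
]
print(summarise(findings))
```

The output highlights design problems, not solutions, which mirrors the limitation of the method noted above.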

4.4.1.2 Checklists

Checklists could be thought of as a sophisticated form of heuristic evaluation in that the questions have already been researched and formed. They provide an ‘off-the-shelf’ technique because many checklists for evaluating interfaces are available commercially or from the academic sources where they were developed. Their advantages include the fact that they are easy to administer and can be used by the non-expert to assess the design of an interface. Participants completing them tend to like them because they are fast to fill in. However, like heuristic evaluation, they will highlight problems with the interface but not provide any solutions. There might also not be a checklist that fits the particular needs of the interface being evaluated, which can prove problematic. Some examples include the following (a brief scoring sketch for the first of these, SUS, is given after the list):

SUS (System Usability Scale: [12]);

QUIS (Questionnaire for User Interface Satisfaction: [13]);

CUSI (Computer User Satisfaction Inventory: [14]);

SUMI (Software Usability Measurement Inventory: [15]);

FACE (Fast Audit based on Cognitive Ergonomics: [16]);

MUSiC (Measuring Usability of Software in Context: [17]);

WAMMI (Web-site Analysis and MeasureMent Inventory: [18]);

MUMMS (Measuring the Usability of Multi-Media Systems: Human Factors Research Group at http://www.ucc.ie/hfrg/). This is a new multi-media version of the SUMI (developed by the same group of people).
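To make the flavour of these instruments concrete, the sketch below computes an overall score for the SUS [12], whose published scoring rule maps ten 5-point items onto a 0–100 scale; the function name and the example responses are illustrative only.

```python
def sus_score(responses):
    """Compute the System Usability Scale score from ten item responses (1-5).

    Odd-numbered items are positively worded (contribution = response - 1);
    even-numbered items are negatively worded (contribution = 5 - response).
    The summed contributions are scaled by 2.5 to give a 0-100 score.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)   # i is 0-based, so even i = odd item
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

# Example with made-up responses from one participant:
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))  # -> 85.0
```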

4.4.1.3 Focus groups

Focus groups have been gaining in popularity recently in a number of different sectors in industry and the research community. They are essentially brainstorming sessions where people meet to consider a problem or situation (for a good review, see [19]). In the human factors context, they have been used to assess the usefulness of a product in terms of meeting user requirements. They can be quite elaborate, with table-top discussions, stooges being present to stimulate the focus group content, and taking place over a number of days (for a detailed example, see [20]). Table-top discussions are group sessions in which a selected group of experts discuss issues based on specific scenarios. Often, these groups are chaired by a human factors specialist [21].

Like heuristic evaluations and checklists, focus groups have been classed as ‘quick and dirty’. However, the reliability and validity of the findings emerging from them need to be questioned. Two of the difficulties of working with humans are evident in focus group work. First, what people say they do often differs from what they actually do; second, people will tell you what they think you want to hear, i.e. they attempt to provide the ‘right’ answers, especially when there are no repercussions for them. Hence, there are many opportunities for bias to be introduced when running focus groups and compiling the findings.

4.4.2 Objective methods

Whereas subjective methods focus primarily on users’ attitudes towards a product, objective methods, as the term suggests, provide an appraisal of the product based on measured data in a more controlled setting. They can be carried out formatively or summatively. The examples to be considered here include observation, task analysis (TA) and human reliability assessment (HRA). It should be noted that the latter two are more extensively covered in Chapters 5 and 8.

4.4.2.1 Observation

Observation of users can provide a rich source of information about what users actually do. My colleague, Chris Baber, quotes the example of an observation study he was conducting of ticket vending machines on the London Underground. He observed that people were folding up notes and trying to push them into the coin slots. It would have been hard, if not impossible, to predict this type of user behaviour. Hence, observation studies have a high level of face validity, that is, they provide reliable information about what people actually do, but a low degree of experimental control. Further, they do not in themselves reveal why people are using a system in a certain way. This lack of explanatory information is the primary disadvantage of the direct observation technique, and it can make redesign difficult. A way around this is to follow observation studies with a debriefing session with the users, which allows the reasons why they carried out specific actions to be explored. However, in naturalistic settings this may not be possible.

Direct observation needs to be accompanied by some means of recording the observations. Common ways include audio and video recording, and event/time logging. Video recording has frequently been used in observation studies, but it is resource-hungry in terms of the time needed to transcribe the video-tapes. Some researchers report that an hour of video-tape takes around 10 hours of transcription for analysis purposes. With event/time logging, the raw observations consist of the events or states along a time-line. Drury [22] provided the example of observing customers queuing in a bank. The raw observation data comprised customer-related events along a time-line. The data collection exercise was event driven, and this was transformed into a time/state record indicating the system states in the queue formation. Drury gave five types of information that can be calculated from this type of observation data: sequence of activities; duration of activities; frequency of activities; time spent in states; and spatial movement. This information can then be represented in the form of Gantt charts, process flow charts and link charts.
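As a hedged illustration of event/time logging (the event names, timestamps and function below are invented, not Drury’s data), a time-stamped event log can be turned into three of the five kinds of information listed above: sequence, duration and frequency of activities.

```python
# Illustrative sketch: deriving sequence, duration and frequency of activities
# from a time-stamped event log. Event names and times are invented examples,
# not Drury's bank-queue data.
from collections import Counter, defaultdict

# Each entry marks the start of a new state for one observed customer: (time_s, state)
event_log = [
    (0, "join_queue"),
    (95, "at_counter"),
    (210, "leave"),
]

def analyse(log):
    sequence = [state for _, state in log]                 # sequence of activities
    durations = defaultdict(float)                         # time spent in each state
    for (t0, state), (t1, _next_state) in zip(log, log[1:]):
        durations[state] += t1 - t0
    frequency = Counter(sequence)                          # frequency of activities
    return sequence, dict(durations), frequency

seq, dur, freq = analyse(event_log)
print(seq)   # ['join_queue', 'at_counter', 'leave']
print(dur)   # {'join_queue': 95.0, 'at_counter': 115.0}
print(freq)  # Counter({'join_queue': 1, 'at_counter': 1, 'leave': 1})
```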

Another issue concerns the Hawthorne effect: if users know they are being observed, they will often modify their behaviour accordingly. (The Hawthorne effect arose from the studies carried out at the Hawthorne electrical works in the 1920s. It was found that workers responded to changes in the environmental conditions of their workplace by increasing their work output, even when the changes were detrimental. The experimenters concluded that the workers were responding to the fact that they were being studied, and had improved their work rate accordingly.) The artificial nature of an experimental setting makes the Hawthorne effect more of a problem when running experiments; although it can occur in natural, operational environments (as the original Hawthorne studies indicated), it is less likely there.

An extension of the observation method is that known as the Wizard of Oz. This method, influenced by the film of that name in which the Professor instructs onlookers to pay no attention to the man behind the curtain, involves having a person ‘act’ as the product or system by making the responses. The real advantage of this technique is that it allows emerging technologies, e.g. advanced speech recognition that can cope with natural language dialogues, to be simulated. Users can then experience and feed back information on a system that is currently not technically feasible. There is an element of deception when participants undergo experimental studies that employ the Wizard of Oz; hence, the ethics of carrying out this type of study need to be carefully considered.
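A minimal sketch of the Wizard of Oz set-up is given below: the participant believes they are interacting with the system, while a hidden experimenter types the ‘system’ replies. The single-console arrangement, file name and function are purely illustrative assumptions; real studies would separate the two parties and manage the dialogue log more carefully.

```python
# Illustrative Wizard of Oz sketch: the "system" replies are actually typed by a
# hidden human operator (the wizard). In a real study the participant and wizard
# would use separate, networked consoles.
import datetime

def wizard_of_oz_session(log_path="dialogue_log.txt"):
    with open(log_path, "a") as log:
        while True:
            utterance = input("Participant: ")
            if utterance.lower() in {"quit", "exit"}:
                break
            # The wizard reads the utterance and composes the "system" response.
            response = input("[wizard types the system reply]> ")
            stamp = datetime.datetime.now().isoformat(timespec="seconds")
            log.write(f"{stamp}\tUSER\t{utterance}\n")
            log.write(f"{stamp}\tSYSTEM\t{response}\n")
            print(f"System: {response}")

if __name__ == "__main__":
    wizard_of_oz_session()
```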

Indirect observation techniques can also be used to find out how people carry out tasks. Surveys, questionnaires, interviews, focus groups, group discussions, diaries, critical incident reports and checklists can all be administered to individuals to provide details and documentation about their use. In contrast to these self-report techniques, archive material can also be employed to find out about individuals’ use of a particular system. These might include analyses of written records, and computer logs of user activities.

4.4.2.2 Task analysis

In contrast to observation, task analysis (TA) is a better-known and established method in the human factors toolkit. The term ‘task analysis’ essentially covers a range of techniques to describe, and sometimes to evaluate, the human–machine and human–human interactions in systems. Task analysis has been defined as ‘methods of collecting, classifying and interpreting data on human performance in work situations’ ([23], p. 1529). However, the generality of the term evident in definitions such as this is often not particularly helpful when trying to find out about and execute the technique. Task analysis is an umbrella term covering a number of different techniques. It is particularly useful for large, multiple-activity tasks, where techniques such as timeline analysis, link analysis or the critical incident technique would prove inadequate on their own.

One of the best-known forms of TA is hierarchical task analysis (HTA) (see [24] for a comprehensive overview). Here, a task is broken down in terms of goals and sub-goals, and their associated plans. The end result can be a pictorial representation, often in the form of a flow chart, showing the actions needed to achieve successful completion of the task. On its own, this type of breakdown can be useful in demonstrating where the problems are. Likewise, carrying out a TA for a task performed under different conditions, say a paper-based versus a computerised system, can be particularly illuminating. The main advantage of this method is that it provides a systematic breakdown of the tasks and sub-tasks needed to use a product or system. Often, this in itself helps to reveal the difficulties people have using the interface. In terms of disadvantages, there is a need to decide the level of granularity required in the TA. Take a simple example, like making a sandwich. When carrying out a TA for this everyday domestic task, decisions have to be made about the level of detail needed: for example, if unsliced bread is used, does the analysis need to cover the availability of a sharpened bread knife, checking the thickness of the slices, deciding whether to use the crust, and so on? Trivial though this example might be, it provides a good demonstration of the type of problem encountered when carrying out a TA.
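To illustrate the hierarchical breakdown, the sketch below renders the sandwich example as a tree of goals, sub-goals and plans; the particular decomposition and the data structure are invented for illustration rather than taken from a published HTA.

```python
# Illustrative sketch of a hierarchical task analysis (HTA) as a goal tree.
# The decomposition of "make a sandwich" is invented for illustration; a real
# HTA would record plans stating when and in what order sub-goals apply.
from dataclasses import dataclass, field

@dataclass
class Goal:
    label: str
    plan: str = ""                        # e.g. "do 1 then 2 then 3"
    subgoals: list["Goal"] = field(default_factory=list)

hta = Goal("0. Make a sandwich", plan="Do 1, then 2, then 3", subgoals=[
    Goal("1. Prepare bread", plan="If unsliced, do 1.1 then 1.2; else go to 2", subgoals=[
        Goal("1.1 Obtain sharpened bread knife"),
        Goal("1.2 Cut two slices of suitable thickness"),
    ]),
    Goal("2. Add filling"),
    Goal("3. Assemble and cut sandwich"),
])

def print_hta(goal: Goal, depth: int = 0) -> None:
    """Print the goal hierarchy with indentation, followed by its plan."""
    print("  " * depth + goal.label + (f"  [plan: {goal.plan}]" if goal.plan else ""))
    for sub in goal.subgoals:
        print_hta(sub, depth + 1)

print_hta(hta)
```

Choosing how far to decompose goals such as 1.1 and 1.2 is precisely the granularity decision discussed above.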

Other variants on TA include Cognitive Task Analysis (CTA). The cognitive element is an important component of tasks involving automated and complex systems, and CTA methods reflect this. The example given by Shepherd [25] to differentiate physical and cognitive elements is that of button pressing: the action of actually pressing the button is physical, whereas deciding which button to press is cognitive. Successful completion of the task depends on both being carried out, and Shepherd argued that it is more useful to consider a general TA that accommodates all elements of the task rather than focusing on cognitive and non-cognitive TA separately.

Early CTAs included GOMS (Goals, Operators, Methods and Selection rules), developed by Card, Moran and Newell [26]. This was used in a text-processing application and required the identification of rules for selecting the methods that allow the operators to achieve their goals. GOMS was followed by TAG (Task Action Grammar: [27]) and TAKD (Task Analysis for Knowledge Description: [28]). TAG focuses on describing tasks in terms of syntax rules, while TAKD considers task descriptions in the context of rules for knowledge elicitation. A more recent development is Applied Cognitive Task Analysis (ACTA: [29]), which comprises three interview methods to enable the practitioner to extract information concerning the cognitive skills and mental demands of a task. The early TAs have been criticised because they focused exclusively on the role of the user in the design process to the point of excluding other aspects. Diaper, McKearney and Hurne [30] have attempted to compensate for this by developing the pentanalysis technique. This technique comprises the following elements: an initial requirements data capture; task and data flow analyses carried out simultaneously; integration of the pentanalysis and data flow analyses; and finally, the development of a final data flow model and pentanalysis.
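As a rough illustration of the GOMS style of description (the goal, methods and selection rule below are invented for a hypothetical text editor, not taken from Card, Moran and Newell’s analysis), a fragment might look like this:

```python
# Illustrative GOMS fragment for a hypothetical text editor. The goal, methods
# and selection rules are invented for illustration.
goms_fragment = {
    "goal": "DELETE-WORD",
    "methods": {
        "MOUSE-METHOD": [        # operators: elementary perceptual/motor/cognitive acts
            "POINT-TO-WORD", "DOUBLE-CLICK", "PRESS-DELETE-KEY",
        ],
        "KEYBOARD-METHOD": [
            "MOVE-CURSOR-TO-WORD-START", "PRESS-CTRL-DELETE",
        ],
    },
    "selection_rules": [
        "IF hand is already on the mouse THEN use MOUSE-METHOD",
        "ELSE use KEYBOARD-METHOD",
    ],
}

for method, operators in goms_fragment["methods"].items():
    print(method, "->", " ; ".join(operators))
```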

In conclusion, TA is a term that covers a plethora of techniques for examining the activities carried out by humans in complex systems. Reference 31, for example, has been cited as listing over 100 methods relating to TA. This vast array of different methods does mean that an expert is needed to decide which techniques are appropriate for a given TA; as a result, a TA is not easy for the non-specialist to carry out. The primary benefit of executing TAs stems from the systematic analysis of human–machine activities that these techniques facilitate. This, as their long history indicates, is one of the main reasons for the continuing interest in developing and using them. Further details on TA techniques are provided in Chapters 5 and 8.

4.4.2.3 Human reliability assessment

Human reliability assessment (HRA) is a generic methodology, incorporating a TA, for assessing the reliability of a system. A methodology for an HRA was given by Kirwan and Ainsworth [32]. In this methodology, once the problem has been defined, a TA is conducted in order to consider the goals and sub-goals of the users in terms of the tasks and sub-tasks they are carrying out. This leads to an identification of the particular types of error being made. As a general rule, HRA techniques focus on quantifying the impact of different errors in order to consider means of error prevention, reduction and management. In an HRA, a fault or event tree might be used to model the errors and their recovery paths, together with the specification of human error probabilities and error recovery probabilities. Thus, a key component of HRA is error identification, and there are many techniques available for doing this.
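As a hedged numerical sketch of the quantification step (the probabilities and the two-step task below are invented, and the simple model assumes independent steps), per-step human error and recovery probabilities can be combined into an overall task failure estimate:

```python
# Illustrative sketch of the quantification step in an HRA. The human error
# probabilities (HEPs) and recovery probabilities are invented, and the model
# assumes the task steps fail independently; real assessments would use a
# technique such as THERP or HEART with data-driven values.
def unrecovered_error_prob(hep, p_recovery):
    """Probability that a step's error occurs and is not recovered."""
    return hep * (1.0 - p_recovery)

def task_failure_prob(steps):
    """Probability that at least one step ends in an unrecovered error."""
    p_success = 1.0
    for hep, p_recovery in steps:
        p_success *= 1.0 - unrecovered_error_prob(hep, p_recovery)
    return 1.0 - p_success

# (HEP, probability the error is caught and recovered) for each task step:
steps = [
    (1e-2, 0.9),   # e.g. select wrong control, usually noticed
    (3e-3, 0.5),   # e.g. mis-read a gauge, recovered half the time
]
print(f"Estimated task failure probability: {task_failure_prob(steps):.4f}")
```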

Human error identification (HEI) techniques have been developed over the last 20 years as systems have become more complex and more highly automated. The link between human error and incidents/accidents has fuelled developments in this area, although it is this author’s view that errors only play a small part in causing accidents (or the less critical incidents). For more information on errors and their role in accidents, see Reference 33 and Chapter 7 of this book.

Like TA, HEI techniques have the benefit of facilitating a systematic analysis of the operation of a product or system from the point of view of where errors are likely to occur. Once the activities where errors are likely to occur have been located, remedial actions can be taken either to prevent the errors or to accommodate the operator making them. Many techniques for HEI have been developed; some examples are listed below, followed by a generic sketch of how error modes might be enumerated for each step of a task.

THERP (Technique for Human Error Rate Prediction: [34]);

HEART (Human Error Assessment and Reduction Technique: [35]);

SHERPA (Systematic Human Error Reduction and Prediction Approach: [36]);

PHECA (Potential Human Error Cause Analysis: [37]);

GEMS (Generic Error Modelling System: [38]);

PHEA (Potential Human Error Analysis: [39]);

TAFEI (Task Analysis for Error Identification: [40]).
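The sketch below is a generic illustration of the idea shared by several of these techniques: walking through the steps of a task (typically taken from an HTA) and asking, for each step, which error modes are credible. The error-mode list and task steps are invented and do not reproduce any single published taxonomy such as SHERPA or PHEA.

```python
# Generic illustration of human error identification: for each task step,
# ask which error modes are credible. Error modes and steps are invented.
GENERIC_ERROR_MODES = [
    "action omitted",
    "action on wrong object",
    "action too early or too late",
    "right action, wrong amount",
    "information misread",
]

task_steps = [
    "Select destination on ticket machine",
    "Insert payment",
    "Collect ticket and change",
]

def identify_errors(steps, credible):
    """Return (step, error mode) pairs judged credible by the analyst.

    `credible` is an analyst-supplied judgement function; here a stub flags
    every combination so the structure of the output is visible.
    """
    return [(step, mode) for step in steps for mode in GENERIC_ERROR_MODES
            if credible(step, mode)]

for step, mode in identify_errors(task_steps, lambda s, m: True):
    print(f"{step}: {mode}")
```

In practice the analyst would then attach remedial measures to the credible (step, error mode) pairs.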

4.4.3 Empirical methods

The third and final group of methods are the empirical ones. It could be argued that these are similar to the objective methods in that they have a high level of objectivity. The difference, therefore, is slight and relates primarily to the degree of control: empirical methods are tightly controlled. This is in contrast to the subjective and objective methods, which rely primarily on indirect or direct reporting and analyses of user activities. The use of experiments for examining usability, modelling tools, and fitting trials and mannequins will be considered here.

4.4.3.1 Experiments

The focus of the experimental approach in human factors work is on usability, primarily because usability is a central issue in the design of products and systems. The concept emanates from the term ‘user-friendly’, which began to be used in the early 1980s to describe whether or not computer systems had been designed to optimise user interactions. Although the term is still widely used and applied to a range of products today, the human factors community does not particularly like ‘user-friendly’ because it does not have an agreed definition and hence cannot be measured in any objective way. This is in contrast to the term ‘usability’, which has been defined by the International Standards Organisation [41] as: ‘the usability of a product is the degree to which specific users can achieve specific goals within a particular environment; effectively, efficiently, comfortably, and in an acceptable manner’. Important in this definition is the subjective component, as illustrated by the final point concerning acceptability.

The affective element of product/system use has been increasingly recognised in recent years (see [42]). If users find that using a device is unacceptable for some reason, they may decide they do not like it, and will cease to use it. Hence, it has been suggested that the ultimate test of usability is whether or not people use an object. Data logging techniques, e.g. the keystroke level model, may be employed to do this. Although there could be ethical considerations in doing this, i.e. logging an individual’s use of a product or system, it is purported to provide an accurate assessment of use. Self-report techniques, where individuals are asked to report their use, are known to be subject to inaccuracy. People, either deliberately in order to create a certain impression, or accidentally, will often provide erroneous reports of their level of use of a product or system.
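The keystroke-level model mentioned above can also be used predictively: task execution time is estimated by summing standard operator times. The sketch below uses the commonly quoted approximate operator times from Card, Moran and Newell; the example operator sequence, and the mapping of a mouse click onto a keystroke operator, are illustrative assumptions.

```python
# Keystroke-level model (KLM) sketch: estimate task time by summing operator
# times. The per-operator times are the commonly quoted approximations from
# Card, Moran and Newell; the operator sequence below is an invented example
# (select a menu item, then type a three-letter command).
OPERATOR_TIMES_S = {
    "K": 0.2,    # keystroke (average skilled typist); a mouse click is treated as K here
    "P": 1.1,    # point with a mouse to a target
    "H": 0.4,    # home hands between keyboard and mouse
    "M": 1.35,   # mental preparation
}

def klm_estimate(operators):
    """Sum the standard times for a sequence of KLM operators."""
    return sum(OPERATOR_TIMES_S[op] for op in operators)

sequence = ["M", "H", "P", "K",        # think, reach for mouse, point, click
            "H", "M", "K", "K", "K"]   # back to keyboard, think, type 3 letters
print(f"Estimated execution time: {klm_estimate(sequence):.2f} s")
```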

The first personal computer (PC) was launched in February 1978, and PCs began to penetrate the workplace in the early 1980s. The net result of this was to bring computers and computing to the masses. It is perhaps not surprising therefore that the concept of usability was developed in the 1980s. Emeritus Professor Brian Shackel is credited with this as he was the first to use the term in this context in 1981 [43].

Since then, the original definition has been extended, and a great many publications have been produced on usability. Shackel’s early definition, developed in conjunction with his colleague Ken Eason, focused on the user and the task in terms of learnability, effectiveness, the attitude of the users (the subjective component) and flexibility, giving the acronym LEAF. Over the years, this original work has been modified to include usefulness, acceptability and reusability. Thus, usability metrics are a vital way of assessing the usefulness and appropriateness of a product or system (Table 4.2). These metrics are often investigated in an experimental setting.
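As a hedged sketch of how such metrics might be pulled together from an experimental session (the trial records, the three chosen measures and the rating scale are invented; a real study would follow whichever metric definitions its evaluation framework specifies), effectiveness, efficiency and satisfaction can be summarised per condition as follows:

```python
# Illustrative sketch of summarising usability metrics from experimental trials.
# The trial records and the three measures (completion rate, mean task time,
# mean satisfaction rating on a 1-7 scale) are invented for illustration.
from statistics import mean

# One record per participant per task: (completed?, time_s, satisfaction 1-7)
trials = {
    "paper-based":  [(True, 210, 4), (True, 185, 5), (False, 300, 2)],
    "computerised": [(True, 120, 6), (True, 140, 6), (True, 160, 5)],
}

for condition, records in trials.items():
    effectiveness = mean(1.0 if done else 0.0 for done, _, _ in records)
    efficiency = mean(t for _, t, _ in records)          # mean time on task (s)
    satisfaction = mean(s for _, _, s in records)        # mean rating (1-7)
    print(f"{condition}: completion {effectiveness:.0%}, "
          f"mean time {efficiency:.0f} s, satisfaction {satisfaction:.1f}/7")
```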
