Chapter 5. Testing of the game environment

5.3 Results and discussion

5.3.2 Pre- and post- gameplay questionnaires

Due to the complexities of qualitative assessment and the rigorous demands of conducting in-depth interviews and observations, the sample size for the formative evaluation of a prototype system cannot realistically be large.

Accordingly, a strictly empirical, statistical analysis of all the data is not feasible in the context of this study. However, quantitative elements were still included in the pre- and post-gameplay questionnaires to establish whether at least a trend could be detected when moving from the pre-gameplay questionnaire to the post-gameplay questionnaire. To this end, the quantitative, categorical data from the multiple-choice questions on both questionnaires were graphed to reveal any noticeable trend between the pre- and post-gameplay knowledge states.

When the total number of correct answers for all the multiple-choice questions in the pre-gameplay sample was compared with that in the post-gameplay sample (Figure 5.2), a number of interesting patterns became apparent. Out of a total of 16 questions, 7 (44%) were answered better in the post-gameplay sample than in the pre-gameplay sample, a further 7 (44%) were answered the same in both samples, and 2 (12%) were answered less correctly in the post-gameplay questionnaire than in the pre-gameplay questionnaire.
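For illustration only, a comparison in the style of Figure 5.2 can be produced with a short script along the following lines. This is a minimal sketch in Python with NumPy and Matplotlib; the per-question counts are hypothetical and merely echo the 7/7/2 improved/unchanged/downgraded pattern reported above, not the actual study data.

```python
# Sketch of a per-question pre/post comparison in the style of
# Figure 5.2. The counts are invented placeholders, not study data.
import numpy as np
import matplotlib.pyplot as plt

questions = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 19, 20, 21, 22, 23]
pre_correct = np.array([9, 11, 8, 12, 10, 13, 9, 14, 11, 10, 12, 8, 13, 9, 11, 10])
post_correct = np.array([11, 11, 10, 12, 12, 13, 9, 15, 11, 12, 12, 8, 12, 10, 12, 9])

# Draw paired bars for each question so pre- and post-gameplay
# counts can be compared side by side.
x = np.arange(len(questions))
width = 0.4
plt.bar(x - width / 2, pre_correct, width, label="Pre-gameplay")
plt.bar(x + width / 2, post_correct, width, label="Post-gameplay")
plt.xticks(x, questions)
plt.xlabel("Question number")
plt.ylabel("Number of correct responses")
plt.legend()
plt.show()
```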

Realistically, there could be a number of reasons for the difference in pre- and post-gameplay performance, and it is not within the scope of this study to examine each of these possibilities. However, since the learners filled in the pre-gameplay questionnaire immediately before engaging in gameplay, and filled out the post-gameplay questionnaire immediately after gameplay and their interview with the facilitator (which did not cover any of the topics in the questionnaire), it is most likely that any changes in the learners' view of the topic resulted from something that happened during gameplay or, at the very least, from the discursive element associated with gameplay.

[Figure 5.2 near here: graph of the number of correct responses per question (x-axis: question number; y-axis: number of correct responses), pre-gameplay vs post-gameplay.]

Figure 5.2 Graph showing the numbers of correct responses before and after gameplay. The graph indicates that, on a ‘per question’ basis, learners improved their responses for 44% of the questions, maintained their original answers for a further 44% of the questions and downgraded their responses for 12% of the questions.

When all questions are considered, the post-gameplay questionnaire showed an improvement over the pre-gameplay questionnaire. Furthermore, the confidence learners had in their initial answers to many of the questions appeared to be boosted after playing the game (Figure 5.3). This is plausible: if learners were initially unsure about a certain aspect of the content, and what they thought was then confirmed either through interfacing with the game environment or through the resultant discourse with their gameplay partner, their confidence in their answer would either remain the same or likely increase after gameplay. This change in mean confidence between the pre- and post-gameplay questionnaires is apparent when comparing the two samples.

The trends shown in Figures 5.2 and 5.3 indicate that learners are taking something away with them after playing the game. Since determining whether this occurs, and to what degree, is beyond the focus of this study, one cannot state conclusively what degree of learning is taking place, nor back up these assumptions with statistically robust empirical analyses. Suffice it to say that the trends in these results suggest, firstly, that learners answer more questions correctly in the post-gameplay questionnaire than in the pre-gameplay questionnaire and, secondly, that learners appear to be more confident of their answers in the post-gameplay questionnaire. These trends certainly indicate that learners might learn something from gameplay, or that gameplay is effective at validating their pre-existing knowledge of certain topics.

[Figure 5.3 near here: graph of mean confidence per question (x-axis: question number; y-axis: mean confidence, approx. 2.2–3.6), pre-gameplay vs post-gameplay.]

Figure 5.3 Graph showing the mean confidence levels specified by learners in the pre- and post-gameplay questionnaires. Out of a total of 16 questions, 8 (50%) of the questions were answered more confidently after gameplay than before, and the remaining 50% of questions were answered with the same confidence levels as before gameplay.

When the open-ended questions from the pre- and post-gameplay questionnaires were analysed, a marked trend immediately became apparent. When the total relevant concept scores for each question were graphed, the post-gameplay questionnaires showed substantially higher concept scores than the pre-gameplay questionnaires (Figure 5.4). Furthermore, the pre- and post-gameplay concept scores were subjected to statistical analysis, by means of a Wilcoxon signed-rank test, to determine whether they were significantly different from one another.

[Figure 5.4 near here: graph of total relevant concept scores per question (x-axis: question number; y-axis: total relevant concept score, 0–70), pre-gameplay vs post-gameplay.]

Figure 5.4 Graph showing the comparison between the pre- and post-gameplay relevant concept scores. The graph shows that the number of relevant concepts included in the learners' answers was consistently higher across all questions after gameplay.

The parametric paired-samples t-test is a more powerful test, but it assumes that the data being compared are normally distributed, and departures from this distribution can compromise its accuracy. To test the distribution of the concept scores, a Kolmogorov-Smirnov procedure was run. This procedure found no significant departure from normality in either concept score population (pre-gameplay concept score: P = 0.557, N = 11; post-gameplay concept score: P = 0.215, N = 11). However, the smaller the sample size, the less likely a Kolmogorov-Smirnov test is to detect the presence of outliers, and it is these outliers that can cause statistical misrepresentations (Zar, 1996). One way around this is to draw a normal probability plot, in which the points are plotted graphically and outliers can be detected visually. When a normal probability plot was performed, one outlier was found. Therefore, the Wilcoxon signed-rank test, which does not assume a normal data distribution, was used instead. The results from this test indicated that the post-gameplay relevant concept scores were significantly higher than their pre-gameplay counterparts (Z = -2.810; P = 0.005). This means that learners consistently answered the open-ended questions with more relevant concepts after gameplay than before gameplay.
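For readers wishing to reproduce this style of analysis, the sequence of tests described above (a Kolmogorov-Smirnov check for normality, a normal probability plot for visual outlier detection, and a Wilcoxon signed-rank test on the paired scores) might be sketched as follows in Python with SciPy. The scores below are hypothetical placeholders, not the study's data, and fitting the normal parameters from the sample makes the KS step a Lilliefors-style approximation rather than a strict KS test.

```python
# Minimal sketch of the statistical workflow described above;
# the per-question concept scores are placeholders (N = 11).
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

pre = np.array([12, 18, 25, 9, 14, 30, 22, 11, 16, 20, 27], dtype=float)
post = np.array([20, 26, 41, 15, 22, 55, 38, 18, 25, 33, 46], dtype=float)

# Kolmogorov-Smirnov test against a normal distribution fitted to
# each sample. (Estimating the parameters from the data strictly
# calls for a Lilliefors correction; shown this way to mirror the text.)
for label, sample in (("pre", pre), ("post", post)):
    d, p = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))
    print(f"{label}-gameplay KS: D = {d:.3f}, P = {p:.3f}")

# Normal probability plot of the paired differences; a point that
# falls well off the straight line flags a potential outlier.
stats.probplot(post - pre, dist="norm", plot=plt)
plt.show()

# Wilcoxon signed-rank test on the paired scores; it ranks the
# paired differences and so does not assume normality.
w, p = stats.wilcoxon(pre, post)
print(f"Wilcoxon signed-rank: statistic = {w:.1f}, P = {p:.4f}")
```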

In addition, when the pre- and post-gameplay relevant terminology scores were compared, the post-gameplay terminology scores were consistently higher across all questions (Figure 5.5). These two data populations were also compared by means of a Wilcoxon signed-rank test to establish whether they were statistically significantly different. The pre- and post-gameplay terminology scores were determined to be significantly different (Z = -2.814; P = 0.005), with the post-gameplay terminology scores being higher (mean 15.36) than those from the pre-gameplay questionnaires (mean 10.64).

[Figure 5.5 near here: graph of total relevant terminology scores per question (x-axis: question number; y-axis: total relevant terminology score, 0–40), pre-gameplay vs post-gameplay.]

Figure 5.5 Graph showing the total relevant terminology scores for the pre- and post-gameplay questionnaires. The post-gameplay relevant terminology scores are consistently higher than their pre-gameplay counterparts across all questions, with the exception of question 16, which remains unchanged.

These results indicate not only that the learners responded with more appropriate and informed answers after gameplay, but also that they were able to incorporate the relevant terminology into their explanations. All of the open-ended questions referred to puzzles that were embedded in the portal environment and with which the learners engaged during gameplay. While the data collected here cannot conclusively attribute this increased learner performance to interaction with the puzzles, it seems a feasible explanation.

Perhaps criticism could be levelled at the author for the manner in which the gameplay testing was run. It might have been more beneficial to ask respondents to fill in the post-gameplay questionnaire an extended period after gameplay, rather than while their interactions with the facilitator and their gameplay partners were still foremost in their minds. This approach, however, would have been complicated by the simple fact that once learners leave the testing premises after gameplay, they are exposed to numerous sources of information that would constitute confounding factors in the post-gameplay analysis. In other words, one could never be sure that the change in their knowledge state was a direct result of the learning environment and not of some other source.

It is instead suggested that whether the increased learner performance results from interaction with the learning environment itself, or whether the puzzle interfaces simply stimulated dialogue between the learners, is almost immaterial. What ultimately matters is that learner performance increased, and this increase must be due to something that happened during gameplay. These types of environments appeal to learners, and one of their many strengths is that they do get learners talking to one another about the concepts presented in the environment.
