3.2 Method
3.2.1 Questionnaire preparation and survey plan for identifying parameters
This experimental plan evaluates novelty in the creative aptitude exhibited in mass examinations in Design education. Novelty is a complex factor comprising several subfactors. The literature highlights several of these, such as language processing, relevance between questions and their responses, narrative link or coherence in responses, and uniqueness of concept (Berbague et al., 2021; Camburn et al., 2020; Demirkan & Afacan, 2012; Schumann et al., 1996). A questionnaire was framed around these features; its internal consistency, measured with Cronbach's alpha, was 0.709, which falls within the acceptable range (Sharma, 2016). The questionnaire used a Likert-type scale ranging from 'very important' = 5, 'slightly more important' = 4, 'important' = 3, 'slightly important' = 2, to 'not at all important' = 1. It also provided space for Design practitioners to add any features they considered significant beyond those listed.
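For reference, the reported internal consistency follows the standard formula for Cronbach's alpha, α = k/(k − 1) · (1 − Σσ²ᵢ / σ²_total), where k is the number of items. The sketch below is illustrative only; the actual ratings come from the survey data (Appendix B), and the function names here are ours, not part of the study's tooling.

```python
def variance(xs):
    # unbiased sample variance
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(item_scores):
    """item_scores: one list per questionnaire item, each holding
    the Likert ratings given by every respondent for that item."""
    k = len(item_scores)
    sum_item_vars = sum(variance(item) for item in item_scores)
    # total score per respondent across all items
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    return (k / (k - 1)) * (1 - sum_item_vars / variance(totals))
```

With perfectly consistent items the coefficient reaches 1.0; values around 0.7, as obtained here, are conventionally taken as acceptable.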
The questionnaire designed to find the parameters for evaluating novelty in creative responses is illustrated in Appendix B. The questions were worded (as shown in the appendix) to confirm whether the factors identified in the literature review correspond to the factors that Design pedagogues consider in practice. A state-of-the-art review was conducted to examine the findings from the literature that contributed to identifying the factors used for assessing products, solutions, ideas, etc., as illustrated in Table 3.1. A survey was then conducted to collect expert ratings on a 5-point Likert-type scale (Demirkan & Afacan, 2012) and thereby identify the factors preferred by experts in the evaluation process. The structure of the questionnaire is provided in Appendix C.
Table 3.1: Factors for evaluating creative aptitude (questionnaires given in Appendices B and D) and the associated literature

Factors — State-of-the-art literature
Ideas of fluency — D'Souza, 2021; Benedek et al., 2016; Diedrich et al., 2015; Cheung et al., 2003; Kornish and Jones, 2021; Al‐Zahrani, 2015; Mirabito and Goucher-Lambert, 2020; Dippo and Kudrowitz, 2013; Bayer-Hohenwarter, 2010; Almeida et al., 2008
Flexibility — D'Souza, 2021; Benedek et al., 2016; Diedrich et al., 2015; Cheung et al., 2003; Al‐Zahrani, 2015; Dippo and Kudrowitz, 2013; Bayer-Hohenwarter, 2010; Chakrabarti, 2006; Almeida et al., 2008
Usefulness — Sarkar and Chakrabarti, 2011; Diedrich et al., 2015; Kornish and Jones, 2021; Takai et al., 2015; Bayer-Hohenwarter, 2010; Chakrabarti, 2006; McCarthy, 2018
Relevance — Camburn et al., 2020; D'Souza, 2021; Cheung et al., 2003
Uniqueness — Demirkan and Afacan, 2012; Diedrich et al., 2015; Cheung et al., 2003; Kornish and Jones, 2021; Al‐Zahrani, 2015; Vargas Hernandez et al., 2012; Takai et al., 2015; Dippo and Kudrowitz, 2013; Bayer-Hohenwarter, 2010; McCarthy, 2018; Sarkar and Chakrabarti, 2011; Almeida et al., 2008
Clarity — Chaudhuri et al., 2020; Chakrabarti, 2006
Choice of colours — Demirkan and Afacan, 2012; Chaudhuri et al., 2021b
Sketching ability — Takai et al., 2015; Garaigordobil, 2006; Demirkan and Afacan, 2012; Schumann et al., 1996; Almeida et al., 2008
Language processing — Benedek et al., 2016; Cheung et al., 2003
Narration/coherence — Pérez and Sharples, 2001; Wu, 2013; D'Souza, 2021; Demirkan and Afacan, 2012; Takai et al., 2015
The population size for this study is unknown, and few experts are available in this domain. A confidence interval of 0.75 with a 0.05 margin of error was considered, as is commonly practiced in design, educational, and social research (Krejcie & Morgan, 1970). The sample size was calculated to be seventy-one (N = 71) subjects. The subjects were experts from reputed private and government Design schools as well as Designers from industry. Their ages ranged from 39 to 62 years (M = 47.92, S.D. = 5.52), and 46% were female (N = 33). The survey was conducted mostly on-site, i.e., in the departments of numerous Design schools and in industry. However, 3% of the data were acquired through an online survey form owing to the unavailability of those experts. Pedagogues were asked about the parameters required to evaluate novelty in the mass examinations through which students seek admission to Design schools. The parameters were selected based on their opinions and a set of options provided in the questionnaire.
3.2.2 System architecture
A model is proposed based on the parameters captured from the survey, viz., 1) grammatical mistakes, 2) misspellings, 3) relevance between question and response, 4) narration or coherence, and 5) relative uniqueness of a response. The input to the architecture is the digitized questions and the creative responses exhibiting creative aptitude, and the output is a novelty score. Such a score, derived from a fixed set of parameters, is intended to provide consistent and unambiguous support for evaluating novelty in Design entrance examinations.
Initially, language processing was conducted to identify grammatical mistakes and misspellings in students' responses. Language processing is significant in evaluating novelty because it helps experts comprehend how well responses communicate to their target audience. The processing was conducted with an online language processing tool whose Application Programming Interface (API) reports multiple types of semantic and syntactic errors. Next, the relevance between a question and a response was verified to confirm that the response fits within the boundary of the question. The questions and responses were individually tokenized. The tokenized questions were fed into a doc2vec model (Devlin et al., 2018; Mikolov et al., 2013; Skansi, 2018), which converts the textual data of a question into a numerical vector; its output is a 300-dimensional embedding of a given document. Similarly, the tokenized creative responses were fed into the doc2vec model, which likewise produced 300-dimensional embeddings. The two vectors were then compared using the cosine similarity function (Skansi, 2018), which yields a similarity score between a question and a response in the range of −1 to 1, indicating lower to higher relevance of the descriptive creative response.
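The relevance step thus reduces to a cosine similarity between two 300-dimensional doc2vec embeddings. A minimal sketch of the similarity computation itself is shown below; the short vectors used here are stand-ins, since in the actual pipeline a trained doc2vec model supplies the embeddings.

```python
import math

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (||u|| * ||v||), in the range [-1, 1]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# identical embeddings -> maximal relevance
question_vec = [0.2, 0.5, 0.1]
response_vec = [0.2, 0.5, 0.1]
score = cosine_similarity(question_vec, response_vec)  # close to 1.0
```

A score near 1 indicates the response stays within the question's semantic boundary; a score near −1 indicates it does not.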
Narration, or coherence between sentences, is also a pivotal construct in evaluating novelty. Coherence indicates that a response presents a proper description without diverging into irrelevant contexts, and it is significant in evaluating novelty because it helps experts comprehend the novelty of responses. Coherence was identified with Bidirectional Encoder Representations from Transformers (BERT), which predicts the probability that one sentence follows another. Explicitly, the model accepts a pair of sentences and returns a tensor of two values. The first value, when passed through a normalization function, gives the probability that the second sentence follows the first; the second value, similarly normalized, gives the probability that it does not. If the first probability was less than 0.75, the pair was considered a narration break (Le & Le, 2013). However, this threshold for narration breakage can be altered depending on the type and difficulty level of an examination.
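The narration-break decision can be sketched as follows, assuming the two raw values from a BERT next-sentence-prediction head are normalized with a softmax. The 0.75 cutoff is the threshold quoted above; the function names themselves are illustrative, not part of the BERT API.

```python
import math

NSP_THRESHOLD = 0.75  # adjustable per examination type and difficulty

def softmax(logits):
    # normalize raw scores into probabilities that sum to 1
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_narration_break(logits, threshold=NSP_THRESHOLD):
    """logits: the [is-next, not-next] pair for two consecutive
    sentences, e.g. as produced by a BERT NSP head."""
    p_is_next, _ = softmax(logits)
    return p_is_next < threshold
```

A strongly coherent pair (e.g. logits [5.0, −5.0]) passes the check, while an ambiguous pair (e.g. [0.0, 0.0], giving p = 0.5) is flagged as a break.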
The relative uniqueness of a response determines its novelty. A clustering algorithm was implemented to evaluate the uniqueness of responses by grouping semantically similar responses into clusters; a cluster containing fewer descriptive patterns of creative responses indicates uniqueness. To obtain a concise set of responses, three cases were considered and tested before applying the clustering algorithms: 1) responses summarized by an abstractive text summarization algorithm (Vodolazova et al., 2013), 2) responses summarized by an extractive text summarization algorithm (Kubat, 2017), and 3) responses not summarized. Because the responses were considerably long, text summarization was intended to find their central theme. However, many text summarization models require a large volume of training data (Kubat, 2017); given the dearth of such data, the summarized themes obtained from the test dataset were irrelevant to the given context. The digitized responses were therefore used directly for clustering to find infrequent responses. Initially, the k-Means clustering algorithm was deployed to find groups of semantically similar responses, but it is sensitive to the initialization of clusters. The Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm was then used to cluster the answers and provided a stable set of clusters (Zhang et al., 1997).

A demonstrative rule-based system was framed to decide the thresholds for acceptance of responses, the number of errors in responses, and the scores. This rule base was formed by brainstorming with experts possessing a minimum of ten years of experience in evaluating novelty in creative responses illustrating creative aptitude in Design education. However, these rules cannot be standardized; they depend entirely on the type and level of an examination. The type of examination indicates whether the test is institutional, nationalized, etc., whereas the level refers to its degree of difficulty, which may be easy, moderate, difficult, very difficult, etc. Moreover, pedagogues might change the threshold values of the parameters depending on how stringent or lenient they are towards students. Finally, a summative assessment of all the features was performed to evaluate the novelty score. The details of each process involved in this model are illustrated in Figure 3.2. The model was validated using Mean Squared Error (MSE) and Mean Absolute Error (MAE), and the errors were found to be negligible.
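Once cluster labels are available (e.g. from a BIRCH implementation such as scikit-learn's `Birch`), the uniqueness judgment amounts to flagging responses that fall into small clusters. A minimal sketch of that final step is below; the `rare_fraction` cutoff is a hypothetical parameter of ours, since the text does not fix a specific cluster-size threshold.

```python
from collections import Counter

def flag_unique(labels, rare_fraction=0.1):
    """labels: one cluster label per response.
    A response counts as unique if its cluster holds at most
    rare_fraction of all responses (an assumed cutoff)."""
    counts = Counter(labels)
    n = len(labels)
    return [counts[label] / n <= rare_fraction for label in labels]

# nine responses in one large cluster, one lone response in another:
# only the lone response is flagged as unique
flags = flag_unique([0] * 9 + [1])
```

In practice the cutoff, like the other thresholds in the rule base, would be tuned by pedagogues to the type and level of the examination.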
Figure 3.2: Detailed architecture of evaluating descriptive pattern of creative responses
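For reference, the two validation metrics named above, Mean Squared Error and Mean Absolute Error, are standard and can be computed directly; the sketch below shows their definitions, with placeholder score lists standing in for the expert-assigned and model-predicted novelty scores.

```python
def mse(y_true, y_pred):
    # mean of squared differences between true and predicted scores
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # mean of absolute differences between true and predicted scores
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

expert_scores = [3.0, 4.0, 2.5]   # placeholder values
model_scores = [3.0, 4.0, 2.5]    # identical predictions -> zero error
error = mse(expert_scores, model_scores)  # 0.0
```

Small MSE and MAE values indicate that the model's novelty scores track the expert evaluations closely, which is the sense in which the errors here were found to be negligible.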