3.3 Results and discussion
3.3.2 Language processing
An API-based service of the online platform LanguageTool (https://languagetool.org/) (LanguageTool – Online Grammar, Style & Spell Checker, 2019) was used to conduct language processing of the descriptive creative responses that exhibit creative aptitude. For every text, the API returned data in JSON format containing the linguistic errors along with the category of each error (e.g., ‘grammar’, ‘duplication’, ‘non-conformance’, ‘misspelling’, ‘typographical error’, etc.). These categories were segregated into two divisions: grammatical mistakes and misspellings. Grammatical mistakes belong to the ‘grammar’ and ‘non-conformance’ categories, whereas misspellings belong to the ‘misspelling’ category. Errors of all other categories, such as ‘typographical error’, ‘duplication’, and ‘extra whitespace’, were neglected since the responses were considered in digitized text format. The structure of the JSON file is shown in Figure 3.3.
Figure 3.3: JSON file returned by LanguageTool
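The segregation of error categories described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the function name `count_errors` and the sample payload are hypothetical, and the sketch assumes that each entry of the response's `matches` list carries its category under `rule.issueType`. If the actual payload stores the category elsewhere, only the lookup needs to change.

```python
from collections import Counter

# Categories kept for scoring, following the two divisions described in the text.
GRAMMAR_TYPES = {"grammar", "non-conformance"}
SPELLING_TYPES = {"misspelling"}


def count_errors(response: dict) -> dict:
    """Count grammatical mistakes and misspellings in one parsed API response.

    Assumption: each item in response["matches"] stores its category under
    match["rule"]["issueType"].
    """
    counts = Counter()
    for match in response.get("matches", []):
        issue_type = match.get("rule", {}).get("issueType", "")
        if issue_type in GRAMMAR_TYPES:
            counts["grammar"] += 1
        elif issue_type in SPELLING_TYPES:
            counts["spelling"] += 1
        # Every other category (typographical, duplication, whitespace, ...)
        # is ignored, as described in the text.
    return {"grammar": counts["grammar"], "spelling": counts["spelling"]}


# Hand-written sample payload for illustration only (not real API output).
sample = {
    "matches": [
        {"rule": {"issueType": "grammar"}},
        {"rule": {"issueType": "misspelling"}},
        {"rule": {"issueType": "misspelling"}},
        {"rule": {"issueType": "non-conformance"}},
        {"rule": {"issueType": "typographical"}},
        {"rule": {"issueType": "duplication"}},
    ]
}

print(count_errors(sample))  # {'grammar': 2, 'spelling': 2}
```

The two counts returned here are the quantities N that enter the grammatical and spelling scoring rules.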
Subsequently, scores for grammatical mistakes and misspellings were calculated for every response. A rule-base was formulated based on the parameters for scoring the creative responses. It should be noted that the rules are subject to change for every examination and depend on numerous factors, such as the type of examination (e.g., national, institutional), the level of examination (easy, moderate, difficult, very difficult, etc.), and the behaviour of the pedagogue (stringent, lenient). An illustrative example of a rule-base is shown in Table 3.8. The number of sentences in each response was calculated, and each range of sentence counts had an assumed threshold value associated with it. If the number of errors exceeded the threshold value, the response was assigned the minimum score above the threshold for its sentence-count category (Penumatsa et al., 2006). However, if the number of errors was equal to or less than the threshold value, the score was assigned according to the formula shown in Equation 3.1.
Table 3.8: Rule-base for assigning grammatical scores
No. of Sentences (L to U) | Threshold Value (TG) | Minimum Score Above Threshold
1 – 10                    | 3                    | 0.1
11 – 20                   | 5                    | 0.09
21 – 30                   | 7                    | 0.08
31 – 40                   | 9                    | 0.07
41 – 50                   | 11                   | 0.06
*N.B.: 1) The same pattern continues for larger numbers of sentences.
2) This rule-base is subject to change for every examination.
$$\text{Score below Threshold} = 1 - \frac{N}{1 + T_G} \tag{3.1}$$
where $N$ is the number of grammatical errors and $T_G$ is the threshold value of that category, given by Equation 3.2.
$$\text{Threshold value, } T_G = 1 + \frac{2U}{10} \tag{3.2}$$
where $U$ is the upper limit of the number of sentences in each category.
The minimum score above the threshold for a category is given by Equation 3.3.
$$\text{Minimum Score above Threshold} = 0.1 - \frac{(T_G - 3) \times 0.01}{2} \tag{3.3}$$
A similar rule-based system was framed for assigning spelling scores to each response, as illustrated in Table 3.9.
Table 3.9: Rule-base for assigning spelling scores
No. of Sentences (L to U) | Threshold Value (TS) | Minimum Score Above Threshold
1 – 10                    | 6                    | 0.1
11 – 20                   | 9                    | 0.09
21 – 30                   | 12                   | 0.08
31 – 40                   | 15                   | 0.07
41 – 50                   | 18                   | 0.06
*N.B.: 1) The same pattern continues for larger numbers of sentences.
2) This rule-base is subject to change for every examination.
Similar to the grammar checking rule-based system, this parameter also had a threshold value and a minimum score above the threshold associated with each category. The minimum score above the threshold is assigned to any response having spelling errors exceeding the specified threshold value. If the number of errors is less than or equal to the threshold value, the score assigned is given by Equation 3.4.
$$\text{Score below Threshold} = 1 - \frac{N}{1 + T_S} \tag{3.4}$$
where $N$ is the number of spelling errors and $T_S$ is the threshold value of that category, given by Equation 3.5.
$$\text{Threshold value, } T_S = 3\left(1 + \frac{U}{10}\right) \tag{3.5}$$
The minimum score above the threshold for a category is given by Equation 3.6.
$$\text{Minimum Score above Threshold} = 0.1 - \left(\frac{T_S}{3} - 2\right) \times 0.01 \tag{3.6}$$
For example, $T_S = 9$ gives $0.1 - (3 - 2) \times 0.01 = 0.09$, which is the value listed in Table 3.9 for the 11 – 20 category.

Thus, a grammar score and a spelling score, each between 0 and 1, can be assigned to each response based on its number of grammatical errors and misspellings. As the number of grammatical errors increases, the grammatical score is reduced. For example, a response with no mistakes received the highest score of 1. When the upper limit of the number of lines was 10, a score of 0.75 was assigned for one error and a score of 0.5 for two errors, and so on, as shown in Figure 3.4. However, once the number of errors surpasses the threshold, the score drops to the minimum score for the corresponding range of lines.
Figure 3.4: Number of grammatical mistakes vs. grammatical scores when upper limit of number of lines is 10
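The grammatical scoring rule of Equations 3.1 to 3.3 can be sketched as follows. This is an illustrative sketch rather than the implementation used in this work. All function names are hypothetical, and the lookup of the upper limit U from the sentence count assumes categories of width 10 (1 – 10, 11 – 20, ...), as in Table 3.8.

```python
import math


def upper_limit_for(num_sentences: int, width: int = 10) -> int:
    """Upper limit U of the category (1-10, 11-20, ...) containing the count.

    Assumption: categories have a fixed width of 10 sentences, as in Table 3.8.
    """
    return max(1, math.ceil(num_sentences / width)) * width


def grammar_threshold(upper_limit: int) -> float:
    """Equation 3.2: TG = 1 + 2U/10."""
    return 1 + 2 * upper_limit / 10


def grammar_min_score(tg: float) -> float:
    """Equation 3.3: 0.1 - ((TG - 3) * 0.01) / 2."""
    return 0.1 - ((tg - 3) * 0.01) / 2


def grammar_score(num_errors: int, num_sentences: int) -> float:
    """Grammatical score of one response, between 0 and 1."""
    tg = grammar_threshold(upper_limit_for(num_sentences))
    if num_errors > tg:
        # Above the threshold: assign the category's minimum score.
        return grammar_min_score(tg)
    # At or below the threshold: Equation 3.1.
    return 1 - num_errors / (1 + tg)


# Scores for a response of 8 sentences (U = 10, TG = 3) with 0 to 4 errors.
print([round(grammar_score(n, 8), 2) for n in range(5)])
# [1.0, 0.75, 0.5, 0.25, 0.1]
```

The first three values match the scores quoted in the text for zero, one, and two errors, and the fourth error pushes the response above the threshold of 3, so it receives the minimum score of 0.1.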
Similarly, evaluating the misspellings in responses was essential, since correct spelling helps readers understand the context and thereby contributes to novelty. As the number of misspellings increases, the spelling score decreases according to the rule-base. The thresholds in the rule-base were set by domain-specific experts. A response with no misspellings received the highest score of 1. When the upper limit of the number of lines was 10, the score was 1 with no errors, 0.85 with one error, 0.71 with two errors, and 0.57 with three errors; with seven errors, which is above the threshold for that range of lines, the score was 0.1, as illustrated in Figure 3.5. Detection of spelling errors is the most elementary and significant step in evaluating novelty. During a real-time evaluation of answer scripts, misspellings always have an adverse effect on the perception of evaluators. A misspelling may also convey a wrong meaning, which reduces the impact of novelty.
Figure 3.5: Number of misspellings vs. spelling scores when upper limit of number of lines is 10
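The spelling rule-base can be checked numerically in the same way. The sketch below is illustrative only and its function names are hypothetical. It takes the threshold from Equation 3.5 and the score below the threshold from Equation 3.4. The minimum score above the threshold is coded, as an assumption, in the form that reproduces the values listed in Table 3.9.

```python
def spelling_threshold(upper_limit: int) -> float:
    """Equation 3.5: TS = 3 * (1 + U/10)."""
    return 3 * (1 + upper_limit / 10)


def spelling_min_score(ts: float) -> float:
    """Minimum score above the threshold.

    Assumption: written so that it reproduces the values in Table 3.9.
    """
    return 0.1 - (ts / 3 - 2) * 0.01


def spelling_score(num_errors: int, upper_limit: int) -> float:
    """Spelling score of one response whose category has upper limit U."""
    ts = spelling_threshold(upper_limit)
    if num_errors > ts:
        # Above the threshold: assign the category's minimum score.
        return spelling_min_score(ts)
    # At or below the threshold: Equation 3.4.
    return 1 - num_errors / (1 + ts)


# Rebuild the rows of Table 3.9 as (U, TS, minimum score above threshold).
rows = []
for u in range(10, 60, 10):
    ts = spelling_threshold(u)
    rows.append((u, round(ts), round(spelling_min_score(ts), 2)))
print(rows)
# [(10, 6, 0.1), (20, 9, 0.09), (30, 12, 0.08), (40, 15, 0.07), (50, 18, 0.06)]

# Scores for U = 10 with 0, 1, 2, 3, and 7 misspellings.
print([round(spelling_score(n, 10), 3) for n in (0, 1, 2, 3, 7)])
# [1.0, 0.857, 0.714, 0.571, 0.1]
```

The text quotes the intermediate scores truncated to two decimals (0.85, 0.71, 0.57). The last value shows that seven misspellings exceed the threshold of 6, so the response receives the minimum score of 0.1.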