Abstract
Assessment is an integral component of a learning environment and involves planning and administering assignments and examinations to gauge students' understanding of a particular topic. Although Selected-Response Written Assessment involving multiple-choice tests is easy to grade and quickly gives instructors an overall idea of students' understanding, it fails to assess deeper understanding and logical interpretation skills. This motivates Constructed-Response Written Assessment, which demands text-based answers from students in the form of essays or short answers and shows great potential for testing such skills. In short-answer assessment, students must present facts or knowledge on specific topics in their own words. These text-based short answers may be assessed through summative and formative assessment strategies. Summative assessment quantifies students' learning progress at the end of an instructional unit by grading short answers on a standardized scale.
Formative assessment is administered in the form of small tests or quizzes during a course of study. These tests determine, and subsequently improve, the quality of students' learning progress through meaningful feedback such as gaps identified in their answers. Because grading innumerable student answers and providing feedback on the gaps in them would require a massive number of human graders, programmatic grading of answers and automated gap detection are the need of the hour. Keeping this in view, the primary objectives of the work presented in this thesis are to design an Automatic Short-Answer Grading (ASAG) system that grades students with performance equivalent to that of human graders, and to automate formative assessment through computational models for Automated Gap Analysis (AGA).
A detailed study of different ASAG systems provided a comprehensive view of the feature spaces explored in previous work. We compensate for the lack of a systematic study of features through systematic feature-space exploration. We also present ensemble methods that have been experimentally validated to exhibit significantly higher grading performance than existing works on almost all datasets in the ASAG domain. A comparative study of different features and regression models for short-answer grading has been performed with respect to the evaluation metrics used for ASAG. Apart from traditional text-similarity-based features such as WordNet similarity and Latent Semantic Analysis, we have introduced novel features such as topic models suited for short text and relevance-feedback-based features.
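As an illustration of the kind of features compared here, the following is a minimal sketch, assuming scikit-learn and NLTK, of two classic text-similarity features: cosine similarity in a Latent Semantic Analysis (LSA) space and a WordNet-based word similarity. The function names, corpus handling, and the 50-dimensional LSA space are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch (not the thesis implementation) of two classic
# ASAG text-similarity features; names and dimensions are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from nltk.corpus import wordnet as wn

def lsa_similarity(model_answer, student_answer, corpus):
    """Cosine similarity between the two answers in an LSA space."""
    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(corpus + [model_answer, student_answer])
    # Cap the number of LSA components by the available vocabulary size.
    lsa = TruncatedSVD(n_components=min(50, X.shape[1] - 1)).fit(X)
    vecs = lsa.transform(tfidf.transform([model_answer, student_answer]))
    return float(cosine_similarity(vecs[:1], vecs[1:])[0, 0])

def wordnet_similarity(word_a, word_b):
    """Maximum WordNet path similarity over all synset pairs of two words."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(word_a)
              for s2 in wn.synsets(word_b)]
    return max(scores, default=0.0)
```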
An ensemble-based model has been developed using a combination of different regression models through an approach based on stacked regression. The proposed ASAG model has been tested on the University of North Texas (UNT) dataset for the regression task, whereas for the classification task, the Student Response Analysis (SRA) SciEntsBank and Beetle corpora have been used for evaluation. The grading performance of the ensemble-based ASAG is substantially higher than that exhibited by any individual regression model.
Extensive experimentation has revealed that feature selection, the introduction of novel features, and regressor stacking have been instrumental in achieving considerable improvement over existing methods in the ASAG domain.
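To make the stacking idea concrete, here is a minimal sketch assuming scikit-learn; the feature matrix, grade scale, and the particular base and meta regressors are placeholder assumptions rather than the configuration used in the thesis.

```python
# Regressor stacking: base regressors produce out-of-fold grade
# predictions that a meta-regressor combines into the final grade.
# X and y are random placeholders, not data from the thesis.
import numpy as np
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.svm import SVR
from sklearn.linear_model import Ridge

X = np.random.rand(100, 10)        # 100 answers x 10 similarity features
y = np.random.uniform(0, 5, 100)   # human-assigned grades on a 0-5 scale

stack = StackingRegressor(
    estimators=[("svr", SVR()), ("rf", RandomForestRegressor(random_state=0))],
    final_estimator=Ridge(),       # meta-model learns to weight base models
    cv=5,                          # out-of-fold predictions avoid leakage
)
stack.fit(X, y)
print(stack.predict(X[:3]))        # predicted grades for three answers
```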
Formative assessment demands the identification of gaps in student answers so that meaningful feedback can be constructed for the students. This form of analysis of student answers, also known as Automated Gap Analysis (AGA), has tremendous implications for designing intelligent tutoring systems and smart learning environments. We have adopted graph-based representations of answers to identify gaps. Two variants of graph representation, directed and undirected, have been explored to address the Automated Gap Analysis problem. The core of both approaches is the alignment of a pair of undirected or directed answer graphs representing a student answer and the corresponding model answer for a given question. Gaps are predicted by inspecting the nature of the alignment. A gold-standard dataset for the AGA problem has been developed from student answers and corresponding model answers to questions drawn from well-known short-answer grading datasets, namely University of North Texas (UNT), SciEntsBank, and Beetle. A standard machine-learning evaluation metric, the macro-averaged F1 score, has been used to evaluate the proposed approaches on this gold-standard dataset. Overall, the proposed approaches outperform the baseline approach by a noticeable margin.
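The following is a simplified sketch, using networkx, of the general idea: represent each answer as a graph of related concepts and report model-answer edges that fail to align with any student-answer edge as candidate gaps. The toy graphs and the exact-match alignment rule are illustrative assumptions; the alignment procedure developed in the thesis is more sophisticated.

```python
# Toy illustration of gap detection via answer-graph alignment:
# a model-answer edge with no aligned student-answer edge is a gap.
import networkx as nx

def find_gaps(model_graph, student_graph):
    """Return model-answer edges absent from the student-answer graph."""
    return [(u, v) for u, v in model_graph.edges()
            if not student_graph.has_edge(u, v)]

# Nodes are concepts, directed edges are asserted relations between them.
model = nx.DiGraph([("stack", "LIFO"), ("push", "stack"), ("pop", "stack")])
student = nx.DiGraph([("stack", "LIFO"), ("push", "stack")])
print(find_gaps(model, student))   # -> [('pop', 'stack')], the missing relation
```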
Keywords: Automatic Short Answer Grading, Topical similarity, Relevance feedback, Stacked ensemble, Automated Gap Analysis, Undirected graph-alignment, Directed graph-alignment, Automated assessment, Programmatic grading.