
Chapter 2 Literature Review

2.1 Aptitude Testing for Selection


In Australia, there are the Special Tertiary Admission Test (STAT) for non-school leavers, the Undergraduate Medicine and Health Sciences Admission Test (UMAT), and the Graduate Australian Medical Schools Admissions Test (GAMSAT). In Sweden, there is the Högskoleprovet (the Swedish Scholastic Aptitude Test). Admission tests in the UK include the History Aptitude Test, the National Admissions Test for Law (LNAT), and the United Kingdom Clinical Aptitude Test (UKCAT).

In the US there are many admission tests, but the best known are the SAT, the ACT, the Graduate Record Examination (GRE), the Medical College Admission Test (MCAT), the Graduate Management Admission Test (GMAT), and the Law School Admission Test (LSAT). The SAT and ACT are admission tests for undergraduate entry, while the rest are for postgraduate entry.

Although many countries have admission tests, there is little information about these tests in the literature. The discussions and research studies that have been reported mostly concentrate on the admission tests in the US. Therefore, the information about admission tests in this section is drawn mainly from the US literature.

Some admission tests used in the US are categorized as aptitude tests, such as SAT I or SAT Reasoning, the GRE General Test, the GMAT, and the LSAT, while others, such as SAT II or the SAT Subject Tests, the ACT, and the GRE Subject Tests, are categorized as achievement tests. A test such as the MCAT measures both aptitude and achievement, as it consists of a Verbal Reasoning section, which measures aptitude, and a Science section, which measures knowledge of science subjects. In Australia, admission tests such as STAT, uniTEST, and UMAT can be categorized as aptitude tests, since they all measure reasoning and thinking skills, while the GAMSAT measures both aptitude and achievement (ACER, 2007b).

2.1.1 The Case of the SAT

The SAT is perhaps the most widely known admission test in the US, and it attracts much attention and controversy. Many scholars have criticized the SAT (Crouse & Trusheim, 1988; Lemann, 1999; Owen & Doerr, 1999; Zwick, 2004). As indicated earlier, the SAT consists of two types of tests: SAT I, now called SAT Reasoning, measures reasoning and thinking skills, while SAT II, or the SAT Subject Tests, measures knowledge in certain subject areas. SAT I, or SAT Reasoning, is the more controversial of the two.

The test has been criticized as being biased against minority groups and women, lacking predictive validity, having limited utility in making admission decisions, and being vulnerable to coaching effects (Crouse & Trusheim, 1988; Linn, 1990). It has also been criticized for being used as an indicator of school quality and for disadvantaging lower social class students, who do not have the same access to test preparation (Syverson, 2007). A further criticism is that the SAT leads to an overemphasis on preparing for test content that is not relevant to school subjects, and that it provides no information about how well students perform or how they can improve their skills (Atkinson, 2004).

Several changes have been made to the SAT during its development. These involve the question types, the testing times (to ensure that the speed factor does not affect test performance), and the test administration, such as permitting the use of a calculator in the mathematics section (Lawrence, Rigol, Van Essen, & Jackson, 2004). The name has also changed: originally, SAT stood for the “Scholastic Aptitude Test”; it then became the “Scholastic Assessment Test”. Now SAT is no longer an acronym, but simply the name of the test (Noddings, 2007; Zwick, 2004).

The modifications to the test were made partly in response to these criticisms. The new SAT administered in 2005, for example, was a result of criticisms made in 2001 by the University of California president, Richard Atkinson (Zwick, 2004). However, the criticism has not lessened; since the new version was released, it has drawn more criticism than ever before (Syverson, 2007).

Changes have taken place not only in the SAT but also in other admission tests. The GRE, for example, consisted of Verbal, Math, and Analytical Ability sections prior to 2002; in the current version, the Analytical Ability section has been replaced by Analytical Writing (GRE, 2007). Despite some significant changes in the major admission tests (SAT, MCAT, GRE, LSAT), “the fundamental character of the tests remains largely constant” (Linn, 1990, p. 298).

2.1.2 Aptitude versus Achievement

In general, as stated above, admission tests can be categorized into two groups: achievement tests and aptitude tests. A popular but misleading conception is that aptitude tests measure innate abilities (Lohman, 2004). Criticism of aptitude tests partly results from this misconception (Atkinson, 2004) and from misunderstandings about the relationship between aptitude and achievement tests (Gardner, 1982).

Both aptitude and achievement tests measure developed abilities, since “all tests reflect what a person has learned” (Anastasi, 1981, p. 1086). The difference is that achievement tests measure learning from specific, identifiable experiences, while aptitude tests measure learning from broad life experience (Anastasi, 1981). However, it is not easy to distinguish between aptitude and achievement tests; the difference between them is relatively subtle. As Gardner (1982, p. 317) puts it, “aptitude tests cannot be designed that are completely independent of past learning and experience and achievement tests cannot be constructed that are completely independent of aptitude”.

Nonetheless, “an aptitude test should be less dependent than an achievement test on particular experiences, such as whether or not a person has had a specific course or studied a particular topic” (Wigdor & Garner, 1982, p. 28).

Anastasi (1981) also characterises the difference between achievement and aptitude tests in terms of the test’s purpose: the primary purpose of an aptitude test is prediction, while that of an achievement test is to evaluate performance in a particular program. However, again, it is acknowledged that this is not a clear-cut distinction; some achievement tests may be used to predict future performance. It is therefore not surprising that achievement and aptitude are related: GPA in high school and SAT scores are positively correlated (Andrich & Mercer, 1997), and ACT scores are highly correlated with SAT scores (Wigdor & Garner, 1982; Briggs, 2009).

For the purposes of selection, it is generally recommended that more than one source of information be used. Information from aptitude tests, for example, should be combined with other information such as academic achievement, since the combination generally yields a better prediction than either source alone (Gardner, 1982; Linn, 1990).
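The gain from combining two predictors can be expressed with the standard formula for the multiple correlation; the numerical values below are illustrative assumptions, not figures from the studies cited:

$$R_{Y.12}^{2} = \frac{r_{Y1}^{2} + r_{Y2}^{2} - 2\,r_{Y1}\,r_{Y2}\,r_{12}}{1 - r_{12}^{2}}$$

For example, if an aptitude test and school achievement each correlate 0.40 with university performance ($r_{Y1} = r_{Y2} = 0.40$) and 0.50 with each other ($r_{12} = 0.50$), then $R_{Y.12}^{2} = (0.16 + 0.16 - 0.16)/0.75 \approx 0.21$, so $R \approx 0.46$, higher than the 0.40 achieved by either predictor alone.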

2.1.3 Predictive Validity of Aptitude Tests as Admission Tests at Undergraduate and Postgraduate Levels

Much research has been conducted on the predictive validity of aptitude tests as admission tests at undergraduate and postgraduate levels. In the case of the SAT, based on data from hundreds of institutions, the correlations between composite SAT scores and first year grades ranged from 0.27 to 0.57, with a mean of 0.42 (Shepard, 1993). In a recent study of the new version of the SAT, with a sample of 193,364 students from 110 colleges and universities, Kobrin et al. (2008) found that the correlation between the Verbal SAT and first year grades was 0.29 before correcting for attenuation and 0.48 after correction. Similar figures were found for the mathematics (quantitative) SAT and first year grades: 0.26 before correcting for attenuation and 0.48 after correction.
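The correction for attenuation referred to here is conventionally computed with Spearman’s formula; the formula below is the standard one, not a detail reported by Kobrin et al.:

$$r_{\text{corrected}} = \frac{r_{XY}}{\sqrt{r_{XX}\,r_{YY}}}$$

where $r_{XY}$ is the observed predictor-criterion correlation and $r_{XX}$ and $r_{YY}$ are the reliabilities of the predictor and the criterion. In validity studies of this kind, the correction is usually applied together with a correction for restriction of range, discussed below.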

At the postgraduate level, the correlations of admission tests (for example, the GRE, GMAT, and MCAT) with the criterion of first year grades had average or median values of between 0.30 and 0.40 (Linn, 1990).

Kuncel, Hezlett, and Ones (2001) conducted a meta-analysis of the predictive validity of the GRE, which consists of four components: Verbal, Quantitative, Analytical (Reasoning), and Subject, against several criteria. One of the criteria was graduate GPA. With graduate GPA as the criterion, across all samples the average correlations for the Verbal, Quantitative, Analytical Reasoning, and Subject components were 0.23, 0.21, 0.24, and 0.31 respectively, with standard deviations of 0.14, 0.11, 0.12, and 0.12 respectively. After correcting for restriction of range and the unreliability of the criterion, the correlations increased to 0.34, 0.32, 0.36, and 0.41 respectively, with standard deviations of 0.15, 0.08, 0.06, and 0.07 respectively. The size of these correlations is similar to that reported by Linn (1990).
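The correction for restriction of range used in such meta-analyses is typically Thorndike’s Case II formula; the notation below is standard rather than drawn from Kuncel et al.:

$$r_{c} = \frac{r\,(S/s)}{\sqrt{1 - r^{2} + r^{2}(S/s)^{2}}}$$

where $r$ is the correlation observed in the selected (restricted) group, $s$ is the predictor’s standard deviation in that group, and $S$ is its standard deviation in the unrestricted applicant population.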

In their study, Kuncel et al. also analysed predictive validity within each of four fields of study: Humanities, Social Science, Life Science, and Math-Physical Science. The observed correlations between graduate GPA and the Verbal subtest for those fields were 0.22, 0.27, 0.27, and 0.21 respectively. For the Quantitative subtest, the values were 0.18, 0.23, 0.24, and 0.25 respectively; for the Analytical subtest, 0.33, 0.26, 0.24, and 0.24 respectively; and for the Subject test, 0.37, 0.30, 0.31, and 0.30 respectively.

The above findings show that the variance in academic performance at university explained by aptitude test scores was less than 25%. Although this figure is considered small, it is understandable, because many factors influence academic performance, and reasoning as measured by a scholastic aptitude test is just one of them. It is also argued that even a small correlation is useful because it can improve the selection procedure significantly (Anastasi & Urbina, 1997; Kuncel et al., 2001; Nunnally & Bernstein, 1994).
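The 25% figure follows from squaring the correlations reported above. Taking the largest corrected correlation from Kobrin et al. (2008), for instance:

$$r = 0.48 \quad\Rightarrow\quad r^{2} \approx 0.23,$$

so even the strongest of these correlations accounts for less than a quarter of the variance in first year grades.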

2.1.4 The Controversy over Aptitude Tests in Selection

As mentioned earlier, there are criticisms of and controversies surrounding admission tests, especially aptitude tests. Atkinson (2004), whose criticism in 2001 had a major impact on the current SAT format, proposed that SAT Reasoning no longer be used as an admission test at the University of California. His proposal was based on the results of research conducted at the university, which indicated that SAT Reasoning was not a good predictor of academic performance; in fact, the research found that the SAT Subject Tests are a better predictor than SAT Reasoning and are less affected by differences in socioeconomic background. He also criticized the effect of SAT testing on school curricula, arguing that much time is spent preparing for a test whose content is not related to school subjects, whereas students should study material that is relevant to schools or colleges.

Lohman (2004), however, argues that aptitude tests are an important tool in student selection. According to him, aptitude is not the most important factor, but it makes a significant contribution to predicting academic success, especially if the content or the field of study differs from students’ past experience; in other words, aptitude is especially significant in novel situations. He notes that in studying the contribution of aptitude tests (in this case the SAT) to predicting college academic performance, the choice of criterion plays a significant role. A conclusion drawn by Willingham, Lewis, Morgan, and Ramist in 1990 (cited in Lohman, 2004) is that when the criteria are grades in a particular course, rather than GPA, the SAT is a better predictor than high school GPA.

Linn (1990, p. 303), based on a review of a thousand studies, concluded that:

1) Admission tests have a useful degree of relationship with subsequent grade or other indicators of academic performance.
2) Tests in combination with previous grades yield better prediction than either alone.
3) Due to artifacts that attenuate relationships, the observed correlations in selected samples understate the predictive value of the tests and previous grades.
4) Nonetheless, the predictions are far from perfect. Thus, substantial error in prediction can be expected even under the best of circumstances.

With respect to the criticism that the SAT disadvantages students from lower socioeconomic backgrounds, a recent study by Zwick and Green (2007) confirmed previous results showing that SAT scores are related to SES. This study, which also compared the relationship of SAT scores and school grades with SES, showed that SES influences the SAT as well as high school grades. It is argued that SES inevitably has an impact on students’ learning, whether specifically related to school or to learning in general, so SES influences both aptitude and achievement. It is this common effect of SES on both kinds of tests that explains why both tend to be criticized for disadvantaging students from lower SES backgrounds.

Another criticism of aptitude tests relates to their vulnerability to coaching. Claims that coaching can increase scores, in particular SAT scores, are mostly made by coaching companies. However, claims that coaching increases scores substantially are not always true (Powers & Camara, 1999; Briggs, 2009). Many claims are based on weak evidence: they rely on data from students who attended coaching programs only, and some of the score increases of those students could be due to chance or to practice effects. For example, an unpublished study conducted by Franker in 1986-1987 (cited in Powers & Camara, 1999) reported that the average increase in total SAT score was the same, 80 points, for students who attended a coaching program and for those who did not.

To examine the effect of coaching, comparisons between coached (experimental group) and uncoached (control group) students need to be made. However, it is difficult to control factors so as to ensure that any difference between the groups can be attributed to coaching alone; because coached and uncoached groups often differ in other respects, the real effect of coaching is still not known. For example, in the Powers and Rock (1999) study, even though students in the uncoached group did not attend the coaching program, they prepared for the test in other ways. Nevertheless, the coaching effect seems to be fairly consistently estimated across studies. Powers and Rock found that the increases in mean Verbal SAT scores for the coached and uncoached groups were 29 and 21 points respectively, while the increases in Math SAT scores were 40 and 22 points respectively. In their review of several studies, Powers and Camara (1999) found similar figures: the mean score increase for the Verbal SAT was between 9 and 15 points, and that for the Math SAT between 15 and 18 points. Briggs (2009) found a comparable average effect: an 8 point increase for the Verbal SAT and a 15 point increase for the Math SAT.

An effect of this size is very small, and it is difficult to determine whether such a score increase is due to coaching or to measurement error.
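One way to see this is by comparing the gains with the standard error of measurement (SEM), computed as $\mathrm{SEM} = SD\sqrt{1 - r_{XX}}$, where $r_{XX}$ is the test’s reliability. The values below are illustrative assumptions about the SAT score scale, not figures from the coaching studies:

$$\mathrm{SEM} \approx 100\sqrt{1 - 0.90} \approx 32 \text{ points,}$$

so coaching gains of 8 to 18 points lie well within one standard error of measurement for a single section.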

In terms of its effect on test validity, coaching is not always bad. In some cases coaching can improve the validity of inferences made from test scores (Anastasi, 1981). Two kinds of coaching can achieve this. The first is coaching intended to minimize differences in test familiarity among test takers. The second is coaching intended to improve broad cognitive abilities; if such coaching succeeds, it will improve both the test score and criterion performance, so that while an individual’s ability is improved, the validity of inferences from the test scores is not reduced. The type of coaching that can reduce the validity of inferences from test scores is coaching that trains test takers on items similar to those in the test. If this coaching leads to a higher test score but no improvement in criterion behaviour, then the validity of the inferences from the test is reduced.

2.1.5 The Current Usage of Aptitude Tests in Selection Processes

In the US, with its decentralized system of education, including a decentralized curriculum, selection has for decades employed admission tests at both undergraduate and postgraduate levels, including for professional programs. These tests provide one common criterion for all applicants. The admission tests for undergraduate entry are the SAT and the ACT, while for postgraduate entry the GRE is used for general programs, the GMAT for business schools, the MCAT for medical schools, and the LSAT for law schools (Linn, 1990).

Recently, however, the emphasis on admission tests, especially aptitude tests such as SAT Reasoning, has decreased. A relatively large number of colleges have even adopted a test-optional policy, which allows applicants to choose whether or not to submit admission test scores (Syverson, 2007). Many universities tend to take a holistic approach, using various instruments to assess candidates: besides SAT scores, which may be optional, they also use portfolios, essays, interviews, grades, and class ranking (Syverson, 2007; West & Gibbs, 2004).

While the usage of aptitude tests, particularly the SAT, is decreasing in the US, in other countries, such as Russia and the UK, aptitude tests are being considered as selection tools. In Russia, a standardized aptitude test similar to the SAT was intended to be used as a selection instrument across the country by 2009, replacing both the high school final examination and university admission tests (MacWilliams, 2007). This new selection method is expected to lead to fairer and less corrupt university admission procedures.

In the UK, performance in the General Certificate of Education (GCE) at Advanced Level (A-level), which is an examination of achievement, has been the main criterion for selection into university. For many years, there has been debate as to whether it is worthwhile to also use aptitude tests similar to the SAT as selection instruments; again, the argument is that this would give greater opportunities to students from lower socioeconomic backgrounds (West & Gibbs, 2004). Some universities in the UK have used aptitude/reasoning tests, in addition to other selection tools, to obtain more information on applicants’ academic ability. The University of Cambridge and the University of Oxford, for example, have used the Thinking Skills Assessment (TSA) to select undergraduate students for some courses (Cambridge, 2008; Oxford, 2008).

Similar to the UK, Australia also uses high school achievement as the criterion for school leavers to enter university. The resulting rank is called the Tertiary Entrance Rank (TER), the Equivalent National Tertiary Entrance Rank (ENTER) in Victoria, and the Universities Admission Index (UAI) in New South Wales and the Australian Capital Territory (TISC, 2007). The achievement score on each subject is generally a combination of school-based assessment and an external examination. However, for those who do not have a recent TER, for example mature age applicants, performance on a standardized aptitude test is required to enter some universities. This test, the Special Tertiary Admission Test (STAT), is intended to measure critical thinking in verbal and quantitative areas; STAT is also used as a selection criterion for specialist courses (ACER, 2007a). Another aptitude test, uniTEST, has been adopted by some universities in Australia (ACER, 2007b).

As indicated earlier, in Indonesia the achievement-based admission test for entry to public universities has been complemented by a scholastic aptitude test since 2009.

It appears that, despite the controversy, aptitude tests will continue to be used in practice, either on their own or to complement achievement tests by providing information the latter do not give, which may yield a better prediction of performance.

There are at least four ways in which achievement and aptitude tests can be used together in selection. The first is simply to take a total score; in this way, scores on the two assessments compensate for each other. A second way is to require high scores on both, which would restrict entry more than if only one test were used. A third way is to require a high score on only one of the tests, with perhaps a minimum score on the other; this approach broadens entry relative to requiring a high score on a particular test, and in particular it gives students from educationally disadvantaged backgrounds a better chance of being selected. It might also operate differently in different areas of university study. A fourth way is to form a prediction equation with a criterion and use multiple regression to derive empirical weights, as sketched below.
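The fourth approach amounts to estimating a least-squares prediction equation of the form below; the variable names are illustrative, not drawn from a particular study:

$$\hat{Y} = b_{0} + b_{1}X_{\text{apt}} + b_{2}X_{\text{ach}}$$

where $\hat{Y}$ is predicted university performance (for example, first year GPA), $X_{\text{apt}}$ and $X_{\text{ach}}$ are an applicant’s aptitude and achievement scores, and the weights $b_{0}$, $b_{1}$, and $b_{2}$ are estimated by multiple regression from the records of previous students. Applicants can then be ranked on $\hat{Y}$.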