
3.2 Forms That are Based on Student Achievement Measurement

3.2.1 National assessment programs

General description

Assessment programs consist of educational achievement tests that are meant to monitor acceptable levels of performance in the basic school subjects in a country. Likely age levels at which the tests are taken are 11/12 (end of primary school) and sometimes also 14/15 (end of lower secondary school). Assessment tests in a particular subject need not be administered each year; for example, when there are 6 subjects in the assessment program, each subject may be tested every 6 years. Application of multiple matrix sampling, however, makes more frequent testing (a shorter time interval) for each subject-matter area feasible. Typically, national assessment programs will target samples of students. If conclusions about schools as a particular organizational level are also aimed for, the sampling design needs to accommodate this by ensuring a sufficient number of students per school.

Main audiences and type of use of the information

The main audiences are decision-makers at the central level, i.e. the Ministry of Education and parliament. Organizations representing stakeholders in education, such as school governors, teachers, parent associations, and employers, are also relevant.

The information from assessment programs can lead to adaptations in the curriculum, in the sense of goals (standards) or means (curriculum contents), and in all conditions that have an impact on performance in a particular subject (e.g. teacher training in the particular subject-matter area, the textbook industry, use of computers).

Technical issues

Norm-referenced versus criterion-referenced testing; procedures for standard-setting; psychometric properties of the tests, in particular their content validity (do the tests adequately represent the universe of subject-matter elements in the specific curriculum domain?).

Sampling issues: not all students need to take all tests, given the application of multiple matrix sampling (a technique where students take subsets of items of comparable content and difficulty level), as sketched below.
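
To make the idea concrete, here is a minimal sketch in Python of a matrix-sampling design, assuming an invented pool of 60 items and 300 sampled students; the anchor-block size and number of booklets are illustrative choices, not prescriptions from any actual program.

```python
import random

# Hypothetical item pool of 60 items for one subject; in a real program
# items would be balanced by content area and difficulty level.
item_pool = [f"item_{i:02d}" for i in range(60)]

def make_booklets(items, n_booklets=3, n_anchor=12):
    """Split the pool into booklets that share a common anchor block.

    The anchor items, taken by every student, keep the booklets
    linkable on one scale; the remaining items are divided evenly,
    so each appears in exactly one booklet.
    """
    anchor, rest = items[:n_anchor], items[n_anchor:]
    per_booklet = len(rest) // n_booklets
    return [anchor + rest[b * per_booklet:(b + 1) * per_booklet]
            for b in range(n_booklets)]

booklets = make_booklets(item_pool)  # three booklets of 28 items each

# Each sampled student takes only one randomly assigned booklet, so the
# whole pool is covered by the sample while no student sits all 60 items.
students = [f"student_{s:03d}" for s in range(300)]
assignment = {s: random.randrange(len(booklets)) for s in students}
```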

Technical and organizational capacity required

Skill areas that should be covered are: subject-matter expertise, skills in curriculum analysis, and skills in writing test items; expertise in psychometrics, methods of standard-setting, and sampling; and communicative and PR skills in disseminating information to decision-makers and the education field.

Concerning the organizational infrastructure, the degree to which specialists in subject matter and subject-related didactics are organized in special interest groups is relevant for mobilization of this expertise. The same applies to curriculum-development institutions.

Depending on the size of the country, a specialized institute like ETS in the USA or CITO in the Netherlands could be considered; at the least, a specialized unit as part of the “techno-structure” of a Ministry of Education would be required. In the case of a smaller assessment unit, organizational links with curriculum and subject-matter specialist units are very important. Technical support concerning the logistics of distribution and retrieval of test material from schools, data analysis, and reporting should also have a place in either a specialized institute or a network with sufficient cohesion. Boards of officials and experts should be created to authorize newly developed tests.

Controversial points

Controversy about national assessment programs can arise with respect to the scope of what is being measured. The often-heard argument is that important goals of education cannot be measured. The issue, referred to above, of curriculum-tied as compared to “cross-curricular” competencies can also be controversial. In developing countries, expectations of low performance as compared to industrialized countries might be a difficult point.

3.2.2 International assessment programs

General description

Over recent years there has been an increased interest from governments and international organizations in international assessments. Examples are:

• the Third International Mathematics and Science Study-Repeat of the IEA (TIMSS-R);

• the Civic Education Study (CivED) of the IEA;

• the OECD Program for International Student Assessment (PISA);

• the IEA Progress in International Reading Literacy Study (PIRLS); and

• the Adult Literacy and Lifeskills (ALL) Study (formerly ILSS).

There are two major advantages to taking part in these international assessment studies.

The first is practical: if a country does not already have a national assessment program, important development costs can be avoided by making use of the internationally available instruments. This can be the case even if instruments are modified or extended according to specific national circumstances. The second potential advantage is the opportunity to compare national performance levels to international standards. This application of comparative “benchmarking” could be seen as an important feature of the globalization of educational provisions. Of course, this possible advantage of international standardization can also be seen as an undesired uniformity. Perhaps a compromise could be found in defining a set of core competencies that would be expected to meet international performance standards, next to a set of more country-specific or region-specific standards.

Main audiences and types of use of the information

These are more or less the same as in national assessment programs.

Technical issues

Making tests internationally comparable is the biggest challenge for international assessment programs. The range of difficulty levels on the scales should be sufficiently broad to cover potentially large differences in achievement levels between countries.

IRT modeling is important for this, since it places students and items from different test versions on a common scale. Remaining comparability problems can be tackled by means of national options and “add-ons”, and by measuring test-curriculum overlap or “opportunity to learn”.
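
As a minimal illustration of what such IRT modeling provides, the sketch below computes response probabilities under the one-parameter (Rasch) model; the ability and difficulty values are invented for illustration and are not drawn from any particular assessment program.

```python
import math

def rasch_probability(theta, b):
    """Probability of a correct answer under the Rasch (1PL) model:
    P(correct) = 1 / (1 + exp(-(theta - b))),
    where theta is student ability and b is item difficulty,
    both expressed on the same logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Because ability and difficulty share one scale, items spanning a wide
# difficulty range (here -2 to +2 logits) can discriminate between
# low- and high-performing populations alike.
for b in (-2.0, 0.0, 2.0):
    p = rasch_probability(0.0, b)
    print(f"difficulty {b:+.1f}: P(correct | theta=0) = {p:.2f}")
```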

Technical and organizational capacity required

Much of the technical capacity for international assessment programs will be located with the international study-coordinating organization, which may be a consortium of top-level institutes at the global level.

Usually a small team with the required research-technical skills and logistic facilities is sufficient to carry out the work at the national level.

Controversial points

The main controversy has already been referred to above: can specific national priorities be sufficiently represented in international test programs?

3.2.3 School performance reporting

General description

School performance reporting (SPR) is a prototype of accountability-oriented assessment. It uses statistical information and/or achievement tests to generate output indicators per school. These are then made public, for example in the form of “league tables” (rankings of schools) published in newspapers.

The achievement test data used for SPR could have various sources:

• national assessment tests;

• tests used in student monitoring programs (an M&E type that will be described further on);

• examinations.

Examples of statistical performance indicators are the success rate (e.g. completing the period of schooling without delay), average absenteeism, and drop-out and class-repetition rates.
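
As a minimal sketch of how such indicators can be computed, assuming a hypothetical record layout (the field names and records are invented, not taken from any actual administrative system):

```python
# Hypothetical per-student records; real data would come from a school
# administration system with many more fields.
records = [
    {"id": 1, "completed_on_time": True,  "dropped_out": False, "repeated_class": False},
    {"id": 2, "completed_on_time": False, "dropped_out": False, "repeated_class": True},
    {"id": 3, "completed_on_time": False, "dropped_out": True,  "repeated_class": False},
]

def rate(records, field):
    """Share of students for whom the given indicator field is True."""
    return sum(r[field] for r in records) / len(records)

success_rate = rate(records, "completed_on_time")  # finished without delay
dropout_rate = rate(records, "dropped_out")
repetition_rate = rate(records, "repeated_class")
```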

An important issue is whether or not output indicators should be adjusted for previous achievement or other relevant student background characteristics (the issue of “value-added” output indicators). Another question is whether or not school process or input indicators should be included in the school reports.
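
A minimal sketch of one common way to compute a value-added indicator, assuming a simple pooled regression of outcomes on a single prior-achievement score; real applications typically use multilevel models with more background characteristics, and all numbers here are invented.

```python
from statistics import mean

def fit_expectation(prior, outcome):
    """Fit a pooled OLS regression of outcome on prior achievement
    across all students in all schools; returns (slope, intercept)."""
    x_bar, y_bar = mean(prior), mean(outcome)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(prior, outcome))
             / sum((x - x_bar) ** 2 for x in prior))
    return slope, y_bar - slope * x_bar

def school_value_added(school_students, slope, intercept):
    """Mean residual for one school: positive means its students score
    higher than expected given their intake level, zero means exactly
    as expected."""
    return mean(y - (intercept + slope * x) for x, y in school_students)

# Hypothetical (prior, outcome) score pairs for two schools.
school_a = [(40, 55), (50, 62), (60, 70)]
school_b = [(40, 48), (50, 57), (60, 66)]
slope, intercept = fit_expectation(
    [x for x, _ in school_a + school_b],
    [y for _, y in school_a + school_b])
print(school_value_added(school_a, slope, intercept))  # above expectation
print(school_value_added(school_b, slope, intercept))  # below expectation
```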

Main audiences and types of use of the information

The results of SPR are meant to be used by administrative levels above the school, like municipalities, regional and central government, and/or by the consumers of education. In countries with freedom of school choice, parents could make use of this information to select a school for their children.

Decisions about school funding could be made dependent on the results of SPR. Next, different “markets” might use the information in deciding whether or not to select a particular school: parents choosing schools, teachers choosing a school, and schools actively marketing themselves to these audiences.

As a “side-effect”, schools might also use the information from SPR to diagnose their own performance and to target improvement-oriented measures. In fact, empirical results indicate that this latter use may be even more important than the accountability-oriented uses (cf. Bosker & Scheerens, 1999).

Technical issues

Computing value-added performance indicators is a technical problem, both in terms of statistical analysis and in terms of communication. Although the value-added option may be considered the fairer way to judge schools, its meaning may be difficult to communicate to broad audiences. Besides, “raw” outcome scores are also informative.

Technical and organizational capacity required

This is highly dependent on the provisions for the basic assessment, monitoring, and evaluation types that SPR is likely to depend on. If these are in place, a relatively small research team containing a unit of data analysts would be sufficient.

Controversial points

SPR is quite controversial, as it can be seen as stimulating selection mechanisms that are not easily reconcilable with the ideal of equity in education. When the stakes for schools are made high, undesired strategic behavior to artificially create higher scores is likely to occur.

3.2.4 Student monitoring systems

General description

Student monitoring systems operate at the micro level (class level) of educational systems.

Basically, student monitoring systems are sets of educational achievement tests that are used for purposes of formative didactic evaluation. An important function is to identify pupils who fall behind, and to indicate in which subject-matter areas or skills they experience difficulties.

Items should preferably be scaled according to a particular IRT model. Student monitoring systems should be longitudinal and allow for “following” students throughout a particular educational program. For example, in the Dutch Leerlingvolgsysteem for primary schools, two tests per grade level are administered. The scope of a student monitoring system depends on the number of school subjects that are covered.

Main audiences and types of use of the information

Student monitoring systems are used in the interaction between teachers and students.

Apart from the achievement tests, remedial material should be seen as a major component of a pupil monitoring system. One type of remedial follow-up material consists of guidelines for further diagnosis of deficiencies; exercises to remedy deficiencies form another. Such exercises take the form of performance tasks to stimulate learning.

Technical issues

Test construction is an important technical issue. Because of the intended longitudinal use of the instruments, “vertical equating” is an essential asset. This requires scales that conform to the assumptions of IRT models. An important precondition for the curriculum validity of the tests is that there is at least consensus about the educational objectives at the end of the program. If prescribed curricula do not exist, they need to be “reconstructed” in the form of a sequence of subject-matter areas for each subject, which in turn form the basis for the development of test items and remedial tasks.
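
As an illustration of what vertical equating involves, the sketch below applies the mean/mean linking method to hypothetical Rasch difficulty estimates for anchor items shared by two adjacent grade-level tests; operational programs use more sophisticated linking procedures, and all values here are invented.

```python
from statistics import mean

def mean_mean_link(anchor_lower, anchor_upper):
    """Linking constant under the mean/mean method: the shift that maps
    estimates from the upper-grade calibration onto the lower-grade
    scale, using difficulty estimates of items common to both forms."""
    return mean(anchor_lower) - mean(anchor_upper)

# Hypothetical Rasch difficulties (logits) for the same anchor items,
# estimated separately in the grade-3 and grade-4 calibrations.
anchor_in_grade3 = [0.4, 0.9, 1.3]
anchor_in_grade4 = [-0.6, -0.1, 0.3]
shift = mean_mean_link(anchor_in_grade3, anchor_in_grade4)  # +1.0 logit

# Any grade-4 ability estimate can now be expressed on the grade-3
# scale, so growth across grades can be read off one vertical scale.
theta_grade4 = 0.5
theta_on_common_scale = theta_grade4 + shift
```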

Technical and organizational capacity required

Technical and organizational capacity requirements are basically similar to those for national assessment programs.

Controversial points

The same kinds of controversies might arise as in national and school assessment programs, primarily the criticism that important educational goals would escape measurement. In settings where schools are given complete autonomy in establishing the curriculum, these and other student assessment instruments could be seen as letting in centralization through the back door. If one accepts fixed educational objectives and national item banks, one should probably also be ready to accept the equalizing tendency that the assessment tools would inevitably have. “Teaching to the test” would only deserve its negative connotation if the test were fallible and the item bank too small. The flexibility and quality of tests developed according to current state-of-the-art methodology should be able to prevent this.

3.2.5 Assessment-based school self-evaluation

General description

This type of M&E is best perceived as a spin-off of other assessment types. The core idea is that schools use the information from externally initiated assessments or from internal student monitoring systems to evaluate their own performance. There are nevertheless also examples of projects where school self-evaluation appears to have been the primary motive for the development and administration of achievement tests.

Main audiences and types of use of the information

School managers and the school staff are the main categories of users. Parents could also be a target group for dissemination of the information.

Following the achievement of cohorts of students in the main subjects would allow schools to monitor their own standards and detect problems in a particular time × grade × teacher × subject combination (see the sketch below). Follow-up actions might involve adapting the school curriculum, choice of textbooks, initiatives for counseling and consultation of teachers, and decisions about matching teachers and groups of students.
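
A minimal sketch of such cohort monitoring, assuming hypothetical score records and an invented school standard of 55; in practice the scores would come from the student monitoring instruments described above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records a school might extract from its student
# monitoring system; field names are invented for illustration.
scores = [
    {"year": 2023, "grade": 5, "teacher": "T1", "subject": "math", "score": 62},
    {"year": 2023, "grade": 5, "teacher": "T2", "subject": "math", "score": 48},
    {"year": 2023, "grade": 5, "teacher": "T2", "subject": "math", "score": 51},
]

groups = defaultdict(list)
for r in scores:
    groups[(r["year"], r["grade"], r["teacher"], r["subject"])].append(r["score"])

# Flag any year x grade x teacher x subject cell whose mean score
# falls below the (hypothetical) school standard of 55.
for cell, cell_scores in groups.items():
    if mean(cell_scores) < 55:
        print("below standard:", cell, round(mean(cell_scores), 1))
```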

Technical issues

Psychometric quality of the achievement tests is relevant to the possibilities for their use.

The issue of criterion- versus norm-referenced testing is relevant in this context as well.

Additional technical problems arise when it is the ambition to relate information on process indicators to the assessment results and indices of learning progress (Scheerens & Bosker, 1995). These technical issues concern additional data collection, developing appropriate data records, and problems of data analysis.

Required technical and organizational capacity

Until fully computerized forms become available, schools would require the assistance of assessment specialists and data analysts to compute statistics, make comparisons over time, and (possibly) link the information to other data sources.

At school level, specific organizational preconditions need to be fulfilled, in the sense of established discussion platforms and clear rules about the way the information will be used. Confidentiality is an important issue.

Controversial issues

Controversial issues are similar to other types of achievement-test based assessments.

The implied “multi-purpose use” of instruments is not unproblematic. For example, from the perspective of school performance reporting the test results are to be made public, while, for school self-evaluation purposes, schools might prefer to keep part of the information confidential.

3.2.6 Examinations

General description

Examinations are sets of learning tasks or test items together with specific procedures to administer them (e.g. written and oral exams, portfolios showing samples of accomplishments). These are used to determine whether a candidate has the required level of achievement to be formally certified as having successfully completed a program of formal schooling or training.

Main audiences and types of use of the information

Examinations belong to the institutional arrangements of a country and regulate selection for follow-up education and entrance to positions on the labor market.

Technical issues

A major technical question is whether examinations can fully depend on objective and standardized achievement tests, or need other review procedures and demonstrations of skills as well. By allowing for the objective scoring of open test items, tests of more general cognitive skills, and “authentic testing”, achievement test methodology appears to be “moving up” in taking care of these more complex aspects. Therefore, tests will probably play an increasingly important role in examinations. Organizational forms, such as allowing for a school-based part and a central part of a final examination, could combine more holistic and informal review by school teams with objective testing (the central part of the exam).

Technical and organizational capacity required

Assuming that examinations will, at least partially, be based on standardized tests, the required technical capacity in a country matches that for other applications of educational achievement tests.

In addition, examinations require committees that take the formal responsibility for each annual version of the examination. Sometimes the educational inspectorate has a function in this as well.

Of course, there should also be technical and logistic facilities to score test forms, possibly combine test results with the results of other parts of the examination, etc. Again, state-of-the-art ICT applications, such as optically readable test forms, are relevant.

Controversial points

Perhaps the issue of norm-referenced versus criterion-referenced testing applies to examinations more than to other student assessment forms. Traditionally, examinations have been norm-referenced, the main drawback being that norms would differ across years and cohorts.
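
The contrast can be made concrete with a small sketch: the same score is interpreted either relative to the current cohort (norm-referenced) or relative to a fixed standard (criterion-referenced). The cut score of 60 and the cohort scores below are invented for illustration.

```python
from bisect import bisect_left

def norm_referenced(score, cohort_scores):
    """Percentile rank within this year's cohort: the meaning of a
    score shifts if next year's cohort is stronger or weaker."""
    ranked = sorted(cohort_scores)
    return 100 * bisect_left(ranked, score) / len(ranked)

def criterion_referenced(score, cut_score=60):
    """Pass/fail against a fixed standard: stable across years and
    cohorts (the cut score of 60 is purely illustrative)."""
    return score >= cut_score

cohort = [45, 52, 58, 61, 64, 70, 75, 80]
print(norm_referenced(61, cohort))  # 37.5th percentile in this cohort
print(criterion_referenced(61))     # True: meets the fixed standard
```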

3.3 Forms That are Based on Education Statistics and Administrative