
Educational Evaluation, Assessment, and Monitoring


The book concentrates on the application of educational evaluation, assessment and monitoring activities embedded in organizational, management and teaching processes. Its structure is built around a three-dimensional model on the basis of which different types of educational "M&E", as it is sometimes abbreviated, are distinguished.

Basic Concepts

Monitoring and Evaluation (M&E) in Education: Concepts, Functions and Context

  • Introduction
  • Why do we Need Monitoring and Evaluation in Education?
  • A Conceptual Framework to Distinguish Technical Options in Educational M&E
  • Pre-Conditions in Educational M&E
  • Conclusion: Why Speak of “Systemic Educational Evaluation”?

By crossing these three dimensions (see Table 1.1), the main forms of educational M&E can be characterized. What the example illustrates is that when it comes to taking concrete steps to establish or improve educational M&E, one cannot take "the political will" to do so for granted.

Basics of Educational Evaluation

Introduction

Basics of Evaluation Methodology

  • Evaluation objects, criteria and standards
  • Measurement of criteria and antecedent conditions
  • Measuring outcomes
  • Controlling for background variables (value added)
  • Design: answering the attribution question

Indicators of the context, for example of the school, can be judged according to whether they are favorable or unfavorable for the proper functioning of the school. Internal validity is threatened by selection bias (uncontrolled initial differences between treatment groups that are confounded with the treatment conditions), while external validity is threatened by artificial aspects of the experimental situation.
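To make the value-added idea concrete, the following is a minimal sketch (in Python, with synthetic data and hypothetical variable names) of controlling for background variables: a raw between-school gap is contrasted with the school effect estimated after prior achievement and socio-economic status are partialled out.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Synthetic data: school 1 recruits a stronger intake (selection bias).
school = rng.integers(0, 2, n)                     # 0/1 school membership
prior = rng.normal(50 + 5 * school, 10)            # prior achievement
ses = rng.normal(0.5 * school, 1)                  # socio-economic status
true_effect = 3.0                                  # "value added" by school 1
outcome = 0.8 * prior + 2.0 * ses + true_effect * school + rng.normal(0, 5, n)

# Raw comparison: confounds the school effect with intake differences.
raw_gap = outcome[school == 1].mean() - outcome[school == 0].mean()

# Value-added comparison: control for background variables with OLS.
X = np.column_stack([np.ones(n), prior, ses, school])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)

print(f"raw gap:      {raw_gap:.2f}")              # inflated by selection
print(f"adjusted gap: {beta[3]:.2f}")              # close to the true 3.0
```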

Figure 2.1 A basic systems model.

Important Distinctions in Evaluation Theory

  • Ideal-type stages in evaluation
  • Formative and summative roles
  • Accountability and improvement perspectives reconsidered

In this situation, evaluability assessment is described as an analytical activity that focuses on the structure and feasibility of the program to be evaluated. The key variables on which to collect data should be selected based on the determination of evaluation criteria and standards (ends) and the structure of the program (means)—see ad a) and ad b) above.

Introduction

Forms That are Based on Student Achievement Measurement

  • National assessment programs
  • International assessment programs
  • School performance reporting
  • Student monitoring systems
  • Assessment-based school self-evaluation
  • Examinations

The main target groups and types of use of the information are more or less the same as in national assessment programmes. By allowing objective scoring of open-ended test items, testing of more general cognitive skills, and "authentic testing," the achievement testing method appears to be "moving up" in addressing these more complex aspects.

Forms That are Based on Education Statistics and Administrative Data

  • System-level management information systems
  • School management information systems

An MIS requires an Office of Education Statistics with a specialized unit for developing indicators in areas where traditional statistics do not fully cover all categories of the theoretical model. The information could be used for all kinds of corrective actions in school management.

Forms That are Based on Systematic Review, Observations and (Self)Perceptions

  • International review panels
  • School inspection/supervision
  • School self-evaluations, including teacher appraisal
  • School audits
  • Monitoring and evaluation as part of teaching

Finally, some of these forms of school self-evaluation can be combined and integrated with each other. School management and staff teams are the primary audience for school self-evaluation results.

Program Evaluation and Teacher Evaluation

  • Program evaluation
  • Teacher evaluation

Results of program evaluations can lead to political disputes when the results are critical, the stakes in the program are high, and the credibility of the applied research methodology is less than optimal. This type of "input" control has long been one of the most important measures for quality assurance in education, especially when combined with another type of input control, namely centrally standardized curricula.

Theoretical Foundations of Systemic M&E

The Political and Organizational Context of Educational Evaluation

Introduction

Rationality Assumptions Concerning the Policy-Context of Evaluations

According to the third characteristic of the rationality model, planned programs are 'actually' implemented. In many cases, a clear exploration of the goals of the evaluation will help overcome resistance.

Gearing Evaluation Approach to Contextual Conditions: the Case of Educational Reform Programs

  • Phase models
  • Articulation of the decision-making context
  • Monitoring and evaluation in functionally decentralized education systems

The next question in the sequence is whether the intended direct results of the project have been achieved. The distinction between areas of decision-making in educational systems has some similarities with Bray's use of the term "functional decentralization," as cited by Rondinelli. Examining the location of decision-making in relation to domains and subdomains is one of the most interesting possibilities.

Figure 4.1 provides a schematic model of the progression of events.

Creating Pre-Conditions for M&E

  • Political will and resistance
  • Institutional capability for M&E
  • Organizational and technical capacity for M&E

But the rules of the game can also be less formal and depend on convention and implicit norms. Institutional capacity for M&E is most realistically addressed as an assessment activity to gain an idea of the general climate in which M&E activities will "land" in a country. Where gaps are found, several options should be considered, e.g. narrowing the M&E objectives or changing and improving current practices.

Conclusion: Matching Evaluation Approach to Characteristics of the Reform Program, Creating Pre-Conditions and Choosing an

The above conjectures all express a contingent approach: the appropriateness or efficiency in the choice of monitoring and evaluation strategy depends on the characteristics of the reform context. The fields of educational evaluation in the sense of measuring student performance on the one hand and educational evaluation in the sense of program evaluation on the other hand have developed as two relatively separate fields. In all reform programmes, where some kind of curriculum revision is at stake, it would also be possible to use assessments of student achievement in the particular curricular area as effect criteria.

Evaluation as a Tool for Planning and Management at School Level

  • Introduction
  • The Rationality Paradigm Reconsidered
    • Synoptic planning and bureaucratic structuring
    • Creating market mechanisms: alignment of individual and organizational rationality
    • The cybernetic principle: retroactive planning and the learning organization
    • The importance of the cybernetic principle
    • Retroactive planning
  • The Organizational Structural Dimension
    • Organizational learning in “learning organizations”
    • Management in the school as a “professional bureaucracy”
    • Educational leadership as a characteristic of “effective schools”
    • Schools as learning organizations?
  • Conclusion: The Centrality of External and Internal School Self-Evaluation in Learning and Adapting School Organizations

In the remaining sections, the focus will shift from procedural variations of the rationality model to organizational structures. Operational management is firmly in the hands of the professionals (teachers) in the operational core (classroom) of the organization. Second, "pedagogical management" is not entirely at odds with certain demands of the professional bureaucracy.

Figure 5.1 The complete cycle of choice, cited from March & Olsen (1976).

Assessment of Student Achievement

Basic Elements of Educational Measurement

Introduction

This includes the objective of the test (e.g., curriculum-based skills, cognitive or psychomotor abilities) and the type of decisions to be made (e.g., mastery decisions, pass/fail decisions, selection, prediction). Depending on the purpose of the test, the content area of the test and the level at which the content should be tested can be determined. A further step is test scoring and the analysis of tests and items, including conversion of scores into grades and evaluation of the quality of the test as a measurement instrument.

Test Purposes

Another step is the construction of test materials, such as multiple-choice items and open-ended items, or the construction of performance assessments. This chapter will conclude with some of these topics: assessment systems, item banking, optimal test construction, and computerized adaptive testing. Implications for future criterion setting concern the knowledge, skills and affective goals that should be retained in the period after teaching has ended.

Quality Criteria for Assessments

A judgment of the relevance and representativeness of the content is based on a specification of the boundaries and structure of the domain to be tested. Sources of measurement error include properties of the test (for example, its length) and of the evaluation procedure (for example, rater effects). The purpose of a reliability analysis is to quantify the consistency and inconsistency of student performance on the test.

Test Specifications

  • Specification of test content
  • Specification of cognitive behavior level

The choice of test content involves a trade-off between the breadth of content coverage and the reliability of subscores. The main objective of the specification table is to ensure that the test is a valid reflection of the test domain and purpose. Table cell entries give the relative importance of a specific combination of content and cognitive level of behavior on the test.
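As an illustration, a specification table can be represented as a small weight matrix and converted into item counts; the content areas, levels and weights below are hypothetical.

```python
import numpy as np

# Hypothetical specification table: rows = content areas,
# columns = cognitive behavior levels; entries = relative importance.
content = ["fractions", "geometry", "statistics"]
levels = ["knowledge", "application", "analysis"]
weights = np.array([[0.15, 0.10, 0.05],
                    [0.10, 0.20, 0.05],
                    [0.10, 0.15, 0.10]])    # sums to 1.0

test_length = 40
items = np.rint(weights * test_length).astype(int)   # items per cell

for i, area in enumerate(content):
    for j, level in enumerate(levels):
        print(f"{area:10s} x {level:12s}: {items[i, j]:2d} items")
print("total items:", items.sum())
```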

Test Formats

  • Selected response formats
  • Constructed response formats
  • Performance assessments
  • Choosing a format

One of the main mistakes made in this format is that the wording of the statement closely matches the wording used in the instructional materials. The relationship between the number of response options and the probability of guessing correctly will also be discussed in the next chapter. Fill-in items resemble multiple-choice items in that they can (in principle) be scored objectively and can provide good content coverage because of the number that can be administered in a given time.
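The arithmetic behind the guessing problem is simple: blind guessing on a k-option item succeeds with probability 1/k. The classical correction for guessing (the "formula score") subtracts the expected number of lucky guesses; the sketch below is a direct transcription of that standard formula.

```python
# Blind guessing on a k-option item succeeds with probability 1/k,
# so the classical formula score subtracts expected lucky guesses:
# S = R - W / (k - 1), with R right and W wrong answers.
def formula_score(right: int, wrong: int, options: int) -> float:
    return right - wrong / (options - 1)

# A student answering all 40 items of a 4-option test at random expects
# 10 right and 30 wrong, and hence a formula score of zero:
print(formula_score(10, 30, 4))  # 0.0
```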

Table 6.3 Indication of Response Time per Item Type.

Test and Item Analysis

The expected value of the observed test score equals the true score, i.e. E(X) = T in classical test theory. This means that unreliability of the scores attenuates the correlation between observed scores. The dependence of reliability on the variance of the true scores can be abused: the same test looks more reliable in a sample with larger true-score variance.
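A small numerical illustration of these classical notions (synthetic data): Cronbach's alpha as a lower bound to reliability, and Spearman's correction for attenuation, r_true = r_obs / sqrt(r_xx * r_yy).

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Alpha for a persons x items matrix of item scores."""
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(1)
theta = rng.normal(0, 1, 500)                          # true scores
items = theta[:, None] + rng.normal(0, 1.5, (500, 8))  # 8 noisy items
print(f"alpha: {cronbach_alpha(items):.2f}")

# Unreliability attenuates observed correlations; Spearman's correction
# divides by the square root of the product of the two reliabilities.
r_obs, r_xx, r_yy = 0.42, 0.70, 0.80
print(f"disattenuated r: {r_obs / (r_xx * r_yy) ** 0.5:.2f}")  # ~0.56
```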

Table 6.5 Example of a Test and Item Analysis (Number of observations: 2290).

Assessment Systems

  • Item banking
  • Item construction
  • Item bank calibration
  • Optimal test assembly
  • Computer based testing
  • Adaptive testing

What the test constructor has done is change the definition of the population of interest. An overview of the use of computerized testing in psychological assessment can be found in Butcher (1987). Having an appropriate IRT model can support construct validity, but it does not imply test reliability.
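The logic of adaptive testing can be shown in a few lines: at each step the item with maximum Fisher information at the current ability estimate is administered. The sketch below uses a hypothetical Rasch item bank and a deliberately crude step-halving ability update, not any particular operational algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def p_rasch(theta, b):
    return 1.0 / (1.0 + np.exp(-(theta - b)))

bank = np.linspace(-2.0, 2.0, 21)   # hypothetical Rasch difficulties
true_theta = 0.7                    # simulated examinee
theta_hat, step, used = 0.0, 1.0, set()

for _ in range(8):
    p = p_rasch(theta_hat, bank)
    info = p * (1 - p)              # Rasch item information
    info[list(used)] = -np.inf      # administer each item once
    k = int(np.argmax(info))        # maximum-information selection
    used.add(k)
    correct = rng.random() < p_rasch(true_theta, bank[k])
    theta_hat += step if correct else -step   # crude up/down update
    step *= 0.6                               # shrink the step each round
print(f"ability estimate after 8 items: {theta_hat:.2f}")
```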

Figure 6.1 Overview of an assessment system.

Measurement Models in Assessment and Evaluation

Introduction

Unidimensional Models for Dichotomous Items

  • Parameter separation
  • The Rasch model
  • Two- and three-parameter models
  • Estimation procedures
  • Local and global reliability
  • Model fit

The item parameters are estimated simultaneously with the mean and standard deviation of the ability distribution. In the section on model fit, a test for the appropriateness of the ability distribution will be described. Note that the model violation did not result in a significantly biased estimate of the item parameters.
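For concreteness, a minimal sketch of the Rasch model itself: the success probability is exp(θ − b) / (1 + exp(θ − b)), and with item parameters treated as known, the ability of one examinee can be estimated by Newton-Raphson; the local standard error follows from the test information (numbers are hypothetical).

```python
import numpy as np

def rasch_p(theta, b):
    """Rasch model: P(X=1 | theta, b) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def mle_theta(x, b, iters=20):
    """Newton-Raphson MLE of ability for 0/1 response pattern x."""
    theta = 0.0
    for _ in range(iters):
        p = rasch_p(theta, b)
        grad = np.sum(x - p)            # first derivative of log-likelihood
        hess = -np.sum(p * (1 - p))     # second derivative
        theta -= grad / hess
    return theta

b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])   # hypothetical difficulties
x = np.array([1, 1, 1, 0, 0])               # one response pattern
theta = mle_theta(x, b)
p = rasch_p(theta, b)
se = 1.0 / np.sqrt(np.sum(p * (1 - p)))     # local SE from test information
print(f"theta = {theta:.2f}, SE = {se:.2f}")
```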

Table 7.1 Data Matrix with Observed Scores.

Models for Polytomous Items

  • Introduction
  • Adjacent-category models
  • Continuation-ratio models
  • Cumulative probability models
  • Estimation and testing procedures

In the following, the answer to item k can fall in one of the categories m = 0, ..., M_k. For dichotomous items, the response function was defined as the probability of a correct response as a function of the ability parameter θ. In this formulation, we define the item category function as the probability of scoring in a given item category as a function of the ability parameter θ.
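As an illustration of an adjacent-category formulation, the sketch below computes partial-credit-model category probabilities: the probability of category m is proportional to exp of the cumulative sum of (θ − δ_j) over the first m steps (step parameters are hypothetical).

```python
import numpy as np

def pcm_probs(theta: float, deltas: np.ndarray) -> np.ndarray:
    """Partial credit model probabilities for categories m = 0..M.
    deltas holds the step parameters delta_1..delta_M."""
    z = np.concatenate([[0.0], np.cumsum(theta - deltas)])  # 0 for m = 0
    e = np.exp(z - z.max())             # numerically stabilized
    return e / e.sum()

deltas = np.array([-0.5, 0.8])          # hypothetical steps, M = 2
for theta in (-1.0, 0.0, 1.5):
    print(theta, np.round(pcm_probs(theta, deltas), 3))
```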

Figure 7.5 Response curves of a polytomously scored item.

Multidimensional Models

In general, however, these identification constraints will do little to provide an interpretation of the dimensions of ability. This approach is a generalization of the marginal maximum likelihood (MML) estimation procedure for unidimensional IRT models (see Bock & Aitkin, 1981), and has been implemented in TESTFACT (Wilson, Wood & Gibbons, 1991). In the framework of adjacent-category models, the logistic version of the probability of a response in category m can be written as a function of a linear combination of the ability dimensions.
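A compensatory multidimensional model for a dichotomous item reduces, in its logistic form, to applying the logistic function to a linear combination of the ability dimensions; a minimal sketch with hypothetical parameters:

```python
import numpy as np

def m2pl(theta: np.ndarray, a: np.ndarray, d: float) -> float:
    """Compensatory multidimensional 2PL: P = logistic(a . theta + d)."""
    return 1.0 / (1.0 + np.exp(-(a @ theta + d)))

a = np.array([1.2, 0.4])   # hypothetical discriminations on two dimensions
d = -0.3                   # intercept
for th in ([-1.0, -1.0], [0.0, 0.0], [2.0, -1.0]):
    print(th, round(m2pl(np.array(th), a, d), 3))
```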

Figure 7.6 Item response surface for a multidimensional IRT model (Reckase, 1977).

Multilevel IRT Model

  • Models for item parameters
  • Testlet models
  • Models for ratings

In the level 2 model, the values of the item parameters are considered as realizations of a random vector. It is assumed that the item parameters have, for example, a 3-variate normal distribution with mean vector µp and covariance matrix Σp. This approach discards part of the information in the item responses, which will lead to a certain loss of measurement precision.
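The level-2 idea can be sketched directly: draw transformed 3PL item parameters (log-discrimination, difficulty, logit-guessing) from a 3-variate normal distribution with mean µp and covariance Σp; the numbers below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical level-2 model: (log a, b, logit c) ~ N(mu_p, Sigma_p),
# transformed so that a > 0 and 0 < c < 1.
mu_p = np.array([0.2, 0.0, -1.4])
sigma_p = np.array([[0.05, 0.00, 0.00],
                    [0.00, 1.00, 0.10],
                    [0.00, 0.10, 0.20]])

draws = rng.multivariate_normal(mu_p, sigma_p, size=5)
a = np.exp(draws[:, 0])                 # discrimination parameters
b = draws[:, 1]                         # difficulty parameters
c = 1.0 / (1.0 + np.exp(-draws[:, 2]))  # guessing parameters
print(np.column_stack([a, b, c]).round(2))
```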

Applications of Measurement Models

Test Equating and Linking of Assessments

  • Data collection designs
  • Multi-stage testing
  • Test equating

Note that this expectation depends only on the parameters of the items of the reference exam. In the example of Table 8.3, the cut-off point of the reference exam was 27; consequently, 28.0% failed the exam. The bold row marked "English H" gives the results of the reference population taking the reference examination.
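Because the expected score given θ is just the sum of the item response functions (the test characteristic curve), a cut-off can be carried from one exam to another through θ. The sketch below performs this true-score equating for two hypothetical Rasch exams.

```python
import numpy as np

def tcc(theta: float, b: np.ndarray) -> float:
    """Test characteristic curve: expected score, sum of Rasch P_i(theta)."""
    return float(np.sum(1.0 / (1.0 + np.exp(-(theta - b)))))

def theta_at_score(score: float, b: np.ndarray) -> float:
    """Invert the (increasing) TCC by bisection."""
    lo, hi = -6.0, 6.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if tcc(mid, b) < score else (lo, mid)
    return (lo + hi) / 2

rng = np.random.default_rng(4)
b_ref = rng.normal(0.0, 1.0, 40)   # hypothetical reference exam
b_new = rng.normal(0.3, 1.0, 40)   # hypothetical new exam, a bit harder

# Map a cut-off of 27 on the reference exam onto the new exam via theta.
theta_cut = theta_at_score(27.0, b_ref)
print(f"equated cut-off on new exam: {tcc(theta_cut, b_new):.1f}")
```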

Figure 8.2 Linking by common persons.

Multiple Populations in IRT

  • Differences between populations
  • Multilevel regression models on ability

The Bayesian approach deals with the posterior distribution of the parameters, for example p(θ, δ, β, µ, σ | y). At the student level, the variables were gender (0 = male, 1 = female), SES (with two indicators: father's and mother's education, scores ranging from 0 to 8) and IQ (range from 0 to 80). It can be seen that the magnitudes of the fixed effects in the MLIRT model were larger than the analogous estimates in the ML model.

Table 8.7 Parameter Values and Estimates.

Figures

Figure 2.1 A basic systems model.
Figure 4.1 A schematic model of the progression of events.
Figure 4.3 Percentage of decisions taken by schools across countries at lower secondary level (Source: OECD Network C).
Figure 4.4 Percentage of autonomous decisions taken by schools across countries.
