Data analysis models - Cohort one

3.5 Research design

3.5.1 Cohort one

3.5.1.3 Data analysis models

Columns: Reference group; Question; Response option; Interpretation of response frequencies as a function of an estimation of collective self-efficacy (in respect of teaching ability), given for a higher than expected response frequency and for a lower than expected response frequency.

... the same as mine.

c. The teacher’s home language makes no difference to how I learn.
Passive (no impact on collective self-efficacy rating)

4. Which of the following is true about your teacher speaking your home language while teaching you?

a. I learn better when my teacher speaks my home language while teaching me.
Higher than expected response frequency: High collective self-efficacy
Lower than expected response frequency: Low collective self-efficacy

b. I learn better when my teacher does not speak my home language while teaching me.
Higher than expected response frequency: Low collective self-efficacy
Lower than expected response frequency: High collective self-efficacy

c. The teacher using my home language while teaching makes no difference to how I learn.
Passive (no impact on collective self-efficacy rating)

Table 3-8 Interpretation model for ‘perception questionnaire’ response frequencies

of statistical stringency on the same data set in the form of Zero-Order Correlations, multiple regression and Hierarchical Multiple Regression in his study on learning style congruence as a predictor of cognitive performance (Zhang, 2006).

In line with international studies of a similar nature (such as those referred to in the foregoing), this study primarily uses a multiple regression model for cohort one to identify the extent to which the various independent variables (such as race, home language and gender match or mismatch) contribute to the variance of the dependent variables (improvement and post-test scores).

By way of comparison, however, a secondary analysis of some of the data is conducted using a lesser-known correlation model (Point-Biserial Correlation) designed for situations where either the independent variable or the dependent variable is dichotomous while the other variable is non-dichotomous. In the case of the match/mismatch components of this study, the independent variable is indeed dichotomous (either match or mismatch) and is therefore ideally suited to this model.

Multiple regression

Multiple regression is an accepted and widely used statistical method that is employed to account for (predict) the variance in an interval dependent variable, based on linear combinations of interval, dichotomous or dummy independent variables. The model identifies which independent variables significantly contribute to the variance of the dependent variable and can also provide the relative predictive importance of the independent variables.

In the case of this study, the dependent variable is improvement score – an interval scale variable.

The independent variables are the dichotomous match/mismatch variables. Pre-test score is used as a covariate.

Although an improvement (gain) score measures the post-test score relative to the pre-test score, it does not take into account differences in pre-test scores between subjects. Clearly, a person with a low pre-test score has the potential to achieve a higher improvement score than one with a high pre-test score, so the interpretation of an analysis of gain scores can be problematic when pre-test scores differ. It is therefore important to include the pre-test score as a covariate, as this controls for the effect of the pre-test, which co-varies with the dependent variable.
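The model described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis actually run for the study: the data are invented and noise-free (so that the fitted coefficients can be checked against the values used to generate them), the variable names `race_match`, `language_match` and `pretest` and the helpers `solve_linear_system` and `ols_fit` are hypothetical, and ordinary least squares is solved through the normal equations using only the standard library.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_fit(predictors, y):
    """Return [intercept, b1, b2, ...] minimising the squared residuals."""
    x = [[1.0] + list(row) for row in predictors]  # prepend intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical students: (race_match, language_match, pretest).
# The match variables are dichotomous (1 = match, 0 = mismatch);
# the pre-test score enters the model as a covariate.
students = [
    (1, 1, 40), (1, 0, 55), (0, 1, 60), (0, 0, 35),
    (1, 1, 70), (0, 0, 50), (1, 0, 45), (0, 1, 65),
]

# Invented improvement scores generated from known coefficients, without noise.
improvement = [30 + 4 * race + 6 * lang - 0.3 * pre for race, lang, pre in students]

coefficients = ols_fit(students, improvement)
intercept, b_race, b_language, b_pretest = coefficients
print(f"intercept={intercept:.2f} race={b_race:.2f} "
      f"language={b_language:.2f} pretest={b_pretest:.2f}")
```

In this invented data set the negative pre-test coefficient reflects the point made above: students who start with a high pre-test score have less room to improve, and including the covariate keeps that effect from being attributed to the match/mismatch variables.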

In respect of the regression process utilised in this analysis, the following assumptions were checked:

• Independence: Keeping the classes for each course separate adequately addressed this condition.

• Normality: Once the outliers (all subjects with an improvement score of -40 or less) were removed, problems relating to normality were eliminated. Checks were made by plotting histograms of the standardised residuals as well as measuring skewness and kurtosis. These measurements all fell well within the accepted interval of [-1; +1].

• Homoscedasticity: Plots of the residuals were examined to ensure that the variance of the residuals was constant for all values of the independents.

• Linearity: The rule of thumb for regression was used to test for linearity, i.e. the standard deviation of the dependent must be greater than the standard deviation of the residuals.

• Proper specification of the model: In each case, variables added to the model were checked for correlation with other independents. Multicollinearity (excessively high correlation) among independents was tested using the Tolerance and VIF tests.
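Several of the checks listed above reduce to simple calculations, sketched below with invented data. The function names are hypothetical, the skewness and kurtosis are the plain moment-based versions (statistical packages often report bias-adjusted variants that differ slightly in small samples), and the tolerance/VIF helper covers only the two-predictor case, where the R² of one predictor regressed on the other equals their squared correlation. With more predictors, each one would have to be regressed on all the others.

```python
from statistics import mean, pstdev


def central_moment(values, order):
    """Mean of (value - mean) ** order."""
    centre = mean(values)
    return sum((v - centre) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness (0 for a symmetric distribution)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def tolerance_and_vif(x1, x2):
    """Two-predictor case only: tolerance = 1 - r^2, VIF = 1 / tolerance."""
    tol = 1.0 - pearson_r(x1, x2) ** 2
    return tol, 1.0 / tol


def passes_linearity_rule(dependent, residuals):
    """Rule of thumb: SD of the dependent exceeds SD of the residuals."""
    return pstdev(dependent) > pstdev(residuals)


# Invented values, for illustration only.
residuals = [-3, -1, -1, 0, 0, 0, 1, 1, 3]
improvement = [12, 18, 25, 30, 22, 15, 28, 35, 20]
race_match = [1, 1, 0, 0, 1, 0, 1, 0]
language_match = [1, 1, 1, 0, 1, 0, 0, 0]

skew = skewness(residuals)
kurt = excess_kurtosis(residuals)
normality_ok = -1.0 <= skew <= 1.0 and -1.0 <= kurt <= 1.0
tolerance, vif = tolerance_and_vif(race_match, language_match)
linearity_ok = passes_linearity_rule(improvement, residuals)

print(f"skewness={skew:.3f} kurtosis={kurt:.3f} normality_ok={normality_ok}")
print(f"tolerance={tolerance:.3f} VIF={vif:.3f} linearity_ok={linearity_ok}")
```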

Point-Biserial correlation

Point-Biserial Correlation is a special case of the Pearson product-moment correlation. It is designed for data in which one of the two variables (independent or dependent) is dichotomous and the other is non-dichotomous. In the case of this study, the dichotomous variable is the independent variable, teacher-student match/mismatch (in terms of race, home language or gender). The dependent variable is non-dichotomous, both where a single post-test score is used as the dependent variable and where an improvement (gain) score is used.

The Point-Biserial Correlation coefficient is calculated using the following formula:

\[ r_{pb} = \frac{M_p - M_q}{S_t} \sqrt{pq} \]

Notes on Point-Biserial Correlation Formula:

• M_p is the mean for the non-dichotomous values in connection with the variable coded 1;

• M_q is the mean for the non-dichotomous values for the same variable coded 0;

• S_t is the standard deviation for all non-dichotomous entries;

• p and q are the proportion of the dichotomous variable coded 1 and 0 respectively.
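The calculation can be illustrated with a short Python sketch on invented data. The function names and the scores are hypothetical. One assumption is worth stating: for the point-biserial formula to coincide exactly with the Pearson product-moment correlation, the standard deviation of all scores is taken as the population standard deviation (dividing by n), which is what `statistics.pstdev` computes.

```python
from statistics import mean, pstdev


def point_biserial(dichotomous, scores):
    """r_pb = (M_p - M_q) / S_t * sqrt(p * q), with groups coded 1 and 0."""
    group_1 = [s for d, s in zip(dichotomous, scores) if d == 1]
    group_0 = [s for d, s in zip(dichotomous, scores) if d == 0]
    p = len(group_1) / len(scores)  # proportion coded 1
    q = 1.0 - p                     # proportion coded 0
    s_t = pstdev(scores)            # population SD of all scores
    return (mean(group_1) - mean(group_0)) / s_t * (p * q) ** 0.5


def pearson_r(x, y):
    """Ordinary Pearson product-moment correlation, for comparison."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Invented example: 1 = teacher-student match, 0 = mismatch.
match = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
post_test = [70, 65, 80, 75, 60, 55, 60, 50, 65, 45]

r_pb = point_biserial(match, post_test)
r_pearson = pearson_r(match, post_test)
print(f"r_pb={r_pb:.4f} pearson={r_pearson:.4f}")
```

Here the matched group has a mean of 70 and the mismatched group a mean of 55, giving a coefficient of about 0.73. The two functions return the same value, which is the sense in which the point-biserial coefficient is a special case of Pearson's r.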

It should be noted that even if the coefficient of determination indicates a relationship between variables, the correlation may not be significant. An interpretive index such as the coefficient of determination is not meaningful by itself; it has to be statistically significant. In order to determine whether the correlation is significant, the null hypothesis must be rejected. The null hypothesis always states that r_pb equals zero: any evaluation of a correlation begins with the disprovable statement that there is no correlation between the two variables. Although rarely stated explicitly, the research hypothesis is always formulated with the null hypothesis in mind. If the null hypothesis is rejected, the alternative or research hypothesis can be accepted, namely that r_pb is greater or less than zero. Research hypotheses involving the point-biserial correlation are therefore directional, either positive or negative.

In order to reject the null hypothesis, a one-tailed t-test for independent means is applied to the correlation coefficient, as per the following formula:

\[ t = \frac{r_{pb} \sqrt{n - 2}}{\sqrt{1 - r_{pb}^{2}}} \]

Notes on the t-test:

• n is the number of cases;

• n-2 is the degrees of freedom;

• r_pb is the point-biserial correlation coefficient.

A one-tailed t-test is used instead of a two-tailed t-test because correlations are almost always, though not necessarily, directional. That is, a given research or alternative hypothesis will usually state some variable is positively or negatively related to another variable as opposed to simply just being related.

If the value of t obtained is less than the critical value for a one-tailed t-test for independent means associated with the degrees of freedom (n-2) then the null hypothesis cannot be rejected. If the value of t is greater than the critical value associated with the relevant degrees of freedom, then the null hypothesis can be rejected and the research hypothesis supported.
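The decision rule can be sketched as follows, using the standard conversion of a correlation coefficient to a t statistic with n-2 degrees of freedom. The coefficient, the sample size and the function names are hypothetical, and the significance level of 0.05 is an assumption made for this example only. The critical value is passed in as looked up from a t table (1.860 for a one-tailed test with 8 degrees of freedom at the 0.05 level), because the Python standard library has no t distribution.

```python
from math import sqrt


def t_from_r(r_pb, n):
    """t = r_pb * sqrt(n - 2) / sqrt(1 - r_pb ** 2), with n - 2 degrees of freedom."""
    return r_pb * sqrt(n - 2) / sqrt(1.0 - r_pb ** 2)


def reject_null(t_value, critical_value, direction="positive"):
    """One-tailed decision: compare t with the tabled critical value.

    direction="positive" tests r_pb > 0; any other value tests r_pb < 0.
    """
    if direction == "positive":
        return t_value > critical_value
    return t_value < -critical_value


# Hypothetical inputs, for illustration only.
r_pb = 0.7276
n = 10
critical_value = 1.860  # t table: one-tailed, alpha = 0.05 (assumed), df = 8

t_value = t_from_r(r_pb, n)
decision = reject_null(t_value, critical_value)
print(f"t={t_value:.3f} df={n - 2} reject_null={decision}")
```

With these invented figures t is approximately 3.0, which exceeds the critical value, so the null hypothesis would be rejected in favour of a positive relationship.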

From the document: Maximising return on investment in IT training : a South African perspective. (Pages 121-125)