
CHAPTER 3: THEORETICAL FRAMEWORK, METHODOLOGY, AND DATA

3.3. Data

Data are the vital part of econometric analysis and as such have to be clean, in the correct format, and relevant if they are to address the specific research questions set in this study (Wegner, 2007). This study uses primary and secondary data. The secondary data set used in this thesis is generated by merging student records from three main UKZN Student Trends data-bases: (1) the “CHES Student Trends Data-base”, (2) downloads from UKZN statistics online provided by UKZN’s DMI system (www.ukzn.ac.za/dmi), and (3) data from the Deputy Vice-Chancellor (DVC): Teaching and Learning, the Faculty Office of the FMS, the School of Accounting, and the School of Economics and Finance. In addition, various other official sources of information were used, as indicated in the tables, in the text, and in the references. The “CHES Student Trends Data-base” is a large micro data-base of all UKZN student information since 1990, archived and maintained by the Centre for Higher Education Studies (CHES) in the Statistical Package for the Social Sciences (SPSS). Originally a data-base of the former UN, it was enlarged with the merger in 2004 to include all students of the former UDW. The CHES Student Trends Data-base has been updated by downloads from the merged UKZN’s Data Management Information (DMI) system. It now spans two decades and has data for more than 223 000 students. The key variable of interest is the university identity number (student number) of all students who registered from 1990 to date (2009 is the last academic year for which downloads are available); the data-base does not contain student names (http://innerweb.ukzn.ac.za/depts/chesdata).

3.3.1. DATA RELEVANCY

To narrow the search for relevant and appropriate data, other faculties are filtered out before any analysis, as the scope of this study is the FMS only. The targeted student population comprises active students registered for qualifications in the FMS who registered for both semesters one and two in the five selected indicator academic years (2004, 2005, 2006, 2007, and 2008), who have final examination marks in the selected undergraduate accounting and economics modules, and for whom all other supporting data are available.
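The selection rule described above can be illustrated with a minimal sketch. It is not the SPSS or Stata procedure used in this study; the field names (`student_id`, `faculty`, `year`, `semester`) and the sample records are hypothetical, and only the faculty, year, and both-semesters conditions are shown.

```python
from collections import defaultdict

# The five indicator academic years named in the text.
INDICATOR_YEARS = {2004, 2005, 2006, 2007, 2008}

# Hypothetical registration records; field names and values are illustrative only.
registrations = [
    {"student_id": "A1", "faculty": "FMS", "year": 2004, "semester": 1},
    {"student_id": "A1", "faculty": "FMS", "year": 2004, "semester": 2},
    {"student_id": "B2", "faculty": "FMS", "year": 2005, "semester": 1},      # one semester only
    {"student_id": "C3", "faculty": "Science", "year": 2004, "semester": 1},  # other faculty
    {"student_id": "C3", "faculty": "Science", "year": 2004, "semester": 2},
    {"student_id": "D4", "faculty": "FMS", "year": 2003, "semester": 1},      # not an indicator year
    {"student_id": "D4", "faculty": "FMS", "year": 2003, "semester": 2},
]


def target_population(records):
    """Return (student_id, year) pairs registered in the FMS for both
    semesters of an indicator academic year."""
    semesters = defaultdict(set)
    for rec in records:
        if rec["faculty"] == "FMS" and rec["year"] in INDICATOR_YEARS:
            semesters[(rec["student_id"], rec["year"])].add(rec["semester"])
    # Keep only student-years in which both semester 1 and semester 2 appear.
    return sorted(key for key, sems in semesters.items() if {1, 2} <= sems)


selected = target_population(registrations)
print(selected)
```

In a fuller version the remaining conditions (availability of final examination marks and of all supporting data) would be applied as further filters on the selected student-years.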

The main challenge of this study is the measurement of educational facts. UKZN’s Student Trends data-bases provided cross-sectional data from students’ records, such as transcript records, bio-data and student demographics, the grade point average (GPA), the students’ final examination marks (student scores) for each university module, and some before-university information. More specifically, the data available include inter alia: (1) university identity number, (2) semester/year registered in, (3) course/module registered in, (4) Faculty the student is registered in, (5) campus, (6) qualification the student registered for, (7) academic year when the student gained the qualification (year of graduation), (8) the major course the student registered for, (9) achievement/students’ final examination marks, (10) grade point average (GPA), (11) credits accumulated, (12) result code, (13) academic year and semester of cancellation of registration, (14) academic year and semester of student exclusion, (15) academic year and semester of student readmission, (16) drop-out code, (17) gender, (18) race, (19) date of birth, (20) home first language, (21) matric authority (also racially specific for pre-1995), (22) school attended, (23) year of matriculation, (24) total matric points, (25) matric subject score, and (26) matric subject symbol (HG or SG). The sheer size of these data-bases means that there are numerous possible research questions pertaining to UKZN that can be explored and checked against empirical evidence. It is therefore important to select the most relevant data and the appropriate measures by which the specific research questions set in this thesis can be addressed. A brief discussion of the variables incorporated and tested in the regression analysis, using both OLS and logistic regressions, is provided below.

3.3.2. DATA PREPARATION AND CLEANING

Every database can be expected to contain some “dirty data” (Wegner, 2007): captured data that contain errors or outliers (also referred to as extreme values), come in varying formats, or are inappropriate, incomplete, inconsistent, irrelevant, or unnecessary to the research question under analysis. When processing UKZN’s Student Trends data-bases, various clerical errors and biases were found. Cleaning the data was an essential prerequisite to ensure reliable and valid econometric findings that are not distorted by dirty data in the data-bases.

Although the three UKZN Student Trends data-bases and the focus groups assisted this study in researching student trends in the FMS, several modifications had to be made to the original data before the parameters of the econometric model could be estimated. This study sets up a system to catch these clerical errors and biases and to correct or omit them where possible before undertaking the econometric analysis. Data pertaining to student demographics are the most affected, in some cases demanding knowledge of likely spellings and re-ordering of words. In some cases data on student identity numbers had to be cleaned up case by case through processing in SPSS and Stata. This included inter alia re-ordering student identity numbers that appear in both alphabetic and numeric form, and spotting likely duplicates of student identity numbers. Duplicates in student numbers were eliminated systematically using a dummy variable which allowed unwanted student numbers to be dropped. It was also found that only about 42 percent of the 223 000 students (about 105 634 of the students) in the data-base had a school code that could be linked to the national system of the DoE (also referred to as the Natemis quintiles system). It can be assumed that the remaining 58 percent encompass South African students from schools not in the national system, or from schools that ceased to exist or changed name before the national system was set up, and foreign students. The age of the student was generated, but some dates of birth were given as a date after matriculation; these faulty cells were cleaned by recoding them as missing in SPSS. Students whose marks were “zero” and who did not have a UKZN code explaining why they missed the compulsory final examinations were referred to as “Ghost” students. This is not an insignificant problem, since a zero mark leads to many complications in econometric analysis that could give a false reading of the results.

These ghost students were also eliminated systematically using a dummy variable which captured them and provided their descriptive statistics and therefore allowed for them to be omitted from the students examined in this study. Unfortunately, there was no indication available on the dropouts and students who had unfinished (non-completed) degrees.
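The cleaning rules described above (flagging duplicate student numbers with a dummy variable, flagging “Ghost” students, and recoding implausible dates of birth as missing) can be sketched as follows. This is an illustration only, not the SPSS and Stata processing actually used: the field names, the sample records, and the choice to keep the first occurrence of a duplicated student number are assumptions.

```python
from datetime import date

# Hypothetical student records; field names and values are illustrative only.
records = [
    {"student_id": "S001", "birth": date(1986, 3, 1), "matric_year": 2003, "mark": 62, "result_code": None},
    {"student_id": "S001", "birth": date(1986, 3, 1), "matric_year": 2003, "mark": 62, "result_code": None},
    {"student_id": "S002", "birth": date(2005, 1, 9), "matric_year": 2003, "mark": 55, "result_code": None},
    {"student_id": "S003", "birth": date(1985, 7, 4), "matric_year": 2002, "mark": 0, "result_code": None},
    {"student_id": "S004", "birth": date(1984, 5, 2), "matric_year": 2002, "mark": 0, "result_code": "ABS"},
]


def clean(rows):
    """Flag duplicates and ghost students with 0/1 dummies, recode faulty
    dates of birth as missing, and return (flagged rows, kept rows)."""
    seen = set()
    flagged = []
    for rec in rows:
        row = dict(rec)  # copy so the raw records stay untouched
        # Dummy = 1 for every repeat of a student number already seen
        # (assumption: the first occurrence is the one kept).
        row["is_duplicate"] = int(row["student_id"] in seen)
        seen.add(row["student_id"])
        # Ghost: a zero mark with no result code explaining the absence.
        row["is_ghost"] = int(row["mark"] == 0 and row["result_code"] is None)
        # A birth year later than the matriculation year is impossible: set to missing.
        if row["birth"] is not None and row["birth"].year > row["matric_year"]:
            row["birth"] = None
        flagged.append(row)
    kept = [r for r in flagged if not r["is_duplicate"] and not r["is_ghost"]]
    return flagged, kept


flagged, kept = clean(records)
print([r["student_id"] for r in kept])
print(sum(r["is_ghost"] for r in flagged), "ghost record(s) flagged")
```

Keeping the dummies on the flagged rows, rather than deleting records immediately, allows descriptive statistics of the omitted students to be reported before they are dropped, as was done for the ghost students in this study.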

These biases in the UKZN Student Trends data-bases downloaded from UKZN’s DMI system can stem from various sources, about which this study can only speculate. These include inter alia administrative staff of UKZN, specifically those close to selection and admission services, failing to capture all of the details, either because application and registration forms were poorly filled in or because the details given were not keyed in accurately. Students’ exclusions and readmissions are not accompanied by updates to the students’ records, nor are the Dean’s discretionary decisions to fast-track or transfer students between degrees, majors, programmes, or qualifications. These biases are not insignificant problems for fine-grained longitudinal research, or for the strategic planning and implementation of education policies based on information sourced from UKZN’s DMI system.

For example, suppose that some students who, according to UKZN’s DMI system, are still registered have in fact dropped out, or submitted a form for cancellation from the modules to the Faculty office before the due date for cancellation. If these students are mistakenly included in the empirical analysis as active students, their inclusion will give a false reading and interpretation of important descriptive statistics, including inter alia the pass rates, failure rates, retention rates, and graduation or throughput rates.
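The size of this distortion can be shown with a small worked calculation. The figures below are purely hypothetical and are not drawn from the UKZN data; they only show how students who have cancelled but remain listed inflate the denominator of the pass rate.

```python
def pass_rate(passed, registered):
    """Share of registered students who passed the module."""
    if registered <= 0:
        raise ValueError("registered must be positive")
    return passed / registered


# Hypothetical figures for one module (not taken from the study's data).
listed_as_registered = 100   # students the system still shows as registered
cancelled_unrecorded = 10    # of these, students who had in fact cancelled
passed = 60                  # students who passed the final examination

naive = pass_rate(passed, listed_as_registered)
corrected = pass_rate(passed, listed_as_registered - cancelled_unrecorded)

print(f"pass rate with cancelled students included: {naive:.1%}")
print(f"pass rate with cancelled students removed:  {corrected:.1%}")
```

Under these assumed figures the measured pass rate is understated and the failure rate overstated by the same margin, which is why such records are removed before the descriptive statistics are computed.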

The contention is that, to investigate student trends using the data downloaded from UKZN’s DMI system, it is imperative first to fine-grain these data (clean them and select the most relevant) and to incorporate corrections and modifications which, in the case of this study, are extensive and complex. This study therefore has several unusual features: it omits students with inaccurate data and corrects obviously erroneous data. The data used in this study were transformed into meaningful measures and are relevant to the research questions under analysis. This quality of data underpins the reliability and validity of the econometric findings based on them.

The study uses both rank order and ratio data, that is, the final examination marks and matric scores themselves, as there is no evidence or agreement amongst the studies surveyed on which is the most used. The student identity numbers were not incorporated in the analysis, to preserve the student anonymity prescribed by the UKZN Ethics Committee. In order to identify the explanatory variables that are most significant in their influence on students’ academic performance, the stepwise approach was used for selecting variables, as in Taylor and Harris (2004). Where a causal relationship was identified between a variable and student performance, that variable is incorporated in the model. Some 30 explanatory variables covering empirically tractable characteristics for predicting the general academic performance of students, as reviewed in Section 2.3.1, were examined to ascertain whether any causal inter-relationships existed among these variables. On this basis, their appropriateness for incorporation in the regression models was determined (Taylor and Harris, 2004). In some cases the study also had to omit some variables and outliers to minimize endogeneity and heteroscedasticity. Some pair-wise correlation coefficients were not computed because at least one of the variables was constant.
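The reason a pair-wise correlation coefficient cannot be computed when one variable is constant is that Pearson’s coefficient divides by the product of the two standard deviations, and a constant variable has a standard deviation of zero. The sketch below illustrates this with hypothetical variable names and values; it is not the statistical software output used in this study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Return Pearson's r, or None when either variable is constant
    (zero variance makes the coefficient undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


# Hypothetical variables; names and values are illustrative only.
data = {
    "matric_points": [30, 36, 42, 48],
    "final_mark": [50, 56, 62, 68],
    "campus_flag": [1, 1, 1, 1],  # constant within this subsample
}

corr = {(a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)}
for pair, r in corr.items():
    print(pair, "not computed (constant variable)" if r is None else round(r, 3))
```

A variable can be constant within a subsample (for example, a campus indicator in a module offered on one campus only) even when it varies in the full data-base, so the check is applied to each sample separately.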

A series of 16 regression models was run. The results run into thousands of pages, since models were run and re-run separately for different first-year accounting and economics modules to ascertain the degree of consistency between the regression model results (Taylor and Harris, 2004) and to detect any possible discrepancy or distortion in the results. To summarize these results and findings efficiently, five indicator academic years were chosen (2004, 2005, 2006, 2007, and 2008). Three measurement academic years at roughly three-yearly intervals (2004, 2006, and 2008) were used as representing a cohort’s years to graduation for the BCom (Accounting) and BCom (General) degrees. These measurement academic years have the merit of catching 2004, the academic year of the merger (the initial year), which brought with it all the legacies of pre-1994 and the year 2000, the academic year when some Faculty re-organization took place at the former UDW and UN (including its constituent campuses). They also catch one mid-year, 2006, during what some studies at UKZN refer to as “the academic year of the merger chaos or hiatus”, when the physical articulation and re-organization of Faculties to a single UKZN campus occurred. Finally, they catch 2008, the most stable year and the one for which complete student trends data are available. This does not, however, suggest that major changes and policies at UKZN did not occur outside these indicator years. Although the study did not undertake a longitudinal analysis, results are obtained as cross sections for the five indicator academic years, so that trends across time can be discerned, as discussed in Chapter 4.

The regression results are similar and available on request (these results are to be published in another follow up study).

The primary data are drawn from focus group discussions with students, academic and non-academic (administrative and support) staff members at UKZN, and various education stakeholders in Durban. Focus group discussions are important as they bring out the variables that this study cannot capture and those that are beyond measurement.