Changing Assessment Practices Resulting From the Shift Towards On-Screen Assessment in Schools
This dissertation reports a study into the appropriateness of on-screen assessment materials compared with paper-based versions, and into how a change in assessment mode might affect assessment practices in schools. The research centred on a controlled comparative trial of paper and on-screen assessments with 1000 school students. The appropriateness of the assessments was conceptualised in terms of their comparative reliability, validity and scoring equivalence across the two modes. Reliability was examined through quantitative analysis: the performance and internal reliability of the assessments were calculated using classical test theory, Cronbach’s alpha and Rasch latent trait modelling. Equivalence was also addressed empirically. Marking reliability was not quantified, though it is discussed. Validity was examined through qualitative analysis of questionnaire and interview data obtained from the students and teachers participating in the trial; the focus was on the comparative authenticity and fitness for purpose of assessments in the two modes. The outcomes of the research can be summarised as follows: the assessments in both modes scored highly in terms of internal reliability, but they were not necessarily measuring the same constructs. The scores from the two modes were not equivalent, with students performing better on paper. The on-screen versions were considered by students and teachers to have greater validity. All items that produced significant differences in performance were analysed and categorised by item type, and consideration is given to whether these differences in performance result from construct-irrelevant or construct-relevant factors.
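For reference, the internal-consistency statistic mentioned above, Cronbach’s alpha, is conventionally defined as follows (this is the standard textbook formula, not a result drawn from the study’s own data):

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)
```

where \(k\) is the number of items in the assessment, \(\sigma^{2}_{Y_i}\) is the variance of scores on item \(i\), and \(\sigma^{2}_{X}\) is the variance of the total test score. Values approaching 1 indicate high internal consistency, which is the sense in which the assessments in both modes are described as scoring highly.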
The recommendations from this research focus on three main areas: for on-screen assessments to be used in schools and to realise their considerable potential, the equivalence issue needs to be resolved, the construct-irrelevant factors need to be clearly identified and minimised, and the construct-relevant factors need to be enhanced. Finally, a model of comparative modal dependability is offered, which can be used to compare and contrast the potential benefits and issues when changes to assessment modes or item types are considered.