Agreement rates between automated essay scoring systems and human raters: Meta-analysis
Received: Sep 01, 2014 ; Revised: Sep 30, 2014 ; Accepted: Oct 13, 2014
Published Online: Nov 30, 2014
Abstract
Automated essay scoring (AES) is defined as the scoring of written prose using computer technology. The objective of this meta-analysis is to evaluate the claim that machine scoring of writing test responses agrees with human raters as much as humans agree with other humans. The effect size is the agreement rate between AES and human scoring, estimated using a random-effects model. The exact agreement rate between AES and human scoring is 52%, compared with an exact agreement rate of 54% between human raters. The adjacent agreement rate between AES and human scoring is 93%, compared to an adjacent agreement rate of 94% between human raters. This meta-analysis shows that the agreement rate between AES and human raters is comparable to that between human raters. This study also reports subgroup analyses of agreement rates by study characteristics such as publication status, AES type, essay type, exam type, human rater expertise, country, and school level. Implications of the results and potential future research are discussed in the conclusion.
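The pooled estimates reported above come from a random-effects model applied to per-study agreement rates. As a rough illustration of how such pooling can be carried out, the following Python sketch applies a DerSimonian-Laird random-effects model to logit-transformed agreement proportions; the function name and the per-study counts are hypothetical and do not reproduce the paper's actual data or analysis software.

```python
import numpy as np

def pool_agreement_rates(agree_counts, totals):
    """Pool exact-agreement proportions across studies with a
    DerSimonian-Laird random-effects model (logit transform).

    agree_counts, totals: per-study counts of AES-human agreements and
    numbers of scored essays (illustrative inputs, not the paper's data).
    """
    k = np.asarray(agree_counts, dtype=float)
    n = np.asarray(totals, dtype=float)

    # Logit-transformed proportions and their approximate variances
    p = k / n
    y = np.log(p / (1 - p))
    v = 1.0 / k + 1.0 / (n - k)

    # Fixed-effect (inverse-variance) estimate and Cochran's Q statistic
    w = 1.0 / v
    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed) ** 2)

    # DerSimonian-Laird estimate of between-study variance (tau^2)
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects pooled estimate, back-transformed to a proportion
    w_re = 1.0 / (v + tau2)
    y_re = np.sum(w_re * y) / np.sum(w_re)
    return 1.0 / (1.0 + np.exp(-y_re))

# Hypothetical per-study data: exact AES-human agreements out of essays scored
print(pool_agreement_rates([520, 410, 300], [1000, 750, 600]))
```

Subgroup comparisons such as those listed in the abstract (publication status, AES type, and so on) would be obtained by running the same pooling separately within each subgroup of studies.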