Applicabilities of Automated Short-answer Scoring to Large-scale English Writing Tests

Si, Ki-Ja; Park, Do-Young; Lim, Hwang-Gyu

doi:10.29221/jce.2014.17.2.71

J. Curric. Eval. 2014; 17(2):71-97

pISSN: 1229-1544

Article

Ki-Ja Si¹,†, Do-Young Park², Hwang-Gyu Lim³

¹Research Fellow, Korea Institute for Curriculum and Evaluation

²Associate Research Fellow, Korea Institute for Curriculum and Evaluation

³Researcher, Korea Institute for Curriculum and Evaluation

†Corresponding author: Ki-Ja Si, E-mail: skj@kice.re.kr

ⓒ Copyright 2014, Korea Institute for Curriculum and Evaluation. This is an Open-Access article distributed under the terms of the Creative Commons Attribution NonCommercial-ShareAlike License (http://creativecommons.org/licenses/by-nc-sa/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received: May 01, 2014; Revised: Jun 01, 2014; Accepted: Jun 13, 2014

Published Online: Jul 31, 2014

Abstract

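This study verifies the performance of an automated scoring program developed for short-answer writing responses and explores its applicability to large-scale writing assessments. Two item types were used for the verification: a "short writing appropriate to a situation" item (15 to 25 words, 5 minutes) and a "completing a detailed description of a picture" item (up to 10 words per sub-sentence, 5 minutes). To verify the program's performance, human scoring and automated scoring were compared statistically in terms of inter-rater reliability based on correlation coefficients and adjacent agreement statistics, rater severity based on the many-facet Rasch model, test score reliability based on generalizability coefficients, and time and cost. The results confirmed that when automated scoring replaces one human rater, inter-rater reliability and test score reliability remain at a level similar to human scoring, while the influence of rater severity and the time and cost of scoring are substantially reduced.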

Keywords: short-answer writing assessment; automated scoring; inter-rater reliability; rater severity; generalizability coefficient
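
As a rough illustration of two of the inter-rater statistics named in the abstract (correlation and agreement rates), the following minimal Python sketch compares one human rater with an automated scorer. The score lists, the function names, and the one-point tolerance for "adjacent" agreement are assumptions made for illustration; they are not the study's data or its actual procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def agreement_rates(x, y, tolerance=1):
    """Return (exact, adjacent) agreement rates.

    Exact: proportion of responses given identical scores.
    Adjacent: proportion whose scores differ by at most `tolerance`
    points (this count includes the exact matches).
    """
    n = len(x)
    exact = sum(a == b for a, b in zip(x, y)) / n
    adjacent = sum(abs(a - b) <= tolerance for a, b in zip(x, y)) / n
    return exact, adjacent


# Hypothetical scores for eight responses (made-up values).
human = [3, 2, 4, 1, 3, 0, 2, 4]
machine = [3, 2, 3, 1, 4, 0, 2, 2]

r = pearson(human, machine)
exact, adjacent = agreement_rates(human, machine)
print(f"r = {r:.3f}, exact = {exact:.3f}, adjacent = {adjacent:.3f}")
```

The many-facet Rasch and generalizability analyses mentioned in the abstract need dedicated modelling and are not covered by this sketch.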