This study investigates how raters make scoring decisions when assessing tape-mediated speaking test performance. Twenty-four Chinese EFL teachers were trained and then analytically scored five sample tapes selected from TEM4-Oral, a national EFL speaking test designed for college sophomores majoring in English in China. The raters' verbal reports of what they were thinking while making their scoring decisions were audio-recorded during and immediately after each assessment. Post-scoring interviews supplemented these reports in probing the scoring process. A qualitative analysis of the data showed that the raters tended to give weight to content, to penalize both grammar and pronunciation errors, and to reward the use of impressive and uncommon words. Moreover, the decision-making process as a whole proved to be cyclic in nature. A flow chart describing the cyclic process of hypothesis forming and testing is then proposed and discussed.
A survey was carried out to identify the factors that raters perceived as affecting the rating of the TEM-4 Oral Test, a large-scale tape-mediated oral English testing system in China. The findings show that the perceived factors included training, raters' interaction with the rating criteria, raters' physical and emotional condition, raters' attitudes towards the rating work, raters' oral English proficiency, and recording quality. Raters' educational and research backgrounds were perceived as not affecting the rating.