Distance Education Innovation


Research on the Automated Assessment of Subjective Questions Supported by Large Language Models: A Case Study of Subjective Questions Assessing Humanistic Quality

Issue 6, 2025

LING Chen (凌晨) and SONG Ningning (宋宁宁)

(Research Center of Distance Education, Beijing Normal University, Beijing 100875, China)

Abstract: Large language models (LLMs), with their strong capabilities in natural language understanding and generation, are reshaping the paradigms and methods of educational assessment. This study takes the automated assessment of subjective questions as its entry point, draws on Bloom's Taxonomy of Educational Objectives, and conducts an assessment of fifth-grade primary school students. By comparing human scoring with the scores produced by five large language models under three prompting strategies, the study systematically evaluates the consistency, stability, and applicability of LLMs in assessing subjective questions. The results show that LLMs perform well overall in assessing subjective questions on humanistic quality, with particularly high agreement with human scoring on the understanding, evaluation, and creation dimensions. Prompting strategy significantly affects model performance: chain-of-thought and role-play prompts outperform zero-shot prompts. On this basis, the study offers recommendations on assessment design and assessment models, providing theoretical support and a practical reference for the scientific and large-scale application of LLMs in educational assessment.

Keywords: educational assessment; large language models (LLMs); subjective questions; automated assessment
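The abstract reports agreement between human and model scores but does not name the statistic used. As a minimal sketch only, the comparison could be run with quadratic weighted kappa, a measure often used in automated scoring; the choice of this measure, the function name `quadratic_weighted_kappa`, the 0 to 4 score scale, and every score below are hypothetical illustrations, not the authors' method or data.

```python
from collections import Counter


def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Agreement between two lists of integer scores.

    1.0 means perfect agreement, 0.0 means chance-level agreement,
    and negative values mean agreement worse than chance.
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    span = max_score - min_score
    if span <= 0:
        raise ValueError("max_score must be greater than min_score")

    n = len(human)
    observed = Counter(zip(human, model))  # joint counts of (human, model) pairs
    human_hist = Counter(human)            # marginal counts for the human rater
    model_hist = Counter(model)            # marginal counts for the model

    disagreement = 0.0  # weighted observed disagreement
    expected = 0.0      # weighted disagreement expected by chance
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            weight = ((i - j) / span) ** 2
            disagreement += weight * observed[(i, j)] / n
            expected += weight * human_hist[i] * model_hist[j] / (n * n)

    if expected == 0.0:
        # Both raters gave the same single score to every answer;
        # treating this as perfect agreement is a convention assumed here.
        return 1.0
    return 1.0 - disagreement / expected


if __name__ == "__main__":
    # Hypothetical scores on a 0-4 scale, invented purely for illustration.
    human_scores = [4, 3, 2, 4, 1, 0, 3, 2]
    model_scores_by_strategy = {
        "zero_shot": [3, 3, 1, 4, 2, 1, 2, 2],
        "chain_of_thought": [4, 3, 2, 4, 1, 1, 3, 2],
        "role_play": [4, 3, 2, 3, 1, 0, 3, 2],
    }
    for strategy, scores in model_scores_by_strategy.items():
        kappa = quadratic_weighted_kappa(human_scores, scores, 0, 4)
        print(f"{strategy}: {kappa:.3f}")
```

Repeating this calculation for each model and each prompting strategy would yield a table of agreement values that can be compared across strategies. The quadratic weights penalize a model that is off by two points more heavily than one that is off by one, which suits ordinal rubric scores.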

