LLM Essay Scoring Under Holistic and Analytic Rubrics: Prompt Effects and Bias

📰 arXiv cs.AI

Researchers evaluate LLMs for essay scoring under holistic and analytic rubrics, analyzing their agreement with human scores and their directional bias

Advanced · Published 2 Apr 2026
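The holistic/analytic distinction is the heart of the study: a holistic rubric asks the model for one overall score, while an analytic rubric asks it to score each trait separately. A minimal sketch of the two prompt styles in Python; the wording, trait list, and 1-6 scale are illustrative assumptions, not the paper's actual templates:

```python
# Hypothetical prompt templates for the two rubric conditions.
# The trait names and the 1-6 scale are assumptions for illustration.

HOLISTIC_PROMPT = """You are an experienced essay rater. Read the essay
below and assign ONE overall score from 1 (poor) to 6 (excellent),
weighing content, organization, and language use together.

Essay:
{essay}

Overall score (1-6):"""

ANALYTIC_PROMPT = """You are an experienced essay rater. Read the essay
below and score each trait separately from 1 (poor) to 6 (excellent):
- Content
- Organization
- Language use

Essay:
{essay}

Trait scores (one per line, 1-6):"""

# Example usage: fill the template before sending it to the model.
prompt = HOLISTIC_PROMPT.format(essay="The essay text goes here...")
```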
Action Steps
  1. Collect and preprocess essay-scoring datasets with human annotations
  2. Train and fine-tune LLMs on instruction-tuning datasets
  3. Evaluate LLM performance using metrics such as agreement with human consensus scores and directional bias (see the sketch after this list)
  4. Analyze the stability of bias estimates and identify potential sources of bias
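Steps 3 and 4 can be made concrete with standard tools: quadratic weighted kappa (QWK) for agreement with human scores, the mean signed difference for directional bias, and a nonparametric bootstrap for the stability of that bias estimate. A minimal sketch, assuming integer scores on a 1-6 scale and toy stand-in arrays (the data, scale, and 2,000-resample count are assumptions, not the paper's setup):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Toy stand-in data; replace with real human and LLM scores.
human = rng.integers(1, 7, size=200)  # human scores on a 1-6 scale
llm = np.clip(human + rng.choice([-1, 0, 0, 1], size=200), 1, 6)

# Step 3a: agreement with human scores. QWK is the standard
# chance-corrected metric for ordinal essay scores.
qwk = cohen_kappa_score(human, llm, weights="quadratic")

# Step 3b: directional bias. Positive means the LLM scores high.
bias = float(np.mean(llm - human))

# Step 4: stability of the bias estimate via bootstrap resampling.
n = len(human)
boots = [
    float(np.mean(llm[idx] - human[idx]))
    for idx in (rng.integers(0, n, size=n) for _ in range(2000))
]
lo, hi = np.percentile(boots, [2.5, 97.5])

print(f"QWK={qwk:.3f}  bias={bias:+.3f}  95% CI=({lo:+.3f}, {hi:+.3f})")
```

A bootstrap interval that stays clearly on one side of zero points to a stable directional bias rather than sampling noise.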
Who Needs to Know This

AI engineers and educators can use this research to improve the accuracy and fairness of LLM-based essay scoring systems, while data scientists can apply the findings to build more robust scoring models

Key Insight

💡 LLMs can be effective for essay scoring, but may introduce bias and require careful evaluation and fine-tuning

Share This
📚 LLMs for essay scoring: how well do they align with human scores? 🤔