https://educationaldatamining.org/edm2019/proceedings/
## Background ##
This paper improves on the training/testing process for automated essay scoring (AES) used in our previous study (Boulanger & Kumar, 2018), with exactly the same dataset and set of writing features. It demonstrates that the performance reported in that study was inflated by the way essays were shuffled among the training, validation, and testing sets: when the model's performance is averaged over five different shufflings, the QWK drops to only 0.63. Moreover, this paper is one of the first to predict rubric scores on the ASAP datasets, and it compares the performance of several deep learning architectures against a baseline model (a majority classifier).
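To illustrate the evaluation protocol described above, the sketch below averages quadratic weighted kappa (QWK) over several random shufflings of the train/test split and compares the result against a majority-class baseline. It is a minimal illustration using scikit-learn and synthetic stand-in data, not the paper's actual features, models, or splits.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the writing features and rubric scores
# (the study itself uses features extracted from ASAP essays).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))    # 1000 essays, 20 hypothetical writing features
y = rng.integers(0, 4, size=1000)  # rubric scores on a 0-3 scale

def mean_qwk_over_shuffles(X, y, n_shuffles=5):
    """Average QWK over several random train/test shufflings."""
    model_qwks, baseline_qwks = [], []
    for seed in range(n_shuffles):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed)

        # Scoring model (placeholder for the paper's deep learning architectures).
        model = RandomForestClassifier(random_state=seed).fit(X_tr, y_tr)
        model_qwks.append(
            cohen_kappa_score(y_te, model.predict(X_te), weights="quadratic"))

        # Majority-class baseline, analogous to the paper's baseline comparison.
        baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
        baseline_qwks.append(
            cohen_kappa_score(y_te, baseline.predict(X_te), weights="quadratic"))

    return np.mean(model_qwks), np.mean(baseline_qwks)

model_qwk, baseline_qwk = mean_qwk_over_shuffles(X, y)
print(f"Mean QWK over 5 shuffles: model={model_qwk:.2f}, baseline={baseline_qwk:.2f}")
```

Reporting the mean over several shufflings, rather than a single favorable split, is what reveals the inflation discussed above: a model that looks strong on one particular split can average out to a noticeably lower QWK.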