Peer grading serves as a critical tool for scaling the grading of complex, open-ended assignments to courses with tens or hundreds of thousands of students. But despite promising initial trials, it does not always deliver accurate results compared to human experts. I will talk about methods for estimating and correcting for grader biases and reliabilities, showing significant improvement in peer grading accuracy on real data with 63,199 peer grades from Coursera's HCI course offerings, the largest peer grading networks analyzed to date. I will also discuss how grader biases and reliabilities relate to other student factors such as engagement, performance, and commenting style.
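As a rough illustration of the idea (not the Bayesian models presented in the talk), the sketch below assumes a simple additive model in which each observed peer grade equals the submission's true score plus the grader's bias plus noise whose variance reflects the grader's reliability; biases, reliabilities, and true scores are then estimated by alternating updates. The function name, update scheme, and toy data are illustrative assumptions, not the published method.

```python
import numpy as np
from collections import defaultdict

def estimate_bias_reliability(grades, n_iters=20):
    """Estimate per-grader bias/reliability and per-submission true scores.

    grades: list of (grader_id, submission_id, observed_grade) tuples.
    Assumes observed = true_score + bias + noise, where the noise variance
    is the inverse of the grader's reliability.
    """
    graders = sorted({g for g, _, _ in grades})
    submissions = sorted({s for _, s, _ in grades})

    by_submission = defaultdict(list)
    by_grader = defaultdict(list)
    for g, s, y in grades:
        by_submission[s].append((g, y))
        by_grader[g].append((s, y))

    # Initialize true scores as the plain mean of each submission's peer grades.
    true_score = {s: float(np.mean([y for _, y in by_submission[s]])) for s in submissions}
    bias = {g: 0.0 for g in graders}
    reliability = {g: 1.0 for g in graders}  # inverse noise variance

    for _ in range(n_iters):
        # Update each grader's bias and reliability from their residuals.
        for g in graders:
            residuals = [y - true_score[s] for s, y in by_grader[g]]
            bias[g] = float(np.mean(residuals))
            var = float(np.var([r - bias[g] for r in residuals])) + 1e-3
            reliability[g] = 1.0 / var
        # Update each true score as a reliability-weighted average of
        # bias-corrected peer grades.
        for s in submissions:
            num = sum(reliability[g] * (y - bias[g]) for g, y in by_submission[s])
            den = sum(reliability[g] for g, _ in by_submission[s])
            true_score[s] = num / den

    return bias, reliability, true_score

if __name__ == "__main__":
    # Toy example: "bob" systematically grades about two points low.
    grades = [
        ("alice", "hw1", 8), ("alice", "hw2", 9),
        ("bob",   "hw1", 5), ("bob",   "hw2", 6),
        ("carol", "hw1", 7), ("carol", "hw2", 8),
    ]
    bias, reliability, scores = estimate_bias_reliability(grades)
    print(bias)
    print(scores)
```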
I will also discuss scalable feedback in the context of coding assignments, which have more structure than arbitrary open-ended assignments. I will outline a novel way to decompose online coding submissions into a vocabulary of “code phrases”. Based on this vocabulary, we have built a queryable index that allows fast searches over the massive dataset of student homework submissions. To demonstrate the utility of our homework search engine, we index over a million code submissions from users worldwide in Stanford’s Machine Learning massive open online course (MOOC) and then (a) semi-automatically learn shared structure among homework submissions and (b) generate specific feedback for student mistakes.
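To make the “code phrase” and index idea concrete, here is a minimal sketch that treats a phrase as a small AST fragment (a node type together with its children's types) and builds an inverted index from phrases to submission ids. The real MOOC submissions are Octave/MATLAB and the actual decomposition is richer, so the representation, class names, and ranking here are illustrative assumptions only.

```python
import ast
from collections import defaultdict

def code_phrases(source):
    """Extract a bag of 'code phrases' from a Python submission.

    A phrase is approximated as a parent AST node type plus the types of its
    direct children, e.g. "Return(BinOp)". This stands in for the richer
    decomposition described in the talk.
    """
    phrases = set()
    tree = ast.parse(source)
    for node in ast.walk(tree):
        children = [type(c).__name__ for c in ast.iter_child_nodes(node)]
        if children:
            phrases.add(type(node).__name__ + "(" + ",".join(children) + ")")
    return phrases

class PhraseIndex:
    """Inverted index mapping code phrases to the submissions containing them."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, submission_id, source):
        for p in code_phrases(source):
            self.postings[p].add(submission_id)

    def query(self, source):
        """Return submission ids ranked by how many phrases they share with the query."""
        counts = defaultdict(int)
        for p in code_phrases(source):
            for sid in self.postings.get(p, ()):
                counts[sid] += 1
        return sorted(counts, key=counts.get, reverse=True)

if __name__ == "__main__":
    index = PhraseIndex()
    index.add("s1", "def f(x):\n    return x * x\n")
    index.add("s2", "def g(y):\n    return y + 1\n")
    # The query squares its argument, so s1 should rank first.
    print(index.query("def h(z):\n    return z * z\n"))
```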
This is joint work with Chris Piech, Andy Nguyen, Zhenghao Chen, Chuong Do, Andrew Ng, Daphne Koller, and Leonidas Guibas.