While the popularity of peer assessment in Computer Science has increased in recent years, the validity of peer-assessed marks remains a significant concern to instructors and a source of anxiety to students. We report here on a large-scale study (1,500 students and 10,000 reviews) involving three introductory programming classes, in which grades and feedback comments were recorded for both student and tutor reviews of novice programs. Using a paired analysis, we compare the quantitative marks given by students with those given by tutors, for both functional and non-functional aspects of the programs. We also report on an analysis of the lexical sophistication of the feedback comments.
We find good correlations between student and tutor marks, which improve with student ability and experience; marks for functional aspects correlate more closely than marks for non-functional aspects. Our lexical sophistication analysis suggests that student feedback can be as good as, or better than, tutor feedback. We also observe that a policy of selecting tutors based on their previous peer assessment performance leads to a large improvement in the quality of tutor feedback.