As a TA, I did something similar: we asked students to grade their own homework against a provided rubric, and then we spot-checked 1/4 of the students (sampled without replacement) to catch and punish lying about the grade they deserved. We didn’t punish a few honest disagreements over the rubric, but if the inflation was blatant, we checked that student's assignments every time from then on (and told them so). I think if it was bad enough we could have reported them.
This saved a bunch of time on actual grading and forced us to write a very clear and unambiguous rubric (which in turn required a very clear homework). It also demonstrated to the students that grading was not arbitrary.
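For anyone who wants to try this, the spot-check selection described above could look roughly like the sketch below. The function name `pick_spot_checks`, the `flagged` set, and the choice to size the random sample as a quarter of the whole class are my assumptions for illustration, not the exact procedure we used.

```python
import random

def pick_spot_checks(students, flagged, fraction=0.25, rng=None):
    """Return the students whose self-graded homework gets re-checked.

    Hypothetical helper: `flagged` holds students previously caught
    inflating their grade, who are always checked. The remaining checks
    are drawn without replacement from everyone else.
    """
    rng = rng or random.Random()
    always = [s for s in students if s in flagged]
    pool = [s for s in students if s not in flagged]
    # Assumption: the sample size is a fraction of the whole class,
    # capped by how many unflagged students are available.
    k = min(len(pool), round(fraction * len(students)))
    # random.sample draws k distinct items, i.e. without replacement.
    return always + rng.sample(pool, k)

# Usage with made-up student IDs.
roster = [f"student{i:02d}" for i in range(20)]
to_check = pick_spot_checks(roster, flagged={"student03", "student11"},
                            rng=random.Random(0))
```

Seeding the generator is only there to make the example repeatable; in practice you would not want students to be able to predict the draw.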
Several universities [1] scale out personalized instruction and interactive grading by hiring students from previous cohorts and paying them either in course credit (taking a "course" that involves teaching students in the current cohort) or at a low rate (possibly subsidized by financial aid) comparable to other on-campus student jobs.
How do you justify the fact that only some of the students get the pleasure of an in-person grilling? Or, am I completely misunderstanding the process you're going to be using?
In my plan, each student is interviewed at least once. Ideally more than once by the same teacher, so the teacher can get to know them a little better, spot areas where the student needs more help, etc.
There's still a scaling problem, but I think it makes the ~200 student classes we have now more feasible than 100% autograding. I also like the other commenter's suggestion of coming back to interview certain students each time, if they need it.
Is this about pleasure or about measuring knowledge?
A lot of what you learn, and the way you learn it, isn't necessarily pleasant, but frequently you still have to do it, and you only discover 20 years later why it was needed.
No, it's about why only a subset of students get singled out for extra scrutiny, literally arbitrarily, as the selection procedure itself is defined as "random sampling."
Random sampling is an effective method for inferring, from what is measured in a smaller sample, the same information about the larger population, to a degree of confidence that depends on the sample size and the known distribution of what is being measured. These concepts are fundamental to statistics.
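To make the "degree of confidence" part concrete, here is a minimal sketch that estimates a class-wide proportion from a sample. It uses a normal-approximation interval with a finite population correction. The function name and the numbers in the usage lines (a class of 200, 50 sampled, 5 found lacking) are illustrative assumptions, not data from this thread.

```python
import math

def proportion_interval(sample_size, hits, population_size, z=1.96):
    """Approximate 95% interval for a population proportion.

    Normal approximation with a finite population correction; it is
    only a rough guide when the sample or the hit count is small.
    """
    p = hits / sample_size
    # The correction shrinks the error when the sample is a large
    # share of the population (sampling without replacement).
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    se = math.sqrt(p * (1 - p) / sample_size) * fpc
    return max(0.0, p - z * se), min(1.0, p + z * se)

# Hypothetical numbers: 50 of 200 students sampled, 5 with a problem.
low, high = proportion_interval(sample_size=50, hits=5, population_size=200)
print(f"estimated share of the class: {low:.1%} to {high:.1%}")
```

A larger sample narrows the interval, which is the trade-off between time spent interviewing and how much you can say about the students you did not interview.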
It obviously doesn't scale, so we'll use random sampling.