Right, the thing about applications like the digit set and similar OCR problems is that we can independently build a model of "acceptable" translations and rotations and validate it reasonably easily, because we understand the domain well (not that you can't still cause trouble this way). That certainly isn't true of data sets in general.
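To make that concrete, here is a rough sketch of what I mean by a model of "acceptable" translations for digit images. The function names, the 2-pixel bound, and the "no ink lost" check are my own assumptions for illustration, not a standard recipe, and rotations are left out to keep it NumPy-only.

```python
import numpy as np

def shift_image(img, dy, dx):
    """Translate a 2-D image by (dy, dx) pixels, filling vacated pixels with 0."""
    h, w = img.shape
    out = np.zeros_like(img)
    # Slices covering the region that stays inside the frame after the shift.
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def keeps_all_ink(original, shifted):
    """Assumed validity check: the shift must not push any ink off the image."""
    return shifted.sum() == original.sum()

def acceptable_shifts(img, max_shift=2):
    """Return every shifted copy within +/- max_shift pixels that passes the check.

    max_shift=2 is a hypothetical bound chosen from domain knowledge about digits.
    """
    offsets = range(-max_shift, max_shift + 1)
    candidates = (shift_image(img, dy, dx) for dy in offsets for dx in offsets)
    return [s for s in candidates if keeps_all_ink(img, s)]
```

The point is that for digits both the bound and the check are easy to justify by eye. For an arbitrary data set it is usually not obvious what the analogue of either would be.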