Video is just complicated. The libraries have to be native for performance reasons (even if you don't need the performance, the kind of person who writes a video decoder from scratch will), so they have to be built as a C extension. That means you need a C build environment, with all the complexity of setting one up. And then the API will be "bent" around high-performance patterns, which might not be obvious if you're new to the field. I remember when I first did some graphics programming, I couldn't understand why people were using char* all over the place instead of a 2D array class containing instances of a Color class with all sorts of fancy getters and setters. After a few years of working with graphics code, that question just doesn't seem reasonable anymore; my frame of reference has changed. If I were to write docs for a new library I had made, it would be hard for me to write for a newcomer, even if I were a good writer and genuinely tried. Many makers of open source libraries are not good writers and do not try (and no one's paying them to, so why should they, if they don't want to?).
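To make the char* point a bit more concrete, here is a rough sketch in Python of the two ways of holding a frame. It is not taken from any particular library: the Color class, the tiny frame size, the 3-bytes-per-pixel RGB layout, and the pixel_offset helper are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Illustrative values only; real frames are far larger.
WIDTH, HEIGHT = 4, 3
BYTES_PER_PIXEL = 3  # assumed layout: R, G, B, one byte each


@dataclass
class Color:
    """Hypothetical 'friendly' pixel type with named channels."""
    r: int
    g: int
    b: int


# Layout a newcomer might expect: a 2D grid with one object per pixel.
grid = [[Color(0, 0, 0) for _ in range(WIDTH)] for _ in range(HEIGHT)]
grid[1][2].r = 255  # set the red channel of the pixel at x=2, y=1


# Layout performance-minded code tends to use: one flat buffer of raw
# bytes, rows stored one after another (the Python cousin of a char*).
frame = bytearray(WIDTH * HEIGHT * BYTES_PER_PIXEL)


def pixel_offset(x: int, y: int) -> int:
    """Index of the first byte (red) of pixel (x, y) in the flat buffer."""
    return (y * WIDTH + x) * BYTES_PER_PIXEL


offset = pixel_offset(2, 1)
frame[offset] = 255  # same pixel, same channel, but via index arithmetic

# The whole frame can be copied or handed to native code as one block of
# memory; the grid of objects would have to be walked pixel by pixel.
snapshot = bytes(frame)
```

The flat buffer is less pleasant to read, but the entire frame is one contiguous chunk of memory that can be passed around without touching individual pixels, which is roughly why an API ends up shaped around it.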
So yeah, basically, I think it is just a hard problem in an easy problem's clothing.