I recently stumbled upon this talk by professional real-time captioner Mirabai Knight about why human captioning (still) matters:
I highly recommend watching the talk in its entirety. I found it super interesting and learned a lot. However, if you have only 5 minutes, I suggest starting at minute 10:38, which contains my personal highlight.
Mirabai gets to the point of what has been bothering me for a long time. All around me, automatic captioning is emerging. Besides YouTube’s own captioning, there is Google’s Live Transcribe app, and many others are coming up. Whenever I used or tested them, my impression was: yes, it is improving (compared to a few years back), but it is still not accurate enough for me. Whenever I miss one of the words in a sentence, I can be pretty sure that the system fails at exactly that word as well. And then it is no help at all.
Many people say “Yeah, but 90% is awesome, isn’t it?” It took me a while to realize that this statement comes either from hearing people who tested it for 5 minutes and did not suffer much from the words it got wrong, because they could hear them anyway, or from people with a higher degree of hearing loss who themselves miss more words than I would (with a medium hearing loss). For them it is more of an improvement and “better than nothing,” and I understand the higher appreciation.
However, Mirabai nicely describes that 90% accuracy is not 100%. In an average sentence of ten words, 90% accuracy means one of those ten words is omitted or messed up. If that happens to be the most important word, understanding the sentence becomes really difficult, and it takes extra brain power to make up for it. And if that repeats with the next sentence, and the one after, it isn’t actually a great experience.
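To see how quickly those single-word errors add up, here is a quick back-of-the-envelope calculation. It assumes (a simplification on my part) that each word is recognized independently with 90% accuracy:

```python
# Simplifying assumption: each word is recognized correctly with
# probability 0.9, independently of the others.
accuracy = 0.9
words_per_sentence = 10

# Expected number of wrong or missing words per ten-word sentence
expected_errors = words_per_sentence * (1 - accuracy)

# Probability that an entire ten-word sentence comes out error-free
p_clean_sentence = accuracy ** words_per_sentence

print(f"expected errors per sentence: {expected_errors:.1f}")
print(f"chance of an error-free sentence: {p_clean_sentence:.0%}")
```

So on average every sentence has one broken word, and only about a third of sentences come through completely intact. That matches my experience much better than “90% is awesome” does.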
I am so happy that finally someone explains this well. Thank you, Mirabai.