
Bowerbird@aol.com wrote:
the reason people will slog through the problems with voice recognition anyway is that it will be a lot more _fun_ and _easy_ to just _read_ a book through rather than to sit inside an editing system, and that will make the difference.
Bowerbird, how often do you read books aloud? My grandmother, who grew up in a time when reading to others was an essential skill, taught me to read aloud as a child, and I spent many hours reading aloud to her and to my mother. I actually enjoyed doing it, and I often wish I had more opportunities to do so now.
But reading a book aloud is an extremely slow and inefficient way to get text into electronic form. I could *type* a book in faster than I could read it aloud, and scanning it would be faster still. Reading out loud is tiring, even when you're used to it. If you have only read, say, picture books to your children, you may not realize this. You need to rest your voice after an hour or so. And it takes hours and hours to read an ordinary book out loud, never mind something like The Lord of the Rings (and, yes, I have read all of The Lord of the Rings out loud. Twice).
Scanning is boring, yes, but it is also fast. And it doesn't make your throat hurt at the end of a session.
over the past 5 years, some 35,000 people signed up at d.p.
roughly 10% -- about 3,500 -- were around when d.p. reset its subscription base a while back. those were the top 10%, so that wasn't a bad thing, but it does go to show that the o.c.r. route is just a little bit too trying for the average bear. even when you distribute out the work.
but hey, if that other 90% could do their part to help out by simply recording a book -- they _did_ once express enough interest in the cause to sign up, remember -- then maybe they could have been retained as helpers...
and maybe a whole order of magnitude of more helpers could be _recruited_ if the means of helping were so fun.
with librivox, people are already recording old books. audiobooks, always popular, are getting even more so. podcasting is growing the base of recording experience (and audience) in the user-population at a _huge_ rate.
and, for those of us keeping track, there has already been a message posted on the distributed proofing forums from a person who reported using voice-recognition software _within_the_current_d.p._system_. now that's dedication.
and as the form-factors of our machines continue to shrink, voice-recognition will become more and more important, and more ingrained, and some people will rely on it entirely.
and speaking of librivox, it's important to keep in mind that a _recording_ retains value even _after_ it has been turned into digital text. heck, many people will prefer the .mp3 to the .txt. there's sure a lot more player-hardware out there for the .mp3.
moreover, when a person creates a recording, that product is _steeped_ in their contribution. in their own _voice_, for crying out loud. can it get much more personal than that? to some people, that will surely be more satisfying than the simple credit-line at the top of a project gutenberg e-text...
and hey, it might mean a lot more to the _end-user_ as well!
i can tell you that i've looked at a lot of texts from jon ingram. lots and lots of them. and they almost always look very nice. but none of them has had the impact of the bit he recorded for librivox, where his accent had me muttering to myself, "hey, i forgot, that bloody bloke is from _england_, isn't he?"
there's something very endearing and personal about a voice. even one with a heavy english accent. ;+)
so a person who records a book is giving us _two_ products; one is a route to obtaining digital text via voice-recognition, and the other is a recording of that book in a human voice. it might be that down the line, the second dwarfs the first.
it's also quite important to remind ourselves that these two products are complementary, not competing with each other.
and it's not hard to imagine that the recording will become _especially_ useful when it gets combined with page-scans. a recording of each page playing when the scan is displayed might become the most typical kind of "book" in the future!
likewise, it does _not_ have to be either/or between o.c.r. and voice-recognition; we can instead make the two work together.
we could do o.c.r. on the scans, and then cross-check the o.c.r. against the voice-recognition results, then concentrate on the differences to intelligently remove errors from _both_ versions.
we would expect homonym problems in the voice-recognition, for instance, and scannos in the o.c.r., so we could control for that.
anytime you combine two different methods for the same result, they can serve as a useful cross-check on each other. bingo.
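a minimal sketch of that cross-check, in python, assuming both versions are already available as plain text. the function names and the sample sentences are made up for illustration, and word-level diffing with the standard library's difflib is just one plausible way to line the two versions up, not a description of any existing d.p. or project gutenberg tool.

```python
import difflib
import re


def words(text):
    # lowercase word tokens; apostrophes are kept so "don't" stays one token
    return re.findall(r"[a-z0-9']+", text.lower())


def disagreements(ocr_text, speech_text):
    """Return (ocr_words, speech_words) pairs wherever the two versions differ."""
    a, b = words(ocr_text), words(speech_text)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# hypothetical sample: a scanno in the o.c.r., homonyms in the transcript
ocr = "The modem reader will not hear their voices"
speech = "the modern reader will not here there voices"

for ocr_words, speech_words in disagreements(ocr, speech):
    print(ocr_words, "vs", speech_words)
```

on this sample, the only spots flagged are "modem" against "modern" (a likely scanno) and "hear their" against "here there" (likely homonyms), so a human only has to look at those two places instead of the whole sentence.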
in case you didn't know, some of the people who are obtaining the highest accuracy in their e-texts use text-to-speech to get it. what i'm talking about here can be viewed as the flip-side of that.
so, in summary, if you're "rolling on the floor laughing" about voice-recognition and the possibilities it offers to digitizers, you show your lack of vision. there's no other way to say it...
of course, your loss is the lurkers' gain, because it gave me a reason to explain.
-bowerbird
------------------------------------------------------------------------
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d
--
Meredith Dixon <dixonm@pobox.com>
Check out *Raven Days* <www.ravendays.org>
For victims and survivors of bullying at school.
And for those who want to help.