Re: [gutvol-d] the newest d.p. iteration

16 Jun 2006

Bowerbird@aol.com wrote:
...
the reason people will slog through the problems anyway
with voice recognition is because it will be a lot more _fun_
and _easy_ to just _read_ a book through rather than to sit
inside an editing system, and that will make the difference.

Bowerbird, how often do you read books aloud?  My grandmother, who grew
up in a time when reading to others was an essential skill, taught me to
read aloud as a child, and I spent many hours reading aloud to her and
to my mother.  I actually enjoyed doing it, and I often wish I had more
opportunities to do so now.  But reading a book aloud is an extremely
slow and inefficient way to get text into electronic form.  I could
*type* a book in faster than I could read it aloud, and scanning it is
faster still.

Reading out loud is tiring, even when you're used to it.  If you have
only read, say, picture books to your children, you may not realize
this.  You need to rest your voice after an hour or so.  And it takes
hours and hours to read an ordinary book out loud, never mind something
like The Lord of the Rings (and, yes, I have read all of The Lord of
the Rings out loud.  Twice.).

Scanning is boring, yes, but it is also fast.  And it doesn't make your 
throat hurt at the end of a session.
...
over the past 5 years, some 35,000 people signed up at d.p.
roughly 10% -- about 3,500 -- were around when d.p. reset
its subscription base a while back.  those were the top 10%,
so that wasn't a bad thing, but it does go to show that the
o.c.r. route is just a little bit too trying for the average bear.
even when you distribute out the work.
but hey, if that other 90% could do their part to help out
by simply recording a book -- they _did_ once express
enough interest in the cause to sign up, remember --
then maybe they could have been retained as helpers...
and maybe a whole order of magnitude of more helpers
could be _recruited_ if the means of helping were so fun.
with librivox, people are already recording old books.
audiobooks, always popular, are getting even more so.
podcasting is growing the base of recording experience
(and audience) in the user-population at a _huge_ rate.
and, for those of us keeping track, there has already been
a message posted on the distributed proofing forums from
a person who reported using voice-recognition software
_within_the_current_d.p._system_.  now that's dedication.
and as the form-factors of our machines continue to shrink,
voice-recognition will become more and more important,
and more ingrained, and some people will rely on it entirely.
and speaking of librivox, it's important to keep in mind that
a _recording_ retains value even _after_ it has been turned into
digital text.  heck, many people will prefer the .mp3 to the .txt.
there's sure a lot more player-hardware out here for the .mp3.
moreover, when a person creates a recording, that product
is _steeped_ in their contribution.  with their own _voice_,
for crying out loud.  can it get much more personal than that?
to some people, that will surely be more satisfying than the
simple credit-line at the top of a project gutenberg e-text...
and hey, it might mean a lot more to the _end-user_ as well!
i can tell you that i've looked at a lot of texts from jon ingram.
lots and lots of them.  and they almost always look very nice.
but none of them has had the impact of the bit he recorded
for librivox, where his accent had me muttering to myself,
"hey, i forgot, that bloody bloke is from _england_, isn't he?"
there's something very endearing and personal about a voice.
even one with a heavy english accent.         ;+)
so a person who records a book is giving us _two_ products;
one is a route to obtaining digital text via voice-recognition,
and the other is a recording of that book in a human voice.
it might be that down the line, the second dwarfs the first.
it's also quite important to remind ourselves that these two
products are complementary, not competing with each other.
and it's not hard to imagine that the recording will become
_especially_ useful when it gets combined with page-scans.
a recording of each page playing when the scan is displayed
might become the most typical kind of "book" in the future!
likewise, it does _not_ have to be either/or between o.c.r. and
voice-recognition; we can instead make the two work together.
we could do o.c.r. on the scans, and then cross-check the o.c.r.
against the voice-recognition results, then concentrate on the
differences to intelligently remove errors from _both_ versions.
we would expect homonym problems in the voice-recognition,
for instance, and scannos in the o.c.r., so we could control for that.
anytime you combine two different methods for the same result,
they can serve as a useful cross-check on each other.  bingo.
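
The cross-check described above can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions, not
anything d.p. or the poster actually runs: the function names, the tiny
HOMOPHONES table, and the sample sentences are all hypothetical, and
the comparison is a plain word-level diff using the standard library's
difflib.  The idea is simply that words where the two transcripts agree
are left alone, and only the disagreements are handed to a human.

```python
import difflib

# Hypothetical homophone pairs, for illustration only; a real table would
# be far larger and is not specified anywhere in this thread.
HOMOPHONES = {frozenset(("bear", "bare")), frozenset(("sea", "see"))}


def cross_check(ocr_text, speech_text):
    """Return (ocr_span, speech_span) pairs where the transcripts differ."""
    ocr_words = ocr_text.split()
    speech_words = speech_text.split()
    matcher = difflib.SequenceMatcher(
        a=ocr_words, b=speech_words, autojunk=False
    )
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only the spans that disagree
            differences.append(
                (" ".join(ocr_words[i1:i2]), " ".join(speech_words[j1:j2]))
            )
    return differences


def likely_source(ocr_span, speech_span):
    """Guess which transcript is more likely wrong for one disagreement."""
    if frozenset((ocr_span, speech_span)) in HOMOPHONES:
        # Both spellings sound alike, so the speech side is the suspect.
        return "speech (possible homophone)"
    return "unknown (check the page scan)"


if __name__ == "__main__":
    ocr = "tbe old man could not bear the sea"
    speech = "the old man could not bare the see"
    for ocr_span, speech_span in cross_check(ocr, speech):
        print(ocr_span, "|", speech_span, "|",
              likely_source(ocr_span, speech_span))
```

On the sample sentences this flags three single-word disagreements and
marks two of them as probable homophone slips on the speech side; the
third ("tbe" against "the") is left for a person to settle against the
scan.  A fuller version would probably also want a list of known
scannos, which this sketch does not attempt.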
in case you didn't know, some of the people who are obtaining
the highest accuracy in their e-texts use text-to-speech to get it.
what i'm talking about here can be viewed as the flip-side of that.
so, in summary, if you're "rolling on the floor laughing" about
voice-recognition and the possibilities it offers to digitizers,
you show your lack of vision.  there's no other way to say it...
of course, your loss is the lurkers' gain,
because it gave me a reason to explain.
-bowerbird
-- 
Meredith Dixon <dixonm@pobox.com>
Check out *Raven Days* <www.ravendays.org>
For victims and survivors of bullying at school.
And for those who want to help.