keith-

most of the points you made in your posts this morning
are ones which i feel do not need a response immediately,
if at all, so if i get to them this weekend, i'll write a reply...

and if not...  not...

the one exception is this...

i said:
>   >   so the first tool you'll use for digitization is
>   >   your old friend, your trusty wordprocessor...

keith said:
>   Are you really, advising people to
>   use bloat ware, MS and OO.
>   ;-))))

first, yes, i see the smiley, and i respond under that light...

this series consists of reviews of e-book digitization tools.

your wordprocessor is the first tool in your toolbox, so yes,
indeed, i am really advising that people consider it as such.

the wordprocessor serves a number of different functions.

it's like a crescent-wrench -- it can fit a nut of any size...

for that reason, the crescent-wrench is extremely valuable,
undoubtedly the single most-important wrench in the box.
>   http://en.wikipedia.org/wiki/Adjustable_spanner

nonetheless, most mechanics almost always use a wrench
that's more specialized for the exact task at hand, such as
a socket-wrench with a head that's the same size as the nut.
>   http://en.wikipedia.org/wiki/Wrench

likewise, even though we _could_ be using a wordprocessor
-- my "sweet grapes" series in october took that approach --
this series looks at tools _specialized_ for the tasks at hand.

that's why much of the focus here is on digitization _tasks_.

if you wanna review _tools_, you need to have a good handle
on what you want those tools _to_do_ -- what to accomplish.

the _overarching_ requirements here are simple to list:

1.  obtain the text (via scanners and o.c.r. software)
2.  clean the text (the first set of tasks for our tools)
3.  turn the text into e-books (the second set of tasks)

i dispensed with the first requirement rather quickly, since
that's not a job most of us need to do these days, thanks to
the big scanning projects from google and internet archive.

the second requirement is also fairly straightforward, in that
you _could_ do the job by using your favorite wordprocessor.

more to the point, though, was to list the tasks there:
2a.  do a spellcheck
2b.  fix spacey punctuation
2c.  restore styling, e.g., italics
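
to make task 2b concrete, here is a minimal sketch in python.
it is only an illustration, not a tool reviewed in this series.
the function name and the exact rules are my own assumptions:
it drops spaces and tabs that o.c.r. left before , ; : ! ? and
before a lone period, and it leaves a spaced ellipsis untouched.

```python
import re

def fix_spacey_punctuation(text):
    """remove stray spaces/tabs before closing punctuation (hypothetical helper)."""
    # "word ," -> "word,"  (same for ; : ! ?)
    text = re.sub(r"[ \t]+([,;:!?])", r"\1", text)
    # "word ." -> "word."  but leave " ..." alone (assumed to be an ellipsis)
    text = re.sub(r"[ \t]+\.(?!\.)", ".", text)
    return text

if __name__ == "__main__":
    print(fix_spacey_punctuation("hello , world ."))
```

a real tool would also have to deal with quotemarks, decimals
like " .5", and so on, which this sketch deliberately ignores.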

once you're aware of the exact tasks, you can evaluate tools.

for instance, the proofing rounds at distributed proofreaders
exist to complete tasks 2a and 2b.  and the formatting rounds
have the job of doing 2c, and _some_ tasks in requirement 3.

it is this backdrop of knowing the tasks that need to be done
which allows me to rail loudly about the inefficiencies of d.p.
they're wasting far too many human resources to accomplish
these relatively-simple tasks; they could be doing much more.

anyway... once you know the tasks, you can evaluate the tools.

in that regard, let's list the tasks for the third requirement.

3a.  tagging the structural aspects of the text
3b.  converting the text into .html (intermediate and final)
3c.  converting intermediate .html into e-book output-files

i'm gonna cut directly to the chase, and tell you that there are
scripts to accomplish the 3b and 3c conversion steps, and thus
the main focus from here on will be _tagging_the_structure_...
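
just to show the general shape of a 3b conversion step, here is
a minimal sketch in python.  it is _not_ one of the scripts that
i mentioned; the function name and the input conventions are
assumptions made for illustration only: paragraphs are separated
by blank lines, and _underscores_ around a phrase mark italics.

```python
import html
import re

def text_to_html(text):
    """turn plain text into simple .html paragraphs (hypothetical sketch)."""
    # assumption: one or more blank lines separate paragraphs
    blocks = re.split(r"\n\s*\n", text.strip())
    out = []
    for block in blocks:
        # rejoin hard-wrapped lines, then escape & < > and quotemarks
        body = html.escape(" ".join(block.split()))
        # assumption: _phrase_ means italics
        body = re.sub(r"_([^_]+)_", r"<i>\1</i>", body)
        out.append("<p>" + body + "</p>")
    return "\n".join(out)

if __name__ == "__main__":
    print(text_to_html("one _two_\nthree\n\nfour & five"))
```

the conversion can be this mechanical only because the structure
is already marked in the text, which is why the tagging in 3a
is where the real work is.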

distributed proofreaders is important to examine here,
since it _is_ an existing system of the type we want to make
-- because the way to test this stuff is in _the_real_world_,
and _not_ in the realm of some "theoretical discussions".

at d.p., some of 3a is done during formatting rounds, with the
rest accomplished by offline, non-distributed "post-processing".

a sizable minority of post-processors, including _lucy24_,
do their work in a wordprocessor or a general-purpose
text-editor.  lucy uses "subethaedit", a collaborative
text-editor that truly is lean and powerful.
>   http://www.codingmonkeys.de/subethaedit/

but most d.p. post-processors use an app called "guiguts".

guiguts was stagnant for a long time, but hunter monroe
has recently taken on the burden of updating it again, so
i'll give him a chance to step in now and tell you about it...

hunter, are you here?

-bowerbird