kent said:
> At the risk of coming into the middle:
ain't _that_ the truth! ;+)
unless i am prodded further, however, this will be
my last post on this thread. and this will also be
my last thread before i take a long break from here,
with the exception of my final report on "my antonia",
and reports on the book that miranda asked me to do...
and yes, people, it's a long post, because it's full of
detailed thinking and analyses. if that ain't for you,
not your cup'o'tea this afternoon, hit the 'delete' key,
don't go running off complaining to michael and greg...
> My experience is that the time consuming part
> of going from book to E-book is the proofreading.
ok, let's take a look at what you have to say.
> I use a Canon S230 3 megapixel camera
um, it is unlikely that's good enough.
this very issue of using a digital camera
rather than a scanner is being discussed
right this second on another listserve,
but the people there are talking about
5-megapixel and up, even a 10-megapixel.
i seriously doubt a 3-megapixel works well.
there are other concerns with a camera too.
are you using external lighting on the book?
if not, then your images will be substandard.
do you use a tripod? do you focus manually?
as always, photography can be a tricky thing.
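just to put some rough numbers on that -- and note the 6-by-9-inch
page size here is my own assumption, purely for illustration:

    # back-of-the-envelope: pixels needed to image one page at a given dpi.
    # the 6x9-inch page size is an assumption, not a fact about kent's books.
    width_in, height_in = 6, 9
    for dpi in (300, 600):
        megapixels = (width_in * dpi) * (height_in * dpi) / 1e6
        print(dpi, "dpi needs roughly", round(megapixels, 1), "megapixels")
    # prints: 300 dpi needs roughly 4.9 megapixels
    #         600 dpi needs roughly 19.4 megapixels

so even a true 300dpi image of an ordinary page is already more
than a 3-megapixel sensor can deliver.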
> I use a Canon S230 3 megapixel camera
> in a copy stand to get about 300 dpi scans.
there are also issues with the "copy stand".
some stands can be good. others, not at all.
are your scans showing curvature problems?
if so, that can be a killer to o.c.r. recognition.
unevenness in the brightness across the page?
that can significantly impair the o.c.r. too.
> I use a Canon S230 3 megapixel camera
> in a copy stand to get about 300 dpi scans.
300dpi ain't giving your o.c.r. app the best you could give it.
and isn't really creating what you'd want for archives.
it's much more time-consuming to scan at 600dpi,
and i think it's an open-question whether we want
to ask individuals like you to take that extra time,
or whether we wait to re-do scans until we have
the equipment that will make that process fly by.
but if we do take the 300dpi shortcut in scanning,
or by using a digital camera rather than a scanner,
then we need to do it with the full knowledge that
that decision _might_ impact o.c.r. accuracy, which
in turn _might_ result in more proofing work, which
_might_ end up actually _costing_ us time overall...
given the differences obtained from different scanners,
and different source-texts, and different o.c.r. programs,
and even from different _people_ doing the imaging --
if you've looked at a range of scanned books, you'll know
that different people exhibit a wide range of variability
in how carefully -- how straight, for example -- they position each page
-- it's very difficult to do the research we'd need to do
to find out exactly _how_much_ time we're wasting by
creating images at less-than-ideal resolution. but we
are _certainly_ wasting some time, in some situations
-- and perhaps a _lot_ of time in more than we know...
i'll put this as plainly as i can:
if we use inferior tools, we _will_ get inferior results.
if you take care to notice it, my statements about
"one evening" are hedged carefully with qualifiers
about "the right scanner", "the right manipulations",
"the right tools", and of course, "an average book"...
a lot of the people who scoff are people who are
using inferior tools, and getting inferior results.
people once thought heavier-than-air flight impossible.
it is, if you do it wrong. if you do _anything_ wrong.
and there are lots and lots of things you can do wrong.
but do _everything_right_ and flying is certainly possible.
now people fly every day in a plane, with no second thought.
and, to be clear, i'm talking about the amount of time
that it takes _after_ the page-scans are cleaned up.
as people have confirmed, the scanning and clean-up
will often take a very long time, all by themselves.
compared to _that_, proofing should be much faster.
before i leave the arena of the image-creation process,
i should say there is only _one_ "right" scanner out there
currently, in the range of personal affordability anyway.
it's that optic3600 that other people have mentioned here.
if you're using another scanner, you're wasting your time.
maybe you're not wasting a _lot_ of your time, perhaps
not enough to consider a $250 scanner as an "investment",
but you need to know that you _are_ wasting some time.
and if you use inferior tools, you will get inferior results.
one more thing, since carlo mentioned that sometimes
he gets inferior results because the p-book is shoddy.
hey, no question that a bad original will make bad scans.
the best answer to that problem, though, is very simple:
go find a cleaner copy of the book to get your scans from.
_somewhere_ out in the world, there _is_ a cleaner copy.
(if not, let that rare book be scanned by a professional!)
and if those bad scans are coming from somewhere else?
the same answer: go find a cleaner copy and scan _that_.
don't waste valuable time dealing with inferior images!
jon noring keeps talking about how wonderful it is that
distributed proofreaders keeps the scans for their books.
and it is. but the truth of the matter is that precious few
of those scans can be considered good enough for archival.
so those books will have to be rescanned in the future too.
let's hope that brewster and/or google are doing it right...
> I use Abby FineReader 5.0
v5 won't give you the accuracy that v7 will.
that's likely the _main_ reason that proofing
is taking you longer than it should. version 7
does a much better job than version 5. you will find
the upgrade price _is_ an excellent investment,
even if your time isn't really worth very much...
if you use inferior tools, blah blah blah...
> then comes a first pass proofreading,
> also fixing headers and footers.
> this is often 30 seconds per page.
um, no.
you're getting way ahead of yourself.
after scanning, you _first_ need to clean up the page-scans --
which means deskewing them, standardizing placement, etc.
almost every page is skewed to some degree. even though this
might not be apparent to you without careful analysis, it _is_
a factor with big impact on the o.c.r. accuracy. and furthermore,
when a person views page after page of the images, to read 'em,
even a small skew causes a subconscious weirdness for them.
as for placement, i mean making the left and top margins of each scan
identical. it's another factor affecting the reader's subconscious.
while it's less important to o.c.r. accuracy, it does sometimes
exert an impact there too, specifically in regard to the "zoning".
(and yes, you _do_ have to zone the pages to get the best o.c.r.)
there are a whole slew of other ways to manipulate the images.
i don't have enough experience with some of them to discuss them,
but there are some people over at distributed proofreaders who
seem to know a lot, including one person whose name escapes me,
who has formulated his "recipe" for enhancing page-scan images.
interestingly, it includes "blurring" the image at one point, which
certainly seems counterintuitive, but has the effect of converting
the one-pixel dots into two-pixel dots (or some such), which means
they don't get deleted in a later step where the image is downsized.
(d.p. resizes many scans to a size that works well in their system;
that also might be considered a shortcoming in their scan-archive.)
now some of the skeptics out there are probably muttering that
adding time to the imaging process to save it on the proofing process
isn't really "saving" us any time. and there is a little truth to that.
however, many of these image-cleanup steps can be _automated_,
so they are great candidates for inclusion in our ideal work-flow.
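to give a flavor of what "automated" could mean here, a minimal
python sketch using the pillow imaging library -- this is _not_
that person's d.p. recipe, just the deskew / blur / downsize idea,
and it assumes the skew angle has already been measured somehow:

    # illustrative only: straighten, slightly blur, then downsize a page-scan.
    from PIL import Image, ImageFilter

    def clean_scan(in_path, out_path, skew_degrees, target_width=1200):
        img = Image.open(in_path).convert("L")             # grayscale
        img = img.rotate(skew_degrees, expand=True,
                         fillcolor=255)                     # undo the measured skew
        img = img.filter(ImageFilter.GaussianBlur(0.5))     # fatten the one-pixel dots
        scale = target_width / img.width
        img = img.resize((target_width, int(img.height * scale)),
                         Image.LANCZOS)                     # downsize _after_ the blur
        img.save(out_path)

estimating the skew angle itself is the trickier part; real tools
derive it from the text lines on the page.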
even more importantly, it's vital that we start considering the scans
as a product in and of themselves. i fully agree with michael hart that
"a picture of a book is not an e-book". i too want raw, editable text.
but that doesn't mean a high-quality "picture of a book" isn't useful.
indeed, as pointed out here, it's the first step on the way to getting
the raw, editable text. and even after that, it continues to be useful.
people _will_ -- in the future -- desire to _replicate_ older books.
they will want print-outs that "look exactly like" the original book.
(_especially_ with books like those by william blake, for instance.)
and the best way to fill that demand is to have high-quality scans.
tomorrow's low-end printers will be 600dpi (if they aren't already).
so that's the resolution that we need to be aiming at with our scans.
yes, i fully realize that that is ridiculous in terms of the present,
when that kind of resolution overwhelms our memory and bandwidth,
as soon as we stop thinking about books at the individual-book level
and start thinking about them as collections in the tens of thousands.
which is precisely why i tell people now that 300dpi is acceptable,
even for the "archive" versions we're building for the here-and-now,
just as long as the 300dpi scans give us acceptable o.c.r. recognition.
but i give louder applause to the foresight to go to 600dpi right now.
(me, though, i'll go 300dpi unless/until i have a high-speed scanner,
expecting that _every_ book i'll scan will eventually be rescanned.)
> then comes a first pass proofreading,
> also fixing headers and footers.
> this is often 30 seconds per page.
ok, after you've cleaned up the scans, you can start the "proofing".
but there are lots and lots of different ways of "doing the proofing",
so let's be perfectly clear about exactly what we're talking about.
my software tool guides you through the processes a certain way,
so i'll be discussing that path. like i said, i plan to release my tool
in late spring, about the same time that the internet archive begins
to release scan-books from their toronto project, so if you prefer,
save this post until then, when my tool is out. that's fine with me.
on the other hand, if you want to consider my alternative processes,
to see which ones you can incorporate into your work-flow, read on.
i don't mean to frustrate anyone by saying "i've got a tool to do that"
before the tool is released. but if this advance information helps...
the first thing to do is a quick check that you got all the scans right.
my tool allows you to "thumb through" all of them, from start to end;
it displays them 2-up, so they look exactly like a p-book page-spread.
on the first pass, you'll just look at each spread, ensure it looks good.
on the second pass, you'll be looking at the text instead of the scans.
here, the 2-up view shows the text on one side, the scan on the other.
(my tool uses this 2-up view -- text next to its scan -- throughout.)
in this pass, you'll be formatting the text, to make it match the scan.
i'm still in the process of figuring out the best way to save o.c.r. output.
i hope my tool will do most of the formatting right automatically, but
when it doesn't, you will have to do the formatting yourself, manually.
"manually" doesn't mean "editing", like you'd do with a word-processor.
while that may be necessary on some rare occasions here, in general
there will be buttons that you can click to do most of the formatting.
for instance, say there's a block-quote that didn't get auto-formatted.
you would select the lines of the quote, and hit a "block-quote" button.
same for a poem that didn't get indented, or to right-justify an epigraph.
if your book is like most -- one boring page after another boring page --
there will be very little for you to do. for "my antonia", for instance,
the only real excitement here was with the occasional chapter heading.
for books that need heavy formatting, you should save that for later,
and move to the next step, which is where the tool starts "proofing".
my tool -- and the ones that are being developed by other people too --
takes the o.c.r. results and automatically makes some changes _before_
ever presenting them to you "for proofing". for the most part, these are
changes due to known recurring errors in the o.c.r. recognition routines,
so a person generally needs to build a list idiosyncratic to their setup.
(one person doing this had a list of over 400 rules with his old scanner,
but when he bought the optic3600, he was able to drop _half_ of them.)
there are also some checks that are generic to all setups. an example
would be replacing any "tbe" word with "the". undoubtedly a flyspeck
caused that nonsense error, so we would just change it automatically.
remember that all of these changes are taking place _before_ the text
has even been viewed yet by a human being, so if -- for some reason --
it _really_was_ "tbe" instead of "the" (because, for instance, it was
_this_ message that was being scanned), the human can change it back!
(well, if it actually was _this_ message being scanned, then the change
wouldn't be _automatic_, not with my tool anyway, because any "scanno"
that is in quotes is _not_ changed automatically, for just that reason.
but you get my point: it's safe to make automatic changes at this time,
because we know that human beings are still going to review the text.)
there are a number of other checks that happen at this time as well,
based on analyses of the text. i won't say much about these, because
that would give away too much about my program before its release,
but some of the obvious ones would include the one to "close up" the
spaces that o.c.r. often injects around punctuation. (or which, like in
"my antonia", are _really_ right there in the paper-book. an example
is on the very first page -- page 3 -- where "hands" is surrounded by
such floating quotemarks; it's clearly printed as " hands ". even jon,
with his focus on "fidelity", tightened up those floating quotemarks.)
this is where the o.c.r. of "mr," and "mrs," -- followed by a comma,
instead of a period (which i mentioned before) -- would get fixed.
all of these automatic changes are logged to a file, so they can be
reviewed by a human. except that review is often a waste of time,
because these changes are (or at least should be) totally obvious.
and if your review _does_ show an auto-change that was incorrect,
and therefore shouldn't have been made, you would seriously consider
_the_removal_of_ the rule that was responsible for that auto-change.
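to make the auto-change step concrete, here's a bare-bones python
sketch. the rule-list is tiny and my own, the quote-protection is
much cruder than what i described above, and the "log" is just a
list you could write out to a file afterward:

    # illustrative only: apply recurring-scanno rules before a human sees the text.
    import re

    RULES = [
        (r"\btbe\b", "the"),           # flyspeck turned "the" into "tbe"
        (r"\b(mr|mrs),", r"\1."),      # "mr," and "mrs," followed by a comma
        (r"\s+([,;:.!?])", r"\1"),     # close up a space injected before punctuation
    ]

    def auto_fix(line, log):
        if '"' in line:                # crude stand-in for "don't touch quoted scannos"
            return line
        for pattern, replacement in RULES:
            fixed = re.sub(pattern, replacement, line)
            if fixed != line:
                log.append((line, fixed))
                line = fixed
        return line

every change lands in the log, so the human review i just described
is still possible.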
also, kent, since you specifically mentioned headers and footers,
a good tool will let you retain those right up until the last minute.
they don't hurt anything -- and they help you keep your bearings --
so there's no need to delete 'em. the tool should de-emphasize them
-- mine displays them in gray, which makes 'em unobtrusive _and_
has the benefit of letting you know it identified them correctly --
but they're something you shouldn't have to spend time on in any way.
after the automatic changes comes the fun part. at this time, the app
does the hard work. again, i don't wanna steal thunder from my tool,
but the aim at this point in time is to present to you _each_line_ that
will need your attention (accompanied by the page-scan containing it),
and _only_ those lines that need your attention (i.e., no false-alarms).
that is, the tool seeks to find every line that has an _error_ in it, and
present it to you, alongside a page-scan, so you can correct the error;
and it seeks to show you _only_ those lines that really have an error,
so it doesn't waste your time showing you lines you don't need to fix.
that is the "secret sauce" in the tool -- to show you _every_ line that
you'll need to fix, and _only_ the lines that need fixing, and no others...
of course, that's the _ideal_, and we can only hope to _approach_ that.
after all, if the tool knew for certain where each and every error was,
we could just tell it to correct the errors itself, while we ate lunch.
so we scale our expectations back to something a bit more reasonable,
and have the program bring up -- to the best of its ability to do so --
each line it has some good reason to think we need to check.
to put this into a phrase, we have the tool look for _probable_ errors.
some of them might not actually be errors, but we go on probability...
we do want to find _all_ the errors, or as many as we reasonably can,
so we'll accept _some_ "false alarms". they're preferable to _missing_
an actual error. but at the same time, too many of 'em waste our time.
after all, the tool could just show us _every_ line and say "check it";
but that wouldn't be buying us any improved efficiency now, would it?
so the closer we get to the ideal -- show us every line we need to see,
and not one line that we _don't_ need to see -- the better we like it.
and if the tool tells us what is wrong with the line, and suggests the
correct fix, with a "yes, fix it" button we can click, so much the better.
to use an example from above, let's say that it offered to close up those
floating quotemarks around "hands" with just the click of a button. slick!
if we get _close_enough_ to the ideal -- where we are shown only lines
that have errors, and no others -- then we will have just sat there and
button-clicked, while our text became easily and adequately "proofed".
once we've corrected every line that needs to be corrected, we are done!
but we don't really have to get all the way to the ideal to be successful.
again, my "standard" is 1 error every 10 pages. and i expect to do better.
but if i attain that rate, i will consider my tool to have been "successful".
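just to make the "probable errors" idea concrete, here's a crude
python sketch of the kind of heuristics involved -- my own guess at
them, mind you, not the actual secret sauce, and the file-names are
only placeholders:

    # illustrative only: flag lines that probably contain an o.c.r. error.
    import re

    def probable_error(line, known_words):
        for word in re.findall(r"[A-Za-z]+", line):
            if word.lower() not in known_words:              # word nobody recognizes
                return True
        if re.search(r"[A-Za-z][0-9]|[0-9][A-Za-z]", line):  # digit glued to a letter
            return True
        if re.search(r"[,;:!?][A-Za-z]", line):              # punctuation with no space after
            return True
        return False

    lines = open("book.txt", encoding="utf-8").read().splitlines()
    known_words = set(w.strip().lower() for w in open("wordlist.txt", encoding="utf-8"))
    suspects = [(n, ln) for n, ln in enumerate(lines, 1) if probable_error(ln, known_words)]
    # ideally "suspects" holds every line with a real error, and nothing else.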
i should say specifically that _spell-check_ is an important part of this.
i find it laughable and ridiculous that distributed proofreaders does _not_
do a spell-check on the o.c.r. results before shipping them off to proofers.
your first reaction might be "why do a spell-check, since that is exactly
the job proofers are gonna be doing anyway?", and then go on to point out
how much time a spell-check would take, and various other considerations,
perhaps even launch into your spiel about "what a distributed process is".
(spare me; as a social psychologist, i understand it far better than most.)
heck, there is actually some debate over at distributed proofreaders about
whether a spell-check must be done _after_ the text comes out of proofing.
which explains why some e-texts are actually being posted now that have
obvious spelling errors in them that will _not_ pass a spell-check! awful!
except i'm talking about a very specific form of limited spell-check, namely
an analysis of the text that creates a list of all the words used in the book.
again, i won't explain how it works, but the purpose is to compile the words
that are _unique_ to the book. the best example is _names_of_characters_,
another good example is _words_and_phrases_from_a_foreign_language_.
and there are other categories. here are some examples from "my antonia":
> kolaches
> mamenka
> misterioso
> patria
> tatinek
> amour propre
> noblesse oblige
> Optima dies… prima fugit
> palatia Romana
> Primus ego in patriam mecum… deducam Musas
these words are used to create a _book-specific_spell-check_dictionary_:
words not in a normal spell-check dictionary, but which _are_ in the book.
i believe that every e-text should include such a word-list in an appendix.
first, it's useful, from the standpoint of end-users running a spell-check;
once this book-specific word-list is specified as an additional dictionary,
the entire file should pass through spell-check without pausing even once.
but moreover, it's just plain _fascinating_ to browse this list for a book.
it is a quickie road-map to the freakish extremes of that particular book.
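compiling that book-specific word-list is simple enough that i can
sketch it without giving anything away. (the file-names here are
placeholders; "english-words.txt" stands in for whatever general
dictionary you have on hand.)

    # illustrative only: list every word in the book that the general dictionary lacks.
    import re
    from collections import Counter

    text = open("book.txt", encoding="utf-8").read()
    dictionary = set(w.strip().lower() for w in open("english-words.txt", encoding="utf-8"))

    counts = Counter(w.lower() for w in re.findall(r"[a-zA-Z']+", text))
    book_specific = {word: n for word, n in counts.items() if word not in dictionary}

    for word in sorted(book_specific):
        print(word, book_specific[word])

keeping the frequency count alongside each word is what makes the
next trick possible: a name that shows up forty times is almost
certainly right; a near-twin that shows up once is almost certainly
a scanno.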
back to the job at hand... the word-list _is_ very useful to spell-check
text right out of o.c.r., and _before_ you commence the job of "proofing".
as a good example, remember those character-names? when you browse
an alphabetized version of the word-list, you'll see a name popping up in
a variety of variant forms, such as the possessive, the plural, and so on.
what you'll _also_ see, though, is an occasional place where the name
was misrecognized. boom! my tool allows you to click on it, and then
immediately jumps you to it in the text -- right alongside the image --
so you can verify that it's an error, and change it to the correct spelling.
(my plan is to have a button you can just click to make the correction.)
and if the error is obvious enough, you might not even go to the bother
of jumping to its location in the text, but rather just fix it immediately.
(remember, you can review these changes if you want down the line.)
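to make that variant-spotting concrete, one more little sketch --
again, my own illustration, not how my tool does it. it clusters
near-matches within the word-list from the previous sketch, so a
one-off misrecognition lands right next to the spelling it was
supposed to be:

    # illustrative only: surface rare near-twins of common book-specific words.
    import difflib

    def variant_report(book_specific):      # {word: count} from the previous sketch
        words = sorted(book_specific)
        for word in words:
            if book_specific[word] > 2:     # frequent spellings are presumed correct
                continue
            near = [w for w in difflib.get_close_matches(word, words, n=5, cutoff=0.85)
                    if w != word and book_specific[w] > book_specific[word]]
            if near:
                print(word, "(", book_specific[word], ")", "looks like", near)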
one of the test-books i used to develop my tool, way back when i first
started putting it together, was "the hawaiian romance of laieikawai".
(some of you know this e-text was in the group issued for dp#5000.)
i might've spelled that name wrong; face it, it's a pretty difficult one.
and, as you can imagine, the o.c.r. yielded quite a few variations of it!
there were literally _dozens_ of 'em, off by a letter or two (or more).
and not surprisingly, there were many hawaiian names, long and short,
in this text, and the o.c.r. came up with a number of variants on each!
although it was a pleasant story, and the o.c.r. was relatively clean
for the pages -- remarkably so, considering how bad the scans were --
those difficult names made the task of proofing a terrible nightmare,
so this text took a fairly long time to make it through all the rounds.
using my tool, however, all of the various scannos on those names
were easy to locate, and to correct, and that task was done quickly.
thinking about individual proofers, going to the trouble of correcting
each of those name scannos, independently, manually, i am appalled!
imagine how much of a hassle that was! what a tremendous waste!
but the scenario is even worse, at least for proofers who were careful,
and took their job seriously, because in order to check _whether_ the
name is spelled correctly or not, you must examine _every_instance_.
and that process is extremely error-prone. and fatiguing. and boring.
if the name was _at_least_ in the spell-check-dictionary for the file,
the spell-check on the d.p. page would show it was correctly spelled
(when it was) by failing to highlight it, and it would flag incorrect spellings.
but until it's in the dictionary, every occurrence must be scrutinized.
think how much of the proofer's time and energy could've been saved
if the instructions had said, "hey, ignore the hawaiian names,
we fixed them all in a global operation before you got these pages...".
to subject proofers to those difficulties, when a simpler method like this
isn't being developed and utilized, is almost an abuse of the good-will
those fine volunteers are giving you by donating their time and energy...
along about now, someone will say, "d.p. plans to install the capability
for a proofer to add a word to the spell-check dictionary for a book."
well, gee, after 6,000 books, i would _hope_ you finally got the idea!
and if you did it _right_, you'd create the book-specific dictionary
_automatically_, before the first page is sent to the first proofer.
i don't mean to sound high-handed and morally indignant and all that,
because i fully realize this is an ongoing learning process for everyone,
but hey, i guess it's easy to waste volunteer time if you have lots of it.
and it would address my concerns _greatly_ if the people-in-charge
(and the loudmouths who _act_ like they are) would be _accepting_
when well-intentioned people try to advise them on their processes.
but there is an active hostility over there to constructive criticism.
and i find that tragic. but i digress...
getting back to the matter of an _individual_ doing a book, though,
my objective for that situation is to make that person _efficient_.
so _this_ is the type of spell-checking that you need to do _first_,
one whose essential operating philosophy is to work on a _book-wide_basis_.
and then, only after that, yes, if you are an individual doing a book,
the next thing to do is a _regular_ old spell-check, the type that
goes from one questionable word to the next. the difference here --
and yes, one that my tool facilitates, of course -- is that when you
come to a questionable word, the _page-scan_ is shown right there.
some people actually say, "you should never do a spell-check, because
some words that will pop up are actually as they were in the original,
and they need to be left that way. so a spell-check is a waste of time,
because what you really need to do instead is a line-by-line comparison."
that's poppycock. _of_course_ that situation _can_ happen. sometimes.
and that's why you've got the scan there, to check the questionable word.
i don't advocate a blind "correction" to each and every questionable word.
and you must be able to easily add a word to the book-wide dictionary,
if you find that my tool is continually popping up a word that it shouldn't.
(but odds are that it would've been put in the dictionary in the prior step.)
but _nonetheless_, if you want to find words the o.c.r. _misrecognized_
-- and remember, that's the objective, to isolate _probable_ errors --
the best bet is to look at words that aren't in the spell-check dictionary.
all right, so that takes care of spell-check.
a final set of checks is then done that looks for anomalous situations;
some of these involve punctuation, infrequent juxtapositions, and so on.
there are some words that pass spell-check that you still want to view
-- they are called "stealth scannos" over at distributed proofreaders --
and they are one of the things that are checked in this final set.
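here's a tiny sketch of that stealth-scanno check; the pair-list is
just a small sample i picked for illustration:

    # illustrative only: words that pass spell-check but are common o.c.r. substitutions.
    STEALTH = {"arid": "and", "modem": "modern", "tum": "turn",
               "clay": "day", "carne": "came"}

    def stealth_hits(lines):
        hits = []
        for number, line in enumerate(lines, 1):
            for word, likely in STEALTH.items():
                if f" {word} " in f" {line.lower()} ":
                    hits.append((number, word, likely, line))
        return hits             # each hit still needs a human look at the scan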
and at that point, you're done with the text-cleanup. congratulations.
all in all, as well as i can tell from the testing that i've done so far,
you can expect the tool will present between 1% and 5% of the lines
in the text-file to you for one kind of close examination or another,
and perhaps 75% of those will require a "fix" of some kind or another,
assuming that you got relatively clean o.c.r. results in the first place.
that's a lot better than looking at 100% of the lines to "proof" them.
and that, my friends, is how you can do a whole book in a few hours.
unless, that is, you put aside that heavy formatting earlier. if so, now is the time to do it.
once again, you will page through the book, text and scan side by side,
doing whatever editing needs to be done so the text is formatted right.
without knowing what kind of formatting you'll need to do, it's hard to
tell you how you'll go about doing it. so you'll have to wait until you can
get some hands-on experience with the tool to see exactly how it'll work.
but it definitely will not be anything like the pseudo-markup over at d.p.
-- where, for example, /* and */ are used to bracket poetry and stuff --
and it will most certainly not be any form of x.m.l. or h.t.m.l. markup.
it _will_ be z.m.l. -- invisible markup that mimics the p-book page.
and as my tool gets more and more advanced, it will actually _display_
the text just exactly as it will be shown by the z.m.l. viewer-program.
and sooner or later, the two apps will morph into one. (bet on sooner.)
how complex can formatting get using z.m.l.? we'll have to see... ;+)
so now that you've gone through all the post-o.c.r. cleanup my tool does,
and the pages are nicely formatted so they resemble the original p-book,
what next? well, it's probably the case now that your text is _already_
clean enough to meet or exceed our standard of 1 error every 10 pages.
but i assume that if you're doing this book as an individual, it's because
_you_actually_have_an_honest_desire_to_read_or_re-read_this_book._
because _that_ is really the absolute _best_ reason to digitize a p-book.
so read it!
read it in my tool, which allows you to display the image of the page
right alongside the o.c.r. text for that page. keep in mind that you are
reading for the express purpose of catching any errors in the text, so
read carefully. at the same time, though, read for your enjoyment too!
it's only by being engrossed in the story that you'll catch some errors,
such as a word or a line inadvertently dropped. so become engrossed!
if you find an error, first _log_it_! keep records, to improve the tool.
_then_ use your word-processor to search the text for _similar_ errors.
if that search yields other instances, see what you can learn from them,
and expand your search based on anything you can generalize about them.
some errors are flukes -- a coffee-stain on the page, or what have you.
but others can be recurrent, and if you can pin down a recurrent error,
you will become much more efficient in your efforts to clean up a text.
finally, i will mention again that _text-to-speech_ can be _amazing_
in helping you to locate errors in a text that you might never have _seen_...
my tool will do text-to-speech; it'll even pronounce the punctuation,
if you select that option, so you can verify that in your text as well.
so i highly recommend that -- rather than reading the text to check it
for that final "proof" -- you _listen_ to it instead, via text-to-speech.
this has the added benefit that you can do it away from your computer.
a lot of people enjoy putting a book onto a walkman, or even an ipod,
and listening to it in the car, or at the exercise club, or out jogging.
that's fine. (just be conscientious about _remembering_ any errors!)
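for the curious, here's the simplest possible sketch of proofing by
ear, assuming the python pyttsx3 text-to-speech library -- this is
not the engine inside my tool, and the punctuation-speaking here is
deliberately crude:

    # illustrative only: read the text aloud, pronouncing some punctuation.
    import pyttsx3

    engine = pyttsx3.init()
    engine.setProperty("rate", 150)         # a little slower than normal speech

    for line in open("book.txt", encoding="utf-8"):
        spoken = line.replace(";", " semicolon ").replace(":", " colon ")
        engine.say(spoken)                  # utterances queue up...
    engine.runAndWait()                     # ...and play back here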
once you have done this final check, your "proofing" job is all finished.
say what? does this mean i don't advocate a line-by-line comparison?
isn't that what most people, like d.p., consider to _be_ "the proofing"?
well, let me put it this way: if you _want_ to do that, by all means, do!
do i think it's absolutely necessary? well, in most cases, absolutely not!
doesn't a failure to do that mean that you might release a text that has
some small errors in it? well, yes, it certainly does, but that is exactly
why i build the "continuous proofreading" step into my overall processes.
no matter how good a job you might do, certainty requires more eyeballs.
so if you're really feeling insecure, have other people read your file too.
better yet, have someone else process the book completely independently,
and compare their final file to yours. that should catch _every_ error.
but if an error hides through all of the tools, and withstands a reading
by an engrossed human and/or wasn't noticeable during text-to-speech,
then that error is insignificant enough that i'm not gonna worry about it.
i think it _should_ be corrected, and (due to "continuous proofreading")
that it eventually _will_be_ corrected. but i ain't gonna worry about it.
and considering the care i put into listserve posts, it's obvious i'm anal.
there are 6,272 words in 707 lines in this message. find the typo in it.
i circle the mistakes in everything i read, for the sheer fun of doing it.
so if i can live with that error, hey, you can probably live with it too...
once we're down to insignificant errors, our attention is much better spent
with a focus on digitizing additional books. i'll repeat, so it sinks in,
that if someone _wants_ to do line-by-line comparison, that's _great_.
but if we can get texts that are far-and-away error-free without it,
then _i_ have far better ways to spend my time, thank you very much.
and don't try to make out that i don't care about finding errors,
or that i'm talking about "something different" than what you mean,
and that's the only reason i say it can be done in just one evening.
because my processes will give just as accurate results as yours.
and i'll be happy to prove it by finding the errors in _your_ e-texts.
anyway, now you're done _proofing_, but you're not _completely_ done.
because there's just one more step before you can send your e-text out.
up until now, you might have had the text from each page in its own file.
(or maybe you had it all in one file, since my tool can work either way.)
but if you had them in separate files, they'll now need to be combined.
we also want to get rid of the headers and footers and make it all nice.
these are things my tool does for you -- mostly automatically --
but there are a few that do require some input from you, and some
others you have to monitor to make sure they are done correctly.
one example would be footnotes, which are moved to be end-notes.
another example is to make sure all headings are at the right level.
and when the end-line hyphenation is removed, you might be asked to
make decisions for the tool when it seeks your guidance on that job.
but for the most part, the tool will step you through all these tasks.
it assumes that you're not an expert at doing this, and it helps you.
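here's a rough sketch of this assembly step, heavily simplified --
the file-names are placeholders, the header-detection is naive (it
just drops the first line of every page), and a real tool would ask
you about the ambiguous hyphens instead of rejoining them all:

    # illustrative only: combine page files, drop running headers, rejoin hyphenation.
    import glob
    import re

    pages = []
    for path in sorted(glob.glob("page-*.txt")):
        lines = open(path, encoding="utf-8").read().splitlines()
        pages.append("\n".join(lines[1:]))  # naive: treat the first line as the header

    text = "\n".join(pages)
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)    # "exam-" + "ple" -> "example"

    open("book.txt", "w", encoding="utf-8").write(text)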
there isn't that much more for me to explain about this final step,
other than to mention that you _might_ want to execute this step
before you read through the book or listen to it via text-to-speech.
once you've concluded these steps, your file is a bona fide e-book.
congratulations! you've moved a book into the realm of cyberspace!
you can load your e-text into my z.m.l. viewer-program, and boom!,
you'll see that what you created is a high-powered electronic-book!
the headings are big and bold! your table-of-contents is hot-linked!
words that were italicized in the p-book, which my tool marked with
underscores like _this_, are again shown in all their italicized glory!
illustrations are displayed on the appropriate page, automatically,
and all you did was make sure their file-name was placed near that text.
after this step, future versions of my tool might perform conversions
of the e-text to other formats, like .html and .pdf and .rtf, if you want.
plans in that regard are still fairly tentative, and i might decide that
i will leave that matter to the end-reader using my viewer-program.
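just so you can picture what one of those conversions amounts to,
here's a trivial sketch that turns the underscore-marked italics
into .html italics. (illustrative only: this is not z.m.l., and it
is not my tool's output format.)

    # illustrative only: _word_ becomes <i>word</i> for an .html rendering.
    import re

    def italics_to_html(text):
        return re.sub(r"_([^_\n]+)_", r"<i>\1</i>", text)

    print(italics_to_html("words that were _italicized_ show up in all their glory"))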
your time might be better allocated by proceeding on to the next book.
after all, it was fun to do it, wasn't it? and it only took one evening!
> The real problem is my day job is using up most of my available
> concentration, so I don't feel up to spending too much time proofing.
well, yeah, there's no question that this job does take concentration.
there's really no way around that. i will say, however, that my tool
helps to _conserve_ your concentration by helping you to _focus_ on
the things that require your attention, and not the things that don't.
and that's really the big secret in making people more efficient here.
indeed, that's what enables you to do an average book in just one evening.
anyway, i have exposed enough flaws and gored enough sacred cows
in this post that i can feel the vilification efforts building already.
like i said, unless i am prodded, this is my last post in this thread.
and except for a few final reports on the other threads, i'm all done.
if those vilification efforts break out, though, and i am challenged,
i _will_ remain here to defend myself, as i stand behind this post...
otherwise, i'll be out of here until one of these tools is released,
either from me or from one of the other people working on them,
or until someone comes on here trying to tell you this job is hard.
it ain't, folks. it's easy. and people have been flying for decades...
the choice is
up to you, people...
-bowerbird