and so we come to the conclusion of our lessons in this thread,
where i've explained how to digitize a book quickly and easily...
you can find the materials we used for this series of lessons by
searching archive.org for the book, "booksculture00mabiuoft".
***
in our lesson before this one, i showed you this .pdf:
> http://zenmarkuplanguage.com/grapes124a.pdf
but to illustrate for you the process of _customization_,
i have generated another .pdf, using other preferences...
> http://zenmarkuplanguage.com/grapes125b.pdf
this one uses a different font (verdana) in a bigger size.
i think if you view the two, side by side, you will find that
you have a definite preference for one or the other -- or
perhaps you like the font in one and the size in the other.
the point is, if we give end-users the capability to generate
a .pdf which they customized to their personal preferences,
that's clearly going to be the optimal experience for them...
there are a lot more demo .pdfs which i could generate,
not just with different customizations, but of different types.
for instance, these 2 .pdfs were created to be free-flowing.
but we could generate one which retained the linebreaks
and the pagebreaks which matched the original p-book...
that .pdf would match the page-by-page website which i
showed you in the previous lesson, and we could even do
some things which _created_synergy_ between the .pdf and
that website. specifically, we could put a web-link on each
page of the .pdf that opens the respective web-page for it.
if we then put a discussion-box on that web-page, then
the offline .pdf would be a tool complementing an effort
to provide a social-networking component to the book...
you could read offline, but jump online to write or read
annotations for a certain page if you felt moved to do so.
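
to make the per-page link idea concrete, here is a minimal sketch.
it is not code from these lessons. the function names, the example.com
address, and the "p007.html" naming pattern are all assumptions made
up for illustration; a real site would substitute its own scheme.

```python
def page_url(base_url, page_number, width=3):
    """build the web address for one page of the book.

    the "p" + zero-padded-number + ".html" pattern is hypothetical.
    """
    return f"{base_url}/p{page_number:0{width}d}.html"


def page_links(base_url, first_page, last_page):
    """map every page number to the link that would be put on that .pdf page."""
    return {n: page_url(base_url, n) for n in range(first_page, last_page + 1)}


if __name__ == "__main__":
    # example.com stands in for wherever the page-by-page website lives
    for number, link in page_links("http://example.com/book", 1, 3).items():
        print(number, link)
```

whatever generates the paged .pdf would then attach each link to its page.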
i'll discuss some of this stuff at a later date, when i do a
new "sweet grapes" series on the topic of my e-book tools.
***
we'll recap in a minute, to finish this, but first: _caveats!_
the code i've shared with you in this thread is not "solid".
it's meant to demonstrate one thing -- only one thing --
namely, that the code necessary to accomplish the task
of cleaning o.c.r. and making e-books isn't hard to write.
indeed, if you are willing to write quick-and-dirty code,
like the code that i wrote in these lessons, you can do it
with a minimum of time and effort, even if you're doing
that coding in a language that you've never used before,
which i demonstrated by using a language i'd never used.
the programming constructs you need are the simple loops
and string operations that you can execute in any language.
there's nothing difficult about this, nothing, in the slightest.
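
as an illustration of "simple loops and string operations", here is
a small sketch that rejoins words split by an end-of-line hyphen.
it is not one of the routines from these lessons, and it carries a
loud assumption: it treats _every_ end-of-line hyphen as a split word,
so a genuinely hyphenated word that happens to break there gets joined
wrongly. the function name is made up for this example.

```python
def join_hyphenated_lines(lines):
    """rejoin words that were split across a line-break by a hyphen.

    assumption: any single hyphen ending a line marks a split word.
    a double-hyphen ("--") ending a line is taken to be a dash and left alone.
    """
    result = []
    carry = ""  # word fragment carried down from the previous line
    for line in lines:
        line = carry + line.strip()
        carry = ""
        if line.endswith("-") and not line.endswith("--"):
            # peel the last word off this line and push it to the next one
            head, _, fragment = line.rpartition(" ")
            carry = fragment[:-1]  # drop the trailing hyphen
            line = head
        result.append(line)
    if carry:
        # the very last line ended in a hyphen; keep the fragment
        result.append(carry)
    return result


if __name__ == "__main__":
    for fixed in join_hyphenated_lines(["the cyber-", "library of the future"]):
        print(fixed)
```

one loop, a few string methods, and that is the whole routine.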
having said all of that, though, quick-and-dirty code is still
dirty code. it's not robust to violations of your assumptions,
so it's fragile. and it most definitely won't do everything that
you will find -- eventually -- that you need to be able to do...
sometimes the code i wrote was outright bad. not in the sense
that it had bugs -- it probably did, but for the most part, it all
worked well enough to do the job for this book -- but it was
most definitely not the _best_ way to write all those routines...
sometimes i wrote it "the wrong way" because it was easiest
to do it that way, and sometimes i wrote it "the wrong way"
because i don't care to share all the hard-earned experience
that i've gained over the years with the enemies i have here,
and often i wrote it "the wrong way" just to make you think...
i do have code that is solid, and capable, and flexible, and
i know without question that it works across _lots_ of books,
because i _developed_ it across lots of books. if i had wanted
to give you the best code i could write, i'd have given you that.
that wasn't my goal. my goal was pretty much the opposite,
to share the _worst_possible_code_ that would _do_the_job_.
so i wrote everything "from scratch", looking at no other code.
and the bottom line is that you shouldn't "just trust" that code.
use it and learn from it, especially what there is about it that
enabled it to "work", in the sense that it "did the desired job".
but don't "just trust" it.
and if you do use it on some other book, know that you must
examine the output closely, to see _if_ it worked correctly for
_that_ book; you must find ways to _test_ for correct behavior.
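
one possible way to "test for correct behavior" is a cheap consistency
check run on the output. the sketch below flags paragraphs that hold an
odd number of straight double-quotes. it is only an example of the kind
of test meant here, not code from these lessons, and it assumes that
paragraphs are separated by blank lines and that quotes are the plain
'"' character. a flagged paragraph is a suspect, not a proven error,
since a quotation that runs over several paragraphs leaves one open.

```python
def unbalanced_quote_paragraphs(text):
    """return (number, paragraph) pairs whose count of double-quotes is odd.

    assumptions: blank lines separate paragraphs, and quotes are straight.
    """
    suspects = []
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        if paragraph.count('"') % 2 == 1:
            suspects.append((number, paragraph))
    return suspects


if __name__ == "__main__":
    sample = 'he said "yes" today.\n\nshe said "no today.\n\nplain text.'
    for number, paragraph in unbalanced_quote_paragraphs(sample):
        print(number, paragraph)
```

run it on each new book, then look at every suspect it reports by eye.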
it's also the case that the _order_ in which i did the tasks was
perhaps not optimal. i was _teaching_ and _demonstrating_,
so i followed the order which optimized _that_, not "the job".
and if you want to go through this exercise again, with me,
on another book, to see what it'd be like if we did it again,
i would suggest that we use _this_ book as our raw material:
> http://www.pgdp.net/phpBB2/viewtopic.php?t=49047
it's going through d.p. now, so we'll have another digitization
which we can use for comparison, so as to evaluate our work.
***
now for the recap...
this thread shows how to digitize a book quickly and easily.
we've gone from raw o.c.r. to finished e-books, using only
your wordprocessor and code written by this python virgin.
congratulations, friends. now you know how easy it can be.
and you will never again let anyone snow you with the _lie_
that it has to be some complicated and difficult procedure...
it's a matter of running spellcheck, to correct o.c.r. errors,
and doing a few consistency checks, to fix some other stuff.
will a couple of "stealth scannos" survive? yes, it's possible.
and you certainly won't catch all of the punctuation glitches.
but if you ask your end-users to contribute error-reports
-- and then _act_ on the reports _promptly_ (unlike p.g.),
and give your reporters full credit for finding the errors --
you will soon discover that they'll be delighted to help you.
plus, after your text is cleaned, you can just click a button
to generate the three "canonical" output-formats you need.
then distribute the text as-is -- that's right, the .zml file --
to allow your end-users to generate their own output files,
personalized to their own idiosyncratic needs and desires...
that is the way the cyberlibrary of the future should work!
and thus concludes this thread of lessons.
have a good weekend... :+)
-bowerbird