this thread explains how to digitize a book quickly and easily.
search archive.org for "booksculture00mabiuoft" for the book.
***
of course, if there is no lesson #13, then lesson #14 essentially
is the unlucky one... so let us use this lesson to discuss a bug...
i gave you the list of words that were "notfound" in our book.
i know, i know, i'm giving you a lot of dense lists of stuff, so
i really can't blame you if your eyes just glaze over, but _still_,
i was testing to see if you actually look at this stuff or not...
evidently "not", or you woulda noticed these were "notfound":
> conceit
> conceived
> concentration
> conception
> conceptions
> concern
> concerned
> concerning
> concert
> conclusion
> conclusions
well, quite obviously, those words should be in our dictionary.
and -- again, obviously -- they _are_. but for _some_ reason,
our program didn't find them, so we've gotta track that down...
it turns out it was caused by a rejoined end-of-line hyphenate.
(not just an end-of-line hyphenate, but an end-of-page one.)
the hyphenate "con-crete" occurs across pages 144-145, and
"crete" on page 145 was misrecognized with an initial capital.
this caused our code, as written, to misbehave just a little bit,
because that mid-word capital-letter confused our sort logic,
resulting in a glitch on all the "con..." words after "concrete"...
i will provide a fix in the next version of the spellchecker code.
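in the meantime, here's a minimal sketch of how a glitch like this
can arise. it is _not_ the actual spellchecker code; the function
name find_notfound and the tiny word-lists are made up for the demo.
it assumes the book's words get sorted case-sensitively, and are then
checked by walking forward through a sorted lowercase dictionary.
a capital "C" sorts ahead of every lowercase letter, so "conCrete"
jumps the queue, and drags the dictionary pointer past its neighbors.

```python
def find_notfound(book_words, dictionary):
    """walk two sorted lists in step; return book words not in dictionary.

    hypothetical helper: assumes `dictionary` is sorted and all lowercase.
    """
    notfound = []
    d = 0  # position in the dictionary; it only ever moves forward
    for word in book_words:
        target = word.lower()
        while d < len(dictionary) and dictionary[d] < target:
            d += 1
        if d == len(dictionary) or dictionary[d] != target:
            notfound.append(word)
    return notfound


# made-up sample data, just enough to show the effect
dictionary = sorted(["conceit", "concern", "conclusion", "concrete", "condition"])
book = ["conceit", "concern", "conclusion", "conCrete", "condition"]

# case-sensitive sort puts "conCrete" ahead of "conceit"
buggy = find_notfound(sorted(book), dictionary)

# one possible fix (an assumption): sort on the lowercased form
fixed = find_notfound(sorted(book, key=str.lower), dictionary)

print(buggy)
print(fixed)
```

in this sketch, the words that sort between "conC..." and "concrete"
are the ones that get flagged, which matches the pattern in the list.
sorting on the lowercased form is just one way to cure it.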
***
meanwhile, i've been fixing the newly-exposed errors in our text:
> http://zenmarkuplanguage.com/grapes004.txt
after a few more checks and corrections, it'll be good to go...
***
for now though, the code is good enough that we can install it
in our page-by-page viewer-program. you can see that here:
> http://zenmarkuplanguage.com/grapes202.py
i'm "color-coding" the words now.
first of all, the "common" words are rendered in light-gray.
the list of those "common words" -- 555 of 'em -- is here:
> http://z-m-l.com/common555.txt
it's amazing how much of the book is these 555 words...
next, the british words are blue, "regular" words are gray,
"special" words are green, hyphenates are light-blue, and
"notfound" words are rendered in red, for best visibility...
thus, all the problem words can be noticed at first glance.
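the color scheme above could be sketched roughly like this. it's an
illustration only, not the code in the viewer: the function color_for,
the little word-sets, the color names, and the order of the checks
are all assumptions made for the example.

```python
# hypothetical word-lists; the real ones would be far bigger
COMMON = {"the", "of", "and"}
BRITISH = {"colour", "honour"}
REGULAR = {"book", "culture"}
SPECIAL = {"bowerbird"}


def color_for(word):
    """return a display color for one word (check order is an assumption)."""
    w = word.lower()
    if "-" in w:
        return "lightblue"   # hyphenates
    if w in COMMON:
        return "lightgray"   # the "common" words
    if w in BRITISH:
        return "blue"
    if w in REGULAR:
        return "gray"
    if w in SPECIAL:
        return "green"
    return "red"             # "notfound", so it stands out


for w in "the colour of con-crete xyzzy".split():
    print(w, color_for(w))
```

anything that falls through every list ends up red, so a brand-new
misrecognition can't hide among the other colors.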
i've also installed a _search_ capacity... just enter the term
to find, and then click the button labeled "sf" ("search for").
you'll be transported to the next page containing the term.
clicking "sf" again will take you to the next occurrence, etc.
you also get a list of links to other pages with that term...
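a search like that can be sketched in a few lines. again, this is
a guess at the shape of it, not the viewer's real code: the names
pages_with and next_hit are hypothetical, pages are assumed to be
plain strings numbered from 1, matching is assumed case-insensitive,
and wrapping around to the first hit is an assumption too.

```python
def pages_with(term, pages):
    """return the 1-based numbers of all pages containing the term."""
    t = term.lower()
    return [n for n, text in enumerate(pages, start=1) if t in text.lower()]


def next_hit(term, pages, current):
    """return the next page after `current` with the term, or None."""
    hits = pages_with(term, pages)
    for n in hits:
        if n > current:
            return n
    # nothing further on: wrap around to the first hit (an assumption)
    return hits[0] if hits else None


# made-up three-page "book" for the demo
pages = ["the grapes", "nothing here", "Grapes again"]

print(pages_with("grapes", pages))    # pages to offer as links
print(next_hit("grapes", pages, 1))   # where "sf" would jump from page 1
```

the same list of hits serves both purposes: the jump target for
the button, and the links to the other pages with that term.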
our "engine" for page-by-page stuff is coming right along...
-bowerbird