this thread explains how to digitize a book quickly and easily.
search archive.org for "booksculture00mabiuoft" for the book.

***

in the last lesson, we finished with a program that gave us
a search capability on the original o.c.r. text-file of our book.

we can make that program work on our edited text as well.

the newest version of our edited text is here:
>   http://zenmarkuplanguage.com/grapes002.txt

you can use your comparison tool to see the changes made
from the previous version...  but i'll tell you that one set of
changes was to ensure all of the pagenumbers are correct.

and since the pagenumbers are correct, we can show them
(rather than the linenumber in the file) for lines we display.

so we'll do that.  we'll show the pagenumber for each line,
and we'll include the linenumber _on_that_specific_page_...
(so we kinda know if the line is up top, middle, or bottom.)
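
here's a rough sketch of how that labeling might be done.
it is not the code from the actual program.  it assumes each
page starts with a marker line like "{{page 052}}", which is
a made-up format, so adjust the pattern to whatever your file
uses.  the names label_lines and show are mine, for illustration.

```python
import re

# assumption: each page starts with a marker line such as "{{page 052}}".
# the real edited text may mark its pages differently.
PAGE_MARKER = re.compile(r"^\{\{page (\d+)\}\}$")

def label_lines(text):
    """yield (pagenumber, linenumber_on_page, line) for every line of text."""
    page, line_on_page = 0, 0
    for line in text.splitlines():
        m = PAGE_MARKER.match(line.strip())
        if m:
            # a new page begins, so restart the per-page line count
            page, line_on_page = int(m.group(1)), 0
            continue
        line_on_page += 1
        yield page, line_on_page, line

def show(page, line_on_page, line):
    """format a line as pagenumber:linenumber, then the text."""
    return "%03d:%02d _____ %s" % (page, line_on_page, line)
```

note that this sketch counts blank lines too, so the linenumber
only matches the scan if blank lines in the file mirror the page.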

plus, as we said earlier, having this information be correct
means that -- once we pull out any particular line of text --
we can show (or link to) the _scan_ of the page containing it.

so let's incorporate these additions into our search program.

the search program, now working on grapes002.txt, is here:
>   http://zenmarkuplanguage.com/grapes109.py
>   http://zenmarkuplanguage.com/grapes109.txt

as you can see, once you run a search, each located line is
preceded by the pagenumber:linenumber -- plus the line
(in its entirety) has been made into a link to its page-scan.
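
turning a line into a link could look something like this.
it's only a sketch, and the url template is a placeholder
(i'm not showing where the scans really live), so swap in the
real location of your page-scans.  linked_line is a made-up name.

```python
import html

# assumption: the scan for a page sits at a url built from its pagenumber.
# this template is a placeholder, not the real address of any scan.
SCAN_URL = "http://example.com/scans/page%03d.png"

def linked_line(page, line_on_page, line):
    """return the pagenumber:linenumber label, then the line as a link."""
    label = "%03d:%02d" % (page, line_on_page)
    # escape the text so a stray "<" or "&" in the book can't break the html
    return '%s <a href="%s">%s</a>' % (label, SCAN_URL % page, html.escape(line))
```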

click on a line to refer to its scan to resolve any questions.

for an example of this, consider that -- when i am doing
edits on the text -- if there is something about which i am
uncertain, i'll mark it with a "??", so i can return to it later.

when i was doing the "first-pass" edits to this file, i didn't
bother to look at pagescans, but if something was unclear,
and i couldn't make the decision without seeing the scan,
i just used the "??" notation.  so, now that we have links to
view the pagescan for any line found by the search routine,
you can play along with me and search for those "??" lines...

the tool will show you these:
>   052:10 _____ the 'Myth of the Soul' ?? I discovered
>   092:18 _____ spell which they all possess, ?? -- the
>   100:06 _____ sonalities ?? the atmosphere of such re-
>   105:22 _____ Kourotrophos ?? the mother of corn and
>   138:18 _____ ?? climate; intensity of heat and light;
>   146:22 _____ characterise ?? the man of culture will be
>   266:18 _____ us. They disclose wide ?? diversity of
>   275:11 _____ in the educational ?? of the indi-
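
a search like that can be sketched in a few lines.  this one
assumes the text has already been broken into tuples of
(pagenumber, linenumber, text), and find_lines is my own name
for it, not something taken from the program.

```python
def find_lines(labeled, needle):
    """return every (page, line_on_page, text) tuple whose text holds needle.

    plain substring matching is used on purpose: "?" is special in
    regular expressions, so searching for "??" as a pattern would need
    escaping, while the "in" test needs none.
    """
    return [(p, n, t) for (p, n, t) in labeled if needle in t]

# the first tuple is a line from the list above; the second is made up.
sample = [
    (52, 10, "the 'Myth of the Soul' ?? I discovered"),
    (52, 11, "a line with nothing to flag"),
]
for p, n, t in find_lines(sample, "??"):
    print("%03d:%02d _____ %s" % (p, n, t))
```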

it's simple enough to click the line, so as to view its scan.
for instance, clicking the top line will display the scan for
page 52, and we know the line is #10, so midway down...
and yes, the scan tells me that there _was_ a comma there
(which is what i had been unsure about).  so i will fix that.

indeed, i can now fix all of those lines in similar fashion...

***

another thing i've added is the ability to list all of the lines
on a particular page.  just enter "p#123" in the search-box,
for instance, and all the lines for page 123 will be shown...

and all of the lines are links to the scan for page 123, yes,
but in this case, we obviously know we wanna see the scan,
so the program just displays it, without us having to tell it.
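
one way to handle the "p#123" form, as a sketch only.  it
assumes the same (pagenumber, linenumber, text) tuples, and
run_query is a name i made up.  the second value it returns
says which scan to display right away, or None for a text search.

```python
import re

# a query of the form "p#123" asks for every line on page 123
PAGE_QUERY = re.compile(r"^p#(\d+)$")

def run_query(labeled, query):
    """return (matching tuples, page whose scan to show, or None)."""
    m = PAGE_QUERY.match(query.strip())
    if m:
        wanted = int(m.group(1))
        hits = [(p, n, t) for (p, n, t) in labeled if p == wanted]
        return hits, wanted
    # anything else is treated as an ordinary substring search
    hits = [(p, n, t) for (p, n, t) in labeled if query in t]
    return hits, None
```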

***

if you are paying attention, all this should give you lots of
ideas about what could be done next, so think about that...

-bowerbird