don said:
>   Trac

i've explained to y'all many times how to do this job simply.
yet you continue to flock to the most complicated processes.
so go ahead and do it that way; you will get what you deserve.

***

for the people who wanna actually get something accomplished,
though, i'll step through a process to show how it can be done...

***

here's a copy of "pride and prejudice" at archive.org:
>   http://archive.org/details/harvardclassicss03elio

you want to use a scan-set from archive.org or google,
because those are canonical copies available to the public.

no one has to "take your word" that this is an edition that
is true to the author, because it'd been sitting in a library.

and no one can accuse you of "doctoring" the scans, since
the scan-set is being housed by an entity external to you,
and can be fully accessed independently of your influence.

a scan-set from a book you bought at barnes&noble has
_neither_ of those two necessary and sufficient qualities.

***

to work on that edition, i set up this folder on my site:
>   http://zenmagiclove.com/prhpr

first off, i copied over all of the scans from archive.org.
"pride and prejudice" was reprinted with another book,
meaning that it didn't actually even begin until page 141,
so the scans start with page 141, and continue up to 512.
>   http://zenmagiclove.com/prhpr/prhprp141.jpg
>   http://zenmagiclove.com/prhpr/prhprp512.jpg
(512 is actually a blank page, but we need an even number.)
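
a minimal sketch of how that list of scan addresses could be rebuilt,
assuming every page between 141 and 512 follows the same naming pattern
as the two links above (only those two are confirmed here; the function
name `scan_urls` is made up for illustration):

```python
BASE = "http://zenmagiclove.com/prhpr/"

def scan_urls(first=141, last=512):
    """Return one scan address per page, first..last inclusive.

    Assumes the pattern prhprp<page>.jpg holds for every page in between.
    """
    return [f"{BASE}prhprp{page}.jpg" for page in range(first, last + 1)]

urls = scan_urls()
print(len(urls))   # number of scans in the set
print(urls[0])
print(urls[-1])
```

pages 141 through 512 inclusive come to 372 scans.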

you will find the original .djvu o.c.r. text-file in that folder:
>   http://zenmagiclove.com/prhpr/harvardclassicss03elio_djvu.txt

i also re-saved that file with a new name, befitting its new folder:
>   http://zenmagiclove.com/prhpr/prhpr-000.zml

i did some reworking of the file, mostly to show pagebreaks:
>   http://zenmagiclove.com/prhpr/prhpr-001.zml
i think you can pretty clearly see this is just a text-evolution.

then i did more work, for correct paging, and some niceties:
>   http://zenmagiclove.com/prhpr/prhpr-002.zml
i think you can still see that this is just more text-evolution.

then i wrote a program to turn that .zml file into rough .html:
>   http://zenmagiclove.com/prhpr/prhpr-002h.html
and again, the relationship to the earlier .zml file is very direct.

then i did a few global-type changes to start cleaning the o.c.r.:
>   http://zenmagiclove.com/prhpr/prhpr-003.zml

then, to show how simple it can be to use a diffing procedure,
i wrote a python script that compares 002.zml against 003.zml:
>   http://zenmagiclove.com/prhpr/docomp.py

if you run that script -- by just clicking on that link there --
you'll see the lines that were edited within the first 2000 lines.
the top line's from prhpr-002.zml, bottom from prhpr-003.zml.

there is also a "change-line" below the other 2 lines, which shows
where the lines differ (as sometimes it's hard to see the change).
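
this is not the code in docomp.py, just a rough sketch of the same idea
using python's standard difflib.  it assumes the two versions still line up
line for line (true when edits only change text inside lines), and the names
`change_line` and `compare` are invented for this example:

```python
import difflib

def change_line(old, new, mark="^"):
    """Return a line with marks under the columns of `old` that changed."""
    marks = [" "] * (len(old) + 1)
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # a pure insertion has i1 == i2, so mark the column where it lands
        for col in range(i1, max(i2, i1 + 1)):
            marks[col] = mark
    return "".join(marks).rstrip()

def compare(old_lines, new_lines, limit=2000):
    """Yield (line_number, old, new, change_line) for each edited line."""
    pairs = zip(old_lines[:limit], new_lines[:limit])
    for number, (old, new) in enumerate(pairs, start=1):
        if old != new:
            yield number, old, new, change_line(old, new)

# tiny stand-ins for the two .zml versions
before = ["it is a truth ; universally", "acknowledged"]
after = ["it is a truth; universally", "acknowledged"]

for number, old, new, marks in compare(before, after):
    print(number)
    print(old)    # top line: the earlier version
    print(new)    # bottom line: the later version
    print(marks)  # change-line: where they differ
```

to run it on real files, read each one with `open(path).read().splitlines()`
and pass the two lists to `compare`.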

for instance, the change-lines of the diffs on lines 176 and 183
reveal that i globally deleted the space in front of a semicolon...
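
a global change like that one can be sketched as a single regular-expression
substitution.  this is a guess at the shape of the edit, not the actual
command that was used, and `close_up_semicolons` is a made-up name:

```python
import re

def close_up_semicolons(text):
    """Delete spaces or tabs that sit directly in front of a semicolon."""
    return re.sub(r"[ \t]+;", ";", text)

sample = "she was all attention ; and he continued"
print(close_up_semicolons(sample))
```

newlines are left alone, so the line count stays the same and a
line-by-line comparison still works afterward.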

the leftmost part of the change-line is a link that opens the scan
for the relevant page in another window.  (it will continue to open
the scan in a named window, so you should make sure that it _is_
a window, not a tab, so you'll see both windows at the same time.)
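
one way such a link could be built is with a named `target` on the anchor,
so that every click reuses the same browsing window.  this is only a sketch:
the window name "scan", the link label, and the function `scan_link` are
assumptions, and whether the browser opens a window or a tab is up to
the browser's own settings:

```python
from html import escape

BASE = "http://zenmagiclove.com/prhpr/"

def scan_link(page, label=None, window="scan"):
    """Build an anchor that opens the page scan in a named window."""
    url = f"{BASE}prhprp{page}.jpg"
    text = label if label is not None else f"p{page}"
    return f'<a href="{escape(url)}" target="{escape(window)}">{escape(text)}</a>'

print(scan_link(141))
```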

i have coded compare programs which are much more powerful,
but this one serves its purpose as a little app to demo comparison.

more tomorrow.  but for now, the takeaway:  none of this is hard.

-bowerbird

p.s.  there's something very revealing in those .html versions, so
take a little look at them and see if you can tell me what that is...