
so, while you guys have yet another discussion, i'm doing some work on "pride and prejudice"...
*** here's the version i'm using, from archive.org:
*** you'll find my work in this folder on my site:
*** the original o.c.r., from archive.org, is here:
subsequent edits of that file bump up the filename:
http://zenmagiclove.com/prhpr/prhpr-001.zml
http://zenmagiclove.com/prhpr/prhpr-002.zml
http://zenmagiclove.com/prhpr/prhpr-003.zml
the first two of those were just to get the text into my standard page-separated skeleton, which marks the beginning of each page of o.c.r. text with a line that has _double-braces_ with the scan-name inside. the end of each page is marked by a line with _brackets_ that hold the page-number.
*** "prhpr-003.zml" was the first file where i did edits. these consisted primarily of global-type changes...
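a page-separated skeleton like the one described above can be read back into individual pages with a few lines of python. this is only a rough sketch: the exact marker shapes are assumptions (a line holding only `{{scan-name}}` opens a page, a line holding only `[page-number]` closes it), and the scan-names in the sample are made up.

```python
import re

# assumed marker shapes, not confirmed by the post:
#   page start: a line holding only {{scan-name}}
#   page end:   a line holding only [page-number]
PAGE_START = re.compile(r"^\{\{(.+?)\}\}\s*$")
PAGE_END = re.compile(r"^\[(.*?)\]\s*$")


def split_pages(text):
    """split skeleton text into a list of page records."""
    pages = []
    current = None
    for line in text.splitlines():
        start = PAGE_START.match(line)
        end = PAGE_END.match(line)
        if start:
            if current is not None:
                # previous page never saw its end marker; keep it anyway
                pages.append(current)
            current = {"scan": start.group(1), "number": None, "lines": []}
        elif end and current is not None:
            # the page ends; record its page-number and store the page
            current["number"] = end.group(1)
            pages.append(current)
            current = None
        elif current is not None:
            current["lines"].append(line)
    if current is not None:
        pages.append(current)
    return pages


# hypothetical sample; the scan-names are invented for illustration
sample = """{{scan0012}}
it is a truth universally acknowledged,
that a single man in possession
[1]
{{scan0013}}
of a good fortune, must be in want of a wife.
[2]
"""

for page in split_pages(sample):
    print(page["scan"], page["number"], len(page["lines"]))
```

one known weakness of this sketch: any ordinary text line that consists only of a bracketed phrase would be mistaken for a page-end marker, so a real tool would need a stricter pattern.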
here is a program which documents the changes that were made from the "002" to the "003" files.
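for anyone who wants to try the idea on local copies of the files, a comparison of that kind might look roughly like the sketch below. it is not the program mentioned above. it uses python's standard `difflib`, and its counting rule (each differing stretch counts as the longer of its two sides) is an assumption, so it will not necessarily reproduce the line-counts quoted in this post.

```python
import difflib


def changed_lines(old_lines, new_lines):
    """yield (old_chunk, new_chunk) for every stretch that differs."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            yield old_lines[i1:i2], new_lines[j1:j2]


def count_changes(old_lines, new_lines):
    """count changed lines; each differing stretch counts as its longer side."""
    return sum(
        max(len(old), len(new))
        for old, new in changed_lines(old_lines, new_lines)
    )


# tiny made-up example of a "global-type" fix (tbe -> the)
old = ["mr. bennet", "tbe door", "she said", "tbe end"]
new = ["mr. bennet", "the door", "she said", "the end"]

for before, after in changed_lines(old, new):
    print("-", before)
    print("+", after)
print(count_changes(old, new), "lines changed")
```

to compare two real versions, read each file into a list with `open(path, encoding="utf-8").read().splitlines()` and pass the two lists to the same functions.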
as you can see, if you run that script, there were 7,607 lines edited by these global-type changes.
*** i've now done more edits, creating another version:
most of these changes were of a "global" style, but i also fixed idiosyncratic errors when i spotted 'em. as before, you can view all lines that were changed:
as shown, 515 lines changed from "003" to "004".
*** also, like yesterday, we can convert the .zml to .html:
this "entire-book-on-one-page" is in the style of most of the books made by pg/dp, manifesting the "scroll" methodology of the web...
the fact that you can get a rather-nicely formatted e-book using barely-reworked o.c.r. output would -- i believe -- be somewhat surprising to many of the post-processors from d.p., who seem to believe that this is work which requires a modicum of effort. (but note that the text remains unrefined o.c.r., and all the inline text-styling has yet to be applied.)
*** the new twist for today is that we have also spit out files that show each page of text alongside its scan:
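as a rough illustration of how a text-alongside-scan page could be generated (not the converter actually used here), the python sketch below builds one html page with the o.c.r. text on the left and the scan on the right. the function name, the flexbox layout, and the sample file names are all assumptions made for the example.

```python
import html


def render_page(scan_src, text, prev_href=None, next_href=None):
    """build one html page: o.c.r. text on the left, scan on the right."""
    esc = html.escape  # escapes &, <, > and quotes
    nav = []
    if prev_href:
        nav.append(f'<a href="{esc(prev_href)}">prev</a>')
    if next_href:
        nav.append(f'<a href="{esc(next_href)}">next</a>')
    scan = f'<img src="{esc(scan_src)}" alt="page scan" style="width:100%">'
    if next_href:
        # wrapping the scan in a link lets a click advance to the next page
        scan = f'<a href="{esc(next_href)}">{scan}</a>'
    return "\n".join([
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8"><title>page</title></head><body>',
        f'<p>{" | ".join(nav)}</p>',
        '<div style="display:flex; gap:1em">',
        f'<pre style="flex:1; white-space:pre-wrap">{esc(text)}</pre>',
        f'<div style="flex:1">{scan}</div>',
        "</div>",
        "</body></html>",
    ])


# hypothetical file names, for illustration only
page = render_page(
    "scan0012.png",
    "it is a truth <universally> acknowledged",
    prev_href="p001.html",
    next_href="p003.html",
)
print(page)
```

the text is escaped before it goes into the page, so stray angle-brackets or ampersands in raw o.c.r. output cannot break the markup.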
you will have to adjust the zoom-level and the size of your browser-window for the best presentation. (advanced versions of this will give more control, using javascript to customize the elements.)
and of course, these pages connect with each other, via the "prev" and "next" links on each one. (you can also advance to the next page by clicking the scan.)
this is the "design" you're used to seeing from me... a paginated display like this one lends itself more to smoothreading or, as i call it, "continuous proofing". but we are not ready for smoothreading yet, because we still have bad o.c.r. text that needs to be corrected.
more later...
-bowerbird
p.s. did anyone notice the thing that i hinted about?