ok then, let's go to work (part 1)

ok then, let's go to work... greg asked for actual examples of how online scans might be used to good effect by project gutenberg. one good set of scans is that created by jon noring, for his "my antonia" demo. those scans are here:
i subjected the scans to o.c.r. using finereader v7. one of the options in finereader is to output a .pdf. i did that, and have uploaded that .pdf for your perusal:
if you look at this .pdf, you will find that finereader does a rather amazing job of retaining the book's formatting. of course, there are scannos in the text, which makes it unusable, but from a _formatting_ standpoint, it is fine.

if the .pdf we create at the end of our digitization would look as good as this one finereader makes automatically, just from the scans, we could feel proud of ourselves...

that's enough for today... tomorrow we'll go on to look at the o.c.r. output itself...

-bowerbird

Bowerbird@aol.com wrote:
> i subjected the scans to o.c.r. using finereader v7.
what an original idea ...
> one of the options in finereader is to output a .pdf.
Now, as you yourself said, the pdf format is useless for further editing. So why don't you try to output something useful, like, say, an XML file? If you were able to get a TEI file out of your finereader, you could actually proofread the thing and produce a pdf file out of it that looks a *lot* better than the one you posted. Besides, you would get an html and a plain text file into the bargain, ready for posting.

-- 
Marcello Perathoner
webmaster@gutenberg.org

Bowerbird@aol.com wrote:
> one of the options in finereader is to output a .pdf. i did that, and have uploaded that .pdf for your perusal:
> if the .pdf we create at the end of our digitization would look as good as this one finereader makes automatically, just from the scans, we could feel proud of ourselves...
Personally, I think our TEI -> PDF output looks a lot better.

http://pglaf.org/~joshua/15775/15775-pdf.zip (The Rejuvenation of Aunt Mary)
http://pglaf.org/~joshua/13945/13945-pdf.zip (Sunny Memories)
http://pglaf.org/~joshua/15573/15573-pdf.zip (Judith of the Plains)
http://pglaf.org/~joshua/15695/15695-pdf.zip ("Doc." Gordon)
http://pglaf.org/~joshua/15796/15795-pdf.zip (Joy in the Morning)

(If anyone is interested, remove the *-pdf.zip from the link and you'll see text and html renditions of the same books as well.)

Josh

Joshua Hutchinson wrote on 9/6/2005, 4:20 PM:
> Personally, I think our TEI -> PDF output looks a lot better.
I LOVE the PDF output from the TEI files; it looks very good, Josh :) I have no complaints at all :P

Jared
participants (4)
- Bowerbird@aol.com
- Jared Buck
- Joshua Hutchinson
- Marcello Perathoner