
oh yeah, one more thing before i return to my laboratory. for jon noring. or 2 things, actually. no, make that 3. first, jon, since you've been makin' some big noises about "my antonia", could you please make available a .zip file containing all of your image-scans and the o.c.r. output? i plan on using them in a nice little project of mine, and downloading the scans one at a time is a pain in the neck. second, since you regularly assert your insistence that markup must be "semantic" rather than "presentational", can you elucidate the structural aspects that typically should be marked up in books? that list would include things like chapter-headings, footnotes, block-quotes; and what else? would also be nice if you could say _how_ these things should be marked up, with actual examples, but since even the .tei experts can't seem to agree on it... third, over on the bookpeople list, john mark ockerbloom moderated out my replies to your late-december posts where you issued some "friendly challenges" to me; but let it be known that my replies accepted your challenges. i'll be creating a space soon where we can discuss them... -bowerbird

Bowerbird wrote:
first, jon, since you've been makin' some big noises about "my antonia", could you please make available a .zip file containing all of your image-scans and the o.c.r. output? i plan on using them in a nice little project of mine, and downloading the scans one at a time is a pain in the neck.
Good idea. Unfortunately I do not have OCR output, but I have the page scans. I'll zip up the 600 dpi 2-color (B&W) scans, which have already gone through a clean-up stage. (They will be PNG files and, if memory serves me right, occupy about 50 megs of space.) These should import nicely into an OCR program. If you don't have an OCR program, someone here may offer to do that for you. (Note that the page scans individually linked from the My Antonia online document were resampled from the 600 dpi 2-color scans to 120 dpi with greyscale antialiasing to improve legibility at lower resolutions -- the 120 dpi versions are probably not as good for OCRing.) Anyone?
second, since you regularly assert your insistence that markup must be "semantic" rather than "presentational", can you elucidate the structural aspects that typically should be marked up in books? that list would include things like chapter-headings, footnotes, block-quotes; and what else? would also be nice if you could say _how_ these things should be marked up, with actual examples, but since even the .tei experts can't seem to agree on it...
Also a very good suggestion. Remind me if I don't answer soon. I've got a lot of projects on my plate (and just finished a several-day project to upgrade the hardware, OS and software on my main computer). Yes, the TEI people also disagree, but that's because the full vocabulary of TEI is quite extensive. When I talked with Charles last year on this topic, his vision at the time seemed to be that DP will settle upon a required base subset, plus maybe an extended subset that those who are interested can use but that's not required for basic support (e.g., semantic information as to who speaks a particular quote, which can be marked up but is probably overkill for basic markup support). I should probably make the inquiry over at the DP forums, but if any of you working with DP are familiar with DP's consideration of blessing a TEI subset for its master documents, let me know.
third, over on the bookpeople list, john mark ockerbloom moderated out my replies to your late-december posts where you issued some "friendly challenges" to me; but let it be known that my replies accepted your challenges. i'll be creating a space soon where we can discuss them...
Thanks. I look forward to it! (Really, I do.) Jon

Bowerbird asked:
first, jon, since you've been makin' some big noises about "my antonia", could you please make available a .zip file containing all of your image-scans and the o.c.r. output?
The 600 dpi bitonal page scans of My Antonia (as PNG, archived in ZIP) are now available via: http://www.openreader.org/myantonia I encourage others to download the ZIP to preserve the page scans. But be forewarned: the ZIP file is 49 megs in size. Using one of the CCITT bitonal compression algorithms, it would be possible to do better with lossless compression, maybe 50% better than the currently used PNG. But virtually everyone can view PNG files, while those CCITT algorithms (usually encapsulated in TIFF) are oftentimes obscure. Jon

Btw, to the "My Antonia" beta page I've added an entry for "regularized" plain text, with one format in this category being Bowerbird's ZML. I have heard of a couple other systems being touted for regularized plain text, but none of them are being discussed in Project Gutenberg. Jon

Bowerbird@aol.com wrote:
second, since you regularly assert your insistence that markup must be "semantic" rather than "presentational", can you elucidate the structural aspects that typically should be marked up in books? that list would include things like chapter-headings, footnotes, block-quotes; and what else? would also be nice if you could say _how_ these things should be marked up, with actual examples, but since even the .tei experts can't seem to agree on it...
Hmm, I've yet to find a TEI "expert" that doesn't agree on the fundamental markups.
<p></p> is a paragraph container.
<head></head> for a divisional (chapter, section, part, etc) heading.
<note place="foot"></note> for a footnote.
  * replace "foot" with "margin" or "endnote" as appropriate for other note markers.
<quote rend="display"></quote> for a block quote.
<figure url="file_name"></figure> for an inline illustration.
***
The problems with TEI don't tend to lie in the markup, but rather in the conversion of said markup to a final presentation format. And usually then it is in markup that requires a bit of intelligence on the part of the rendering engine ... like complex tables, for instance.
Josh
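Since the request was for "actual examples", here is a minimal sketch of how the elements Josh lists might sit together in one chapter, with a few lines of Python (standard library only) to check that the fragment is well-formed and to list its structure. The enclosing <div type="chapter">, the sample text, and the file name "frontispiece.png" are assumptions made up for illustration; they are not from Josh's message or from any particular TEI subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a tiny TEI-style fragment built from the elements
# named in the message above. The <div> wrapper, the text content, and the
# file name "frontispiece.png" are invented for illustration only.
SAMPLE = """\
<div type="chapter">
  <head>Chapter I</head>
  <p>An ordinary paragraph with a footnote.<note place="foot">Text of the footnote.</note></p>
  <quote rend="display">A passage set off as a block quote.</quote>
  <figure url="frontispiece.png"></figure>
</div>
"""


def structural_summary(xml_text):
    """Return (tag, attributes) for every element, in document order."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    return [(el.tag, dict(el.attrib)) for el in root.iter()]


summary = structural_summary(SAMPLE)
for tag, attrs in summary:
    print(tag, attrs)
```

Nothing in the fragment says how a footnote or block quote should look on the page; that is left to whatever converts the markup to a presentation format, which is where Josh says the real trouble tends to start.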
participants (3): Bowerbird@aol.com, Jon Noring, Joshua Hutchinson