
Bowerbird wrote:
first, jon, since you've been makin' some big noises about "my antonia", could you please make available a .zip file containing all of your image-scans and the o.c.r. output? i plan on using them in a nice little project of mine, and downloading the scans one at a time is a pain in the neck.
Good idea. Unfortunately I do not have OCR output, but I have the page scans. I'll zip up the 600 dpi 2-color (B&W) scans, which have already gone through a clean-up stage (they will be PNG files and occupy, if memory serves me right, about 50 megs of space). These should import nicely into an OCR program. If you don't have an OCR program, someone here may offer to do that for you. Anyone? (Note that the page scans which are individually linked from the My Antonia online document were resampled from the 600 dpi 2-color scans to 120 dpi with greyscale antialiasing to improve legibility at lower resolutions -- the 120 dpi versions probably are not as good to use for OCRing.)
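For anyone curious what that resampling step amounts to, here is a minimal sketch in Python. It is not necessarily how the online scans were actually produced; the function name `downsample_bilevel` and the plain list-of-lists bitmap are assumptions made for illustration. Going from 600 dpi to 120 dpi is a factor of 5, so each 5x5 block of black/white pixels is averaged into one grey pixel, and that averaging is what gives the antialiased look.

```python
# Hypothetical helper, for illustration only: box-average a bilevel bitmap
# (0 = white, 1 = black) down by an integer factor, producing 8-bit
# greyscale values (0 = black, 255 = white).
def downsample_bilevel(pixels, factor=5):
    # Leftover rows/columns that do not fill a whole block are dropped.
    height = len(pixels) // factor
    width = len(pixels[0]) // factor if pixels else 0
    out = []
    for by in range(height):
        row = []
        for bx in range(width):
            # Count the black pixels inside this factor x factor block.
            black = sum(
                pixels[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            )
            # The fraction of black ink in the block sets the grey level.
            row.append(255 - round(255 * black / (factor * factor)))
        out.append(row)
    return out
```

The averaging also shows why the 120 dpi versions are a poorer OCR source: the sharp black/white edges of the 600 dpi originals are smeared into greys, and 24 out of every 25 pixels' worth of detail is gone.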
second, since you regularly assert your insistence that markup must be "semantic" rather than "presentational", can you elucidate the structural aspects that typically should be marked up in books? that list would include things like chapter-headings, footnotes, block-quotes; and what else? would also be nice if you could say _how_ these things should be marked up, with actual examples, but since even the .tei experts can't seem to agree on it...
Also a very good suggestion. Remind me if I don't answer anytime soon. I've got a lot of projects on my plate (and just finished a several-day project to upgrade the hardware, OS and software on my main computer). Yes, the TEI people also disagree, but that's because the full vocabulary of TEI is quite extensive. When I talked with Charles last year on this topic, his vision at the time seemed to be that DP will settle upon a required base subset, and maybe an extended subset that those who are interested can use but that's not required for basic support (e.g., semantic information as to who speaks a particular quote, which can be marked up but is probably overkill for basic markup support). I should probably make the inquiry over at the DP forums, but if anyone working with DP is familiar with its consideration of blessing a TEI subset for its master documents, let me know.
third, over on the bookpeople list, john mark ockerbloom moderated out my replies to your late-december posts where you issued some "friendly challenges" to me; but let it be known that my replies accepted your challenges. i'll be creating a space soon where we can discuss them...
Thanks. I look forward to it! (Really, I do.)

Jon