
[cc: Jose Menendez] Jeroen Hellingman wrote:
Although I agree with Michael that there is no need to preserve things as linebreaks in most texts -- if you really need to go to that level of detail, there is always the original or the scans to fall back upon -- I want to make a case for preserving page numbers, if not at least as recognisable anchors in text, and only for those books being referenced to regularly by other books.
First off, I agree with Bowerbird that it is a good thing to preserve both the line breaks and the page breaks in the master marked-up texts converted from a source book. I assume that with the DP workflow this would not be difficult, so why not do it if it can be done (mostly) automatically?

For the OpenReader Publication Format, which is in an advanced stage of development, we're now putting together a set of elements in an OpenReader namespace to do various tasks. These elements may be used in all XML content documents which OpenReader now supports (an XHTML subset) and plans to support in the future (such as a subset of TEI). The namespaced elements include (attributes not described here):

   <or:hlink> ... </or:hlink>   (simple hypertext linking)
   <or:object/>                 (embedding images, video and audio)
   <or:page/>                   (page break in a paper source)
   <or:lb/>                     (line break in a paper source)
   <or:marker/>                 (a generic marker)

(Both or:hlink and or:object will be defined using XLink.)

Jose Menendez is letting us use his copy of "My Antonia" (which is more accurate than the one I've been working on, which hasn't yet been completely proofed) in a demo of the OpenReader format. I've "diffed" it against my version, checked every difference found against the original page scans, and restored the text to the original 1918 edition. That includes the textual errors, which are specially marked together with what the text should be, based on both the Univ. of Nebraska online edition and Jose's edition. I have also added precise line breaks and page breaks.

For line breaks, I've placed the break at the precise point of hyphenation. If the broken word does not have a natural hyphen, I use a soft hyphen (U+00AD) to indicate that; if the broken word does have a natural hyphen at the break, the hard hyphen character "-" is used.
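To make the hyphen convention concrete, here is a minimal Python sketch of how a tool might use it. This is not OpenReader code. It assumes the soft hyphen is stored as the character U+00AD immediately before <or:lb/>, it handles the markup with simple regular expressions rather than a real XML parser, and the function names (split_lines, as_printed, reflowed) are made up for illustration.

```python
import re

SOFT_HYPHEN = "\u00ad"  # assumption: soft hyphens are stored as U+00AD

LB = re.compile(r"\s*<or:lb\s*/>\s*")    # line-break marker plus surrounding whitespace
PAGE = re.compile(r"<or:page\b[^>]*/>")  # page-break marker (dropped in this sketch)
P_TAG = re.compile(r"</?p\b[^>]*>")      # enclosing paragraph tags


def split_lines(markup):
    """Return the paper lines of one paragraph, soft hyphens still in place."""
    text = P_TAG.sub("", PAGE.sub("", markup))
    return [part for part in LB.split(text.strip()) if part]


def as_printed(markup):
    """Plain text keeping the paper line breaks; soft hyphens become visible."""
    return "\n".join(
        line[:-1] + "-" if line.endswith(SOFT_HYPHEN) else line
        for line in split_lines(markup)
    )


def reflowed(markup):
    """One running line: soft hyphens vanish, natural hyphens stay."""
    out = ""
    for line in split_lines(markup):
        if out.endswith(SOFT_HYPHEN):
            out = out[:-1] + line  # word was split only by the typesetter
        elif out.endswith("-") and not out.endswith("--"):
            out += line            # natural hyphen: keep it, add no space
        elif out:
            out += " " + line      # ordinary break between words
        else:
            out = line
    return out
```

The point of the two hyphen kinds shows up in reflowed(): a word such as "wild-looking" keeps its hyphen when the lines are rejoined, while a word broken only by the typesetter is restored without one.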
Here's an example paragraph (the 63rd paragraph in the text) which includes a page break and both soft and hard hyphens:

****************************************************************************
<p id="p0063">The little girl was pretty, but Án-tonia —<or:lb/>
<or:page id="page026"/>they accented the name thus, strongly, when<or:lb/>
they spoke to her — was still prettier. I re<or:lb/>membered what the conductor had said about<or:lb/>
her eyes. They were big and warm and full<or:lb/>
of light, like the sun shining on brown pools<or:lb/>
in the wood. Her skin was brown, too, and<or:lb/>
in her cheeks she had a glow of rich, dark<or:lb/>
color. Her brown hair was curly and wild-<or:lb/>looking. The little sister, whom they called<or:lb/>
Yulka (Julka), was fair, and seemed mild and<or:lb/>
obedient. While I stood awkwardly confront<or:lb/>ing the two girls, Krajiek came up from the<or:lb/>
barn to see what was going on. With him was<or:lb/>
another Shimerda son. Even from a distance<or:lb/>
one could see that there was something strange<or:lb/>
about this boy. As he approached us, he began<or:lb/>
to make uncouth noises, and held up his hands<or:lb/>
to show us his fingers, which were webbed to<or:lb/>
the first knuckle, like a duck’s foot. When he<or:lb/>
saw me draw back, he began to crow delight<or:lb/>edly, “Hoo, hoo-hoo, hoo-hoo!” like a rooster.<or:lb/>
His mother scowled and said sternly, “Ma<or:lb/>rek!” then spoke rapidly to Krajiek in Bo<or:lb/>hemian.</p>
****************************************************************************

If the above is rendered in plain text preserving the line breaks (ignoring the page break), we have the following. (Since this is an ASCII text email, I've converted the A-acute in "Antonia" to an unaccented A, em-dashes to "--", and curly quotes/apostrophes to the straight varieties.)
****************************************************************************
The little girl was pretty, but An-tonia --
they accented the name thus, strongly, when
they spoke to her -- was still prettier. I re-
membered what the conductor had said about
her eyes. They were big and warm and full
of light, like the sun shining on brown pools
in the wood. Her skin was brown, too, and
in her cheeks she had a glow of rich, dark
color. Her brown hair was curly and wild-
looking. The little sister, whom they called
Yulka (Julka), was fair, and seemed mild and
obedient. While I stood awkwardly confront-
ing the two girls, Krajiek came up from the
barn to see what was going on. With him was
another Shimerda son. Even from a distance
one could see that there was something strange
about this boy. As he approached us, he began
to make uncouth noises, and held up his hands
to show us his fingers, which were webbed to
the first knuckle, like a duck's foot. When he
saw me draw back, he began to crow delight-
edly, "Hoo, hoo-hoo, hoo-hoo!" like a rooster.
His mother scowled and said sternly, "Ma-
rek!" then spoke rapidly to Krajiek in Bo-
hemian.
****************************************************************************

Of course, comments welcome on the above!

Jon Noring