
One could argue that the paragraph and perhaps the chapter are useful tags. Poetry, sidenotes and footnotes seem fairly well established in PG without additional tagging. Also, will we be scanning 20 editions of Dickens, all with different line breaks and page numbers?

nwolcott2@post.harvard.edu

----- Original Message -----
From: "Brad Collins" <brad@chenla.org>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@pglaf.org>
Sent: Saturday, June 24, 2006 10:06 PM
Subject: Re: [gutvol-d] the end of the line

Marcello Perathoner <marcello@perathoner.de> writes:
My advice is: forget entirely about line breaks. They are random artefacts introduced by the person operating the typesetting machine and indirectly by the person who chose paper size and font. They have no raison d'être once you separate the ebook from the scans, i.e. after it has left DP. (That this suggestion was by "You Know Who" should have tipped you off immediately.)
I agree. Before encoding a text you have to decide whether you are encoding the expression of the text or the manifestation of the text.[1]

Marking up an expression captures the structure and the words of the text. This is what the author has created and handed over to a publisher.

Marking up a manifestation is all about layout and presentation. This is the realm of the publisher, and this is where you get into fonts, line breaks etc.

You can easily mark up a text as either one or the other, but it's not practical to try to do both in the same markup. There are a few examples of texts and manuscripts which would be worth having an expression level markup and a second manifestation markup, but these will be rare. I seriously doubt that any manifestation of Willa Cather's work would fall into this category :)

Dead tree books fix a manifestation into a permanent arrangement. Electronic manifestations, which use systems like CSS to mold the manifestation to the moment and to the device on the fly, are liquid: if you try to hold one in your hand, it just escapes through your fingers.

The world of print books puts the publisher and the manifestation at the center. The manifestation is more important than the author, who takes a back seat to the glorious manifestation that was made of the expression of her work.

But when copying and distribution are for all practical purposes free, and the manifestation has been reduced to an algorithm which an electronic reader interprets, the manifestation itself takes a back seat to the expression. The Age of the manifestation and the publisher is drawing to an end, and we are slowly seeing the emergence of the Age of the expression and the author.

PG is well named. Gutenberg's press was the first instance of fixing a manifestation so that millions of identical copies could be made. Before Gutenberg, each copy of a text was a different manifestation.
Being able to make error-free copies was a revolution, but it came at the expense of easily being able to mold manifestations for different uses and environments. With an electronic text, you can make an exact copy without depending on any one manifestation of it. This is just as significant as Gutenberg's press.

Is it useful to include some information from some manifestations in an expression level markup? Damn yes -- page breaks are the anchor and hyperlink in the world of paper. Countless millions of references to page numbers have been made over the last two centuries. Preserving page breaks is an essential part of preserving all those references which use them.

So if you want to create a markup of a text which preserves a specific manifestation, that's fine; there are whole sections of TEI devoted to allowing you to pick the tiniest bit of navel lint and preserve it for eternity.

But for most purposes, page scans of the original manifestation will provide enough of this information to answer questions about a text, as well as provide the source material for the lint pickers to encode away to their heart's content for specific manifestations.

But electronic books will mostly be in the business of preserving the expression of a work, which can then be converted into other markup languages like XML or OR for dynamically generating flexible, ephemeral manifestations on the fly.

b/

Footnotes:

[1] I am using work, expression and manifestation as defined in the FRBR (Functional Requirements for Bibliographic Records).

work :: the concept representing an intellectual or creative creation.
expression :: includes the specific sequence of words, images and structure of a work.
manifestation :: includes the specific layout, typography, pagination etc. of a specific expression.

--
Brad Collins <brad@chenla.org>, Banqwao, Thailand

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d
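
As a rough illustration of the approach discussed in this thread (drop the line breaks, keep the page breaks), here is a minimal Python sketch. It is not anything PG or DP actually runs. The function name `reflow` and the input shape (a list of page label and raw page text pairs) are assumptions made up for the example. The only borrowed convention is TEI's `<pb n="..."/>` page-beginning milestone, used here as an inline marker so that old page references can still be resolved.

```python
def reflow(pages):
    """Reflow hard-wrapped page text into paragraphs.

    `pages` is a hypothetical input shape: a list of (page_label, raw_text)
    tuples, one per scanned page. Line breaks are discarded; blank lines are
    treated as paragraph boundaries; each page start is kept as an inline
    TEI-style <pb n="..."/> milestone.
    """
    paragraphs = []
    current = []  # words and milestones of the paragraph being built

    for label, raw_text in pages:
        # Record where this page began, even in the middle of a paragraph.
        current.append(f'<pb n="{label}"/>')
        for line in raw_text.splitlines():
            if line.strip():
                # Printed line breaks are manifestation detail: drop them.
                current.extend(line.split())
            elif current:
                # A blank line closes the paragraph collected so far.
                paragraphs.append(" ".join(current))
                current = []

    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    sample = [
        ("1", "It was the best of times,\nit was the worst\nof times.\n\n"
              "A second paragraph that runs"),
        ("2", "over the page break.\n"),
    ]
    for paragraph in reflow(sample):
        print(paragraph)
```

The sketch ignores end-of-line hyphenation and pages that open with a blank line; a real conversion would need to handle both.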