Critique of automated conversion of PGTEI on PG website

Continuing my evaluation of Marcello's PGTEI setup on the Gutenberg website (http://www.gutenberg.org/tei/)...

I used the same Declaration of Independence file I used last week to comment on the XML markup itself. This time I converted that XML file to HTML and TEXT using the online services section. Below are the items that *I* believe need some improvement. If anyone wants to duplicate my conversions, see my post from last week that contained the XML I used (or send me a quick e-mail and I'll forward the file on).

Josh

***

HTML conversion items:

1 - First thing that jumps out is the need for bigger left and right margins. This is a simple CSS change. Currently, DP has *mostly* standardized on 10% margins on the left and right. This gives some nice white space for easier reading and gives room for things like original source page numbers and sidenotes to be put in the margin area.

2 - If the author field is left blank, the conversion shouldn't put a "by" out there all by itself. Both the HTML and the TEXT version have this dangling word.

3 - The publication and edition dates are both being printed, but it isn't clear which is which. Maybe put an "Original publication date:" label before the date itself?

4 - Since the title, author, etc. are already listed in the first few lines, the second listing below the Gutenberg disclaimer line is redundant. Also, in that same spot, the language code is printed, which is nice, but I would suggest changing the format slightly: put the language code in brackets after the written-out language, e.g.: English-United States [en-us]. For most of us normal humans, the language codes are not intuitive.

5 - In the CONTENTS section, if there are no footnotes/endnotes, don't list a NOTES section.

6 - Use standard HTML paragraph spacing. Right now, the CSS specifies no blank line between paragraphs and an indent at the beginning of each paragraph. While this matches the original paper source, for me at least, it is jarring to read on a computer screen. This type of formatting would make perfect sense in the PDF conversion, since that one is geared for printing on paper.

7 - Need a horizontal rule (75% width seems right to me) between the CONTENTS section and the first section of the text. Right now, they run together.

8 - Need a horizontal rule between major divisions of the text. Currently, the large type header gives a visual indication, but I don't believe it is enough.

9 - No need for the extra horizontal rule to mark off the FOOTNOTES section if there is no footnote section in that text. Currently, this situation makes for two horizontal rules in a row in a text with no footnotes.

***

TEXT conversion items:

1 - It lists "The Project Gutenberg EBook of" twice.

2 - Has a dangling "by" line even when no author is specified.

3 - Same redundant title/author info as in the HTML conversion.

4 - Notes section appears whether there are any footnotes or not.
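
To make items 2 to 5 of the HTML list concrete, here is a minimal Python sketch of the conditional output being asked for. It is only an illustration: it is not the actual PGTEI conversion code (which this thread does not show), and the function names, the "Edition date:" label, and the sample values are assumptions made up for the example.

```python
def build_header(title, author="", pub_date="", edition_date="", language=None):
    """Return the header lines, skipping any field that is empty."""
    lines = [title]
    if author.strip():
        # Item 2: only print "by" when there is an author to follow it.
        lines.append("by " + author.strip())
    if pub_date:
        # Item 3: label each date so the two cannot be confused.
        lines.append("Original publication date: " + pub_date)
    if edition_date:
        # "Edition date:" is an assumed label, not one proposed in the thread.
        lines.append("Edition date: " + edition_date)
    if language:
        # Item 4: written-out language first, code in brackets after it.
        name, code = language
        lines.append("Language: %s [%s]" % (name, code))
    return lines


def build_contents(sections, footnotes):
    """Return the CONTENTS entries; list NOTES only when footnotes exist."""
    entries = list(sections)
    if footnotes:
        # Item 5: no footnotes/endnotes, no NOTES entry.
        entries.append("NOTES")
    return entries


# Hypothetical sample data: a text with no author and no footnotes.
header = build_header(
    "The Declaration of Independence",
    author="",
    pub_date="1776",
    language=("English-United States", "en-us"),
)
print("\n".join(header))
print(build_contents(["Declaration"], footnotes=[]))
```

The same checks would apply to the TEXT conversion, since items 2 and 4 of that list describe the same two problems.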

Joshua Hutchinson wrote:
1 - First thing that jumps out is the need for bigger left and right margins. This is a simple CSS change. Currently, DP has *mostly* standardized on 10% margins on the left and right. This gives some nice white space for easier reading and gives room for things like original source page numbers and sidenotes to be put in the margin area.
OTOH I like to read texts in a small (horizontally) browser window so I can put a shell window and the browser window on one screen. The shell is usually compiling something or doing boring work. If the shell stumbles over something I can immediately switch over, correct and switch back to my reading. Big margins in the browser window would definitely be a major annoyance. I think the CSS provided is just an example. Everybody here has enough skills to build a CSS he/she likes. For the end user we may consider an "alternate stylesheet" model where she may switch between a set of predefined ones.
6 - Use standard HTML paragraph spacing.
Same as above.
7 - Need a horizontal rule (75% width seems right to me) between the CONTENTS section and the first section of the text. Right now, they run together.
8 - Need horizontal rule between major divisions of the text. Currently, the large type header gives a visual indication, but I don't believe it is enough.
Use the rend="newpage" or rend="newdoublepage" attribute on a div, front, or back element, e.g.:

<div rend="newpage" type="chapter">

This will start a new page on paginated media and put a rule in the HTML output.

--
Marcello Perathoner
webmaster@gutenberg.org
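
As a rough illustration of the behaviour described in this reply (a rule in the HTML output wherever a division carries rend="newpage" or rend="newdoublepage"), here is a small Python sketch. It is not the PGTEI stylesheet itself; the function name, the sample markup, and the shape of the generated HTML are assumptions chosen for the example, and it assumes rend holds a single keyword.

```python
import xml.etree.ElementTree as ET
from html import escape

# The two rend values named in the reply above.
PAGE_BREAK_RENDS = {"newpage", "newdoublepage"}


def divs_to_html(tei_body):
    """Render each top-level <div>, putting a rule before page-break divs."""
    root = ET.fromstring(tei_body)
    out = []
    for div in root.findall("div"):
        # Assumption: rend holds exactly one keyword, or is absent.
        if div.get("rend") in PAGE_BREAK_RENDS:
            out.append("<hr />")
        head = div.findtext("head", default="")
        out.append('<div class="%s"><h2>%s</h2></div>'
                   % (escape(div.get("type", "div")), escape(head)))
    return out


# Hypothetical sample: a contents division followed by a chapter.
sample = """<body>
  <div type="contents"><head>CONTENTS</head></div>
  <div rend="newpage" type="chapter"><head>Chapter 1</head></div>
</body>"""
print("\n".join(divs_to_html(sample)))
```

With markup like this, the rule between CONTENTS and the first chapter comes from the source document rather than from a fixed rule in the conversion.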

Marcello wrote:
Joshua Hutchinson wrote:
1 - First thing that jumps out is the need for bigger left and right margins. This is a simple CSS change. Currently, DP has *mostly* standardized on 10% margins on the left and right. This gives some nice white space for easier reading and gives room for things like original source page numbers and sidenotes to be put in the margin area.
... I think the CSS provided is just an example. Everybody here has enough skills to build a CSS he/she likes. For the end user we may consider an "alternate stylesheet" model where she may switch between a set of predefined ones.
The beauty of transforming "standardized" TEI documents into XHTML [see note at end] is that, when done right (with no presentational markup), the XHTML for all the documents will itself be uniform and standardized, thus amenable to swappable CSS style sheets which can be applied to almost the whole collection, if not all of it. Of course, the documents will also be reasonably accessible since accessibility is enhanced by this approach.

A favorite site of mine which demonstrates the power of swappable CSS is "CSS Zen Garden", http://www.csszengarden.com/ , which essentially uses the same, high quality (and accessible) document, and invites anyone to submit their own CSS style sheet -- hundreds of style sheets have been submitted so far from many web designers/artists/enthusiasts. It's amazing to see the variation of complex styling which can be applied to such a simple document (try viewing the base document without CSS -- images are separate from the document and also swappable in CSS Zen Garden.)

Certainly, how PG would enable style sheet swapping may be different than how CSS Zen Garden does it, but that's beside the point. The important point is that it can be done, and will be an exciting addition to PG by allowing readers to "have it their way" rather than "having it our way." We will not have to argue on whether we want 10% or 20% margins, etc. This will also entice many to submit their own CSS designs for people to use.

But it all starts with the Master markup being done *right*.

Jon Noring

[Note referenced above: This indicates that there should be NO presentational markup in the source TEI-conforming documents -- to take a pure structural/semantic approach to markup. About XHTML, the documents spit out from XSLT should be XHTML 1.1, or at least the content markup itself between <body>...</body> be valid to XHTML 1.1. I suppose we could also offer a "legacy", pre-styled, non-CSS HTML for those running really old and crusty, non-CSS browsers.]
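
One way the "alternate stylesheet" model raised in this thread could look in the generated pages is sketched below in Python. It relies on the standard HTML convention that a titled link with rel="stylesheet" is the preferred sheet and titled links with rel="alternate stylesheet" are the ones a reader can switch to. The function name and the two stylesheet file names are hypothetical; nothing here describes how PG or CSS Zen Garden actually implements swapping.

```python
from html import escape


def stylesheet_links(sheets):
    """Build <link> tags: the first sheet is the default, the rest alternates."""
    tags = []
    for index, (title, href) in enumerate(sheets):
        rel = "stylesheet" if index == 0 else "alternate stylesheet"
        tags.append('<link rel="%s" type="text/css" title="%s" href="%s" />'
                    % (rel, escape(title), escape(href)))
    return tags


# Hypothetical stylesheets covering both margin preferences from the thread.
sheets = [
    ("Narrow margins", "narrow.css"),
    ("Wide margins (10%)", "wide.css"),
]
print("\n".join(stylesheet_links(sheets)))
```

Because every converted document would share the same structural markup, the same list of links could go into the head of each one.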