Re: [gutvol-d] PGTEI and more


Joshua Hutchinson

28 Oct 2004

4:25 p.m.

----- Original Message ----- From: Scott Lawton <scott_bulkmail@productarchitect.com>

...

My feedback on PGTEI is too long for email, so I posted it here: http://Classicosm.com/xml/feedbackonpgtei.html

Feedback welcome!!!

<quote> is used in an example but apparently isn't part of TEI Lite (it's not in Appendix A). What's the story?

It is part of the full TEI spec. Thanks for pointing it out. I meant to have it in my test.xml, but I forgot. The test.xml should have <quote rend="display"> for blockquotes (and will in the next update). TEI Lite is the starting point, but we will probably pull in other elements from the full spec where we need them.

** <q>: in cases where the quotation marks don't balance, it may be difficult to automatically convert quotation marks to the appropriate <q>...</q> form, and time consuming to manually proof. Accordingly, I suggest this step be left as optional.

I actually agree here. I prefer using " instead of <q>. Can any of the experts explain why this is a "bad idea"?

** pgHeader looks like it contains information that should be described in teiHeader (though I'm new to TEI so may be wrong). alice.tei and lmiss.tei both contain pgHeader; the generated PGTEI does not.

Assuming I understand this part right... The teiHeader contains all the information. pgHeader is the call-out to the part that takes the info in teiHeader and formats it into a standard display header when you convert to HTML or TEXT. Marcello is probably the guy to explain it more fully.

** Having separate index tags for TOC, PDF and PDB strikes me as unnecessary and prone to error. Shouldn't the TOC one suffice for all? In fact, the tag itself seems redundant. Shouldn't the head itself suffice? (If TEI requires it, that's another example of where I think TEI is too complex.)

Well, the reason they are separate is for the occasion where you have a header, but you don't want that header to appear in the Table of Contents. HTML requires both an anchor and <h1> markup... this is the TEI equivalent. As for the multiple index entries, I wondered about the need myself, but I haven't gotten around to asking Marcello about it (or digging through the documentation to try to understand the need).

** alice.tei: reg="Carroll, Lewis" should use the complete "authority" form, which I believe is "Carroll, Lewis, 1832-1898". Note that unlike the PG website, there are no parens around the dates. Here's an illustration of paren usage: "Baum, L. Frank (Lyman Frank), 1856-1919".

I'm hoping consistency in format will be achieved when we have 1) some examples in place and 2) a web form for generating the, admittedly confusing, teiHeader section.

** There appear to be two validation errors, e.g. in the PGTEI documentation:
Error (7/117): <SPAN> must not contain block level elements like <H1>.
Error (379/1): The start tag for </P> can't be found.

Marcello knows about these and they will be fixed.

** In the documentation, why is "Versprich mir, Heinrich" repeated in the output, the second time in white?

This one confused me for a minute, too... Then I realized it is the only way an HTML browser will be able to space over the right amount. In effect, Marcello is trying to make the text invisible. There may be a better way to hide the spacing text, but I haven't given it much thought yet. It works now, if not in an "elegant solution" manner.

** The lack of space between paragraphs goes against Web conventions. (It's fine as an option but a poor choice for the default.)

Agreed. I promise it will be changed.

** Thanks again for your analysis!

Josh


bkeir＠pgdp.net

29 Oct

3:49 a.m.

New subject: PGTEI and more

...

<q>: in cases where the quotation marks don't balance, it may be difficult to automatically convert quotation marks to the appropriate <q>...</q> form, and time consuming to manually proof. Accordingly, I suggest this step be left as optional.

I actually agree here. I prefer using " instead of <q>. Can any of the experts explain why this is a "bad idea"?

Presumably <q> </q> is meant to top and tail a quotation, making it possible to extract quotations from within a work if desired. However I'd be worried about going to <q> because of the possible ambiguities in quotations of multiple paragraphs, and the dangers of these being retransformed to " incorrectly for the text versions.

"We often find at DP that people brought up on reading only contemporary works, which rarely quote several paragraphs at a time, incorrectly expect that each paragraph of a quotation needs a closing quote mark.

"People who have read a lot of 19th century books are well aware that correct usage is that while each paragraph in a quoted passage starts with a quotation mark, only the final paragraph in a quoted passage gets a closing one.

"Like this."
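To make the extraction point concrete, here is a minimal Python sketch (not part of PGTEI; the function name is hypothetical) that pulls quotations out of a TEI fragment with xml.etree.ElementTree. It assumes that rend="pre" on a <q> means the quotation continues into the next <q> (the convention Marcello proposes further down this thread) and that <q> elements are not nested.

```python
import xml.etree.ElementTree as ET

def extract_quotations(root):
    """Return each quotation's text, joining a <q rend="pre"> to the next <q>.

    Assumptions (not from any spec): rend="pre" marks a paragraph of a
    multi-paragraph quotation that continues in the following <q>, and
    <q> elements are not nested.
    """
    quotations, parts = [], []
    for q in root.iter("q"):  # document order
        parts.append("".join(q.itertext()).strip())
        if "pre" not in (q.get("rend") or "").split():
            quotations.append("\n".join(parts))  # this <q> ends the quotation
            parts = []
    if parts:  # a trailing rend="pre" with no continuation
        quotations.append("\n".join(parts))
    return quotations

doc = ET.fromstring(
    '<div><p>He said: <q rend="pre">Blah.</q></p>'
    '<p><q>And blah.</q></p><p>Later: <q>Short.</q></p></div>'
)
for quotation in extract_quotations(doc):
    print(quotation)
    print("---")
```

With plain " marks instead of <q>, the same extraction would first have to guess where each multi-paragraph quotation ends, which is exactly the ambiguity described above.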

Steve Thomas

30 Oct

7:29 a.m.

New subject: PGTEI and more

Joshua Hutchinson wrote:

...

<quote> is used in an example but apparently isn't part of TEI Lite (it's not in Appendix A). What's the story?

The common advice seems to be to use <q> to enclose quoted speech *inline*, and use <quote> for quoting larger blocks of text. The P4 TEI manual was a bit vague on this, but that seems to be a sensible convention worth using.

...

It is part of the full TEI spec. Thanks for pointing it out. I meant to have it in my test.xml, but I forgot. The test.xml should have <quote rend="display"> for blockquotes (and will in the next update).

As I understand this (from an earlier post), 'rend="display"' is supposed to mean that the block should be indented (rather like the HTML blockquote).

This seems like a very poor choice of terms to me. CSS has a "display" property, which can take values such as "inline", "block", and -- crucially -- "none". "display:none" is used where you don't want the content displayed at all.

So using this rend="display" seems likely to result in confusion.

In any case, the choice is poor because it does not convey the information desired. If you use <quote> on its own without rend="display", does that indicate you don't want to display the content? Or that you don't want to indent it?

I personally don't see any need to use rend here. If you are quoting a passage from some other work, then enclose it in <quote> .. </quote>. That's enough. When someone comes to present this (e.g. in an HTML version), the most natural thing would be to convert the tag to blockquote. The rend is redundant.

...

<q>: in cases where the quotation marks don't balance, it may be difficult to automatically convert quotation marks to the appropriate <q>...</q> form, and time consuming to manually proof. Accordingly, I suggest this step be left as optional.

I actually agree here. I prefer using " instead of <q>. Can any of the experts explain why this is a "bad idea"?

This was thrashed out at great length almost a year ago. Basically, while purists will see enormous merit in using <q> instead of quote marks, the practical approach is to stick with the quote marks, for the reasons outlined by another poster (the terminating quote question with multi-paragraph quotes).

There's also nothing *wrong* with using this:

<q>"Hello,"</q> she said.

At least it's not disallowed in TEI. I believe there's a place in the TEI header to indicate which practice you are using in the text.

-- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/

Marcello Perathoner

10 a.m.

New subject: PGTEI and more

Steve Thomas wrote:

...

The common advice seems to be to use <q> to enclose quoted speech *inline*, and use <quote> for quoting larger blocks of text. The P4 TEI manual was a bit vague on this, but that seems to be a sensible convention worth using.

That would be presentational markup and very against the TEI specs. The specs are very detailed on this:

6.3.3 Quotation

This section discusses the following elements, all of which are often rendered by the use of quotation marks:

* <q> contains a quotation or apparent quotation — a representation of speech or thought marked as being quoted from someone else (whether in fact quoted or not); in narrative, the words are usually those of a character or speaker; in dictionaries, q may be used to mark real or contrived examples of usage.
* <quote> contains a phrase or passage attributed by the narrator or author to some agency external to the text.
* <cit> A quotation from some other document, together with a bibliographic reference to its source.
* <soCalled> contains a word or phrase for which the author or narrator indicates a disclaiming of responsibility, for example by the use of scare quotes or italics.

One form of presentational variation found particularly frequently in written and printed texts is the use of quotation marks. As with the typographic variations discussed in the preceding section, it is generally helpful to separate the encoding of the underlying textual feature (for example, a quotation or a piece of direct speech) from the encoding of its rendering (for example, the use of a particular style of quotation marks).

The most common and important use of quotation marks is, of course, to mark quotation, by which we mean simply any part of the text attributed by the author or narrator to some agency other than the narrative voice. Typical examples include passages cited from other works, for which the element <quote> may be used, and words or phrases attributed to other voices within the current work, for which the element <q> may be used. If this distinction between intra-textual and inter-textual voices cannot be made reliably, or is not of interest, then all quoted matter may simply be marked using the <q> tag. The editorial policy in this respect should be stated in the encoding description of the TEI Header.

The <soCalled> element is used for cases where the author or narrator distances him or herself from the words in question without however attributing them to any other voice in particular.

http://www.tei-c.org/P4X/CO.html#COHQQ

...

As I understand this (from an earlier post), 'rend="display"' is supposed to mean that the block should be indented (rather like the HTML blockquote).

This seems like a very poor choice of terms to me. CSS has a "display" property, which can take values such as "inline", "block", and -- crucially -- "none". "display:none" is used where you don't want the content displayed at all.

So using this rend="display" seems likely to result in confusion.

In any case, the choice is poor because it does not convey the information desired. If you use <quote> on its own without rend="display", does that indicate you don't want to display the content? Or that you don't want to indent it?

"These Guidelines make no binding recommendations for the values of the rend attribute; the characteristics of visual presentation vary too much from text to text and the decision to record or ignore individual characteristics varies too much from project to project. Some potentially useful conventions are noted from time to time at appropriate points in the Guidelines." -- http://www.tei-c.org/P4X/ref-GLOBAL.html

Thus we are perfectly right in making up a convention of our own. But TEI is not CSS. Although CSS and the rend attribute are both purely presentational, we should not mix TEI and CSS conventions.

The "display" choice may be poor, but it is exactly the same choice Sebastian Rahtz made in his stylesheets. Look at the code in:

http://www.tei-c.org/Stylesheets/P4/html/teihtml-misc.xsl

While not dictated by TEI specs, using rend="display" makes our convention compatible with Sebastian's stylesheets.

Also, using <q rend="block"> would be a still poorer choice because the rend attribute is global and can be used on all TEI elements. <div rend="block"> is perfectly valid TEI and it would be quite counter-intuitive to have it set a display margin around the block, whereas <div rend="display"> makes quite clear what you want.

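One way an HTML converter might honour the rend="display" convention is sketched below in Python. This is not Sebastian Rahtz's stylesheet (that is XSLT); it is a hypothetical mapping, assuming rend holds whitespace-separated tokens, that turns <quote rend="display"> into an HTML blockquote and any other <quote> or <q> into an inline HTML q. It does not move block quotations out of enclosing paragraphs, which real HTML output would also need.

```python
import xml.etree.ElementTree as ET

def tei_quotes_to_html(root):
    """Rename TEI <quote>/<q> elements to HTML equivalents, in place.

    Hypothetical mapping: rend="display" means a displayed (block)
    quotation; anything else is treated as inline. The rend attribute is
    dropped because HTML has no use for it; other elements are untouched.
    """
    for node in list(root.iter()):
        if node.tag in ("quote", "q"):
            rend = node.attrib.pop("rend", "")
            node.tag = "blockquote" if "display" in rend.split() else "q"
    return root

tei = ET.fromstring(
    '<div><p>He wrote <quote>a line</quote>.</p>'
    '<quote rend="display"><p>A block.</p></quote></div>'
)
print(ET.tostring(tei_quotes_to_html(tei), encoding="unicode"))
```

Because rend is global, a converter following this convention could apply the same display test to <div rend="display">; the sketch limits itself to the quotation elements.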
...

This was thrashed out at great length almost a year ago. Basically, while purists will see enormous merit in using <q> instead of quote marks, the practical approach is to stick with the quote marks, due to reasons outlined by another poster. (The terminating quote question with multi-paragraph quotes.)

Using <q> has advantages:

- automatically finds quotation mark errors
- the renderer can use the prettiest quote in each output format, e.g. the plain ugly apostrophe in TXT and pretty typographical quotes in PDF
- quotes can be automatically extracted from the text

and disadvantages:

- more work

The argument about the terminating quote character in multi-paragraph quotes is moot since there is a way to deal with it:

<p>He said: <q rend="pre">Blah.</q></p>
<p><q>And blah.</q></p>

-- Marcello Perathoner webmaster@gutenberg.org
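How a renderer might supply the marks per output format can be sketched in Python. This is a minimal illustration, not Marcello's converter: it assumes rend="pre" means "emit the opening mark but no closing mark", alternates double and single marks for nested <q>, and uses the hypothetical format keys "txt" (straight ASCII quotes) and "pdf" (typographic Unicode quotes).

```python
import xml.etree.ElementTree as ET

# (opening, closing) marks for outer and nested quotations, per hypothetical format key.
MARKS = {
    "txt": [('"', '"'), ("'", "'")],                      # plain ASCII
    "pdf": [("\u201c", "\u201d"), ("\u2018", "\u2019")],  # typographic quotes
}

def render(elem, fmt="txt", depth=0):
    """Flatten a TEI fragment to text, supplying quotation marks for <q>.

    Assumes rend="pre" means: print the opening mark but not the closing
    one, because the quotation continues in the next paragraph.
    """
    is_q = elem.tag == "q"
    pieces = []
    if is_q:
        opening, closing = MARKS[fmt][depth % 2]
        pieces.append(opening)
    pieces.append(elem.text or "")
    for child in elem:
        # Quotations nested inside a <q> switch to the inner style of marks.
        pieces.append(render(child, fmt, depth + 1 if is_q else depth))
        pieces.append(child.tail or "")
    if is_q and "pre" not in (elem.get("rend") or "").split():
        pieces.append(closing)
    return "".join(pieces)

doc = ET.fromstring(
    '<div><p>He said: <q rend="pre">Blah.</q></p><p><q>And blah.</q></p></div>'
)
for fmt in ("txt", "pdf"):
    print([render(p, fmt) for p in doc])
```

The same marked-up source yields both outputs; the choice of glyph is made once, in the MARKS table, rather than in every quotation of the text.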

Carlo Traverso

10:34 a.m.

New subject: PGTEI and more

...

"Marcello" == Marcello Perathoner <marcello@perathoner.de> writes:

Marcello> Steve Thomas wrote:
>> The common advice seems to be to use <q> to enclose quoted
>> speech *inline*, and use <quote> for quoting larger blocks of
>> text. The P4 TEI manual was a bit vague on this, but that seems
>> to be a sensible convention worth using.

Marcello> That would be presentational markup and very against the
Marcello> TEI specs. The specs are very detailed on this:

If TEI has to be used only semantically, then it is inadequate for PG needs. PG markup has to contain presentational elements, in such a way that one can obtain presentations "faithful to the original".

A PG-TEI encoded text should allow one to call a transform to a presentation form with an "original" formatting specification, allowing one to recover whatever was in the original (as well as other specifications allowing one to change it). This might include (referring to quotations) the possibility of rendering a quoted section with running quotation marks at the start of each line.

One should never forget that presentation IS semantic: this is evident with heavily formatted poetry (Mallarmé's "Un coup de dés jamais n'abolira le hasard" is a quite extreme case), but in some form or another it is always true.

Carlo

Marcello Perathoner

11:49 a.m.

New subject: PGTEI and more

Carlo Traverso wrote:

...

Marcello> Steve Thomas wrote:

>> The common advice seems to be to use <q> to enclose quoted
>> speech *inline*, and use <quote> for quoting larger blocks of
>> text. The P4 TEI manual was a bit vague on this, but that seems
>> to be a sensible convention worth using.

Marcello> That would be presentational markup and very against the
Marcello> TEI specs. The specs are very detailed on this:

If TEI has to be used only semantically, then it is inadequate for PG needs. PG markup has to contain presentational elements, in such a way that one can obtain presentations "faithful to the original".

I didn't say that. I said that using <q> and <quote> to mark up inline and block quotes respectively was wrong. In TEI all of the presentational stuff should be done with the rend attribute:

<q rend="display">

As to the "faithful to the original" debate: most people are far too enamoured of exactly replicating the one edition of the text they happen to work on. (I can understand people wanting to faithfully replicate a Shakespeare First Folio, but not the books PG usually produces.) Most of the presentational attributes of any edition of a text are just whims of the publisher. Who cares if the author's name was printed in Zapf Chancery Slanted 17.4 pt, gold embossed, with 0.1em of extra inter-character spacing added? If you get a different edition of the same work, the author's name will be printed in a very different font. The best guess is to just encode that this is the author's name.

...

One should never forget that presentation IS semantic: this is evident with heavily formatted poetry, (Mallarmé's "Un coup de dés jamais n'abolira le hasard" is a quite extreme case) but in some form or another it is always true.

That is a half-truth at best. Presentation encodes semantics, but it is a lossy encoding. The same presentational attribute "italics" can encode a wide range of semantic features like "emphasis", "foreign word", "name", etc.

If presentation could losslessly encode semantics, and an accepted standard existed for how to do this, a program could recover the semantics from the presentation and mark up a text all by itself. But then, if a program can guess, why mark up at all? This is Bowerbird's ZML approach.

What Bowerbird does not understand is that there are far too many semantic features to make a presentational encoding reversible. (Technically Bowerbird is further off his rocker still: he says that ASCII TXT can encode all semantics in the world, which is even sillier than to say that typography can.)

Mathematically speaking: let PRE be the set of all presentational attributes that can reasonably be distinguished by the human eye, and SEM be the set of all semantics. Then there is no bijective function f with PRE = f(SEM), so f cannot be inverted to recover the semantics from the presentation.

Thus we can say "presentation hints at semantics" but not "presentation IS semantic".

-- Marcello Perathoner webmaster@gutenberg.org
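The step from "lossy" to "no bijection" is the pigeonhole principle. Assuming, as the argument implies, that there are fewer distinguishable presentational attributes than semantic features (and treating both as finite sets for simplicity):

```latex
f : \mathrm{SEM} \to \mathrm{PRE}, \qquad |\mathrm{PRE}| < |\mathrm{SEM}|
\;\Longrightarrow\; \exists\, s_1 \neq s_2 \in \mathrm{SEM} :\; f(s_1) = f(s_2)
```

So f is not injective and has no inverse; for example, f(emphasis) = f(foreign word) = italics, and nothing in the italics alone says which one was meant.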

Brad Collins

1:41 p.m.

New subject: PGTEI and more

Marcello Perathoner <marcello@perathoner.de> writes:

...

Carlo Traverso wrote:

...
Marcello> Steve Thomas wrote:
>> The common advice seems to be to use <q> to enclose quoted
>> speech *inline*, and use <quote> for quoting larger blocks of
>> text. The P4 TEI manual was a bit vague on this, but that seems
>> to be a sensible convention worth using.

Marcello> That would be presentational markup and very against the
Marcello> TEI specs. The specs are very detailed on this:

If TEI has to be used only semantically, then it is inadequate for PG needs. PG markup has to contain presentational elements, in such a way that one can obtain presentations "faithful to the original".

Marcello of course is completely correct, but that doesn't mean that Steve is wrong... Einstein didn't invalidate Newton, he refined Newton. That's how progressive passes of markup should work.

A lot of people are coming to TEI from an HTML background. It's the old "when the only tool you have is a hammer, everything begins to look like a nail". And in a way, as a general rule you could say that <q> is for inline and <quote> is for block quotes. Many times you'd be right, even though many times it would be for the wrong reasons. Steve has voiced a sort of first-pass rule of thumb. It's a bit like the <hi> tag in TEI, which isn't terribly semantic when used as a first-pass general markup tag.

I would love to see a defined first-pass set of markup tags which would be as easy as HTML to learn and apply. This would help enormously in the early stages of markup, which could then be done by folks who haven't spent long lonely hours poring over the TEI manual and then testing chunks of code in nxml-mode (an XML editing mode in Emacs).

b/ Who is bloody thankful the sun just went down after a blistering day in the big shitty.... sometimes I wish I could afford air-con. And the hot season is still yet to come.

-- Brad Collins <brad@chenla.org>, Bangkok, Thailand

Jon Noring

2:57 p.m.

New subject: PGTEI and more

Carlo wrote:

...

Marcello wrote:

...
Steve Thomas wrote:

...

The common advice seems to be to use <q> to enclose quoted speech *inline*, and use <quote> for quoting larger blocks of text. The P4 TEI manual was a bit vague on this, but that seems to be a sensible convention worth using.

...

...
That would be presentational markup and very against the TEI specs. The specs are very detailed on this:

...

If TEI has to be used only semantically, then it is inadequate for PG needs. PG markup has to contain presentational elements, in such a way that one can obtain presentations "faithful to the original".

Is this a requirement that it be possible *without some manual work* to regenerate the typographic layout of the source document? And what impact does this attempt to be 'faithful to the original' have on accessibility and non-visual uses of the PG texts?

...

A PG-TEI encoded text should allow one to call a transform to a presentation form with an "original" formatting specification, allowing one to recover whatever was in the original (as well as other specifications allowing one to change it). This might include (referring to quotations) the possibility of rendering a quoted section with running quotation marks at the start of each line.

This implies, for example, that "long-s" characters, common in pre-19th-century English texts, should be preserved (e.g., use the Unicode character equivalent). For modern usage someone can later transform all Unicode "long-s" characters to the ordinary "s". But doing it the other way around is more difficult. (Yes, a special character is not usually a "presentation" issue, but in this case it has become a modern presentation issue.)
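The asymmetry is easy to demonstrate. A minimal Python sketch, assuming the long s is stored as U+017F (LATIN SMALL LETTER LONG S); the function name is hypothetical. The forward conversion is a single replace, but because it maps two different characters onto one, nothing in the output records where the long s was.

```python
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S

def modernize(text):
    """Replace every long s with an ordinary s (one-way: information is lost)."""
    return text.replace(LONG_S, "s")

old = "Congre\u017fs"  # "Congreſs", as printed in some 18th-century texts
print(modernize(old))  # Congress
# Both spellings modernize to the same string, so a reverse step would need
# extra knowledge (spelling rules, a dictionary) to restore the long s.
print(modernize(old) == modernize("Congress"))  # True
```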

...

One should never forget that presentation IS semantic: this is evident with heavily formatted poetry, (Mallarmé's "Un coup de dés jamais n'abolira le hasard" is a quite extreme case) but in some form or another it is always true.

I disagree with this in a general sense. Presentation is mostly used to communicate document structure and sometimes the semantics of particular chunks of content (e.g., "this is a foreign phrase").

In a few cases visual layout becomes part of the content itself ("poetry as visual art"). In these rare cases I believe that SVG should be used, since there are facilities in SVG for accessibility, and SVG will truly get it exactly right all the time. SVG is XML-based, too.

Jon Noring

Greg Newby

6:20 p.m.

New subject: PGTEI and more

On Sat, Oct 30, 2004 at 08:57:30AM -0600, Jon Noring wrote:

...

Carlo wrote:

...
If TEI has to be used only semantically, then it is inadequate for PG needs. PG markup has to contain presentational elements, in such a way that one can obtain presentations "faithful to the original".

Is this a requirement that it be possible *without some manual work* to regenerate the typographic layout of the source document?

I have not heard this as a requirement. Of course, some eBook producers might believe it's valuable, and they are welcome to prepare their work to be "typographically correct" (whatever that might mean to them). However, it *is* a requirement to automatically regenerate plain text and HTML (perhaps other formats as desired) from the XML.

-- Greg

Jon Noring

6:43 p.m.

New subject: PGTEI and more

Greg Newby wrote:

...

Jon Noring wrote:

...
Carlo wrote:

...

If TEI has to be used only semantically, then it is inadequate for PG needs. PG markup has to contain presentational elements, in such a way that one can obtain presentations "faithful to the original".

...

...
Is this a requirement that it be possible *without some manual work* to regenerate the typographic layout of the source document?

...

I have not heard this as a requirement. Of course, some eBook producers might believe it's valuable, and they are welcome to prepare their work to be "typographically correct" (whatever that might mean to them).

My question was more rhetorical than inquisitive. The discussion shows there are different views on the issue of what we preserve, in a presentational sense, of the original source document.

For me, only rarely must the typographic layout be reproduced in some manner (such as "poetry as visual art" and a few other rarities that have been brought up here). And for this, I recommend using SVG rather than trying to use presentational markup plus CSS to achieve the desired result in the digital text version. I've previously commented on eschewing tabs and spaces used in poetry/verse to preserve visual indentation (my view is to use structural or semantic markup instead, and where poetry moves into the visual art realm, to use SVG).

Whether to preserve the "long-s" or not is more problematic, since where do we draw the line? For example, if we have an old Russian text, do we transliterate the character set to Latin? Of course we don't. Isn't the "long-s" a variant character in use at the time of publication? It is easy to auto-convert the Unicode equivalent of the "long-s" character to an ordinary "s" (as it is for the German eszett), but going the other way is much more difficult.

...

However, it *is* a requirement to automatically regenerate plain text and HTML (perhaps other formats as desired) from the XML.

Definitely! Both repurposability and accessibility are vital. My view is that, as much as possible, make the final master digital content as agnostic with respect to presentation type as possible. And in the rare instances where this is not possible, use SVG, which when done right allows much better accessibility and repurposability. If enough agree here, we might want to begin discussing how to integrate islands of SVG within the TEI framework.

Jon Noring

Scott Lawton

10:33 p.m.

New subject: capture original presentation?

I've taken the liberty of starting a new thread since I think this issue is important. It's clear that some people (myself included) would like to capture more information about presentation than would be done if the goal were ONLY semantic markup. It's fine if others don't place any or much importance on that goal, but I hope they will still contribute TEI/markup knowledge so that this choice is supported. Here, I think we can have our cake and eat it too.

...

Jon Noring typed:

My view is that, as much as possible, make the final master digital content as agnostic with respect to presentation type as possible. And in the rare instances this is not possible, then use SVG

I think there's a better middle ground here. Yes, SVG is useful in "extreme" cases, but I don't think it addresses the primary use case.

My suggestion is that structural markup is *required*, and additional presentational markup is *optional*. For those who want an agnostic master file, just ignore the presentational markup; i.e. we have to design the XML so that the presentation is clearly distinct from structure. Paraphrasing what Brad said to me in an earlier thread, the "rend" attribute describes the original presentation but doesn't enforce any specific output presentation.

Here's an example where SVG is clearly overkill:

--Introduction--
1. The Cyclone
2. The Council with the Munchkins

Structural markup and regeneration would yield:

Introduction
1. The Cyclone
2. The Council with the Munchkins

That's perfectly reasonable, and may suffice for most people. I just want there to be a way for those who think it's worth the effort to capture the former presentation in the master file. For example:

<head index=" --Introduction--">Introduction</head>

Or, using Marcello's index tag:

<index index="toc" level1=" --Introduction--" />
<head>Introduction</head>

(In both cases, -- should probably be — and there may be a better solution than hardcoding the leading spaces.)

-- Cheers, Scott S. Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting
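The "structure required, presentation optional" idea can be sketched with Marcello's index tag. Below, a hypothetical Python helper (not part of PGTEI) builds TOC lines from each <div>'s <head>, and uses the preserved wording in <index index="toc" level1="..."/> only when asked to; an agnostic converter simply never sets the flag.

```python
import xml.etree.ElementTree as ET

def toc_entries(body, preserve_original=False):
    """One TOC line per <div> that has a <head>.

    By default the plain <head> text is used; with preserve_original=True a
    child <index index="toc" level1="..."/> overrides it (a hypothetical use
    of level1 to carry the originally printed wording).
    """
    entries = []
    for div in body.iter("div"):
        head = div.find("head")
        if head is None:
            continue
        entry = "".join(head.itertext()).strip()
        if preserve_original:
            idx = div.find("index[@index='toc']")
            if idx is not None and idx.get("level1"):
                entry = idx.get("level1")
        entries.append(entry)
    return entries

body = ET.fromstring(
    '<body>'
    '<div><index index="toc" level1=" --Introduction--"/><head>Introduction</head></div>'
    '<div><head>1. The Cyclone</head></div>'
    '<div><head>2. The Council with the Munchkins</head></div>'
    '</body>'
)
print(toc_entries(body))
print(toc_entries(body, preserve_original=True))
```

The structural output never depends on the presentational attribute, so a file that omits it still produces a complete TOC.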

Greg Newby

31 Oct

12:57 a.m.

New subject: capture original presentation?

On Sat, Oct 30, 2004 at 06:33:03PM -0400, Scott Lawton wrote:

...

I've taken the liberty of starting a new thread since I think this issue is important.

It's clear that some people (myself included) would like to capture more information about presentation than would be done if the goal were ONLY semantic markup.

Just a quick note related to this, and my apologies if it turned up in the thread already and I missed it: we're planning to include the scanned page images along with eBooks. In fact, this is part of the intent with the new directory structure for the PG servers (the /1/0/8/0/... structure).

We haven't done any (or many, anyway) because we're still trying to figure out how best to name the page files, and how to link them on a page-by-page basis into the (marked up?) eBooks. Jim Tinsley drafted some general guidelines for the image files themselves, but linking them to the eBooks is something we still need to figure out.

(BTW, the Million Books project at archive.org uses DjVu for this purpose. It's not bad, but I like our intended solution of XML markup much better. Plus, of course, the MBP is mostly working with relatively poor quality proofreading. For PG, the text has taken the main emphasis, not the appearance.)

My notion is that the PGTEI and TEI Lite solutions I've been reading about on this list will be easily adaptable to including links to specific page image files, so I've not mentioned it until now. But since it's related to your desire for preservation of the actual appearance of the scanned page, I figured I'd type it up now.

That accomplished, please continue with your further thoughts; preserving appearance is definitely something that is frequently desired.

-- Greg

Jeroen Hellingman

30 Oct

11:47 a.m.

New subject: PGTEI and more

Marcello Perathoner wrote:

...

Steve Thomas wrote:

...
The common advice seems to be to use <q> to enclose quoted speech *inline*, and use <quote> for quoting larger blocks of text. The P4 TEI manual was a bit vague on this, but that seems to be a sensible convention worth using.

That would be presentational markup and very against the TEI specs. The specs are very detailed on this:

I do not agree with this, especially not in the context of pre-existing books, for a number of reasons.

0. TEI is highly flexible, and prescribes fairly little. You choose what elements you wish to mark up and which not.

1. Quotations do not nest well with paragraphs. TEI (or XML) do not provide mechanisms to properly represent overlapping hierarchies. Older books can be quite difficult to mark up this way, as closing marks are often missing, etc. (I can provide examples.)

2. Quotation marks can be considered part of the content, and thus should be retained. Adding <q> elements to these parts is fully optional, and I would only provide these if I have a good reason to do so, as indicated in Marcello's mail. (And I would add: if you would like to create an aural style sheet, and have parts spoken by different voices, they also make sense, just as providing expansions of abbreviations, etc.!)

3. Adding <q> to all quotations (even with the help of a script) is labour intensive, and adds little value.

...

The argument about the terminating quote character in multi-paragraph quotes is moot since there is a way to deal with it:

<p>He said: <q rend="pre">Blah.</q></p>

<p><q>And blah.</q></p>

And you will need a very smart renderer to supply them correctly; otherwise you must leave the quotation marks intact (inside or outside the <q>) or provide cumbersome rend attributes.

...

Marcello Perathoner

12:35 p.m.

New subject: PGTEI and more

Jeroen Hellingman wrote:

...

0. TEI is highly flexible, and prescribes fairly little. You choose what elements you wish to mark up and which not.

Yes. But *if* you mark up you have to use the right element. Using <quote> for all displayed quotes is wrong.

...

1. Quotations do not nest well with paragraphs. TEI (or XML) do not provide mechanisms to properly represent overlapping hierarchies. Older books can be quite difficult to mark up this way, as closing marks are often missing, etc. (I can provide examples)

I have marked up a lot of books with multi-paragraph quotations. I also have a script that replaces most quotation signs with <q> </q> and even gets <q rend="pre"> right most of the time. I found a lot of quotation mark errors in PG texts this way.
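Marcello's script is not posted here, so the following is only a guess at the general approach, not his code: a Python sketch (hypothetical function name) that turns straight double quotes into <q>...</q>, marks a paragraph whose quotation runs on into the next with rend="pre", and reports paragraph indices where the marks do not fit the multi-paragraph convention. It assumes straight " marks only (no curly quotes, no nested single quotes).

```python
from xml.sax.saxutils import escape

def quotes_to_q(paragraphs):
    """Convert straight double quotes in plain-text paragraphs to TEI <q>.

    Returns (tei_paragraphs, suspect_indices). A paragraph whose quotation
    runs on into the next one gets <q rend="pre">. A run-on not followed by a
    re-opening mark, or a quotation never closed, is reported for a human.
    """
    out, suspects = [], []
    carried = False  # does a quotation run on from the previous paragraph?
    for i, para in enumerate(paragraphs):
        text = escape(para.strip())  # escapes & < > but leaves " alone
        pieces, open_at = [], None   # open_at: index of the pending "<q>" piece
        if carried:
            if text.startswith('"'):
                text = text[1:]      # re-opening mark of a continued quotation
                open_at = 0
                pieces.append("<q>")
            else:
                suspects.append(i)   # previous paragraph left a quote open
        for ch in text:
            if ch != '"':
                pieces.append(ch)
            elif open_at is None:
                open_at = len(pieces)
                pieces.append("<q>")
            else:
                pieces.append("</q>")
                open_at = None
        carried = open_at is not None
        if carried:
            pieces[open_at] = '<q rend="pre">'  # rendered with opening mark only
            pieces.append("</q>")
        out.append("<p>" + "".join(pieces) + "</p>")
    if carried:
        suspects.append(len(paragraphs) - 1)  # quotation never closed
    return out, suspects

paras, suspects = quotes_to_q(['He said: "Blah.', '"And blah."'])
print(paras, suspects)
```

Run on the two-paragraph example from earlier in the thread, it reproduces that markup; the suspect list is where the validator and a final manual pass would take over.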

...

2. Quotation marks can be considered part of the content, and thus should be retained. Adding <q> elements to these parts is fully optional, and I would only provide these if I have a good reason to do so, as indicated in Marcello's mail. (and I would add, if you would like to create an aural style sheet, and have parts spoken by different voices, they also make sense, just as providing expansions of abbreviations, etc.!)

1. Quotation marks are just presentational markup for "this is a quote", just as italics are presentational markup for "this is emphasized". You should retain the underlying semantic feature, not the presentation.

2. Replacing quotation signs with <q> </q> will actually preserve them *better*. Unless you replace all apostrophe characters with the correct lsquo and rsquo characters or entities, almost every output will look closer to the original if the renderer can insert the correct Unicode lsquo and rsquo glyphs. (Note: it's difficult for a renderer to guess from context whether it should render apos as apos, lsquo or rsquo, but it is easy to transform <q> and </q>.) But *if* you replace apos with lsquo and rsquo, you may as well replace it with <q> and </q>.

But of course all this discussion is moot, because my converter supports both ways and you can do as you like.
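The point about guessing apos from context can be illustrated with a deliberately naive rule, assumed here purely for illustration and not taken from any actual converter: a ' after whitespace, an opening bracket, or at the start of the text becomes lsquo, anything else rsquo. It gets simple cases right but mis-renders an elided initial such as 'Tis. Once the quotations are in <q> elements, every remaining ' is an apostrophe and can safely become rsquo.

```python
LSQUO, RSQUO = "\u2018", "\u2019"

def guess_apostrophes(text):
    """Naive context rule: ' after whitespace, an opening bracket, or at the
    start of the text opens a quote (lsquo); every other ' becomes rsquo."""
    out = []
    for i, ch in enumerate(text):
        if ch == "'":
            prev = text[i - 1] if i > 0 else " "
            out.append(LSQUO if prev.isspace() or prev in "([" else RSQUO)
        else:
            out.append(ch)
    return "".join(out)

def apostrophes_only(text):
    """Once quotations are in <q>, every remaining ' is an apostrophe."""
    return text.replace("'", RSQUO)

print(guess_apostrophes("'Hello,' she said."))  # ‘Hello,’ she said.  (right)
print(guess_apostrophes("I don't know."))       # I don’t know.       (right)
print(guess_apostrophes("'Tis late."))          # ‘Tis late.  (wrong: should be ’Tis)
print(apostrophes_only("'Tis late."))           # ’Tis late.
```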

...

3. Adding <q> to all quotations (even with help of a script) is labour intensive, and adds little value.

Not at all. The script finds most of these. The validator finds some more. Then you make a last pass in the editor with a regexp search. (Of course doing Mark Twain will take a little longer.)

--
Marcello Perathoner
webmaster@gutenberg.org
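The final regexp pass has one wrinkle: the markup itself contains straight double quotes in attribute values such as rend="pre", so a plain search for " gives false hits. A minimal Python sketch that strips tags before searching, assuming no > appears inside attribute values; the function name stray_quotes is hypothetical.

```python
import re

# Matches a tag; assumes no ">" inside attribute values.
TAG = re.compile(r"<[^>]*>")

def stray_quotes(lines):
    """Yield (line_number, line) for each line that still has a straight
    double quote in its text content, ignoring quotes inside tags such as
    rend="pre"."""
    for number, line in enumerate(lines, start=1):
        if '"' in TAG.sub("", line):
            yield number, line.rstrip("\n")

lines = ['<p>He said: <q rend="pre">Blah.</q></p>', '<p><q>And blah."</q></p>']
print(list(stray_quotes(lines)))
# [(2, '<p><q>And blah."</q></p>')]
```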

Carlo Traverso

10:03 a.m.

It is usual in French typography (and in French typewriting too, btw) to include a half-width, non-breaking space before "broken punctuation", i.e. [:;!?]. Some typesetting engines (e.g. TeX through the \frenchspacing declaration, and LaTeX through the \usepackage[francais]{babel} header) implement this convention. So the TeX source should not contain these spaces, which will be inserted by the rendering engine. Putting these spaces in and taking them out can of course be automated.

What should be done to encode a French text correctly in TEI, and what is (or should be) done by the text rendering engine?

For French text in ISO-Latin it is customary to include a full non-breaking space; in Unicode, half-width spaces should be used. Similar conventions apply for em-dashes; here, however, the spaces can be broken, so half-width (breaking) spaces can be used instead.

Carlo
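Putting these spaces in and taking them out can be sketched in Python as follows. Assumptions: the narrow no-break space U+202F serves as the default "half-width" space (the full no-break space U+00A0 can be passed instead, matching the ISO-Latin practice), and a run of [:;!?] only gets a space when it is followed by whitespace, a closing bracket or guillemet, or the end of the text, so times like 12:30 and URLs are left alone. Em-dashes are not handled, and the function names are hypothetical.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE (assumed default "half-width" space)
NBSP = "\u00a0"   # NO-BREAK SPACE, the full-width option available in ISO-Latin-1

# Any ordinary, no-break or narrow no-break space.
SPACES = "[ \u00a0\u202f]"

# A run of ; : ! ? that follows a non-space, non-punctuation character and is
# followed by whitespace, a closing bracket/guillemet, or the end of the text.
_ADD = re.compile(r"(?<=[^\s;:!?])" + SPACES + r"*([;:!?]+)(?=\s|$|[»)\]])")
_STRIP = re.compile(SPACES + r"+(?=[;:!?])")

def add_french_spacing(text, space=NNBSP):
    """Put `space` before 'broken' punctuation, replacing any space already there."""
    return _ADD.sub(lambda m: space + m.group(1), text)

def strip_french_spacing(text):
    """Remove any space before ; : ! ? (e.g. for an encoded text that leaves
    the spacing to the renderer)."""
    return _STRIP.sub("", text)

print(repr(add_french_spacing("Il dit : bonjour ; il part !")))
# 'Il dit\u202f: bonjour\u202f; il part\u202f!'
print(add_french_spacing("Rendez-vous à 12:30."))  # unchanged
```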

Jeroen Hellingman

11:49 a.m.

Carlo Traverso wrote:

...

It is usual in French typography (and in French typewriting too, btw) to include a half-width, non-breaking space before "broken punctuation", i.e. [:;!?].

My own approach would be to ignore it in the encoded version and let the rendering process deal with it.

Jeroen.

PS. The Dutch story about Pisa is now in PP. I hope to post it somewhere next week, and of course I will prepare a TEI version.

Joshua Hutchinson

1:43 p.m.

Steve Thomas wrote:

...

As I understand this (from an earlier post), 'rend="display"' is supposed to mean that the block should be indented (rather like the HTML blockquote).

This seems like a very poor choice of terms to me. CSS has a "display" property, which can take values such as "inline", "block", and -- crucially -- "none". "display:none" is used where you don't want the content displayed at all.

So using this rend="display" seems likely to result in confusion.

In any case, the choice is poor because it does not convey the information desired. If you use <quote> on its own without rend="display", does that indicate you don't want to display the content? Or that you don't want to indent it?

I personally don't see any need to use rend here. If you are quoting a passage from some other work, then enclose it in <quote> .. </quote>. That's enough. When someone comes to present this (e.g. in an HTML version), the most natural thing would be to convert the tag to blockquote. The rend is redundant.

You know... Thank you, Steve. When I read this, I had a "duh!" moment and slapped my head. You are absolutely right. <quote> *should* just result in a blockquote when converted to HTML. The rend="display" is redundant here.

Josh
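This conclusion amounts to a simple rename at conversion time. A minimal Python sketch using the standard library's xml.etree.ElementTree, assuming un-namespaced TEI elements and that every <quote> is a displayed block (inline uses of <quote> would need separate handling); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def quote_to_blockquote(root):
    """Rename every <quote> under root to <blockquote>, in place.

    A rend="display" attribute is dropped because the element name alone now
    says "displayed block"; any other rend value is left for a later step.
    """
    # Materialize the list first so renaming cannot disturb the iteration.
    for elem in list(root.iter("quote")):
        elem.tag = "blockquote"
        if elem.get("rend") == "display":
            del elem.attrib["rend"]
    return root

root = ET.fromstring('<div><quote rend="display"><p>Text</p></quote><quote>More</quote></div>')
print(ET.tostring(quote_to_blockquote(root), encoding="unicode"))
# <div><blockquote><p>Text</p></blockquote><blockquote>More</blockquote></div>
```

Both inputs, with and without rend="display", produce the same output, which is exactly why the attribute carries no information here.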

16 comments

participants (10)

bkeir＠pgdp.net
Brad Collins
Carlo Traverso
Greg Newby
Jeroen Hellingman
Jon Noring
Joshua Hutchinson
Marcello Perathoner
Scott Lawton
Steve Thomas