re: Re: [gutvol-d] Final PGTEI... page numbers

brad said:
Detractors of XML on this list have brought up the fact that the TEI manual is 1400 pages long as a negative. Why?
actually, it was jeroen who initially mentioned that fact. but since you asked, the reason this is seen as a "negative" is that we think precious few of the volunteers who have traditionally shouldered the effort of creating e-texts would continue to do so if an understanding of those 1400 pages of t.e.i. documentation were to become a prerequisite. but maybe now distributed proofreaders has enough people on board that they feel less uncomfortable taking that risk... or maybe not, as their tentative plan thus far involves adding two "markup" rounds (at least, and maybe more) to their existing two "proofing" rounds, so as to minimize the number of people who need to be concerned with markup.
This shows that TEI is well documented.
um, well, yes, i guess it does. although _more_ documentation is not _always_ a good sign of _better_ documentation, is it?
As a general rule, the more documentation that is available for a spec, the more mature and useful the standard, and the easier it is to learn and implement.
i'm not quite so sure i agree with that "general rule", brad... i think it would be just as possible -- and more compelling -- to formulate a "general rule" that the more documentation a spec needs, the more complex it is, which means that it is _harder_ to "learn and implement"...

i'm not afraid of documentation. indeed, quite to the contrary, i'm one of those rare people who often prefer reading it _first_, because if you can stomach it, it'll save you lots of fiddling time. and i'm a word geek too. so i find the massive t.e.i. documentation -- and indeed the whole framework itself -- to be a remarkable and fascinating piece of work. it is mind-boggling to witness how _complex_ and _variegated_ a comprehensive examination of text can become, once you pour a foundation and start putting up a building. on the other hand, i can equally admire a system that boils things down to their essence, and creates great benefits with few costs.

if all the volunteers contributing their efforts to project gutenberg were word geeks as willing to throw themselves into a devotion to documentation, like you and me, brad, it might not matter whether we went with the complex system or one that is a lot easier. but given that they probably aren't, we should think very carefully before committing them to a world with a high degree of difficulty.

as you put it, the learning of a complex system like t.e.i. is often "a gradual process of incremental epiphanies". can we _survive_ the situation where thousands of volunteers are put through that? with perhaps many becoming alienated in the course of doing so?

unless i miss my guess, just the last few days of "how do we do this?" posts on this listserv have tried the patience of most subscribers... (which leads me to suggest that perhaps there is another listserv that is more appropriate for that, where the markup geeks can go?)

-bowerbird

Bowerbird said:
brad said:
Detractors of XML on this list have brought up the fact that the TEI manual is 1400 pages long as a negative. Why?
but since you asked, the reason this is seen as a "negative" is because we think that precious few of the volunteers who have traditionally shouldered the effort of creating e-texts will continue to do so if an understanding of those 1400 pages of t.e.i. documentation were to become a prerequisite.
but maybe now distributed proofreaders has enough people on-board that they feel less uncomfortable taking that risk...
The "1400" pages are for the full-blown TEI spec, which includes some pretty obscure stuff. Interspersed within it are long and (to me) fascinating general discourses on the structure of textual documents, with copious examples. In essence, it is probably one of the better "textbooks" ever written on this topic, even if it is only there to support the description of the TEI markup.
or maybe not, as their tentative plan thus far involves adding two "markup" rounds (at least, and maybe more) to their existing two "proofing" rounds, so as to minimize the number of people who need to be concerned with markup.
Essentially yes. Distributed Proofreaders' longer-term vision, as I understand it (and Juliet can correct me wherever I'm off in this message), is to settle upon some subset of TEI to apply to all documents (either TEI-Lite or some other comparable subset; for the occasional oddball document, the more extended TEI will be used in "manual" mode). In addition, for most of DP's volunteers, the markup will be "under the hood" and largely invisible. Most of the volunteer work is copyediting the text (correcting OCR errors), not inserting markup, so there is no need to require these volunteers to learn the gory details of TEI. Only the most experienced and interested DP volunteers, who do the final cleanup/finishing stages, will actually work with the markup itself.
as you put it, the learning of a complex system like t.e.i. is often "a gradual process of incremental epiphanies". can we _survive_ the situation where thousands of volunteers are put through that? with perhaps many becoming alienated in the course of doing so?
Well, as I noted above, DP, where the action is for large-scale production of e-texts (they are now the actual engine that drives PG's growth), does not plan to inflict TEI on the general first-level volunteers (this is what I inferred from my talks a while back with Charles). With regard to the specifics of the markup which DP will eventually use (likely a subset of TEI, as previously noted), that will ultimately be determined by them based on compatibility with the production interface as well as on what works best for the various uses (note the plural) of the texts.

[Aside: the DP-produced XML Master texts will certainly be used for many purposes, all of which impose requirements on the markup specification and must be considered -- this is the biggest area not being discussed on gutvol-*. The most exciting of these is that the DPXML texts will be archived in a special library-like repository which allows a very high level of end-user interaction with, and customization of, the collection (e.g., bookmarking, annotation, interlinking within the repository and to other content repositories, blogging, etc. -- all things several associates and I are now working on). Other uses include generating whatever portable digital formats the end-user wants, higher-quality text-to-speech, and Michael Hart's dream of language translation. These, too, guide the nature of the Master markup vocabulary. Of course, there must be library-compatible and properly designed catalog, metadata, and identifier information for each e-text in the repository. And where they exist, the original page scans of the source documents will also be available and interlinked with the XML versions. Brewster Kahle at the Internet Archive will *gladly* archive the page scans for DP/PG.
I envision that most of the earlier portion of the PG collection, which contains most of the classics, will be redone by DP from source documents to ensure proper metadata collection, uniformity and conformity with the rest of the DPXML texts, and availability of the page scans. Once DP gets into major production with many more volunteers, redoing the earlier texts won't be a big deal -- in my view it needs to be done eventually anyway.]

I would think and hope that DP will convene a formalized working group of the various experts and enthusiasts here and elsewhere to hammer out the DP Markup Specification based on requirements gathering and analysis, which is the proper way to do this. The DPMWG (DP Markup Working Group) will have a more formalized and committed leadership structure, with weekly teleconference calls. From my standards working group experience, it's amazing how much gets done during weekly teleconferences and the occasional face-to-face meeting (biannual or annual), while written listserv exchanges in a group like gutvol-* usually end up going around and around in circles. I expect it won't take that long to hammer out the "beta" of the DP Markup Vocabulary when the working group is properly organized and committed to gathering and then resolving the various requirements. I would even ask someone like C. Michael Sperberg-McQueen to be an advisor to the working group (his brother Roger Sperberg and I have worked closely together on various projects in the past <smile/>). I would think that DP's vision of including TEI in its next-generation system, so as to do *large-scale* production of e-texts (possibly up to a few hundred *per day*, to begin the process of producing one million texts in a decade or two), will greatly excite the TEI community, and that we will attract some pretty smart and dedicated working group members to add to the several already here.
Volunteerism is not only for the "rank and file" (those who will do the basic copyediting); it also includes those who are more technically minded and understand the markup issues as they relate to the production environment.

Jon Noring

Jon Noring wrote:
I would think and hope that DP will convene a formalized working group of the various experts and enthusiasts here and elsewhere to hammer out the DP Markup Specification based on requirements gathering and analysis, which is the proper way to do this.
I think design-by-committee is the wrong way to go about this. Experimenting with markup on more and more complicated books and refining the specs along the way seems far more promising to me. But that's the Cathedral vs. the Bazaar discussion again. To see a particularly disgusting example of design by committee, just look at XSLT.
The DPMWG will have a more formalized and committed leadership structure, with weekly teleconference calls. From my standards working group experience, it's amazing how much stuff gets done during weekly teleconferences and the occasional face-to-face meeting (biannual or annual), while written listserv exchanges in a group like gutvol-* usually ends up going around and around in circles.
Teleconferencing will essentially shut out all non-US-based people, through either prohibitive costs or the language barrier. Non-native English speakers like me may have better standing in a written discussion channel.
I would even ask someone like C. Michael Sperberg-McQueen to be an advisor to the working group
I don't know if the TEI people could advise us much. What we need is not advice about the use of TEI as a markup language but about the use of TEI as a master format for automatic rendition into a wide variety of output formats. There is the tei-presentation list for this sort of thing, but traffic there has been very light. The only person who could really help is Sebastian Rahtz.

-- 
Marcello Perathoner
webmaster@gutenberg.org
participants (3)
- Bowerbird@aol.com
- Jon Noring
- Marcello Perathoner