
Bowerbird said:
brad said:
Detractors of XML on this list have brought up the fact that the TEI manual is 1400 pages long as a negative. Why?
but since you asked, the reason this is seen as a "negative" is because we think that precious few of the volunteers who have traditionally shouldered the effort of creating e-texts will continue to do so if an understanding of those 1400 pages of t.e.i. documentation were to become a prerequisite.
but maybe now distributed proofreaders has enough people on-board that they feel less uncomfortable taking that risk...
The 1400 pages cover the full-blown TEI spec, which includes some pretty obscure stuff. Interspersed within it are long and (to me) fascinating general discourses on the structure of textual documents, with copious examples. In essence, it is probably one of the better "textbooks" ever written on this topic, even if it exists only to support the description of the TEI markup.
or maybe not, as their tentative plan thus far involves adding two "markup" rounds (at least, and maybe more) to their existing two "proofing" rounds, so as to minimize the number of people who need to be concerned with markup.
Essentially yes. Distributed Proofreaders' longer-term vision, as I understand it (and Juliet can correct me wherever I'm off in this message), is to settle upon some subset of TEI to apply to all documents (either TEI-Lite or some other comparable subset; for the occasional oddball document, the more extended TEI will be used in "manual" mode). In addition, for most of DP's volunteers the markup will be "under the hood" and largely invisible. Most of the volunteer work is copyediting the text (correcting OCR errors), not inserting markup, so there is no need to require these volunteers to learn the gory details of TEI. Only the most experienced and interested DP volunteers, who handle the final cleanup and finishing stages, will actually work with the markup itself.
as you put it, the learning of a complex system like t.e.i. is often "a gradual process of incremental epiphanies". can we _survive_ the situation where thousands of volunteers are put through that? with perhaps many becoming alienated in the course of doing so?
Well, as I noted above, DP, where the action is for large-scale production of e-texts (it is now the actual engine driving PG's growth), does not plan to inflict TEI on its general first-level volunteers (this is what I inferred from my talks a while back with Charles). As for the specifics of the markup DP will eventually use (likely a subset of TEI, as previously noted), those will ultimately be determined by DP based on compatibility with the production interface as well as on what works best for the various uses (note the plural) of the texts.

[Aside: the DP-produced XML Master texts will certainly be used for many purposes, each of which imposes requirements on the markup specification and must be considered. This is the biggest area not being discussed on gutvol-*. The most exciting of these is archiving the DPXML texts in a special library-like repository that offers a very high level of end-user interface and customizability for the collection (e.g., bookmarking, annotation, interlinking within the repository and to other content repositories, blogging, etc., all things several associates and I are now working on). Other uses include generating whatever portable digital formats end users want, higher-quality text-to-speech, and Michael Hart's dream of language translation. These, too, guide the nature of the Master markup vocabulary. Of course, each e-text in the repository must have library-compatible, properly designed catalog, metadata, and identifier information. And where they exist, the original page scans of the source documents will also be available and interlinked with the XML versions. Brewster Kahle at the Internet Archive will *gladly* archive the page scans for DP/PG.
I envision that most of the earlier portion of the PG collection, which contains most of the classics, will be redone by DP from source documents to ensure proper metadata collection, uniformity and conformity with the rest of the DPXML texts, and the availability of page scans. Once DP gets into major production with many more volunteers, redoing the earlier texts won't be a big deal; in my view it needs to be done eventually anyway.]

I would think and hope that DP will convene a formal working group of the various experts and enthusiasts here and elsewhere to hammer out the DP Markup Specification based on requirements gathering and analysis, which is the proper way to do this. This DP Markup Working Group (DPMWG) would have a more formal and committed leadership structure, with weekly teleconference calls. From my experience on standards working groups, it's amazing how much gets done during weekly teleconferences and the occasional face-to-face meeting (biannual or annual), while written listserv exchanges in a group like gutvol-* usually end up going around in circles. I expect it won't take long to hammer out a "beta" of the DP Markup Vocabulary once the working group is properly organized and committed to generating and then resolving the various requirements. I would even ask someone like C. Michael Sperberg-McQueen to advise the working group (his brother Roger Sperberg and I have worked closely together on various projects in the past <smile/>). I would think that DP's vision of including TEI in its next-generation system to do *large-scale* production of e-texts (possibly up to a few hundred *per day*, to begin the process of reaching one million texts in a decade or two) will greatly excite the TEI community, and that we will attract some pretty smart and dedicated working group members to add to the several already here.
Volunteerism is not only for the "rank and file" (those who will do the basic copyediting); it also includes those who are more technically minded and understand the markup issues as they relate to the production environment.

Jon Noring