
First of all, please take this as it is intended, namely as an account of my experience attempting to convert a short and sweet text to the PGTEI format found here (http://www.gutenberg.org/tei/). It is my hope that this will lead to some improvements in the process. I also apologize for the large size of this message. I felt it was necessary to put all the information in the email itself so that no one has to make assumptions about info that was alluded to but not present.

Ok, I decided to start at the beginning ... namely our first e-text, the US Declaration of Independence (original: http://www.gutenberg.org/etext/1). The upside is that it is short and very simple.

The first thing I did was grab the standard PGTEI header from the documentation (http://www.gutenberg.org/tei/doc/pg-guide.html#toc_12). This spaghetti is a lot easier than it looks. Further, I definitely see how this could be easily generated by filling out a web form (somewhat similar to what I understand is done right now when DP submits a text to the whitewashers). It contains information like the original creation date, who wrote it, Library of Congress subject classifications, who converted it to TEI (me), etc. (The PGTEI encoded file is attached at the end of this message for reference purposes.)

The nice thing here is that the PG header and footer information is auto-generated by the <divGen> elements in the <front></front> and <back></back> sections.

The Declaration file in PG has a small foreword from Michael. I felt that should not be marked as if it were part of the main document. Luckily, the TEI-Lite documentation provided a solution (http://www.tei-c.org/Lite/teiu5_en.html#h52). You can mark up some text in the <front></front> section as a division with a type of foreword.

EXAMPLE:

<div1 type="foreword"> This is an example foreword section. </div1>

Note: Paragraphs have to be surrounded by <p></p> markup, just like HTML. This shouldn't be difficult for anyone trying to tackle this... It certainly felt natural enough for me.

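For anyone who wants to see how the foreword and paragraph markup fit together, here is a rough sketch of the <front> section, trimmed from the file attached at the end of this message. The foreword paragraphs are placeholder text, and I am assuming the <divGen> lines are what trigger the generated title page, PG header and table of contents, so treat it as an illustration rather than a reference.

```xml
<front>
  <!-- Generated sections (assumption: the converter fills these in) -->
  <divGen type="titlepage" />
  <divGen type="pgheader" rend="newpage" />
  <divGen type="toc" rend="newdoublepage" />

  <!-- Hand-written foreword, kept out of <body>; placeholder wording -->
  <div1 type="foreword">
    <p>First paragraph of the foreword.</p>
    <p>Second paragraph of the foreword.</p>
  </div1>
</front>
```

Keeping the foreword in <front> means the <body> holds nothing but the text of the work itself.
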
Next, I added the actual Declaration text in the <body></body> section. Again, all the paragraphs needed to be wrapped in <p></p>. I also ran into one problem with the & character. Instead of looking up the escape code, I got lazy and just converted it to 'and'. The & would need the same treatment in an HTML edition, so I don't consider that a big deal.

That was it for creating the PGTEI markup. Total time, even with looking things up: maybe 20 minutes. And, no, I haven't done this before, so I came into it raw.

The next step was to use the validator on the page (http://www.gutenberg.org/tei/services/tei-online). It complained about one typo on my part and the & I mentioned before. The error messages are NOT very friendly, but anyone familiar with the W3C validator should be able to puzzle them out.

Next, I had it create a text file. This went very well. The resulting file looked pretty good to me. I didn't run it through GutCheck, but nothing jumped out at me as being problematic. Granted, this was a very simple text, so there are probably limitations in this conversion that I just haven't run into yet.

Lastly, I had it create an HTML file. I encountered two problems here. One is cosmetic and fixable by changing the CSS, so it isn't that big a deal. The second is more of a deal breaker, but still fixable, I'd imagine.

1) The CSS specifies more of a printed-page style than a web-based style. For instance, the paragraphs have no blank line between them and have a first-line indent, just like a printed page. To me, this was a bit jarring, since it isn't the format I'm used to on the web. Again, this is mostly cosmetic and easily changeable.

2) The resulting HTML, while rendering fine in the browsers I have here, is NOT valid HTML. The file specifies HTML 4.01 strict, but there were 13 warnings/errors when I used W3C's validator on it.
I didn't check very closely, but it looked like some of the flagged items would be perfectly valid under HTML 4.01 transitional, and the others are fixable. The XSLT conversion process can probably be tweaked by someone knowledgeable in that area to eliminate the validation errors.

***

Well, that's my quick personal experiment. My question for the experts: Can the HTML validation problem be easily fixed? I'd also like to request a change to the CSS used, but that is a personal preference and something to worry about only after the show-stoppers are fixed.

My next experiment will use a text with some other stuff in it, like poetry, so that I can see what more complexity does to the whole process.

Josh

****

Attached Declaration PGTEI file:

<?xml version="1.0" encoding="iso-8859-1" ?>
<!DOCTYPE TEI.2 SYSTEM "pgtei.dtd">
<TEI.2 lang="en-us">
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>The Declaration of Independence</title>
      <author></author>
    </titleStmt>
    <editionStmt>
      <edition n="12">Edition 12 <date value="2004-10">October 2004</date> </edition>
    </editionStmt>
    <publicationStmt>
      <publisher>Project Gutenberg</publisher>
      <pubPlace><xref url="www.gutenberg.org">www.gutenberg.org</xref></pubPlace>
      <date value="2004-10">October 2004</date>
      <idno type='etext-nr'>1</idno>
      <idno type='etext-file'>when</idno>
      <availability status='free'>
        <p>This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever.
        You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included online at <xref url="www.gutenberg.org/license">www.gutenberg.org/license</xref></p>
      </availability>
    </publicationStmt>
    <sourceDesc>
      <bibl> unknown </bibl>
    </sourceDesc>
  </fileDesc>
  <encodingDesc>
    <classDecl>
      <taxonomy id="lc">
        <bibl>
          <title>Library of Congress Classification</title>
        </bibl>
      </taxonomy>
    </classDecl>
  </encodingDesc>
  <profileDesc>
    <langUsage>
      <language id="en-us">American</language>
    </langUsage>
    <textClass>
      <classCode scheme="lc"> JK: Political science: Political inst. and pub. Admin.: United States </classCode>
      <keywords>
        <list>
          <item>Government</item>
          <item>United States</item>
        </list>
      </keywords>
    </textClass>
  </profileDesc>
  <revisionDesc>
    <change>
      <date value="1971-12">December, 1971</date>
      <respStmt>
        <name>Michael S. Hart</name>
      </respStmt>
      <item>Project Gutenberg Edition 12</item>
    </change>
    <change>
      <date value="2004-10">October 2004</date>
      <respStmt>
        <name>Joshua Hutchinson</name>
      </respStmt>
      <item>TEI markup</item>
    </change>
  </revisionDesc>
</teiHeader>
<text>
  <front>
    <divGen type="titlepage" />
    <divGen type="pgheader" rend="newpage" />
    <divGen type="toc" rend="newdoublepage" />
    <div1 type="foreword">
      <p>The United States Declaration of Independence was the first Etext released by Project Gutenberg, early in 1971. The title was stored in an emailed instruction set which required a tape or diskpack be hand mounted for retrieval. The diskpack was the size of a large cake in a cake carrier, cost $1500, and contained 5 megabytes, of which this file took 1-2%. Two tape backups were kept plus one on paper tape. The 10,000 files we hope to have online by the end of 2001 should take about 1-2% of a comparably priced drive in 2001.</p>
      <p>This file was never copyrighted, Sharewared, etc., and is thus for all to use and copy in any manner they choose.
      Please feel free to make your own edition using this as a base.</p>
      <p>In my research for creating this transcription of our first Etext, I have come across enough discrepancies [even within that official documentation provided by the United States] to conclude that even "facsimiles" of the Declaration of Independence are NOT going to be all the same as the original, nor of other "facsimiles." There is a plethora of variations in capitalization, punctuation, and, even where names appear on the documents [which names I have left out].</p>
      <p>The resulting document has several misspellings removed from those parchment "facsimiles" I used back in 1971, and which I should not be able to easily find at this time, including "Brittain."</p>
    </div1>
  </front>
  <body>
    <div>
      <head>The Declaration of Independence of The United States of America</head>
      <p>When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume, among the Powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.</p>
      <p>We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty, and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness.
      Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shown, that mankind are more disposed to suffer, while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security. --Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world.</p>
      <p>He has refused his Assent to Laws, the most wholesome and necessary for the public good.</p>
      <p>He has forbidden his Governors to pass Laws of immediate and pressing importance, unless suspended in their operation till his Assent should be obtained; and when so suspended, he has utterly neglected to attend to them.</p>
      <p>He has refused to pass other Laws for the accommodation of large districts of people, unless those people would relinquish the right of Representation in the Legislature, a right inestimable to them and formidable to tyrants only.</p>
      <p>He has called together legislative bodies at places unusual, uncomfortable, and distant from the depository of their Public Records, for the sole purpose of fatiguing them into compliance with his measures.</p>
      <p>He has dissolved Representative Houses repeatedly, for opposing with manly firmness his invasions on the rights of the people.</p>
      <p>He has refused for a long time, after such dissolutions, to cause others to be elected; whereby the
      Legislative Powers, incapable of Annihilation, have returned to the People at large for their exercise; the State remaining in the mean time exposed to all the dangers of invasion from without, and convulsions within.</p>
      <p>He has endeavoured to prevent the population of these States; for that purpose obstructing the Laws of Naturalization of Foreigners; refusing to pass others to encourage their migration hither, and raising the conditions of new Appropriations of Lands.</p>
      <p>He has obstructed the Administration of Justice, by refusing his Assent to Laws for establishing Judiciary Powers.</p>
      <p>He has made judges dependent on his Will alone, for the tenure of their offices, and the amount and payment of their salaries.</p>
      <p>He has erected a multitude of New Offices, and sent hither swarms of Officers to harass our People, and eat out their substance.</p>
      <p>He has kept among us, in times of peace, Standing Armies without the Consent of our legislatures.</p>
      <p>He has affected to render the Military independent of and superior to the Civil Power.</p>
      <p>He has combined with others to subject us to a jurisdiction foreign to our constitution, and unacknowledged by our laws; giving his Assent to their Acts of pretended legislation:</p>
      <p>For quartering large bodies of armed troops among us:</p>
      <p>For protecting them, by a mock Trial, from Punishment for any Murders which they should commit on the Inhabitants of these States:</p>
      <p>For cutting off our Trade with all parts of the world:</p>
      <p>For imposing taxes on us without our Consent:</p>
      <p>For depriving us, in many cases, of the benefits of Trial by Jury:</p>
      <p>For transporting us beyond Seas to be tried for pretended offences:</p>
      <p>For abolishing the free System of English Laws in a neighbouring Province, establishing therein an Arbitrary government, and enlarging its Boundaries so as to render it at once an example and fit instrument for introducing the same absolute rule into these Colonies:</p>
      <p>For taking away our Charters, abolishing our most valuable Laws, and altering fundamentally the Forms of our Governments:</p>
      <p>For suspending our own Legislatures, and declaring themselves invested with Power to legislate for us in all cases whatsoever.</p>
      <p>He has abdicated Government here, by declaring us out of his Protection and waging War against us.</p>
      <p>He has plundered our seas, ravaged our Coasts, burnt our towns, and destroyed the lives of our people.</p>
      <p>He is at this time transporting large armies of foreign mercenaries to compleat the works of death, desolation and tyranny, already begun with circumstances of Cruelty and perfidy scarcely paralleled in the most barbarous ages, and totally unworthy of the Head of a civilized nation.</p>
      <p>He has constrained our fellow Citizens taken Captive on the high Seas to bear Arms against their Country, to become the executioners of their friends and Brethren, or to fall themselves by their Hands.</p>
      <p>He has excited domestic insurrections amongst us, and has endeavoured to bring on the inhabitants of our frontiers, the merciless Indian Savages, whose known rule of warfare, is an undistinguished destruction of all ages, sexes and conditions.</p>
      <p>In every stage of these Oppressions We have Petitioned for Redress in the most humble terms: Our repeated Petitions have been answered only by repeated injury. A Prince, whose character is thus marked by every act which may define a Tyrant, is unfit to be the ruler of a free People.</p>
      <p>Nor have We been wanting in attention to our British brethren. We have warned them from time to time of attempts by their legislature to extend an unwarrantable jurisdiction over us. We have reminded them of the circumstances of our emigration and settlement here.
      We have appealed to their native justice and magnanimity, and we have conjured them by the ties of our common kindred to disavow these usurpations, which would inevitably interrupt our connections and correspondence. They too have been deaf to the voice of justice and of consanguinity. We must, therefore, acquiesce in the necessity, which denounces our Separation, and hold them, as we hold the rest of mankind, Enemies in War, in Peace Friends.</p>
      <p>We, therefore, the Representatives of the United States of America, in General Congress, Assembled, appealing to the Supreme Judge of the world for the rectitude of our intentions, do, in the Name, and by the Authority of the good People of these Colonies, solemnly publish and declare, That these United Colonies are, and of Right ought to be Free and Independent States; that they are Absolved from all Allegiance to the British Crown, and that all political connection between them and the State of Great Britain, is and ought to be totally dissolved; and that as Free and Independent States, they have full Power to levy War, conclude Peace, contract Alliances, establish Commerce, and to do all other Acts and Things which Independent States may of right do. And for the support of this Declaration, with a firm reliance on the Protection of Divine Providence, we mutually pledge to each other our Lives, our Fortunes and our sacred Honor.</p>
    </div>
  </body>
  <back rend="newdoublepage">
    <divGen type="footnotes" />
    <divGen type="colophon" rend="newpage" />
    <divGen type="pgfooter" rend="newpage" />
  </back>
</text>
</TEI.2>