About the XML debate

Going through the archives in my mailbox the last few days, I wanted to add my .02 as someone who is not as close to the project as any of you. I think that any XML work (if Gutenberg goes that way) needs to be done in addition to, not in replacement of, plaintext. I don't care if XML becomes as common as plaintext and everyone uses it; you can run into a problem in 20 years where XML falls out of favor and there won't be software to render it properly. This will lead to poor fools having to redo all the documents all over again. This is not a good thing.
Picking plaintext is genius in the sense that unless basic ASCII changes (not likely compared to XML losing favor), plaintext will always be able to be read. This also allows it to be read on older machines. Maybe some of you don't care that the guy with a Commodore 64 can read plaintext but can't read XML, because he is only one person on the planet. But when you add all the other one-person implementations together, it becomes a decent percentage.
My thought on software to do the annotations would be to have a reader that could overlay annotations on the screen but maintain the base document in plaintext, or maintain a separate annotated edition. The other problem we run into when we discuss modern annotations is how we decide who is qualified to release (write up) the annotations for a certain book. I may not agree with the annotationist that Bob likes, and Sue will hate the choices Bob and I make. The best solution I could honestly see to keep a degree of sanity is to Wiki each book you wanted to annotate.
But I'll go back to reading the archives now; I just thought plaintext still needed a champion before the whole world went completely XML crazy.
Remember -
plaintext was supposed to be replaced by PostScript
plaintext was supposed to be replaced by WordPerfect
plaintext was supposed to be replaced by Word
plaintext was supposed to be replaced by PDF
plaintext was supposed to be replaced by HTML
plaintext is supposed to be replaced by XML?
Not bloody likely

On 18 Aug 2005, at 12:14, Brent Gueth wrote:
Going through the archives in my mailbox the last few days, I wanted to add my .02 as someone who is not as close to the project as any of you. I think that any XML work (if Gutenberg goes that way) needs to be done in addition to, not in replacement of, plaintext.
You are absolutely right, and I do not think you have anything to fear. The way I understood it, if some application of XML is going to be used at all, it will be as a storage format. From that format a plain vanilla text file will immediately be generated and stored alongside the XML version.
-- branko collin collin@xs4all.nl
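To make the "XML as storage format, plain text generated from it" workflow concrete, here is a minimal Python sketch. It is an illustration only, not PG's actual tooling: the element names (book, title, p, emph) and the function name xml_to_plain_text are made up for the example, and a real vocabulary would need more rules.

```python
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical block-level elements; a real vocabulary would define its own.
BLOCK_TAGS = ("title", "p")


def xml_to_plain_text(xml_source: str, width: int = 72) -> str:
    """Generate a wrapped plain-text rendering from an XML master."""
    root = ET.fromstring(xml_source)
    paragraphs = []
    for block in root.iter():
        if block.tag not in BLOCK_TAGS:
            continue
        # itertext() yields the text of the element and all nested elements,
        # so inline markup such as <emph> simply disappears.
        text = " ".join("".join(block.itertext()).split())
        if text:
            paragraphs.append(textwrap.fill(text, width=width))
    # Blank line between paragraphs, newline at end of file.
    return "\n\n".join(paragraphs) + "\n"


sample = (
    "<book><title>Walden</title>"
    "<p>I went to the <emph>woods</emph>\n"
    "   because I wished to live deliberately.</p></book>"
)
print(xml_to_plain_text(sample), end="")
```

Because the text file is derived mechanically, it can be regenerated whenever the XML master is corrected, so the two versions need not drift apart.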

Brent Gueth wrote:
This is not a good thing. Picking plaintext is genius in the sense that unless basic ASCII changes (not likely compared to XML losing favor), plaintext will always be able to be read.
Are we confusing ASCII with plain text? Because the former is an encoding and the latter is a format. You are comparing apples with rocks and telling us we should eat rocks because they last longer. Plaintext will stay forever because it defines nothing, and so will never have to be changed. TANSTAAPF: there ain't no such thing as a plaintext format. There are roughly 16,000 plaintext formats around, because every etext defines its own format. You cannot talk of a plaintext "format" at all.
Maybe some of you don't care that the guy with a Commodore 64 can read plaintext but can't read XML, because he is only one person on the planet.
That's easy to fix: he should get a girlfriend. (But he should leave the C64 at home on the first few dates.) Basically you say that millions of people with modern PCs should be forced to use stone-age technology because one person somewhere cannot afford to get an old PC from eBay? Even the PCs we are sending to African schools are Pentium-class machines!
plaintext was supposed to be replaced by PostScript
plaintext was supposed to be replaced by WordPerfect
plaintext was supposed to be replaced by Word
plaintext was supposed to be replaced by PDF
plaintext was supposed to be replaced by HTML
plaintext is supposed to be replaced by XML?
Not bloody likely
Horses were supposed to be replaced by cars. Are we confusing existence with fitness for purpose? Or are we confusing existence with demand? Because nobody wants plaintext. Plaintext is ugly on a screen, is ugly on a PDA, is ugly on paper. Plaintext cannot be converted automatically into anything else. But, yes, it exists, like Treponema pallidum.
-- Marcello Perathoner webmaster@gutenberg.org

On Thu, Aug 18, 2005 at 12:14:16PM -0700, Brent Gueth wrote:
... The other problem we run into when we discuss modern annotations is how we decide who is qualified to release (write up) the annotations for a certain book. I may not agree with the annotationist that Bob likes, and Sue will hate the choices Bob and I make.
The best solution I could honestly see to keep a degree of sanity is to Wiki each book you wanted to annotate.
Thanks for your note, Brent, and for taking the time to read through the archives.
A quick comment on this: PG is more likely to let other folks take care of annotation. Although we have some producer-contributed reviews etc. in some eBooks, we generally look to other sites to host reviews and other editorial content. For example, many of our catalog entries have links to Wikipedia articles for info about authors & titles.
It might be that we'll have a "PG metadata"-type project affiliate at some point (see our philosophy/FAQ/about documents for some essays on this type of experimentation & growth). But I don't see adding such content to the eBooks themselves any time soon. Of course, such views could change as the people involved in PG change, and the world continues to change...
-- Greg

Brent Gueth <creeva@gmail.com> writes:
I don't care if XML becomes as common as plaintext and everyone uses it; you can run into a problem in 20 years where XML falls out of favor and there won't be software to render it properly. This will lead to poor fools having to redo all the documents all over again. This is not a good thing.
As has already been mentioned, ASCII is an encoding and plaintext is a format. And ASCII is being replaced with Unicode. Some decades from now ASCII will have gone the way of the dodo. This is inevitable, as a vast number of people in the world require a larger character set to read and write than native English speakers do.
As for plaintext, one of the core design goals for XML is that you'll be able to open it in any text editor and read it. If a file is human readable when it's opened in a text editor then it's a type of plain text. All XML does is place tags around text in order to give the text a structure that machines can understand. As long as you have a text editor, you'll be able to read XML. A good text editor can clean out all of the tags with a simple regular expression like "<[^>]*>". Scripting languages like Perl, Python, Ruby or any other language likely to come down the pike will be able to process XML and convert it into whatever comes along in the future.
Very few applications render XML directly (except perhaps word processors); everyone else converts it into HTML, PDF or other formats for display. SGML (XML's older sister) has been around for, what, twenty years or more? And all SGML documents are easily converted into XML. XML is simpler and designed to be around as an archive format for far longer than that.
Think of the XML version of an ebook as an expression of a work, which is then converted into various manifestations including HTML, LaTeX (which can be converted to PDF via PostScript) and TEI, as well as a plain text file with no markup. Most people will never know about the master version in XML; they will only see the file formats they use to read books. XML is only a long-term and safe archive format which is flexible enough to describe both the structure of a text and, if you want it, also the semantic content of a text.
I suggest that you google for a basic intro to XML to get an idea of what it really is.
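For the curious, the tag-stripping idea takes only a few lines in a scripting language. This is a rough Python sketch, not a full XML processor: the pattern `<[^>]*>` and the helper name strip_tags are choices made for this example, and it assumes the document has no comments, CDATA sections, or ">" characters inside attribute values.

```python
import html
import re

# Matches one tag at a time: "<", then anything that is not ">", then ">".
TAG = re.compile(r"<[^>]*>")


def strip_tags(xml_text: str) -> str:
    """Delete everything that looks like a tag, then decode entities like &amp;."""
    without_tags = TAG.sub("", xml_text)
    return html.unescape(without_tags)


sample = "<p>Call me <emph>Ishmael</emph> &amp; friends.</p>"
print(strip_tags(sample))
```

The character class matters: a pattern that lets "." run freely between the brackets would match from the first "<" on a line to the last ">" and swallow the text in between.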
If you know anything about HTML, XML is very easy -- you can think of it as HTML where you can invent your own tags. I personally don't like DOM and XSLT, which are both used for processing XML and converting it into formats like HTML which browsers can render. But this is no problem, because I can just as easily convert an XML document into a Lisp data structure of S-expressions which Lisp, Elisp, Scheme or Guile can process very easily.
Once you understand that XML is just plain text, you can use any software for processing text to work with it. As long as there is a text editor, an XML document will never be lost.
b/
-- Brad Collins <brad@chenla.org>, Bangkok, Thailand
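The XML-to-S-expression mapping can be sketched in a few lines. This Python version is only one possible convention, assumed for illustration: each element becomes a list headed by its tag name, text becomes quoted strings, attributes are ignored, and surrounding whitespace is trimmed. The function names to_sexp and quote are made up for the example.

```python
import xml.etree.ElementTree as ET


def quote(text: str) -> str:
    """Render text as a double-quoted Lisp string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_sexp(element: ET.Element) -> str:
    """Turn an element tree into a nested S-expression string."""
    parts = [element.tag]
    if element.text and element.text.strip():
        parts.append(quote(element.text.strip()))
    for child in element:
        parts.append(to_sexp(child))
        # Text that follows a child element is stored on the child as .tail.
        if child.tail and child.tail.strip():
            parts.append(quote(child.tail.strip()))
    return "(" + " ".join(parts) + ")"


sample = "<book><title>Walden</title><p>I went to the <emph>woods</emph> because.</p></book>"
print(to_sexp(ET.fromstring(sample)))
```

The resulting string can be handed to any Lisp reader; a fuller version would also carry attributes, for instance as a property list after the tag name.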

Brad Collins wrote:
Brent Gueth writes:
I don't care if XML becomes as common as plaintext and everyone uses it; you can run into a problem in 20 years where XML falls out of favor and there won't be software to render it properly. This will lead to poor fools having to redo all the documents all over again. This is not a good thing.
[snip]
As for plaintext, one of the core design goals for XML is that you'll be able to open it in any text editor and read it. If a file is human readable when it's opened in a text editor then it's a type of plain text. All XML does is place tags around text in order to give the text a structure that machines can understand.
Good points. Properly marked-up documents, where the XML vocabulary describes the structure and semantics of the text, are highly repurposable. Should the day come that XML disappears from use, it will be relatively easy to transform such XML documents into whatever is new.
Why? As Brad notes, it's because an XML document comprises "plain" text which has markup added (the markup itself is also "plain" text) describing what the text is. One can think of markup as simply a sort of descriptive metadata.
In the worst-case scenario where one can't find anyone to write a script or apply an XML processing application to do the transformation (a scenario which will only happen if world-wide catastrophe strikes), so long as there are running computers with text editors lying around, one can open up the XML document in a text editor, and there is the "plain" text, right in front of you, nicely described with markup. Though it may take some work (depending upon the extent of the markup), and some text metadata information may be lost, one can use the text editor to strip out the markup and restore the content to "traditional" PG plain text -- if so desired.
(In essence, XML markup follows Michael Hart's philosophy of using text encoding to digitally preserve public domain Works.)
DP plans to apply an intelligently-designed XML vocabulary optimized for book materials to their first-generation masters (they are looking at a well-constrained subset of TEI, such as PGTEI, now under development by Marcello and others). This is a good plan.
Jon
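As a small illustration of why descriptive markup is easy to repurpose, the Python sketch below renames the elements of a made-up book vocabulary into HTML ones. The names in TAG_MAP and the function to_html are assumptions for this example only; they are not PGTEI, and a real transformation would handle attributes, metadata and much more.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a descriptive vocabulary to HTML element names.
TAG_MAP = {"book": "body", "title": "h1", "p": "p", "emph": "em"}


def to_html(xml_source: str) -> str:
    """Rename each element according to TAG_MAP and serialize as HTML."""
    root = ET.fromstring(xml_source)
    for element in list(root.iter()):
        # Unknown elements fall back to a neutral <span>.
        element.tag = TAG_MAP.get(element.tag, "span")
    return ET.tostring(root, encoding="unicode", method="html")


sample = "<book><title>Walden</title><p>Into the <emph>woods</emph>.</p></book>"
print(to_html(sample))
```

Swapping in a different mapping table is all it takes to target another format with a similar structure, which is the sense in which the markup acts as descriptive metadata.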
participants (6)
- Brad Collins
- Branko Collin
- Brent Gueth
- Greg Newby
- Jon Noring
- Marcello Perathoner