Re: [gutvol-d] About the XML debate

20 Aug 2005

      Joshua wrote:

[keeping his whole reply intact]
...
My main involvement with PG texts comes from a DP background.  I'm one
of the folks that help put the PG texts in place.  So my perspective is
not as much from the point of reading the texts as it is from producing
the texts.  This isn't to say I don't consider the reader, but everyone
tries to scratch their own itches first, and my itches are from a
producer's point of view.

When people create a PG text nowadays, most create multiple
"versions."  At the most basic, people usually create the text version
and an HTML version.  Text because that is the minimum required at PG,
and HTML because there is a lot of information that cannot be well
represented by a plain text file opened in Notepad.  Images are the
first example that comes to mind.

Then, there are some texts which require/practically beg for additional
"versions".  We have scientific texts that really need a LaTeX master
document that is rendered to PDF.  We have Languages Other Than English
(LOTE) texts that require a larger character set than ASCII, so you
might do a UTF-8 encoded text.

The problem is, once you've created the first version (let's say it is
the UTF-8 encoded plaintext format), you now have to do the manual work
for the other formats.  Sometimes this is trivial, sometimes it is not.
But to make matters worse, it is not uncommon to notice a typo in the
HTML that you didn't fix earlier.  Now, you have to go back to the other
versions and make the same "fix".  This very quickly becomes an
organizational nightmare as I'm sure you can imagine.

XML solves this to a large extent.  I create one "master" document and
then literally click a button and I get a UTF-8 encoded .txt file, a
Latin-1 encoded .txt file, an ASCII encoded .txt file, an HTML file,
and a PDF file.  I post all of them to the ww'ers in a fraction of
the time.  Plus, if someone down the road finds a problem in the text,
the fix can be applied to the master XML and the other files can be
regenerated.
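
[Illustration, not part of Joshua's message and not his actual
toolchain: a minimal Python sketch of the "one master, many outputs"
idea.  The element names (book, title, para), the output file names,
and the helper functions are all made up for the example, and PDF
generation is left out.  A real master would more likely use a richer
schema and a proper transformation pipeline.]

```python
import html
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical master document; the element names are assumptions.
MASTER = """<book>
  <title>Café Stories</title>
  <para>A naïve reader opens the text.</para>
  <para>The fix is made once, in the master.</para>
</book>"""


def to_text(root):
    """Render the master as plain text, one blank line between blocks."""
    blocks = [root.findtext("title")] + [p.text for p in root.findall("para")]
    return "\n\n".join(blocks) + "\n"


def to_ascii(text):
    """Crude ASCII fallback: decompose accents, drop what is left over."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def to_html(root):
    """Render the master as a very small HTML page."""
    title = html.escape(root.findtext("title"))
    body = "\n".join(
        f"<p>{html.escape(p.text)}</p>" for p in root.findall("para")
    )
    return (
        f"<html><head><title>{title}</title></head>\n"
        f"<body>\n<h1>{title}</h1>\n{body}\n</body></html>\n"
    )


def build_all(master_xml):
    """Generate every output format from the single master document."""
    root = ET.fromstring(master_xml)
    text = to_text(root)
    return {
        "book-utf8.txt": text.encode("utf-8"),
        "book-latin1.txt": text.encode("latin-1", "replace"),
        "book-ascii.txt": to_ascii(text).encode("ascii"),
        "book.html": to_html(root).encode("utf-8"),
    }
```

[Correcting a typo then means editing MASTER once and calling
build_all() again; none of the generated files is edited by hand.]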

We are not doing away with the .txt files you want.  We are coming up
with a more efficient way to create them (along with the many other
document formats people want).

Oh, and yes, it is possible to create conversion routines for other
formats as well.  Marcello had a Palm format working at one point, if I
remember correctly.  An MS Reader .LIT converter is possible (the specs
are freely available and under a free license, we just need someone to
take the time to create the converter).  Rocket ebook reader and others
should all be possible as long as the spec for the format is freely
available.

Please feel free to ask any questions you want on the subject.  I'll be
happy to run at the mouth all you want!  ;)

Kudos!

This is by far the best reply I've yet seen on the practical benefits of
XML for producing structured digital texts. Cogent, simple, and to the
point, backed up by real-world experience.

Joshua, you might consider submitting what you wrote to David Rothman's
TeleRead blog as a guest blog article (his blog is one of the more popular
blogs on the Internet, and by far the most read blog regarding ebooks
and digital libraries). Let me know -- I will be glad to assist.

Jon

Jon Noring