Re: [gutvol-d] blah blah blah blah blah

4 Oct 2012

...
PG is a repository of digitised books with emphasis on pure text. Why?
Michael thought far ahead; look how other formats are changing all the
time, and that's why you continue to have your religious wars without end or
solution (I'm not going to be involved!).
First of all, what PG requires is, sadly, NOT a "text" file.  For example, I
can create a "text" file from an HTML file in two seconds by doing "cut and
paste."  The result is a text file.  Is it something that PG will accept as
their "PG-text" file?  Sadly, no.  Michael thought the details of text
format choices didn't matter, and then proceeded to make a "fatal error" in
the detailed text format he chose to standardize on.  Namely, he insisted on
detailed and gratuitous rules for including extraneous newlines.  Also, his
rules for how "PG-text" files work have changed over the years!  Even his
"bottom-line" standard isn't standard!
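
To make the newline complaint concrete, here is a minimal sketch in Python of
what any downstream tool has to do before it can reflow a hard-wrapped text
file.  This is only an illustration, not anything PG itself uses: the function
name unwrap_paragraphs is made up, and it assumes paragraphs are separated by
blank lines and that every other newline is just a wrap.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines into one line per paragraph.

    Assumption (not a PG rule): paragraphs are separated by one or more
    blank lines, and every other newline is only a line wrap.
    """
    # Split on blank lines (possibly containing stray whitespace).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse all internal whitespace, including newlines, to single spaces.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


sample = (
    "PG is a repository of digitised books with\n"
    "emphasis on pure text.\n"
    "\n"
    "Second paragraph,\n"
    "also wrapped.\n"
)
unwrapped = unwrap_paragraphs(sample)
print(unwrapped)
```

Note that the assumption breaks down on verse, tables, and indented
quotations, where the newlines carry meaning.  The file alone doesn't tell a
tool which newlines are which.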

Secondly, the theory is, or was, that one could move forward from a PG text
file to more modern formats in a relatively small, finite amount of work --
such that the text files represent a "baseline" format.  Well, is this true?
Not really.  The tale of 76.txt shows that this is not true.  While working
forward from an existing PG txt file to a "modern" version of an ebook file
format is certainly LESS work than starting from scratch, it is still a
tremendous amount of work, and requires continued access to, and comparison
against, the original page images.

If one insisted on a true baseline format, what would it be?  Answer:
high-resolution digitized page images.  How useful would that be?  Well, I've
read old books in digitized page image form, and it wasn't a whole lot of
fun.  Almost better to read the raw OCR'ed versions.

James Adcock