Re: [gutvol-d] a review of some digitization tools -- 018

14 Dec 2011

BB>you know, the kind that jim labels "txt70", and lee calls "an
impoverished text-file"...

And perhaps most surprising of all, ZML is not in a form that the WWers
would even be willing to accept.

BB>what these guys are never clear about is exactly _how_ volunteers are
supposed to transform the _text_ files (of the type we mentioned up above,
the type you end up with after your digitization) _into_ .html.  

What I am not clear about is why BB insists that one must start from "an
impoverished text-file". I never work with text files per se until I am
forced to derive one at the end of my html development, a needless extra
step taken only to get the PG WWers to accept my html work. I do not start
with "an impoverished text-file" for the simple reason that my OCR gives me
better file format choices, which preserve more of the information
available in the original page images. That way I do not have to rediscover
and manually re-enter information that was needlessly thrown away in the
first place just to reduce the OCR result to txt70.

PS: I call it "txt70" for the simple reason that I wish to make clear that
what PG insists one submit is not a text file in any normal sense, any more
than ZML is a text file in any normal sense. At least ZML has the arguable
advantage that it retains the original line breaks -- but I have shown how
these can be easily rederived. And txt70 carries a PG-specific requirement
to put in manual line breaks at about every 70 chars, not to mention
reimagining some of the standard ASCII code points as prosodic markers.
PG'ers tend to spend so much time smelling their own roses that they forget
that what they call a text file really isn't a text file, any more than the
contents of an html file, or of a ZML file, are a text file.
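
To make the "line breaks at about every 70 chars" point concrete, here is a
rough sketch of that hard-wrapping step. It is only an illustration, not PG's
actual tooling: the function name to_txt70 and the exact 70-column limit are
my own assumptions, and it uses nothing beyond Python's standard textwrap
module.

```python
import textwrap


def to_txt70(paragraphs, width=70):
    """Hard-wrap each paragraph so that no line exceeds `width` characters.

    `to_txt70` and the 70-column default are illustrative assumptions,
    not a PG-defined interface.
    """
    wrapped = []
    for para in paragraphs:
        # Collapse any existing line breaks and runs of whitespace first.
        flat = " ".join(para.split())
        # Re-break the paragraph at word boundaries, at most `width` wide.
        wrapped.append(textwrap.fill(flat, width=width))
    # Separate paragraphs with a single blank line.
    return "\n\n".join(wrapped)
```

Calling to_txt70 on a list of paragraph strings returns one string with hard
line breaks inserted. Note that the original line breaks are discarded in the
process, which is exactly the kind of information loss complained about above.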

Jim Adcock