
Hi Jim,

I cannot tell you why BB does it, but I might be able to explain some of the caveats of this approach. The first is that it is hard to separate the markup from the text you want to process, process that text, and then put the two back together correctly. Why do you think MS et al. produce such crappy code? The second is that any automatic processing needs a known state or structure. The more you know, or the more correct assumptions you can make, the better an algorithm will work. In the end it does not actually matter what format it is. But PG does want a simple text file, and these "impoverished" files are easy to work with. It is easier to go from them to something PG expects than from a more complex layout, which leaves you more work to do. It is always easier to go from simple to complex than from complex to simple.

regards
Keith

On 14.12.2011 at 04:35, Jim Adcock wrote:
What I am not clear about is why BB insists that what one starts from must be an "impoverished text-file". I never work with text files per se until I am forced to derive one at the end of my html development, as a needless extra step, in order to get the PG WWers to accept my html work. I do not start with an "impoverished text-file" for the simple reason that my OCR gives me better file format choices. These preserve more of the information available in the original page images, so that I do not have to rediscover and re-enter that information manually later, after needlessly throwing it away in the first place just to reduce the OCR result to txt70.
PS: I call it "txt70" because I want to make clear that what PG insists one submit is not a text file in any normal sense, any more than ZML is. At least ZML has the arguable advantage of retaining the original line breaks, though I have shown how these can easily be rederived. And txt70 carries a PG-specific requirement to insert manual line breaks roughly every 70 characters, not to mention reimagining some of the standard ASCII code points as prosodic markers. PG'ers tend to spend so much time smelling their own roses that they forget that what they call a text file really isn't one, any more than the contents of an html file or a ZML file are.
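For readers who have not met the txt70 convention, here is a minimal sketch of how the roughly-70-character hard wrap can be produced mechanically from unwrapped paragraphs using Python's standard `textwrap` module. It assumes paragraphs are separated by blank lines; the function name `to_txt70` is hypothetical, this is not PG's own tooling, and it would wrongly rewrap indented material such as poetry or tables, which needs to be left alone.

```python
import textwrap


def to_txt70(text: str, width: int = 70) -> str:
    """Hypothetical helper: hard-wrap each blank-line-separated paragraph to <= width chars."""
    # Treat blank lines as paragraph separators (an assumption, not a PG rule).
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    # Collapse existing line breaks and runs of spaces, then break at word boundaries.
    wrapped = [textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs]
    # Rejoin paragraphs with a single blank line and end with a newline.
    return "\n\n".join(wrapped) + "\n"


if __name__ == "__main__":
    sample = "word " * 30 + "\n\n" + "another paragraph " * 10
    print(to_txt70(sample))
```

Going the other way, from wrapped lines back to whole paragraphs, is just the `" ".join(p.split())` step on its own, which is part of why the manual breaks add little information.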