[gutvol-d] Re: In search of a more-vanilla vanilla TXT

17 Sep 2009

Hi There,

On 16.09.2009 at 08:12, James Adcock wrote:

      ...
As an example of how much author semantic information is lost going from an
author's writing to PG txt format, I went and compared differences between a
recent HTML and PG TXT I did -- where after doing the TXT encoding I went
back and did three more passes over the images to add back in semantic
differences to the HTML that the PG TXT didn't represent.

   The problem is that there are very few systems that truly represent
   semantic content. In order to truly represent such information you
   have to know about it. This requires one to have additional information,
   which is known as "world knowledge". This information is provided by the
   reader of books.
...
Now the reality would be that it would take say TEI not HTML to represent
all of the author's intent.  But measuring the loss going from HTML back to
TXT gives an order of magnitude estimate of how much author information we
are throwing away by representing a work in PG TXT.  In the case of this
book, the answer was more than 1000 "losses" -- or an average of about 3
losses per page.  And this is NOT counting about an additional 1000 losses in
representation of emphasis.
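
The kind of count described above could be approximated mechanically. The following is only a rough sketch, not the procedure actually used for the book: it assumes that every occurrence of certain HTML tags corresponds to one "loss" when the work is flattened to plain text. The tag set SEMANTIC_TAGS, the class LossCounter, and the function count_losses are hypothetical names chosen for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: these tags carry author intent that plain text cannot express.
# The real list would depend on the book and on the markup conventions used.
SEMANTIC_TAGS = {"em", "i", "b", "strong", "blockquote", "cite",
                 "h1", "h2", "h3", "sup", "sub"}


class LossCounter(HTMLParser):
    """Counts opening tags that belong to SEMANTIC_TAGS."""

    def __init__(self):
        super().__init__()
        self.losses = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in SEMANTIC_TAGS:
            self.losses[tag] += 1


def count_losses(html_text):
    """Return a Counter mapping tag name to number of occurrences."""
    parser = LossCounter()
    parser.feed(html_text)
    parser.close()
    return parser.losses


sample = ("<h2>Chapter I</h2><p>She was <em>not</em> amused, "
          "said <cite>The Times</cite>.</p>")
losses = count_losses(sample)
print(sum(losses.values()), dict(losses))
```

Dividing the total by the number of pages would give a per-page figure comparable to the one quoted, though a tag count only approximates a manual comparison against the page images.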
   This problem is a matter of complexity. That is, even in pure Vanilla
   Text one can represent these intentions, but one loses readability.
   Furthermore, one has to make assumptions about the true intent of the
   author!!

regards
	Keith

Keith J. Schultz