
Sankar Viswanathan wrote:
The final output from DP is a text file. This is processed through Guiguts. Most of the post-processors in DP use Guiguts for post-processing. The HTML is generated from this text file.
If this is true, it's all the more wasteful. If you output a text file from the OCR and later use a human to re-create the HTML, that is more work than letting the OCR output the HTML directly. And this whole crooked workflow is needed because PG requires a txt file for hysterical reasons. No wonder Google is eating our lunch ... they know how to put software to work instead of people.
So no additional work is involved in producing a text file.
Nice sophism. Additional work is required to produce the HTML file. So what?
Again there is no additional work in White Washing because of the text file.
I don't believe you. Working on 2 files (3, maybe 4) IS more work than working on one file. Even if you just open the file to see if it is the right one, it's work.

-- 
Marcello Perathoner
webmaster@gutenberg.org