
Most of the post processors in DP depend on Guiguts for post processing. More than 80% of the texts have been produced using Guiguts. Without the Guiguts program, many of the post processors would never have ventured to post process. Guiguts was written for the specific purpose of post processing DP books, and it takes the DP process into account. It is well supported by additional programs like Gutcheck and Jeebies, and it generates the HTML from the text automatically. Most post processors in DP are not technical people.

Again, the question is: what do the users want? I am talking about people who download books from PG, not producers of other formats. Most users download text files. To quote just one example, the text-only format of Alice in Wonderland is downloaded more often than the illustrated HTML version. The text version is the lowest common denominator. Do we have statistics on downloads of the HTML and text versions? I am sure most users download the text version. So even if producing a text version takes additional effort, that effort is justified.

Do we have any feedback from the actual users? Letters from users who submit detailed errata show that the text files are being used to teach school children in remote areas of the U.S. These are the people who make the effort worthwhile. Maybe it also benefits people who are still on dial-up. Plain text can be read on any computer. HTML? With all the quirks of IE6 and other browsers, it is not easy to produce HTML that renders perfectly in all browsers.

The earlier discussion was about whether an ASCII text is necessary. DP does produce TEI text, but very few post processors can work in TEI format. The main reason is the absence of software like Guiguts.

On Sat, Sep 12, 2009 at 5:34 PM, Marcello Perathoner <marcello@perathoner.de> wrote:
Sankar Viswanathan wrote:
The final output from DP is a text file. This is processed through Guiguts.
Most of the post processors in DP use Guiguts for post processing. The HTML is generated from this text file.
If this is true, it's all the more waste.
If you output a text file from the OCR and later use a human to re-create HTML this is more work than letting the OCR output the HTML directly.
And all this crooked workflow is needed because PG requires a txt file for hysterical reasons.
No wonder Google is eating our lunch ... they know how to put software to work instead of people.
So no additional work is involved in producing a text file.
Nice sophism. Additional work is required to produce the HTML file. So what?
Again, there is no additional work in whitewashing because of the text file.
I don't believe you.
Working on 2 files (3, maybe 4) IS more work than working on one file. Even if you just open the file to see if it is the right one, it's work.
-- 
Marcello Perathoner
webmaster@gutenberg.org
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d
-- 
Sankar
Service to Humanity is Service to God