
On 10/24/2011 2:19 PM, Bowerbird@aol.com wrote:
> but for heaven's sake, if you feel that you _must_ save the data, then at least have the good sense to store it in a separate file...
Ahh, but what he provided me /was/ the separate file. /Your/ file is http://ia700600.us.archive.org/16/items/artofbook00holm/artofbook00holm_djvu.... There it is: the text, the whole text, and nothing but the text. I'm just being a little more demanding. What /I/ want is the output FineReader would produce if the "Save as HTML" option had been selected, with all the markup that FineReader was able to intuit, together with information about line breaks, page breaks and soft hyphens, but without any of the geometry data.

Now, I believe that no file is really good enough to be published without some human attention and refinement. So my next "demand" would be for this FineReader HTML output to be placed in an environment where it /could/ be refined. The autogenerated formats, such as ePub and Kindle, should then be generated from the refined file, not the raw file.

Like you, I have no expectation that IA will create any kind of environment where human "beans" can refine the texts. But by giving us the output of the PHP script, IA should now make it possible to off-load the refinement and publication of digital texts to a third-party organization. I think it should be fairly easy to set up a rudimentary system to do this. Does anyone want to furnish me a *nix server with a fat pipe?