
jon said networker said:
Project Gutenberg e-texts satisfy none of these wishes
well, i guess networker will have to start his own project, eh? give him my best wishes! :+)
Conclusions
Of course, the goal of this exercise was not to establish the provenance of the Project Gutenberg e-text of _Frankenstein_,
maybe not. but having done so, it is _refreshing_ to know that -- when that's factored in -- only "a handful" of errors surface. so once again, in spite of some very big noises, it ends up that this fails to stand as a good example of an error-ridden e-text.
nor to discover if there are any errors in the PG e-text, but to determine if there was an automated method of reducing errors in newly scanned e-books for which a Project Gutenberg e-text already exists. I'm afraid the jury is still out on this question.
as for this "conclusion", the jury may still be out in _his_ mind, but in mine, the answer is very clear, and i've said it before here: if you do the scanning properly, manipulate those scans correctly, use abbyy in the best way, and subject its results to the right tools, you will reduce the errors in your text to a relatively small number. (the number we've been kickin' around is 1 error for every 10 pages, and at that point, proofreading by the public becomes very viable.)
if you then have the rare luxury of evaluating your output against an existing version of the book -- like a project gutenberg e-text -- with the right tool (which networker obviously does not yet have), the comparison between the two, alongside the page-images, should make the process of coming to an error-free version simply a breeze.
since this is _exactly_ what will need to be done _increasingly_, as the page-images from the internet archive and (we hope) google -- plus the work done by individual people scanning everywhere -- emerge into cyberspace, that's where my tool-development efforts are now being focused.
i suggest networker start reading my blog; it should start being updated on a daily basis starting next week...
-bowerbird
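
for illustration only, here is a minimal sketch of the kind of comparison described above: lining up fresh o.c.r. output against an existing e-text and listing the spots where they disagree, so a person can check each one against the page-image. this is _not_ the tool mentioned in the post, which isn't shown here. the function name `word_diffs` and the sample strings are hypothetical, and it assumes both versions are plain text and that a simple word-by-word comparison using python's standard `difflib` module is good enough.

```python
import difflib


def word_diffs(ocr_text, reference_text):
    """Return (ocr_words, reference_words) pairs wherever the two texts disagree.

    Hypothetical helper: splits on whitespace, so differences in line breaks
    or spacing alone are ignored.
    """
    ocr_words = ocr_text.split()
    ref_words = reference_text.split()
    # autojunk=False keeps difflib from treating frequent words as junk
    # on long texts.
    matcher = difflib.SequenceMatcher(None, ocr_words, ref_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(ocr_words[i1:i2]), " ".join(ref_words[j1:j2])))
    return diffs


if __name__ == "__main__":
    # made-up sample: a typical "rn" read as "m" scanning error
    for ocr, ref in word_diffs("the modem prometheus", "the modern prometheus"):
        print(f"ocr: {ocr!r}  reference: {ref!r}")
```

each pair it reports is only a _candidate_ error. either version could be the wrong one, or the two could simply come from different editions, which is why the page-images still have to settle every difference.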