Last winter I did a bit of investigation into methodologies for
improving the quality of works in PG's library. Rather than write a long
report, I will skip to the conclusions.
1. The best method for getting a "better version" into PG with the least
amount of effort by any one person is:
i. Choose an edition from archive.org based on information from a
respected academic source, e.g. the "Notes on the Text" section
of an Oxford World's Classics edition.
ii. Get DP to digitise it.
iii. Improve the DP version by comparing it against the extant PG
text. The error rate I found in DP-produced texts averaged 1 error
per 10 pages, a rate I consider more than worth the effort of
correcting. This step requires access to the F2 output and the page
images. Page images are nicely archived at DP; F2 is more of a
problem. The comparison is possible and useful even between
editions as different as the 1818 and 1831 editions of Frankenstein.
2. The PG errata system is utterly broken. I generated errata for the
new Huck Finn and sent them to the original contributor, who confirmed
the errors. These were errata against a known edition with known page
images, and even in this best-case scenario neither I nor the original
contributor managed to get the text updated. PG should really be
considered a write-once library, and all references to an errata system
should be removed from the site to avoid people wasting their time.
3. The single biggest impediment to improving quality is the PG policy
of never replacing a text or in any way advertising the existence of a
superior one. A modern DP-produced text sourced from a carefully
selected archive.org print edition will almost certainly be superior.
4. Very, very few people access PG texts via PG. Normal people go to the
Kindle store or to the iTunes store. As such, the greatest contribution
that PG can make is to provide a digitisation of a well regarded print
edition and get the words and punctuation right. Any sort of reasonably
sane HTML skeleton, including those produced by DP, is fine. Master
formats are not required.
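The comparison in step iii above amounts to a word-level diff of the DP text against the extant PG text, with each disagreement then checked against the page images. A minimal sketch, assuming both versions are available as plain strings; `tokenize` and `word_differences` are hypothetical names, and this uses Python's standard `difflib`, not any tool that DP or PG actually provides. The sample strings are invented for illustration.

```python
import difflib
import re


def tokenize(text):
    # Split into words and individual punctuation marks, so that a missing
    # or extra comma shows up as its own difference.
    return re.findall(r"\w+|[^\w\s]", text)


def word_differences(dp_text, pg_text):
    """Return (tag, dp_words, pg_words) for every place the two texts disagree."""
    dp_words = tokenize(dp_text)
    pg_words = tokenize(pg_text)
    # autojunk=False stops difflib from ignoring very common tokens such as "the".
    matcher = difflib.SequenceMatcher(None, dp_words, pg_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((tag, " ".join(dp_words[i1:i2]), " ".join(pg_words[j1:j2])))
    return diffs


if __name__ == "__main__":
    dp = "The ship sailed at dawn, and the crew cheered."
    pg = "The ship sailed at dawn and the crew cheerd."
    # Each reported difference is a candidate error in one text or the other;
    # the page images decide which version is right.
    for tag, dp_part, pg_part in word_differences(dp, pg):
        print(f"{tag}: DP={dp_part!r} PG={pg_part!r}")
```

Each line of output is only a candidate: a difference may be an error in the DP text, an error in the PG text, or a genuine variant between editions.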
* * * * *
Based on these conclusions, my recommendation:
For each major text, one of the PTB (i.e. Greg, Marcello or a WWer)
needs to select a well regarded print edition from archive.org and
commission DP to digitise it with the assurance that the extant text
will be replaced. A link to the archive.org PDF needs to be added, and
DP needs to keep page images and F2 available for anyone willing and
able to do the edition comparison; errata from such an edition
comparison need to be prioritised.
* * * * *
And no, I don't think that there is any chance that this will happen. I
think PG is rudderless and the PTB are unwilling or unable to take
the sort of meaningful action required. From a user standpoint, for
"classics" I have given up on PG entirely, preferring paperbacks from a
reputable publisher, despite all the advantages that e-reading
brings. It saddens me greatly, because PG should be a great "open"
project on a level with GNU/Linux and Wikipedia. Then again, if Linus
Torvalds had mandated that no line of code could be changed in case it
upset its contributor, and Jimmy Wales had locked down updates to 4
people, those projects probably wouldn't be very good either.
Regards
Jon
I wanted to bring to your attention a proposal for
a Wikimedia grant of which I am a coauthor.
https://meta.wikimedia.org/wiki/Grants:IdeaLab/PlanetMath_Books_Project
If successful, it would help improve technology for
retrodigitizing math books. Given the considerable
collective knowledge of this list on this topic,
suggestions and comments would be welcome.
Raymond Puzio