
On Fri, October 12, 2012 4:47 pm, don kretz wrote:
One wonders how many historically significant books are being irretrievably lost in the destruction and violence in Syria and Egypt and Libya, literarily [sic] among the most historically active areas in the world.
I do not believe that it is now, or ever has been, the mission of Project Gutenberg to preserve world literature. According to the web site, the mission of Project Gutenberg (written in Michael Hart's inimitable style) is: "To encourage the creation and distribution of eBooks." There is nothing in the explanatory text surrounding this mission statement that suggests that preservation plays any part in PG's mission, although to be fair there is much in Mr. Hart's statement that is at odds with the current practices of Project Gutenberg.

It seems to me that the mission of Project Gutenberg has nothing to do with the /preservation/ of literary works and everything to do with the /popularization/ and /accessibility/ of those works. And while Mr. Hart never said, "we encourage our volunteers to furnish us with as many rare texts as they can," he did say, "[W]e are happy to bring eBooks to our readers in as many formats as our volunteers wish to make.... [P]eople are still encouraged to send us eBooks in any format and at any accuracy level and we will ask for volunteers to convert them to other formats, and to incrementally correct errors as times goes on."

I have come to believe that when Mr. Hart started Project Gutenberg on donated mainframe time, he understood the potential of storing massive amounts of text on computers, but he did not understand the transformative power of computing. He understood the power of the hard drive, but not the power of the CPU. Thus, when he first started placing text into storage, instead of using a rich format that could be transformed into the Format Of Any Day, he chose to carefully, manually transform each text directly into the Format Of His Day, which in that day was ASCII-only text in 80-character lines, suitable for use on the VT52 terminal.
Over time, the Format Of The Day has changed, but given the difficulty of up-converting VT52 format to more modern formats, and the fact that most modern operating systems can still display VT52 text files, however badly, most PG texts have remained in their original, sorry state. This state of affairs has persisted for so long that most of the PG old-timers see it as being not only normal, but desirable. Most of the complaints now leveled at PG are not that the archive is too incomplete, but that the contents of the archive are so visually unappealing as to be unusable. Thus, the true mission of Project Gutenberg, "[t]o encourage the creation and distribution of eBooks," is no longer being satisfied.

So, the first advice /I/ would give to someone wanting to volunteer at Project Gutenberg is to start by learning how to create an electronic book from an existing file (a tutorial to this effect should be created). Then s/he should practice what s/he has learned by taking an existing PG file that s/he is interested in, and which sucks, and making it suck less. The text can then be returned to PG for the kind of incremental update that Mr. Hart envisioned.

This kind of approach provides a gentle introduction to the creation of e-texts. You can take an existing text, see how someone else has done it, see where the mistakes are, fix a few simple mistakes, check it in to PG, see what kind of feedback you get, fix more mistakes, take on a more challenging text, and so on until you're comfortable with markup. Then, go get some OCR'ed text from IA or Google, fix that up and check that in. When you finally get around to doing OCR yourself, you will have all the underlying knowledge to fix up your own OCR. Stay in the shallow end until you learn how to tread water, and do not use the high dive until you are an expert swimmer.
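
To make the "existing file" exercise concrete, here is a rough sketch of the very first step of up-converting a hard-wrapped text: rejoining the 80-character lines into paragraphs and wrapping each one in minimal HTML. This is only an illustration, not a PG tool; the function names reflow_paragraphs and to_html are made up for this example, and it assumes paragraphs are separated by blank lines. That assumption is exactly where the difficulty lies: the code cannot tell verse, tables, or headings from prose, so a human still has to review the result.

```python
import html
import re


def reflow_paragraphs(text: str) -> list[str]:
    """Join hard-wrapped lines into paragraphs (hypothetical helper).

    Assumes a blank line marks a paragraph boundary, which is a common
    but not universal convention in plain-text e-texts.
    """
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        if block.strip():
            # Collapse line breaks and runs of whitespace into single spaces.
            paragraphs.append(" ".join(block.split()))
    return paragraphs


def to_html(text: str) -> str:
    """Wrap each reflowed paragraph in a <p> element, escaping &, < and >."""
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in reflow_paragraphs(text))


if __name__ == "__main__":
    sample = (
        "It was the best of times,\n"
        "it was the worst of times.\n"
        "\n"
        "Second paragraph\n"
        "goes here.\n"
    )
    print(to_html(sample))
```

Note that the line breaks are thrown away for good. For a novel that is what you want; for a poem it destroys the text, which is why a plain-text file cannot simply be batch-converted and checked back in.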