Michael Lockey wrote:
Andrew Sly wrote:
I've been around long enough that I can honestly say the quality of submissions from DP is much higher than when individuals were doing all their own work, with no checks except ad hoc. Naithless, PSTB have determined that quality and
Interesting argument. One that I might disagree with on some levels.
Some books prepared by some people for PG have been of truly exceptional quality, without going through the DP process.
Definitely. Some of the PG texts I've used for republishing (where I have to carefully go over the transcription quality) have been outstanding -- very low error rates and faithfulness to the source (which I borrow to compare the text against). But I've also had the dogs, where the text is full of errors and there are other troubling anomalies, such as reworked paragraphs and so on.
To add to this, NetWorker recently determined that the text of PG's 'Frankenstein' is pretty much identical to that of a modern edited edition issued by Bantam (ca. 1980, if I remember correctly), and "conveniently" there are no original page scans to verify this -- and there is no record of what the source book was, as was Michael Hart's policy up until recently. (His policy a few years ago was to specifically NOT allow detailed source information to be included with the texts, with certain exceptions granted.)
However, what is not talked about much is that DP is a more open and public system. That is, DP is not relying on one scribe working alone, whose labor has no public oversight. Rather, the proofing at DP is done in the open, among a group of people who otherwise have no relationship with each other, and no vested interest in a particular outcome. The perception of trustworthiness and faithfulness this adds to the final work product cannot be overestimated, especially when markets such as education, public libraries, and scholars wish to use the texts.
In addition, DP is preserving the page scans and, according to my private conversations with Juliet, will eventually make them public. This adds to the faithfulness aspect, and allows others to know the source book for certain and to verify textual faithfulness.
It is a show of "good will" and confidence that the work was done right: "we stand behind our work and proudly provide the page scans as proof of the trustworthiness of our work". (Of course, there are also tangible benefits, besides perception alone, to providing the page scans, but I'm focusing on the perception-of-trust aspect.)
Most of the pre-DP texts do not have scans to back them up (many were typed in, so no scans were made), and for most of these texts the source book is not even referenced. Note again what I mentioned above about 'Frankenstein' -- NetWorker analyzed only one book this way, and had a 100% hit -- one wonders how many other pre-DP PG texts were taken from modern edited editions and nobody knows about it except the transcriber? Will they ever reveal their sources? Unlikely.
I hope that PG Canada will embrace the DP system or something similar (where, in a public setting, multiple people are involved in proofing and checking each other's work), and that the page scans from the original source are made publicly available to proudly sit side by side with the structured digital text versions.
Let me state clearly that what I say above is not intended to disparage those who do work alone -- most of them are diligent and committed to textual fidelity. They produce great work. But the problem is, how, in 50-100 years, will anyone know that a particular text, done alone this way, is textually faithful to the original? One could say "John Scribe did this book, and you can trust his work". Well, the obvious question 50-100 years from now is "Who is John Scribe? Show me the original page scans, and was that book transcribed by a group of people working in public who checked each other's work?"
One of the books I publish is the Kama Sutra of Vatsyayana, which was written ca. 3rd century CE in India. Unfortunately, only a few copies exist, and none of them are complete. In addition, comparing these copies with each other, we find huge differences between them.
Sir Richard Burton had to combine the source copies available to him and, using textual analysis, his legendary linguistic ability, and historical knowledge, try to discern what the original text must have been. It is obvious that over the centuries, as scribes copied the work, they introduced their own changes to the text (of course, in some cases there had to be "modernization" of the language as the Hindi language evolved over the centuries). Because there was no dedication to faithful preservation, we no longer have the original Kama Sutra. It is gone forever.
Today we have a Public Domain, and it's the only one we have. Any group digitizing the Public Domain should commit itself to textual fidelity, and to setting up a process that will not only achieve this but also show others that textual fidelity to the source was indeed achieved and can be demonstrated. Technology now allows us to achieve this transparency.
Please; I would just like to make it clear that much excellent work HAS been, and continues to be, published by individuals,
Agreed! I did some texts myself a few years ago for issuance as commercial books, having typed them in by hand and proofed them. I've slaved over these texts (both by eye and with various search tools) and I know the error rate is *very low*. Yet, because I did them myself with no public oversight, and did not do any page scans, I will not release the texts to PG because they were not done using a trustworthy process, even though *I know* they are very faithful and done well.
One of the texts, Burton's "Kama Sutra" (see above), I recently scanned at high quality, and the scans have been submitted to DP for proofing, which I hope will soon commence. The text that results *will* be much more trustworthy and worthy of public preservation than mine, even though I know mine is textually faithful. I hope this makes sense in that I practice what I preach, and will not submit digital texts of Public Domain works without taking steps to assure trustworthiness (and since DP is there, that's where I will presently submit them).
I trust my work, but you should NOT trust my work. And if you tell me you cannot trust my work because of *how* I did it, I will not be offended, and neither should those who also transcribed texts alone. If they are offended, I don't believe they yet understand the larger issues involved. It is understandable that they may not want to understand, after having put hundreds and even thousands of hours of hard labor into their lone proofing activities.
and even works of lesser quality can be the equivalent of (so-called) published books. I don't think I've ever seen a book without errata, and some, including best sellers, can be appalling. It'll be interesting to see how Mr. Potter fares this time... No; I'm talking about the overall performance, and not making any comparisons. Our mandate must simply include continuous improvement.
Definitely! Having the scans available to the public is an important component, since it allows rapid determination of whether there is a transcription error.
Michael Hart now wants to "correct" the errors found in many of the non-DP texts. Good luck. There's no source information recorded (so no source to consult), and no scans to consult. How can one correct errors when one doesn't have the source to correct to? Some errors can be corrected, but others cannot. What about "Huck Finn", where misspellings are the norm? And what about broken-up and run-together paragraphs and even sentences? (Not to mention missing sentences and paragraphs due to transcribing errors -- how will these be found without painstaking comparison to a printed edition?) And what about "Americanized" British spelling? The list goes on and on.
I've stated many times that PG should simply redo the older texts through the DP system, and this time find an authoritative source copy, and get the page scans online. Problem solved.
(Regarding "authoritative", let me give another example. Burton published another book contemporaneous with the "Kama Sutra", the "Perfumed Garden of Sheik Nefzaoui", published about 1885. This book has proven to be quite rare, and is not found in many libraries. Around 1913 (or so), a pirated reissue was published. Because of fear of prosecution or simple disgust, a whole chapter was removed, along with a few paragraphs elsewhere. So the 1913 issue is a censored copy. Unfortunately, that censored pirated copy was widely sold, and is the dominant edition found in academic libraries today. Well, guess what edition was used for the online text versions of the "Perfumed Garden"? You got it right -- it is the censored, pirated edition. What is scary is that those who put this text online did not know this, and even after I told them about it, they didn't care -- they didn't even bother to put up a disclaimer or to say it comes from a later censored pirated reissue.
And this is the text that is being spread to every corner of the Internet, on mirrored hard drives and backup media all around the world. Unless something is done, in 50-100 years when one searches for Burton's "Perfumed Garden", the censored edition will be the one brought up, and the end-user will not even know it is the censored edition -- they will take it at face value that it is the original. (This "face value" acceptance is what worries me about the PG collection.) I'm working to try to rectify this, but the lesson is that those who digitize public domain texts need to do a little study on each work being digitized, to ensure that the source(s) they use is/are reasonably authoritative, and of course to record the source metadata along with making the page scans public. There are lots of scholars out there who know bibliographic information about particular works and who would be glad to advise. Do your homework, post on the Internet to ask for help, and learn a little about each work before going through the digitization process...)
DP is, I assume, aware that folk with enthusiasm will be more involved with a work. Upcoming books are announced at several stages of the process so that anyone with interest can grab stuff that excites them, both for processing and whitewashing. (I have just grabbed The Old Coast Road, since I'm very interested in historic transportation.) By definition, proofers will, when possible, go for what they enjoy.
Definitely! And for particular book topics, one can go outside to various forums and quickly recruit volunteers. There now seem to be forums for just about every subject and author out there. DP has not done this because it doesn't have to at this time (they are working at their limit), but I can see PG Canada doing this for particular books. What about an old book on the history of Barrie (a hypothetical example)? Just contact the Barrie historical group(s) and other Barrie organizations, and voilà, a bunch of people will show up at the door to help with basic and mundane proofing. Provide a convenient online interface to assist with the proofing (like DP does), and one will be able to find willing and enthusiastic volunteers for just about every book under the sun.
Just my $0.02 worth.
Jon