[PGCanada] Status of Mothership and DP-EU (and even DP-CAN)

Jon Noring jon at noring.name
Mon Jul 18 11:00:55 PDT 2005


Michael Lockey wrote:
> Andrew Sly wrote:
  
>> I've been around long enough
>> that I can honestly say the quality of submissions from DP is much
>> higher than when individuals were doing all their own work, with no
>> checks except ad hoc.  Naithless, PSTB have determined that quality and
  
> Interesting argument. One that I might disagree with on some levels.
>
> Some books, prepared by some people for PG have had a truly exceptional
> quality, without going through the DP process.

Definitely. Some of the PG texts I've used for republishing (where I
have to carefully go over the transcription quality) have been
outstanding -- very low error rates and faithfulness to the source
(which I borrow to compare the text against). But I've also had the
occasional dog, where the text is full of errors and there are other
troubling anomalies such as reworked paragraphs.

To add to this, NetWorker recently determined that the text of PG's
'Frankenstein' is pretty much identical to that of a modern edited edition
issued by Bantam (ca. 1980 if I remember correctly), and "conveniently"
there are no original page scans to verify this -- and there is no record
of what the source book was, in keeping with Michael Hart's policy up
until recently. (His policy a few years ago was to specifically NOT allow
detailed source information to be included with the texts, with certain
exceptions granted.)

However, what is not talked about much is that DP is a more open and
public system. That is, DP is not relying on one scribe working alone,
whose labor has no public oversight. Rather, the proofing at DP is done
in the open, among a group of people who otherwise have no
relationship with each other, and no vested interest in a particular
outcome. The perception of trustworthiness and faithfulness this adds
to the final work product should not be underestimated, especially when
markets such as education, public libraries, and scholarship wish to
use the texts. In addition, DP is preserving the page scans and,
according to my private conversations with Juliet, will eventually make
them public.
This adds to the faithfulness aspect, and allows others to know for
certain the source book, and have the ability to verify textual
faithfulness. It is a show of "good will" and confidence the work was
done right: "we stand behind our work and proudly provide the page
scans as proof of the trustworthiness of our work" (of course, there
are also tangible benefits, besides perception alone, to providing
the page scans, but I'm focusing on the perception of trust aspect.)

Most of the pre-DP texts do not have scans to back them up (many were
typed in so no scans were made), and for most of these texts the source
book is not even referenced. Note again what I mentioned above about
'Frankenstein' -- NetWorker analyzed only one book this way, and had a
100% hit -- one wonders how many other pre-DP PG texts were taken from
modern edited editions and nobody knows about it except the transcriber?
Will they ever talk to reveal their sources? Unlikely.

I hope that PG Canada will embrace the DP system or something similar
(where, in a public setting, multiple people are involved in proofing
and checking each other's work), and that the page scans from the
original source are made publicly available to proudly sit side by side
with the structured digital text versions.

Let me state it clearly that what I say above is not intended to
disparage those who do work alone -- most of them are diligent and
committed to textual fidelity. They produce great work. But the
problem is, how in 50-100 years does anyone know a particular text,
done alone this way, is textually faithful to the original? One could
say "John Scribe did this book, and you can trust his work". Well, the
obvious question 50-100 years from now is "Who is John Scribe? Show me
the original page scans, and was that book transcribed by a group of
people working in public who checked each other's work?"

One of the books I publish is the Kama Sutra of Vatsyayana, which was
written ca. 3rd century CE in India. Unfortunately, only a few copies
exist, and none of them are complete. In addition, comparing these
copies with each other, we find huge differences between them. Sir
Richard Burton had to combine some source copies he had available to
him and, using textual analysis, his legendary linguistic ability,
and historical knowledge, try to discern what the original text must
have been. It is obvious that over the centuries, as scribes copied
the work, they introduced their own changes to the text (of course,
in some cases there had to be "modernization" of the language as the
Hindi language evolved over the centuries). Because
there was no dedication to faithful preservation, we no longer have
the original Kama Sutra. It is gone forever.

Today we have a Public Domain, and it's the only one we have. Any
group digitizing the Public Domain should commit itself to textual
fidelity, and to setting up a process that will not only achieve this
but also show others that textual fidelity to the source was indeed
achieved and can be demonstrated. Technology now allows us to achieve
this transparency.


>  Please; I would just like to make it clear that much excellent
> work HAS been, and continues to be, published by individuals,

Agreed!

I've done some texts myself a few years ago for issuance as commercial
books, having typed them in by hand, and proofed. I've slaved over these
texts (both by eye and various search tools) and I know the error rate
is *very low*. Yet, because I did them myself with no public oversight,
and did not do any page scans, I will not release the texts to PG
because they were not done using a trustworthy process, even though *I
know* they are very faithful and done well. One of the texts, Burton's
"Kama Sutra" (see above), I recently scanned at high quality, and the
scans have been submitted to DP for proofing, which I hope will soon
commence. The text that results *will* be much more trustworthy and
worthy of public preservation than mine, even though I know mine is
textually faithful. I hope this makes sense in that I practice what I
preach, and will not submit digital texts of Public Domain works
without taking steps to assure trustworthiness (and since DP is there,
that's where I presently will submit them.)

I trust my work, but you should NOT trust my work. And if you tell me
you cannot trust my work because of *how* I did it, I will not be
offended, and neither should others who transcribed texts alone. If
they are offended, I don't believe they yet understand the larger
issues involved. It is understandable they may not want to understand,
after having put hundreds and even thousands of hours of hard labor
into their lone proofing activities.


> and
> even works of lesser quality can be the equivalent of (so-called)
> published books.  I don't think I've ever seen a book without
> errata, and some- including best sellers- can be appalling.  It'll
> be interesting to see how Mr. Potter fares this time...  No; I'm
> talking about the overall performance; and not making any
> comparisons.  Our mandate must simply include continuous improvement.

Definitely! Having the scans available to the public is an important
component, since it allows rapid determination of whether there is a
transcription error.

Michael Hart now wants to "correct" the errors found in many of the
non-DP texts. Good luck. There's no source information recorded (so no
source to consult with), and no scans to consult. How can one correct
errors when one doesn't have the source to correct to? Some errors can
be corrected, but others cannot. What about "Huck Finn" where
misspellings are the norm? And what about broken up and run together
paragraphs and even sentences? (Not to mention missing sentences and
paragraphs due to transcribing errors -- how will these be found
without painstaking comparison to a printed edition?) And what about
"Americanized" British spelling? The list goes on and on.

I've stated many times that PG should simply redo the older texts
through the DP system, and this time find an authoritative source
copy, and get the page scans online. Problem solved.

(Regarding "authoritative", let me give another example. Burton
published another book contemporaneous with the "Kama Sutra", the
"Perfumed Garden of Sheik Nefzaoui", published about 1885. This book
has proven to be quite rare, and not found in many libraries. Around
1913 (or so), a pirated reissue was published. Because of the fear of
prosecution or simple disgust, a whole chapter was removed, and a few
paragraphs elsewhere removed. So the 1913 issue is a censored copy.
Unfortunately, that censored pirated copy was widely sold, and is
the dominant edition found in academic libraries today. Well, guess
what edition was used for the online text versions of the "Perfumed
Garden"? You got it right -- it is the censored, pirated edition. What
is scary is that those who put this text online did not know this, and
even after I told them about it, they didn't care -- they didn't even
bother to put up a disclaimer or to say it comes from a later censored
pirated reissue. And this is the text that is being spread in every
corner of the Internet, on mirrored hard drives and backup media all
around the world. Unless something is done, in 50-100 years when one
searches for Burton's "Perfumed Garden", the censored edition will be
the one brought up, and the end-user will not even know it is the
censored edition -- they will take it at face value that it is the
original. (This "face value" acceptance is what worries me about the PG
collection.) I'm working to try to rectify this, but the lesson is
that those who digitize public domain texts need to do a little study
on each work being digitized, to assure the source(s) they use is/are
reasonably authoritative, and of course to record the source metadata
along with making the page scans public. There are lots of scholars
out there who know bibliographic information about particular
works who would be glad to advise. Do your homework, post on the
Internet to ask for help, and learn a little about each work before
going through the digitization process...)


> DP is, I assume, aware that folk with enthusiasm will be more
> involved with a work.  Upcoming books are announced at several
> stages of the process so that anyone with interest can grab stuff
> that excites them, both for processing and whitewashing.  (I have
> just grabbed The Old Coast Road, since I'm very interested in
> historic transportation.)  By definition, proofers will, when
> possible, go for what they enjoy.

Definitely! And for particular book topics, one can go outside to
various forums and quickly recruit volunteers. There now seem to be
forums for just about every subject and author out there. DP has not
done this because it doesn't have to at this time (they are working at
their limit), but I see PG Canada doing this for particular books.
What about an old book on the history of Barrie? (a hypothetical
example). Just contact the Barrie historical group(s) and other Barrie
organizations, and voilà, a bunch of people will show up at the door
to help with basic and mundane proofing. Provide a convenient online
interface to assist with the proofing (like DP does), and one will be
able to find willing and enthusiastic volunteers for just about every
book under the sun.


Just my $0.02 worth.

Jon



