On Wed, Sep 26, 2012 at 06:06:25PM +0100, Jon Hurst wrote:
On 2012-09-26, Roger wrote:
The main task is that PG would host a master scan particular to one preferred edition. Along with that master scan would be a text-based version, the RTT, marked up in some currently undefined way, ...
As mentioned, it is already FullySupportedAndEncouraged that scans be provided with eBooks. DP generally doesn't do this, but they have the scans archived (not always immediately available). I counted over 7000 "page-images" subdirectories in the PG collection, though I don't know how many are actually complete scan sets. In other words: There ARE master scans available for a number of eBooks. I recommend getting the "ls-lR" file from ftp://ftp.ibiblio.org/pub/docs/gutenberg/ls-lR to choose a title that seems to have a complete scan set.
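As a rough illustration of how one might look for candidate scan sets in that listing, here is a minimal Python sketch. It assumes the "ls-lR" file has already been downloaded and is in ordinary `ls -lR` output format (a "path:" header after each blank line, followed by one entry per line); the function name and the sample data are hypothetical. A file count is only a hint: it cannot say whether a scan set is actually complete.

```python
def page_image_counts(listing_lines, dirname="page-images"):
    """Map each directory named `dirname` to its number of regular files.

    Assumes standard `ls -lR` output: a "path:" header line follows each
    blank line, and regular-file entries start with "-".
    """
    counts = {}
    current = None
    prev_blank = True  # the first line of the listing may be a header
    for raw in listing_lines:
        line = raw.rstrip("\r\n")
        if prev_blank and line.endswith(":"):
            path = line[:-1]
            # Track only directories whose final path component matches.
            last = path.rstrip("/").split("/")[-1]
            current = path if last == dirname else None
            if current is not None:
                counts[current] = 0
        elif current is not None and line.startswith("-"):
            counts[current] += 1
        prev_blank = (line.strip() == "")
    return counts


# Hypothetical sample in `ls -lR` format, for illustration only.
SAMPLE = """\
./1/2/3/1234:
total 8
drwxr-xr-x 2 gut gut 4096 Jan  1  2010 page-images
-rw-r--r-- 1 gut gut 1234 Jan  1  2010 1234.txt

./1/2/3/1234/page-images:
total 12
-rw-r--r-- 1 gut gut 100 Jan  1  2010 p001.png
-rw-r--r-- 1 gut gut 100 Jan  1  2010 p002.png

./1/2/3/1235:
total 4
-rw-r--r-- 1 gut gut 1234 Jan  1  2010 1235.txt
"""

if __name__ == "__main__":
    for path, n in sorted(page_image_counts(SAMPLE.splitlines()).items()):
        print(n, path)
```

For the real listing, the same function could be fed the lines of the downloaded file; sorting by file count would then surface the directories most likely to hold a full set of page images.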
From my perspective, the proposal has evolved to this:
1. Develop a methodology for nominating master scans where no record of provenance of extant text exists. This is PG policy creation and above the pay grade of any individual volunteer. If this is not in place, nothing further can be achieved.
These are two separate topics:
1) For many items, especially older items (pre #5000 or so), it will be challenging or impossible to identify the scan set. We didn't have a procedure to receive the scans, and didn't have the space to save them.
2) PG does not enforce adherence to a particular master (scan set, dead trees, etc.). So, for a title where there IS a scan set, there may be inconsistencies which are not errors.
Already discussed. My suggestion has been to start from scratch with a new scan set, rather than trying to fix existing eBooks. It's OK to choose a title we already have. I'm going to visit B&N today to see whether they might have a suitable dead trees edition for this purpose.
-- Greg
2. Source master scans and upload to PG.
3. Run master scans through DP until P2 using LOTE non-clothing exception. Capture P2 output and diff it against extant text to produce the RTT. Note that if P2 output is perfect the RTT _is_ P2 output.
4. Diff RTT against extant text to produce comprehensive errata list, and deliver to WWs so they can update the e-texts.
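The diffing in steps 3 and 4 could be sketched roughly as follows. This is only an illustration, not an agreed PG or DP procedure: it assumes both texts are plain strings, compares them word by word (so that differences in line wrapping are ignored), and uses Python's standard `difflib`. The function name `errata` and the record layout are hypothetical.

```python
import difflib


def errata(extant_text, rtt_text):
    """List the places where the extant text differs from the RTT.

    Comparison is word-by-word, so re-wrapped lines do not show up
    as differences. Each record gives the word position in the extant
    text, the extant wording, and the RTT wording.
    """
    old = extant_text.split()
    new = rtt_text.split()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    items = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        items.append({
            "word_index": i1,                 # position in the extant text
            "extant": " ".join(old[i1:i2]),   # empty for a pure insertion
            "rtt": " ".join(new[j1:j2]),      # empty for a pure deletion
        })
    return items


if __name__ == "__main__":
    # Hypothetical sample texts: one OCR-style error, different wrapping.
    extant = "It was the best of tirnes,\nit was the worst of times"
    rtt = "It was the best of times, it was the\nworst of times"
    for item in errata(extant, rtt):
        print(item)
```

A word-level comparison is a design choice made here because extant e-texts are often re-wrapped; punctuation-only or whitespace-sensitive differences would need a finer-grained tokenizer than `str.split()`.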
And that, for the moment, is the limit of my ambition. As Bowerbird points out, I have more than likely just described an impossible scenario, even though from a technical point of view it is completely trivial.
Now that it has been explained to me, I understand that final formatted versions uploaded to PG _will_ be buried, and are therefore pointless. A change of policy here is not possible. In the unlikely event we reach step 4, I suggest we harvest the MS and RTT and work up final versions at FadedPage. Having the 1000 most popular ebooks in demonstrably higher quality has a small chance of becoming consequential, and that would be enough for me.
Step 1 requires "someone else with sufficient influence" to drive it: no single volunteer can. It is therefore unlikely to happen. Nothing else can happen until it does happen. Therefore I am reverting to lurk mode.
Step 3 requires DP buy-in. What I am talking about are bog-standard DP projects that happen to terminate after P2. Louise will very likely ignore any request and, if pushed, will block it without explanation. She will do this even though it would help balance the rounds and would likely help with retention. I personally do not believe it is worth commencing step 3 without DP support.
For the moment then, I lurk.
Cheers
Jon