
The other day I noticed that DP has recently uploaded a reworking of Frankenstein (#41445), the 12th most downloaded "work" on PG. I am assuming that it wasn't cross-checked against #84, so I've started the process of doing this.
I've done the quick and easy bit and now have the text rewrapped to the images. I would like to get these files uploaded to #41445's fileset - they are sitting at:
https://github.com/JonHurst/frankenstein
If you want to just get a zip file rather than use git use:
https://github.com/JonHurst/frankenstein/archive/master.zip
I'm not sure what the protocol for doing this is, beyond it being "supported and encouraged" to have the scans present in the fileset of a given "expression" -- Greg, please let me know what you would like to do.
I've also taken the opportunity to knock up a very simplistic viewer interface to help errata reporters etc. This is at:
http://www.hursts.eclipse.co.uk/viewer.html
Make sure it has finished loading before you try to view a page image or you will end up opening the image in a new tab rather than next to the text. As I said, it's very simplistic. If there is some way that PG could host a more advanced version of this viewer it would certainly make checking errata a lot easier.
I am intending to do the cross-checking with #84 at some point over the next month or so, so I'll send any errata once I'm done. I would also quite like to do a LaTeX version in order to produce A5 and Kindle sized PDFs with decent typography -- would PG be willing to host these?
Regards
Jon

On 11/26/2012 10:22 AM, Jon Hurst wrote: [snip]
I am intending to do the cross-checking with #84 at some point over the next month or so, so I'll send any errata once I'm done.
I wouldn't bother. The PG version of Frankenstein has probably been the subject of more analysis than any other work stored at PG. The inescapable conclusion is that the PG version is a true Frankentext, stitched together from at least two, possibly more, extant versions of Mary Shelley's Frankenstein, at least one of which is probably still under copyright in the U.S. (Mary Shelley herself published two separate versions). See http://groups.yahoo.com/group/ebook-community/message/22105 and links cited therein.
Interestingly, the version of e-text #84 referenced in that message (from 2007) was number 16. The current version has no version number, but was uploaded in 2009. The latest prior version in the /old folder is 15, uploaded in 2005. It would appear that someone (Al Haines, who credits himself as being the author of the HTML version?) replaced version 16 without moving a backup of that version to the /old folder. The Wayback Machine at archive.org may still have a copy; otherwise, that little nugget of history may have been lost.
Thus, it will likely be impossible to reconcile e-text #41445 with #84, particularly given the fact that #41445 was apparently derived from a single scanned edition. My suggestion is to use #41445 as the basis for your preferred typography, and to ignore the very existence of #84. I would also suggest that as you move forward with improving #41445 you regularly check in interim versions to your account on github.com so that /that/ history at least can be preserved.
Another simplistic page viewer (which needs some serious debugging) for Frankenstein can be found at http://www.ebookcoop.net/ebookcoop/ (registration required, but any bogus data will do because I'm not yet doing any validation). So far, it only seems to work on Firefox, and I haven't taken the time to figure out the vagaries of other browsers.

On 2012-11-28, Lee Passey wrote:
On 11/26/2012 10:22 AM, Jon Hurst wrote:
I am intending to do the cross-checking with #84 at some point over the next month or so, so I'll send any errata once I'm done.
I wouldn't bother. . . .
By cross-checking I mean using #84 to polish #41445, not the other way round; I agree #84 is, as Don puts it, unassailable. I haven't done any work on #41445 yet beyond wrapping the submitted utf-8 file using P3 as a template. I really just wanted to harvest this particularly low hanging and tasty fruit before it disappeared from DP, and I want to get it into the #41445 fileset to make sure #41445 does not itself become unassailable.
Another simplistic page viewer (which needs some serious debugging) for Frankenstein can be found at http://www.ebookcoop.net/ebookcoop/ (registration required, but any bogus data will do because I'm not yet doing any validation). So far, only seems to work on Firefox, and I haven't taken the time to figure out the vagaries of other browsers.
The viewer was just a 10 minute hack so Greg could see what files I was wanting to upload. I'm not for one second proposing that PG uses it! A scan versus "line synched text" viewer would be a very nice thing to have at PG -- it would certainly save a lot of time hunting for errata references -- but kind of a lot would have to happen before that could become even a vague possibility.
On 2012-11-28, Greg Newby wrote:
I'm attaching the posted message (hopefully it will pass through the email distribution intact).
Thanks Greg. I'll email Al and see what he wants to do. Regards Jon

The viewer was just a 10 minute hack so Greg could see what files I was wanting to upload. I'm not for one second proposing that PG uses it! A scan versus "line synched text" viewer would be a very nice thing to have at PG -- it would certainly save a lot of time hunting for errata references -- but kind of a lot would have to happen before that could become even a vague possibility.
ABBYY has a dev library version of their scanner version 10, which has more interesting interface options that would be more useful for PG/DP type work. They are reluctant to turn over dev library info, unfortunately.

On Mon, Nov 26, 2012 at 05:22:33PM +0000, Jon Hurst wrote:
The other day I noticed that DP has recently uploaded a reworking of Frankenstein (#41445), the 12th most downloaded "work" on PG. I am assuming that it wasn't cross-checked against #84, so I've started the process of doing this.
I've done the quick and easy bit and now have the text rewrapped to the images. I would like to get these files uploaded to #41445's fileset - they are sitting at:
https://github.com/JonHurst/frankenstein
If you want to just get a zip file rather than use git use:
https://github.com/JonHurst/frankenstein/archive/master.zip
I'm not sure what the protocol for doing this is, beyond it being "supported and encouraged" to have the scans present in the fileset of a given "expression" -- Greg, please let me know what you would like to do.
You're saying you have the page scans that were used to make #41445, but they were not provided as part of the original file upload?
Usually, scans come from whoever submitted the rest of the book.
The only trick is there is a naming convention for the filenames. Did you already do that (sorry, I did not look at the files mentioned above)? There is a page at www.pgdp.net (somewhere) that describes the naming convention. I have it somewhere, too.
If so, just get in touch with the WWer who posted the files, to arrange delivery. If you don't know who this is, or none of this makes sense, email me directly or email pgww@lists.pglaf.org
Thanks!
-- Greg
I've also taken the opportunity to knock up a very simplistic viewer interface to help errata reporters etc. This is at:
http://www.hursts.eclipse.co.uk/viewer.html
Make sure it has finished loading before you try to view a page image or you will end up opening the image in a new tab rather than next to the text. As I said, it's very simplistic. If there is some way that PG could host a more advanced version of this viewer it would certainly make checking errata a lot easier.
I am intending to do the cross-checking with #84 at some point over the next month or so, so I'll send any errata once I'm done. I would also quite like to do a LaTeX version in order to produce A5 and Kindle sized PDFs with decent typography -- would PG be willing to host these?
Regards
Jon
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d

On 11/28/2012 10:20 AM, Greg Newby wrote:
If so, just get in touch with the WWer who posted the files, to arrange delivery. If you don't know who this is, or none of this makes sense, email me directly or email pgww@lists.pglaf.org
How does one discover the identity of the WWer who posted any file?

On Wed, Nov 28, 2012 at 10:38:23AM -0700, Lee Passey wrote:
On 11/28/2012 10:20 AM, Greg Newby wrote:
If so, just get in touch with the WWer who posted the files, to arrange delivery. If you don't know who this is, or none of this makes sense, email me directly or email pgww@lists.pglaf.org
How does one discover the identity of the WWer who posted any file?
They announce the posting to posted@lists.pglaf.org. The list & archives are open: http://lists.pglaf.org I'm attaching the posted message (hopefully it will pass through the email distribution intact). -- Greg

On 11/28/2012 11:08 AM, Greg Newby wrote:
On Wed, Nov 28, 2012 at 10:38:23AM -0700, Lee Passey wrote:
How does one discover the identity of the WWer who posted any file?
They announce the posting to posted@lists.pglaf.org. The list & archives are open: http://lists.pglaf.org
This is great information. Can the list be made searchable, so if I have only the e-text number I can find the associated posting (or reposting) message? When a work is reposted, is the prior version retained for historical purposes?

Below...
-----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Lee Passey Sent: Wednesday, November 28, 2012 10:44 AM To: gutvol-d@lists.pglaf.org Subject: Re: [gutvol-d] Frankenstein
On 11/28/2012 11:08 AM, Greg Newby wrote:
On Wed, Nov 28, 2012 at 10:38:23AM -0700, Lee Passey wrote:
How does one discover the identity of the WWer who posted any file?
They announce the posting to posted@lists.pglaf.org. The list & archives are open: http://lists.pglaf.org
This is great information. Can the list be made searchable, so if I have only the e-text number I can find the associated posting (or reposting) message?
Posting notes are archived by month/year. They're ordered by the date in which they were sent by the WWers, which is more or less in numerical order. There are two exceptions: when etext numbers are reserved for future use and when an old, pre10K, etext is reposted.
When a pre10K etext is reposted, its repost note appears in the month in which the reposting WWer sent the repost note.
Reservations occur when numbers are prerequested, usually by DP, for a multi-volume set with cross-links. Such reserved numbers may not actually get used/posted for some time, i.e. they might be requested in June, but not actually be submitted/posted until several months later, depending on how long it takes to post-process the various volumes of the set. Posting notes for reserved numbers will appear in the month of posting, not the month of request.
When a work is reposted, is the prior version retained for historical purposes?
Yes. Etext #84 is a typical example. Al

On 11/28/2012 1:15 PM, Al Haines wrote:
On Wed, Nov 28, 2012 at 10:38:23AM -0700, Lee Passey wrote:
This is great information. Can the list be made searchable, so if I have only the e-text number I can find the associated posting (or reposting) message?
Posting notes are archived by month/year. They're ordered by the date in which they were sent by the WWers, which is more or less in numerical order. There are two exceptions: when etext numbers are reserved for future use and when an old, pre10K, etext is reposted.
When a pre10K etext is reposted, its repost note appears in the month in which the reposting WWer sent the repost note.
Reservations occur when numbers are prerequested, usually by DP, for a multi-volume set with cross-links. Such reserved numbers may not actually get used/posted for some time, i.e. they might be requested in June, but not actually be submitted/posted until several months later, depending on how long it takes to post-process the various volumes of the set. Posting notes for reserved numbers will appear in the month of posting, not the month of request.
A beautiful example of a non sequitur. The question was not "how is the list archive structured?" the question was "can it be made searchable so messages involving a specific e-text number can be found?" Of course I can download the zip file for every month of each of the past 3 years (apparently the limit of the archive), extract each file from the gzip/tar, and then grep for the number, but a web-based interface would be more user friendly.
When a work is reposted, is the prior version retained for historical purposes?
Yes. Etext #84 is a typical example.
Great. Can you point me to version 16 which existed in 2007, and which is a (the?) precursor to the current version? Were there other versions between 16 and the current version?
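The download-and-grep approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it supposes the monthly archives have already been downloaded into one local directory as gzip-compressed text files matching `*.txt.gz` (the file naming is an assumption, not something confirmed in this thread), and the function name `find_etext_mentions` is hypothetical.

```python
import gzip
import re
from pathlib import Path


def find_etext_mentions(archive_dir, etext_number):
    """Return (file name, line number, line) for every line mentioning the number.

    Assumes archive_dir holds locally downloaded monthly archives as
    gzip-compressed text files named *.txt.gz (an assumed layout).
    """
    # Match the number on its own, optionally prefixed by "#", so that
    # 84 does not match inside 1840 or 41845.
    pattern = re.compile(r"(?<!\d)#?" + str(int(etext_number)) + r"(?!\d)")
    hits = []
    for path in sorted(Path(archive_dir).glob("*.txt.gz")):
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if pattern.search(line):
                    hits.append((path.name, line_number, line.rstrip("\n")))
    return hits
```

Calling `find_etext_mentions("archives", 41445)` would list every line in the downloaded files that mentions that e-text number. It still requires fetching every month first, which is exactly the inconvenience a web-based search would remove.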

The various "frankNN.txt" versions ended with frank15.txt. To my knowledge, PG never had a published "version 16", i.e. a file named frank16.txt. When I reposted frank15.txt into the post10K structure in 2008, I made a copy of it named frank16.txt strictly as a local working file. That frank16.txt file, and the frank16.htm file I generated from it, were then processed by PG's posting software to create the published 84.txt and 84-h.htm fileset. If there's another frank16.txt file out there somewhere, it didn't come from me.
-----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Lee Passey Sent: Wednesday, November 28, 2012 12:50 PM To: gutvol-d@lists.pglaf.org Subject: Re: [gutvol-d] Frankenstein
On 11/28/2012 1:15 PM, Al Haines wrote:
On Wed, Nov 28, 2012 at 10:38:23AM -0700, Lee Passey wrote:
This is great information. Can the list be made searchable, so if I have only the e-text number I can find the associated posting (or reposting) message?
Posting notes are archived by month/year. They're ordered by the date in which they were sent by the WWers, which is more or less in numerical order. There are two exceptions: when etext numbers are reserved for future use and when an old, pre10K, etext is reposted.
When a pre10K etext is reposted, its repost note appears in the month in which the reposting WWer sent the repost note.
Reservations occur when numbers are prerequested, usually by DP, for a multi-volume set with cross-links. Such reserved numbers may not actually get used/posted for some time, i.e. they might be requested in June, but not actually be submitted/posted until several months later, depending on how long it takes to post-process the various volumes of the set. Posting notes for reserved numbers will appear in the month of posting, not the month of request.
A beautiful example of a non sequitur. The question was not "how is the list archive structured?" the question was "can it be made searchable so messages involving a specific e-text number can be found?" Of course I can download the zip file for every month of each of the past 3 years (apparently the limit of the archive), extract each file from the gzip/tar, and then grep for the number, but a web-based interface would be more user friendly.
When a work is reposted, is the prior version retained for historical purposes?
Yes. Etext #84 is a typical example.
Great. Can you point me to version 16 which existed in 2007, and which is a (the?) precursor to the current version? Were there other versions between 16 and the current version?

On Wed, Nov 28, 2012 at 01:50:22PM -0700, Lee Passey wrote:
On 11/28/2012 1:15 PM, Al Haines wrote:
On Wed, Nov 28, 2012 at 10:38:23AM -0700, Lee Passey wrote:
This is great information. Can the list be made searchable, so if I have only the e-text number I can find the associated posting (or reposting) message?
Posting notes are archived by month/year. They're ordered by the date in which they were sent by the WWers, which is more or less in numerical order. There are two exceptions: when etext numbers are reserved for future use and when an old, pre10K, etext is reposted.
When a pre10K etext is reposted, its repost note appears in the month in which the reposting WWer sent the repost note.
Reservations occur when numbers are prerequested, usually by DP, for a multi-volume set with cross-links. Such reserved numbers may not actually get used/posted for some time, i.e. they might be requested in June, but not actually be submitted/posted until several months later, depending on how long it takes to post-process the various volumes of the set. Posting notes for reserved numbers will appear in the month of posting, not the month of request.
A beautiful example of a non sequitur. The question was not "how is the list archive structured?" the question was "can it be made searchable so messages involving a specific e-text number can be found?" Of course I can download the zip file for every month of each of the past 3 years (apparently the limit of the archive), extract each file from the gzip/tar, and then grep for the number, but a web-based interface would be more user friendly.
We're using the packaged version of Mailman mailing list software, as delivered by Ubuntu Linux. This doesn't include a search capability.
You don't need to download the monthly gzip'd text, though. Since they're mostly in order, as Al mentioned, you can just take a few guesses to narrow in (sort of a binary search).
I remember there were some plugins to Mailman that would make lists searchable, but don't recall their names to look for them. If anyone else remembers, I could take a stab at adding a search functionality.
It might be that some of the mailing lists meta-archivers have a searchable interface, but I couldn't find one just now.
All that said: if you are just looking for "who posted this, and when," feel free to ask me, or email pgww@lists.pglaf.org, and I can dig it up from my archives.
-- Greg
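The "few guesses" idea above is an ordinary binary search over the months. A minimal sketch follows, assuming some way to look at one month and read off the lowest new e-text number announced in it. The callback `first_number_in_month` and the function name `locate_month` are hypothetical; in practice the "probe" would be a person opening one month's archive page.

```python
def locate_month(month_count, first_number_in_month, etext_number):
    """Return the index of the last month whose first e-text number
    is less than or equal to etext_number.

    first_number_in_month(i) is a hypothetical probe returning the lowest
    new e-text number announced in month i; months are assumed to be in
    increasing numerical order.
    """
    low, high = 0, month_count - 1
    while low < high:
        # Round up so the loop always makes progress when low + 1 == high.
        mid = (low + high + 1) // 2
        if first_number_in_month(mid) <= etext_number:
            low = mid       # target is in this month or a later one
        else:
            high = mid - 1  # target was posted before this month
    return low
```

With N months this needs about log2(N) probes rather than N downloads. As Al's description of the archive implies, the result is only a starting point: reserved numbers and reposted pre-10K e-texts (such as #84) appear out of numerical order, so a search for those can land in the wrong month.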

On Thu, November 29, 2012 12:48 am, Greg Newby wrote:
It might be that some of the mailing lists meta-archivers have a searchable interface, but I couldn't find one just now.
It should be a trivial matter to add "posted" to gmane.org and mail-archive.com, then search requests can be handled by those sites. May I have your permission to do so?

On Thu, Nov 29, 2012 at 10:02:18AM -0700, Lee Passey wrote:
On Thu, November 29, 2012 12:48 am, Greg Newby wrote:
It might be that some of the mailing lists meta-archivers have a searchable interface, but I couldn't find one just now.
It should be a trivial matter to add "posted" to gmane.org and mail-archive.com, then search requests can be handled by those sites.
May I have your permission to do so?
Sure, thanks. -- Greg

On Thu, November 29, 2012 10:35 am, Greg Newby wrote:
Sure, thanks.
Done. I added your e-mail address for confirmation. See http://gmane.org/import.php for information on how to add list archives to gmane. They prefer files in mbox format, which I presume you can get a hold of.
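Once the archive is available as a single mbox file, it can also be indexed locally with Python's standard `mailbox` module. The sketch below is illustrative only: it assumes each posting note carries the e-text number in its Subject line in the form `#41445`, which is a guess about the message format rather than something stated in this thread, and the function name is hypothetical.

```python
import mailbox
import re
from collections import defaultdict


def index_subjects_by_number(mbox_path):
    """Map each e-text number found in a Subject line to those subjects.

    Assumes numbers appear in subjects as "#NNNN" (an assumed convention).
    """
    index = defaultdict(list)
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            subject = str(message.get("Subject", ""))
            for number in re.findall(r"#(\d+)", subject):
                index[int(number)].append(subject)
    finally:
        box.close()
    return dict(index)
```

Looking up `index_subjects_by_number("posted.mbox").get(84, [])` would then return the subjects of every note that mentions #84, including reposts, without depending on the month ordering.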

On Wed, Nov 28, 2012 at 12:20 PM, Greg Newby <gbnewby@pglaf.org> wrote:
On Mon, Nov 26, 2012 at 05:22:33PM +0000, Jon Hurst wrote:
The other day I noticed that DP has recently uploaded a reworking of Frankenstein (#41445), the 12th most downloaded "work" on PG. I am assuming that it wasn't cross-checked against #84, so I've started the process of doing this.
I've done the quick and easy bit and now have the text rewrapped to the images. I would like to get these files uploaded to #41445's fileset - they are sitting at:
https://github.com/JonHurst/frankenstein
If you want to just get a zip file rather than use git use:
https://github.com/JonHurst/frankenstein/archive/master.zip
I'm not sure what the protocol for doing this is, beyond it being "supported and encouraged" to have the scans present in the fileset of a given "expression" -- Greg, please let me know what you would like to do.
You're saying you have the page scans that were used to make #41445, but they were not provided as part of the original file upload?
Usually, scans come from whoever submitted the rest of the book.
The only trick is there is a naming convention for the filenames. Did you already do that (sorry, I did not look at the files mentioned above)? There is a page at www.pgdp.net (somewhere) that describes the naming convention. I have it somewhere, too.
If so, just get in touch with the WWer who posted the files, to arrange delivery. If you don't know who this is, or none of this makes sense, email me directly or email pgww@lists.pglaf.org
Thanks!
The page images are from a "photo-reprint" of the 1818 Lackington, Hughes, et al. edition, which, judging by the scans, is code for microfiche/microfilm. They're at DP for now, and when archived will end up at OLS. http://www.pgdp.net/c/project.php?id=projectID508470307d6dd
Weren't there discussions at some point on automated or semi-automated uploading of DP scansets to PG? They all get archived, even the ones that aren't publicly displayed at OLS because of politeness/permission issues.
-R C

On Wed, Nov 28, 2012 at 12:44:04PM -0500, Robert Cicconetti wrote:
On Wed, Nov 28, 2012 at 12:20 PM, Greg Newby <gbnewby@pglaf.org> wrote:
On Mon, Nov 26, 2012 at 05:22:33PM +0000, Jon Hurst wrote:
The other day I noticed that DP has recently uploaded a reworking of Frankenstein (#41445), the 12th most downloaded "work" on PG. I am assuming that it wasn't cross-checked against #84, so I've started the process of doing this.
I've done the quick and easy bit and now have the text rewrapped to the images. I would like to get these files uploaded to #41445's fileset - they are sitting at:
https://github.com/JonHurst/frankenstein
If you want to just get a zip file rather than use git use:
https://github.com/JonHurst/frankenstein/archive/master.zip
I'm not sure what the protocol for doing this is, beyond it being "supported and encouraged" to have the scans present in the fileset of a given "expression" -- Greg, please let me know what you would like to do.
You're saying you have the page scans that were used to make #41445, but they were not provided as part of the original file upload?
Usually, scans come from whoever submitted the rest of the book.
The only trick is there is a naming convention for the filenames. Did you already do that (sorry, I did not look at the files mentioned above)? There is a page at www.pgdp.net (somewhere) that describes the naming convention. I have it somewhere, too.
If so, just get in touch with the WWer who posted the files, to arrange delivery. If you don't know who this is, or none of this makes sense, email me directly or email pgww@lists.pglaf.org
Thanks!
The page images are from a "photo-reprint" of the 1818 Lackington, Hughes, et al. edition, which, judging by the scans, is code for microfiche/microfilm. They're at DP for now, and when archived will end up at OLS. http://www.pgdp.net/c/project.php?id=projectID508470307d6dd
Weren't there discussions at some point on automated or semi-automated uploading of DP scansets to PG? They all get archived, even the ones that aren't publicly displayed at OLS because of politeness/permission issues.
Yes, at my request. This discussion first happened in 2004, with me, Juliet Sutherland & Charles Franks. Suffice to say that DP has not adopted the process yet. Most individual uploaders don't provide their scans either. -- Greg
Participants (6): Al Haines, Greg Newby, James Adcock, Jon Hurst, Lee Passey, Robert Cicconetti