Hello Al, Juliet and Jim:
Thanks for your detailed replies. I now start to understand the train of dependencies. The credit line was what had bothered me in particular, given that a PDF file cannot be altered after submission. More generally, I now understand that, if I can contribute at all, it would be to help in packaging content for others to convert to PDF, XML, or whatever other format may become fashionable.
In this connection, the reply from Juliet looks very enticing. To explain: I am one of the XML true believers, and TEI, or something like it, is ultimately the way to go. But TEI seeks to be all-inclusive and is doomed to be very big and complicated (think SGML). And working with XML is so painful and error-prone for humans. I don't know how big the PGTEI subset is, but there is a good chance that it might be expressible in lightly marked-up text, which can easily be parsed into XML. If that were the case, I could become usefully involved at the DP end.
To state Basil Fawlty's bleedingly obvious, it might then become PG's long-term aim to provide PGTEI versions of all texts, from which all styled versions can be derived, and which would be the only version to be maintained. But where is a spec for PGTEI? And samples? If I could have a look at them, I could very quickly decide whether I could be of any use.
In summary, without being very clear about it, I had thought that I might be able to contribute to PG by generating more refined documents from existing books (gratuitous, I admit); but now I suspect that I might be more useful by wrestling with the software.
Notes for Juliet:
1. I could not find a spec for PGTEI on the pgdp.net site. Is one available?
2. I am a Linux user by choice, but should I presume that all software is required for Windows?
Note for all responders: Thanks for your thoughtful responses; I am starting to learn the issues!
In conclusion: many thanks to Al, Juliet and Jim for their detailed responses.
John Redmond
On Wed, 2009-12-16 at 15:58 -0800, Al Haines (shaw) wrote:
If a PDF, or any other format, is generated from an existing PG text, it won't get a new number. It would be bundled in with all other files for that etext number, and would appear in PG's catalog as an additional filetype.
To use Copperfield as an example, if it was in PG originally as only a text file, then at a later date an HTML version was generated from the text file, the text and HTML files would appear as two filetype entries under that particular Copperfield. If a PDF file was then added, generated from either that Copperfield's text or HTML file, the PDF file would appear as another filetype.
New numbers are given to ebooks that are new to PG, or that are created from a different edition than a current PG ebook, with significant enhancements or differences.
On occasion, a new number is assigned if a new set of files is created from the same source edition, but the new version has significant enhancements, e.g. illustrations, an index, etc., that may have been omitted from the current PG version. This usually applies only to PG's oldest texts, before HTML/images/ISO files were commonly provided.
One other point, again with Copperfield as the example. You say that yours was generated from one of PG's editions, but you appear to have stripped out the producer's credit line ("Produced by..."). Some PG files, usually the older ones, may not have originally had such a credit line, but if the file is cleaned up and reposted some time after its original submission, it's standard practice to add "Produced by an anonymous Project Gutenberg volunteer".
Whatever the case, stripping out a credit line is a distinct no-no. The original producers always get credit for the original production, with the producer of the new format getting additional credit. For example, PG#552 (The People that Time Forgot) was produced in 1996 by Judith Boss. In July 2008, I created an HTML version from her text file. She retains basic credit; I took credit only for the HTML file. If you created a PDF file from either of those two files, your credit would be added to the other two. These credit lines are respected by most harvesters of PG files.
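The numbering and credit rules described above can be sketched as a small data model. This is only an illustration, not PG's actual catalog software: the class names, the `add_format` and `credits` helpers, and the wording of the credit strings are all assumptions made for the example, and the PDF producer is a hypothetical third contributor.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileEntry:
    filetype: str  # e.g. "text", "html", "pdf"
    credit: str    # who produced this particular file (illustrative wording)


@dataclass
class Ebook:
    number: int  # the etext number; it does not change when formats are added
    title: str
    files: list[FileEntry] = field(default_factory=list)

    def add_format(self, filetype: str, credit: str) -> None:
        """Bundle a derived format under the same number; earlier credits stay."""
        self.files.append(FileEntry(filetype, credit))

    def credits(self) -> list[str]:
        """All credits, oldest first, with repeated names listed once."""
        seen: list[str] = []
        for entry in self.files:
            if entry.credit not in seen:
                seen.append(entry.credit)
        return seen


# The PG#552 example: original text, then HTML, then a hypothetical PDF.
book = Ebook(552, "The People that Time Forgot")
book.add_format("text", "Judith Boss")
book.add_format("html", "Al Haines")
book.add_format("pdf", "a later PDF producer")  # hypothetical contributor
```

In this sketch a new format only ever appends to the list of files, so the original producer's credit cannot be dropped by adding a derived file.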
----- Original Message -----
From: "John Redmond" <john_redmond@optusnet.com.au>
To: "Al Haines (shaw)" <ajhaines@shaw.ca>
Sent: Wednesday, December 16, 2009 2:42 PM
Subject: Re: [gutvol-p] Getting Involved
Hello Al:
Thanks for responding. I will certainly work through all the links that you have listed. I can't help feeling, though, that what I want to do is somewhat different from the usual:
1. I see my contribution, apart from providing the software, as adding value to existing books. For example, the files on my site (www.limpidsoft.com) are derived from PG books, but I presume that they will have new catalog numbers.
2. I can provide XHTML files -- although there is no shortage of them in PG. So my particular contribution would be PDF files, possibly with the associated LaTeX files. Now, because PDF files are locked, it will not be possible to include any statements after they are built. As I see it, then, I would need to tie up all this detail before submitting.
3. Plain text versions are automatically accounted for (see 1. above), but it would probably be appropriate to identify these somewhere in the PDF.
John Redmond