[gutvol-p] Re: Getting Involved

19 Dec 2009

      Hello Al, Juliet and Jim:

Thanks for your detailed replies. I am now starting to understand the
chain of dependencies. The credit line was what had bothered me in
particular, given that a PDF file cannot be altered after submission.

More generally, I now understand that, if I can contribute at all, it
would be to help in packaging content for others to convert to PDF, XML,
or whatever other format may become fashionable. In this connection, the
reply from Juliet looks very enticing.

To explain: I am one of the XML true believers, and I think that TEI, or
something like it, is ultimately the way to go. But TEI seeks to be
all-inclusive and is doomed to be very big and complicated (think SGML).
And working with XML is painful and error-prone for humans. I don't know
how big the PGTEI subset is, but there is a good chance that it might be
expressible in lightly marked-up text, which can easily be parsed into
XML. If that were the case, I could become usefully involved at the DP
end.
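
As a rough sketch of what "lightly marked-up text parsed into XML" could
look like: the markup conventions below ("# " for a heading, a blank line
for a paragraph break, _word_ for italics) and the function names are
invented for illustration and are not taken from any PGTEI spec. The
element names (div, head, p, hi) exist in TEI, but their use here is only
an assumption about what a PGTEI-like target might contain.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical light markup (NOT the PGTEI spec):
#   "# Title" as its own block -> <head>
#   blank line                 -> paragraph break
#   _word_                     -> <hi rend="italic">
ITALIC = re.compile(r"_([^_]+)_")


def inline(text):
    """Escape XML special characters, then turn _..._ into <hi> elements."""
    return ITALIC.sub(r'<hi rend="italic">\1</hi>', escape(text))


def to_xml(source):
    """Convert lightly marked-up text into a small TEI-like XML string."""
    parts = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", source.strip()):
        # Re-join hard-wrapped lines into a single line of text.
        text = " ".join(line.strip() for line in block.splitlines())
        if text.startswith("# "):
            parts.append("<head>%s</head>" % inline(text[2:]))
        else:
            parts.append("<p>%s</p>" % inline(text))
    xml = "<div>%s</div>" % "".join(parts)
    ET.fromstring(xml)  # raises ParseError if the result is not well-formed
    return xml


if __name__ == "__main__":
    sample = "# Chapter 1\n\nTom & Jerry were\n_very_ late."
    print(to_xml(sample))
```

The point of the design is that the human-edited form stays close to
plain text, while the XML is generated and checked for well-formedness
by machine.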

To state Basil Fawlty's bleedingly obvious, it might then become PG's
long-term aim to provide PGTEI versions of all texts, from which all
styled versions can be derived, and which would be the only version to
be maintained. But where is a spec for PGTEI? And samples? If I could
have a look at them, I could very quickly decide whether I could be of
any use.

In summary, without being very clear about it, I had thought that I
might be able to contribute to PG by generating more refined documents
from existing books (gratuitous, I admit); but now I suspect that I
might be more useful by wrestling with the software.

(Notes for Juliet:
1. I could not find a spec for PGTEI on the pgdp.net site. Is one
available?
2. I am a Linux user by choice, but should I presume that all software
is required to run on Windows?
)

Note for all responders: Thanks for your thoughtful responses; I am
starting to learn the issues!

In conclusion: many thanks to Al, Juliet and Jim for their detailed
responses.

John Redmond 

On Wed, 2009-12-16 at 15:58 -0800, Al Haines (shaw) wrote:
...
If a PDF, or any other format, is generated from an existing PG text, it 
won't get a new number.  It would be bundled in with all other files for 
that etext number, and would appear in PG's catalog as an additional filetype.
To use Copperfield as an example, if it was in PG originally as only a text 
file, then at a later date an HTML version was generated from the text file, 
the text and HTML files would appear as two filetype entries under that 
particular Copperfield.  If a PDF file was then added, generated from either 
that Copperfield's text or HTML file, the PDF file would appear as another 
filetype.
New numbers are given to ebooks that are new to PG, or are created from a 
different edition than a current PG ebook, with significant 
enhancements/differences.
On occasion, a new number is assigned if a new set of files is created from 
the same source edition, but the new version has significant enhancements, 
e.g. illustrations, an index, etc, that may have been omitted from the 
current PG version.  This usually applies only to PG's oldest texts, before 
HTML/images/ISO files were commonly provided.
One other point, again with Copperfield as the example.  You say that yours 
was generated from one of PG's editions, but you appear to have stripped out 
the producer's credit line ("Produced by...").  Some PG files, usually the 
older ones, may not have originally had such a credit line, but if the file 
is cleaned up and reposted some time after its original submission, it's 
standard practice to add "Produced by an anonymous Project Gutenberg 
volunteer".
Whatever the case, stripping out a credit line is a distinct no-no.  The 
original producers always get credit for the original production, with the 
producer of the new format getting additional credit.  For example, PG#552 
(The People that Time Forgot) was produced in 1996 by Judith Boss.  In July 
2008, I created an HTML version from her text file.  She retains basic 
credit; I took credit only for the HTML file.  If you created a PDF file 
from either of those two files, your credit would be added to the other two. 
These credit lines are respected by most harvesters of PG files.
----- Original Message ----- 
From: "John Redmond" <john_redmond@optusnet.com.au>
To: "Al Haines (shaw)" <ajhaines@shaw.ca>
Sent: Wednesday, December 16, 2009 2:42 PM
Subject: Re: [gutvol-p] Getting Involved
...
Hello Al:
Thanks for responding. I will certainly work through all the links that
you have listed. I can't help feeling, though, that what I want to do is
somewhat different from the usual:
1. I see my contribution, apart from providing the software, as adding
value to existing books. For example, the files on my site
(www.limpidsoft.com) are derived from PG books, but I presume that they
will have new catalog numbers.
2. I can provide XHTML files -- although there is no shortage of them in
PG. So my particular contribution would be PDF files, possibly with the
associated LaTeX files. Now, because PDF files are locked, it will not
be possible to include any statements after they are built. As I see it,
then, I would need to tie up all this detail before submitting.
3. Plain text versions are automatically accounted for (see 1. above),
but it would probably be appropriate to identify these somewhere in the
PDF.
John Redmond