Greetings to all,
I am excited about the potential of Project Gutenberg of Canada.
But so far, I don't know much about it.
So I've put together a list of questions regarding various
aspects of it, generally trying to see if there is anything
we can learn by comparison with PG and PG-Au.
I believe it would be worthwhile to consider these issues
and have a consistent plan ready rather than just wait until
a situation comes up and then deal with it on an ad hoc basis.
I acknowledge some of these may already be in place, and some
may not need to be addressed for some time yet.
Some of these topics have shown themselves to cause strong
differences of opinion on the gutvol-d mailing list. I
would request that we avoid flame wars about them here.
Filenames and Directory structure
As the post-10,000 changeover at PG shows, this is worth
taking the time to consider carefully. I like the way
the current PG system puts all the files relating to
one eBook in the same subdirectory. PG-au is using
one subdirectory for each release year, with each filename
including the year, ebook number in that year, and version.
However, that will not scale upwards well if ebook
production increases. Given a choice between these two,
I personally prefer the post-10,000 PG method.
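To make the comparison concrete, here is a rough Python sketch
of the two layouts. The details are my assumptions, not a
description of either archive's actual rules: I assume the
per-eBook layout nests directories by the leading digits of the
eBook number, and that the per-year filename is a two-digit
year, a four-digit sequence number and a one-digit version.
The function names are hypothetical.

```python
from pathlib import PurePosixPath


def per_ebook_dir(ebook_number: int) -> PurePosixPath:
    """One directory per eBook, nested by its leading digits.

    Assumed scheme: 12345 -> 1/2/3/4/12345, and a single-digit
    number such as 7 -> 0/7.
    """
    if ebook_number < 1:
        raise ValueError("eBook numbers start at 1")
    digits = str(ebook_number)
    parents = list(digits[:-1]) or ["0"]
    return PurePosixPath(*parents, digits)


def per_year_file(year: int, seq: int, version: int,
                  ext: str = "txt") -> PurePosixPath:
    """One directory per release year, all files side by side.

    Assumed filename: yy + 4-digit sequence + 1-digit version.
    The fixed-width sequence is where the scaling limit bites.
    """
    if not 1 <= seq <= 9999:
        raise ValueError("sequence does not fit in four digits")
    if not 0 <= version <= 9:
        raise ValueError("version does not fit in one digit")
    name = f"{year % 100:02d}{seq:04d}{version}.{ext}"
    return PurePosixPath(str(year), name)


print(per_ebook_dir(12345))
print(per_year_file(2004, 12, 1))
```

In the first layout every format of an eBook can sit in its own
directory and the number can grow without limit. In the second,
a busy year puts every file in one directory and is capped by
the width of the sequence field.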
Will we have one basic, authoritative file format?
From postings I've seen on the gutvol-d mailing list, I would
hazard a guess that some form of pgxml will likely be proposed
by James. If so, that would seem to me to indicate a likelihood
of using UTF-8 as a default encoding. Another seemingly logical
choice would be to use mostly ISO-Latin-1, and other parts of
ISO 8859 when appropriate.
Do we restrict or encourage certain formats or encodings?
At PG-US the preference is still to go with having a plain 7-bit
ASCII version available whenever possible. At PG-AU, Col has said
posting just an ISO-Latin-1 version is fine. My own preference is
to go for consistency in whatever route is taken.
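As an illustration of what is at stake in choosing among the
encodings mentioned above, here is a small Python sketch that
finds the most restrictive encoding able to hold a given text.
The function names and the order of the candidate list are my
own assumptions, not a proposed policy.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text fits in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def first_fitting(text: str,
                  candidates=("ascii", "iso-8859-1", "utf-8")) -> str:
    """Return the first candidate encoding that can hold text."""
    for encoding in candidates:
        if can_encode(text, encoding):
            return encoding
    raise ValueError("no candidate encoding can represent the text")


for sample in ("Anne of Green Gables", "Québec", "Sœur"):
    print(sample, "->", first_fitting(sample))
```

Plain English text fits in 7-bit ASCII, French accents need at
least ISO-Latin-1, and a character such as the oe ligature is
outside ISO-Latin-1 altogether, so it needs either another part
of ISO 8859 (it is in ISO 8859-15) or UTF-8.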
How will we handle corrections?
At PG-US, when there are a small number of corrections, the file is
changed with just a "last updated" line to show it was changed. For a
larger number of corrections a new file is posted to supersede the old
one, and the old one is still kept in the archive. At PG-Au, a corrected
eBook has "last updated" info added and the old file is deleted, with
the new one taking its place.
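The difference between the two approaches can be sketched in
Python, modelling the archive as a plain dictionary of filename
to text. This is only a toy model under my own assumptions: the
function names, the position and wording of the "Last updated"
line, and the filenames are all hypothetical.

```python
from datetime import date


def update_in_place(archive: dict, name: str, new_text: str,
                    when: date) -> None:
    """Overwrite the file and stamp it; the old text is not kept."""
    stamp = f"Last updated: {when.isoformat()}"
    archive[name] = f"{new_text.rstrip()}\n{stamp}\n"


def post_new_edition(archive: dict, new_name: str,
                     new_text: str) -> None:
    """Add the corrected file under a new name.

    Existing entries are left untouched, so the superseded
    file stays available in the archive.
    """
    if new_name in archive:
        raise ValueError(f"{new_name} already exists")
    archive[new_name] = new_text


archive = {"book-v1.txt": "Teh first edition.\n"}
post_new_edition(archive, "book-v2.txt", "The first edition.\n")
update_in_place(archive, "book-v2.txt", "The first edition!\n",
                date(2005, 1, 2))
print(sorted(archive))
```

Only the second function preserves history, which matters if
anyone has cited or mirrored the earlier file.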
Selection criteria: exclusively Canadiana?
As I understand it, PG-Ca is hoping to obtain funding (from a
government source?). Does that mean we will have a mandate
to pursue just Canadiana?
Selection criteria: Overlap with PG and PG-au
PG-au is likely to have some of the same texts, since their copyright laws
are similar to ours. Would we want to have a policy of trying to
include everything found there--or keeping a distinct collection
with no duplication--or just letting things happen as they may?
Assuming a focus on Canadiana, would we want to make an effort
to include all Canadiana in PG in this collection?
What metadata will be stored in the database?
I would love to have, right from the start, a clear expectation
of what metadata we hope to record for each item, and as
consistent as possible a way to record it.
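As a starting point for discussion, a per-item record could be
written down as something like the Python sketch below. Every
field name here is a hypothetical suggestion of mine, not an
existing PG, PG-au or PG-ca schema. The point is only that
agreeing on a fixed set of fields early makes it easy to check
each item for gaps.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class EbookRecord:
    """Hypothetical metadata record for one eBook."""
    ebook_number: int
    title: str
    authors: List[str]
    language: str = "en"
    encoding: str = "utf-8"
    release_date: Optional[str] = None   # ISO 8601, e.g. 2005-01-02
    last_updated: Optional[str] = None   # ISO 8601
    clearance_ref: Optional[str] = None  # pointer to clearance record


def missing_fields(record: EbookRecord) -> List[str]:
    """Names of fields that have not been filled in yet."""
    return [name for name, value in asdict(record).items()
            if value is None]


record = EbookRecord(1, "Sunshine Sketches of a Little Town",
                     ["Stephen Leacock"])
print(missing_fields(record))
```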
Will we have an official copyright clearance procedure?
Just recently, I was thinking that one of the strong points
of the Project Gutenberg collection is that it may be the
largest collection of documented public domain material
in existence. From what I understand, all copyright
clearances are saved, so they can be referred back to
if needed. In contrast, at PG-au, from what I've seen,
you can just submit an eBook without getting formal
clearance beforehand.
Mirroring? Backup? Long-term stability.
One reason I feel my time is well-spent as a PG volunteer
is that the collection appears very permanent, and I feel
that my contributions are not going to be lost over the years.
I could not envision the PG archive disappearing as long as
there is an internet. What plans would be ideal to encourage
the same for PG-ca?
Andrew