James, would it be possible to install a wiki on your website so that the rest of us can start helping to create a few pages? Is this a good idea? I'd just like to keep some momentum going.
In other news... Don Johnston, who has been wonderful so far with free legal advice, has suggested we set up an informal organization (much like a church or scout group would) and open a bank account for it. Do we already have such a thing? We should put any money we are able to collect into it. Then, when we incorporate, we can wind down this first entity. There is information on federally incorporated non-profits here: http://strategis.gc.ca/epic/internet/incd-dgc.nsf/en/h_cs02147e.html for anybody who is interested.
I was also contacted by someone at the UOttawa Faculty of Law and the Canadian Internet Policy and Public Interest Clinic (www.cippic.ca). I told her all about Project Gutenberg, at least as much as I could, being an admitted newbie myself :-). I think I did a fairly good job of explaining it all to her, and I pointed her to this list, so hopefully she will be monitoring our activity. I am very hopeful we will get assistance from these sources as well.
I received an email from Marcello, who maintains the American PG site. There are about 28 MB of data for the website there. I do not know what the best option is here, and I know James has expressed a dislike for implementing the US code. Can we have a little more discussion on this and get something implemented? I'd really like to know what people think in this area. Does anybody have any thoughts on what more we should do to keep this train moving along?
Lastly... I know I am a newbie here, so tell me if I am stepping on anybody's toes. I have significant enthusiasm for this project right now and I don't want my own momentum to wane. From what I've seen here, there is a lot of enthusiasm from a great many people and I want to encourage that as well. I don't want anybody to feel pushed out, and I know that if left on my own I'll just pick it all up and run with it until I burn out. So please tell me how I can help rather than just letting me take over.
Thanks again,
darryl
Hi Darryl,
I love the tone of the quoted paragraph below. How can you tell we are stereotypical Canadians? We worry about "stepping on others' toes". I've been a fairly long-time volunteer contributor to PG (about 5 years), so if you have any questions regarding that organization, feel free to ask. I feel that anything we can take from PG-US, which has already been proven, will only make the birthing of PG of Canada easier on us all.
I must say that from the point of view of technical details about the texts themselves, James' preference for XML as a master file does make me wonder somewhat. As we don't have another project (that I am aware of) to base our initial efforts in this format on, we will be working out the bugs on our own... Also, I believe the collection will grow much slower than if we were to use un-marked-up text files. However, the alluring vision of a collection of consistently marked-up texts is tempting. Perhaps I lack the prior experience to really know how long it will take to get to that point. James, convince me! :)
On Wed, 10 Nov 2004, Darryl Moore wrote:
Lastly... I know I am a newbie here, so tell me if I am stepping on anybody's toes. I have significant enthusiasm for this project right now and I don't want my own momentum to wane. From what I've seen here, there is a lot of enthusiasm from a great many people and I want to encourage that as well. I don't want anybody to feel pushed out, and I know that if left on my own I'll just pick it all up and run with it until I burn out. So please tell me how I can help rather than just letting me take over.
Andrew wrote:
I must say that from the point of view of technical details about the texts themselves, James' preference for XML as a master file does make me wonder somewhat. As we don't have another project (that I am aware of) to base our initial efforts in this format on, we will be working out the bugs on our own... Also, I believe the collection will grow much slower than if we were to use un-marked-up text files.
Why the rush? Initially it may grow more slowly, but it is much better to do it right from the start, including proper metadata collection and structuring. Notice how PG-MS (MS == Mother Ship) has been spinning its wheels cleaning up (both text and metadata) its 10,000+ texts (especially the older, pre-DP-era texts). The well-known adage applies here: "If you don't have the time to do it right the first time, when are you ever going to have the time?" PG-MS has learned a lot over 30 years, especially in the realm of "what NOT to do" (such as ignoring source metadata and allowing several sources to be combined -- things which are BAD for several reasons -- PG should NOT be a publisher per se, but should rather preserve the past and make the texts useful in the present and future through accurate digitization.) (This "what NOT to do" reminds me of the Monty Python TV sketch "How Not To Be Seen" <laugh/>) Note that DP itself is seriously considering switching to an XML-based system, likely based on a selected subset of TEI or TEI-Lite. I know James will disagree, but PG-Canada should do likewise, so long as the chosen subset is well-defined and structurally/semantically oriented (with its own RELAX NG schema and its own namespace), so that things can be kept under strict control while still taking advantage of all the tools and expertise out there in TEI-Land. I'm not sure how PGTEI fits into this equation, but it certainly merits study. Of course, working with DP is important. It is also important that trained librarians, such as Alev, play a key role in the design and collection of metadata/catalog info.
However, the alluring vision of a collection of consistently marked-up texts is tempting. Perhaps I lack the prior experience to really know how long it will take to get to that point.
Yes, it is tempting. And there's nothing wrong with going about things slowly and methodically to debug them and to build a strong and robust system (such as DP-Canada?) that will eventually speed things up. Here's how I see the process, which differs in some ways from how PG-MS now does it:
1) Select and find the texts (books, periodicals, etc.) that are relevant to PG-Canada's interests (it is important that PG-Canada define what its focus will be.)
2) Copyright-clear them.
3) Scan these texts, collect the metadata/catalog info, and place the page scans online. (Optionally, OCR can be run on these scans, and the raw, uncorrected OCR text can be used to enable a "temporary" full-text search of the collection of page scans.)
4) Start converting selected texts into XML, prioritizing them based on various criteria (to be determined.) Eventually this will be done via the next-generation DP, but at the start do it manually (perhaps run the text itself through the current DP to remove scanning errors, and then mark it up afterwards.)
It is clear that PG-Canada may build a big library of page scans while the production of XML texts from them lags behind. That's not an issue, since copyright clearing and scanning are themselves very important, and the intermediate product, the page scans, is useful in the interim. PG-Canada can work with Brewster Kahle and the Internet Archive on its "Million Book Project" (to scan one million books and place the scans online.) It might even be possible to acquire scans from Brewster, in addition to supplying scans to him. Note that Brewster is now focusing his book scanning efforts in, where else: CANADA !
Jon Noring
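The "temporary" full-text search over raw OCR mentioned in step 3 could be as simple as an inverted index from words to page scans. The following is a minimal Python sketch under stated assumptions: OCR output is available as plain strings keyed by a hypothetical (scan_id, page_number) pair, and `build_index`, `search` and the sample data are invented for illustration; they are not part of any existing PG, DP or Internet Archive tool.

```python
import re
from collections import defaultdict

# Hypothetical sample: raw, uncorrected OCR text keyed by (scan_id, page_number).
# Note the OCR error "HIST0RY" (zero instead of the letter O) on the first page.
PAGES = {
    ("scan-0001", 1): "THE HIST0RY OF UPPER CANADA",
    ("scan-0001", 2): "Chapter I. The early settlement of Upper Canada.",
    ("scan-0002", 5): "A canoe voyage on Lake Ontario",
}

TOKEN = re.compile(r"[a-z0-9]+")


def build_index(pages: dict[tuple[str, int], str]) -> dict[str, set[tuple[str, int]]]:
    """Map each lower-cased word to the set of (scan_id, page) locations containing it."""
    index: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for location, raw_text in pages.items():
        for word in TOKEN.findall(raw_text.lower()):
            index[word].add(location)
    return index


def search(index, query: str) -> list[tuple[str, int]]:
    """Return the sorted page locations that contain every word of the query."""
    words = TOKEN.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


if __name__ == "__main__":
    idx = build_index(PAGES)
    print(search(idx, "Upper Canada"))  # both pages of scan-0001
    print(search(idx, "history"))       # [] because of the OCR error
```

The "HIST0RY" sample shows why such an index is only a stopgap: uncorrected OCR errors cause missed matches until the text has been proofread.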
Michael asked:
Jon Noring wrote:
Note that Brewster is now focusing his book scanning efforts in, where else: CANADA !
I hadn't heard this. Do you have, or can you point to, more information?
There's a dearth of information online (based on my Google search -- maybe there's more and I missed it):
http://docbug.com/Writings/npuc2004/
http://www.digital-copyright.ca/discuss/3794
When I last visited Brewster a couple of months ago, he mentioned they are doing a scanning project in Canada (Toronto, I think he said) that is an experiment with a robotic scanner (he said it cost him about $100,000). Apparently the India scanning project failed, for reasons unknown to me (although I speculate it had to do with quality control), so they're now studying robotics. I hope to find out more when James and I visit Brewster next Friday.
Jon
Andrew, Darryl, On 07:46:55 Andrew Sly wrote:
I feel that anything we can take from PG-US, which has already been proven, will only make the birthing of PG of Canada easier on us all.
For my part, I think I'd rather emulate PG-Australia in a few ways. Col has adopted a simple and functional header and a mercifully abridged trailer that I would sooner borrow than PG-USA's. I also feel that his directory structure is more practical, not to mention sane. Darryl indicated his preference for XML. While that sounds nice, I'd guess that he hasn't had any experience using Gutcheck on XML. I for one intend to continue to crank out plain-text documents in the format PG-US and PG-AU commonly hold. I'd recommend that he have a go at marking up his first text (GWTW) in XML as a trial and see how that goes before committing to this path. All that said, I hafta say that I am grateful that Darryl has taken the initiative on this. Thank you. When the time comes, I think my cheque book will also thank you/PG-CA.
============================================================
Gardner Buchanan <gbuchana@rogers.com>
Ottawa, ON
FreeBSD: Where you want to go. Today.
Regarding header(s) and footer(s)... according to one lawyer I talked with, we'd only be legally required to put a short legal notice in each text, presumably with a URL to the full license text, etc. I've always disliked the PG header/footer, so I'm all for keeping it short. As for the directory structure, I'm hoping to come up with a hybrid of PGAUS and PGUSA to implement for PGCAN... more on that later, though.
For PGCAN, I'm personally setting my sights on a fully XML-based system, and totally ignoring plain text except as an output format. It's the right way to do things. I don't care about the academic usage argument (one way or the other), but from a purely technical point of view, a unified XML-based system is the obvious choice. Yes, it takes more work to set up, but it's really not any more work to USE once it's set up properly.
--------------------------
So far, Michael has agreed with _every_single_ idea I've talked to him about, and based on that and his offer to let me get PGCAN rolling (i.e., using the trademark, etc.), I've gone on the assumption that I was heading in the right direction. If I start becoming a hindrance to PGCAN, or the group believes I'm going in the wrong direction, I'd like everyone to tell me. If this happens, I will disappear, for the good of the whole - no hard feelings to anyone.
-- James
-----Original Message-----
From: pgcanada-bounces@lists.pglaf.org [mailto:pgcanada-bounces@lists.pglaf.org] On Behalf Of Gardner Buchanan
Sent: Thursday, November 11, 2004 9:55 PM
To: Project Gutenberg of Canada
Subject: Re: [PGCanada] James website and more news
_______________________________________________
Project Gutenberg of Canada
Website: http://www.projectgutenberg.ca/
List: pgcanada@lists.pglaf.org
Archives: http://lists.pglaf.org/private.cgi/pgcanada/
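James's position that plain text should be only an output format can be illustrated with a minimal Python sketch (standard library only). It assumes a hypothetical master file whose element names (`text`, `body`, `div`, `head`, `p`) are loosely borrowed from TEI conventions; it is not a validated TEI subset or any schema PG-Canada has adopted, and `to_plain_text` is an invented name.

```python
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical XML master. Element names are loosely modelled on TEI
# (text/body/div/head/p); this is NOT a validated TEI or PG-Canada schema.
MASTER = """<text>
  <body>
    <div type="chapter">
      <head>Chapter I</head>
      <p>This paragraph stands in for the first
         paragraph of the chapter.</p>
      <p>A second paragraph follows it.</p>
    </div>
  </body>
</text>"""


def to_plain_text(xml_source: str, width: int = 72) -> str:
    """Derive a plain-text rendering (one output format) from the XML master."""
    root = ET.fromstring(xml_source)
    blocks = []
    for div in root.iter("div"):
        head = div.find("head")
        if head is not None:
            # Headings are upper-cased, a common plain-text e-text convention.
            blocks.append("".join(head.itertext()).strip().upper())
        for p in div.findall("p"):
            # Collapse the master's source line breaks/indentation, then re-wrap.
            words = " ".join("".join(p.itertext()).split())
            blocks.append(textwrap.fill(words, width=width))
    # Separate blocks with a blank line and end with a trailing newline.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_plain_text(MASTER), end="")
```

The design point is that the XML master is the only file edited by hand; plain text (and, in principle, other formats) is regenerated from it, so a correction made once in the master reaches every output.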
Thanks for the message, Gardner. I think what this helps make apparent is that once you get five people gathered together, you will have at least six different opinions on any given topic. :) I'm rather curious to see how our various ideas about the possibilities for PG of Canada will (hopefully) converge.
Andrew
participants (6)
- Andrew Sly
- Darryl Moore
- Gardner Buchanan
- James Linden
- Jon Noring
- Michael Dyck