Re: [gutvol-d] Project Gutenberg Original Directory Structure

----- Original Message ----- From: "Michael Hart" <hart@pglaf.org>
On Thu, 6 Jan 2005, Joshua Hutchinson wrote:
No disrespect, but ...
That seems like a colossal waste of time.
It's not your time, or much of our time, it is the time of volunteers who have asked to do this project, and there have been multiple requests.
I agree. However, Jared was basically asking for comment on the idea, which I provided. Personally, I see it as effort that would be better spent elsewhere. I am the last person to dictate how a volunteer should spend his/her time, though (since I'm sure plenty of people see TEI as a waste of time as well, which is my pet project). I apologize, Jared, if I sounded like I was trying to stop you from working on this. I was simply stating my opinion of the project. Josh

Joshua Hutchinson wrote:
----- Original Message ----- From: "Michael Hart" <hart@pglaf.org>
On Thu, 6 Jan 2005, Joshua Hutchinson wrote:
No disrespect, but ...
That seems like a colossal waste of time.
It's not your time, or much of our time, it is the time of volunteers who have asked to do this project, and there have been multiple requests.
I agree. However, Jared was basically asking for comment on the idea, which I provided. Personally, I see it as effort that would be better spent elsewhere. I am the last person to dictate how a volunteer should spend his/her time, though (since I'm sure plenty of people see TEI as a waste of time as well, which is my pet project).
I apologize, Jared, if I sounded like I was trying to stop you from working on this. I was simply stating my opinion of the project.
Wouldn't it be easier to just create a web page that listed the original names in the original directory structure, and then linked to the current book, wherever it is? It wouldn't require as much space as a full copy of all the books, and would probably be easier to keep in sync with any updated files.

Wouldn't it be easier to just create a web page that listed the original names in the original directory structure, and then linked to the current book, wherever it is? It wouldn't require as much space as a full copy of all the books, and would probably be easier to keep in sync with any updated files.
Except when domains expire, sites go down, directory structures get moved around, and dozens of other situations where this is not the best approach, when you're relying on external sites to maintain their own content. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com

David A. Desrosiers wrote:
Wouldn't it be easier to just create a web page that listed the original names in the original directory structure, and then linked to the current book, wherever it is? It wouldn't require as much space as a full copy of all the books, and would probably be easier to keep in sync with any updated files.
Except when domains expire, sites go down, directory structures get moved around, and dozens of other situations where this is not the best approach, when you're relying on external sites to maintain their own content.
I would think it would be as a part of the Gutenberg site, not on a separate site. That way, all links would be relative. Any broken links would be because of a change in the master directory structure, or an update to one of the books, which would need to be handled anyway (I'm assuming you still want the latest versions of the books). If you could add the original ebook number/directory to the metadata stored at Gutenberg, then you could periodically re-generate the web page(s) automatically with a simple Perl script. Or maybe just use a CGI script to create it on-the-fly, so it is automatic. If you want it as a separate site, write the links to point at whichever mirror you want. If, however, you want a static copy of the site up to when it switched to the new format, ignoring all new and updated books; then a separate site would probably be preferable. Or just grab a copy of the 10K special DVD, as it has the original directory structure, and mount it in a web-accessible way.
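The "re-generate the web page(s) from metadata" idea could look roughly like the following. This is only a sketch, written in Python rather than Perl; the record fields (`old_path`, `ebook_no`), the sample file name, and the `/etext/<number>` link target are illustrative assumptions, not the actual Gutenberg metadata schema.

```python
from collections import defaultdict
from html import escape

# Hypothetical metadata records: original path in the old tree -> ebook number.
# Real values would come from the catalog; these are for illustration only.
SAMPLE_RECORDS = [
    {"old_path": "etext90/mayfl10.txt", "ebook_no": 7},
    {"old_path": "etext90/examp10.txt", "ebook_no": 12345},
]


def build_index(records):
    """Return an HTML page listing old file names grouped by old directory."""
    by_dir = defaultdict(list)
    for rec in records:
        directory, _, filename = rec["old_path"].rpartition("/")
        by_dir[directory].append((filename, rec["ebook_no"]))

    lines = ["<html><body>", "<h1>Original directory structure</h1>"]
    for directory in sorted(by_dir):
        lines.append(f"<h2>{escape(directory)}/</h2>")
        lines.append("<ul>")
        for filename, ebook_no in sorted(by_dir[directory]):
            # Host-relative link, so the page does not hard-code a server name.
            lines.append(
                f'<li><a href="/etext/{ebook_no}">{escape(filename)}</a></li>'
            )
        lines.append("</ul>")
    lines.append("</body></html>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_index(SAMPLE_RECORDS))
```

Because the page is derived entirely from the metadata, re-running the script after a book moves or is updated would refresh every link without copying any book files.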

I would think it would be as a part of the Gutenberg site, not on a separate site. That way, all links would be relative. Any broken links would be because of a change in the master directory structure, or an update to one of the books, which would need to be handled anyway (I'm assuming you still want the latest versions of the books).
That doesn't help the problem at all, because what do you do with any images that may be used in the work, such as a DocBook copy of a book or an HTML version of a book? Do you symlink those across the tree also? This is a management nightmare, especially if things move around in the tree (as they are now). It also doesn't remove the space constraints of having the full copy of the work in multiple formats. With the sheer size of the Gutenberg tree, this will rapidly become a full-time job to make sure everything is working right without breakage with thousands of symlinks all over the tree.
If you want it as a separate site, write the links to point at whichever mirror you want.
Links don't point to remote servers, they point to local resources. Unless of course, these are replicated across some local filesystem and rsync'd from there.
If, however, you want a static copy of the site up to when it switched to the new format, ignoring all new and updated books; then a separate site would probably be preferable. Or just grab a copy of the 10K special DVD, as it has the original directory structure, and mount it in a web-accessible way.
That DVD is pretty old at this point, and doesn't include the new directory structure, if I remember correctly. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com

David A. Desrosiers wrote:
If, however, you want a static copy of the site up to when it switched to the new format, ignoring all new and updated books; then a separate site would probably be preferable. Or just grab a copy of the 10K special DVD, as it has the original directory structure, and mount it in a web-accessible way.
That DVD is pretty old at this point, and doesn't include the new directory structure, if I remember correctly.
Isn't that what this discussion is all about? Going back to the old directory structure? I was just trying to clarify exactly to what level they wanted to go back. I must be missing something here. The DVD was created just before the new structure was started. The new structure went in around ebook 10000, and the DVD was created around ebook 9500 (so missing something under 500 books before the new structure went in), so it is almost exactly what I understood was wanted.

On Thu, 6 Jan 2005, David A. Desrosiers wrote:
I would think it would be as a part of the Gutenberg site, not on a separate site. That way, all links would be relative. Any broken links would be because of a change in the master directory structure, or an update to one of the books, which would need to be handled anyway (I'm assuming you still want the latest versions of the books).
That doesn't help the problem at all, because what do you do with any images that may be used in the work, such as a DocBook copy of a book or an HTML version of a book? Do you symlink those across the tree also? This is a management nightmare, especially if things move around in the tree (as they are now).
It also doesn't remove the space constraints of having the full copy of the work in multiple formats. With the sheer size of the Gutenberg tree, this will rapidly become a full-time job to make sure everything is working right without breakage with thousands of symlinks all over the tree.
We are going to have that many links one of these days, anyway.
If you want it as a separate site, write the links to point at whichever mirror you want.
Links don't point to remote servers, they point to local resources. Unless of course, these are replicated across some local filesystem and rsync'd from there.
Actually, anyone is free to mount these on any servers they like, as long as they are given away free of all charges.
If, however, you want a static copy of the site up to when it switched to the new format, ignoring all new and updated books; then a separate site would probably be preferable. Or just grab a copy of the 10K special DVD, as it has the original directory structure, and mount it in a web-accessible way.
That DVD is pretty old at this point, and doesn't include the new directory structure, if I remember correctly.
I think that was the point. . . . mh

Why not simply use server redirects so that anyone linking to the original location will be redirected to the new location for each text? E.g. (assuming Apache): RedirectMatch permanent ^/etext90/mayfl.*$ http://www.gutenberg.org/etext/7 (Not sure if that's entirely correct, but you get the idea.) This would be a boon to those who've set up links elsewhere to specific texts, only to have them trashed by the relocation. There are probably many tens of thousands of such links that are currently broken. Steve -- Stephen Thomas, Senior Systems Analyst, University of Adelaide Library UNIVERSITY OF ADELAIDE SA 5005 AUSTRALIA Phone: +61 8 830 35190 Fax: +61 8 830 34369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/

Why not simply use server redirects so that anyone linking to the original location will be redirected to the new location for each text?
mod_rewrite is the more-scalable approach. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com

David A. Desrosiers wrote:
Why not simply use server redirects so that anyone linking to the original location will be redirected to the new location for each text?
mod_rewrite is the more-scalable approach.
Sure. Whatever. It's really up to the server admin -- Marcello? Depends on what resources he has available on that server. Also, neither approach helps with mirror sites, because these things won't get mirrored. Maybe a simple (if tedious) use of symbolic links would be better, because that should flow on to mirror sites. Steve -- Stephen Thomas, Senior Systems Analyst, University of Adelaide Library UNIVERSITY OF ADELAIDE SA 5005 AUSTRALIA Phone: +61 8 830 35190 Fax: +61 8 830 34369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/
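For illustration, the symbolic-link idea might be scripted along these lines. This is a sketch only: the function name `link_old_name` and the example paths are made up, and it assumes a POSIX filesystem and a mirroring tool configured to copy symlinks as symlinks.

```python
import os
import tempfile


def link_old_name(root, old_path, new_path):
    """Create root/old_path as a relative symlink to root/new_path."""
    link = os.path.join(root, old_path)
    target = os.path.join(root, new_path)
    os.makedirs(os.path.dirname(link), exist_ok=True)
    # A relative target keeps working when the whole tree is copied elsewhere,
    # which is what would let the link survive on a mirror.
    rel_target = os.path.relpath(target, start=os.path.dirname(link))
    os.symlink(rel_target, link)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical new-style location for the file.
        os.makedirs(os.path.join(root, "new", "7"))
        with open(os.path.join(root, "new", "7", "7.txt"), "w") as fh:
            fh.write("sample text\n")
        link = link_old_name(root, "etext90/mayfl10.txt", "new/7/7.txt")
        print(link, "->", os.readlink(link))
```

Each moved file would need one such link, which is the tedious part; the trade-off is that nothing depends on web-server configuration.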

Steve Thomas wrote:
Sure. Whatever. It's really up to the server admin -- Marcello? Depends on what resources he has available on that server.
Try this one: http://www.gutenberg.net/etext03/napol10.txt -- Marcello Perathoner webmaster@gutenberg.org

That's great, exactly what I wanted -- and I see that napol10.zip also takes you to the catalog page. So I assume you are redirecting napol10.* Would it be possible to also redirect napol* -- so that you'd get the catalog page regardless of version? That would mean that anyone with an old link would then discover thru the catalog that there was a later version. I guess your main problem is that some works have been relocated to the new structure, while others have not, which is going to make things more complex. How hard would it be to just do a mass-migration of all works to the new structure, and then put in place a redirection for all old-style links? (I mean a migration only in the sense of moving the files. The updating work could be done later.) (That would actually be pretty nasty for mirror sites, because they'd suddenly need to update all the old works at once. But that has to happen some time.) Steve
Marcello Perathoner wrote:
Steve Thomas wrote:
Sure. Whatever. It's really up to the server admin -- Marcello? Depends on what resources he has available on that server.
Try this one:
-- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/

On Sun, Jan 09, 2005 at 12:29:38PM +1030, Steve Thomas wrote:
... How hard would it be to just do a mass-migration of all works to the new structure, and then put in place a redirection for all old-style links? (I mean a migration only in the sense of moving the files. The updating work could be done later.)
The procedure is that files that get moved get updated:
- new header
- run through gutcheck and other checking
It's a manual process. The advantage is that after things are moved, we have a certain level of confidence in their formatting & quality.
-- Greg

Steve Thomas wrote:
That's great, exactly what I wanted -- and I see that napol10.zip also takes you to the catalog page. So I assume you are redirecting napol10.* Would it be possible to also redirect napol* -- so that you'd get the catalog page regardless of version? That would mean that anyone with an old link would then discover thru the catalog that there was a later version.
I can redirect any file that was in the old directories, provided I can get a mapping filename => etext-no. Some of these mappings have been deleted after the files were moved because I didn't think of this gimmick. The mappings of the files recently REPosted are still in the database. I'll try to get the old ones back from a backup, if there is one. -- Marcello Perathoner webmaster@gutenberg.org
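A version-independent lookup of the kind Steve asks about (napol* rather than napol10.*) could be sketched as below. The naming pattern (a stem, a two-digit version, an optional letter, then an extension) is an assumption about the old file names, and the `STEM_TO_ETEXT` table is a hypothetical stand-in for the real filename => etext-no mapping.

```python
import re

# Assumed old naming convention: <stem><two-digit version><optional letter>.<ext>
OLD_NAME = re.compile(
    r"^(?P<stem>[a-z0-9]+?)(?P<version>\d{2})[a-z]?\.[a-z0-9]+$"
)

# Hypothetical mapping; real values would come from the catalog database.
STEM_TO_ETEXT = {"mayfl": 7}


def catalog_url(filename):
    """Map an old-style file name to a catalog URL, ignoring the version."""
    match = OLD_NAME.match(filename.lower())
    if match is None:
        return None  # not an old-style name
    etext_no = STEM_TO_ETEXT.get(match.group("stem"))
    if etext_no is None:
        return None  # mapping for this stem is not (or no longer) known
    return f"/etext/{etext_no}"


if __name__ == "__main__":
    for name in ("mayfl10.txt", "mayfl11.zip", "unknown10.txt"):
        print(name, "->", catalog_url(name))
```

Keying the table on the stem means one entry covers every version and format of a text, so an old link to an early edition still lands on the catalog page where later versions are listed.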

Steve Thomas wrote:
Why not simply use server redirects so that anyone linking to the original location will be redirected to the new location for each text?
Because maintaining that will be a nightmare and a server performance hog. We already have an error.php which tries to figure out what the user wanted to get and redirects the user accordingly. Adding a lookup into the database would be easy. Regrettably I already deleted most of the old files from the database. I have to see if I can get them back in. -- Marcello Perathoner webmaster@gutenberg.org
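The "look it up in the database from the error handler" approach might be modelled like this. It is a Python sketch rather than the actual error.php; the table name `old_files`, its columns, the sample row, and the function names are all assumptions made for illustration.

```python
import sqlite3


def make_db():
    """Build an in-memory table of old paths (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE old_files (path TEXT PRIMARY KEY, etext_no INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT INTO old_files VALUES (?, ?)", ("/etext90/mayfl10.txt", 7)
    )
    return conn


def handle_not_found(conn, requested_path):
    """Return (status, location) for a request that matched no file on disk."""
    row = conn.execute(
        "SELECT etext_no FROM old_files WHERE path = ?", (requested_path,)
    ).fetchone()
    if row is None:
        return 404, None  # genuinely unknown path
    return 301, f"/etext/{row[0]}"  # permanent redirect to the catalog page


if __name__ == "__main__":
    conn = make_db()
    print(handle_not_found(conn, "/etext90/mayfl10.txt"))
    print(handle_not_found(conn, "/no/such/file"))
```

Since the lookup runs only for requests that already failed, ordinary hits pay no extra cost, which is presumably the appeal over a long list of static redirect rules.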

That sounds more like what I had in mind :) I want it to be simple and easy to navigate, and yes, we'd save server space by simply linking to the file(s) wherever they currently are. Jared Buck ---------------------- Project Gutenberg editor http://www.gutenberg.net ----- Original Message ----- From: "Kevin Handy" <kth@srv.net> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org> Sent: Thursday, January 06, 2005 7:44 AM Subject: Re: [gutvol-d] Project Gutenberg Original Directory Structure
Joshua Hutchinson wrote:
----- Original Message ----- From: "Michael Hart" <hart@pglaf.org>
On Thu, 6 Jan 2005, Joshua Hutchinson wrote:
No disrespect, but ...
That seems like a colossal waste of time.
It's not your time, or much of our time, it is the time of volunteers who have asked to do this project, and there have been multiple requests.
I agree. However, Jared was basically asking for comment on the idea, which I provided. Personally, I see it as effort that would be better spent elsewhere. I am the last person to dictate how a volunteer should spend his/her time, though (since I'm sure plenty of people see TEI as a waste of time as well, which is my pet project).
I apologize, Jared, if I sounded like I was trying to stop you from working on this. I was simply stating my opinion of the project.
Wouldn't it be easier to just create a web page that listed the original names in the original directory structure, and then linked to the current book, wherever it is? It wouldn't require as much space as a full copy of all the books, and would probably be easier to keep in sync with any updated files.
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d

No offense, Josh, this is a free country and you are welcome to provide your opinion as you see fit :) I'm trying to start a PG of Russia, I just got to get in touch with my girlfriend in Moscow, she'd be interested in helping once she's out of school later this month :) Prof. Hart and Greg Newby would like to see a Russian PG, and I have the time and resources to spend to work on one. Jared Buck ---------------------- Project Gutenberg editor http://www.gutenberg.net

Now a PG of Russia idea I like! You have my full support on that one! :)
Jared Buck wrote:
No offense, Josh, this is a free country and you are welcome to provide your opinion as you see fit :) I'm trying to start a PG of Russia, I just got to get in touch with my girlfriend in Moscow, she'd be interested in helping once she's out of school later this month :) Prof. Hart and Greg Newby would like to see a Russian PG, and I have the time and resources to spend to work on one.
Jared Buck ---------------------- Project Gutenberg editor http://www.gutenberg.net

Heh, that's good to know you have my back :) I speak a little Russian, but with my girlfriend working together with me on a PG-Russia, such a site would be both in English and in Russian (especially for people studying Russian who are required to read classic Russian works in their original language). Implementing a PG-Russia shouldn't be that hard, and both Greg and Michael have lent their support to me for doing this, including the possibility of lending me some server space to host the site, which likely would be located at an address like http://www.gutenberg.ru or something similar. Jared Buck ---------------------- Project Gutenberg editor http://www.gutenberg.net ----- Original Message ----- From: "Joshua Hutchinson" <joshua@hutchinson.net> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org> Sent: Friday, January 07, 2005 3:36 PM Subject: Re: [gutvol-d] Project Gutenberg Original Directory Structure
Now a PG of Russia idea I like! You have my full support on that one! :)
Jared Buck wrote:
No offense, Josh, this is a free country and you are welcome to provide your opinion as you see fit :) I'm trying to start a PG of Russia, I just got to get in touch with my girlfriend in Moscow, she'd be interested in helping once she's out of school later this month :) Prof. Hart and Greg Newby would like to see a Russian PG, and I have the time and resources to spend to work on one.
Jared Buck ---------------------- Project Gutenberg editor http://www.gutenberg.net

A kind of PG-Russia already exists: http://lib.ru/ (not that I can really read it, but I can see the list of authors, and access the books). Carlo Traverso
participants (9)
- Carlo Traverso
- David A. Desrosiers
- Greg Newby
- Jared Buck
- Joshua Hutchinson
- Kevin Handy
- Marcello Perathoner
- Michael Hart
- Steve Thomas