
Hello,

(Hopefully this is the proper mailing list for such a topic. Let me know otherwise.)

I would like to build a local database of the Gutenberg catalog.

The 'Gutenberg Feeds' page [1] lists the following resources to help achieve that programmatically:

(1) All books in one huge file, in "superseded DCMI recommendation" format
(2) A separate file for each book, in "current DCMI recommendation" format
(3) An RSS feed, in rss version="0.91" format

So far:

(1) covers all the Gutenberg assets and is handy for the initial database build, but it looks like overkill for day-to-day synchronization.

(2) seems more appropriate than (1) for daily updates, but uses a different format: "current DCMI recommendation" vs. "superseded DCMI recommendation".

(3) is a bit of a blast from the past, but at least provides a daily list of new resources. Sadly, it contains no explicit link to (2), so one has to infer it from the <link> information.

Questions:

- Is there a version of (1) in the same format as (2), assuming the "current DCMI recommendation" is the canonical representation? That would save one from dealing with two different formats, or from hacking (1) to get all the references to (2) and then hammering PG to fetch the individual files in format (2).

- Why are (1) and (2) in different formats?

- Is there an alternative feed that lists the RDF resources explicitly? An Atom feed, perhaps?

Apologies if these are FAQs, but I couldn't locate an unambiguous archive of this mailing list. Is Gmane a good proxy for the list postings?

http://dir.gmane.org/gmane.culture.literature.e-books.gutenberg.volunteers

Alternatively, is there a more straightforward way to build a local database of PG's assets? Perhaps I'm missing something :)

Thanks in advance for any pointers.

Cheers,

PA.

[1] http://www.gutenberg.org/wiki/Gutenberg:Feeds
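
P.S. To make the <link> inference mentioned under (3) concrete, here is a minimal sketch in Python. It rests on two assumptions that are not confirmed anywhere on the Feeds page: that each item's <link> ends with the numeric ebook id, and that the per-book RDF file can be addressed by a URL built from that id. RDF_TEMPLATE, ebook_ids_from_rss and rdf_urls are hypothetical names made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the per-book RDF file is reachable at a URL derived from the
# ebook number. This template is a guess for illustration only.
RDF_TEMPLATE = "http://www.gutenberg.org/ebooks/{id}.rdf"


def ebook_ids_from_rss(rss_text):
    """Return the ebook numbers found in the <link> of each RSS <item>."""
    root = ET.fromstring(rss_text)
    ids = []
    # RSS 0.91 elements carry no XML namespace, so plain tag names work.
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        # Assumption: the link ends with the numeric ebook id.
        match = re.search(r"(\d+)/?\s*$", link)
        if match:
            ids.append(int(match.group(1)))
    return ids


def rdf_urls(rss_text, template=RDF_TEMPLATE):
    """Build one candidate RDF URL per ebook id found in the feed."""
    return [template.format(id=i) for i in ebook_ids_from_rss(rss_text)]
```

Items whose <link> does not end in a number are skipped rather than guessed at, which is exactly the fragility that an explicit link to (2) in the feed would remove.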