
Thanks for your note, Tim. More:

On Sun, Apr 29, 2018 at 11:55:54PM -0400, Tim Hare wrote:
Hello, I just joined this list but I'm interested in this current thread. What is the goal of moving to these static HTML pages / directories? Is it to eliminate the processing time necessary to generate these files with the current back-end processing? Seems you will be (possibly) using more disk space - a trade-off which may be worth it, I'm not arguing against it - to save CPU time?
The main motivation is to make it easier to simultaneously mirror (copy) the metadata of a book, along with all desired formats of that book (HTML, text, epub, etc.).

Currently, eBook landing pages such as https://www.gutenberg.org/ebooks/5000 are generated on the fly (with lots of caching) from a database that holds metadata, mainly via a server-side Python app. Downloadable items come from files, including auto-generated content (epub, mobi, and some upconversion of text and HTML) and by-hand content (mostly the HTML and/or text submitted by producers).

Instead, I want to have a directory that includes all the content for an eBook, and all the metadata. I'm working on some samples for this, which I hope to share for this group's input fairly soon.

Additional goals:
 - HTML5 (suitable for mobile devices etc.) with CSS
 - no JavaScript (we don't need it) or other "active" content
 - friendlier filenames

Those static HTML pages will be generated from the metadata database the same as currently (i.e., to a "cache"-type directory). That way, they can be persistent until/unless there is a change, such as a fix to metadata, a fix to the master eBook formats, or an improvement in our conversion.

We will not need much extra space, since we can use links where desired to avoid duplication of items (and ensure consistency... e.g., pg5000.epub.images can also be "The Notebooks of Leonardo da Vinci.epub").

Al already addressed the rest of your note, below. You may also be interested to review these pages at https://www.gutenberg.org:
 - mirroring how-to
 - offline catalogs

Also, it might be informative to peruse the "raw" directory structure via a mirror, such as http://aleph.gutenberg.org/
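Going back to the points about persistent pages and links: here is a rough Python sketch of the idea, not our actual code. The function names, the metadata dictionary layout, and the "index.html" filename are all assumptions made up for illustration. It rewrites a static landing page only when its content would change, and gives an existing file a friendlier name through a hard link, so the second name costs no extra disk space.

```python
import html
import os
import tempfile
from pathlib import Path


def write_landing_page(book_dir: Path, metadata: dict) -> bool:
    """Write a static index.html; return True only if the file changed.

    The metadata keys ("id", "title") are hypothetical, not a real schema.
    """
    title = html.escape(str(metadata["title"]))
    page = (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        f'<head><meta charset="utf-8"><title>{title}</title></head>\n'
        f'<body><h1>{title}</h1><p>eBook #{int(metadata["id"])}</p></body>\n'
        "</html>\n"
    )
    target = book_dir / "index.html"
    # Leave the existing page untouched if nothing changed, so mirrors
    # do not see a new modification time for identical content.
    if target.exists() and target.read_text(encoding="utf-8") == page:
        return False
    target.write_text(page, encoding="utf-8")
    return True


def add_friendly_name(book_dir: Path, canonical: str, friendly: str) -> Path:
    """Make `friendly` a hard link to `canonical` inside book_dir."""
    source = book_dir / canonical
    alias = book_dir / friendly
    if alias.exists():
        if alias.samefile(source):
            return alias  # already linked to the right file
        alias.unlink()  # stale name pointing at different content
    os.link(source, alias)  # both names now share one copy on disk
    return alias


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        book_dir = Path(tmp)
        # Stand-in bytes; a real directory would hold the generated epub.
        (book_dir / "pg5000.epub.images").write_bytes(b"placeholder")
        alias = add_friendly_name(
            book_dir,
            "pg5000.epub.images",
            "The Notebooks of Leonardo da Vinci.epub",
        )
        meta = {"id": 5000, "title": "The Notebooks of Leonardo da Vinci"}
        print(write_landing_page(book_dir, meta))  # first write
        print(write_landing_page(book_dir, meta))  # unchanged metadata
        print(alias.samefile(book_dir / "pg5000.epub.images"))
```

Hard links only work within a single filesystem, so a mirror tool that does not preserve them would copy the file twice; symbolic links are the obvious alternative if that turns out to matter.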
I need to understand more - What is currently stored for each book? Is it stored in all of the formats at http://aleph.gutenberg.org/cache/epub/56843/ or are those created at the time someone requests them? In https://www.gutenberg.org/files/56843/ what's the difference between the -0 zip and the -h zip?
Thanks, Greg