Ideas about book landing page & metadata presentation

Hello, I just joined this list but I'm interested in this current thread.

What is the goal of moving to these static HTML pages / directories? Is it to eliminate the processing time necessary to generate these files with the current back-end processing? Seems you will be (possibly) using more disk space - a trade-off which may be worth it, I'm not arguing against it - to save CPU time?

I need to understand more - What is currently stored for each book? Is it stored in all of the formats at http://aleph.gutenberg.org/cache/epub/56843/ or are those created at the time someone requests them? In https://www.gutenberg.org/files/56843/ what's the difference between the -0 zip and the -h zip?

Tim Hare
Tallahassee, FL
Interested Bystander, Non-Inc.

I can't speak to the first paragraph below, but can to the second.

-0.zip - contains the zipped UTF-8 text file for a given etext number, e.g. 56843-0.txt. "-8" is used for ISO-8859/Latin1 text files, while no designator is used for ASCII text files, e.g. 12345.txt

-h.zip - contains the zipped HTML file and its accompanying /images folder, if there is one. To use 56843 as an example, it contains the entire contents of the folder 56843-h.

The epub and kindle files are auto-created from the underlying posted files (in http://www.gutenberg.org/files/56843/) at the time of posting. They're stored elsewhere and are auto-regenerated should the posted files be corrected. The posted HTML file is used as the source; if there's no HTML file, the text file is used.

Sometimes only a text file is submitted to PG--in such cases, an HTML file is auto-generated.

Al Haines
Project Gutenberg

-----Original Message-----
From: gutvol-d [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Tim Hare
Sent: Sunday, April 29, 2018 8:56 PM
To: gutvol-d@lists.pglaf.org
Subject: [gutvol-d] Ideas about book landing page & metadata presentation
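The suffix conventions described in the reply above can be summarised in a short Python sketch. This is only an illustration of the naming scheme as stated in this thread, not code used by Project Gutenberg; the function name `classify_posted_file` and the wording of the labels are hypothetical.

```python
import re

# Suffix conventions as described in this thread (labels are illustrative).
SUFFIX_MEANING = {
    "-0": "UTF-8 text",
    "-8": "ISO-8859/Latin1 text",
    "": "ASCII text",
    "-h": "HTML (plus images folder, if any)",
}

# An etext number, an optional designator, then .txt or .zip.
_POSTED_NAME = re.compile(r"^(?P<number>\d+)(?P<suffix>-0|-8|-h)?\.(?P<ext>txt|zip)$")


def classify_posted_file(filename):
    """Return (etext_number, description), or None if the name does not fit.

    Hypothetical helper: it only looks at the file name, never the contents.
    """
    match = _POSTED_NAME.match(filename)
    if match is None:
        return None
    suffix = match.group("suffix") or ""
    # The thread only mentions "-h" for the zipped HTML folder.
    if suffix == "-h" and match.group("ext") != "zip":
        return None
    return int(match.group("number")), SUFFIX_MEANING[suffix]


if __name__ == "__main__":
    for name in ("56843-0.txt", "56843-h.zip", "12345.txt", "notes.txt"):
        print(name, "->", classify_posted_file(name))
```

Names that do not follow the pattern (for example `notes.txt`) return `None` rather than a guess.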

Thanks for your note, Tim. More:

On Sun, Apr 29, 2018 at 11:55:54PM -0400, Tim Hare wrote:
Hello, I just joined this list but I'm interested in this current thread. What is the goal of moving to these static HTML pages / directories? Is it to eliminate the processing time necessary to generate these files with the current back-end processing? Seems you will be (possibly) using more disk space - a trade-off which may be worth it, I'm not arguing against it - to save CPU time?
The main motivation is to make it easier to simultaneously mirror (copy) the metadata of a book, along with all desired formats of that book (HTML, text, epub, etc.).

Currently, eBook landing pages such as https://www.gutenberg.org/ebooks/5000 are generated on the fly (with lots of caching) from a database that holds metadata, mainly via a server-side Python app. Downloadable items come from files, including auto-generated content (epub, mobi, and some upconversion of text and HTML) and by-hand content (mostly the HTML and/or text submitted by producers).

Instead, I want to have a directory that includes all the content for an eBook, and all the metadata. I'm working on some samples for this, which I hope to share for this group's input fairly soon.

Additional goals:
- HTML5 (suitable for mobile devices etc.) with CSS
- no JavaScript (we don't need it) or other "active" content
- friendlier filenames

Those static HTML pages will be generated from the metadata database the same as currently (i.e., to a "cache"-type directory). That way, they can be persistent until/unless there is a change, such as a fix to metadata, a fix to the master eBook formats, or an improvement in our conversion.

We will not need much extra space, since we can use links where desired to avoid duplication of items (and ensure consistency... i.e., pg5000.epub.images can also be "The Notebooks of Leonardo da Vinci.epub").

Al already addressed the rest of your note, below.

You may also be interested to review these pages at https://www.gutenberg.org:
- mirroring how-to
- offline catalogs

and, it might be informative to peruse the "raw" directory structure via a mirror, such as http://aleph.gutenberg.org/
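To illustrate the point about using links to avoid duplication, here is a minimal Python sketch. It assumes hard links on a filesystem that supports them; the message above does not say which kind of link is intended, and the helper name `add_friendly_name` is hypothetical. The file names are the ones from the example above, and the file content is a placeholder.

```python
import os
import tempfile


def add_friendly_name(directory, existing_name, friendly_name):
    """Expose an existing file under a second, friendlier name.

    Uses a hard link (an assumption), so both names refer to the same
    data on disk and no extra copy is stored.
    """
    source = os.path.join(directory, existing_name)
    target = os.path.join(directory, friendly_name)
    if not os.path.exists(target):
        os.link(source, target)
    return target


def demo():
    """Build a throwaway book directory and link one file under two names."""
    with tempfile.TemporaryDirectory() as book_dir:
        original = os.path.join(book_dir, "pg5000.epub.images")
        with open(original, "wb") as handle:
            handle.write(b"placeholder bytes, not a real epub")
        friendly = add_friendly_name(
            book_dir,
            "pg5000.epub.images",
            "The Notebooks of Leonardo da Vinci.epub",
        )
        # samefile() is True when both paths point at the same file on disk.
        return os.path.samefile(original, friendly), sorted(os.listdir(book_dir))


if __name__ == "__main__":
    print(demo())
```

Because both names point at the same data, a fix to the file is visible under either name, which is the consistency benefit mentioned above.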
I need to understand more - What is currently stored for each book? Is it stored in all of the formats at http://aleph.gutenberg.org/cache/epub/56843/ or are those created at the time someone requests them? In https://www.gutenberg.org/files/56843/ what's the difference between the -0 zip and the -h zip?
Thanks, Greg
participants (3)
- Al Haines
- Greg Newby
- Tim Hare