Re: Fwd: Programmatic fetching books from Gutenberg

On Tue, Jul 28, 2009 at 02:16:15PM +0000, Joshua Hutchinson wrote:
Any chance of creating on-the-fly zips of some of the books? For instance, the audio books are huge and usually divided along chapter lines. Single-file zips are very useful (and something we've done on some of them manually) but the space waste is huge. On-the-fly zipping of those files would save a huge amount of storage space.
Josh
Somebody would need to write the software :)

Zipping an mp3 is not a winning strategy: they really don't compress much, if at all.

Putting multiple mp3 files for a single eBook in one file, on the fly, would be a great move - making it easier to download a group of files.

A more general approach would be to let visitors to www.gutenberg.org put their selected files (including those generated on-the-fly) on a bookshelf (i.e., shopping cart), then download in one big file, or several small ones.

This would involve some fairly significant additions to the current PHP-based back-end at www.gutenberg.org, but is certainly not a huge technical feat. -- Greg
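Greg's "one file per eBook" idea is cheap to sketch. Since mp3 is already compressed, the archive can use ZIP_STORED (no deflate) and just concatenate the chapters. This is a hypothetical Python sketch of the approach, not code from the site; the file names are made up:

```python
# Bundle the chapter mp3s of an audio book into a single in-memory zip,
# without recompressing them (ZIP_STORED = no deflate, since mp3s don't
# shrink under deflate anyway).
import io
import zipfile

def bundle_files(named_chunks):
    """Build an in-memory zip from (name, bytes) pairs; return the zip bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in named_chunks:
            zf.writestr(name, data)
    return buf.getvalue()

# Hypothetical chapter data standing in for real mp3 files.
chapters = [("ch01.mp3", b"\xff\xfb" + b"\x00" * 100),
            ("ch02.mp3", b"\xff\xfb" + b"\x00" * 100)]
archive = bundle_files(chapters)
```

On the real site this would be streamed to the client rather than held in memory, but the storage argument is the same: nothing is duplicated on disk.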
On Jul 28, 2009, Greg Newby <gbnewby@pglaf.org> wrote:
On Tue, Jul 28, 2009 at 09:16:41AM +0200, Ralf Stephan wrote:
> I confirm that neither the Plucker nor the Mobile formats
> are mentioned in the catalog file. Do you have an
> explanation, Marcello?
I believe Marcello is out on vacation for 2 weeks.
But I know the explanation: the epub, mobi and a few other formats are not part of the Project Gutenberg collection's files, so not part of the database.
They are generated on-demand (or cached if they were generated recently enough), from HTML or text.
We are planning many more "on the fly" conversion options for the future. I have one for a mobile eBook format (for cell phones), and hope to have a PDF converter (with lots of options). We've been working on some text-to-speech converters, too, but that work has gone slowly.
The catalog file only tracks the actual files that are stored as part of the collection (stuff you can view while navigating the directory tree via FTP or other methods). -- Greg
> On Jul 27, 2009, at 8:42 PM, David A. Desrosiers wrote:
>
>> On Mon, Jul 27, 2009 at 1:45 PM, Ralf Stephan <ralf@ark.in-berlin.de> wrote:
>>> My, can't we admit that XPath is a bit over our head, so we prefer confronting the admin we're supposed to be cooperating with? Wrt resources, my guess it's about par traffic-wise (1-5k per book vs. megabytes of RDF) but much better CPU-wise. That is, if you don't want the RDF for other fine things like metadata etc.
>>
>> I think you've missed my point.
>>
>> The RDF flat-out cannot tell me which of the target _formats_ are available for immediate download to the users. I'm not looking for which _titles_ are available in the catalog, I'm looking for which _formats_ are available. Also note that I'm already parsing the feeds to see what the top 'n' titles are already, so parsing XML via whatever methods I need is not the blocker here.
>>
>> Let me give you an example of two titles available in the catalog:
>>
>> Vergänglichkeit by Sigmund Freud
>> http://www.gutenberg.org/cache/plucker/29514/29514
>>
>> The Lost Word by Henry Van Dyke
>> http://www.gutenberg.org/cache/plucker/4384/4384
>>
>> Both of these _titles_ are available in the Gutenberg catalog, but the second one is not available in the Plucker _format_ for immediate download. Big difference from parsing title availability from the catalog.rdf file.
>>
>> Make sense now?
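Since the catalog can't answer David's question, one workaround (an assumption on my part, not an official API) is to probe the on-the-fly cache URL itself with an HTTP HEAD request and treat a non-200 status as "format not available". A Python sketch, using the URL pattern from the two examples above:

```python
# Probe whether a given etext is available in Plucker format by asking the
# cache URL directly, instead of parsing catalog.rdf (which only lists
# titles, not on-the-fly formats).
import urllib.error
import urllib.request

def plucker_url(etext_no):
    # URL pattern taken from the examples in this thread.
    return "http://www.gutenberg.org/cache/plucker/%d/%d" % (etext_no, etext_no)

def format_available(url, timeout=10):
    """Return True if a HEAD request for the URL succeeds with status 200."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False
```

Whether the cache answers HEAD requests cheaply (i.e., without triggering a full conversion) is a server-side question; if it doesn't, a small range GET would do instead.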
> Ralf Stephan
> http://www.ark.in-berlin.de
> pub 1024D/C5114CB2 2009-06-07 [expires: 2011-06-06]
> Key fingerprint = 76AE 0D21 C06C CBF9 24F8 7835 1809 DE97 C511 4CB2

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d

On Tue, Jul 28, 2009 at 07:33:06AM -0700, Greg Newby wrote:
Somebody would need to write the software :)
Zipping an mp3 is not a winning strategy: they really don't compress much, if at all.
Putting multiple mp3 files for a single eBook in one file, on the fly, would be a great move - making it easier to download a group of files.
A more general approach would be to let visitors to www.gutenberg.org put their selected files (including those generated on-the-fly) on a bookshelf (i.e., shopping cart), then download in one big file, or several small ones.
This would involve some fairly significant additions to the current PHP-based back-end at www.gutenberg.org, but is certainly not a huge technical feat. -- Greg
Where can one find the code for the "current PHP-based back-end at www.gutenberg.org" to begin looking into how feasible this would be?

On Tue, Jul 28, 2009 at 06:24:08PM -0600, Joey Smith wrote:
On Tue, Jul 28, 2009 at 07:33:06AM -0700, Greg Newby wrote:
Somebody would need to write the software :)
Zipping an mp3 is not a winning strategy: they really don't compress much, if at all.
Putting multiple mp3 files for a single eBook in one file, on the fly, would be a great move - making it easier to download a group of files.
A more general approach would be to let visitors to www.gutenberg.org put their selected files (including those generated on-the-fly) on a bookshelf (i.e., shopping cart), then download in one big file, or several small ones.
This would involve some fairly significant additions to the current PHP-based back-end at www.gutenberg.org, but is certainly not a huge technical feat. -- Greg
Where can one find the code for the "current PHP-based back-end at www.gutenberg.org" to begin looking into how feasible this would be?
Thanks for your interest :) It isn't bundled up for download anywhere. We'll probably need to wait for Marcello's return from vacation to provide details on how to add components like this. The current system is modular & (I think) well-organized, but complex...including lots of stuff that readers never see (such as the cataloger interface and various programs that add new files). Plus, as you know, there is a lot of stuff that is in the Wiki, rather than PHP. The Wiki might be where new features could be added, or there might be modules "out there" that could make it easier.

I did grab catalog/world/bibrec.php, where bibrecs like this are made: http://www.gutenberg.org/etext/11

It is below. This should give you an idea of how various things are tied in from the database, the on-disk cached records, and stuff that is generated on the fly. The various .phh files it references (which cascade to include a whole bunch of stuff) are mostly for presentation (html and css), not functionality.

A bookshelf/shopping cart would probably be a brand new set of files, with just a little overlap with the existing php. It would need to access the database, and presumably would need a table or two to keep track of bookshelf users & entries. (Maybe a separate database...maybe part of the Wiki instead of a standalone set of PHP programs.) Cookies, or something similar, could be used to track user sessions and their bookshelves/shopping carts/whatever, and add an entry to various pages at www.gutenberg.org for them to access it (sort of like a regular ecommerce site).
-------- bibrec.php

<?php

include_once ("pgcat.phh");

$cli = php_sapi_name () == "cli";
if ($cli) {
  $fk_books = intval ($_SERVER['argv'][1]);
} else {
  $devserver = preg_match ("/www-dev/", $_SERVER['HTTP_HOST']);
  if ($devserver) {
    nocache ();
  }
  getint ("fk_books");
}

$db = $config->db ();

$keywords = array ();
$frontispiece = null;
$category = 0;
$newfilesys = false;

$help = "/wiki/Gutenberg:Help_on_Bibliographic_Record_Page";
$helpicon = "<img src=\"/pics/help.png\" class=\"helpicon\" alt=\"[help]\"$config->endtag>";

$db->exec ("select * from mn_books_categories where fk_books = $fk_books order by fk_categories");
if ($db->FirstRow ()) {
  $category = $db->get ("fk_categories", SQLINT);
}

$friendlytitle = friendlytitle ($fk_books, 80);
$config->description = htmlspecialchars ("Download the free {$category_descriptions[$category]}: $friendlytitle");

for ($i = 0; $i < 26; ++$i) {
  $base32[sprintf ("%05b", $i)] = chr (0x41 + $i);
}
for ($i = 26; $i < 32; ++$i) {
  $base32[sprintf ("%05b", $i)] = chr (0x32 + $i - 26);
}

// find best file for recode facility

class recode_candidate {
  function recode_candidate () {
    $this->score = 0;
    $this->fk_files = null;
    $this->filename = null;
    $this->encoding = null;
    $this->type = null;
  }
}

function find_recode_candidate ($fk_books) {
  global $db;
  $candidate = new recode_candidate ();
  $db->exec ("select pk, filename, fk_encodings from files " .
             "where fk_books = $fk_books and fk_filetypes = 'txt' " .
             "and fk_compressions = 'none' and diskstatus = 0 and obsoleted = 0");
  if ($db->FirstRow ()) {
    do {
      $tmp = new recode_candidate ();
      $tmp->fk_files = $db->get ("pk", SQLINT);
      $tmp->filename = $db->get ("filename", SQLCHAR);
      $tmp->encoding = $db->get ("fk_encodings", SQLCHAR);
      if ((!isset ($tmp->encoding) || $tmp->encoding == "us-ascii")) {
        $tmp->score = 1;
        $tmp->encoding = "ASCII";
      }
      if ($tmp->encoding == "big5")      { $tmp->score = 2; $tmp->encoding = "BIG-5"; }
      if ($tmp->encoding == "euc-kr")    { $tmp->score = 2; $tmp->encoding = "EUC-KR"; }
      if ($tmp->encoding == "Shift_JIS") { $tmp->score = 2; $tmp->encoding = "SHIFT-JIS"; }
      if (!strncmp ($tmp->encoding, "iso-", 4))     { $tmp->score = 3; }
      if (!strncmp ($tmp->encoding, "windows-", 8)) { $tmp->score = 4; }
      if ($tmp->encoding == "utf-8")     { $tmp->score = 5; $tmp->encoding = "UTF-8"; }
      if ($tmp->score > $candidate->score) {
        $candidate = $tmp;
      }
    } while ($db->NextRow ());
  }
  return $candidate;
}

function find_plucker_candidate ($fk_books) {
  global $db;
  $candidate = new recode_candidate ();
  $db->exec ("select pk, filename, fk_encodings, fk_filetypes from files " .
             "where fk_books = $fk_books and (fk_filetypes = 'txt' or fk_filetypes = 'html') " .
             "and fk_compressions = 'none' and diskstatus = 0 and obsoleted = 0");
  if ($db->FirstRow ()) {
    do {
      $tmp = new recode_candidate ();
      $tmp->fk_files = $db->get ("pk", SQLINT);
      $tmp->filename = $db->get ("filename", SQLCHAR);
      $tmp->encoding = $db->get ("fk_encodings", SQLCHAR);
      $tmp->type = $db->get ("fk_filetypes", SQLCHAR);
      if ((!isset ($tmp->encoding) || $tmp->encoding == "us-ascii")) { $tmp->score = 1; }
      if ($tmp->encoding == "iso-8859-1") { $tmp->score = 2; }
      /* if ($tmp->encoding == "windows-1252") { $tmp->score = 3; } */
      if ($tmp->type == "html") { $tmp->score = 4; }
      if ($tmp->score > $candidate->score) {
        $candidate = $tmp;
      }
    } while ($db->NextRow ());
  }
  return $candidate;
}

function base32_encode ($in) {
  global $base32;
  $bits = "";
  $in = @pack ("H*", $in);
  $len = strlen ($in);
  for ($i = 0; $i < $len; $i++) {
    $bits .= sprintf ("%08b", ord ($in{$i}));
  }
  if ($mod = strlen ($bits) % 5) {
    $bits .= str_repeat ("0", 5 - $mod);
  }
  return strtr ($bits, $base32);
}

class DownloadColumn extends dbtSimpleColumn {
  function DownloadColumn () {
    global $help, $helpicon;
    parent::dbtSimpleColumn (null, "Download Links <a href=\"$help#Download_Links\" title=\"Explain Download Links.\">$helpicon</a>", "pgdbfilesdownload");
  }
  function Data ($db) {
    global $config, $friendlytitle, $fk_books, $newfsbasedir;
    $filename = $db->get ("filename", SQLCHAR);
    $extension = "";
    if (preg_match ("/(\.[^.]+)$/", $filename, $matches)) {
      $extension = $matches[1];
    }
    $dir = etext2dir ($fk_books);
    if (preg_match ("!^$dir!", $filename)) {
      $symlink = preg_replace ("!^$dir!", $newfsbasedir, $filename);
    } else {
      $symlink = "$config->downloadbase/$filename";
    }
    $links = array ();
    $links[] = "<a href=\"$symlink\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main site</span></a>";
    $links[] = "<a href=\"$config->world/mirror-redirect?file=$filename\" title=\"Download from mirror site.\" rel=\"nofollow\">mirror sites</a>";
    $sha1 = base32_encode ($db->get ("sha1hash", SQLCHAR));
    $tt = base32_encode ($db->get ("tigertreehash", SQLCHAR));
    $links[] = "<a href=\"magnet:?xt=urn:sha1:$sha1" .
               "&xt=urn:kzhash:" . $db->get ("kzhash", SQLCHAR) .
               "&xt=urn:ed2k:" . $db->get ("ed2khash", SQLCHAR) .
               "&xt=urn:bitprint:$sha1.$tt" .
               "&xs=http://$config->domain$symlink" .
               "&dn=" . urlencode ("$friendlytitle$extension") .
               "\" title=\"Magnetlink to download from P2P network.\">P2P</a>";
    return "<td class=\"pgdbfilesdownload\">" . join (" ", $links) . "</td>";
  }
}

$array = array ();

$db->exec ("select * from books where pk = $fk_books;");
if (!$db->FirstRow ()) {
  error_msg ("No etext no. $fk_books.");
}
$release_date = $db->get ("release_date");
$copyrighted = $db->get ("copyrighted") ?
  "Copyrighted. You may download this ebook but you may be limited in other uses. Check the license inside the ebook." :
  "Not copyrighted in the United States. If you live elsewhere check the laws of your country before downloading this ebook.";

$db->exec ("select * from authors, roles, mn_books_authors " .
           "where mn_books_authors.fk_books = $fk_books " .
           "and authors.pk = mn_books_authors.fk_authors " .
           "and roles.pk = mn_books_authors.fk_roles " .
           "order by role, author");
$db->calcfields ["c_author"] = new CalcFieldAuthorDate ();
if ($db->FirstRow ()) {
  do {
    $pk = $db->get ("fk_authors", SQLINT);
    $name = $db->get ("c_author", SQLCHAR);
    $role = htmlspecialchars ($db->get ("role", SQLCHAR));
    $array [] = preg_replace ("/ /", " ", $role);
    $array [] = "<a href=\"/browse/authors/" . find_browse_page ($name) . "#a$pk\">$name</a>";
    $keywords [] = htmlspecialchars ($db->get ("author", SQLCHAR));
  } while ($db->NextRow ());
}

$db->exec ("select attributes.*, attriblist.name, attriblist.caption from attributes, attriblist " .
           "where attributes.fk_books = $fk_books and " .
           "attributes.fk_attriblist = attriblist.pk " .
           "order by attriblist.name;");
if ($db->FirstRow ()) {
  do {
    $note = htmlspecialchars ($db->get ("text", SQLCHAR));
    $caption = htmlspecialchars ($db->get ("caption", SQLCHAR));
    $note = preg_replace ("/\n/", "<br$config->endtag>", $note);
    if ($caption) {
      $name = $db->get ("name", SQLCHAR);
      switch (intval ($name)) {
      case 901:
        $note = "<a href=\"$note?nocount\"><img src=\"$note?nocount\" title=\"$caption\" alt=\"$caption\" $config->endtag></a>";
        break;
      case 902:
      case 903:
        $note = "<a href=\"$note?nocount\">$caption</a>";
        break;
      case 10:
        $note = "$note <img src=\"/pics/link.png\" alt=\"\" $config->endtag> <a href=\"http://lccn.loc.gov/$note\" title=\"Look up this book in the Library of Congress catalog.\">LoC catalog record</a>";
        break;
      default:
        $note = strip_marc_subfields ($note);
        if (substr ($name, 0, 1) == '5') {
          $patterns = array ("/http:\/\/\S+/", "/#(\d+)/");
          $replaces = array ("<a href=\"$0\">$0</a>", "<a href=\"/ebooks/$1\">$0</a>");
          $note = preg_replace ($patterns, $replaces, $note);
        }
      }
      $array [] = preg_replace ("/ /", " ", $caption);
      $array [] = $note;
    }
  } while ($db->NextRow ());
}

$db->exec ("select * from langs, mn_books_langs where langs.pk = mn_books_langs.fk_langs and mn_books_langs.fk_books = $fk_books;");
if ($db->FirstRow ()) {
  do {
    $pk = $db->get ("pk", SQLCHAR);
    $lang = htmlspecialchars ($db->get ("lang", SQLCHAR));
    $array [] = "Language";
    if ($pk != 'en') {
      $array [] = "<a href=\"/browse/languages/$pk\">$lang</a>";
    } else {
      $array [] = $lang;
    }
  } while ($db->NextRow ());
}

$db->exec ("select * from loccs, mn_books_loccs where loccs.pk = mn_books_loccs.fk_loccs and mn_books_loccs.fk_books = $fk_books;");
if ($db->FirstRow ()) {
  do {
    $pk = $db->get ("pk", SQLCHAR);
    $pkl = strtolower ($pk);
    $locc = htmlspecialchars ($db->get ("locc", SQLCHAR));
    $array [] = "LoC Class";
    $array [] = "<a href=\"/browse/loccs/$pkl\">$pk: $locc</a>";
    $keywords [] = $locc;
  } while ($db->NextRow ());
}

$db->exec ("select * from subjects, mn_books_subjects where subjects.pk = mn_books_subjects.fk_subjects and mn_books_subjects.fk_books = $fk_books;");
if ($db->FirstRow ()) {
  do {
    $subject = htmlspecialchars ($db->get ("subject", SQLCHAR));
    // $url = urlencode ($subject);
    $array [] = "Subject";
    // $array [] = "<a href=\"$config->world/results?subject=$url\">$subject</a>";
    $array [] = $subject;
    $keywords [] = $subject;
  } while ($db->NextRow ());
}

$db->exec ("select * from categories, mn_books_categories where categories.pk = mn_books_categories.fk_categories and mn_books_categories.fk_books = $fk_books;");
if ($db->FirstRow ()) {
  do {
    $pk = $db->get ("pk", SQLINT);
    $category = $db->get ("category", SQLCHAR);
    $array [] = "Category";
    $array [] = "<a href=\"/browse/categories/$pk\">$category</a>";
  } while ($db->NextRow ());
}

$array [] = "EText-No.";
$array [] = $fk_books;
$array [] = "Release Date";
$array [] = $release_date;
$array [] = "Copyright Status";
$array [] = $copyrighted;

$db->exec ("select count (*) as cnt from reviews.reviews where fk_books = $fk_books");
if (($cnt = $db->get ("cnt", SQLINT)) > 0) {
  $s = ($cnt == 1) ? "is a review" : "are $cnt reviews";
  $array [] = "Reviews";
  $array [] = "<a href=\"$config->world/reviews?fk_books=$fk_books\">There $s of this book available.</a>";
}

$newfsbasedir = "$config->files/$fk_books/";
$db->exec ("select filename from files where fk_books = $fk_books and filename ~ '^[1-9]/'");
if ($db->FirstRow ()) {
  $newfilesys = true;
  $array [] = "Base Directory";
  $array [] = "<a href=\"$newfsbasedir\">$newfsbasedir</a>";
}

for ($i = 0; $i < count ($keywords); $i++) {
  $keywords[$i] = preg_replace ("/,\s*/", " ", $keywords[$i]);
}
$config->keywords = htmlspecialchars (join (", ", $keywords)) . ", $config->keywords";

$recode_candidate = find_recode_candidate ($fk_books);
$plucker_candidate = find_plucker_candidate ($fk_books);
$offer_recode = $recode_candidate->score > 0;
$offer_plucker = $plucker_candidate->score > 0;

///////////////////////////////////////////////////////////////////////////////
// start page output

pageheader (htmlspecialchars ($friendlytitle));

$menubar = array ();
$menubar[] = "<a href=\"$help\" title=\"Explain this page.\" rel=\"Help\">Help</a>";
if ($offer_recode) {
  $menubar[] = "<a href=\"$config->world/readfile?fk_files=$recode_candidate->fk_files\" title=\"Read this book online.\" rel=\"nofollow\">Read online</a>";
}
p (join (" — ", $menubar));

echo ("<div class=\"pgdbdata\">\n\n");
$table = new BibrecTable ();
$table->summary = "Bibliographic data of author and book.";
$table->toprows = $array;
$table->PrintTable (null, "Bibliographic Record <a href=\"$help#Table:_Bibliographic_Record\" title=\"Explain this table.\">$helpicon</a>");
echo ("</div>\n\n");

$db->exec ("select filetype, sortorder, compression, " .
           "case files.fk_filetypes when 'txt' then fk_encodings when 'mp3' then fk_encodings else null end as fk_encodings, " .
           "edition, filename, filesize, sha1hash, kzhash, tigertreehash, ed2khash " .
           "from files " .
           "left join filetypes on files.fk_filetypes = filetypes.pk " .
           "left join compressions on files.fk_compressions = compressions.pk " .
           "where fk_books = $fk_books and obsoleted = 0 and diskstatus = 0 " .
           "order by edition desc, sortorder, filetype, fk_encodings, compression, filename;");
$db->calcfields ["c_hrsize"] = new CalcFieldHRSize ();

echo ("<div class=\"pgdbfiles\">\n\n");
echo ("<h2>Download this ebook for free</h2>\n\n");

class FilesTable extends ListTable {
  function FilesTable () {
    global $newfilesys, $offer_recode, $help, $helpicon;
    if (!$newfilesys) {
      $this->AddSimpleColumn ("edition", "Edition", "narrow pgdbfilesedition");
    }
    $footnote = ($offer_recode) ? " \xC2\xB9" : "";
    $this->AddSimpleColumn ("filetype", "Format <a href=\"$help#Format\" title=\"Explain Format.\">$helpicon</a>", "pgdbfilesformat");
    $this->AddSimpleColumn ("fk_encodings", "Encoding$footnote <a href=\"$help#Encoding\" title=\"Explain Encoding.\">$helpicon</a>", "pgdbfilesencoding");
    $this->AddSimpleColumn ("compression", "Compression <a href=\"$help#Compression\" title=\"Explain Compression.\">$helpicon</a>", "pgdbfilescompression");
    $this->AddSimpleColumn ("c_hrsize", "Size", "right narrow pgdbfilessize");
    $this->AddColumnObject (new DownloadColumn ());
    $this->limit = -1;
  }
}

$array = array ();

function epub_file ($fk_books)        { return "/cache/epub/$fk_books/pg$fk_books.epub"; }
function epub_images_file ($fk_books) { return "/cache/epub/$fk_books/pg${fk_books}-images.epub"; }
function mobi_file ($fk_books)        { return "/cache/epub/$fk_books/pg$fk_books.mobi"; }
function mobi_images_file ($fk_books) { return "/cache/epub/$fk_books/pg${fk_books}-images.mobi"; }

$epub = epub_file ($fk_books);
$epub_images = epub_images_file ($fk_books);
$mobi = mobi_file ($fk_books);
$mobi_images = mobi_images_file ($fk_books);

// epub stuff
if (is_readable ("$config->documentroot$epub") && filesize ("$config->documentroot$epub") > 1024) {
  if (!$newfilesys) { $array [] = ""; }
  $array [] = "EPUB (experimental) <a href=\"$help#EPUB\" title=\"Explain EPUB.\">$helpicon</a>";
  $array [] = "";
  $array [] = "";
  $array [] = human_readable_size (filesize ("$config->documentroot$epub"));
  $array [] = "<a href=\"$epub\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main site</span></a>";
}
if (is_readable ("$config->documentroot$epub_images") && filesize ("$config->documentroot$epub_images") > 1024) {
  if (!$newfilesys) { $array [] = ""; }
  $array [] = "EPUB with images (experimental) <a href=\"$help#EPUB\" title=\"Explain EPUB.\">$helpicon</a>";
  $array [] = "";
  $array [] = "";
  $array [] = human_readable_size (filesize ("$config->documentroot$epub_images"));
  $array [] = "<a href=\"$epub_images\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main site</span></a>";
}

// mobi stuff
if (is_readable ("$config->documentroot$mobi") && filesize ("$config->documentroot$mobi") > 1024) {
  if (!$newfilesys) { $array [] = ""; }
  $array [] = "MOBI (experimental) <a href=\"$help#MOBI\" title=\"Explain MOBI.\">$helpicon</a>";
  $array [] = "";
  $array [] = "";
  $array [] = human_readable_size (filesize ("$config->documentroot$mobi"));
  $array [] = "<a href=\"$mobi\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main site</span></a>";
}
if (is_readable ("$config->documentroot$mobi_images") && filesize ("$config->documentroot$mobi_images") > 1024) {
  if (!$newfilesys) { $array [] = ""; }
  $array [] = "MOBI with images (experimental) <a href=\"$help#MOBI\" title=\"Explain MOBI.\">$helpicon</a>";
  $array [] = "";
  $array [] = "";
  $array [] = human_readable_size (filesize ("$config->documentroot$mobi_images"));
  $array [] = "<a href=\"$mobi_images\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main site</span></a>";
}

// plucker stuff
if ($offer_plucker) {
  if (!$newfilesys) { $array [] = ""; }
  $array [] = "Plucker <a href=\"$help#Plucker\" title=\"Explain Plucker.\">$helpicon</a>";
  $array [] = "";
  $array [] = "";
  $array [] = "unknown";
  $array [] = "<a href=\"/cache/plucker/$fk_books/$fk_books\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main site</span></a>";

  # gbn: mobile ebooks. If Plucker conversion works, this should work, too:
  if (!$newfilesys) { $array [] = ""; }
  $array [] = "Mobile eBooks <a href=\"$help#Mobile\" title=\"Explain Mobile.\">$helpicon</a>";
  $array [] = "";
  $array [] = "";
  $array [] = "unknown";
  $array [] = "<a href=\"mobile/mobile.php?fk_books=$fk_books\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main site</span></a>";
}

$table = new FilesTable ();
$table->summary = "Table of available file types and sizes.";
$table->toprows = $array;
$table->PrintTable ($db, "Formats Available For Download <a href=\"$help#Table:_Formats_Available_For_Download\" title=\"Explain this table.\">$helpicon</a>", "pgdbfiles");
echo ("</div>\n\n");

if ($offer_recode) {
  $recode_encoding = strtoupper ($recode_candidate->encoding);
  p ("\xC2\xB9 If you need a special character set, try our " .
     "<a href=\"$config->world/recode?file=$recode_candidate->filename" .
     "&from=$recode_encoding\" rel=\"nofollow\">" .
     "online recoding service</a>.");
}

pagefooter (0);

// implements a page cache
// if this page is viewed it will write a static version
// into the etext cache directory
// MultiViews and mod_rewrite then take care to serve
// the static page to the next requester

$cachedir = "$config->documentroot/cache/bibrec/$fk_books";
umask (0);
mkdir ($cachedir);
$cachefile = "$cachedir/$fk_books.html.utf8";
$hd = fopen ($cachefile, "w");
if ($hd) {
  fwrite ($hd, $output);
  fclose ($hd);
}
$hd = gzopen ("$cachefile.gz", "w9");
if ($hd) {
  gzwrite ($hd, $output);
  gzclose ($hd);
}
exit ();

?>

On Wed, Jul 29, 2009 at 06:27:08AM -0700, Greg Newby wrote:
Thanks for your interest :)
It isn't bundled up for download anywhere. We'll probably need to wait for Marcello's return from vacation to provide details on how to add components like this. The current system is modular & (I think) well-organized, but complex...including lots of stuff that readers never see (such as the cataloger interface and various programs that add new files). Plus, as you know, there is a lot of stuff that is in the Wiki, rather than PHP. The Wiki might be where new features could be added, or there might be modules "out there" that could make it easier.
I did grab catalog/world/bibrec.php , where bibrecs like this are made: http://www.gutenberg.org/etext/11
It is below. This should give you an idea of how various things are tied in from the database, the on-disk cached records, and stuff that is generated on the fly. The various .phh files it references (which cascade to include a whole bunch of stuff) are mostly for presentation (html and css), not functionality.
A bookshelf/shopping cart would probably be a brand new set of files, with just a little overlap with the existing php. It would need to access the database, and presumably would need a table or two to keep track of bookshelf users & entries. (Maybe a separate database...maybe part of the Wiki instead of a standalone set of PHP programs.) Cookies, or something similar, could be used to track user sessions and their bookshelves/shopping carts/whatever, and add an entry to various pages at www.gutenberg.org for them to access it (sort of like a regular ecommerce site).
You know, now that I look at this code, I recall looking over this stuff with Marcello once, years ago...doesn't look like it has changed much. I'll drop a note to Marcello and wait to hear from him. Thanks, Greg!
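The bookshelf/shopping-cart Greg describes could start very small: a cookie identifies the session, and one table maps sessions to chosen files. This is a hypothetical Python/SQLite sketch of that shape (table, column, and function names are all made up; the real site would hang this off its existing database instead):

```python
# Minimal bookshelf: one table keyed by a session id that would live in a
# cookie (e.g. "Set-Cookie: shelf=<session_id>").
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute("""create table bookshelf (
                  session_id text,
                  fk_books   integer,
                  filename   text)""")

def new_session():
    # Opaque value to hand back to the browser in a cookie.
    return uuid.uuid4().hex

def add_to_shelf(session_id, fk_books, filename):
    db.execute("insert into bookshelf values (?, ?, ?)",
               (session_id, fk_books, filename))

def list_shelf(session_id):
    cur = db.execute("select filename from bookshelf "
                     "where session_id = ? order by rowid", (session_id,))
    return [row[0] for row in cur]

sid = new_session()
add_to_shelf(sid, 11, "11/11-h/11-h.htm")
add_to_shelf(sid, 4384, "cache/plucker/4384/4384")
```

The "download everything in one big file" step would then just feed `list_shelf()` into whatever zip-bundling routine gets written.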

On Wed, Jul 29, 2009 at 11:48 AM, Joey Smith<joey@joeysmith.com> wrote:
You know, now that I look at this code, I recall looking over this stuff with Marcello once, years ago...doesn't look like it has changed much. I'll drop a note to Marcello and wait to hear from him.
That code could easily be 1/3 the size, but if it works... no need to go breaking things. :)

Greg Newby wrote:
A more general approach would be to let visitors to www.gutenberg.org put their selected files (including those generated on-the-fly) on a bookshelf (i.e., shopping cart), then download in one big file, or several small ones.
Or we could tell them how to use the download manager in their browsers ...

Seriously, whenever I download a couple of books, I just click away and let the downloads complete in the background.

I concur that something needs to be done for the audio files. Even just clicking on 10+ files can be bothersome.

But I always favor the lowest-tech solution.

So why not add an M3U playlist to the directory? M3U is just a text file with a list of URLs and is supported by Winamp and many other players.

http://gonze.com/playlists/playlist-format-survey.html
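An M3U really is that simple: a list of URLs, one per line, optionally preceded by an "#EXTM3U" header. A sketch of generating one from a directory listing (the book number and file names here are hypothetical):

```python
# Build an M3U playlist for the mp3 files of an audio book directory.
def make_m3u(base_url, filenames):
    lines = ["#EXTM3U"]
    for name in sorted(filenames):
        if name.lower().endswith(".mp3"):
            lines.append(base_url.rstrip("/") + "/" + name)
    return "\n".join(lines) + "\n"

playlist = make_m3u("http://www.gutenberg.org/files/12345/",
                    ["ch02.mp3", "ch01.mp3", "readme.txt"])
```

Dropping the resulting text into the directory as, say, `playlist.m3u` would let a player fetch all chapters from one click, with no server-side logic at all.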

On Sun, Aug 09, 2009 at 10:00:23PM +0200, Marcello Perathoner wrote:
Greg Newby wrote:
A more general approach would be to let visitors to www.gutenberg.org put their selected files (including those generated on-the-fly) on a bookshelf (i.e., shopping cart), then download in one big file, or several small ones.
Or we could tell them how to use the download manager in their browsers ...
Seriously, whenever I download a couple of books, I just click away and let the downloads complete in the background.
A bookshelf (or shopping cart, or call it something else) isn't just about managing downloads started immediately upon finding something of interest. It's about keeping track of books to download in the future, books already downloaded, and books to start downloading later (say, overnight, or when someone goes from home to a location with higher bandwidth).

Sure, people can use browser bookmarks for some of this, but:

- bookmarks then need to be managed after the book is downloaded
- bookmarking the bibrec page is easy, but right-click "bookmark target as..." to actually select a file format is harder for our many naive users
- a large number of bookmarks for future reading won't have very useful functionality; a shopping cart model at www.gutenberg.org could have some more useful features

Probably another site will implement this for the Gutenberg content, eventually. Some of the resellers already have this type of thing. -- Greg

PS: I really like adding an m3u playlist file to audio content
I concur that something needs to be done for the audio files. Even just clicking on 10+ files can be bothersome.
But I always favor the lowest-tech solution.
So why not add an M3U playlist to the directory? M3U is just a text file with a list of URLs and is supported by Winamp and many other players.

Joshua Hutchinson wrote:
Any chance of creating on-the-fly zips of some of the books? For instance, the audio books are huge and usually divided along chapter lines. Single-file zips are very useful (and something we've done on some of them manually) but the space waste is huge. On-the-fly zipping of those files would save a huge amount of storage space.
The other way round would be much more useful: store only a zipped version of everything. All browsers support unzipping on the fly (all bibrec pages we serve have been compressed for years now), so the only people who'd notice anything would be the FTP users. (But anyone who has mastered typing "ftp" can also be made to type "unzip", so no problem there.)
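The "store only the compressed version" idea leans on ordinary HTTP content negotiation: a browser that sends "Accept-Encoding: gzip" can be handed the stored .gz bytes as-is with "Content-Encoding: gzip" and will decompress transparently; only a client without gzip support makes the server inflate. A sketch of that decision in Python (not the actual gutenberg.org code, which does this with Apache MultiViews):

```python
# Decide whether to send the stored gzip bytes directly or inflate them,
# based on the client's Accept-Encoding header.
import gzip

def serve(stored_gz_bytes, accept_encoding):
    encodings = [e.strip() for e in accept_encoding.lower().split(",")]
    if "gzip" in encodings:
        # Client decompresses on the fly; we never touch the payload.
        return {"Content-Encoding": "gzip"}, stored_gz_bytes
    # Rare fallback: inflate for clients that can't handle gzip.
    return {}, gzip.decompress(stored_gz_bytes)

stored = gzip.compress(b"<html>bibrec page</html>")
headers, body = serve(stored, "gzip, deflate")
```

In the common case the server does no compression work at all, which is exactly the storage-and-CPU win Marcello is describing.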
participants (5)

- David A. Desrosiers
- Greg Newby
- Joey Smith
- Joshua Hutchinson
- Marcello Perathoner