Any chance of creating on-the-fly zips of some of the books? For instance, the audio books are huge and usually divided along chapter lines. Single-file zips are very useful (and something we've done manually for some of them), but the wasted space is huge. Zipping those files on the fly would save a lot of storage space.
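To make the idea concrete, here is a minimal sketch in Python, assuming the chapter files already sit on disk and the web server can send the returned bytes to the user. The function name and the file layout are hypothetical, not anything in the current PG setup. Since MP3 and Ogg audio is already compressed, the sketch stores the files without recompressing them; the saving comes from not keeping a second copy on disk.

```python
import io
import zipfile
from pathlib import Path


def zip_chapters(chapter_paths):
    """Build a single zip archive in memory from the given chapter files."""
    buffer = io.BytesIO()
    # ZIP_STORED skips recompression, since audio files barely shrink anyway.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for path in sorted(Path(p) for p in chapter_paths):
            # Store each file under its base name so the archive unpacks flat.
            archive.write(path, arcname=path.name)
    return buffer.getvalue()
```

For very large books, holding the whole archive in memory may not be practical; a real service would probably stream it instead.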
Josh
On Jul 28, 2009, Greg Newby <gbnewby@pglaf.org> wrote:
On Tue, Jul 28, 2009 at 09:16:41AM +0200, Ralf Stephan wrote:
> I confirm that neither the Plucker nor the Mobile formats
> are mentioned in the catalog file. Do you have an
> explanation, Marcello?
I believe Marcello is out on vacation for 2 weeks.
But I know the explanation: the epub, mobi and a few other
formats are not part of the Project Gutenberg collection's
files, so they are not part of the database.
They are generated on-demand (or cached if they were generated
recently enough), from HTML or text.
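Roughly, the generate-or-reuse logic looks like the sketch below. This is only an illustration in Python, not the code running on the server: the function name, the `convert` callback and the one-week lifetime are all assumptions.

```python
import os
import time

MAX_AGE_SECONDS = 7 * 24 * 3600  # assumed cache lifetime, not the real setting


def get_converted(source_path, cache_path, convert, max_age=MAX_AGE_SECONDS):
    """Return converted bytes, reusing the cached copy if it is fresh enough."""
    if os.path.exists(cache_path):
        age = time.time() - os.path.getmtime(cache_path)
        if age <= max_age:
            with open(cache_path, "rb") as cached:
                return cached.read()
    # Cache miss or stale entry: regenerate from the HTML or text source.
    with open(source_path, "rb") as source:
        converted = convert(source.read())
    with open(cache_path, "wb") as cached:
        cached.write(converted)
    return converted
```

Here `convert` stands in for whatever turns the HTML or text into the target format.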
We are planning many more "on the fly" conversion options for
the future. I have one for a mobile eBook format (for cell
phones), and hope to have a PDF converter (with lots of options).
We've been working on some text-to-speech converters, too, but
that work has gone slowly.
The catalog file only tracks the actual files that are stored
as part of the collection (stuff you can view while navigating
the directory tree via FTP or other methods).
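Since the catalog cannot say whether a generated format is ready for a given book, a client would have to ask the server directly. A minimal Python sketch, assuming cache URLs follow the pattern of the two Plucker URLs quoted below and that the server answers a HEAD request with 200 when the file is available; both are assumptions, and the function names are hypothetical:

```python
import urllib.error
import urllib.request

# Pattern taken from the quoted example URLs; not a documented interface.
CACHE_BASE = "http://www.gutenberg.org/cache/plucker"


def plucker_url(book_id):
    """Build the assumed cache URL for a book's Plucker file."""
    return f"{CACHE_BASE}/{book_id}/{book_id}"


def http_head_status(url, timeout=10):
    """Return the HTTP status code of a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # for example 404 when the file is missing


def format_available(book_id, head_status=http_head_status):
    """True if the assumed Plucker cache URL for book_id answers with 200."""
    return head_status(plucker_url(book_id)) == 200
```

The `head_status` parameter is there so the check can be tried with a stub instead of hitting the server for every title.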
-- Greg
> On Jul 27, 2009, at 8:42 PM, David A. Desrosiers wrote:
>
>> On Mon, Jul 27, 2009 at 1:45 PM, Ralf Stephan <ralf@ark.in-berlin.de> wrote:
>>> My, can't we admit that XPath is a bit over our head,
>>> so we prefer confronting the admin we're supposed
>>> to be cooperating with? Wrt resources, my guess is it's
>>> about par traffic-wise (1-5k per book vs. megabytes
>>> of RDF) but much better CPU-wise. That is, if you don't
>>> want the RDF for other fine things like metadata etc.
>>
>> I think you've missed my point.
>>
>> The RDF flat-out cannot tell me which of the target _formats_ are
>> available for immediate download to the users. I'm not looking for
>> which _titles_ are available in the catalog, I'm looking for which
>> _formats_ are available. Also note that I'm already parsing the feeds
>> to see what the top 'n' titles are, so parsing XML via
>> whatever methods I need is not the blocker here.
>>
>> Let me give you an example of two titles available in the catalog:
>>
>> Vergänglichkeit by Sigmund Freud
>> http://www.gutenberg.org/cache/plucker/29514/29514
>>
>> The Lost Word by Henry Van Dyke
>> http://www.gutenberg.org/cache/plucker/4384/4384
>>
>> Both of these _titles_ are available in the Gutenberg catalog, but the
>> second one is not available in the Plucker _format_ for immediate
>> download. Big difference from parsing title availability from the
>> catalog.rdf file.
>>
>> Make sense now?
>> _______________________________________________
>> gutvol-d mailing list
>> gutvol-d@lists.pglaf.org
>> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>
> Ralf Stephan
> http://www.ark.in-berlin.de
> pub 1024D/C5114CB2 2009-06-07 [expires: 2011-06-06]
> Key fingerprint = 76AE 0D21 C06C CBF9 24F8 7835 1809 DE97 C511 4CB2