
David A. Desrosiers wrote:
On Mon, Jul 27, 2009 at 1:45 PM, Ralf Stephan<ralf@ark.in-berlin.de> wrote:
My, can't we admit that XPath is a bit over our heads, so we'd rather confront the admin we're supposed to be cooperating with? Wrt resources, my guess is it's about on par traffic-wise (1-5k per book vs. megabytes of RDF) but much better CPU-wise. That is, if you don't want the RDF for other fine things like metadata etc.
I think you've missed my point.
The RDF flat-out cannot tell me which of the target _formats_ are available for immediate download to the users. I'm not looking for which _titles_ are available in the catalog, I'm looking for which _formats_ are available. Also note that I'm already parsing the feeds to see what the top 'n' titles are, so parsing XML via whatever methods I need is not the blocker here.
Let me give you an example of two titles available in the catalog:
Vergänglichkeit by Sigmund Freud http://www.gutenberg.org/cache/plucker/29514/29514
The Lost Word by Henry Van Dyke http://www.gutenberg.org/cache/plucker/4384/4384
Both of these _titles_ are available in the Gutenberg catalog, but the second one is not available in the Plucker _format_ for immediate download. That is a big difference from parsing title availability out of the catalog.rdf file.
So you are doing a HEAD on the cache location? I hope you don't have many of these in the field, because you're going to look very sorry whenever the location of the cache changes. (It will! I give you fair notice for free :-) )
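For concreteness, the approach being questioned here (probing the cache URL with an HTTP HEAD request) might look roughly like this Python sketch. The URL pattern is copied from the two example links above; the helper names are hypothetical, and, as warned, the cache layout is not a stable interface.

```python
import urllib.error
import urllib.request

# URL pattern taken from the two example links in this thread.
# Assumption: this cache layout is NOT a documented interface and may change.
CACHE_URL = "http://www.gutenberg.org/cache/plucker/{n}/{n}"


def plucker_cache_url(ebook_no: int) -> str:
    """Build the (hypothetical) cache URL for a given ebook number."""
    return CACHE_URL.format(n=ebook_no)


def plucker_file_cached(ebook_no: int, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request on the cache URL answers with 2xx."""
    req = urllib.request.Request(plucker_cache_url(ebook_no), method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError:
        # 404 and similar: the file is not in the cache right now.
        # Network failures (URLError) are deliberately left to the caller.
        return False
```

Note that each call is one request per title, and the answer is only valid at the moment of the request.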
Make sense now?
No. Why is that "immediate download" bit so important to you? You will get a completely random set of files. (A cached plucker file expires 7 days after *generation*, not after the last *access*. So all you get is the set of files generated in the last 7 days.) And a wrong set too: the first file could have been deleted on the server long before you finished your barrage of HEAD requests.
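The expiry rule described above (a cached file lives 7 days from generation, and accesses do not extend that) can be modelled as in this small sketch. It only encodes the rule as stated in this message; the function names are made up for illustration.

```python
from datetime import datetime, timedelta

# Lifetime as stated in the thread: 7 days counted from generation.
CACHE_LIFETIME = timedelta(days=7)


def expires_at(generated: datetime) -> datetime:
    """Expiry depends only on generation time; reads do not reset it."""
    return generated + CACHE_LIFETIME


def is_cached(generated: datetime, now: datetime) -> bool:
    """True while the generated file is still within its lifetime."""
    return now < expires_at(generated)
```

So a file generated on 20 July and downloaded on 26 July is still gone by 28 July; a HEAD result only says what was cached at that instant.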