
On Wed, February 15, 2012 3:59 am, Robert Gibbins wrote:
Having recently downloaded the PG 2010 DVD (thank you very much PG), I started looking at the books:
- The DVD seems to contain about 34,000 zip files.
- A small number of these (about 50) seem to be zipped mp3 files (*m.zip).
- About 1800 files seem to be zipped html files (*h.zip).
Does this mean that PG (at that time) only had about 1800 'real' (as opposed to generated-from-the-text-file) html books, or have I misunderstood something?
I haven't looked at the DVD, but I have looked at the mirrored file system, and have learned a few things. Yesterday I grabbed, more or less at random, 10 HTML files from the "Top 100 (Last 30 Days)" list. Of these 10, only one was generated from the Impoverished Text File. Of the remaining 9, only 2 were zipped HTML; the other 7 were single files handcrafted to the HTML v. 3.2 spec. These HTML files are found in the n/n/n/nnn-h folder, which is usually a peer to the zip file when one exists.

I don't think searching for *h.zip will give you a true picture of the non-generated HTML files -- but I could be wrong; it's possible that the DVD contains more than, or something different from, the file system image.
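
For anyone who wants to check this against a local copy of the mirror, here is a minimal sketch in Python. It rests on assumptions of mine rather than anything PG documents: that HTML folders are named like 1234-h and their zipped peers like 1234-h.zip, and that walking the whole tree is acceptable (so the exact n/n/n/ depth does not matter). The function name survey_html is made up for illustration.

```python
import os
import re

# Assumed naming patterns (not confirmed by PG): "1234-h" and "1234-h.zip".
H_DIR = re.compile(r"^(\d+)-h$")
H_ZIP = re.compile(r"^(\d+)-h\.zip$")


def survey_html(root):
    """Walk `root` and return two sets of book ids:
    those with an -h folder, and those with an -h.zip file."""
    folders, zips = set(), set()
    for _dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            match = H_DIR.match(name)
            if match:
                folders.add(match.group(1))
        for name in filenames:
            match = H_ZIP.match(name)
            if match:
                zips.add(match.group(1))
    return folders, zips
```

Calling folders, zips = survey_html("/path/to/mirror") (the path is a placeholder) and then looking at len(folders - zips) would give the number of books that have an HTML folder but no zipped peer, which is the group a search for *h.zip alone would miss.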