
joey said:
I have a 100Mb/s municipal fiber connection and almost 2 terabytes of disk space available, and "download[ing] vast portions of the library" is not an option for me.
well joey, i do look forward to your tool, when you find time to create it, because these general discussions we are having around this topic have a lot of fuzziness about them, which must all be resolved when one starts writing code. so i won't respond to all your points until i can see exactly what you meant by them. but this point here is quite easy to deal with. downloading the project gutenberg library -- even the whole thing -- can be a breeze.

first of all, as is always the default with me, i'm only concerned with one version of each -- the "master version", in z.m.l. format -- as the other versions can be spun out of it.

second, as i said, it's reasonable to eliminate big classes of e-texts from the downloading, such as the human genome files, audio/video, and books in languages that you don't read...

third, there are a lot of duplicate files where pieces of a volume were presented separately, and then the volume as a whole in another file. now that we have the information (thanks greg), those separate-piece files can easily be ignored.

fourth, there are some people who will not want the magazines that are being added increasingly.

once you've eliminated all of these files from your download queue, you find the list is much smaller.

on to the next step... i have written a program that lets a person click one button to start downloading e-texts as a background process on their machine. as soon as one e-text has been completely received, the next one is requested, thus the downloading is _relentless_, and you'd be surprised how fast it goes. for a d.s.l. person like myself, after doing the deletions i mentioned above, it will merely take _a_few_days_ to download all the e-texts. to get the _whole_ library, it might take you a week or so.

but remember, during this whole time, you will not have to do a single thing. all you had to do was click that one button. plus, you do have to enter a code every 108 minutes, but it's just this sequence of 6 numbers, no big deal. ;+)
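(to cut some of the fuzziness, here is a minimal python sketch of that kind of relentless one-at-a-time loop -- eliminate the unwanted classes of files, then fetch e-texts one after another. the catalog fields, filter rules, and filenames are assumptions for illustration only, not the actual program described above.)

```python
# sketch of a "one button" sequential downloader: filter the catalog,
# then fetch one e-text at a time, requesting the next as soon as the
# previous one has been completely received.
# the catalog entry format here is hypothetical.

import time
import urllib.request
from pathlib import Path


def wanted(entry):
    """drop the big classes of files we don't want to download."""
    if entry.get("type") in {"audio", "video", "data"}:   # e.g. human-genome data files
        return False
    if entry.get("language") != "en":                     # keep only languages you read
        return False
    if entry.get("is_piece_of_larger_volume"):            # duplicate separate-piece files
        return False
    if entry.get("is_magazine"):                          # optional: skip the magazines
        return False
    return True


def download_all(entries, dest="gutenberg"):
    """fetch e-texts one at a time into dest, skipping files already present."""
    dest = Path(dest)
    dest.mkdir(exist_ok=True)
    for entry in (e for e in entries if wanted(e)):
        target = dest / entry["filename"]
        if target.exists():                                # already have it? move on
            continue
        try:
            urllib.request.urlretrieve(entry["url"], target)
        except OSError:
            time.sleep(30)                                 # back off briefly, then keep going
            continue


# usage (catalog format is made up for illustration):
# entries = [{"url": "...", "filename": "12345.zml", "type": "text",
#             "language": "en", "is_piece_of_larger_volume": False,
#             "is_magazine": False}]
# download_all(entries)
```

run it as a background process and it just grinds through the queue until the list is done.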
I also don't agree with the implied assertion here that having the full (or even "vast portions of the") library means that users don't want help identifying and locating content within that collection.
it was only because i knew some might _infer_ such an "assertion" that i closed my post with the explicit note that this latter purpose _is_ still "handy", and therefore should be the _focus_ of this task. did you read that?
I generally avoid topics once you start weighing in on them, so I may have missed the applicable portions from the last time this topic came up.
well that's a remarkable admission. since i "weigh in" on every topic that is _interesting_ and usually "start" doing so fairly early in the thread, that must mean you're "avoiding" most of the posts, and all the interesting threads. life must be sad. :+)

at any rate, i thank you for your candor. perhaps you will thank me for mine when i tell you that if you didn't read what i have written on this topic before, you're likely to take a path that will end up biting your ass.

***

anyway, as i read your proposal, it's a social tagging scheme. as a general approach, that would be one way of doing things. again, the specifics are vital, so let us know when you have 'em.

-bowerbird