Re: [gutvol-d] Categorizing PG content

14 Jul 2006

      Sorry for the length, everyone, but I wanted to try and cover
in words what I was unable to cover in production software.

On Thu, Jul 13, 2006 at 05:42:29PM -0400, Bowerbird@aol.com wrote:
	...
...
> finally, i'm not sure that y'all understand the major need here.
> and i'm quite certain that library-school students will miss it.
> answer this question:   why should we categorize the e-texts?
> if your response runs along the lines of "so end-users can find
> the book they want, and download it", you're on the wrong path.
> that's the function catalogs used to serve, in the dead-tree world.
   ...
> but in our new era of high-bandwidth and terrabyte hard-drives,
> it's silly for a person to spend even mere seconds trying to decide
> _whether_or_not_ to download a book.   it's _far_ more convenient
> to download vast portions of the library, since they can have their
> computer do it automatically while they are partying, or sleeping...
I disagree. I have a 100Mb/s municipal fiber connection and almost
2 terabytes of disk space available, and "download[ing] vast portions
of the library" is not an option for me. I don't find it difficult to
imagine that if I have a hard time accepting this answer, there are
going to be others who do so as well, with far fewer resources at
their command.
...
> even the dial-up people can request the d.v.d., for free, and have
> the entire p.g. library sitting on their hard-disk in a week or so...
I also don't agree with the implied assertion here that having the
full (or even "vast portions of the") library means that users don't
want help identifying and locating content within that collection.

	Of course, this means that we'll want to help people who download
the library get the catalog data that matches their portion of the
library!
...
> not only is it not wise to make people spend any time "choosing",
> it's at odds with the important concept of _unlimited_distribution_.
Having a catalog does not equate to making people use it. It's a
tool for those who want to make use of it. That said, let's make sure
that whatever tool(s) we come up with fit as many of the perceived needs
as we possibly can! You clearly have different ideas of the use of a
catalog than do I. As you've already enumerated some of the points of
*my* use, perhaps you could elaborate on your ideas?

	(On the other hand, if you already did this, ignore this request.
I generally avoid topics once you start weighing in on them, so I may
have missed the applicable portions from the last time this topic
came up.)

---

So, on to my proposal. I had hoped to actually be able to provide a
tool demonstrating it, but my day job interfered too much this week
to allow me to realize that hope. So instead, let me see if I can lay
out the concept.

It's based on the tagging system known as the "Debian Package
Browser" [1].

Some important parts of the idea that might be missed initially:
* Every book gets tagged initially with a placeholder value
* Wherever we can identify existing valuable tags, they are
	added to the initial load. Some examples of tags I'd want
	include: year published in PG; Author/Creator; Language;
	LoC Class; Copyright Status (sounding familiar to anyone?)
* Tags need to be nestable. This is something the Debian system
	is not able to support, but I think it's very important. One
	example Bowerbird already pointed out is the Amazon.com
	categorization scheme.
* The default behaviour of the tagging system should be marking
	which of the existing tags are best applied to this book, but
	it also needs to be flexible enough to add new tags (and
	hierarchies thereof). Setting the default behaviour this way
	is one way of preventing the "del.icio.us syndrome" found in
	many folksonomies, where there are as many different ways of
	tagging a piece of content as there are users of the system.
* It should be easy, when viewing a particular ebook, to do any 
	of the following actions: view tags already on this book;
	see a list of "suggested tags", based on a weighted list of
	tags attached to content that has other tags in common with
	the current content; view other content tagged in common;
	add / remove tags.
* It needs to be easy to see all content with a particular tag
	or tagset. I'm envisioning something akin to the Flamenco [2]
	system here.
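
To make the list above a little more concrete, here is a rough sketch
in Python. It is not the prototype mentioned in this message, and it
is not how the Debian Package Browser or Flamenco work internally.
Everything in it is an assumption made for illustration: the TagStore
class, the "untagged" placeholder, the "/"-separated tag paths used
for nesting, the book ids, and the scoring rule (a candidate tag gets
one point per tag that its book shares with the current book).

```python
from collections import Counter

PLACEHOLDER = "untagged"  # hypothetical placeholder value


class TagStore:
    """Toy in-memory tag store; nesting is modelled as 'a/b/c' paths."""

    def __init__(self):
        self.tags = {}  # book id -> set of tag paths

    def add_book(self, book):
        # Every book starts out carrying only the placeholder tag.
        self.tags[book] = {PLACEHOLDER}

    def add_tag(self, book, path):
        tags = self.tags.setdefault(book, set())
        tags.discard(PLACEHOLDER)  # a real tag replaces the placeholder
        tags.add(path)

    def remove_tag(self, book, path):
        tags = self.tags.setdefault(book, set())
        tags.discard(path)
        if not tags:  # never leave a book with no tags at all
            tags.add(PLACEHOLDER)

    @staticmethod
    def _matches(tag, wanted):
        # A nested tag matches itself and every ancestor in its path.
        return tag == wanted or tag.startswith(wanted + "/")

    def books_with(self, *wanted):
        """Books carrying every tag in the tagset (ancestors count)."""
        return sorted(
            book
            for book, tags in self.tags.items()
            if all(any(self._matches(t, w) for t in tags) for w in wanted)
        )

    def suggest(self, book):
        """Tags from books sharing tags with this one, best first."""
        mine = self.tags.get(book, set())
        scores = Counter()
        for other, theirs in self.tags.items():
            if other == book:
                continue
            shared = len((mine & theirs) - {PLACEHOLDER})
            if shared == 0:
                continue
            for tag in theirs - mine:
                scores[tag] += shared  # weight by number of shared tags
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [tag for tag, _ in ranked]


# Made-up books and tags, purely for demonstration.
store = TagStore()
for book_id in ("book-a", "book-b", "book-c", "book-d"):
    store.add_book(book_id)
store.add_tag("book-a", "language/en")
store.add_tag("book-a", "subject/fiction/adventure")
store.add_tag("book-a", "best-of-dvd")
store.add_tag("book-b", "language/en")
store.add_tag("book-b", "subject/fiction/adventure")
store.add_tag("book-c", "language/en")
store.add_tag("book-c", "subject/fiction/mystery")

print(store.books_with("subject/fiction", "language/en"))
print(store.books_with(PLACEHOLDER))   # books nobody has tagged yet
print(store.suggest("book-b"))         # candidate tags for book-b
print(store.books_with("best-of-dvd"))
```

Storing only the full path and matching on prefixes is the simplest
way I could think of to get nesting; a real system would more likely
keep an explicit tag tree and an index from tag to books. The last
call is the same lookup someone could run to pull every ebook that
has been given one particular tag.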

I envision a lot of things coming out of this effort, including
an easier way for people to suggest content for the "Best Of"
DVDs so that Greg doesn't have to do so much of the leg-work
himself. As people come across suggestions, they tag them, then
Greg can just pull a list of ebooks with that tag.

I've done some work on a prototype, but as I said, the real world
invaded and sapped my time. Then again, I know there are many
others on this list who are talented software developers, so
perhaps one of you will beat me to it...or propose an even better
system.

[1]: http://debian.vitavonni.de/packagebrowser/
[2]: http://flamenco.berkeley.edu/
	 If you'd like to see Flamenco at work, but don't have the
	 resources to set it up yourself, drop me a line off-list
	 and I'll provide you with a URL to one I've set up.


joey