>so if you want to find "sun tzu", you'd search
for "sun tzu", and if that didn't work, then
you'd search for "sun" and "tzu" separately...
Okay, let’s get specific. My favorite machine lists
these things alphabetically by authorlastname, authorfirstname. I currently have
about 100 books on my favorite machine, whereas on the previous-generation
machine, which I use less nowadays, I have about 500 books. So I would have to
scroll through the list of books three times to perform your “search
algorithm” example.
But, more importantly, consider a reader who picks up e-books
from PG and from other publishing houses, say someone who wants to collect and
read everything ever written by Sir Arthur Conan Doyle. Instead of an e-book
library correctly sorted and cataloged by author, he or she finds Sir Arthur
Conan Doyle spread out over about five factorial locations on the e-book
bookshelf. Or, more likely, Sherlock ends up all in one place if the books come
from one of a variety of professional publishing houses, and at another location
if the e-book comes from PG. Or god knows where if it was
purchased from Amazon, “published” there by one of an infinite
number of bottom-feeding garage shops.
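To make the scattering concrete, here is a minimal Python sketch, assuming the device simply sorts on whatever author string the file carries. The author strings and the `shelf_order` helper are hypothetical illustrations, not data from PG or any real catalog.

```python
# Hypothetical shelf: the same author recorded four different ways
# (illustrative strings, not taken from any real catalog).
books = [
    ("Doyle, Arthur Conan", "The Hound of the Baskervilles"),
    ("Arthur Conan Doyle", "A Study in Scarlet"),
    ("Conan Doyle, Arthur", "The Sign of the Four"),
    ("Sir Arthur Conan Doyle", "The Lost World"),
    ("Austen, Jane", "Emma"),
    ("Dickens, Charles", "Bleak House"),
    ("Stevenson, Robert Louis", "Kidnapped"),
]

def shelf_order(books):
    """Sort the way a simple reader device might: plain, case-insensitive
    string order on the author field, with no name normalization at all."""
    return sorted(books, key=lambda book: book[0].casefold())

for author, title in shelf_order(books):
    print(f"{author:28} {title}")
```

The four spellings of one author land in four separate places on the shelf, with Austen, Dickens and Stevenson interleaved between them. No sort order can fix that; only consistent author metadata can.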
And why am I “o.c.d.” on these issues? Because
I have converted a few tens of thousands of PG books to e-book format and have
found that the issue of author names, and how to “correctly”
extract them from the data PG provides (or doesn’t provide), ends up being,
*in practice*, not in theory, one of the real stumbling
blocks. Certainly an extensible format like TEI, if it contained correctly
coded authorlastname, authorfirstname information, would make extraction of
correct “spine” information trivial. Then the problem reduces
to how, within the PG system, to get a “correct” canonical form of authorlastname,
authorfirstname, and the answer is that some real human being has to do that
research. Perhaps that is most appropriately done as part of the
copyright clearance process, which I think frequently consults the LoC in the
first place?
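As a rough illustration of the “trivial” part, here is a minimal Python sketch, assuming the name parts have already been marked up with TEI’s `persName`, `surname` and `forename` elements inside the header’s `titleStmt`. The sample header and the `spine_authors` function are hypothetical. Once a human has coded the parts, building the “lastname, firstname” key is a few lines; the research needed to fill them in correctly is the part no code does.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A minimal, hypothetical TEI header with the name parts explicitly marked up.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>The Turn of the Screw</title>
        <author>
          <persName>
            <forename>Henry</forename>
            <surname>James</surname>
          </persName>
        </author>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>"""

def spine_authors(tei_xml: str) -> list[str]:
    """Return 'Surname, Forename(s)' for each author persName in the titleStmt."""
    root = ET.fromstring(tei_xml)
    result = []
    for pers in root.findall(".//tei:titleStmt/tei:author/tei:persName", TEI_NS):
        surnames = [e.text.strip() for e in pers.findall("tei:surname", TEI_NS) if e.text]
        forenames = [e.text.strip() for e in pers.findall("tei:forename", TEI_NS) if e.text]
        name = " ".join(surnames)
        if forenames:
            name += ", " + " ".join(forenames)
        result.append(name)
    return result

print(spine_authors(SAMPLE))  # ['James, Henry']
```

Note that the code never guesses which word is the surname; it only reads what the markup says, which is exactly why the markup has to be right.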
Or, as another simple example of these issues based on an author
I have recently worked on: enter “James Henry” in the author slot on the PG home
page, compare what you get with what you get when you enter “Henry James”,
then try “Henry, James”, and then try “James, Henry”.
Then justify to the readers of this list your results and why those
results are the “correct” results???