> so if you want to find "sun tzu", you'd search
> for "sun tzu", and if that didn't work, then
> you'd search for "sun" and "tzu" separately...

Okay, let’s get specific.  My favorite machine lists books alphabetically by authorlastname, authorfirstname.  I currently have about 100 books on it, whereas the previous-generation machine, which I use less nowadays, holds about 500.  So to perform your “search algorithm” example, I get to scroll through the list of books three times.

But, more importantly, consider a reader who picks up e-books from PG and from other publishing houses, say someone who wants to collect and read everything ever written by Sir Arthur Conan Doyle.  Instead of an e-book library correctly sorted and cataloged by author, he or she finds Sir Arthur Conan Doyle spread out over about five factorial locations on the e-book bookshelf.  Or, more likely, Sherlock ends up all in one place if the e-books come from one of the professional publishing houses, and in another place if they come from PG.  Or god knows where if purchased from Amazon, “published” there by one of an infinite number of bottom-feeding garage shops.
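
To make the scatter concrete, here is a rough sketch in Python.  It assumes a device that files each book strictly under the first character of whatever author string the publisher supplied; the five labels are made-up examples of how one author might be labeled, not taken from any actual catalog, and `shelf_letter` is a hypothetical helper name.

```python
# Hypothetical author strings for one writer, as different sources might label him.
labels = [
    "Doyle, Arthur Conan",
    "Arthur Conan Doyle",
    "Sir Arthur Conan Doyle",
    "Conan Doyle, Arthur",
    "A. Conan Doyle",
]


def shelf_letter(label):
    """Return the letter a naive alphabetical listing would file this label under."""
    return label.strip()[0].upper()


# Group the labels by the letter they would be shelved under.
shelf = {}
for label in sorted(labels):
    shelf.setdefault(shelf_letter(label), []).append(label)

for letter, names in sorted(shelf.items()):
    print(letter, names)
```

Under these assumptions the one author lands under four different letters (A, C, D and S), which is the kind of scrolling around I am complaining about.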

And why am I “o.c.d.” on these issues?  Because I have converted a few tens of thousands of PG books to e-book format and have found, *in practice*, not in theory, that the question of how to “correctly” extract author names from the data PG provides (or doesn’t provide) ends up being one of the real stumbling blocks.  Certainly an extensible format like TEI, if it contained correctly coded authorlastname, authorfirstname information, would make extraction of correct “spine” information trivial.  Then the problem reduces to how, within the PG system, to get a “correct” canonical form of authorlastname, authorfirstname.  The answer is that some real human being has to do that research, which is perhaps most appropriately done as part of the copyright clearance process, which I think frequently refers to LoC in the first place?
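
As a sketch of what “trivial” could look like, assume a TEI header whose `<author>` carries a `<persName>` with separate `<surname>` and `<forename>` children (these are standard TEI elements, but nothing here says PG's files are actually coded this way).  The sample below is a trimmed fragment, not a complete valid TEI document, and `spine_author` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace, mapped to a prefix for use in the path expressions below.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Illustrative fragment only; a real TEI header has more required parts.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>The Turn of the Screw</title>
        <author>
          <persName>
            <forename>Henry</forename>
            <surname>James</surname>
          </persName>
        </author>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>"""


def spine_author(tei_xml):
    """Return 'Surname, Forename' from the first coded author, or None if absent."""
    root = ET.fromstring(tei_xml)
    pers = root.find(".//tei:titleStmt/tei:author/tei:persName", TEI_NS)
    if pers is None:
        return None
    surname = pers.findtext("tei:surname", default="", namespaces=TEI_NS).strip()
    forename = pers.findtext("tei:forename", default="", namespaces=TEI_NS).strip()
    if not surname:
        return None
    return f"{surname}, {forename}" if forename else surname


print(spine_author(SAMPLE))
```

The point is that no guessing about word order is needed: the markup itself says which part is the last name, so the hard part is the human research that goes into the markup, not the extraction.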

Or, as another simple example of these issues, based on an author I have recently worked on: enter “James Henry” in the author slot on the PG home page and compare what you get with what you get when you enter “Henry James”, then “Henry, James”, then “James, Henry”.  Then rationalize your results to the readers of this list, and explain why those results are the “correct” result?