>and we see yet another excellent example of how
>the "metadata" b.s. is such an unproductive path.

>the o.c.d. people love to focus on these minute
>details, which make very little difference at all
>-- who cares how "van holst" is sorted?,

You make great big assumptions about the machines people are reading on, and then draw incorrect conclusions from those assumptions. Yes, if all readers were reading on desktop computers running some flavor of *nix, your conclusions might be correct.

But not all readers of PG books are running *nix, or even using desktops. Many of these machines have a very different notion of “sorting” than the one you have in mind.

This is exactly the conversation we had a couple of days ago, but I guess many people didn’t get it.

On my favorite class of machine, which something like a million-plus other readers are already reading on, with more every day, “sorts” are typically done on authorlastname, a field provided within the book file. Whatever part of the name does not belong in authorlastname is stored, by convention, in authorfirstname. This sort information is displayed to the reader in one of two ways, both of which ought to look sensible:

Authorlastname, authorfirstname

or

Authorfirstname authorlastname

In either case, the actual sort should be on authorlastname.
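
To make the two-field scheme concrete, here is a minimal Python sketch. The `Author` record, the display helpers, and the case-insensitive sort key are assumptions for illustration, not any particular device's actual code; the point is only that both display styles are built from the same two fields while the ordering looks at authorlastname alone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Author:
    # Hypothetical record mirroring the two fields a device reads from the book file.
    authorlastname: str
    authorfirstname: Optional[str] = None


def display_last_first(a: Author) -> str:
    """Render as 'Authorlastname, authorfirstname'."""
    if a.authorfirstname:
        return f"{a.authorlastname}, {a.authorfirstname}"
    return a.authorlastname


def display_first_last(a: Author) -> str:
    """Render as 'Authorfirstname authorlastname'."""
    if a.authorfirstname:
        return f"{a.authorfirstname} {a.authorlastname}"
    return a.authorlastname


def sort_key(a: Author) -> str:
    # Whichever display style is used, ordering depends on authorlastname only.
    # Case-insensitive comparison is an assumption about the device.
    return a.authorlastname.casefold()


library = [
    Author("Twain", "Mark"),
    Author("Austen", "Jane"),
    Author("Dickens", "Charles"),
]

for a in sorted(library, key=sort_key):
    print(f"{display_last_first(a)}  |  {display_first_last(a)}")
```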

This class of machine has no notion of typing in part of an author’s name and searching on it. Instead, all the books on the machine are sorted and displayed in order by authorlastname, and you find a book by scrolling to its authorlastname within that sorted list.
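
A rough sketch of why the stored authorlastname alone decides where a book lands in a scroll-only list, using the "van Holst" case from the quoted text. The list contents are made up, and whether a given device folds case when sorting is itself an assumption; without `casefold()`, Python's plain string sort would push a lowercase "van" past every capitalized name.

```python
from typing import List


def browse_list(lastnames: List[str]) -> List[str]:
    # A scroll-only device: no search box, just one list in authorlastname order.
    # casefold() keeps a lowercase particle like "van" from sorting after "Z";
    # a plain sorted() would put "van Holst" after "Wells".
    return sorted(lastnames, key=str.casefold)


# Same four books; only the stored authorlastname of one of them differs.
particle_in_lastname = ["Twain", "van Holst", "Austen", "Wells"]
particle_in_firstname = ["Twain", "Holst", "Austen", "Wells"]

print(browse_list(particle_in_lastname))   # ['Austen', 'Twain', 'van Holst', 'Wells']
print(browse_list(particle_in_firstname))  # ['Austen', 'Holst', 'Twain', 'Wells']
```

The reader scrolling for that book has to look under V in one case and under H in the other, which is exactly what "who cares how van holst is sorted" glosses over.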

Why does this matter? Consider the famous author Sun Tzu.

What is the last name? Sun.

What is the first name? Well, no one actually knows, but historically “Tzu,” which is actually an honorific, gets stuck in the authorfirstname slot. But now look what happens:

In the authorlastname, authorfirstname case you get:

Sun, Tzu

That is not a bad result.

In the authorfirstname authorlastname case you get:

Tzu Sun

That is an error. So one might conclude that, for names whose family name needs to display first, the encoding has to be:

Authorlastname: Sun Tzu

Authorfirstname: null

In which case both displays work out right.
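
The same two display styles, applied to both encodings of Sun Tzu, show the problem and the fix side by side. This is a minimal sketch: the helper names are hypothetical, and treating a null authorfirstname as "show the last name alone" is an assumption about how a device renders an empty field.

```python
from typing import Optional


def display_last_first(last: str, first: Optional[str]) -> str:
    # 'Authorlastname, authorfirstname'; no comma when the first name is null.
    return f"{last}, {first}" if first else last


def display_first_last(last: str, first: Optional[str]) -> str:
    # 'Authorfirstname authorlastname'; just the last name when the first is null.
    return f"{first} {last}" if first else last


encodings = {
    "split": ("Sun", "Tzu"),     # honorific forced into authorfirstname
    "whole": ("Sun Tzu", None),  # whole name in authorlastname, first name null
}

for label, (last, first) in encodings.items():
    print(f"{label}: {display_last_first(last, first)} | {display_first_last(last, first)}")
# split: Sun, Tzu | Tzu Sun
# whole: Sun Tzu | Sun Tzu
```

Both encodings still sort under S; only the second one reads correctly in both display styles.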

How does one write an automatic algorithm to figure these things out from an existing PG author list?

The answer, again, is that one cannot write such an algorithm. Currently there isn’t enough information stored about author names, and in any case how an author’s name is sorted and displayed depends partly on library tradition, which is perhaps best found by researching that particular author in the Library of Congress records.
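
The failure is easy to reproduce. Below is a minimal Python sketch of the obvious heuristic (the last word is the last name), plus a hypothetical per-author override table standing in for the extra information that isn't stored today. The names `naive_split`, `OVERRIDES`, and `split_author` are made up for illustration; the point is only that the heuristic files Sun Tzu under "Tzu", and nothing in the name string tells it otherwise.

```python
from typing import Dict, Optional, Tuple

Name = Tuple[str, Optional[str]]  # (authorlastname, authorfirstname)


def naive_split(name: str) -> Name:
    """Heuristic: treat the final whitespace-separated token as the last name."""
    parts = name.split()
    if not parts:
        raise ValueError("empty author name")
    if len(parts) == 1:
        return parts[0], None
    return parts[-1], " ".join(parts[:-1])


# Hypothetical hand-curated table: each entry would come from researching the
# author (e.g. library catalog records), not from anything in the name string.
OVERRIDES: Dict[str, Name] = {
    "Sun Tzu": ("Sun Tzu", None),
}


def split_author(name: str) -> Name:
    # Prefer explicitly recorded data; fall back to the heuristic otherwise.
    return OVERRIDES.get(name) or naive_split(name)


for name in ["Jane Austen", "Sun Tzu"]:
    print(f"{name}: heuristic={naive_split(name)}, with override={split_author(name)}")
# Jane Austen: heuristic=('Austen', 'Jane'), with override=('Austen', 'Jane')
# Sun Tzu: heuristic=('Tzu', 'Sun'), with override=('Sun Tzu', None)
```

Every entry in a table like this is a fact someone had to look up, so the result can only be as good as the researched data behind it.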

To put it another way: suppose you make the mistake of wandering into a Barnes and Noble when you were actually trying to enter the Starbucks next door. While you are there, you decide to look at the fiction stacks, just for fun, to see if they have your favorite author. Where in the stacks do you look? That depends on how B&N sorts your favorite author, which in turn is based on library tradition for that particular author.

Yes, you can try to write an algorithm to do this, but you will find that it breaks surprisingly often, because it seems that having an unusual family name is a prerequisite for writing a book. You can then say, “Oh well, this is PG, we really don’t care, why be o.c.d.?” But then you are producing books that work worse in practice, for customers, on customers’ machines, than those from other publishing houses, which makes PG look like amateur hour. You might say, “Well then, they shouldn’t have bought that machine; they should buy my favorite choice of machine instead.” But customers tend to see that attitude toward their choice of machine as a sign of hostility from PG toward the customer, which I guess is why PG already provides roughly 80 different file formats. I believe PG needs to remain agnostic about the customer’s choice of machine if it wants to retain the customer, which means PG needs to understand how the different classes of machines actually work and what their constraints are.

Getting authors, titles, and sort orders “correct” IS pretty basic.  Not easy, but basic.