>and we see yet another excellent example of how
>the "metadata" b.s. is such an unproductive path.
>the o.c.d. people love to focus on these minute
>details, which make very little difference at all
>-- who cares how "van holst" is sorted?, …
You make great
big assumptions about the nature of the machines that people are reading on,
and then draw incorrect conclusions based on those assumptions. Yes, if
all readers are reading on desktop computers running some flavor of *nix, then
your conclusions may be correct.
But not all
readers of PG books are running *nix, or even desktops. Many of their
machines have a very different notion of “sorting” than you have in
mind.
Which is why we
just had this conversation a couple of days ago, but I guess many people didn’t
get it.
On my favorite class
of machine, which something like a million+ other readers are reading on, and
more every day, “sorts” are typically done on authorlastname, where
authorlastname is a field provided within the book file. The part of the name
that does not correspond to authorlastname is stored by convention in
authorfirstname. This sort information is displayed to the reader in one
of two ways, both of which ought to appear sensible:
Authorlastname, authorfirstname
and
Authorfirstname authorlastname
In either case
the actual sort should be on authorlastname.
This class of
machine has no notion of the idea that you can type in part of an author’s
name and search on that. Rather, all the books on the machine are sorted and
displayed in order by authorlastname, and you find a book by scrolling to the
authorlastname in sort order within that list.
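To make the scroll-list behavior concrete, here is a minimal sketch in Python. The Book class, the sort_books helper, and the case-insensitive comparison are my own assumptions for illustration; no real device's file format or firmware is being quoted.

```python
from dataclasses import dataclass


@dataclass
class Book:
    # Field names mirror the authorlastname / authorfirstname slots described
    # above; the class itself is hypothetical, not any real device's format.
    title: str
    authorlastname: str
    authorfirstname: str = ""


def sort_books(books):
    # Assumption: the device compares authorlastname case-insensitively.
    return sorted(books, key=lambda b: b.authorlastname.casefold())


shelf = [
    Book("Dracula", "Stoker", "Bram"),
    Book("The Art of War", "Sun", "Tzu"),
    Book("Emma", "Austen", "Jane"),
]
ordered = sort_books(shelf)
for book in ordered:
    print(f"{book.authorlastname}, {book.authorfirstname}: {book.title}")
```

The reader never types a query; the only way to find a book is to scroll this one ordered list, so whatever sits in authorlastname decides where the book lands.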
Why does this
matter? Consider the famous author name Sun Tzu.
What is the
last name? Sun.
What is the
first name? Well, no one actually knows, but historically “Tzu”,
which is actually an honorific, is stuck in the authorfirstname slot. But
now look what happens:
In the
authorlastname, authorfirstname case you get:
Sun, Tzu
Which is not a
bad result.
In the authorfirstname
authorlastname case you get:
Tzu Sun
Which is an
error. Thus, perhaps, one concludes that for names where the family name needs
to display first, the encoding has to be:
Authorlastname: Sun Tzu
Authorfirstname: null
In which case
both displays work out right.
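The two display modes and the two encodings of Sun Tzu can be sketched as follows. The function names last_first and first_last are hypothetical, and I am assuming a device simply omits an empty (null) authorfirstname rather than printing a stray comma or space.

```python
def last_first(authorlastname, authorfirstname=None):
    # "Authorlastname, authorfirstname" display; an empty first name is
    # assumed to be dropped along with the comma.
    if authorfirstname:
        return f"{authorlastname}, {authorfirstname}"
    return authorlastname


def first_last(authorlastname, authorfirstname=None):
    # "Authorfirstname authorlastname" display, same assumption as above.
    if authorfirstname:
        return f"{authorfirstname} {authorlastname}"
    return authorlastname


# Encoding 1: last = "Sun", first = "Tzu"
print(last_first("Sun", "Tzu"))   # Sun, Tzu
print(first_last("Sun", "Tzu"))   # Tzu Sun  (the error)

# Encoding 2: last = "Sun Tzu", first = null
print(last_first("Sun Tzu", None))  # Sun Tzu
print(first_last("Sun Tzu", None))  # Sun Tzu
```

With the second encoding both displays agree, and the sort still falls under "Sun".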
How does one
write an automatic algorithm to figure these things out from an existing gut
authorlist?
The answer, again, is
that one cannot write an automatic algorithm to figure these things out,
because currently there isn’t enough information stored about author
names, and further, how author names are sorted and displayed is based in part
on library tradition, perhaps best found by researching the Library of Congress
for a particular author.
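As an illustration of why the automatic approach falls short, here is the most obvious heuristic one might try: treat the final word of the name as the last name. The naive_split function is purely hypothetical and is not anything PG uses; it is only here to show the kind of guess that goes wrong.

```python
def naive_split(name):
    # Hypothetical heuristic: the final word is the last name and
    # everything before it is the first name.
    parts = name.split()
    authorlastname = parts[-1]
    authorfirstname = " ".join(parts[:-1])
    return authorlastname, authorfirstname


print(naive_split("Jane Austen"))  # ('Austen', 'Jane')  fine
print(naive_split("Sun Tzu"))      # ('Tzu', 'Sun')      wrong: family name is Sun
print(naive_split("van Holst"))    # ('Holst', 'van')    right or wrong depending
                                   # on the tradition followed for this author
```

Nothing in the name string itself tells the code which of these cases it is looking at; that information has to come from somewhere else.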
Another way of
saying this is, let’s say you make the mistake of wandering into a Barnes
and Noble when you were actually trying to enter the Starbucks next door.
But while in there you decide to look at the fiction stacks just for fun to see
if they have your favorite author. Where in the stacks do you look? Well,
that depends on how B&N sorts on your favorite author, which in turn is
based on library tradition for that particular author.
Yes, you can try
to write an algorithm to do this, but then you will find that surprisingly often
it breaks, because it seems that having an unusual family name is a prereq for
writing a book. You can then say “oh well, this is PG, we really don’t
care, why be o.c.d.?” But then you are producing books that work
worse, in practice, for customers, on customers’ machines, compared to
the other publishing houses, making PG look like amateur hour. You might
say “well then, they shouldn’t have bought that machine; rather, they
should buy my favorite choice of machine.” But customers tend to
consider that attitude towards their choice of machine a sign of hostility
towards the customer by PG – which I guess is why PG already provides
literally about 80 different file formats for customers. I believe PG
needs to remain agnostic towards the customers’ choice of machine if PG
wants to retain the customer, which means that PG needs to understand how the
differing classes of machines actually work, and what their constraints are.
Getting authors,
titles, and sort orders “correct” IS pretty basic. Not easy,
but basic.