Name lists and Big-endianism

Without due respect for the dead hand of history, or the dead heads of aesthetes trying to impose attractive schemes devoid of logic or practicality, it would be nice if we could agree on some scheme to sequence our author indexes. It won't happen, of course, and I am not silly enough to think that this brief note contains anything conclusive, but give it a thunk, anyone interested. Anyone uninterested is sternly forbidden to consider the matter or read this remark (it hardly hopes to attain the dignity of a suggestion).
Let us assume that we have authors such as the famous
Johanna Kakebeenwania van der Merwe O'Brien
Jolien Gertina van der Poel O'Mally
Paulette Marmorella Bridhedia Paul-Ewen
Truupsvor Theooseov Swizarminife
Neville McSnurtle Quentin Urtel
Xavier Ypres Zulrich Ürtur
Aspoestertjie Sinnerella Katrina van Aswagen
Gehardus Johannes Katwimpers Janse van Vuuren van den Heever
Johannes Gehardus du Toit van der Vyfer
Jakobus Johannes Joumoerus Vandaaigoed
Lelie Belladonna Nerina Vanderker
Otto Werther von und zu Bismarkharing
The problem is notionally to sequence them according to a comprehensible and totally unambiguous scheme, with the least sensitivity to uncertain spelling, concentrations of initial letters, etc. The best approach is to write each name (as much of it as desired) in its normal internal sequence, as above, then split it immediately after the last non-alphabetic character (spaces included) and move that final piece to the front. The final piece is what you sequence by: NOT the full name, NOT necessarily the full surname, and without regard to case or diacritical signs.
In our by no means random, but hardly unrealistic, example, several questions arise, including the role of various non-alphabetic characters and the artificial concentration of surnames under the initial letters of prefixes such as de, der, du, van, van der, von den, and no end of etcs. By sorting on the terminal alphabetic string, we remove ambiguity and even out the spread of names through the alphabet. In simple information-theory terms, this optimises search time and sort efficiency. The above example becomes:
Aswagen Aspoestertjie Sinnerella Katrina van
Bismarkharing Otto Werther von und zu
Brien Johanna Kakebeenwania van der Merwe O'
Ewen Paulette Marmorella Bridhedia Paul-
Heever Gehardus Johannes Katwimpers Janse van Vuuren van den
Mally Jolien Gertina van der Poel O'
Swizarminife Truupsvor Theooseov
Urtel Neville McSnurtle Quentin
Ürtur Xavier Ypres Zulrich
Vandaaigoed Jakobus Johannes Joumoerus
Vanderker Lelie Belladonna Nerina
Vyfer Johannes Gehardus du Toit van der
The head benefit is in the de tailing.
Not that anyone asked.
Cheers,
Jon
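For anyone who wants to try the rotation on a real list, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reference implementation: "non-alphabetic" is taken to mean any character for which `str.isalpha()` is false, "without regard to case or diacritical signs" is approximated by Unicode NFD decomposition, dropping combining marks, and `casefold()`, and names with identical terminal strings simply keep their input order. The helper names (`fold`, `split_tail`, `rotate`, `index_order`) are hypothetical.

```python
from __future__ import annotations

import unicodedata

# Hypothetical helpers illustrating "sort by the terminal alphabetic string".


def fold(text: str) -> str:
    """Ignore case and diacritics: NFD-decompose, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def split_tail(name: str) -> tuple[str, str]:
    """Split immediately after the last non-alphabetic character (space, ', -, ...)."""
    name = name.strip()
    for i in range(len(name) - 1, -1, -1):
        if not name[i].isalpha():
            return name[: i + 1], name[i + 1 :]
    return "", name  # no separator at all: the whole name is the tail


def rotate(name: str) -> str:
    """Move the terminal string to the front: "... O'Brien" -> "Brien ... O'"."""
    head, tail = split_tail(name)
    return " ".join(part for part in (tail, head.rstrip()) if part)


def index_order(names: list[str]) -> list[str]:
    """Sort on the folded terminal string only; sorted() is stable, so ties keep input order."""
    ordered = sorted(names, key=lambda n: fold(split_tail(n)[1]))
    return [rotate(n) for n in ordered]


AUTHORS = [
    "Johanna Kakebeenwania van der Merwe O'Brien",
    "Jolien Gertina van der Poel O'Mally",
    "Paulette Marmorella Bridhedia Paul-Ewen",
    "Truupsvor Theooseov Swizarminife",
    "Neville McSnurtle Quentin Urtel",
    "Xavier Ypres Zulrich Ürtur",
    "Aspoestertjie Sinnerella Katrina van Aswagen",
    "Gehardus Johannes Katwimpers Janse van Vuuren van den Heever",
    "Johannes Gehardus du Toit van der Vyfer",
    "Jakobus Johannes Joumoerus Vandaaigoed",
    "Lelie Belladonna Nerina Vanderker",
    "Otto Werther von und zu Bismarkharing",
]

if __name__ == "__main__":
    for entry in index_order(AUTHORS):
        print(entry)
```

Run on the twelve sample names, this reproduces the sorted list above line for line. The folding step matters for "Ürtur": compared by raw code point, "ü" comes after "v", so without folding that entry would land after "Vyfer" instead of directly after "Urtel".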

Since you've picked a bunch of mostly Dutch and German authors, or at least authors whose ancestors happened to be Dutch or German, I'd like to point out that a rather common way in Dutch databases is to do it slightly differently:
Sinerella Katrina van Aswagen Aspostertjie would become:
Aswagen Aspostertjie, van, Sinerella Katrina
This keeps an alphabetical list of surnames from turning into a massive series of entries starting with 'V'.
I'm rather sure Marcello can tell us whether our Eastern brethren do it the same way.
Regards,
Walter van Holst
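The layout Walter describes can be sketched in a few lines of Python. This is a minimal illustration, assuming the entry format "surname, prefix, given names" taken from his example, and assuming each name has already been split into those three fields (the hypothetical `NameParts` record); deciding where a prefix ends is exactly the part this sketch does not attempt. The sample names come from this thread. Sorting on the bare surname spreads the entries across the alphabet instead of piling them up under V.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NameParts:
    """Hypothetical pre-split name record."""

    given: str    # given names in normal order
    prefix: str   # e.g. "van", "van der", "von und zu"; "" if none
    surname: str  # surname without its prefix


def dutch_entry(name: NameParts) -> str:
    """Format as "surname, prefix, given names", skipping an empty prefix."""
    return ", ".join(part for part in (name.surname, name.prefix, name.given) if part)


def dutch_index(names: list[NameParts]) -> list[str]:
    """Sort on the bare surname (case-insensitive) so prefixes do not cluster under V."""
    ordered = sorted(names, key=lambda n: n.surname.casefold())
    return [dutch_entry(n) for n in ordered]


NAMES = [
    NameParts("Walter", "van", "Holst"),
    NameParts("Otto Werther", "von und zu", "Bismarkharing"),
    NameParts("Aspoestertjie Sinnerella Katrina", "van", "Aswagen"),
]

if __name__ == "__main__":
    for entry in dutch_index(NAMES):
        print(entry)
```

Note the design trade-off compared with the rotation scheme: this approach needs the prefix identified in advance, whereas rotation only needs to find the last non-alphabetic character.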

Hello Walter, pleased to meet you!
In South Africa there are indeed strong Dutch as well as other Germanic influences, and nowhere more so than in our surnames (especially Afrikaans surnames, of course): van, van der, von, van den, ter, ten, etc. Van and van der are easily the leaders, though. We do, however, also have strong Huguenot influences (de, du, even a few le, etc.), and don't forget the Irish O', though they are not as prominent as in, say, the US. Also, for similar reasons, some black names begin with U, N, or M. We also have Portuguese names (Del...).
And yes, the reason you mention is exactly the one I had in mind, especially in certain districts where particular families settled and established a patronymic dominance that became a local source of pervasive inconvenience and perverse pride. (There are sometimes problems with the family forenames as well; schools and universities have been driven to distinguish between particular students by date of birth!) And thereby hang various tales, variously amusing...
I am not quite certain of the DB convention you mention, though. Are you sure that you didn't have some finger trouble? "Aspoestertjie Sinnerella Katrina van Aswagen" becomes "Aswagen Aspostertjie, van, Sinerella Katrina"??? Isn't that a bit pointlessly arbitrary, devious, even obscure? If it is indeed the convention, then so be it, but I would think that the rotation scheme I proposed has major advantages. For one thing, it puts the Driscols Benny O' in their places, along with the Drifters Benny Smith- and the Diemans Benny van.
Keep well!
Jon
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

And don't forget that with other national traditions you can have more confusion.
For example:
For Hungarian names, the preferred order is [Family name] [Given name], so in the main text of PG#19433 the author's name is given as "Balazs Bela", with the understanding that the first name that appears is the one we alphabetize by.
And in Icelandic names, what looks to us like a "last name" is not actually a family name but a patronymic. It is incorrect to alphabetize by that, so the given name is used instead.
--Andrew
(In reply to Jon Richfield, Fri, 18 Sep 2009)
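Andrew's point can be sketched in code: the element a name is filed under depends on the naming tradition, not just on its position. This minimal Python sketch assumes names are stored exactly as written in their own tradition, Hungarian family-name-first (as in "Balazs Bela") and Icelandic given-name-first; under those assumptions both are filed under their first alphabetic element, while everything else falls back to the last one. The `tradition` labels, the helper names, and the Icelandic sample name "Jón Sigurðsson" are illustrative assumptions, not an established scheme.

```python
from __future__ import annotations

import re
import unicodedata

# Runs of letters only: \w minus digits and underscore (Unicode-aware for str patterns).
LETTER_RUN = re.compile(r"[^\W\d_]+")

# Hypothetical labels for traditions whose filing element is written first.
FIRST_ELEMENT_TRADITIONS = {"hungarian", "icelandic"}


def fold(text: str) -> str:
    """Ignore case and diacritics: NFD-decompose, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def filing_element(name: str, tradition: str) -> str:
    """Pick the element to alphabetize by, as written, for the given tradition."""
    runs = LETTER_RUN.findall(name)
    if not runs:
        raise ValueError(f"no letters in {name!r}")
    return runs[0] if tradition in FIRST_ELEMENT_TRADITIONS else runs[-1]


def sort_key(name: str, tradition: str) -> str:
    return fold(filing_element(name, tradition))


if __name__ == "__main__":
    shelf = [
        ("Walter van Holst", "default"),
        ("Balazs Bela", "hungarian"),
        ("Jón Sigurðsson", "icelandic"),
    ]
    for name, tradition in sorted(shelf, key=lambda item: sort_key(*item)):
        print(filing_element(name, tradition), "|", name)
```

The catch, of course, is that the tradition has to be recorded alongside each name; the string alone does not say whether "Balazs Bela" is Hungarian order or Western order.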

Yes, I have a book by one Peter Rozsa. It took me some time to realise that Peter was in fact a woman, and a well-known mathematician at that, whom we might have called Rose Peter. Tsk! These Magyars...! You'd think they would have come to us for advice.
As for the Icelandic convention, I knew that there was something funny about all their terminal "-sons" and "-dotters" (sp?), but don't they have any family name at all? Some of the Slavic names might be troublesome too, because they vary the suffix of what I take to be the family name according to gender: -ski vs -ska and so on. But maybe I have that mixed up, as with the Icelandic names. Could it be that the Icelandic convention derives from the fact that they are dealing with a smallish population? Anyway, it seems to me that the indexing convention I proposed would still be easy to apply for anyone who understands the naming convention of the language and population in question. Simply write the complete name (or whatever part suits the DB in question) in the lexically normal way according to the favoured convention, then rotate it until the first letter after the last non-alphabetic character is first in the string, and voilà!
Go well,
Jon
(In reply to Andrew Sly)

Whole can of worms. In the Netherlands, Walter van Holst (if you'll allow me to use your name, Walter) would be sorted under "Holst". In Belgium, using the same language, it would be sorted under "van Holst" (and the "van" would be capitalised).
Frits
(In reply to Jon Richfield, 20 Sep 2009, 13:19)
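Frits's point is that the same name needs a different sort key depending on the country. A minimal Python sketch, assuming names are already split into given name, prefix, and surname (the hypothetical `NameParts` record), with the country codes "NL" and "BE" as assumptions of the sketch and the thread's four participants as sample data: the Dutch key ignores the prefix, while the Belgian key includes it and the prefix is written capitalised, following Frits's description.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NameParts:
    """Hypothetical pre-split name record."""

    given: str
    prefix: str   # "van", "van der", ...; "" if none
    surname: str


def written_prefix(name: NameParts, country: str) -> str:
    """Belgian usage capitalises the prefix; Dutch usage keeps it lower case."""
    if country == "BE":
        return name.prefix[:1].upper() + name.prefix[1:]
    return name.prefix


def sort_key(name: NameParts, country: str) -> str:
    """NL files under the bare surname; BE files under prefix + surname."""
    if country == "BE":
        return " ".join(p for p in (name.prefix, name.surname) if p).casefold()
    return name.surname.casefold()


def full_name(name: NameParts, country: str) -> str:
    parts = (name.given, written_prefix(name, country), name.surname)
    return " ".join(p for p in parts if p)


def index_names(names: list[NameParts], country: str) -> list[str]:
    ordered = sorted(names, key=lambda n: sort_key(n, country))
    return [full_name(n, country) for n in ordered]


PARTICIPANTS = [
    NameParts("Andrew", "", "Sly"),
    NameParts("Frits", "", "Devos"),
    NameParts("Jon", "", "Richfield"),
    NameParts("Walter", "van", "Holst"),
]

if __name__ == "__main__":
    for country in ("NL", "BE"):
        print(country, index_names(PARTICIPANTS, country))
```

With these four names, Walter files between Devos and Richfield under the Dutch rule, but after Sly under the Belgian one.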
participants (4)
- Andrew Sly
- Frits Devos
- Jon Richfield
- Walter van Holst