
I hope you have figured out my point by now. Namely, IF one wants to make "correct" e-book files in a number of formats, including EPUB and MOBI, it is not possible to determine algorithmically the "correct" encoding of Author Lastname, Firstname from the data currently found in either the PG HTML or the PG TXT encodings. Nor is it possible to make "correct" encodings of Author Lastname, Firstname from the information currently recorded in the PG catalog.

One would like to have "correct" encodings of Author Lastname, Firstname so that when a customer adds a PG text in, say, EPUB or MOBI to their existing e-book library, the author sorts and displays correctly next to any other e-books they might already possess from other sources.

Sun Tzu

Sun is the author's family name, or what is represented as an author's "Lastname" in western cultures. Tzu is a romanization of an honorific, along the lines of "Sir" or "Mr". Sun Tzu (孫子; Sūn Zǐ) is listed in the PG catalog in a corrupted westernized form as "Sunzi", which shows a lack of cultural respect: it combines the family name with the honorific in a way that artificially forms an apparent feminine.

However, I believe the transcriber needs to transcribe the book as written, including the spelling or representation of the author name found there. This means that the book transcription in HTML or PG TXT cannot be used as a reliable source of the author name -- nor should the spelling given in the transcription necessarily be how the author is listed in the PG catalog.
Nor is it possible to figure out algorithmically which part of the transcribed name is the "last name" [family name]. So in addition to the coding in the HTML or the PG TXT there also needs to be a "spine" representation that gives a correct canonical identification of the author, "Lastname: Sun Firstname: Tzu". Again, Tzu isn't really the first name, but by tradition this slot gets used for whatever part of the canonical author name isn't the lastname. ("Art of War" is also known simply as "The Sun Tzu.")

Miguel de Cervantes

The last name of the author is most often canonically represented as "Cervantes Saavedra", with the "firstname" part typically represented as "Miguel de". Saavedra is the second surname, in a culture where a person customarily carries two family names (traditionally the father's and the mother's). When the book is sold in other cultures that are uncomfortable with this convention, the Saavedra tends to get dropped -- but it shouldn't be, because it IS part of the author's last name.

Marquis de Sade

Last name of author = Sade. The first name part is "Donatien Alphonse François". But by tradition customers are probably expecting the firstname part to be represented as "Marquis de" -- they almost certainly will not recognize "Donatien Alphonse François". So it's not really clear how the firstname part ought to be coded, but if the lastname part is coded as Sade then at least the book will show up in about the right place in the possessor's library listing.

Again, the point is that neither the PG catalog nor the literal transcription can be used as a reliable source of the author lastname, firstname information -- which DOES need to be reliably included in the e-book file so that the e-book shows up at the correct location in the customer's e-book library sort.
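
To make the point concrete, here is a minimal sketch in Python. It compares a naive guess ("the final word is the last name") with an explicitly recorded Lastname/Firstname pair for the three authors discussed above. The names naive_file_as, AuthorName and CANONICAL are hypothetical, invented only for this illustration; nothing here reflects how the PG catalog or any PG tool actually stores names.

```python
from dataclasses import dataclass


def naive_file_as(display_name: str) -> str:
    """Guess 'Lastname, Firstname' by treating the final word as the last name."""
    parts = display_name.split()
    if len(parts) < 2:
        return display_name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


@dataclass(frozen=True)
class AuthorName:
    """Hypothetical 'spine' record: the sort parts are stored, not guessed."""
    display: str    # name as it appears on the transcribed title page
    lastname: str   # family name used for sorting
    firstname: str  # whatever part of the canonical name is not the lastname

    def file_as(self) -> str:
        return f"{self.lastname}, {self.firstname}"


# Values taken from the examples in this post.
CANONICAL = [
    AuthorName("Sun Tzu", "Sun", "Tzu"),
    AuthorName("Miguel de Cervantes", "Cervantes Saavedra", "Miguel de"),
    AuthorName("Marquis de Sade", "Sade", "Marquis de"),
]

for author in CANONICAL:
    guessed = naive_file_as(author.display)
    recorded = author.file_as()
    verdict = "same" if guessed == recorded else "DIFFERENT"
    print(f"{author.display}: guessed {guessed!r}, recorded {recorded!r} ({verdict})")
```

The guess happens to agree with the recorded value for Sade, but it files Sun Tzu under "Tzu" and drops "Saavedra" from Cervantes. A smarter heuristic would only move the failures around, which is why the sort string has to be recorded by hand somewhere. In EPUB, a string of this kind is what the file-as metadata on the creator entry is meant to carry.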