
On 9/18/2012 10:37 AM, Greg Newby wrote:
I don't remember your proposal specifically, but most proposals for improvements actually involve one or two things:
1) Telling people what they can no longer do (i.e., limiting choices), or
2) Finding "someone" to do something -- write software, create standards, develop policy, etc.
The usual answer from me, and from Michael, is that we will fully support and encourage your effort. It's never clear to me why this isn't satisfying.
A little over 40 years ago, ABC aired what was apparently its first animated Movie of the Week: The Point. The story is set in a kingdom where everyone has a pointed head, except for a young boy named Oblio. The evil count declares that because Oblio does not have a pointed head he is an outlaw, and he is banished to the Pointless Forest, where nothing has a point. During his adventures in the Pointless Forest, Oblio meets a man who is completely covered in arrows, pointing every which way. When asked how this Pointed Man could exist in the supposedly Pointless Forest, he replies "A point in every direction is the same as no point at all."
Project Gutenberg is pointless. The Gutenberg newsletters have consistently advertised that Project Gutenberg e-texts are "Readable By Both Humans and Computers." This assertion is, in fact, untrue. Computer programs require predictable data in order to process it. Because they conform to no known standard, Project Gutenberg e-texts are not usable by computers, only by humans; despite being converted to ASCII, PG e-texts are just one small step removed from their paper antecedents, and given today's computer processing power have no advantage over displaying raw page scans, à la Internet Archive or Google Books.
Setting standards is not about limiting choices, nor is it about limiting opportunity: it is simply about transparency and truth in labeling. If a text is submitted that meets a certain standard, and it is labeled as meeting that standard, that does not mean that texts that do /not/ meet the standard cannot be stored and served; it just means that they should not be labeled as meeting the standard. When I say that Project Gutenberg e-texts adhere to no known standard, I do not mean to say that /all/ e-texts are devoid of standards adherence; I am simply saying that there is no mechanism to determine whether a text adheres to a standard or what that standard is.
Thus, there is no effective way to build a tool chain to automate the conversion of Project Gutenberg texts to other formats, because there is no way to predict the starting point for any given text.
It is never clear to /me/ why some people think that establishing standards implies limiting choice or opportunity. That is a straw man argument, designed, I can only imagine, to obscure the fact that the majority of the Project Gutenberg corpus is simply hopelessly outdated and of very limited value.