
On 10/29/2011 02:18 PM, Greg M. Johnson wrote:
> ... reasonable speculations about 200 years from now, ...
Hilarious. Could a person from 1800 reasonably have foreseen portable devices that could communicate with every person on earth, or ebooks delivered over cellular networks? (In 1800 they used gas for illumination, and electricity, though known to scientists, had no practical application. Radio telegraphy came a hundred years later.)
> Microsoft Word 1995 format (if there were such a thing) is probably out. WordPerfect for 3.1 Windows format is probably out.
The works worth preserving -- those that age slower than the substrate they are written on -- were preserved by copying them again and again to new substrates. Every file format will age and eventually become unreadable. The only hope for preserving a file's contents is timely conversion into a newer format.
Plain text lacks important information about the structure of the text and does not convert gracefully to any other format. Many have tried; all have failed. By the time OCR software recognizes more textual features than plain text can represent, plain text will be dead. Plain text sucks as an archival format because it forgets too much.
Plain text sucks as a distribution format too. It displays well on an 80-column teletype (some 20 people still have one in the basement) and nowhere else (some 6 billion people use something else).
"Reasonable speculation": in 100 years, machine OCR will be so accurate that it will find and correct typos in the printed book and infer the semantics and mark them up. PG plain text files may still be found in the Google cache, but nobody will use them, because it will take about 2 ms to OCR any scanned book from the IA (Internet Archive).
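To make the "forgets too much" point concrete, here is a minimal Python sketch of converting hard-wrapped text into one line per paragraph. It assumes paragraphs are separated by blank lines; the function name `unwrap` is hypothetical, not any PG tool. It works for prose, but plain text cannot distinguish a line break inserted to fit an 80-column display from one the author meant, so verse collapses into a single line.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped lines into one line per paragraph.

    Hypothetical helper. Assumes paragraphs are separated by one or
    more blank lines; returns paragraphs separated by a single blank line.
    """
    # Split on blank lines (a newline, optional whitespace, another newline).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Within each paragraph, treat every line break as a soft wrap.
    joined = [" ".join(line.strip() for line in p.splitlines()) for p in paragraphs]
    return "\n\n".join(joined)


prose = "It was the best of times,\nit was the worst of times.\n\nNext paragraph."
verse = "Tyger Tyger, burning bright,\nIn the forests of the night;"

print(unwrap(prose))  # prose is rejoined correctly
print(unwrap(verse))  # the poem's line break is lost
```

Any smarter heuristic (short lines, leading capitals, indentation) just guesses at structure the file never recorded.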
> But I am wondering who wins the elimination round between: i) PG's current format for "TXT" with 80 chars ii) a format that is still all ASCII but removes all the end-of-column CR's, and only has double returns at the end of paragraphs.
So you think that TXT *without* hard CRs is better than text with hard CRs. After all that pomp about "objective truth" and "200 years from now", I expected something more thoroughly enlightening.
-- 
Marcello Perathoner
webmaster@gutenberg.org