
On Sun, Jul 17, 2005 at 12:17:25PM +0200, Marcello Perathoner wrote:
> I'm just trying to pre-empt problems that may appear down the line once we have a considerable number of page image files posted. Applications may pop up that we don't imagine yet. I don't want to weep over the lost page numbers as we are weeping today over the lost accents.

I second that. Losing (meta-)information is always a bad idea. This means markup, non-ASCII characters, non-English texts, ... [*]

In many historical books and essays, the author directs the reader to some other such work, page so and so. If/when PG has set up a way to recognize those references and link to them, it will be possible to create links to the right part of the other e-text.

Phony example: John Smith writes an _Essay on the History of Rome_. This is PG text number 23456. On page 123, footnote 4, he says:

    [...] see James King, _Study of ancient Greece_, page 456.

Now suppose that in a few years' time

1/ this _Study of ancient Greece_ gets into PG, number 78967 (in the same edition John Smith used)
2/ some text-crawling program detects the fuzzy citation of it in Smith's essay
3/ some robot and/or human reworks Smith's essay to include hyperlinks to PG text 78967, at the correct page number (which would land at the right paragraph in the HTML), and checks them one by one

That would be the ultimate killer library!

[*] For this reason I am happy about the recent change at PGDP-US and the creation of PGDP-EU, even though more work and details remain to be done (mostly, but not only, regarding the existing database of e-texts).
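
To make steps 2/ and 3/ of the phony example a bit more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration only: the `CATALOG` table, the citation pattern, and the `78967.html#page456` anchor scheme are made up, not anything PG actually provides. It also only normalizes case and whitespace; real "fuzzy" detection would need something much more forgiving.

```python
import re

# Hypothetical catalog: (author, title) -> PG e-text number.
# 78967 is the phony number from the example, not a real e-text.
CATALOG = {
    ("james king", "study of ancient greece"): 78967,
}

# Matches citations shaped like:
#   see James King, _Study of ancient Greece_, page 456
# (assumed shape; real citations vary far more than this)
CITATION = re.compile(
    r"see\s+(?P<author>[A-Z][\w.]*(?:\s+[A-Z][\w.]*)*),\s+"
    r"_(?P<title>[^_]+)_,\s+page\s+(?P<page>\d+)"
)


def normalize(s):
    """Lowercase and collapse whitespace so lookups tolerate minor differences."""
    return " ".join(s.lower().split())


def find_citations(text):
    """Return (author, title, page, etext_number) for each cited work found in CATALOG."""
    hits = []
    for m in CITATION.finditer(text):
        key = (normalize(m["author"]), normalize(m["title"]))
        number = CATALOG.get(key)
        if number is not None:
            hits.append((m["author"], m["title"], int(m["page"]), number))
    return hits


def link_target(number, page):
    """Build a link to a page anchor (hypothetical naming scheme)."""
    return f"{number}.html#page{page}"


sample = "[...] see James King, _Study of ancient Greece_, page 456."
for author, title, page, number in find_citations(sample):
    print(f"{author}, {title} -> {link_target(number, page)}")
```

The point is that the last step only works if the page numbers survive in the posted text: without a page anchor in e-text 78967 there is nothing for the link to land on.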