Re: [gutvol-d] RFC: Posting Page Scans in DJVU Format

17 Jul 2005

On Sun, Jul 17, 2005 at 12:17:25PM +0200, Marcello Perathoner wrote:
...
> I'm just trying to pre-empt problems that may appear down the line once
> we have a considerable number of page image files posted. Applications
> may pop up that we don't imagine yet. I don't want to weep over the lost
> page numbers as we are weeping today over the lost accents.

I second that. Losing (meta-)information is always a bad idea. This
means markup, non-ASCII characters, non-English texts, ... [*]

In many historical books and essays, the author directs the reader to
some other such work, page so and so.

If/when PG has set up a way to recognize such references and link to
them, it will be possible to create links to the right part of the
other e-text.

Phony example:

John Smith writes an _Essay on the History of Rome_
  This is PG text number 23456
page 123, footnote 4, he says:
  [...] see James King, _Study of ancient Greece_, page 456.

Now suppose that in a few years' time
1/ this _Study of ancient Greece_ gets into PG, number 78967
   (in the same edition John Smith used)
2/ some text-crawling program detects the fuzzy citation of it
   in Smith's essay
3/ some robot and/or human reworks Smith's essay to include
   hyperlinks to PG text 78967, at the correct page number
   (which would land at the right paragraph in the HTML)
   and checks them one by one
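
To make steps 2/ and 3/ a bit more concrete, here is a minimal sketch in
Python. Everything in it is an assumption made up for illustration: the
CATALOG table, the find_links and link_for helpers, the example.org URL
layout and the "#pageNNN" anchor name are all hypothetical, not anything
PG provides. It also only matches one exact citation shape
(_Title_, page N); real fuzzy matching would need to be far more
tolerant, and the results would still have to be checked by a human.

```python
import re

# Hypothetical catalog: title -> PG e-text number.
# The number is the one from the phony example above.
CATALOG = {"Study of ancient Greece": 78967}

# Matches citations shaped like: _Title_, page 456
CITATION = re.compile(r"_(?P<title>[^_]+)_,\s+page\s+(?P<page>\d+)")


def link_for(etext, page):
    """Build a link to a page anchor (assumed URL layout and anchor name)."""
    return f"https://example.org/etext/{etext}.html#page{page}"


def find_links(text, catalog):
    """Return (matched citation, target link) pairs for titles in the catalog."""
    links = []
    for m in CITATION.finditer(text):
        etext = catalog.get(m.group("title"))
        if etext is not None:  # skip works that are not in the catalog yet
            links.append((m.group(0), link_for(etext, int(m.group("page")))))
    return links


footnote = "[...] see James King, _Study of ancient Greece_, page 456."
for citation, url in find_links(footnote, CATALOG):
    print(citation, "->", url)
```

The point of the sketch is that the link target can only be built if the
page number survived in the e-text, which is why keeping page numbers
matters.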

That would be the ultimate killer library!

[*] For this reason I am happy about the recent change at PGDP-US and
creation of PGDP-EU, even though more work/details remain to be done
(mostly, but not only, regarding the existing database of e-texts).

Sebastien Blondeel