
There is a new experimental online reader available. Start from any bibliographic record page, e.g.
http://www.gutenberg.net/etext/4300
Basically, this paginates the txt file and remembers your last position in a cookie so you can later resume reading where you left off.
Please test it. It should work with any book that has a text file where the encoding is known.
-- Marcello Perathoner webmaster@gutenberg.net

I picked "Two Years Before the Mast" from the Top 100 and got nothing but blank pages.
Joel

Hey, thanks a lot, Marcello, that's fantastic!
Have you spammed the newsletter people (who's doing it nowadays?) about this? I think it's well worth a mention...
Incidentally, I notice that the prefatory materials, in particular, page-break neatly on the *** START OF THE PROJECT GUTENBERG ETEXT... line. Is this pure coincidence, or deliberate? If deliberate, is it a one-time heuristic that recognises the asterisks (or something similar), or do you use other heuristics in an attempt to locate chapter headings? How good do you find they are?
Meredydd

Meredydd wrote:
Incidentally, I notice that the prefatory materials, in particular, page-break neatly on the *** START OF THE PROJECT GUTENBERG ETEXT... line. Is this pure coincidence, or deliberate? If deliberate, is it a one-time heuristic that recognises the asterisks (or something similar), or do you do other heuristics in an attempt to locate chapter headings? How good do you find they are?
The script just goes 50 lines down and then breaks on the first empty line.
-- Marcello Perathoner webmaster@gutenberg.org
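A minimal sketch of that rule, for illustration only. This is not the reader's actual code; the function name paginate and the parameter min_lines are made up here, and the sketch assumes the empty line that ends a page stays with that page.

```python
def paginate(lines, min_lines=50):
    """Split lines into pages: go min_lines down, then break on the first empty line."""
    pages = []
    current = []
    for line in lines:
        current.append(line)
        # Once the page holds at least min_lines lines, the next empty line ends it.
        if len(current) >= min_lines and not line.strip():
            pages.append(current)
            current = []
    if current:
        # Whatever is left over becomes the last (possibly short) page.
        pages.append(current)
    return pages
```

Because pages only ever end on an empty line, a paragraph is never split across two pages, at the cost of pages having uneven lengths.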

I'm putting this in today's Newsletter.
Michael

Not to rain on your parade, but ... compare with my script for converting PG .txt to html on the fly:
http://isis.library.adelaide.edu.au/cgi-bin/pg-html/pg/etext03/ulyss12.txt
Some attempt at reformatting would be nice.
Your cookie idea for remembering where you got to is a nice touch -- but this seems to be the only justification for splitting a work into 50-line segments. 50 seems completely arbitrary -- 25 would probably fit the whole page into my screen, so I wouldn't need to scroll. 200 -- or 2000 -- would save me clicking on Next Page so often.
I like the My Bookmarks feature. But I'd still rather download to my Palm, which gives me all these features and lets me take it away from my desk.
Steve
-- Stephen Thomas, Senior Systems Analyst, University of Adelaide Library UNIVERSITY OF ADELAIDE SA 5005 AUSTRALIA Phone: +61 8 830 35190 Fax: +61 8 830 34369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ Free books at eBooks@Adelaide, http://etext.library.adelaide.edu.au/

Steve Thomas wrote:
Not to rain on your parade, but ... compare with my script for converting PG .txt to html on the fly:
http://isis.library.adelaide.edu.au/cgi-bin/pg-html/pg/etext03/ulyss12.txt
Some attempt at reformatting would be nice.
I'll take your text to show why I am opposed to purely automatic reformatting of text.
Your text (in the 2nd paragraph) says:
—Introibo Ad Altare Dei.
where it should say
—Introibo ad altare Dei.
Capitalization rules for English titles should not be applied to Latin text, where they are completely inadequate. Also:
Amoroso Ma Non Troppo.
Von Der Sirenen Listigkeit Tun Die Poeten Dichten. Und Alle Schiffe Brucken.
Tete-A-Tete
The missing accents combined with the erroneous capitalization make the last 2 examples a really outstanding example of text corruption.
It's better IMO to present an ugly but correct text than a pretty but corrupted one. This is the reason I decided against purely automatic reformatting.
N.B. I'm not against automatic reformatting (into TEI) and then proofing the text again.
Your cookie idea for remembering where you got to is a nice touch -- but this seems to be the only justification for splitting a work into 50-line segments. 50 seems completely arbitrary -- 25 would probably fit the whole page into my screen, so I wouldn't need to scroll. 200 -- or 2000 -- would save me clicking on Next Page so often.
That also could be stored in a preferences cookie.
-- Marcello Perathoner webmaster@gutenberg.org
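A rough sketch of how a page-size preference might travel in a cookie, using Python's standard http.cookies module. The cookie name lines_per_page, the default of 50, the upper limit and the one-year lifetime are all assumptions made for this example; nothing here describes how the actual reader stores its cookies.

```python
from http.cookies import CookieError, SimpleCookie

DEFAULT_LINES = 50    # assumed default, matching the 50-line pages discussed above
MAX_LINES = 5000      # hypothetical upper bound so a bad cookie cannot ask for huge pages


def read_page_size(cookie_header):
    """Return the preferred page size from a Cookie: header value, or the default."""
    cookie = SimpleCookie()
    try:
        cookie.load(cookie_header or "")
    except CookieError:
        return DEFAULT_LINES
    morsel = cookie.get("lines_per_page")
    if morsel is None or not morsel.value.isdigit():
        return DEFAULT_LINES
    # Clamp to a sane range instead of trusting the client.
    return max(1, min(int(morsel.value), MAX_LINES))


def page_size_cookie(lines_per_page):
    """Build the value of a Set-Cookie: header that stores the preference."""
    cookie = SimpleCookie()
    cookie["lines_per_page"] = str(lines_per_page)
    cookie["lines_per_page"]["path"] = "/"
    cookie["lines_per_page"]["max-age"] = 365 * 24 * 3600  # keep for about a year
    return cookie["lines_per_page"].OutputString()
```

A reader that asked for 200 lines per page would then get 200 back from read_page_size("lines_per_page=200") on the next request, and anyone without the cookie keeps the default.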

Marcello Perathoner wrote:
I'll take your text to show why I am opposed to purely automatic reformatting of text.
Your text (in the 2nd paragraph) says:
—Introibo Ad Altare Dei.
where it should say
—Introibo ad altare Dei.
Capitalization rules for English titles should not be applied to Latin text, where they are completely inadequate. Also:
Amoroso Ma Non Troppo.
Von Der Sirenen Listigkeit Tun Die Poeten Dichten. Und Alle Schiffe Brucken.
Tete-A-Tete
The missing accents combined with the erroneous capitalization make the last 2 examples a really outstanding example of text corruption.
It's better IMO to present an ugly but correct text than a pretty but corrupted one. This is the reason I decided against purely automatic reformatting.
Hmmm -- interesting point. My script reformats the PG text, which contains
UND ALLE SCHIFFE BRUCKEN
TETE_A_TETE
etc.
So, I regret to inform you that the ORIGINAL PG text is "corrupt" -- nothing to do with my script. GIGO.
You have a point about capitalisation rules -- but when the original text uses ALL CAPS to represent italicised words, there's a limit to what can be done -- but we've had this discussion before, so I'll not repeat it here.
Suffice to say that it is perfectly possible to apply at least some minimal formatting to improve readability, without corrupting the text.
-- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/

Steve Thomas wrote:
Suffice to say that it is perfectly possible to apply at least some minimal formatting to improve readability, without corrupting the text.
But that would be very minimal:
- replace fixed font with proportional, line per line
- replace leading spaces with fixed-width ones
And even this would mangle some tables.
-- Marcello Perathoner webmaster@gutenberg.org
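To make those two steps concrete, here is a small sketch. It is only an illustration of the idea, not code from either script; the function name minimal_html and the choice of &nbsp; for the fixed-width space are assumptions.

```python
import html


def minimal_html(text):
    """Convert plain text to HTML line per line, preserving leading indentation."""
    out = []
    for line in text.split("\n"):
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        # Leading spaces become non-breaking spaces so the indentation survives
        # in a proportional font; the rest of the line is escaped but otherwise
        # left exactly as it was (no case changes, no rewrapping).
        out.append("&nbsp;" * indent + html.escape(stripped))
    # One <br> per source line keeps the original line breaks.
    return "<br>\n".join(out)
```

Only leading spaces are preserved, so anything aligned with spaces inside a line, such as table columns, loses its alignment once the font is proportional. That is the mangling mentioned above.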

Steve Thomas wrote:
So, I regret to inform you that the ORIGINAL PG text is "corrupt" -- nothing to do with my script. GIGO.
Nope. There is a difference between claiming:
I have not recorded the capitalization of this sentence
like the original PG text does and
the capitalization of this sentence is so and so
like your version does.
Everybody who sees "INTROIBO AD ALTARE DEI" recognizes that some information has been lost. But a reader not familiar with Latin might not be aware that "Introibo Ad Altare Dei" is all wrong.
Bottom line: if information has been lost, never use guessing to recover it but go back to the source.
-- Marcello Perathoner webmaster@gutenberg.org
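A tiny demonstration of the point, with Python's str.title() standing in for whatever capitalization rule a reformatting script might apply. This is an assumption for illustration, not how Steve's script actually works.

```python
original = "Introibo ad altare Dei"

# Storing the phrase in ALL CAPS throws the case information away:
# many differently capitalized originals collapse onto the same string.
all_caps = original.upper()

# Guessing the case back with an English title-case rule gives a confident
# but wrong answer, and nothing in the output marks it as a guess.
guessed = all_caps.title()

print(all_caps)             # INTROIBO AD ALTARE DEI
print(guessed)              # Introibo Ad Altare Dei
print(guessed == original)  # False
```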

Hello. I looked at the online reader and find it useful but I see one thing which needs to be fixed. There needs to be a way to jump to a specific page. I had to follow the "next page" link several times to get past the standard PG header. I agree that the PG header is important for legal reasons and the public should know as much about PG as possible, but for the older ebooks this can be very long and many people won't have the patience to try to scroll through. I know that this would break usual procedure, but could the reader be set to skip the header entirely? I am thinking that all of the ebooks will be reposted eventually with a shortened PG header anyway and I wouldn't want people drawn away. It might also be nice to let the user decide a page size since it apparently works by deciding that a page is x bytes in the file and adjusting the offset accordingly.

Tony Baechler wrote:
Hello. I looked at the online reader and find it useful but I see one thing which needs to be fixed. There needs to be a way to jump to a specific page. I had to follow the "next page" link several times to get past the standard PG header.
Two issues here:
- Do we really want people to ignore (skip) the header?
- The "standard header" is not standard at all. You need guessing to skip the header (something computers are not very good at).
But this could be fixed.
I agree that the PG header is important for legal reasons and the public should know as much about PG as possible, but for the older ebooks this can be very long and many people won't have the patience to try to scroll through.
The reposting of all texts will make this issue go away pretty soon.
-- Marcello Perathoner webmaster@gutenberg.org
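For what the guessing could look like, here is a hedged sketch that only knows the one marker quoted earlier in this thread (*** START OF THE PROJECT GUTENBERG ETEXT...). The function name first_body_line and the search_limit parameter are made up; since the headers are not standard, older files would need more patterns, and the fallback is simply to show the file from the top.

```python
import re

# The only marker this sketch recognises; other header variants are not handled.
START_MARKER = re.compile(r"^\*\*\*\s*START OF THE PROJECT GUTENBERG", re.IGNORECASE)


def first_body_line(lines, search_limit=600):
    """Guess the index of the first line after the header, or 0 if no marker is found."""
    # Only look near the top of the file; a marker far into the text is
    # more likely to be part of the book than part of the header.
    for i, line in enumerate(lines[:search_limit]):
        if START_MARKER.match(line):
            return i + 1
    return 0
```

Returning 0 when nothing matches means a wrong guess never hides text from the reader; the worst case is the current behaviour of starting at the header.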
participants (6)
- Joel A. Erickson
- Marcello Perathoner
- Meredydd
- Michael Hart
- Steve Thomas
- Tony Baechler