Re: In search of a more-vanilla vanilla TXT

Marcello wrote:
The problem is to decide which LF to retain.
So you have to go to a program like Word, turn all instances of "^p^p" into "QQQQ", then delete "^p", then turn "QQQQ" back into "^p^p". This happened to work fairly well for me just now with "Ten Nights in a Bar Room". The novel itself looked okay, but the license at the end had some spacing irregularities. However:
i) There will be cases where folks are software limited. I cannot see anyone being able to do this on the iPod Touch. I've tried to look at 80-TXT files a couple of times from an Apple store in cases where there was no HTML version. This may be a silly example, but I think it's about making an impression on such cursory visitors to PG.
ii) There will be cases where folks are skills limited. Would the stereotypical impoverished child in Honduras be able to do that?
iii) What about Shakespeare?
Michael wrote:
Trouble reading on a 12" screen?
Yes. Why anyone ever came up with a 7.5 x 12.5 inch screen is beyond me, but you sort of have to choose a small pixel size to get some things in your workflow vertically all on the same page. And there are font sizes that are fine for reading things you never really need to **read**, like "File Edit View," and then there are font sizes which you'd want if you're forcing your eyes to actually read a whole book. Notepad might be fine for a short shopping list or work to-do list, but not for an entire novel in monospace. Hence also wanting to redirect the viewing experience into an HTML browser with Ctrl + font-size-changing capability. (Edit: I just learned that Notepad has an option for changing font face!)
Okay, I'll stop stirring the pot (if y'all'd prefer I do), but here are two last ideas on this topic:
i) If you are producing a book, *please* consider making an HTML version to be as important as the 80-TXT one, certainly more important than PDF, PUB, and MOBI. In my mind, the ones without HTML (and the ones that put the entire legalese at the front of the doc) are in some sense "lost to history" because they aren't nearly as readable.
ii) Rather than curse the darkness, someone should light a candle. My response to my own allegation about 80-TXT readability was to compile a DVD of 3850 books -- hopefully more books than any reasonable person would ever want to read in a lifetime -- all in ****unzipped**** HTML format, structured with HTML which operates as I imagine the ideal hand-held book-reading device ought to (if I were ever to see one in operation). I've sent a workable draft to Michael; I'm now looking at squeezing in a mite more books and maybe setting up editor's picks.
--
Greg M. Johnson
http://pterandon.blogspot.com
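The Word recipe described above (protect the double paragraph marks, delete the single ones, restore the doubles) can also be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `unwrap_paragraphs` is hypothetical, blank lines are assumed to mark paragraph breaks, and instead of a "QQQQ" placeholder the text is split on blank lines and each paragraph is rejoined with spaces.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines, keeping blank lines as paragraph breaks."""
    # Normalise Windows and old Mac line endings to plain LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A paragraph break is a line feed, optional whitespace, and another
    # line feed (the equivalent of "^p^p" in the Word recipe).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside a paragraph, every remaining LF is just a wrap point,
    # so replace it with a single space.
    joined = [
        " ".join(line.strip() for line in paragraph.split("\n"))
        for paragraph in paragraphs
    ]
    return "\n\n".join(joined)


if __name__ == "__main__":
    sample = "First line\nsecond line.\n\nNext para\ncontinues.\n"
    print(unwrap_paragraphs(sample))
```

Like the Word recipe, this treats every single line feed as a wrap point, so verse (the Shakespeare case raised above) would lose its line breaks; it does not solve the problem of deciding which LF to retain.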

Greg M. Johnson wrote:
i) If you are producing a book, *please* consider making an HTML version to be as important as the 80-TXT one, certainly more important than PDF, PUB, and MOBI. In my mind, the ones without HTML (and the ones that put the entire legalese at the front of the doc) are in some sense "lost to history" because they aren't nearly as readable.
That could easily be done. We have to make HTML on the way to producing EPUB, so technically we could just spew out the HTML before packaging the EPUB. But I don't know if it *should* be done ...
The problem is: Nobody has ever been able to generate even barely palatable HTML from PG TXT. For EPUB we can justify the ugly conversion because on most ebook readers and small screens ill-formatted EPUB is still better than TXT. But HTML is supposed to be viewed on browsers and big screens, so ill-formatted HTML will be worse than TXT.
--
Marcello Perathoner
webmaster@gutenberg.org

In case anyone really wants to do it right, what PG needs is to have each book (and other documents) marked up semantically.
Of all of the existing SGML/XML applications, TEI seems best for what PG is doing. Combined with SVG and X3D for graphics, xcite for any citations, etc.
The best way to mark up existing PG texts may be to put the documents in a wiki alongside scans and encourage the public to add the markup. Wiki-style markup seems to be easier to comprehend for most of the public. (And with reason.)
In this model, incidentally, each work could be served as a single file, complete with images and the like included inline. And the plain text version can be readily extracted using a stylesheet.
TEI is at: http://www.tei-c.org/
-JimC
--
James Cloos <cloos@jhcloos.com>
OpenPGP: 1024D/ED7DAEA6
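To make the "plain text can be extracted from the markup" point concrete, here is a minimal sketch. It is not PG's conversion chain and it does not use an XSLT stylesheet; it walks the tree with Python's standard `xml.etree.ElementTree` instead. The sample document is a simplified, namespace-free fragment using the TEI element names `text`, `body`, `div`, `head`, `p` and `emph`; the sample content and the function name `tei_to_plain_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-like fragment (no header, no namespace).
SAMPLE = """<text>
  <body>
    <div type="chapter">
      <head>Chapter I</head>
      <p>A first paragraph with an <emph>emphasised</emph> word.</p>
      <p>A second paragraph.</p>
    </div>
  </body>
</text>"""


def tei_to_plain_text(xml_source: str) -> str:
    """Return headings and paragraphs as blank-line-separated plain text."""
    root = ET.fromstring(xml_source)
    blocks = []
    for elem in root.iter():
        if elem.tag in ("head", "p"):
            # itertext() also collects text inside inline children
            # such as <emph>; split/join collapses stray whitespace.
            words = "".join(elem.itertext()).split()
            blocks.append(" ".join(words))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(tei_to_plain_text(SAMPLE))
```

The markup says which pieces are headings, paragraphs, or emphasis, so the output format becomes a rendering decision made afterwards, which is the argument for semantic markup made above. A real TEI file would also carry a header and, in current TEI, an XML namespace that this sketch ignores.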

Yes, TEI has been discussed in this group a number of times before, and there are some contributors using it.
When I go to gutenberg.org and do an advanced search, looking for TEI as filetype, I find 210 results.
One volunteer's guideline for using TEI can be found at: http://pgtei.pglaf.org/marcello/0.4/doc/20000-h.html
In short, it is there, and is being used, but not by many people. Would you like to help contribute more TEI texts to the project?
Thanks,
Andrew
On Sun, 13 Sep 2009, James Cloos wrote:
In case anyone really wants to do it right, what PG needs is to have each book (and other documents) marked up semantically.
Of all of the existing SGML/XML applications, TEI seems best for what PG is doing. Combined with SVG and X3D for graphics, xcite for any citations, etc.

The problem with PGTEI (the PG dialect of TEI for which PG has automatic conversion to several end-user formats) is that the final output is considered ugly by many contributors. A second problem is that there is no automatic conversion tool to get (almost) working PGTEI from DP internal markup. I believe that both problems could be solved with little effort.
Carlo Traverso
participants (5)
- Andrew Sly
- Greg M. Johnson
- James Cloos
- Marcello Perathoner
- traverso@posso.dm.unipi.it