Re: [gutvol-d] XML version of some books of PG (and other formats)

3 Dec 2004

The problem I've always run into is when a table tries to grow beyond 80 characters wide.

For instance, say that one row looks like this in the original book.

Data label that       Now we have a column      Now we have a column
is extremely long     of data that is also      of data that is also
and is broken up      very long and broken      very long and broken
accordingly over      up over multiple lines.   up over multiple lines.
multiple lines.

Most automated text converters will put each cell on one line with no line breaks.

A web browser will generate line breaks within cells, so the table ends up looking very similar to the above. I haven't tried w3m ... will it handle the above scenario? I've tried lynx dumping to a text file and IE/Mozilla dumping to a text file, and they all fail miserably.
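
To make the scenario concrete, here is a minimal sketch of the kind of cell wrapping a browser does, assuming fixed column widths chosen by hand. The function name `render_row` and the widths are hypothetical; this is not how lynx, w3m, or any browser actually implements it.

```python
import textwrap
from itertools import zip_longest

def render_row(cells, widths, gap=2):
    """Wrap each cell to its column width, then lay the wrapped lines side by side."""
    # An empty cell still needs one (blank) line so the columns stay aligned.
    wrapped = [textwrap.wrap(cell, width=w) or [""] for cell, w in zip(cells, widths)]
    lines = []
    for parts in zip_longest(*wrapped, fillvalue=""):
        line = (" " * gap).join(part.ljust(w) for part, w in zip(parts, widths))
        lines.append(line.rstrip())
    return lines

row = [
    "Data label that is extremely long and is broken up accordingly over multiple lines.",
    "Now we have a column of data that is also very long and broken up over multiple lines.",
    "Now we have a column of data that is also very long and broken up over multiple lines.",
]
# Hypothetical widths: 20 + 24 + 24 plus two gaps of 2 = 72 columns, under the 80 limit.
widths = [20, 24, 24]
for line in render_row(row, widths):
    print(line)
```

The widths have to be picked so that their sum plus the gaps stays under 80; a table with many long columns may still not fit.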

Josh

----- Original Message -----
From: "Sebastien Blondeel" <blondeel@clipper.ens.fr>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Subject: Re: [gutvol-d] XML version of some books of PG (and other formats)
Date: Fri, 3 Dec 2004 17:53:49 +0100
...
On Fri, Dec 03, 2004 at 10:52:25AM -0500, Joshua Hutchinson wrote:
...
The hard part is getting the table info within PG text 80 column width.
It is not always possible of course.
...
FYI, this table becomes this in TEI markup (NOTE: I made the second
Club column just continue under the first for simplicity's sake):
...
That looks simple enough.
Change it to HTML:
-=-=-=
<table border="0">
<tr>
   <td colspan="4" align="center">THE RECORD OF 1875.</td>
[...]
-=-=-=
then replace:
   row  -> tr
   cell -> td
then "w3m -dump table.html" gives:
$ w3m -dump table.html
       THE RECORD OF 1875.
Club.           Won. Lost. P.C.
Boston          71   8     .809
Athletic        55   28    .756
Hartford        54   28    .639
St. Louis       29   39    .574
Philadelphia    37   31    .544
Chicago         30   37    .448
Mutual          29   38    .426
St. Louis Reds  4    14    .222
Washington      4    22    .156
New Haven       7    39    .152
Centennial      2    13    .133
Western         1    12    .077
Atlantic        2    42    .065
(the star after St. Louis has disappeared).
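
The row -> tr and cell -> td replacement above could be scripted roughly as follows. This is only a sketch: the sample fragment is made up for illustration (the real TEI markup is elided above), and attributes are copied through unchanged, so anything like a column span would need separate handling.

```python
import re

# Mapping taken from the message: row -> tr, cell -> td.
TAG_MAP = {"row": "tr", "cell": "td"}

def tei_table_to_html(markup):
    """Rename TEI <row>/<cell> tags to HTML <tr>/<td>, leaving everything else alone."""
    def rename(match):
        slash, name, rest = match.group(1), match.group(2), match.group(3)
        return "<" + slash + TAG_MAP[name] + rest + ">"
    return re.sub(r"<(/?)(row|cell)\b([^>]*)>", rename, markup)

# Hypothetical fragment, not the actual markup from the earlier message.
sample = "<table><row><cell>Boston</cell><cell>71</cell></row></table>"
print(tei_table_to_html(sample))
# prints <table><tr><td>Boston</td><td>71</td></tr></table>
```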
If you need it embedded in a program I can try to code the algorithm,
depending on the programming language you want (Perl should be easy).
Then you can detect that cells with just numbers in them should be
right-aligned, etc.
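
A rough sketch of such an algorithm, in Python rather than Perl: measure the widest cell in each column, then right-align cells that look numeric and left-align the rest. The name `format_table` and the numeric test are assumptions for illustration, and this is not what w3m does internally.

```python
import re

# A cell counts as numeric if it is an integer or decimal such as "71" or ".809".
NUMERIC = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)$")

def format_table(rows, gap=2):
    """Pad every column to its widest cell; right-align numeric cells."""
    ncols = max(len(r) for r in rows)
    # Pad short rows with empty cells so every row has the same number of columns.
    rows = [list(r) + [""] * (ncols - len(r)) for r in rows]
    widths = [max(len(r[i]) for r in rows) for i in range(ncols)]
    out = []
    for r in rows:
        cells = [
            c.rjust(w) if NUMERIC.match(c) else c.ljust(w)
            for c, w in zip(r, widths)
        ]
        out.append((" " * gap).join(cells).rstrip())
    return out

# A few rows copied from the table above.
rows = [
    ["Club.", "Won.", "Lost.", "P.C."],
    ["Boston", "71", "8", ".809"],
    ["St. Louis Reds", "4", "14", ".222"],
]
for line in format_table(rows):
    print(line)
```

Header cells such as "Won." are not numeric, so they stay left-aligned over right-aligned figures; a real converter might want to align headers with their column instead.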
It should also be easy to translate this to LaTeX for PDF/DVI/PS output.
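
For the LaTeX direction, something like the following would probably do as a starting point. It is a sketch under assumptions: the function names are made up, the column spec is passed in by hand, and only the most common special characters are escaped (backslash, `~` and `^` are not handled).

```python
# Characters that must be escaped in ordinary LaTeX text (partial list).
SPECIALS = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
            "_": r"\_", "{": r"\{", "}": r"\}"}

def latex_escape(text):
    """Escape the LaTeX special characters listed in SPECIALS."""
    return "".join(SPECIALS.get(ch, ch) for ch in text)

def to_tabular(rows, colspec):
    """Render rows as a LaTeX tabular environment; colspec is e.g. 'lrrr'."""
    lines = [r"\begin{tabular}{" + colspec + "}"]
    for r in rows:
        lines.append(" & ".join(latex_escape(c) for c in r) + r" \\")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)

rows = [
    ["Club.", "Won.", "Lost.", "P.C."],
    ["Boston", "71", "8", ".809"],
]
print(to_tabular(rows, "lrrr"))
```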