
The problem I've always run into is where the table tries to grow beyond 80 characters wide. For instance, say that one row looks like this in the original book.

Data label that      Now we have a column      Now we have a column
is extremely long    of data that is also      of data that is also
and is broken up     very long and broken      very long and broken
accordingly over     up over multiple lines.   up over multiple lines.
multiple lines.

Most automated text converters will put each cell on one line with no line breaks. A web browser will generate line breaks within cells so that the table will end up looking very similar to the above.

I haven't tried w3m ... will it handle the above scenario? I've tried lynx dumping to a text file and IE/Mozilla dumping to a text, and they all fail miserably.

Josh

----- Original Message -----
From: "Sebastien Blondeel" <blondeel@clipper.ens.fr>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Subject: Re: [gutvol-d] XML version of some books of PG (and other formats)
Date: Fri, 3 Dec 2004 17:53:49 +0100
On Fri, Dec 03, 2004 at 10:52:25AM -0500, Joshua Hutchinson wrote:
> The hard part is getting the table info within PG text 80 column width.
It is not always possible of course.
> FYI, this table becomes this in TEI markup (NOTE: I made the second
> Club column just continue under the first for simplicity's sake):
That looks simple enough.
Change it to HTML:
-=-=-=
<table border="0">
<tr>
<td colspan="4" align="center">THE RECORD OF 1875.</td>
[...]
-=-=-=
then replace:
  row  -> tr
  cell -> td
then "w3m -dump table.html" gives:
$ w3m -dump table.html
     THE RECORD OF 1875.
Club.          Won. Lost. P.C.
Boston         71   8     .809
Athletic       55   28    .756
Hartford       54   28    .639
St. Louis      29   39    .574
Philadelphia   37   31    .544
Chicago        30   37    .448
Mutual         29   38    .426
St. Louis Reds 4    14    .222
Washington     4    22    .156
New Haven      7    39    .152
Centennial     2    13    .133
Western        1    12    .077
Atlantic       2    42    .065
(the star after St. Louis has disappeared).
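For what it's worth, the row -> tr / cell -> td replacement step could be scripted roughly as below. This is only a sketch in Python: the function name tei_table_to_html and the TAG_MAP table are made up for illustration, and it copies attributes through unchanged, so any TEI-specific attributes would still need translating by hand.

```python
import re

# Tag mapping taken from the replacement step described above.
# Any other element is left untouched.
TAG_MAP = {"row": "tr", "cell": "td"}


def tei_table_to_html(markup: str) -> str:
    """Rename <row>/<cell> tags to <tr>/<td>, keeping attributes as-is."""

    def rename(match: re.Match) -> str:
        slash, name, rest = match.group(1), match.group(2), match.group(3)
        return f"<{slash}{TAG_MAP[name]}{rest}>"

    # \b keeps the pattern from touching longer names such as <rows>.
    return re.sub(r"<(/?)(row|cell)\b([^>]*)>", rename, markup)


sample = "<table><row><cell>Boston</cell><cell>71</cell></row></table>"
print(tei_table_to_html(sample))
```

The result can then be saved as table.html and fed to "w3m -dump" as shown above.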
If you need it embedded in a program I can try to code the algorithm, depending on the programming language you want (Perl should be easy).
Then you can detect that cells containing only numbers should be right-aligned, etc.
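A rough sketch of what such an algorithm might look like, in Python rather than Perl. It assumes the caller picks the column widths (so the total stays under 80), wraps long cells over several lines, and right-aligns cells that hold only a number. The names format_row and format_table and the NUMERIC pattern are illustrative, not taken from any existing tool.

```python
import re
import textwrap

# Matches cells such as "71", "8" or ".809" (an assumption about what
# counts as "just numbers" in these tables).
NUMERIC = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)$")


def format_row(cells, widths):
    """Wrap each cell to its column width and return the row's text lines."""
    wrapped = [textwrap.wrap(c, width=w) or [""] for c, w in zip(cells, widths)]
    height = max(len(lines) for lines in wrapped)
    out = []
    for i in range(height):
        parts = []
        for cell, lines, w in zip(cells, wrapped, widths):
            text = lines[i] if i < len(lines) else ""
            # Right-align purely numeric cells, left-align everything else.
            if NUMERIC.match(cell.strip()):
                parts.append(text.rjust(w))
            else:
                parts.append(text.ljust(w))
        out.append("  ".join(parts).rstrip())
    return out


def format_table(rows, widths):
    """Format every row and join the resulting lines into one string."""
    return "\n".join(line for row in rows for line in format_row(row, widths))


rows = [
    ["Club.", "Won.", "Lost.", "P.C."],
    ["Boston", "71", "8", ".809"],
    ["St. Louis Reds", "4", "14", ".222"],
]
print(format_table(rows, [10, 4, 5, 4]))
```

With widths chosen so that their sum plus the two-space separators stays at or below 80, a row with very long cells comes out wrapped inside each column instead of running off the right margin.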
It should also be easy to translate this to LaTeX for PDF/DVI/PS output.
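As an illustration of the LaTeX route, here is a minimal Python sketch that turns rows of cell text into a tabular environment. The function name to_latex_tabular is made up, every column is simply left-aligned, and only a handful of special characters are escaped, so treat it as a starting point rather than a complete converter.

```python
def to_latex_tabular(rows):
    """Render rows (lists of strings) as a LaTeX tabular environment."""
    # Partial escaping only: backslashes, braces, ~ and ^ are not handled.
    specials = {"&": r"\&", "%": r"\%", "#": r"\#", "_": r"\_", "$": r"\$"}

    def escape(text):
        return "".join(specials.get(ch, ch) for ch in text)

    ncols = max(len(row) for row in rows)
    lines = [r"\begin{tabular}{" + "l" * ncols + "}"]
    for row in rows:
        # Pad short rows so every line has the same number of columns.
        padded = list(row) + [""] * (ncols - len(row))
        lines.append(" & ".join(escape(c) for c in padded) + r" \\")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


print(to_latex_tabular([["Club.", "Won.", "Lost.", "P.C."],
                        ["Boston", "71", "8", ".809"]]))
```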