Re: [gutvol-d] XML version of some books of PG (and other formats)

The hard part is getting the table info within PG text 80 column width. A typical table might be 4 columns wide and 5 rows tall.

Here is a fairly simple one from a Baseball Guide text I'm working on...

Club.        Won.  Lost.  P.C.
Chicago       42    14    .788
Hartford      47    21    .691
St. Louis     45    19    .703
Boston        39    31    .557
Louisville    30    36    .455
Mutual        21    35    .375
Athletic      14    45    .237
Cincinnati     9    56    .135

Here is one a little more complex... It has more text columns.

                         THE RECORD OF 1875.

Club.           Won. Lost. P.C.    Club.               Won. Lost. P.C.
Boston ........   71     8 .809    St. Louis Reds ....    4    14 .222
Athletic ......   55    28 .756    Washington ........    4    22 .156
Hartford ......   54    28 .639    New Haven .........    7    39 .152
St. Louis* ....   29    39 .574    Centennial.........    2    13 .133
Philadelphia ..   37    31 .544    Western ...........    1    12 .077
Chicago .......   30    37 .448    Atlantic ..........    2    42 .065
Mutual ........   29    38 .426

FYI, this table becomes this in TEI markup (NOTE: I made the second Club column just continue under the first for simplicity's sake):

<table rows="16" cols="4">
  <row>
    <cell cols="4" role="label">THE RECORD OF 1875.</cell>
  </row>
  <row>
    <cell role="label">Club.</cell><cell role="label">Won.</cell><cell role="label">Lost.</cell><cell role="label">P.C.</cell>
  </row>
  <row>
    <cell role="data">Boston</cell><cell role="data">71</cell><cell role="data">8</cell><cell role="data">.809</cell>
  </row>
  <row>
    <cell role="data">Athletic</cell><cell role="data">55</cell><cell role="data">28</cell><cell role="data">.756</cell>
  </row>
  <row>
    <cell role="data">Hartford</cell><cell role="data">54</cell><cell role="data">28</cell><cell role="data">.639</cell>
  </row>
  <row>
    <cell role="data">St. Louis</cell><cell role="data"><sic corr="39">29</sic></cell><cell role="data"><sic corr="29">39</sic></cell><cell role="data">.574</cell>
  </row>
  <row>
    <cell role="data">Philadelphia</cell><cell role="data">37</cell><cell role="data">31</cell><cell role="data">.544</cell>
  </row>
  <row>
    <cell role="data">Chicago</cell><cell role="data">30</cell><cell role="data">37</cell><cell role="data">.448</cell>
  </row>
  <row>
    <cell role="data">Mutual</cell><cell role="data">29</cell><cell role="data">38</cell><cell role="data">.426</cell>
  </row>
  <row>
    <cell role="data">St. Louis Reds</cell><cell role="data">4</cell><cell role="data">14</cell><cell role="data">.222</cell>
  </row>
  <row>
    <cell role="data">Washington</cell><cell role="data">4</cell><cell role="data">22</cell><cell role="data">.156</cell>
  </row>
  <row>
    <cell role="data">New Haven</cell><cell role="data">7</cell><cell role="data">39</cell><cell role="data">.152</cell>
  </row>
  <row>
    <cell role="data">Centennial</cell><cell role="data">2</cell><cell role="data">13</cell><cell role="data">.133</cell>
  </row>
  <row>
    <cell role="data">Western</cell><cell role="data">1</cell><cell role="data">12</cell><cell role="data">.077</cell>
  </row>
  <row>
    <cell role="data">Atlantic</cell><cell role="data">2</cell><cell role="data">42</cell><cell role="data">.065</cell>
  </row>
</table>

----- Original Message -----
From: "Sebastien Blondeel" <blondeel@clipper.ens.fr>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Subject: Re: [gutvol-d] XML version of some books of PG (and other formats)
Date: Fri, 3 Dec 2004 16:16:02 +0100
> On Fri, Dec 03, 2004 at 08:46:10AM -0500, Joshua Hutchinson wrote:
> > I'm curious to see if your script can handle tables. That is our
> > current biggest bugaboo when it comes to transforming to PG TXT
> > format.
>
> My DTD doesn't mention them (yet?). It focuses mainly on the French
> books of the ebooksgratuits site. I guess it can very easily be injected
> in a more complete DTD (TEI, Docbook, whatever).
>
> I already did Perl (not XSLT!) translations of XML tables (Docbook, for
> example) to other formats (HTML: easy; LaTeX: harder...; TXT: w3m -dump
> of the HTML version is usually good enough) for other projects.
>
> I heard there were now Perl modules able to deal with XML and XSLT so it
> should be even easier to take care of. XSLT-style of programming is not
> for me...
>
> How complex are your tables and what do you need to do with them? Any
> example of (input, output desired, and constraints [API, language...] of
> the transformation)?
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

On Fri, Dec 03, 2004 at 10:52:25AM -0500, Joshua Hutchinson wrote:
> The hard part is getting the table info within PG text 80 column width.

It is not always possible of course.

> FYI, this table becomes this in TEI markup (NOTE: I made the second
> Club column just continue under the first for simplicity's sake):

That looks simple enough. Change it to HTML:

-=-=-=
<table border="0">
<tr>
<td colspan="4" align="center">THE RECORD OF 1875.</td>
[...]
-=-=-=

then replace:

  row  -> tr
  cell -> td

then "w3m -dump table.html" gives:

$ w3m -dump table.html
     THE RECORD OF 1875.
Club.          Won. Lost. P.C.
Boston         71   8     .809
Athletic       55   28    .756
Hartford       54   28    .639
St. Louis      29   39    .574
Philadelphia   37   31    .544
Chicago        30   37    .448
Mutual         29   38    .426
St. Louis Reds 4    14    .222
Washington     4    22    .156
New Haven      7    39    .152
Centennial     2    13    .133
Western        1    12    .077
Atlantic       2    42    .065

(the star after St. Louis has disappeared).

If you need it embedded in a program I can try to code the algorithm, depending on the programming language you want (Perl should be easy). You could then detect that cells containing only numbers should be right-aligned, etc.

It should also be easy to translate this to LaTeX for PDF/DVI/PS output.
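
As a rough illustration of the "embedded in a program" idea, here is a minimal sketch that lays out a TEI-style table as plain text directly, right-aligning cells that contain only numbers and checking the result against the 80-column limit. It is written in Python rather than Perl purely for illustration. The names `render_table`, `fits`, `SAMPLE_TEI` and `MAX_WIDTH` are hypothetical, and the rule that a lone cell carrying a `cols` attribute is a centered title row is an assumption based on the markup posted in this thread.

```python
import re
import xml.etree.ElementTree as ET

MAX_WIDTH = 80  # assumed PG plain-text line limit, as discussed in the thread
NUMERIC = re.compile(r"\d*\.?\d+")  # matches "71", "8", ".809"

# Shortened sample in the style of the markup posted above (illustrative only).
SAMPLE_TEI = """
<table rows="4" cols="4">
  <row><cell cols="4" role="label">THE RECORD OF 1875.</cell></row>
  <row><cell role="label">Club.</cell><cell role="label">Won.</cell><cell role="label">Lost.</cell><cell role="label">P.C.</cell></row>
  <row><cell role="data">Boston</cell><cell role="data">71</cell><cell role="data">8</cell><cell role="data">.809</cell></row>
  <row><cell role="data">St. Louis</cell><cell role="data"><sic corr="39">29</sic></cell><cell role="data"><sic corr="29">39</sic></cell><cell role="data">.574</cell></row>
</table>
"""


def cell_text(cell):
    """Visible text of a cell, including text nested in <sic>, whitespace-normalized."""
    return " ".join("".join(cell.itertext()).split())


def render_table(tei_xml, gap=2):
    """Return the table as a list of aligned plain-text lines.

    Assumes at least one ordinary (non-title) row is present.
    """
    table = ET.fromstring(tei_xml.strip())
    parsed = []
    for row in table.findall("row"):
        cells = row.findall("cell")
        # Assumption: a single cell with a cols attribute is a spanning title.
        is_title = len(cells) == 1 and cells[0].get("cols") is not None
        parsed.append((is_title, [cell_text(c) for c in cells]))

    body = [texts for is_title, texts in parsed if not is_title]
    ncols = max(len(texts) for texts in body)
    widths = [max(len(t[i]) for t in body if i < len(t)) for i in range(ncols)]
    total = sum(widths) + gap * (ncols - 1)

    lines = []
    for is_title, texts in parsed:
        if is_title:
            lines.append(texts[0].center(total).rstrip())
            continue
        parts = [
            t.rjust(widths[i]) if NUMERIC.fullmatch(t) else t.ljust(widths[i])
            for i, t in enumerate(texts)
        ]
        lines.append((" " * gap).join(parts).rstrip())
    return lines


def fits(lines, limit=MAX_WIDTH):
    """True if every rendered line is within the column limit."""
    return all(len(line) <= limit for line in lines)


if __name__ == "__main__":
    rendered = render_table(SAMPLE_TEI)
    print("\n".join(rendered))
    print("fits in", MAX_WIDTH, "columns:", fits(rendered))
```

When `fits` returns False, the table is one of the cases where the 80-column limit cannot be met without reworking the layout by hand.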

Joshua Hutchinson wrote:
> <table rows="16" cols="4">
>   <row>
>     <cell cols="4" role="label">THE RECORD OF 1875.</cell>
>   </row>

Shouldn't that be

<table rows="15" cols="4">
  <head> THE RECORD OF 1875. </head>

?

-- 
Marcello Perathoner
webmaster@gutenberg.org
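
The row count in question can be checked mechanically. The markup as posted contains 15 `<row>` elements (one title row, one label row, 13 clubs). Below is a minimal Python sketch that compares the declared `rows` and `cols` values with what the markup actually contains. The function name `check_table_dimensions` and the sample `SAMPLE_TABLE` are hypothetical, and it assumes that `rows` is meant to equal the number of `<row>` elements and that a cell with `cols="N"` occupies N columns.

```python
import xml.etree.ElementTree as ET

# Shortened sample (illustrative only): 3 rows, but rows="16" declared.
SAMPLE_TABLE = """
<table rows="16" cols="4">
  <row><cell cols="4" role="label">THE RECORD OF 1875.</cell></row>
  <row><cell role="label">Club.</cell><cell role="label">Won.</cell><cell role="label">Lost.</cell><cell role="label">P.C.</cell></row>
  <row><cell role="data">Boston</cell><cell role="data">71</cell><cell role="data">8</cell><cell role="data">.809</cell></row>
</table>
"""


def check_table_dimensions(tei_xml):
    """Return a list of mismatch messages; an empty list means consistent."""
    table = ET.fromstring(tei_xml.strip())
    rows = table.findall("row")
    problems = []

    declared_rows = table.get("rows")
    if declared_rows is not None and int(declared_rows) != len(rows):
        problems.append(
            f'rows="{declared_rows}" declared but {len(rows)} <row> elements found'
        )

    declared_cols = table.get("cols")
    if declared_cols is not None:
        for number, row in enumerate(rows, start=1):
            # Assumption: a cell's cols attribute is its column span (default 1).
            width = sum(int(c.get("cols", "1")) for c in row.findall("cell"))
            if width != int(declared_cols):
                problems.append(
                    f'row {number} spans {width} columns, cols="{declared_cols}" declared'
                )
    return problems


if __name__ == "__main__":
    for message in check_table_dimensions(SAMPLE_TABLE):
        print(message)
```

A `<head>` element is not a `<row>`, so under the same assumption moving the title into `<head>` lowers the count by one.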

As I recall, the 80-column rule didn't use to be a hard and fast rule for tables. When the table contained too much information, one was supposed to expand it the minimum amount necessary; at least that is what I recall MH as saying.

nwolcott2@post.harvard.edu
Friar Wolcott, Gutenberg Abbey, Sherwood Forrest

----- Original Message -----
From: "Joshua Hutchinson" <joshua@hutchinson.net>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Friday, December 03, 2004 10:52 AM
Subject: Re: [gutvol-d] XML version of some books of PG (and other formats)
> The hard part is getting the table info within PG text 80 column width.
participants (4)
- Joshua Hutchinson
- Marcello Perathoner
- Norm Wolcott
- Sebastien Blondeel