restructured-text -- the good, the bad, and the ugly

i don't get over to the d.p. forums very often these days, so i didn't realize they started rolling out their notice on the shift to restructured-text last saturday, december 11th.
i also didn't know this initiative was so far along, in that two r.s.t. e-books are already mounted (december 16th):
http://www.gutenberg.org/ebooks/34654 http://www.gutenberg.org/ebooks/34605
here's my r.s.t. summary: the good, the bad, and the ugly.
***
_restructured-text_at_project_gutenberg_
_the_good_
project gutenberg is officially using a light-markup format!
_the_bad_
the implementation uptake at d.p. will likely be _very_slow_...
_the_ugly_
this change proves that it is now marcello making p.g. policy.
***
i'll elaborate on this summary in the coming days...
in the meantime... as you can imagine, over the many years, i have been an advocate of light-markup all over cyberspace, so now i can go out and tell the world that project gutenberg has finally endorsed my position and is using light-markup!
of course, this leaves considerable egg on the faces of people like walter, who -- just last december 26th -- posted that i was a "hopeless fool" and that my proposals for light-markup were "fundamentally impossible". looks like you were wrong, walter. way wrong. way way wrong. even marcello has given it all up.
-bowerbird

I read the instructions on how to make rst, and downloaded the .rst books bowerbird listed. My Calibre won't convert them, my browser won't open them. What are they supposed to work with?
Linda M. Everhart codmolly@embarqmail.com
On Dec 18, 2010, at 3:15 PM, Bowerbird@aol.com wrote:
i don't get over to the d.p. forums very often these days, so i didn't realize they started rolling out their notice on the shift to restructured-text last saturday, december 11th.
i also didn't know this initiative was so far along, in that two r.s.t. e-books are already mounted (december 16th):
http://www.gutenberg.org/ebooks/34654 http://www.gutenberg.org/ebooks/34605
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

The RST files can be opened with any UTF-8 capable editor--SCUnipad, Windows Notepad, MS Word, etc. (I can't speak to Mac software.) If you click on the reStructuredText link, you should get a prompt to open or save the file. You may have to set up an association of the .rst extension with the editor of your choice.
They're meant only as a master file from which other formats are generated--plain text, HTML, epub, etc, etc. They are not an end-user ebook format.
BTW--it's probably more accurate to say that PG doesn't actually _use_ formats. Ebook producers (DP, independents) _use_ formats--PG decides which ones it will accept for posting. That's how I see it, anyway--dissenters can argue among themselves. <g>
Al

Al, sorry I didn't explain that well. I can open the rst files, and read them with all the markup, but I don't know what to use to convert them into something usable like html or mobi. Someone point me to a webpage that explains this. I'd like to learn more about it.
Linda M. Everhart codmolly@embarqmail.com
On Dec 18, 2010, at 5:01 PM, Al Haines wrote:
The RST files can be opened with any UTF-8 capable editor--SCUnipad, Windows Notepad, MS Word, etc. (I can't speak to Mac software.) If you click on the reStructuredText link, you should get a prompt to open or save the file. You may have to set up an association of the .rst extension with the editor of your choice.
They're meant only as a master file from which other formats are generated--plain text, HTML, epub, etc, etc. They are not an end-user ebook format.

Google "restructuredtext" (no quotes, all lower case, one word). You'll get a bunch of hits--the ones starting with "docutils..." are RST's home pages (as it were). That's about as far as I can take you.

Hmm... the whole idea is that different formats are created automatically on the PG server, so that you don't have to.
I've just taken a quick look at the discussion thread at DP to see if Marcello provided links to the software he is using, but I could not find any.
I have the impression that the implementation being used for PG is not finalized yet, and is still a work in progress.
But you might find some useful information here: http://docutils.sourceforge.net/rst.html
It looks like you may need to install Python to use it.
--Andrew
On Sat, 18 Dec 2010, Linda M. Everhart wrote:
Al, sorry I didn't explain that well. I can open the rst files, and read them with all the markup, but I don't know what to use to convert them into something usable like html or mobi. Someone point me to a webpage that explains this. I'd like to learn more about it.
Linda M. Everhart codmolly@embarqmail.com
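
For anyone who wants to try the docutils route mentioned above, here is a minimal Python sketch of turning RST text into HTML. It assumes the third-party docutils package is installed; SAMPLE_RST and rst_to_html are made-up names for illustration only. This is plain docutils, not PG's own converter, so the result should not be expected to match what gutenberg.org generates, and it covers HTML only; Mobi would need a further tool.

```python
"""Sketch: convert a reStructuredText master into HTML with docutils."""

try:
    # docutils is a third-party package (for example: pip install docutils).
    from docutils.core import publish_string
except ImportError:  # keep the sketch importable when docutils is absent
    publish_string = None

# Hypothetical sample text, not taken from either PG e-book.
SAMPLE_RST = """\
The Good, the Bad, and the Ugly
===================================

The uptake will likely be *very slow*.
"""


def rst_to_html(rst_source):
    """Return a full HTML page as str, or None if docutils is missing."""
    if publish_string is None:
        return None
    output = publish_string(source=rst_source, writer_name="html")
    # publish_string may return bytes, depending on the output encoding
    # setting; decode so callers always get str.
    if isinstance(output, bytes):
        output = output.decode("utf-8")
    return output


if __name__ == "__main__":
    html = rst_to_html(SAMPLE_RST)
    if html is None:
        print("docutils is not installed")
    else:
        print(html)
```

To convert a downloaded file instead of the sample string, one could read it with open(path, encoding="utf-8") and pass the text to rst_to_html. Files that rely on PG-specific extensions may produce warnings or errors under plain docutils.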

On Sat, Dec 18, 2010 at 09:20:04PM -0800, Andrew Sly wrote:
Hmm... the whole idea is that different formats are created automatically on the PG server, so that you don't have to.
Not exactly. The idea of the investigation is to find out under what conditions we can have conversion from a master format without human intervention, and get good results. I don't think anyone believes there is a single master format that will work for all situations (at least, with all the caveats and variations we've identified, not least of which is the DP and PG toolset & workflow). More likely, we'll end up with a continuing need for hand-crafted files in various formats, plus a variety of auto-conversions.
I've just taken a quick look at the discussion thread at DP to see if Marcello provided links to the software he is using, but I could not find any.
The current auto-converter software is out there somewhere, but I don't have the URL in front of me. I'm sure there will be new software at DP, for the WWers, and at gutenberg.org, assuming things keep moving forward as they have been. -- Greg

On 12/18/2010 11:19 PM, Linda M. Everhart wrote:
I read the instructions how to make rst, and downloaded the .rst books bowerbird listed. My Calibre won't convert them, my browser won't open them. What are they supposed to work with?
You are supposed to run them past the PG converter, which is called EpubMaker (for historical reasons). It runs on Python, so you'll have to install that first.
Caveat emptor: While theoretically it can run everywhere Python runs, it has been tried out on Linux only, so far.
You can find the sources here: http://www.gutenberg.org/tools/
-- Marcello Perathoner webmaster@gutenberg.org

> http://www.gutenberg.org/ebooks/34654
> http://www.gutenberg.org/ebooks/34605
IF one actually looks at the quality of the above resulting EPUB and Mobi actually generated by this approach, one would see why RST and other txt based approaches make many of us book submitters so unhappy!
HTML and therefore EPUB sucks as a method for coding books -- but even then the results end up looking better than this RST!
I would suggest instead of "standardizing" on RST instead "standardize" on EPUB for the input submission format, and move to EPUB3 when that comes out. Txt70 and HTML formats can be "easily" downconverted from EPUB rather than trying to guess info that isn't there when trying to move from Txt70 to EPUB and Mobi. EPUB would ideally be extended by PG conventions to cover issues that come up frequently when trying to encode books to fairly represent the Author's and/or Publisher's intent.
[quote: re "just encoding the words of an author"]
I can't find that email again, but, "just encoding the words of an author" works great if the author's book "just" consists of a string of words. I'm not sure I've ever seen such a book, but I assume some exist -- representing an author "just" encoding a purely aural tradition presumably. I was thinking that Rudyard Kipling's "The Jungle Book" might be such a book "encoding a purely aural tradition", but, now that I've looked it up the answer is NO: Not even "The Jungle Book" is simply an encoding of a "string of words." The Story "The Blue Hotel" from Stephen Crane's "The Monster" comes close to being simply an aural encoding -- but even there the author cannot help but include some visual representations in his book that do not have an aural equivalent -- it's NOT just "a string of words."
If I might be so bold as to try to more correctly state the job of a "modern" contributor to PG:
To encode, as simply but as accurately as possible, the intent of the original author and/or publisher, in a way that can be as correctly represented as possible, on the greatest number of display devices as possible of actual people who want to read PG texts, and to the extent possible also predict the future so that future customers can also so enjoy PG texts. And do this while minimizing the download and storage size of the resulting downloadable file so that the customers can actually store and read the book on their reader devices.
In practice how much of the submitter's job is "just encoding the words of an author"? In my experience one is lucky if "just encoding the words of an author" represents even half the total amount of time and effort one puts into a PG book submission.
Every time a submission requires more than one file -- and PG requires at least three such submissions per book -- the more chances there are for things to get screwed up -- and they DO get screwed up! When the encoding language doesn't match the common job of representing the things one commonly actually finds in real world common books, then things WILL surely get screwed up! Txt70 and RST being a case in point as being too weak. HTML and therefore EPUB being both too weak AND too rich [too permissive whilst at the same time not having the common elements to encode those things one commonly finds in real books]
And further, in the real world PG contributors need submission formats that have ACTUAL not THEORETICAL good "authoring" tools AND good rendering tools, so that they can see in advance what their efforts are going to look like on real world customers' reader displays.

In a nutshell: the task is impossible! There are too many opposing factors. The optimal solution would then be a facsimile, or the book itself. Yet that holds only if one considers the printed book to be the true intent of the author.
The problem is age-old, and not only in the digital world. One of the oldest "encodings" is TeX (DVI), as a device-independent representation. Then there was, and is, PDF. HTML, EPUB, MOBI and whatever else you throw into the lot are all created with a purpose in mind and therefore will never meet your criteria. All make compromises which in the end render them suboptimal.
Yet the problem is not only the encoding itself; it also lies in the devices themselves. As proof of concept, consider books for young children: large typeface. There is no way that can be properly displayed on the small screens that most e-readers have today, at least not in a pleasing manner.
regards
Keith.
On 28.12.2010 at 07:11, Jim Adcock wrote:
If I might be so bold as to try to more correctly state the job of a "modern" contributor to PG:
To encode, as simply but as accurately as possible, the intent of the original author and/or publisher, in a way that can be as correctly represented as possible, on the greatest number of display devices as possible of actual people who want to read PG texts, and to the extent possible also predict the future so that future customers can also so enjoy PG texts. And do this while minimizing the download and storage size of the resulting downloadable file so that the customers can actually store and read the book on their reader devices.

In a Nut shell. the task is impossible!
Nonsense. Many DP books and many independently produced books on PG are written (HTML) in a way that they may be very successfully read on many many machines, including EPUB and Mobi machines. Also, the EPUB and MOBI generated automatically from txt70 for the older files on PG are often pretty good, given the constraints. But other books are not good, and some of the tools proposed for PG are not good. For "impossible" substitute "some just really don't care." Certainly creating an automated tool to do a good job on 100% of all books IS an "impossible" task.

I don't see any problem with either of these books in their mobi versions, at least. What is it you're referring to?
I guess the one comment I'd make is that the PG Header cruft could be formatted better, but other than that, I have no problems.
I think the biggest benefit to using RST is that it leads to _uniformity_, which is something that at least I personally would like to see. What parts of RST make it too weak for you?
On Tue, Dec 28, 2010 at 1:11 AM, Jim Adcock <jimad@msn.com> wrote:
http://www.gutenberg.org/ebooks/34654 http://www.gutenberg.org/ebooks/34605
IF one actually looks at the quality of the above resulting EPUB and Mobi actually generated by this approach, one would see why RST and other txt based approaches make many of us book submitters so unhappy!

On Mon, Dec 27, 2010 at 10:11:04PM -0800, Jim Adcock wrote:
http://www.gutenberg.org/ebooks/34654 http://www.gutenberg.org/ebooks/34605
IF one actually looks at the quality of the above resulting EPUB and Mobi actually generated by this approach, one would see why RST and other txt based approaches make many of us book submitters so unhappy! ...
Alex asked the same question that I had: please describe what you see as the shortcomings or limitations of those books, whether as displayed in HTML, ePub, or Mobi.
I looked at all of them, also text. I didn't see problems. I've also looked at the RST (which is reminiscent of LaTeX, to me). TIA.
-- Greg

Sorry, just back from a trip. The Kindle stuff all comes out left-aligned when chapter titles, images, etc. ought to be centered. The EPUB stuff all comes out run together, with no vertical whitespace or other "reasonable" indication of paragraph breaks.

On Sat, Dec 18, 2010 at 04:15:38PM -0500, Bowerbird@aol.com wrote:
_the_ugly_
this change proves that it is now marcello making p.g. policy.
Absolute rubbish.
While I cannot take credit for the technical solution, nor for the hard work of the handful of people who have done the investigation into different approaches for by-hand and auto-generated ePubs and other output files, I do know all about how this initiative was started.
It was started during the DP board telecon on November 13. Find the minutes here: http://www.pgdp.net/phpBB2/viewtopic.php?t=44456
The policy of PG, such as it is, is a policy of "yes." See, for example, http://www.gutenberg.org/wiki/Gutenberg:Administrivia_by_Michael_Hart
But the DP board, and the WWers, recognized that there would need to be a focused effort to bring about changes in the DP->PG workflow for ePub. The initiative is, in a nutshell, to do the work needed to identify this new workflow. Then, the DP board and whitewashers and PPVers and other stakeholders will tune and adopt.
This is the sort of top-down plus bottom-up approach that I like to see. Top-down, the DP board (of which I am a member) saw the need, and I was able to employ my dual role with the WWers to help encourage people to work on the initiative. Bottom-up, we have lots of interest & input from people who produce eBooks, from those who manage the DP infrastructure, and others.
In short, there is massive buy-in to this effort, and the discussion on gutvol-d and the DP forums leaves no doubt as to the community interest in the initiative. Whether there is buy-in to the solution remains to be seen, but so far it looks very good.
-- Greg
PS: I don't want to downplay Marcello's contribution to this. He's one of the handful mentioned above, and has done a lot of great work to move things forward. He's part of the team.
participants (9)
- Al Haines
- Alex Buie
- Andrew Sly
- Bowerbird@aol.com
- Greg Newby
- Jim Adcock
- Keith J. Schultz
- Linda M. Everhart
- Marcello Perathoner