Re: Removing spurious break lines

jim said:
In general, if derived formats (ePUB and MOBI from HTML, HTML from txt, unwrapped txt from wrapped txt) are to work “correctly”, then there needs to be *some* degree of expectation about the formatting of the incoming texts. Otherwise these tasks cannot be successfully automated.
that's true. but i'm not talking just about "derivative formats", because there's no need to create a "derivative" if you'd rather just use the .txt file itself to drive the display, a la "eucalyptus". however, the .txt file does have to be formatted "correctly" if it is to be _displayed_ correctly. that's what's driving my motivation...
Going the other way, the automated wrapping of txt has built-in support in most (all?) modern text tools, including web browsers, e-book readers, text editors, etc.
stop trying to derail the thread, jim. there's no way that project gutenberg is going to mount files that don't have mid-paragraph hard linebreaks... _no_way_... so that's not what we're talking about here. and we aren't _going_ to talk about that here, no matter how many times you try to bring it up. so stop trying. what we _are_ talking about now is formatting the .txt files _correctly_, so that they can be unwrapped automatically...
-bowerbird

what we _are_ talking about now is formatting the .txt files _correctly_, so that they can be unwrapped automatically...
In any case PG already "owns" a txt unwrapper, since PG is in some cases generating HTML from pgtxt70, and that requires unwrapping the txt. This is being done, as one can tell by opening just about any PG HTML that was autogenerated from a submitted pgtxt70 file. The text not correctly unwrapped in this case was a submitted HTML file that had the linebreaks forced -- which is not the usual PG convention (to the extent PG *has* an HTML convention).
Perhaps you should start by examining what PG has *already* implemented for unwrapping txt to generate HTML, and find out what works, what doesn't work, and what requirements this puts on txt submissions in order to make it all work right? Otherwise PG will end up with two conflicting text unwrapping standards, which will make the submitter's task even more confusing.
If PG can successfully implement the *hard* task of unwrapping text, one would think PG could also support the *easy* task of wrapping submissions to the pgtxt70 standard. Implementing both directions to form a round-trip might even give PG a heads-up where its assumptions -- or the failure of a submission to follow style guidelines -- are "breaking" the wrapping or unwrapping effort.
To the extent that you guys are heading more and more towards the "unobtrusive" marking up of txt files, please note that Python already has a very good effort in this regard called "reStructuredText" -- along with existing tools to support it! This is not to imply that PG would have to follow their lead literally; for example, they use *emphasis* for italics and **strong emphasis** for bold. Rather, you could just "borrow" their tools.
http://docs.python.org/documenting/rest.html
http://docutils.sourceforge.net/rst.html
and online tools that work for trying it out:
http://www.tele3.cz/jbar/rest/rest.html
Now I don't like the formatting of the Python manuals -- but that is a separate choice from the markup language, and the tools they have created for making the manuals from lightweight "unobtrusive" markup.
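As a rough illustration of the round-trip idea raised above, here is a minimal Python sketch that unwraps each paragraph, rewraps it, and reports the paragraphs whose line layout does not survive. This is not PG's actual unwrapper. The function names (`split_paragraphs`, `unwrap`, `rewrap`, `round_trip_changes`) are hypothetical, and it assumes that paragraphs are separated by blank lines and that "pgtxt70" means a 70-column line width. It uses only the standard `re` and `textwrap` modules.

```python
import re
import textwrap

WIDTH = 70  # assumption: "pgtxt70" is taken to mean 70-column lines


def split_paragraphs(text):
    """Split on runs of blank lines; each paragraph keeps its hard linebreaks."""
    parts = re.split(r"\n(?:[ \t]*\n)+", text.strip("\n"))
    return [p for p in parts if p.strip()]


def unwrap(paragraph):
    """Join the hard-wrapped lines of one paragraph into a single line."""
    return " ".join(paragraph.split())


def rewrap(line, width=WIDTH):
    """Wrap one unwrapped paragraph back to the given width.

    Breaking inside words or at hyphens is disabled so that
    unwrap(rewrap(x)) gives back x unchanged.
    """
    return textwrap.fill(line, width=width,
                         break_long_words=False, break_on_hyphens=False)


def round_trip_changes(text, width=WIDTH):
    """Return the paragraphs whose layout changes after unwrap + rewrap."""
    return [p for p in split_paragraphs(text)
            if rewrap(unwrap(p), width) != p]


if __name__ == "__main__":
    prose = rewrap(" ".join(["word"] * 40))
    verse = "Roses are red,\nViolets are blue."
    for paragraph in round_trip_changes(prose + "\n\n" + verse):
        print("layout would be lost:\n" + paragraph)
```

Ordinary prose that is already wrapped this way passes silently, while anything whose linebreaks carry meaning (verse, tables, indented blocks) gets flagged. It is a crude heuristic: prose wrapped at a different width or by a different algorithm would also be flagged, so the output is a list of candidates to inspect, not a verdict.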
participants (2)
- Bowerbird@aol.com
- Jim Adcock