
the original linebreaks _should_ be preserved. because some people _want_ them. and those original linebreaks _should_ be easy to remove as well. because some people _want_ to remove them.

what nobody wants -- not really -- is a set of _new_ linebreaks, which have no legacy import. but even those are bearable, _if_ they can be easily removed.

and let us recall, again, that project gutenberg has _not_ made available a web-service which people can utilize to unwrap p.g. e-texts... _i_ have created such a web-service. but project gutenberg has not. which is a minor failing. (i'd be happy to provide my code, if you want it.)

and let us recall, again, that project gutenberg does _not_ ensure that every one of its e-texts is structured so that it can be unwrapped properly. this one is a _major_ failing.

these are the two things that project gutenberg must do if it wants to proclaim that it has done all that it can to make its linebreaks a non-issue.

-bowerbird
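
For readers who want to see what "unwrapping" involves, here is a minimal sketch in Python. It is not the web-service code offered above, which is not shown in this thread. It assumes that paragraphs are separated by blank lines and that any block containing an indented line is verse or preformatted text that must keep its linebreaks. The function name `unwrap` and the indentation rule are illustrative assumptions, not a Project Gutenberg standard.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped lines inside each blank-line-delimited paragraph.

    Assumption (hypothetical rule): a block with any indented line is
    treated as verse/preformatted and its linebreaks are left alone.
    """
    # Split on one or more blank (or whitespace-only) lines.
    blocks = re.split(r"\n(?:[ \t]*\n)+", text.strip("\n"))
    result = []
    for block in blocks:
        lines = block.split("\n")
        if any(line.startswith((" ", "\t")) for line in lines):
            # Keep original linebreaks for indented blocks (e.g. poetry).
            result.append("\n".join(line.rstrip() for line in lines))
        else:
            # Ordinary prose: collapse the wrapped lines into one line.
            result.append(" ".join(line.strip() for line in lines))
    return "\n\n".join(result)


sample = (
    "It was the best of times,\n"
    "it was the worst of times.\n"
    "\n"
    "  Roses are red,\n"
    "  violets are blue.\n"
)
print(unwrap(sample))
```

The sketch only works when the e-text follows the structure it assumes. Unindented poetry, for example, would be joined into a single line, which is the kind of failure the second point above is about.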

Let's have the code, and install it where everyone can find/use it. Right away, without further ado. . . . i.e. send the code now; help Newby set it up later.

On Wed, 21 Apr 2010, Bowerbird@aol.com wrote:
the original linebreaks _should_ be preserved.
because some people _want_ them.
and those original linebreaks _should_ be easy to remove as well.
because some people _want_ to remove them.
what nobody wants -- not really -- is a set of _new_ linebreaks, which have no legacy import.
but even those are bearable, _if_ they can be easily removed.
and let us recall, again, that project gutenberg has _not_ made available a web-service which people can utilize to unwrap p.g. e-texts...
_i_ have created such a web-service.
but project gutenberg has not.
which is a minor failing.
(i'd be happy to provide my code, if you want it.)
and let us recall, again, that project gutenberg does _not_ ensure that every one of its e-texts is structured so that it can be unwrapped properly.
this one is a _major_ failing.
these are the two things that project gutenberg must do if it wants to proclaim that it has done all that it can to make its linebreaks a non-issue.
-bowerbird

but even those are bearable, _if_ they can be easily removed.
The linebreaks are removable if PG enforces standards on the txt files submitted. When people make mistakes on those submissions, and they will, the linebreaks will not be easy to remove correctly. Books of poetry, or books containing poetry, are one common counterexample. Make a copy of your linebreak removal routine public in the common computer formats, BB, and let us test it and see just how easily it works on the existing PG txt files.

The Unicode txt efforts are not too bad, because at least then people can choose to represent the glyphs the typesetter chose, if they wish, rather than guessing and reinterpreting intent. Italics and small caps are then still clearly a loss, as are graphics. Most books use at least italics, so I'd hate to see a PG file format that doesn't even support that.

If you wanted to implement even a Unicode txt+ file format, then you've got to provide renderers for the different machines. Or you auto-translate Unicode txt+ files to HTML for submitters and use the ubiquitous HTML renderers to let people view the Unicode txt+ version. Then submitters do not have to submit HTML unless they want to. Among recent submissions, about 95% DO have HTML, but it's not clear whether that is because people want to provide HTML or because the WW require it.

PG *is* already doing this, more or less, on the rare txt-only submissions nowadays: automagically unwrapping and translating to HTML in a way which most of the time is a win and obviously occasionally a loss. The PG legalese unfortunately is particularly unattractive in this approach, and when the unwrapping fails it is visually distracting - "how come this paragraph isn't unwrapped - is it supposed to be poetry?"

How about it? Unicode txt+ file submissions, if that is what a submitter wants to do, and PG automatically renders that in HTML, and ePUB, and MOBI?
But if you are willing to take txt-only submissions and autorender them into HTML, accepting the resulting mistakes, then why is it that you aren't willing to take HTML submissions and autorender them into the mandatory txt70 files? Certainly going from HTML to txt70 must introduce fewer mistakes. ???
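
To make the HTML-to-txt70 direction concrete, here is a minimal sketch using only the Python standard library. It assumes that "txt70" means plain text wrapped at 70 columns, that the body text lives in `<p>` elements, and that italics are written with underscores in the text output. The names `ParagraphExtractor` and `html_to_txt70` are hypothetical, and this is not PG's actual conversion tool.

```python
from html.parser import HTMLParser
import textwrap


class ParagraphExtractor(HTMLParser):
    """Collect the text of each <p> element, marking italics with '_'."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buf = None  # None while outside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buf = []
        elif tag in ("i", "em") and self._buf is not None:
            self._buf.append("_")

    def handle_endtag(self, tag):
        if tag in ("i", "em") and self._buf is not None:
            self._buf.append("_")
        elif tag == "p" and self._buf is not None:
            # Collapse all runs of whitespace, including source linebreaks.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self._buf = None

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


def html_to_txt70(html: str, width: int = 70) -> str:
    """Render <p> paragraphs as plain text wrapped at `width` columns."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    wrapped = (textwrap.fill(p, width=width) for p in parser.paragraphs)
    return "\n\n".join(wrapped)


page = "<p>The original <i>linebreaks</i> should be\n preserved.</p>"
print(html_to_txt70(page))
```

Because the HTML already marks where each paragraph starts and ends, the wrapper never has to guess whether a linebreak is meaningful. Verse, tables, and small caps would still need their own rules, which this sketch leaves out.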
participants (3)
- Bowerbird@aol.com
- James Adcock
- Michael S. Hart