Re: In search of a more-vanilla vanilla TXT

jim said:
Now maybe to some of you -- you consider this result to be a good thing, an acceptable thing, a thing that well-represents the considerable efforts of the PG volunteers.
i doubt there is anyone who thinks that. what you ended up with is pure shit. that's because you did it wrong. you rewrapped lines that weren't supposed to be rewrapped. if you would have done it correctly, it would've come out right. but you did it wrong.

now, your point is probably that we should make it easier for our users to do it correctly. and nobody will disagree... we _should_ make it easier for our users to do it correctly. and there's an (awfully) easy way to make it easier, which is to mark all lines that should not be rewrapped with leading spaces.

but the whitewashers won't do it. why won't the whitewashers do it? i dunno. you'll have to ask them. i've certainly asked them. i've asked them to do it, pretty please. i've asked them again to do it, pretty please. i've asked 'em why they haven't done it. i've said, repeatedly, that i think it's stupid they haven't done it. and they still don't do it. not all the time, anyway. they do it some of the time. i consider that a very slight victory.

-bowerbird
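
For readers who want to see what the leading-space convention buys a tool, here is a minimal sketch in Python. It is not any existing PG or bowerbird tool; the function name unwrap and the rule "any line starting with a space or tab is left alone" are assumptions made for illustration. Ordinary lines are joined into one line per paragraph, blank lines end a paragraph, and indented lines pass through untouched.

```python
def unwrap(text: str) -> str:
    """Join hard-wrapped lines into paragraphs, leaving indented lines alone.

    Assumption (hypothetical convention): a line that starts with a space
    or tab is marked "do not rewrap" and is copied through as-is.
    """
    out = []   # finished output lines
    para = []  # pieces of the paragraph currently being joined

    def flush() -> None:
        # Emit the pending paragraph, if any, as a single long line.
        if para:
            out.append(" ".join(para))
            para.clear()

    for line in text.splitlines():
        if not line.strip():
            # Blank line: paragraph boundary.
            flush()
            out.append("")
        elif line[0] in " \t":
            # Leading whitespace: verse, table, etc. Keep the break.
            flush()
            out.append(line.rstrip())
        else:
            para.append(line.strip())
    flush()
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "It was a dark\n"
        "and stormy night.\n"
        "\n"
        "  Roses are red,\n"
        "  Violets are blue.\n"
    )
    print(unwrap(sample))
```

The sketch only works when the indentation is applied consistently, which is the point under dispute: an unmarked verse line would be joined into its neighbours like any other line.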

bowerbird said:
you rewrapped lines that weren't supposed to be rewrapped.
You make my point for me. When one relies on automagical tools to recreate the semantic information discarded by the PG TXT representation, more or less often one ends up with something that looks like sh*t - your word, not mine. When the results break, one is told "Oh, you did it wrong, you should have done something else instead." Yes, of course, but once one relies on human intervention to "fix" the problem whenever a particular algorithm breaks, one no longer has an automatic algorithm.

Ultimately, what one should do if one wants to "get it right" is abandon automagical tools that work sometimes and end up looking like sh*t other times. Instead, take the PG TXT file and the original page scans, look at the page scans to figure out where the PG TXT file gratuitously entered line breaks where the author didn't intend them, and take them back out.

After the gratuitous line breaks are taken back out (the work of a few days - trust me on this!), one has two options. If one has a machine incapable of reflow, such as a teletype, run the now gratuitous-line-break-free TXT through a simple unambiguous algorithm that inserts a line break at the appropriate point for that machine: at a whitespace prior to char 72 if you own a teletypewriter, at a whitespace prior to char 20 perhaps if you own a cellphone. Or better, if you have a more modern machine, which I think most of us DO have, one capable of calculating reflow itself (aka "word wrap"), then you just feed the machine the TXT that doesn't have the gratuitous line breaks and everything works automagically. Assuming one is willing to live with ragged right. Or tolerate slightly ugly word spacing on machines that force right justify (sigh.) Better yet, we should ask our technologist friends to include not only reflow but also automatic hyphenation routines in our machines.
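
The "simple unambiguous algorithm" for machines that cannot reflow can be sketched in a few lines of Python. This is an illustration only, not a PG-provided filter: the function name rewrap is hypothetical, the input is assumed to hold one paragraph per line with no gratuitous line breaks, and lines that are blank or start with whitespace are assumed to be intentional and are passed through unchanged. The standard-library textwrap module does the breaking at whitespace.

```python
import textwrap


def rewrap(text: str, width: int = 72) -> str:
    """Insert line breaks at whitespace so no wrapped line exceeds `width`.

    Assumptions: each non-indented input line is a whole paragraph, and
    blank or indented lines are intentional and must not be touched.
    """
    out = []
    for line in text.splitlines():
        if not line.strip() or line[0] in " \t":
            # Blank or indented: keep the author's line break as-is.
            out.append(line)
        else:
            # Break only at whitespace; never split a word or at a hyphen.
            out.append(
                textwrap.fill(
                    line,
                    width=width,
                    break_long_words=False,
                    break_on_hyphens=False,
                )
            )
    return "\n".join(out)


if __name__ == "__main__":
    paragraph = "the quick brown fox jumps over the lazy dog"
    print(rewrap(paragraph, width=72))  # teletype-style width
    print(rewrap(paragraph, width=20))  # small-screen width
```

Because words are never split, a single word longer than the chosen width will still overflow; handling that case is where the hyphenation routines mentioned above would come in.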
Is it too much, for example, to ask PG to provide the option to the rare user who actually WANTS line breaks at char 72, or for that matter actually wants line breaks at char 20, is it too much to ask PG to provide a filter to insert such "gratuitous" line breaks? Consider: PG *already* provides literally 40 different such filter programs to help people with various strange obscure legacy machines.
participants (2)
- Bowerbird@aol.com
- James Adcock