
gardner said:
i scraped them from canadiana.org myself... :+) but hey, do you still have your original o.c.r.? or a later version that still has the original linebreaks intact?
I don't believe that a single pass is feasible,
ok, i should elaborate. multiple passes, to check different aspects, will be required. but multiple passes to check the _same_ aspect are inefficient.
I don't believe that a single pass is feasible, in particular for mismatched quotes and spaced quotes.
i believe you're wrong, and that i can show you.
in particular for mismatched quotes and spaced quotes. You fix the open quote, or in my case close more often than not, then that reveals another quote problem further down/up.
i have already demonstrated that you can fix spacey quotes, and -- in the vast majority of cases -- fix 'em automatically. leading and trailing spacey quotes are easy to fix, of course. from there, it's a simple matter of segmenting the text into _paragraphs_ and counting the quotemarks in each paragraph, making sure that the odd ones are open and the even ones are closed. then, when you come upon a spacey quote, fix it to be an open quote if it is an odd one, and a close quote if it is an even one. if you come up against a paragraph with an odd number of quotes, and the next paragraph does not start with a quote, then you have a case you need to look at. similarly, if any of the quotemarks come up as the wrong type (an odd one that's a close, or an even one that's an open), you need to look. you can test this for yourself. you'll find that it's very robust, so usually there's no need to spend much time on spacey quotes.
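For anyone who wants to try the paragraph-parity approach described above, here is a minimal Python sketch. It is not bowerbird's actual tool; it assumes straight ASCII quotemarks (`"`), paragraphs separated by blank lines, and a small hand-picked set of punctuation that may sit right next to a quotemark. The names `check_quotes`, `fix_paragraph`, `quote_kind`, `LEFT_BOUNDARY` and `RIGHT_BOUNDARY` are hypothetical. Spacey quotes at the start or end of a paragraph are fixed by position; the rest are fixed by parity (odd = open, even = close). Anything that still comes out the wrong type, plus any odd-count paragraph not followed by one opening with a quote, goes into a report for a human to look at.

```python
import re

QUOTE = '"'
# Assumed (hypothetical) punctuation that may touch a quotemark without a space.
LEFT_BOUNDARY = "(-"         # e.g. (" or --"
RIGHT_BOUNDARY = ".,;:!?)-"  # e.g. ". or ", or "--


def quote_kind(left: str, right: str) -> str:
    """Classify the straight quote sitting between segments `left` and `right`."""
    before = left == "" or left[-1].isspace() or left[-1] in LEFT_BOUNDARY
    after = right == "" or right[0].isspace() or right[0] in RIGHT_BOUNDARY
    if before and after:
        return "spacey"   # blank on both sides: direction unknown
    if before:
        return "open"
    if after:
        return "close"
    return "unknown"      # glued to text on both sides


def fix_paragraph(par: str):
    """Repair spacey quotes in one paragraph; return (fixed, quote_count, problems)."""
    segs = par.split(QUOTE)  # quote k sits between segs[k] and segs[k + 1]
    n = len(segs) - 1
    for k in range(n):
        if quote_kind(segs[k], segs[k + 1]) != "spacey":
            continue
        if k == 0 and segs[0] == "":          # leading spacey quote -> open
            make_open = True
        elif k == n - 1 and segs[n] == "":    # trailing spacey quote -> close
            make_open = False
        else:                                 # odd ones open, even ones close
            make_open = k % 2 == 0            # k is 0-based: k == 0 is the 1st quote
        if make_open:
            segs[k + 1] = segs[k + 1].lstrip(" ")  # attach to the following word
        else:
            segs[k] = segs[k].rstrip(" ")          # attach to the preceding word
    # Verify: every odd quote should now be open, every even one closed.
    problems = []
    for k in range(n):
        kind = quote_kind(segs[k], segs[k + 1])
        expected = "open" if k % 2 == 0 else "close"
        if kind != expected:
            problems.append(f"quote #{k + 1} is {kind}, expected {expected}")
    return QUOTE.join(segs), n, problems


def check_quotes(text: str):
    """Fix spacey quotes paragraph by paragraph; return (fixed_text, report)."""
    parts = re.split(r"(\n\s*\n)", text)  # keep the blank-line separators
    results = [fix_paragraph(p) for p in parts[0::2]]
    report = []
    for i, (fixed, count, problems) in enumerate(results):
        report += [(i, msg) for msg in problems]
        # An odd count is fine only if the quotation continues into the
        # next paragraph, which then opens with a quotemark.
        nxt = results[i + 1][0] if i + 1 < len(results) else ""
        if count % 2 == 1 and not nxt.lstrip().startswith(QUOTE):
            report.append((i, "odd number of quotes; next paragraph does not open with one"))
    parts[0::2] = [fixed for fixed, _, _ in results]
    return "".join(parts), report


sample = 'He said, " Come here. "\n\nShe replied, "No " and left.'
fixed, report = check_quotes(sample)
print(fixed)   # He said, "Come here."  /  She replied, "No" and left.
print(report)  # []
```

Only spaces are stripped, never linebreaks, so a spacey quote that sits at the end of a line is left in place and shows up in the report rather than being silently joined to the next line. That keeps the original linebreaks intact, at the cost of a few more cases to eyeball.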
In any event I am not troubled by multiple passes.
ok.
Well it *is* a Gutenberg text after all.
right. that point wasn't directed at you, as you correctly realized.
Thanks.
well, the fact that you haven't wasted your time is only _part_ of the equation. the fact that you won't get much credit down the line (since _your_ text will be discarded for having thrown away info that people will want) is another, bigger part of the equation.
Sure. The book *is* public domain after all. Do what you like.
i think you missed the point. you can mount a version of your work that doesn't throw away the important information, and then no one will have to re-do it, in which case they will be happy to continue to give you the credit. -bowerbird