
Hi all,

To increase reliability, one has to analyse the "co-text" (not the context) properly, looking both forward and backward, and apply proper heuristics. Most algorithms do not use such methods, because many developers do not know how to implement them and assume that such algorithms are slow.

Regards,
Keith

On 26.09.2012 at 09:28, Carlo Traverso <traverso@posso.dm.unipi.it> wrote:
"Mark" == Mark Swofford <mark@romanization.com> writes:
Mark> Perhaps my post was misunderstood, because I'm not sure where the point of disagreement is.
Mark> I never said that the process could be *totally* automated.
Mark> Is your program that "does most of the work but ... can't do it all" reliably correcting less than 98 percent of the quotation marks and apostrophes in book-length texts? If so, perhaps additional fine-tuning is possible.
I rather think that, instead of trying to push 98% to 99%, one should try to increase reliability. One should be sure that the program never silently introduces an error, and this is really hard. For example, one cannot assume that a ' between letters is an apostrophe, and hence render it as a right single quotation mark (do you know the standard exception in English?), nor can one assume English, since even English books may include non-English words. And increasing reliability to 100% might require checking each one.
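A minimal sketch of the "never silently introduce an error" idea, combined with the look-backward/look-forward analysis mentioned at the top of the thread. It inspects only one neighbouring character on each side of every straight ' and records a decision for each one; where the context cannot distinguish an opening quotation mark (U+2018) from a leading apostrophe (U+2019), it leaves the character unchanged for human review. The function name `convert_single_quotes`, the `Decision` record and the rules themselves are hypothetical illustrations, not any existing tool's method, and a real converter would need much wider co-text and language detection.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

APOS = "\u2019"  # RIGHT SINGLE QUOTATION MARK: closing quote and apostrophe share it


@dataclass
class Decision:
    index: int                  # position of the straight ' in the input
    replacement: Optional[str]  # None = left unchanged, needs human review
    rule: str                   # which (hypothetical) heuristic fired


def convert_single_quotes(text: str) -> Tuple[str, List[Decision]]:
    """Convert straight single quotes conservatively, logging every decision."""
    out = list(text)
    decisions: List[Decision] = []
    for i, ch in enumerate(text):
        if ch != "'":
            continue
        prev = text[i - 1] if i > 0 else ""              # look backward
        nxt = text[i + 1] if i + 1 < len(text) else ""   # look forward
        if prev.isalpha() and nxt.isalpha():
            # Usually an apostrophe (don't), but not safe in general,
            # so the rule name marks it as a guess to be reviewed.
            rep, rule = APOS, "between-letters (guess)"
        elif (prev == "" or prev.isspace()) and nxt.isalpha():
            # Opening quote (U+2018) or leading elision ('tis, U+2019):
            # one character of context cannot tell, so do not touch it.
            rep, rule = None, "ambiguous opening/elision"
        elif prev.isalpha() and (nxt == "" or nxt.isspace() or nxt in ".,;:!?"):
            # Closing quote or trailing elision (goin'): same glyph either way.
            rep, rule = APOS, "closing or trailing elision"
        else:
            rep, rule = None, "no rule"
        if rep is not None:
            out[i] = rep
        decisions.append(Decision(i, rep, rule))
    return "".join(out), decisions


if __name__ == "__main__":
    converted, log = convert_single_quotes("don't 'tis goin' x'y")
    print(converted)
    for d in log:
        print(d)
```

The design choice is to return the decision log alongside the text: every conversion stays visible, and a reviewer can filter for entries whose `replacement` is `None` or whose rule is marked as a guess, instead of trusting a percentage.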