
Some of my impressions on Edward's tool. First, it is great. That said, here are some of the shortcomings I noticed.
Editing word by word is often insufficient: there is no way to rejoin words in which a space has been erroneously inserted (this is frequent when apostrophes are involved, but not only then), nor to remove spaces between words and punctuation (e.g. spaces before a comma, which depend on the line justification).
Sometimes, especially in books with a smaller font, the text display font is too large, and the text part is readable only with difficulty. See e.g. http://edwardbetts.com/correct/leaf/artofbook00holm/15 and look at the lines ABCD....: in one line the J is a word of its own, separated from the others; in another the whole alphabet is one word. This depends on the kerning of the different fonts.
The use of sans-serif proportional fonts gravely degrades the visibility of some kinds of recognition errors (uppercase I vs. lowercase l; ri vs. n, etc.), especially when the font is too large and the letters overlap one another. I would suggest displaying and editing line by line, with a fixed-width font.
Moreover, one should show the difference between a soft and a hard hyphen. (This is a distinction on which the OCR is often hopeless, and so is anyone correcting a single line or a single page: is it to-day or today once the lines are rejoined?)
A problem might arise when the OCR has given up on part of a page: relatively often one finds lines missing altogether, or, for example, a missing "O" (uppercase oh) word (this happens in Italian, where "O" means "or"). A missing word might be easy to fix with line editing, but a missing line is harder. Since the image is sliced, and the slices do not cover the whole original page, it may even happen that a part is missed completely. This is frequent enough with the page headers or the page signature. See http://edwardbetts.com/correct/leaf/ilcavalieredello00vero/8 vs 9.
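To make the soft versus hard hyphen question concrete, here is a minimal Python sketch of how rejoining lines might be decided with a word list. It is only an illustration under assumptions: the function name rejoin_lines and the known_words set are hypothetical, and nothing here reflects how Edward's tool actually works.

```python
import string


def rejoin_lines(lines, known_words):
    """Join OCR lines into one string, resolving end-of-line hyphens.

    known_words is a hypothetical set of lowercase words accepted as
    valid unhyphenated spellings; it is an assumption of this sketch.
    """
    words = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip empty lines
        # Did the previous line end with a fragment such as "beau-"?
        if words and words[-1].endswith("-") and len(words[-1]) > 1:
            head = words.pop()[:-1]   # fragment without its hyphen
            tail = tokens.pop(0)      # first word of the current line
            solid = head + tail
            key = solid.strip(string.punctuation).lower()
            if key in known_words:
                words.append(solid)              # treated as a soft hyphen
            else:
                words.append(head + "-" + tail)  # kept as a hard hyphen
        words.extend(tokens)
    return " ".join(words)


page = ["It was a beau-", "tiful morning, and to-", "day we left."]

print(rejoin_lines(page, {"beautiful"}))
# It was a beautiful morning, and to-day we left.

print(rejoin_lines(page, {"beautiful", "today"}))
# It was a beautiful morning, and today we left.
```

The two calls give different results for the same page, which is exactly the difficulty: whether "to-day" or "today" is right depends on the word list (and on the period of the book), so a marker distinguishing the two kinds of hyphen would let the corrector record the decision explicitly.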
Reading a line of text in a page, I tend to associate it with the image immediately below it, which of course doesn't match. Even when I correctly focus on the pair of matching lines, I mostly read the first of the two, and I find it hard to focus on the text (the second of the matching lines). I wonder how it would work to have the line of text first, then the matching line of image.
Carlo