the good alex said:
>   Here's something someone at the archive is working on
>   (after hours, since it's not an official project yet).
>   He'd love to hear your thoughts.
>   http://edwardbetts.com/correct

i'm not sure of the exact point being made, but
a line-by-line proofing interface like the one betts made
is simple enough to create from the page-scans
and a correctly-created o.c.r. output text-file...

ya don't need to plow through ten tons of x.m.l.

i even demonstrated it years ago here on this list,
with an interface quite similar to the one shown
(except without that too-funky font and spacing.)

but i couldn't find the code or graphic right away,
so i just went and rewrote it and regenerated it...

here's a graphic showing baseline-determination:
>   http://z-m-l.com/misc/flatland.jpg

i've appended the code which gets those baselines.

after that, it is simple to _slice_the_pagescan_ into
_lines_ and then interleave each of the lines of text.
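
a rough sketch of that slice-and-interleave step, in python
rather than the realbasic appended below. it assumes the
baselines are already in hand as a list of y-values, and that
the page has exactly one text line per slice. the names
slice_bands and interleave are made up for illustration.

```python
# hypothetical helpers; the names are illustrative only

def slice_bands(baselines, top=0):
    """turn baseline y-values into (top, bottom) bands, one per text line."""
    bands = []
    for y in sorted(baselines):
        bands.append((top, y))
        top = y  # the next band starts where this one ended
    return bands

def interleave(bands, text_lines):
    """pair each scan band with its o.c.r. line, in page order.

    assumes a one-to-one match; raises if the counts differ, since that
    means the scan and the text are out of sync somewhere on the page.
    """
    if len(bands) != len(text_lines):
        raise ValueError("%d slices but %d text lines" % (len(bands), len(text_lines)))
    return list(zip(bands, text_lines))

pairs = interleave(slice_bands([120, 160, 200], top=80),
                   ["line one", "line two", "line three"])
for (top, bottom), text in pairs:
    print(top, bottom, text)
```

each (top, bottom) pair is the crop box for one strip of the
pagescan, to be shown directly above its line of text.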

(you can also probe the slices for paragraph indents,
for the shortened last-line-of-paragraph lines, for
run-heads with pagenumbers on the outside border,
for chapter-heads which start relatively low on the page
and end-of-chapter pages which end relatively high,
for footnotes with their small text and thus long lines,
for title-lines and blockquotes indented on both sides,
and for image insets which result in abbreviated blocks
of paragraph lines, etc.  many of those findings can
also be used to sync up the text-file lines with their
pagescan slices.  and none of it is difficult at all,
not in the slightest.)
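
to make a few of those probes concrete, here is a minimal
python sketch. it assumes you have already measured, for each
slice, the x-position of its leftmost and rightmost ink, and
it guesses the body margins as the median of those values
(which only holds if most lines on the page are full-width).
the function name, the labels and the 15-pixel tolerance are
all made up for illustration.

```python
from statistics import median

def classify_slices(extents, tol=15):
    """label each slice from its (left, right) ink extents, in pixels."""
    # guess the body margins from the typical line on this page
    body_left = median(left for left, right in extents)
    body_right = median(right for left, right in extents)
    labels = []
    for left, right in extents:
        indented = left > body_left + tol   # starts right of the margin
        short = right < body_right - tol    # ends left of the margin
        if indented and short:
            labels.append("both-sides")     # title line or blockquote
        elif indented:
            labels.append("para-start")     # paragraph indent
        elif short:
            labels.append("para-end")       # shortened last line
        else:
            labels.append("full")
    return labels

extents = [(140, 900), (100, 900), (100, 902), (100, 520), (300, 700)]
print(classify_slices(extents))
# → ['para-start', 'full', 'full', 'para-end', 'both-sides']
```

the same labels can be computed from the text-file (leading
spaces, line length), and matching the two sequences is one
way to check that text lines and slices are still in sync.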

as for the interface in betts' approach,
i'd suggest a less-unpleasant correction modality...

..._except_...

...for the fact that this whole correction strategy is
one that is severely misguided and wrong-headed.

line-by-line proofing is an unwise investment when
the vast majority of lines in the o.c.r. (upwards of
90% in most cases) can _easily_ be made error-free...

...and of course i've been saying that for many years,
having presented a veritable raft of supporting data...

instead, fix the lines which are _clearly_ incorrect --
i.e., which show up on spellcheck and easy probes --
and then move the near-perfect text into a pleasant
smooth-reading environment used by actual readers.
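
a minimal python sketch of that "flag only the clearly-bad
lines" pass. the wordlist here is a tiny stand-in (a real one
would come from a dictionary file), and the one "easy probe"
shown, a digit fused to a letter, is just an example of the
kind of check meant. the name suspect_lines is hypothetical.

```python
import re

def suspect_lines(lines, wordlist):
    """return (line_number, reasons) for each line that fails a cheap check."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        reasons = []
        # spellcheck: every run of letters must be a known word
        for word in re.findall(r"[A-Za-z']+", line):
            bare = word.lower().strip("'")
            if bare and bare not in wordlist:
                reasons.append("unknown word: " + word)
        # easy probe: a digit touching a letter is a classic o.c.r. slip
        if re.search(r"[A-Za-z][0-9]|[0-9][A-Za-z]", line):
            reasons.append("digit fused with letter")
        if reasons:
            flagged.append((number, reasons))
    return flagged

wordlist = {"the", "square", "was", "flat", "and", "wide"}
lines = ["the square was flat", "tbe squ4re was flat", "and wide"]
flagged = suspect_lines(lines, wordlist)
print([number for number, reasons in flagged])
# → [2]
```

only the flagged lines need a human to look at the scan;
everything else goes straight on to the reading environment.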

there's absolutely no need for a betts-style interface.

i mean, really, do what you like, people...  but if you
think you can find volunteers who will let you waste
their time, i think you're likely wasting _your_ time...

because d.p. has a lock on that kind of idiot.

-bowerbird

p.s.  and yes, i realize that the interface shown is
a word-based one, not a line-based one,
which adds another layer of smudge, but
we can safely ignore that for the bigger picture...

p.p.s.  here's that code...

  dim x as integer
  dim y as integer
  dim solid as color
  dim question as color
  dim consec as integer
  dim blank as boolean
 
  canvas1.graphics.textsize=18
  consec=0-99 ' start well below zero: top-margin rows won't reach 4 quickly
  solid=canvas1.graphics.pixel(10,10) ' assumes (10,10) is background
 
  for y=10 to canvas1.height-10
    blank=true
    for x=25 to canvas1.width-25
      question=canvas1.graphics.pixel(x,y)
      ' any one channel off by 30 or more counts as ink
      if abs(question.red-solid.red)>=30 or abs(question.green-solid.green)>=30 or abs(question.blue-solid.blue)>=30 then
        blank=false
        exit ' no need to scan the rest of this row
      end if
    next x
   
    if blank then
      consec=consec+1 ' was a solid slice
    else
      consec=0 ' was not a solid slice
    end if
   
    if consec=4 then ' 4th blank row in a row, so mark it
      canvas1.graphics.forecolor=rgb(0,0,255)
      canvas1.graphics.drawstring str(y),canvas1.width-15-canvas1.graphics.stringwidth(str(y)),y-3
      canvas1.graphics.forecolor=rgb(255,0,0)
      canvas1.graphics.drawline 0,y,canvas1.width,y
    end if
  next y